Palo Alto Networks Knowledgebase: DotW: What is Peer-Split-Brain?


Created On 07/29/19 17:24 PM - Last Updated 07/29/19 17:52 PM
Resolution

I'm sure many of you have run into this problem: you see a particular error message, but it's not clear what exactly it means. It's especially confusing when the message sounds concerning but you are not experiencing any issues. This is exactly what happened to community member @zewwy a while ago. The discussion was reactivated last week because new users had similar or follow-up questions on the topic:

 

[Image: Discussion of the Week]

 

First, let's explain exactly what 'split brain' is.

 

Split brain conditions occur when HA members can no longer communicate with each other to exchange HA monitoring information. Each HA member then assumes the other member is non-functional and takes over as the active firewall (in an active/passive pair) or the active-primary firewall (in an active/active pair).
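As a rough illustration of the takeover logic described above, the following Python sketch models each member's decision in isolation. It is a hypothetical toy model (the `Peer` class and the `evaluate_peer` and `split_brain` functions are invented for illustration), not PAN-OS code. It only shows why two healthy members that stop hearing each other both end up active.

```python
from dataclasses import dataclass


@dataclass
class Peer:
    """Hypothetical toy model of one HA member; not a PAN-OS object."""
    name: str
    role: str  # "active" or "passive"


def evaluate_peer(peer: Peer, hears_partner: bool) -> str:
    """Each member decides on its own, using only what it can observe.

    If no HA monitoring information arrives from the partner, the member
    assumes the partner is non-functional and takes over as active.
    """
    if not hears_partner:
        peer.role = "active"
    return peer.role


def split_brain(a: Peer, b: Peer) -> bool:
    """Split brain: both members believe they are active at the same time."""
    return a.role == "active" and b.role == "active"


if __name__ == "__main__":
    fw_a = Peer("fw-a", "active")
    fw_b = Peer("fw-b", "passive")

    # The HA control link fails, but both firewalls are otherwise healthy:
    # neither can hear the other, so each one evaluates the situation alone.
    evaluate_peer(fw_a, hears_partner=False)
    evaluate_peer(fw_b, hears_partner=False)

    print(fw_a.role, fw_b.role, split_brain(fw_a, fw_b))  # active active True
```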

 

Split brain conditions can be prevented by configuring an HA1 Backup link and/or enabling Heartbeat Backup.
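The sketch below illustrates why a second path helps. It treats the partner as reachable when at least one configured path (HA1, HA1 Backup or Heartbeat Backup) still delivers messages. The path labels and the `partner_reachable` and `takes_over` helpers are hypothetical. The sketch models the idea only and does not reproduce how PAN-OS weighs individual links.

```python
def partner_reachable(paths: dict) -> bool:
    """Return True if any configured HA path still works.

    `paths` maps a path name (hypothetical labels such as "HA1" or
    "HA1 Backup") to whether messages currently get through on it.
    """
    return any(paths.values())


def takes_over(paths: dict) -> bool:
    """A member only assumes its partner failed when every path is silent."""
    return not partner_reachable(paths)


if __name__ == "__main__":
    only_ha1 = {"HA1": False}
    with_backup = {"HA1": False, "HA1 Backup": True}
    with_heartbeat = {"HA1": False, "Heartbeat Backup": True}

    print(takes_over(only_ha1))        # True: no path left, split brain risk
    print(takes_over(with_backup))     # False: backup link still carries HA info
    print(takes_over(with_heartbeat))  # False: heartbeat still shows the peer is alive
```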

 

User @mivaldi gave a great explanation and added that an HA1 link failure is not always caused by physical problems.

It can also happen when the ha_agent process is busy and can't process HA1 functions. In that case, it's useful to use Heartbeat Backup through the MGMT port as the backup, because the heartbeat function sends out ICMP probes that are processed by the system kernel, not by the ha_agent process.
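The point about ha_agent can be modeled by tagging each probe with the component that answers it. In this hypothetical sketch, HA1 hellos go unanswered while ha_agent is busy. Kernel-handled ICMP heartbeats over the MGMT port still get replies. The `Responder` enum and the `probe_answered` function are illustrative names only and are not part of PAN-OS.

```python
from enum import Enum


class Responder(Enum):
    """Which component answers a given HA probe (illustrative model)."""
    HA_AGENT = "ha_agent process"
    KERNEL = "system kernel"


def probe_answered(responder: Responder, link_up: bool, ha_agent_busy: bool) -> bool:
    """Return True if the peer's probe on this path gets a reply."""
    if not link_up:
        return False  # physical or link-level problem: nothing gets through
    if responder is Responder.HA_AGENT and ha_agent_busy:
        return False  # HA1 functions stall while ha_agent is busy
    return True       # kernel-handled ICMP does not depend on ha_agent


if __name__ == "__main__":
    # Both links are physically up, but ha_agent is too busy for HA1 work.
    ha1 = probe_answered(Responder.HA_AGENT, link_up=True, ha_agent_busy=True)
    heartbeat = probe_answered(Responder.KERNEL, link_up=True, ha_agent_busy=True)
    print(ha1, heartbeat)  # False True
```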

 

The recommended configuration for the HA control link is to use the dedicated HA1 port between the two devices for the Control Link (HA1) and the management port for the Control Link (HA1 Backup). In this scenario, you do not need to enable the Heartbeat Backup option on the Election Settings page. If you use a physical HA1 port for the Control Link (HA1) and a data port for the Control Link (HA1 Backup), it is recommended to enable the Heartbeat Backup option on the Election Settings page. An example of the first scenario can be seen in the screenshot below:

 

[Screenshot: HA example]

 

For devices that do not have a dedicated HA port, such as the PA-200 firewall, configure the management port for the Control Link (HA1) connection and a data port interface of type HA for the Control Link (HA1 Backup) connection. Because the management port is already in use in this case, there is no need to enable the Heartbeat Backup option on the Election Settings page. The heartbeat backups already occur over the management interface connection.
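The guidance on when to enable Heartbeat Backup can be read as a single check. Heartbeat backups travel over the management interface, so the option only adds a new path when the management port is not already carrying one of the control links. The sketch below encodes that reading of the scenarios described in this article. The `Port` enum and the `heartbeat_backup_recommended` function are hypothetical, and this is not a configuration tool.

```python
from enum import Enum


class Port(Enum):
    """Kinds of interfaces that can carry an HA control link (illustrative)."""
    HA1 = "dedicated HA1 port"
    MGMT = "management port"
    DATA = "data port configured with type HA"


def heartbeat_backup_recommended(control_link: Port, control_link_backup: Port) -> bool:
    """Heartbeat backups run over the management interface, so enabling the
    option only adds a new path when neither control link already uses it."""
    return Port.MGMT not in (control_link, control_link_backup)


if __name__ == "__main__":
    scenarios = [
        (Port.HA1, Port.MGMT),   # recommended layout with a dedicated HA1 port -> False
        (Port.HA1, Port.DATA),   # dedicated HA1 port plus a data-port backup -> True
        (Port.MGMT, Port.DATA),  # no dedicated HA port (e.g. PA-200) -> False
    ]
    for control, backup in scenarios:
        print(f"{control.value} + {backup.value}: "
              f"{heartbeat_backup_recommended(control, backup)}")
```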
 
On the VM-Series firewall in AWS, the management port is used as the HA1 link.
 
When using a data port for the HA control link, be aware that the control messages must pass from the dataplane to the management plane. If a failure occurs in the dataplane, HA control link information cannot be exchanged between the devices and a failover will occur. It is best to use the dedicated HA ports or, on devices without a dedicated HA port, the management port.
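The dataplane caveat can be pictured as a dependency check. A control link on a data port relies on the dataplane to move messages to the management plane. The dedicated HA ports and the management port do not have that dependency. The sketch below is a hypothetical model of this dependency that lists which control links survive a dataplane failure. The `Port` enum, the layout labels and both functions are illustrative names, not PAN-OS terms.

```python
from enum import Enum


class Port(Enum):
    """Kinds of interfaces that can carry an HA control link (illustrative)."""
    HA1 = "dedicated HA1 port"
    MGMT = "management port"
    DATA = "data port configured with type HA"


def link_survives(port: Port, dataplane_failed: bool) -> bool:
    """Control messages on a data port must pass from the dataplane to the
    management plane, so they stop when the dataplane fails."""
    return not (port is Port.DATA and dataplane_failed)


def surviving_links(links: dict, dataplane_failed: bool) -> list:
    """Return the names of control links that still carry HA information."""
    return [name for name, port in links.items()
            if link_survives(port, dataplane_failed)]


if __name__ == "__main__":
    data_port_layout = {"Control Link": Port.DATA}
    recommended_layout = {"Control Link": Port.HA1, "Control Link Backup": Port.MGMT}

    print(surviving_links(data_port_layout, dataplane_failed=True))
    # []
    print(surviving_links(recommended_layout, dataplane_failed=True))
    # ['Control Link', 'Control Link Backup']
```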

 

The discussion was reactivated by user @TranceforLife, who asked: what happens when HA1 communication is restored?

 

Our very own @jdelio answered: 

If you want to control which member becomes active first, configure the Preemptive setting under Device > High Availability > General > Election Settings.
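Here is a minimal sketch of how preemption decides the active member. It assumes the PAN-OS convention that a lower Device Priority value means a higher priority. It also assumes that Preemptive must be enabled on both peers before the higher-priority peer reclaims the active role. The `Member` class and the `preferred_active` function are hypothetical. The sketch deliberately does not model the non-preemptive case or priority ties.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Member:
    """Hypothetical model of an HA member's election settings."""
    name: str
    device_priority: int  # assumed PAN-OS convention: lower value = higher priority
    preemptive: bool


def preferred_active(a: Member, b: Member) -> Optional[str]:
    """Return the member expected to become active once HA1 is restored.

    Preemption only applies when it is enabled on both members; otherwise
    this sketch returns None rather than guessing at the outcome.
    """
    if not (a.preemptive and b.preemptive):
        return None
    if a.device_priority == b.device_priority:
        raise ValueError("equal priorities: tie-breaking is not modeled here")
    return a.name if a.device_priority < b.device_priority else b.name


if __name__ == "__main__":
    fw_a = Member("fw-a", device_priority=50, preemptive=True)
    fw_b = Member("fw-b", device_priority=100, preemptive=True)
    print(preferred_active(fw_a, fw_b))  # fw-a

    fw_b.preemptive = False
    print(preferred_active(fw_a, fw_b))  # None
```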

 

[Screenshot: Preemption]

 

We already have a great set of documents explaining everything you need to know about HA and how to configure it. Make sure to check them out if you have further questions:

 

High Availability Synchronization

High Availability Failover Optimization

How to Configure High Availability on PAN-OS

 

You can follow the complete discussion here:

https://live.paloaltonetworks.com/t5/General-Topics/What-is-Peer-Split-Brain/m-p/19825#U19825


