High-Availability - Split Brain

Created On 04/28/22 17:26 PM - Last Modified 08/23/23 22:35 PM


Symptom


  • A split brain is detected between the firewalls in an HA pair:
    • In an active/passive (A/P) setup, both firewalls report the active state.
    • In an active/active (A/A) setup, both firewalls report the active-primary state.
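The symptom above can be expressed as a simple check. The following is a minimal sketch, not a Palo Alto Networks tool: the `PeerStatus` type, the `is_split_brain` function, and the exact state strings are hypothetical names chosen for illustration, and the states are assumed to have been collected from each firewall separately.

```python
from dataclasses import dataclass

# Role that only ONE peer should hold in each HA mode.
# The literal strings are assumptions made for this sketch.
CONTESTED_STATE = {
    "active-passive": "active",
    "active-active": "active-primary",
}


@dataclass
class PeerStatus:
    name: str
    ha_state: str  # the state this firewall reports for itself


def is_split_brain(mode: str, first: PeerStatus, second: PeerStatus) -> bool:
    """Return True when both peers claim the role only one should hold."""
    contested = CONTESTED_STATE[mode]
    return first.ha_state == contested and second.ha_state == contested


if __name__ == "__main__":
    fw1 = PeerStatus("fw1", "active")
    fw2 = PeerStatus("fw2", "active")
    print(is_split_brain("active-passive", fw1, fw2))
```

Each firewall has to be queried on its own, because in a split brain neither peer has a reliable view of the other.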


Environment


  • PAN-OS


Cause


A split brain can be caused by one of the following:
  • The HA1 link is down and no HA1 backup link is configured.
  • Both the HA1 link and the HA1 backup link are down.
  • The switch or router connecting the HA1 (or HA1 backup) link is down or is not passing heartbeat messages between the firewalls in the HA pair.
  • The HA1 link is up, but the management resources of the active (active-primary) firewall are so busy, or the firewall is malfunctioning to such a degree, that it fails to send or process heartbeat messages over the HA1 link.
  • The HA1 link is up, but the management resources of the passive (active-secondary) firewall are so busy, or the firewall is malfunctioning to such a degree, that it fails to process or respond to heartbeat messages sent by the peer firewall over the HA1 link.
  • Encryption on the HA1 (or HA1 backup) link is not set up properly, or its configuration needs to be validated or updated.
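The causes listed above can be turned into a small triage helper. This is a sketch under stated assumptions: the `Ha1Observation` fields and the `likely_causes` function are hypothetical, and the values would have to be gathered manually (for example from the High Availability widget), not from any real API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Ha1Observation:
    # All fields are hypothetical inputs an operator fills in by hand.
    ha1_up: bool
    ha1_backup_configured: bool
    ha1_backup_up: bool = False
    intermediate_device_ok: bool = True   # switch/router between the HA1 ports
    management_busy: bool = False         # management resources overloaded
    encryption_enabled: bool = False


def likely_causes(obs: Ha1Observation) -> List[str]:
    """Map observations to the candidate causes from the list above."""
    causes: List[str] = []
    if not obs.ha1_up and not obs.ha1_backup_configured:
        causes.append("HA1 down, no HA1 backup configured")
    elif not obs.ha1_up and not obs.ha1_backup_up:
        causes.append("HA1 and HA1 backup both down")
    if not obs.intermediate_device_ok:
        causes.append("switch/router not passing heartbeats")
    if obs.ha1_up and obs.management_busy:
        causes.append("management resources too busy to handle heartbeats")
    if obs.encryption_enabled:
        causes.append("HA1 encryption configuration needs validation")
    return causes


if __name__ == "__main__":
    print(likely_causes(Ha1Observation(ha1_up=False, ha1_backup_configured=False)))
```

More than one cause can apply at once, so the helper returns a list and not a single verdict.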


Resolution


  1. Log in to the backup firewall, that is, the firewall that would normally be in the passive state in an A/P setup (active-secondary in an A/A setup).
  2. Suspend the backup firewall to stabilize the production network before troubleshooting further:
    1. Go to Device > High Availability > Operational Commands and click "Suspend local device for high availability".
  3. Log in to both HA devices and check the High Availability widget under Device > Dashboard to determine which HA link is down.
  4. If the HA1 link is down and no HA1 backup link is configured, do the following. (Otherwise, skip to #5.)
    1. Troubleshoot using the steps listed in the KB article "High-Availability - HA links status".
    2. Configure an HA1 backup link. Refer to HA Ports on Palo Alto Networks Firewalls, HA Links and Backup Links, and HA General Settings.
  5. If both the HA1 link and the HA1 backup link are down, troubleshoot the links using the steps in the KB article "High-Availability - HA links status".
  6. If the switch or router connecting the HA1 (or HA1 backup) control links, whether these use dataplane ports or dedicated ports, is down or is not passing heartbeat messages between the firewalls, a temporary or permanent way to restore control communication between the HA peers is either to:
    1. Use the management port for the HA1 link or the HA1 backup link:
      1. Go to Device > High Availability > HA Communications.
      2. Open Control Links > HA1 or Control Links > HA1 Backup.
      3. Select management (if not already selected) as the port.
      4. This solution is only possible if the firewall does not have a dedicated auxiliary (AUX) HA port.
      Or enable HA Heartbeat Backup:
      1. Go to Device > High Availability > General.
      2. Open Election Settings.
      3. Select the Heartbeat Backup checkbox.
      4. This solution is possible whether the HA1 control links use in-band ports (i.e. dataplane ports) or dedicated auxiliary (AUX) HA ports.
    2. Connect the HA1 ports (and likewise the HA1 backup ports) directly to each other.
  7. If the HA1 link is up but the management resources of either firewall are so busy, or the firewall is malfunctioning to such a degree, that heartbeat messages are not handled (the active or active-primary firewall fails to send or process them, or the passive or active-secondary firewall fails to process or respond to them), check the resource usage on the affected firewall(s). Use the KB articles Tips & Tricks: Reducing Management Plane Load and Tips & Tricks: Reducing Management Plane Load—Part 2 to identify the reason for the high management resource utilization and to reduce the load on the management plane.
  8. If encryption is enabled on the HA1 link and the other possible reasons for the HA1 link showing down have been checked, consider disabling encryption on the HA1 link.
  9. If there is an issue with the encryption keys, or you simply want to renew them, follow the recommendations in the KB article HOW TO ENABLE ENCRYPTION ON HA1 IN HIGH AVAILABILITY.

  Notes for 6.a (the first option under step 6):
  • The "or" between using the management port and enabling HA Heartbeat Backup is an exclusive or.
  • management can be selected for either the HA1 link or the HA1 backup link, but not for both, because the IP addresses of the main and backup HA1 links must not overlap.
  • On firewalls that have dedicated or auxiliary (AUX) HA links, the HA1 IP address is not allowed to be in the same subnet as the device management port.
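The two addressing constraints in the notes can be checked before committing a configuration. The sketch below uses Python's standard `ipaddress` module; the function name `check_ha1_addressing`, its parameters, and the sample addresses are hypothetical, and "must not overlap" is interpreted here as "the subnets must not overlap", which is an assumption.

```python
import ipaddress
from typing import List


def check_ha1_addressing(ha1: str, ha1_backup: str, management: str,
                         has_dedicated_ha_port: bool) -> List[str]:
    """Return a list of violations of the addressing notes (empty if none).

    Addresses are given in interface notation, e.g. "10.0.0.1/30".
    """
    ha1_net = ipaddress.ip_interface(ha1).network
    backup_net = ipaddress.ip_interface(ha1_backup).network
    mgmt_net = ipaddress.ip_interface(management).network

    problems: List[str] = []
    # Main and backup HA1 links must not overlap (assumed: subnet overlap).
    if ha1_net.overlaps(backup_net):
        problems.append("HA1 and HA1 backup addresses overlap")
    # Applies only to firewalls with dedicated or AUX HA links.
    if has_dedicated_ha_port and ha1_net.overlaps(mgmt_net):
        problems.append("HA1 is in the management subnet")
    return problems


if __name__ == "__main__":
    print(check_ha1_addressing("10.0.0.1/30", "10.0.1.1/30",
                               "192.168.1.10/24", True))
```

An empty list means neither constraint from the notes is violated for the given addresses; the firewall's own commit validation remains the authority.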


Additional Information


For more information, see the following KB articles:
  • DotW: What is Peer-Split-Brain?
  • How to setup AUX Port as high-availability port.
  • Commit failed with error : High-availability ha1-backup interface configuration requires a peer-ip-backup address to be configured(Module: ha_agent).
  • Heartbeat Backup is Enabled on Both Devices but Status is Showing Down.
  • How To Avoid HA Split-Brain due to Missed Heartbeats.


