HA split-brain after upgrading from PAN-OS 10.2.2 to 10.2.3 or above

Created On 02/17/23 23:21 PM - Last Modified 06/26/25 19:58 PM


Symptom


  • In PAN-OS 10.2.2 and earlier, the HA1-A and HA1-B ports appeared to be reversed: disconnecting one cable showed the other port as down.
  • If you disconnect the HA1-A cable, the HA1-B port shows as down:
> show high-availability all

Interface: ha1-b        
      Link State: Down; Setting: 1Gb/s-full  <<<<<<<<<<<< Down
  • If you disconnect the HA1-B cable, the HA1-A port shows as down:
> show high-availability all

Interface: ha1-a       
      Link State: Down; Setting: 1Gb/s-full   <<<<<<<<<<<<  Down
  • The fix for the HA1 port mappings made in PAN-OS 10.2.3 may cause HA split-brain during the upgrade, as seen in the system log:
>-->2023-01-25 12:45:39  System   System restart requested by admin
>-->2023-01-26 10:54:22  SWM      Installed panos 10.2.3-h2
>   2023-01-26 11:06:37  HA       state transit to Non-Functional
>   2023-01-26 11:06:42  HA       state transit to Active <=== split brain
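
The system log pattern above (Non-Functional followed shortly by Active) can be flagged in an exported log with a short script. This is a minimal sketch, not a vendor tool: it assumes log lines in the same layout as the excerpt above, and the function name `suspicious_active_transitions` and the 60-second window are hypothetical choices. A local transition to Active is only a hint; split-brain is confirmed only when the peer is active at the same time.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2023-01-26 11:06:42  HA       state transit to Active
# The layout is assumed from the excerpt in this article.
TRANSITION = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+HA\s+state transit to (\S+)"
)

def suspicious_active_transitions(lines, window_seconds=60):
    """Return timestamps where the HA state went from Non-Functional
    to Active within window_seconds (an arbitrary, assumed threshold)."""
    suspects = []
    prev_state, prev_time = None, None
    for line in lines:
        match = TRANSITION.search(line)
        if not match:
            continue  # not an HA state transition line
        when = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
        state = match.group(2)
        if (
            state == "Active"
            and prev_state == "Non-Functional"
            and (when - prev_time).total_seconds() <= window_seconds
        ):
            suspects.append(when)
        prev_state, prev_time = state, when
    return suspects
```

Run it against the system log of each peer and compare the timestamps: overlapping Active periods on both firewalls indicate split-brain.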


Environment


  • PAN-OS 10.2.3
  • PA-5410, PA-5420, PA-5430
  • High Availability (HA) configured between Next-Generation Firewalls


Cause


  • The correction for the HA1 port mismatch introduced in PAN-OS 10.2.3 can cause HA split-brain, because the HA1 connection goes down when the firewall is upgraded to PAN-OS 10.2.3:
ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.052 -0800 Group 1 (HA1-MAIN): peer is closing connection
ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.052 -0800 Error: ha_peer_callback(src/ha_peer.c:3003): Group 1 (HA1-MAIN): Connection lost to peer
ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.053 -0800 debug: ha_peer_recv_error(src/ha_peer.c:5781): Group 1 (HA1-BKUP): Receiving error message
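
To check whether a firewall hit this condition, the ha_agent.log messages above can be searched for in an exported log. The following is a minimal sketch that matches only the two message texts quoted in this article; the function name `find_ha1_losses` is hypothetical, and other PAN-OS releases may word these messages differently.

```python
import re

# Message texts taken from the ha_agent.log excerpt in this article.
LOSS_PATTERNS = (
    re.compile(r"Group \d+ \((HA1-\w+)\): peer is closing connection"),
    re.compile(r"Group \d+ \((HA1-\w+)\): Connection lost to peer"),
)

def find_ha1_losses(lines):
    """Return (link_name, line) pairs for lines that report an HA1
    connection closing or being lost."""
    hits = []
    for line in lines:
        for pattern in LOSS_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((match.group(1), line.strip()))
                break  # count each line once
    return hits
```

Hits on HA1-MAIN at the time of the upgrade, as in the excerpt above, are consistent with the cause described here.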




Resolution


Workaround: The following one-time workaround can be used to resolve the issue:
  1. Suspend the passive firewall to avoid HA issues while changing the HA configuration. Go to Device > High Availability > Operational Commands > Suspend local device.
  2. Configure one data port as “HA” on both the active and the passive firewall.
  3. In the HA settings, set up an IP address for HA communication on each firewall. Make sure the two IP addresses have a route to each other.
  4. Commit the configuration change and verify on the dashboard that the HA1 connection is up.
  5. Verify in the system log that the HA1 connection is stable.
  6. Make the passive firewall functional by going to Device > High Availability > Operational Commands > Make Local Device Functional.
  7. Complete the upgrade to PAN-OS 10.2.3 as described in Upgrade an HA firewall pair.
  8. After upgrading both firewalls, suspend the passive firewall again by going to Device > High Availability > Operational Commands > Suspend local device.
  9. Reconfigure the dedicated HA1 links on both firewalls as described in Configure HA Settings.
  10. Commit the changes and verify that the HA links are stable again.
  11. Make the passive firewall functional again by going to Device > High Availability > Operational Commands > Make Local Device Functional.
  12. Both firewalls should now be running PAN-OS 10.2.3 and using the dedicated HA links again.
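
When verifying the links at the end of the workaround, the `Link State` lines from `show high-availability all` (as shown in the Symptom section) can be checked in captured CLI output. This is a minimal sketch under stated assumptions: it relies only on the `Interface:` and `Link State:` lines quoted in this article, real output contains many other fields, and the function name `down_ha_links` is hypothetical.

```python
import re

# Excerpt copied from the Symptom section of this article.
SAMPLE = """\
Interface: ha1-b
      Link State: Down; Setting: 1Gb/s-full
"""

def down_ha_links(output):
    """Return the names of interfaces whose Link State is reported as Down."""
    down = []
    current = None
    for line in output.splitlines():
        match = re.match(r"\s*Interface:\s*(\S+)", line)
        if match:
            current = match.group(1)  # remember which interface follows
            continue
        match = re.match(r"\s*Link State:\s*(\w+)", line)
        if match and current is not None:
            if match.group(1) == "Down":
                down.append(current)
            current = None
    return down

if __name__ == "__main__":
    print(down_ha_links(SAMPLE))
```

An empty result after step 10 suggests that no HA interface is reporting a down link; it does not replace checking the system log for stability.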
 

