HA split-brain after upgrading from PAN-OS 10.2.2 to 10.2.3 or above
Created On 02/17/23 23:21 PM - Last Modified 06/26/25 19:58 PM
Symptom
- In PAN-OS 10.2.2 and earlier, an issue causes the HA1-A and HA1-B ports to appear reversed:
- If you disconnect the HA1-A cable, the HA1-B port shows as down:
> show high-availability all
Interface: ha1-b
Link State: Down; Setting: 1Gb/s-full <<<<<<<<<<<< Down
- If you disconnect the HA1-B cable, the HA1-A port shows as down:
> show high-availability all
Interface: ha1-a
Link State: Down; Setting: 1Gb/s-full <<<<<<<<<<<< Down
- The fix for the HA1 port mapping introduced in PAN-OS 10.2.3 may cause an HA split-brain condition after the upgrade, as in the following system log:
2023-01-25 12:45:39 System System restart requested by admin
2023-01-26 10:54:22 SWM Installed panos 10.2.3-h2
2023-01-26 11:06:37 HA state transit to Non-Functional
2023-01-26 11:06:42 HA state transit to Active <=== split brain
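To check the reversed-port symptom described above from saved CLI output, a minimal Python sketch can map each `Interface:` line to the next `Link State:` value and flag a reversed mapping. It assumes the output keeps the `Interface:` / `Link State:` layout shown above; the sample text and the helper names `link_states` and `mapping_reversed` are hypothetical, and any other fields in the real output are simply skipped.

```python
import re

# Hypothetical excerpt in the layout shown above (other fields omitted).
SAMPLE = """\
Interface: ha1-a
Link State: Up; Setting: 1Gb/s-full
Interface: ha1-b
Link State: Down; Setting: 1Gb/s-full
"""


def link_states(output: str) -> dict[str, str]:
    """Map each 'Interface:' name to the first 'Link State:' value after it."""
    states: dict[str, str] = {}
    current = None
    for line in output.splitlines():
        line = line.strip()
        m = re.match(r"Interface:\s*(\S+)", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"Link State:\s*(\w+)", line)
        if m and current is not None:
            states[current] = m.group(1)
            current = None  # ignore further Link State lines until the next Interface
    return states


def mapping_reversed(unplugged: str, output: str) -> bool:
    """True if some port is Down but the port whose cable was unplugged is not.

    Returns False when nothing is Down (inconclusive) or when the unplugged
    port itself reports Down (expected behavior).
    """
    down = {name for name, state in link_states(output).items() if state == "Down"}
    return bool(down) and unplugged not in down


print(link_states(SAMPLE))                # {'ha1-a': 'Up', 'ha1-b': 'Down'}
print(mapping_reversed("ha1-a", SAMPLE))  # True
```

For example, unplugging the HA1-A cable and seeing only `ha1-b` reported as Down returns True, matching the symptom on PAN-OS 10.2.2 and earlier.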
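A rough way to spot this pattern in an exported system log is to look for an HA transition to Active that directly follows a transition to Non-Functional after a PAN-OS install. The sketch below assumes one entry per line in a `date time type message` layout like the entries shown above; `suspicious_transitions` is a hypothetical helper. A single firewall's log cannot prove split brain on its own: confirm that the peer also reports itself as active.

```python
import re

# The system-log entries shown above, one per line (annotation removed).
LOG = """\
2023-01-25 12:45:39 System System restart requested by admin
2023-01-26 10:54:22 SWM Installed panos 10.2.3-h2
2023-01-26 11:06:37 HA state transit to Non-Functional
2023-01-26 11:06:42 HA state transit to Active
"""

# Date, time, log type (for example System, SWM, HA), then the message text.
ENTRY = re.compile(r"^(\S+ \S+) (\S+) (.*)$")
TRANSIT = "state transit to "


def suspicious_transitions(log: str) -> list[str]:
    """Timestamps of Non-Functional -> Active transitions seen after a PAN-OS install."""
    installed = False
    prev_state = None
    hits: list[str] = []
    for line in log.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue
        ts, kind, msg = m.groups()
        if kind == "SWM" and msg.startswith("Installed panos"):
            installed = True
        elif kind == "HA" and msg.startswith(TRANSIT):
            state = msg[len(TRANSIT):]
            if installed and prev_state == "Non-Functional" and state == "Active":
                hits.append(ts)
            prev_state = state
    return hits


print(suspicious_transitions(LOG))  # ['2023-01-26 11:06:42']
```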
Environment
- PAN-OS 10.2.3
- PA-5410, PA-5420, PA-5430
- High availability (HA) configured between a pair of Next-Generation Firewalls
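The scope above can be expressed as a simple pre-upgrade check. This sketch assumes the only conditions that matter are the ones listed in this article (an affected model, HA enabled, and an upgrade that crosses from below 10.2.3 to 10.2.3 or later); the function names are hypothetical, and hotfix suffixes such as `-h2` are ignored when comparing versions. Whether upgrades that also cross into a later release train behave the same way is not stated here.

```python
import re

# Scope taken from this article's Environment section.
AFFECTED_MODELS = {"PA-5410", "PA-5420", "PA-5430"}
FIXED_IN = (10, 2, 3)  # release that changed the HA1 port mapping


def parse_version(version: str) -> tuple[int, int, int]:
    """'10.2.3-h2' -> (10, 2, 3); any hotfix suffix is ignored."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if m is None:
        raise ValueError(f"unrecognized PAN-OS version: {version!r}")
    major, minor, patch = (int(x) for x in m.groups())
    return (major, minor, patch)


def upgrade_is_affected(model: str, ha_enabled: bool, current: str, target: str) -> bool:
    """True if an HA firewall of an affected model upgrades across 10.2.3."""
    return (
        ha_enabled
        and model in AFFECTED_MODELS
        and parse_version(current) < FIXED_IN <= parse_version(target)
    )


print(upgrade_is_affected("PA-5420", True, "10.2.2", "10.2.3-h2"))  # True
```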
Cause
- The fix introduced in PAN-OS 10.2.3 for the HA1 port mismatch can cause an HA split-brain scenario because the HA1 connection goes down when the firewall is upgraded to PAN-OS 10.2.3, as seen in ha_agent.log:
ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.052 -0800 Group 1 (HA1-MAIN): peer is closing connection
ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.052 -0800 Error: ha_peer_callback(src/ha_peer.c:3003): Group 1 (HA1-MAIN): Connection lost to peer
ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.053 -0800 debug: ha_peer_recv_error(src/ha_peer.c:5781): Group 1 (HA1-BKUP): Receiving error message
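When reviewing a longer ha_agent.log, a small parser can pull out the timestamp, HA group, link name (`HA1-MAIN` or `HA1-BKUP`), and message from lines in the format above. This is a sketch that assumes every relevant line contains `Group <n> (<link>): <message>` after a millisecond timestamp with a UTC offset; `ha1_events` and `links_lost` are hypothetical helpers, and lines that do not match are ignored.

```python
import re

# The ha_agent.log lines shown above.
LINES = [
    "ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.052 -0800 Group 1 (HA1-MAIN): peer is closing connection",
    "ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.052 -0800 Error: ha_peer_callback(src/ha_peer.c:3003): Group 1 (HA1-MAIN): Connection lost to peer",
    "ha_agent.log 2023-02-09 22:08:57 2023-02-09 22:08:57.053 -0800 debug: ha_peer_recv_error(src/ha_peer.c:5781): Group 1 (HA1-BKUP): Receiving error message",
]

# Millisecond timestamp with UTC offset, then "Group <n> (<link>): <message>".
PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4})"
    r".*?Group (?P<group>\d+) \((?P<link>[^)]+)\): (?P<msg>.*)$"
)


def ha1_events(lines: list[str]) -> list[tuple[str, int, str, str]]:
    """(timestamp, group, link, message) for every line that matches PATTERN."""
    events = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            events.append((m["ts"], int(m["group"]), m["link"], m["msg"]))
    return events


def links_lost(lines: list[str]) -> set[str]:
    """HA1 links that logged 'Connection lost to peer'."""
    return {link for _, _, link, msg in ha1_events(lines) if "Connection lost to peer" in msg}


for event in ha1_events(LINES):
    print(event)
print(links_lost(LINES))  # {'HA1-MAIN'}
```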
Resolution
Workaround: Use the following one-time procedure to resolve the issue:
- Suspend the passive HA peer to avoid any HA issues while changing the HA configuration. Go to Device > High Availability > Operational Commands > Suspend local device.
- Configure one data port as interface type “HA” on both the active and passive firewalls.
- In the HA settings, assign an IP address to that port for HA communication on each firewall. Make sure the two IP addresses can reach each other (a route exists between them).
- Commit the configuration change and verify on the Dashboard that the HA1 connection is up.
- Verify in the system log that the HA1 connection is stable.
- Make the passive firewall functional by going to Device > High Availability > Operational Commands > Make Local Device Functional.
- Complete the upgrade to PAN-OS 10.2.3 as described in Upgrade an HA Firewall Pair.
- After upgrading both firewalls, suspend the passive firewall again by going to Device > High Availability > Operational Commands > Suspend local device.
- Reconfigure the dedicated HA1 links on both firewalls as described in Configure HA Settings.
- Commit the changes and verify that the HA links are stable again.
- Return the passive firewall to a functional state by going to Device > High Availability > Operational Commands > Make Local Device Functional.
- Both firewalls should now run PAN-OS 10.2.3 and use the dedicated HA links again.
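The ordering constraints in this workaround can be summarized as a small state model: HA1 changes happen only while the passive firewall is suspended, and the upgrade starts only after HA communication has moved to the data port and the passive firewall is functional again. The sketch below is hypothetical (the class `Pair`, the function `run`, and step names such as `use_data_port` are invented for illustration); it folds each commit-and-verify step into the step before it and collapses the whole Upgrade an HA Firewall Pair procedure into the single `upgrade_both` step.

```python
from dataclasses import dataclass


@dataclass
class Pair:
    """Hypothetical model of the HA pair while the workaround is applied."""
    passive_suspended: bool = False
    ha1_link: str = "dedicated"  # "dedicated" or "data-port"
    upgraded: bool = False       # True once both firewalls run PAN-OS 10.2.3


def run(steps: list[str]) -> Pair:
    """Apply the steps in order, raising if an ordering rule is broken."""
    pair = Pair()
    for step in steps:
        if step == "suspend":
            pair.passive_suspended = True
        elif step == "make_functional":
            pair.passive_suspended = False
        elif step in ("use_data_port", "use_dedicated"):
            # HA configuration changes are made only while the passive firewall is suspended.
            if not pair.passive_suspended:
                raise RuntimeError(f"{step}: suspend the passive firewall first")
            pair.ha1_link = "data-port" if step == "use_data_port" else "dedicated"
        elif step == "upgrade_both":
            # The upgrade starts only after HA communication has moved to the
            # data port and the passive firewall is functional again.
            if pair.ha1_link != "data-port" or pair.passive_suspended:
                raise RuntimeError("upgrade_both: move HA1 to the data port and resume the passive firewall first")
            pair.upgraded = True
        else:
            raise ValueError(f"unknown step: {step}")
    return pair


WORKAROUND = [
    "suspend", "use_data_port", "make_functional",  # move HA communication to the data port
    "upgrade_both",                                 # upgrade both firewalls to 10.2.3
    "suspend", "use_dedicated", "make_functional",  # move back to the dedicated HA1 links
]

print(run(WORKAROUND))
# Pair(passive_suspended=False, ha1_link='dedicated', upgraded=True)
```

A sequence that changes HA settings before suspending the passive firewall, such as `run(["use_data_port"])`, raises an error instead of reaching the final state.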