Delay In HA Failover When Active Firewall Goes Non-Functional

Created On 09/28/22 21:27 PM - Last Modified 04/02/24 22:59 PM


Symptom


  • After the Primary-Active firewall goes into the non-functional state, traffic is interrupted for 5-10 minutes until the Primary boots up again.
  • The Secondary-Passive firewall shows its HA state changed to Active, but the traffic outage persists.
  • System logs (show log system) on both firewalls contain the message "Ignoring session synchronization due to HA2-unavailable".
  • After the HA2 links on both HA peers come up, session synchronization completes and traffic starts passing through the Secondary-Active firewall.
  • The System logs contain no keep-alive related entries.
  • The 'ha_agent' logs (less mp-log ha_agent.log) indicate that the keep-alive setting is turned off.
System logs (most recent entry first):
info     ha             session 0  HA Group 1: Completed session synchronization with peer
info     ha             session 0  HA Group 1: Starting session synchronization with peer on slots 1 
info     ha             ha2-lin 0  HA2 peer link up
info     ha             ha2-lin 0  HA2 link up
info     port    HA2    link-ch 0  Port HA2: Up   40Gb/s-full duplex
high     ha             session 0  HA Group 1: Ignoring session synchronization due to HA2-unavailable
critical ha             ha2-lin 0  All HA2 links down
critical ha             ha2-lin 0  HA2 link down
info     port    HA2    link-ch 0  Port HA2: Down 40Gb/s-full duplex
critical ha             ha2-lin 0  All HA2 links down
high     ha             session 0  HA Group 1: Ignoring session synchronization due to HA2-unavailable
high     ha             ha2-lin 0  HA2 peer link down
high     ha             state-c 0  HA Group 1: Moved from state Passive to state Active 
ha_agent logs:
0700 debug: ha_dpmon_peer_action_set(src/ha_dpmon.c:155): Setting peer keep-alive setting to off. <<----
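
The symptoms above can be checked against saved log output with a short script. The following is a minimal sketch, not a Palo Alto Networks tool: the function name `summarize_ha_logs` is hypothetical, and it assumes the output of `show log system` and the contents of `ha_agent.log` have already been exported as lists of text lines. The message fragments it searches for are copied from the excerpts in this article.

```python
import re

# Message fragments copied from the log excerpts in this article.
SYNC_IGNORED = "Ignoring session synchronization due to HA2-unavailable"
BECAME_ACTIVE = "Moved from state Passive to state Active"
KEEPALIVE_OFF = re.compile(r"Setting peer keep-alive setting to off", re.IGNORECASE)


def summarize_ha_logs(system_log_lines, ha_agent_lines):
    """Report which symptoms from this article appear in the given log lines.

    Hypothetical helper: both arguments are plain lists of strings that were
    exported from the firewall beforehand.
    """
    return {
        # The passive peer took over the Active role.
        "became_active": any(BECAME_ACTIVE in line for line in system_log_lines),
        # Number of times session synchronization was skipped because HA2 was down.
        "sync_ignored_count": sum(SYNC_IGNORED in line for line in system_log_lines),
        # ha_agent reported that the peer keep-alive setting is off.
        "keepalive_off": any(KEEPALIVE_OFF.search(line) for line in ha_agent_lines),
    }
```

If `became_active` and `keepalive_off` are both true and `sync_ignored_count` is greater than zero, the logs match the pattern described in this article.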

 



Environment


  • Palo Alto Networks firewalls 
  • PAN-OS (All)
  • High Availability Active/Passive


Cause


  • HA2 keep-alive is not enabled on the HA2 link.
  • HA2 keep-alive is the mechanism that validates the health of the HA state synchronization path (HA2).


Resolution


  1. Enable HA2 Keep-alive on the HA2 link.
    • When enabled, the peers use keep-alive messages to monitor the HA2 connection and detect a failure based on the configured Threshold (default is 10,000 ms).
    • Once a failure is detected, the configured HA2 Keep-alive recovery Action is taken.
  2. The HA2 keep-alive option can be configured on both firewalls or on just one firewall in the HA pair.
    • If the option is enabled on only one firewall, only that firewall sends keep-alive messages. The other firewall is notified if a failure occurs.
  3. To enable it, go to Device > High Availability > General in the GUI and edit the Data Link (HA2) section.
  4. To check the HA2 keep-alive settings, run the CLI command: > show high-availability ha2_keepalive
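
The threshold-based detection described in step 1 can be illustrated with a small model. This is only a sketch of the idea and not PAN-OS code: the function name `ha2_failure_detected` is hypothetical, and the rule it applies (a failure is declared once no keep-alive has been received for longer than the threshold) is an assumption made for illustration. Only the 10,000 ms default comes from this article.

```python
DEFAULT_THRESHOLD_MS = 10_000  # default Threshold quoted in this article


def ha2_failure_detected(last_keepalive_ms, now_ms, threshold_ms=DEFAULT_THRESHOLD_MS):
    """Assumed rule: the HA2 path is considered failed once the time since
    the last received keep-alive exceeds the threshold.

    All times are in milliseconds on the same clock.
    """
    silence_ms = now_ms - last_keepalive_ms
    return silence_ms > threshold_ms


# Example: the last keep-alive arrived at t = 0 ms.
print(ha2_failure_detected(0, 5_000))   # still within the threshold
print(ha2_failure_detected(0, 12_000))  # silence longer than the threshold
```

With keep-alive disabled, no such check runs on the HA2 path, which is consistent with the symptom that the System logs contain no keep-alive entries.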


Additional Information


Please refer to the following articles for a better understanding of the HA2 keep-alive feature:


    https://knowledgebase.paloaltonetworks.com/KCSArticleDetail?id=kA14u000000sZINCA2&lang=en_US&refURL=http%3A%2F%2Fknowledgebase.paloaltonetworks.com%2FKCSArticleDetail