HA Failover Hold Timers

Created On 09/26/18 13:53 PM - Last Modified 06/05/23 20:26 PM



Issue

After a failover in an HA active/passive cluster, the newly active device does not fail over again for about one minute, even if one of its monitored interfaces goes down during that time.

 

Resolution

The one-minute "monitor hold timer" that starts immediately after a failover is a preset timer that prevents unnecessary failover flapping. If the newly active device detects a monitored link down while this timer is running, it does not trigger another failover. A link that is down once the timer expires does cause a failover. This timer is not configurable.
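
The hold behaviour described above can be sketched as a small model. This is only an illustration in Python, not how PAN-OS implements it: the function name and its parameters are hypothetical, the 60-second value is taken from this article, the year in the example is arbitrary, and the handling of a link that recovers before the timer expires is an assumption.

```python
from datetime import datetime, timedelta

# Fixed one-minute monitor hold, per this article (not configurable).
MONITOR_HOLD = timedelta(seconds=60)


def failure_acted_on_at(became_active, link_down_at, link_restored_at=None):
    """Return when a link failure would be acted on, or None if never.

    Hypothetical model: failures seen during the hold window are deferred
    until the window ends; failures after the window act immediately.
    """
    hold_ends = became_active + MONITOR_HOLD
    if link_down_at >= hold_ends:
        # The hold timer has already expired, so nothing is deferred.
        return link_down_at
    if link_restored_at is not None and link_restored_at < hold_ends:
        # Assumption: a link that recovers inside the window is ignored.
        return None
    # Link is still down when the hold timer ends.
    return hold_ends


# Times from the scenario in this article (the year is arbitrary).
active = datetime(2000, 11, 21, 21, 53, 0)
down = datetime(2000, 11, 21, 21, 53, 10)
print(failure_acted_on_at(active, down))  # 2000-11-21 21:54:00
```

With the scenario's timestamps, the model defers the failure from 21:53:10 to 21:54:00, which matches the behaviour seen in ha_agent.log.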

 

In the following scenario, ethernet1/2 is disconnected at 21:53:10, ten seconds after the device became active at 21:53:00.
The link-down event is not acted on because the monitor hold timer is still running. At 21:54:00, when the monitor hold timer ends, the link monitor reports the interface as down.

 

- ha_agent.log
Nov 21 21:53:00 HA Group 15: Moved from state Passive to state Active  <--- this box became active!!

Nov 21 21:53:00 ha_sysd_dev_state_update(ha_sysd.c:1402): Set dev state to Active

Nov 21 21:53:00 ha_state_start_preemption_hold(ha_state.c:1705): Group 15: no need for preemption waiting

Nov 21 21:53:00 ha_state_start_monitor_hold(ha_state.c:940): Starting monitor hold for group 15; linkmon not monitored   <---- monitor hold timer started!!!

 

<-- Around 21:53:10, ethernet1/2 went down (flapping), but this is not detected because of the monitor hold timer.

Nov 21 21:54:00 ha_state_monitor_hold_callback(ha_state.c:1539): Group 15: ending monitor hold  <--- ending monitor hold timer!!!

Nov 21 21:54:00 Warning: ha_event_log(ha_event.c:47): HA Group 15: Link group 'VW-monitor' link 'ethernet1/2' is down

Nov 21 21:54:00 Warning: ha_event_log(ha_event.c:47): HA Group 15: Link group 'VW-monitor' failure; one or more links are down
<-- The link monitor (VW-monitor) detected the link down as soon as the monitor hold timer ended.

Nov 21 21:54:00 ha_state_transition(ha_state.c:982): Group 15: transition to state Non-Functional

 

Nov 21 21:54:30 ha_state_start_nonfunc_hold(ha_state.c:2021): Starting NonFunc holdtime for group 15
<--- then "monitor fail hold timer" started!!!
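
To check the hold window in your own ha_agent.log, the start and end messages can be matched and their timestamps compared. The following Python sketch assumes the line format shown in the excerpt above; the function name, the regular expression, and the placeholder year (the log lines carry no year) are illustrative choices, not part of any PAN-OS tooling.

```python
import re
from datetime import datetime

# Matches lines such as "Nov 21 21:53:00 <message>".
LINE_RE = re.compile(r"^(\w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}) (.*)$")
ASSUMED_YEAR = 2000  # placeholder: the log timestamps have no year


def monitor_hold_window(lines):
    """Return (start, end) datetimes of the monitor hold, or None for either."""
    start = end = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank lines and "<--" annotations
        stamp = datetime.strptime(
            f"{ASSUMED_YEAR} {match.group(1)}", "%Y %b %d %H:%M:%S"
        )
        message = match.group(2)
        if "Starting monitor hold" in message:
            start = stamp
        elif "ending monitor hold" in message and start is not None:
            end = stamp
    return start, end


# Two lines copied from the excerpt above.
SAMPLE = [
    "Nov 21 21:53:00 ha_state_start_monitor_hold(ha_state.c:940): "
    "Starting monitor hold for group 15; linkmon not monitored",
    "Nov 21 21:54:00 ha_state_monitor_hold_callback(ha_state.c:1539): "
    "Group 15: ending monitor hold",
]
begin, finish = monitor_hold_window(SAMPLE)
print((finish - begin).total_seconds())  # 60.0
```

Link-down events logged at the same timestamp as the "ending monitor hold" message are a hint that the failure actually occurred earlier, inside the hold window.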

 

A second timer, the NonFunc hold time seen in the log, is known as the "monitor fail hold timer".
It is the amount of time a device stays in the non-functional state after being downgraded from the active state.

 

CLI command:

# set deviceconfig high-availability group xx mode active-passive monitor-fail-hold-down-time

  <value>  <1-60> Interval in minutes to stay in non-functional state following a link/path monitor failure, default 1
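
When generating this command from a script, it can help to validate the value against the documented 1-60 minute range first. The Python helper below is a hypothetical sketch: its name and parameters are not part of PAN-OS, and it only builds the command string shown above without sending it to a device.

```python
def monitor_fail_hold_command(group_id, minutes=1):
    """Build the set command for monitor-fail-hold-down-time.

    Hypothetical helper: validates the documented range (1-60 minutes,
    default 1) and returns the command text; it does not run anything.
    """
    if isinstance(minutes, bool) or not isinstance(minutes, int):
        raise ValueError("minutes must be an integer")
    if not 1 <= minutes <= 60:
        raise ValueError("minutes must be between 1 and 60")
    return (
        f"set deviceconfig high-availability group {group_id} "
        f"mode active-passive monitor-fail-hold-down-time {minutes}"
    )


# Group 15 matches the group number in the log excerpt.
print(monitor_fail_hold_command(15, 5))
```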

 

owner:  yogihara


