Traffic drops for 120 seconds during BGP link failover with BFD and Graceful Restart enabled

Created On 02/23/26 18:51 PM - Last Modified 02/23/26 19:10 PM


Symptom


  • Traffic is dropped (blackholed) for about 2 minutes (120 seconds) when a physical link or interface between the firewall and a BGP peer (e.g., a downstream router) goes down.

  • Outbound traffic from the downstream router fails over instantly to a secondary path.

  • Return traffic from the firewall is dropped for the duration of the Graceful Restart timer.

  • BFD (Bidirectional Forwarding Detection) is configured and tears down the BGP session immediately, but sub-second routing failover is not achieved.



Environment


  • Palo Alto Networks NGFW

  • PAN-OS

  • BGP (Border Gateway Protocol)

  • BFD (Bidirectional Forwarding Detection) enabled on the peering

  • BGP Graceful Restart (GR) enabled on both peers



Cause


This issue is caused by a design conflict between the fast-failover mechanism of BFD and the stability-focused mechanism of BGP Graceful Restart.

When the link fails, BFD detects the failure in milliseconds and signals BGP to tear down the session. Because Graceful Restart is negotiated between the peers, the firewall interprets the BFD session drop as a temporary control-plane restart of the peer router.

The firewall immediately enters Graceful Restart Helper Mode. It deliberately retains the stale BGP routes pointing to the down interface for the configured Graceful Restart timer (default 120 seconds) to prevent routing churn. While the downstream router instantly reroutes outbound traffic, the firewall forwards return traffic back to the dead interface until the timer expires, causing an asymmetric traffic blackhole.
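The helper-mode behavior described above can be sketched as a small model. This is an illustrative simplification, not PAN-OS code: the names Route, on_bfd_down, and still_forwarding, the next-hop address 10.0.0.2, and the single hold_s timer are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    next_hop: str                        # hypothetical next hop behind the failed link
    stale_since: Optional[float] = None  # seconds; None means the route is fresh


def on_bfd_down(route: Route, now: float, gr_negotiated: bool) -> Optional[Route]:
    """Return the route to keep after BFD reports the peer down, or None to flush it."""
    if gr_negotiated:
        # Helper mode: mark the route stale but keep forwarding on it.
        route.stale_since = now
        return route
    # No Graceful Restart: the route is withdrawn immediately.
    return None


def still_forwarding(route: Optional[Route], now: float, hold_s: float) -> bool:
    """True while traffic would still be sent toward the dead next hop."""
    if route is None:
        return False
    if route.stale_since is None:
        return True
    return now - route.stale_since < hold_s


kept = on_bfd_down(Route("10.0.0.2"), now=0.0, gr_negotiated=True)
print(still_forwarding(kept, now=60.0, hold_s=120.0))   # True: still blackholing at 60 s
print(still_forwarding(kept, now=120.0, hold_s=120.0))  # False: stale route purged at 120 s

flushed = on_bfd_down(Route("10.0.0.2"), now=0.0, gr_negotiated=False)
print(still_forwarding(flushed, now=0.0, hold_s=120.0))  # False: flushed at once
```

In this model the only difference between a 120-second blackhole and an immediate failover is whether Graceful Restart was negotiated when BFD reported the failure.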



Resolution


To achieve sub-second failover during a physical link failure, BFD and BGP Graceful Restart should not be used together for the same peering without adjusting the helper mode.

Implement one of the following configuration changes:

Option 1: Adjust Graceful Restart Timers (Recommended)

You can achieve failover in about 1 second by lowering the peer-facing timers to 1 second. This forces the firewall to purge stale routes 1 second after a peer drops, while still protecting the firewall during its own local control-plane restarts (e.g., during a PAN-OS upgrade).

  1. Log in to the Palo Alto Networks firewall web interface.

  2. Navigate to Network > Virtual Routers.

  3. Click the name of the Virtual Router handling the BGP peering.

  4. On the left pane, click BGP, then select the Advanced tab at the top of the BGP configuration window.

  5. Locate the Graceful Restart section.

  6. Ensure the main Enable checkbox is checked (to keep Graceful Restart active).

  7. Leave the Local Restart Time at its default of 120 (This dictates how long peers will wait for the firewall to restart).

  8. Set the Max Peer Restart Time to 1 (This forces the firewall to only wait 1 second for the peer's session to return before giving up).

  9. Set the Stale Route Time to 1 (This forces the firewall to purge stale routes 1 second after they are marked stale).

  10. Click OK, and then Commit the changes.

(Note: Because BGP Graceful Restart capabilities are negotiated during the initial BGP OPEN message, making this change may require the BGP session to flap/reset once to negotiate the new capabilities and take effect.)
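To illustrate why a session reset may be needed, the sketch below encodes the Graceful Restart capability as it is carried in a BGP OPEN message, following the wire layout in RFC 4724 (capability code 64, 4 flag bits, a 12-bit restart time in seconds, then one entry per address family). It is a generic protocol illustration, not PAN-OS code. The function name encode_gr_capability and the IPv4 unicast default are assumptions, and mapping the advertised restart time to the firewall's Local Restart Time is inferred from the description in step 7.

```python
import struct

GR_CAPABILITY_CODE = 64  # Graceful Restart capability code (RFC 4724)


def encode_gr_capability(restart_time_s: int,
                         families=((1, 1),),      # (AFI, SAFI); 1/1 is IPv4 unicast
                         restarting: bool = False) -> bytes:
    """Build the Graceful Restart capability TLV sent in a BGP OPEN message."""
    if not 0 <= restart_time_s <= 0x0FFF:
        raise ValueError("restart time must fit in 12 bits (0-4095 seconds)")
    flags = 0x8 if restarting else 0x0  # R bit is the top bit of the 4-bit flags field
    body = struct.pack("!H", (flags << 12) | restart_time_s)
    for afi, safi in families:
        # Per-family flags byte left at 0 (forwarding state not preserved).
        body += struct.pack("!HBB", afi, safi, 0)
    return struct.pack("!BB", GR_CAPABILITY_CODE, len(body)) + body


# A restart time of 120 seconds for IPv4 unicast:
print(encode_gr_capability(120).hex())  # 4006007800010100
```

Because capabilities are exchanged only in the OPEN message, a peer keeps using whatever it learned there until the session is re-established.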

Option 2: Disable Graceful Restart Entirely for the Peering

If sub-second failover is the only priority and maintaining traffic forwarding during a control-plane reboot is not required, disable BGP Graceful Restart completely on the specific BGP Peer Group. Once disabled, BFD will have sole authority to tear down the session. Routes will be flushed instantly upon a link down event.

  1. Log in to the Palo Alto Networks firewall web interface.

  2. Navigate to Network > Virtual Routers.

  3. Click on the name of the Virtual Router handling the BGP peering.

  4. On the left pane, click BGP, then select Peer Group.

  5. Click on the specific Peer Group (or the individual Peer) connecting to the router.

  6. Navigate to the Advanced tab within the Peer or Peer Group configuration window.

  7. Locate the Graceful Restart settings.

  8. Uncheck the Enable box for Graceful Restart to disable it entirely for this specific peer/group. (Note: If it is currently inheriting the setting, you may need to override the inheritance first).

  9. Click OK, and then Commit the changes.



Additional Information


Increasing BFD timers (Desired Minimum Tx Interval and Detection Time Multiplier) will not resolve this specific issue, as the firewall will still enter Graceful Restart Helper Mode once the delayed BFD session drop occurs.
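A rough calculation shows why. The sketch below assumes the BFD detection time is simply the transmit interval multiplied by the detection multiplier, and that the stale routes are then held for the full 120-second Graceful Restart timer. The function names and the sample timer values are illustrative assumptions, not measured results.

```python
def bfd_detection_time_s(tx_interval_ms: int, multiplier: int) -> float:
    """Approximate time for BFD to declare the session down, in seconds."""
    return tx_interval_ms * multiplier / 1000.0


def return_traffic_outage_s(tx_interval_ms: int, multiplier: int,
                            gr_hold_s: float = 120.0) -> float:
    """Approximate outage: BFD detection delay plus the stale-route hold time."""
    return bfd_detection_time_s(tx_interval_ms, multiplier) + gr_hold_s


# Hypothetical (Desired Minimum Tx Interval in ms, Detection Time Multiplier) pairs.
for tx_ms, mult in [(100, 3), (300, 3), (1000, 5)]:
    outage = return_traffic_outage_s(tx_ms, mult)
    print(f"tx={tx_ms} ms, multiplier={mult}: outage of about {outage:.1f} s")
```

Under these assumptions, larger BFD timers only add to the outage; the 120-second hold remains the dominant term until the Graceful Restart settings are changed.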


