Troubleshooting IPsec VPN Tunnel Instability: Rekeying Race Condition
Symptom
The VPN tunnel disconnects, and the Palo Alto Networks firewall (PA-FW) writes Security Association (SA) management warnings to ikemgr.log, specifically:
[PWRN]: { <ID> }: can't find sa for proto ESP spi <SPI_VALUE>
This indicates the firewall received an encrypted ESP packet but could not locate the corresponding SA, often because the SA was just deleted or is in the process of being deleted.
Log Trace Example:
The following ikemgr.log entries demonstrate the race condition:
2025-07-06 03:32:18.287 -0400 [INFO]: { 7: }: ikev2_request_initiator_start: SA state ESTABLISHED type 3 caller ikev2_child_delete
2025-07-06 03:32:18.287 -0400 [INFO]: { 7: }: IKEv2 INFO transmit: gateway awspanw-panwfwx, message_id: 0x00000344, type 3 SA state ESTABLISHED
2025-07-06 03:32:18.287 -0400 [PNTF]: { 7: 60}: ====> IPSEC KEY DELETED; tunnel awspanw-panwfwx:panwon_proxy <==== <<< PA-FW initiates deletion of SA (SPI D6DDF300)
====> Deleted SA: 10.18.152.5[4500]-192.104.67.227[4500] SPI:0xD3XXXD45/0xD6XXX300 <====
2025-07-06 03:32:18.287 -0400 [INFO]: { 7: 60}: SADB_DELETE proto=255 src=192.168.47.227[0] dst=10.10.132.5[0] ESP spi=0xD3XXXD45 <<< SA Database updated locally.
2025-07-06 03:32:18.319 -0400 [INFO]: { 7: }: received DELETE payload, protocol ESP, num of SPI: 1 IKE SA state ESTABLISHED <<< Remote peer sends DELETE payload for the same SA.
2025-07-06 03:32:18.319 -0400 [INFO]: { 7: }: delete proto ESP spi 0xD6XXX300
2025-07-06 03:32:18.319 -0400 [PWRN]: { 7: }: can't find sa for proto ESP spi 0xD6XXX300 <<< Since PA-FW already deleted the SA, it cannot find it, resulting in an error.
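To check whether a longer ikemgr.log export shows the same pattern, a small script can look for a "can't find sa" warning that closely follows a local key deletion. The following is a minimal sketch, not a Palo Alto Networks tool. The function name find_delete_races and the one-second window are assumptions chosen for illustration. The matched strings are taken from the excerpt above.

```python
import re
from datetime import datetime, timedelta

# Patterns copied from the ikemgr.log excerpt above.
TS = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})")
LOCAL_DELETE = re.compile(r"IPSEC KEY DELETED")
MISSING_SA = re.compile(r"can't find sa for proto ESP spi (\S+)")


def find_delete_races(lines, window=timedelta(seconds=1)):
    """Return (timestamp, spi) for each "can't find sa" warning logged
    within `window` after a local IPSEC KEY DELETED event.

    The one-second window is an assumed threshold, not a vendor value.
    """
    races = []
    last_local_delete = None
    for line in lines:
        m = TS.match(line)
        if not m:
            continue  # continuation lines carry no timestamp
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if LOCAL_DELETE.search(line):
            last_local_delete = ts
            continue
        miss = MISSING_SA.search(line)
        if miss and last_local_delete is not None:
            if ts - last_local_delete <= window:
                races.append((ts, miss.group(1)))
    return races


sample = [
    "2025-07-06 03:32:18.287 -0400 [PNTF]: { 7: 60}: ====> IPSEC KEY DELETED; tunnel awspanw-panwfwx:panwon_proxy <====",
    "2025-07-06 03:32:18.319 -0400 [PWRN]: { 7: }: can't find sa for proto ESP spi 0xD6XXX300",
]
races = find_delete_races(sample)
```

With the two sample lines, races holds one entry for SPI 0xD6XXX300, logged 32 ms after the local deletion.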
Environment
This issue commonly arises in IPsec VPN tunnels where:
- Both VPN peers (Palo Alto Networks firewall and remote device) are configured with identical Phase 1 (IKE SA) and Phase 2 (IPsec SA) lifetimes.
- For example, both peers might have a Phase 1 lifetime of 24 hours and a Phase 2 lifetime of 8 hours.
- Phase 1 negotiation succeeds, but Phase 2 fails because of this race condition.
Cause
The root cause is a race condition during the Phase 2 (IPsec SA) rekeying process. When both VPN peers have identical Phase 2 lifetimes, they often attempt to initiate rekeying or delete the old SA at nearly the same moment. This synchronization overlap causes a conflict:
- One peer (e.g., the PA-FW) initiates the deletion of its current Phase 2 SA.
- Almost simultaneously, the other peer sends an explicit DELETE payload for the same SA.
- The receiving firewall (the PA-FW in the example) then tries to process a deletion request for an SA it has already removed (or is in the process of removing), which produces the "can't find sa" warning and disrupts the tunnel.
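The ordering problem can be illustrated with a toy model. The sketch below is an assumption-laden illustration and not PAN-OS code: ToySadb, local_delete, and on_peer_delete are hypothetical names, and the SA database is reduced to a dictionary keyed by SPI. It only shows why a DELETE for an already-removed SPI cannot be matched.

```python
class ToySadb:
    """Toy SA database keyed by SPI (hypothetical, for illustration only)."""

    def __init__(self):
        self.sas = {}
        self.log = []

    def install(self, spi):
        self.sas[spi] = "ESTABLISHED"

    def local_delete(self, spi):
        # The local lifetime timer fired: remove the SA from the database.
        self.sas.pop(spi, None)
        self.log.append(f"SADB_DELETE spi={spi}")

    def on_peer_delete(self, spi):
        # A DELETE payload for this SPI arrived from the remote peer.
        if spi not in self.sas:
            self.log.append(f"can't find sa for proto ESP spi {spi}")
            return False
        del self.sas[spi]
        self.log.append(f"delete proto ESP spi {spi}")
        return True


# Identical lifetimes: both timers fire at almost the same moment.
fw = ToySadb()
fw.install("0xD6XXX300")
fw.local_delete("0xD6XXX300")          # local deletion happens first
ok = fw.on_peer_delete("0xD6XXX300")   # peer's DELETE arrives just after

# Staggered lifetimes: only the peer's timer has fired so far.
fw2 = ToySadb()
fw2.install("0xD6XXX300")
ok2 = fw2.on_peer_delete("0xD6XXX300")
```

In the first case ok is False and the last log entry is the "can't find sa" message. In the second case the SA is still present, so the peer's DELETE is processed normally.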
Resolution
To mitigate this rekeying race condition and stabilize the IPsec VPN tunnel, the primary solution is to stagger the Phase 2 (IPsec SA) lifetimes between the two VPN peers.
- Adjust Phase 2 Lifetime on One Peer:
  - Modify the Phase 2 IPsec Crypto Profile lifetime on either the Palo Alto Networks firewall or the remote VPN peer.
  - Goal: Ensure the Phase 2 lifetime on one side is slightly different from the other.
  - Example: If the remote peer's Phase 2 lifetime is set to 8 hours (28,800 seconds), set the Phase 2 lifetime on your Palo Alto Networks firewall to 7 hours and 55 minutes (28,500 seconds) or 8 hours and 5 minutes (29,100 seconds). A difference of 60 to 300 seconds (1-5 minutes) is typically sufficient.
  - Why it works: This ensures one peer initiates the rekeying process slightly before the other, allowing a smooth transition to a new SA without simultaneous deletions.
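The arithmetic in the example can be checked with a short helper. This is a minimal sketch: the names staggered_lifetime and as_hms are hypothetical, and the 60 to 300 second bounds simply encode the range suggested above rather than any limit enforced by the firewall.

```python
def staggered_lifetime(peer_lifetime_s, offset_s=300, shorter=True):
    """Return a Phase 2 lifetime that differs from the peer's by offset_s seconds."""
    if not 60 <= offset_s <= 300:
        raise ValueError("offset outside the suggested 60-300 second range")
    if shorter:
        return peer_lifetime_s - offset_s
    return peer_lifetime_s + offset_s


def as_hms(seconds):
    """Format a duration in seconds as hours, minutes, and seconds."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes}m {secs}s"


peer = 8 * 3600                                   # 28,800 seconds
shorter = staggered_lifetime(peer)                # 28,500 seconds
longer = staggered_lifetime(peer, shorter=False)  # 29,100 seconds
print(as_hms(shorter), as_hms(longer))            # 7h 55m 0s 8h 5m 0s
```

Either result can be entered as the Seconds value of the lifetime on one peer while the other peer keeps 28,800 seconds.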
Steps on Palo Alto Networks Firewall:
- Navigate to Network > Network Profiles > IPSec Crypto.
- Edit the IPSec Crypto Profile associated with the problematic VPN tunnel.
- In the Lifetime section, adjust the Seconds (or Minutes/Hours) value.
- Commit the changes to the firewall.
- Clear SAs and Monitor:
  - After adjusting the lifetime, clear the existing IKE and IPsec Security Associations on both VPN peers (if possible) to force a fresh negotiation.
  - Palo Alto CLI:
    clear vpn ike-sa gateway <your-ike-gateway-name>
    clear vpn ipsec-sa tunnel <your-ipsec-tunnel-name>
  - Initiate "interesting traffic" over the VPN tunnel to trigger its re-establishment.
  - Monitor the tunnel status in the GUI (Network > IPSec Tunnels) and relevant logs (Monitor > Logs > System, filtered for vpn, ike, ipsec) to confirm stability and successful rekeying without errors.
Note: Toggling the tunnel from one endpoint (for example, with the clear vpn commands) forces a renegotiation and can restore the tunnel temporarily. However, as long as both peers keep identical lifetimes, the race condition will recur when those lifetimes next expire. Staggering the lifetimes is therefore the recommended long-term solution.