High DP CPU due to internal loop in PA-5200 series
Created On 06/01/23 07:05 AM - Last Modified 09/12/23 06:35 AM
Symptom
- The Data Plane (DP) shows high CPU utilization. Use > show running resource-monitor to verify:
admin@PA-5250(active)> show running resource-monitor
:Resource monitoring sampling data (per second):
:
:CPU load sampling by group:
:flow_lookup : 100%
:flow_fastpath : 100%
:flow_slowpath : 100%
:flow_forwarding : 100%
:flow_mgmt : 100%
:flow_ctrl : 100%
:nac_result : 100%
:flow_np : 100%
:dfa_result : 100%
:module_internal : 100%
:aho_result : 100%
:zip_result : 100%
:pktlog_forwarding : 99%
:lwm : 0%
:flow_host : 100%
:fpga_result : 0%
:
:CPU load (%) during last 15 seconds:
:core 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
: 0 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
---SNIPPED---
:core 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
: 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
---SNIPPED---
:core 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
: 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 99
---SNIPPED---
:
:Resource utilization (%) during last 15 seconds:
:session:
: 66 66 66 66 66 66 66 66 66 66 66 66 66 66 66
:
:packet buffer:
: 31 32 33 31 31 31 31 31 32 33 31 31 31 32 32
:
:packet descriptor:
: 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
:
:packet descriptor (on-chip):
: 26 26 28 26 25 26 25 25 27 29 25 26 26 28 26
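As a rough way to automate this check, the sketch below parses the "CPU load sampling by group" lines from saved command output and lists the groups at or above a threshold. It assumes the text was captured exactly as shown above, including the leading colons. The function name saturated_groups and the 95% threshold are illustrative choices, not part of PAN-OS.

```python
import re

# Excerpt of the output above, saved as text.
SAMPLE = """\
:CPU load sampling by group:
:flow_lookup : 100%
:flow_fastpath : 100%
:pktlog_forwarding : 99%
:lwm : 0%
:flow_host : 100%
:fpga_result : 0%
"""

# Matches lines such as ":flow_lookup : 100%"; the leading colon is optional.
GROUP_LINE = re.compile(r"^:?\s*(\w+)\s*:\s*(\d+)%\s*$")


def saturated_groups(text, threshold=95):
    """Return {group name: load %} for groups at or above the threshold."""
    hot = {}
    for line in text.splitlines():
        match = GROUP_LINE.match(line)
        if match and int(match.group(2)) >= threshold:
            hot[match.group(1)] = int(match.group(2))
    return hot


print(saturated_groups(SAMPLE))
```

A result that names almost every group, as in this case, suggests the whole data plane is saturated rather than a single function.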
- FIRST SYMPTOM: The ingress-backlogs output shows unusual values and the session is stuck in the OPENING state. Use Identify Sessions That Use Too Much of the On-Chip Packet Descriptor as a guide.
- In the ingress-backlogs output below, SESS-ID (session ID) 2965526 consumes 37% of the packet descriptor resource with a packet COUNT of 1516. Under SESSION DETAILS, both IGR-IF (ingress interface) and EGR-IF (egress interface) show a value of unknown.
admin@PA-5250(active)> show running resource-monitor ingress-backlogs
-- SLOT: s1, DP: dp0 --
USAGE - ATOMIC: 56% TOTAL: 68%
TOP SESSIONS:
SESS-ID PCT GRP-ID COUNT
2965526 37% 1 1516
3438752 8% 1 332
1306279 3% 1 151
3350962 3% 1 142
SESSION DETAILS
SESS-ID PROTO SZONE SRC SPORT DST DPORT IGR-IF EGR-IF TYPE APP
1306279 17 TP-GP 10.142.106.185 38211 10.250.50.136 5062 unknown unknown FORW undecided
2965526 17 Internet 186.154.32.60 13237 170.80.96.17 4501 unknown unknown FORW undecided
3350962 17 TP-GP 10.142.78.209 38211 10.250.50.136 5062 unknown unknown FORW undecided
3438752 17 TP-GP 10.159.155.106 38211 10.250.50.137 5062 unknown unknown FORW undecided
-- SLOT: s1, DP: dp1 --
USAGE - ATOMIC: 2% TOTAL: 9%
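The check described above can be sketched in a few lines of Python. This is a minimal example that assumes the ingress-backlogs output has been saved as text in the column layout shown above and that zone names contain no spaces. The function name suspect_sessions and the 30% cut-off are hypothetical, chosen only for illustration.

```python
# Excerpt of the output above, saved as text.
SAMPLE = """\
-- SLOT: s1, DP: dp0 --
USAGE - ATOMIC: 56% TOTAL: 68%
TOP SESSIONS:
SESS-ID PCT GRP-ID COUNT
2965526 37% 1 1516
3438752 8% 1 332
SESSION DETAILS
SESS-ID PROTO SZONE SRC SPORT DST DPORT IGR-IF EGR-IF TYPE APP
2965526 17 Internet 186.154.32.60 13237 170.80.96.17 4501 unknown unknown FORW undecided
3438752 17 TP-GP 10.159.155.106 38211 10.250.50.137 5062 unknown unknown FORW undecided
"""


def suspect_sessions(text, min_pct=30):
    """Return sessions at or above min_pct whose ingress and egress interfaces are unknown."""
    usage = {}      # session ID -> (percent, packet count) from TOP SESSIONS
    suspects = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or not fields[0].isdigit():
            continue  # skip headers, separators, and USAGE lines
        if len(fields) == 4 and fields[1].endswith("%"):
            # TOP SESSIONS row: SESS-ID PCT GRP-ID COUNT
            usage[fields[0]] = (int(fields[1].rstrip("%")), int(fields[3]))
        elif len(fields) == 11:
            # SESSION DETAILS row: IGR-IF is column 7, EGR-IF is column 8
            pct, count = usage.get(fields[0], (0, 0))
            if pct >= min_pct and fields[7] == fields[8] == "unknown":
                suspects.append({
                    "sess_id": fields[0], "pct": pct, "count": count,
                    "src": fields[3], "dst": fields[5], "dport": fields[6],
                })
    return suspects


for session in suspect_sessions(SAMPLE):
    print(session)
```

The sketch relies on TOP SESSIONS appearing before SESSION DETAILS for each DP, as in the output above.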
- In the show session id output, note the OPENING state, in which the session can remain stuck for 60 seconds or longer. To determine how long the packet has been looping since it arrived and was processed by the First Packet Processor, compare the session's start time with the time at which you ran the command.
admin@PA-5250(active)> show session id 2965526
Session 2965526
c2s flow:
source: 186.154.32.60 [Internet]
dst: 170.80.96.17
proto: 17
sport: 13237 dport: 4501
state: OPENING type: FORW
src user: unknown
dst user: unknown
Slot : 1
DP : 0
index(local): : 2965526
start time : Wed Dec 21 11:57:52 2022
timeout : 10 sec
time to live : 10 sec
total byte count(c2s) : 0
total byte count(s2c) : 0
layer7 packet count(c2s) : 0
layer7 packet count(s2c) : 0
vsys : vsys1
application : undecided
application db : 0
app.id : c2s node (0, 0) s2s node (0, 0)
session to be logged at end : False
session in session ager : True
session updated by HA peer : False
end-reason : unknown
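The start-time comparison can be scripted once the output is saved as text. The sketch below is illustrative only: the function name opening_age_seconds is hypothetical, the "command run" time is a made-up value, and it assumes the firewall clock and the supplied time are in the same time zone and that month and weekday names are in English.

```python
import re
from datetime import datetime

# Excerpt of the show session id output above.
SAMPLE = """\
state: OPENING type: FORW
start time : Wed Dec 21 11:57:52 2022
timeout : 10 sec
"""


def opening_age_seconds(text, now):
    """Return seconds since 'start time' if the session is in OPENING state, else None."""
    if not re.search(r"state\s*:\s*OPENING\b", text):
        return None
    match = re.search(r"start time\s*:\s*(.+)", text)
    if match is None:
        return None
    started = datetime.strptime(match.group(1).strip(), "%a %b %d %H:%M:%S %Y")
    return (now - started).total_seconds()


# Hypothetical time at which the command was run (not taken from the article).
ran_at = datetime(2022, 12, 21, 11, 59, 22)
age = opening_age_seconds(SAMPLE, ran_at)
print(age, age is not None and age >= 60)
```

An age of 60 seconds or more for a session that is still in OPENING, with a 10-second timeout, matches the stuck state described above.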
- SECOND SYMPTOM: The offending packet survives deletion of the session.
- Clearing the offending session in PAN-OS does not alleviate the issue:
admin@PA-5250(active)> clear session id 2965526
session 2965526 cleared
- The connection reappears in the ingress-backlogs output with the exact same tuple but a different session ID, even if the source has stopped sending the traffic.
admin@PA-5250(active)> show running resource-monitor ingress-backlogs
-- SLOT: s1, DP: dp0 --
USAGE - ATOMIC: 75% TOTAL: 89%
TOP SESSIONS:
SESS-ID PCT GRP-ID COUNT
1296096 42% 1 1727
3350962 12% 1 505
3438752 10% 1 411
1306279 3% 1 148
SESSION DETAILS
SESS-ID PROTO SZONE SRC SPORT DST DPORT IGR-IF EGR-IF TYPE APP
1296096 17 Internet 186.154.32.60 13237 170.80.96.17 4501 unknown unknown FORW undecided
1306279 17 TP-GP 10.142.106.185 38211 10.250.50.136 5062 unknown unknown FORW undecided
3350962 17 TP-GP 10.142.78.209 38211 10.250.50.136 5062 unknown unknown FORW undecided
3438752 17 TP-GP 10.159.155.106 38211 10.250.50.137 5062 unknown unknown FORW undecided
-- SLOT: s1, DP: dp1 --
USAGE - ATOMIC: 2% TOTAL: 9%
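To spot this second symptom across two captures, the SESSION DETAILS rows can be compared by tuple rather than by session ID. The sketch below is a minimal illustration that assumes both captures are saved as text in the layout shown above. The names flow_tuples and respawned are hypothetical.

```python
# SESSION DETAILS excerpts taken before and after "clear session id 2965526".
BEFORE = """\
SESS-ID PROTO SZONE SRC SPORT DST DPORT IGR-IF EGR-IF TYPE APP
1306279 17 TP-GP 10.142.106.185 38211 10.250.50.136 5062 unknown unknown FORW undecided
2965526 17 Internet 186.154.32.60 13237 170.80.96.17 4501 unknown unknown FORW undecided
"""
AFTER = """\
SESS-ID PROTO SZONE SRC SPORT DST DPORT IGR-IF EGR-IF TYPE APP
1296096 17 Internet 186.154.32.60 13237 170.80.96.17 4501 unknown unknown FORW undecided
1306279 17 TP-GP 10.142.106.185 38211 10.250.50.136 5062 unknown unknown FORW undecided
"""


def flow_tuples(text):
    """Map (proto, src, sport, dst, dport) to session ID for each SESSION DETAILS row."""
    flows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 11 and fields[0].isdigit():
            key = (fields[1], fields[3], fields[4], fields[5], fields[6])
            flows[key] = fields[0]
    return flows


def respawned(before, after):
    """Return tuples present in both captures under different session IDs."""
    old, new = flow_tuples(before), flow_tuples(after)
    return {key: (old[key], new[key])
            for key in old if key in new and old[key] != new[key]}


for key, (old_id, new_id) in respawned(BEFORE, AFTER).items():
    print(key, old_id, "->", new_id)
```

A tuple that keeps its place in the backlog under a new session ID after the clear, as 186.154.32.60:13237 to 170.80.96.17:4501 does here, is consistent with the looping packet described in this article.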
Environment
- PA-5200 series
- PAN-OS versions 10.1.7, 10.1.8
- High-Availability setup is Active-Passive
- High-Availability setup is Active-Active
Cause
The First Packet Processor (FPP) and Data Plane (DP) sessions are out of sync: the FPP has a flow entry, but the DP does not have the corresponding active flow session.
Resolution
- To fix the issue (PAN-210327), upgrade PAN-OS to version 11.0.1, 10.2.4, or 10.1.9-h1, or later.
- Workarounds
- In an Active-Passive setup, perform a failover.
- Rebooting the firewall deletes the looping packet.
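When checking many firewalls, the fixed versions listed above can be compared against a running version string. The sketch below is a rough illustration: it assumes version strings of the form major.minor.maintenance with an optional -hN hotfix suffix, and it returns None for release trains the Resolution does not mention instead of guessing. The names FIXED, parse_version, and has_fix are hypothetical.

```python
import re

# Minimum fixed (maintenance, hotfix) per release train, from the Resolution above.
FIXED = {(10, 1): (9, 1), (10, 2): (4, 0), (11, 0): (1, 0)}


def parse_version(version):
    """Split '10.1.9-h1' into (10, 1, 9, 1); a missing hotfix counts as 0."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-h(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, maint, hotfix = match.groups()
    return int(major), int(minor), int(maint), int(hotfix or 0)


def has_fix(version):
    """Return True or False for listed trains, None for trains not listed above."""
    major, minor, maint, hotfix = parse_version(version)
    minimum = FIXED.get((major, minor))
    if minimum is None:
        return None
    return (maint, hotfix) >= minimum


for candidate in ("10.1.7", "10.1.8", "10.1.9-h1", "10.2.4", "11.0.0"):
    print(candidate, has_fix(candidate))
```

Comparing within a release train avoids treating, for example, 10.2.0 as fixed merely because it sorts after 10.1.9-h1.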
Additional Information
- The PA-5200's DP CPU runs at 100% even with no traffic flow.
- The PA-5200's DP CPU runs at 100% even with the Data Plane ports removed.
- The PA-5200's DP CPU runs at 100% with an incorrectly high packet rate: the packet rate reported by the firewall is higher than the packet rate seen on the connected switch port.