High DP CPU due to internal loop in PA-5200 series

Created On 06/01/23 07:05 AM - Last Modified 09/12/23 06:35 AM


Symptom


 
  • The Data Plane (DP) shows high CPU utilization. Run show running resource-monitor to verify:
admin@PA-5250(active)> show running resource-monitor
:Resource monitoring sampling data (per second):
:
:CPU load sampling by group:
:flow_lookup                    :   100%
:flow_fastpath                  :   100%
:flow_slowpath                  :   100%
:flow_forwarding                :   100%
:flow_mgmt                      :   100%
:flow_ctrl                      :   100%
:nac_result                     :   100%
:flow_np                        :   100%
:dfa_result                     :   100%
:module_internal                :   100%
:aho_result                     :   100%
:zip_result                     :   100%
:pktlog_forwarding              :    99%
:lwm                            :     0%
:flow_host                      :   100%
:fpga_result                    :     0%
:
:CPU load (%) during last 15 seconds: 
:core   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15
:       0 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
---SNIPPED---
:core  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
:     100 100 100 100 100 100 100 100 100 100 100 100 100 100 100 100
---SNIPPED---
:core  32  33  34  35  36  37  38  39  40  41  42  43  44  45  46  47
:     100 100 100 100 100 100 100 100 100 100 100 100 100 100 100  99
---SNIPPED---
:
:Resource utilization (%) during last 15 seconds: 
:session:
: 66  66  66  66  66  66  66  66  66  66  66  66  66  66  66 
:
:packet buffer:
: 31  32  33  31  31  31  31  31  32  33  31  31  31  32  32 
:
:packet descriptor:
:  1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
:
:packet descriptor (on-chip):
: 26  26  28  26  25  26  25  25  27  29  25  26  26  28  26   
  • FIRST SYMPTOM: The ingress-backlogs output shows unusual values and the session is stuck in the OPENING state. Use "Identify Sessions That Use Too Much of the On-Chip Packet Descriptor" as a guide.
    1. In the ingress-backlogs output below, SESS-ID (session ID) 2965526 consumes 37% of the packet descriptor resource with a packet COUNT of 1516. Under SESSION DETAILS, both IGR-IF (ingress interface) and EGR-IF (egress interface) show a value of unknown.
admin@PA-5250(active)> show running resource-monitor ingress-backlogs

-- SLOT: s1, DP: dp0 --
USAGE - ATOMIC: 56% TOTAL: 68%

TOP SESSIONS:
SESS-ID        PCT      GRP-ID    COUNT
2965526        37%      1         1516
3438752        8%       1         332
1306279        3%       1         151
3350962        3%       1         142

SESSION DETAILS
SESS-ID        PROTO  SZONE      SRC              SPORT   DST             DPORT   IGR-IF      EGR-IF   TYPE    APP
1306279        17     TP-GP      10.142.106.185   38211   10.250.50.136   5062    unknown     unknown  FORW    undecided
2965526        17     Internet   186.154.32.60    13237   170.80.96.17    4501    unknown     unknown  FORW    undecided
3350962        17     TP-GP      10.142.78.209    38211   10.250.50.136   5062    unknown     unknown  FORW    undecided
3438752        17     TP-GP      10.159.155.106   38211   10.250.50.137   5062    unknown     unknown  FORW    undecided

-- SLOT: s1, DP: dp1 --
USAGE - ATOMIC: 2% TOTAL: 9%
 
    2. In the show session id output below, note the state of OPENING. A session can stay stuck in this state for 60 seconds or longer. To determine how long the packet has been looping since it arrived and was processed by the First Packet Processor, compare the start time with the current time at which you ran the command.
 
admin@PA-5250(active)> show session id 2965526
Session                  2965526
                c2s flow:
                                source:        186.154.32.60 [Internet]
                                dst:           170.80.96.17
                                proto:         17
                                sport:         13237                    dport:            4501
                                state:         OPENING                  type:              FORW
                                src user:      unknown
                                dst user:      unknown

                Slot                                      : 1
                DP                                        : 0
                index(local):                             : 2965526
                start time                                : Wed Dec 21 11:57:52 2022
                timeout                                   : 10 sec
                time to live                              : 10 sec
                total byte count(c2s)                     : 0
                total byte count(s2c)                     : 0
                layer7 packet count(c2s)                  : 0
                layer7 packet count(s2c)                  : 0
                vsys                                      : vsys1
                application                               : undecided
                application db                            : 0
                app.id                                    : c2s node (0, 0)        s2s node (0, 0)
                session to be logged at end               : False
                session in session ager                   : True
                session updated by HA peer                : False
                end-reason                                : unknown
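
The start-time comparison described above can be scripted. The following is a minimal sketch, assuming the start time field keeps the format shown in the output (Wed Dec 21 11:57:52 2022), an English locale, and that the firewall clock and the supplied current time share a time zone. The function name opening_duration_seconds and the constant STUCK_THRESHOLD_SECONDS are illustrative and not part of PAN-OS.

```python
import re
from datetime import datetime

# Threshold from the article: an OPENING session stuck for 60 seconds or more is suspect.
STUCK_THRESHOLD_SECONDS = 60


def opening_duration_seconds(session_output, now):
    """Return seconds elapsed since 'start time' if the state is OPENING, else None."""
    state = re.search(r"state:\s+(\S+)", session_output)
    start = re.search(r"start time\s*:\s*(.+)", session_output)
    if not state or not start or state.group(1) != "OPENING":
        return None
    started = datetime.strptime(start.group(1).strip(), "%a %b %d %H:%M:%S %Y")
    return (now - started).total_seconds()


# Two lines copied from the 'show session id' output above.
sample = """
        state:         OPENING                  type:              FORW
        start time                                : Wed Dec 21 11:57:52 2022
"""
# Hypothetical "current time", chosen for illustration only.
elapsed = opening_duration_seconds(sample, datetime(2022, 12, 21, 12, 0, 0))
print(elapsed, elapsed >= STUCK_THRESHOLD_SECONDS)
```

Pass in the time at which the command was run rather than the time the script runs, so the result reflects the state of the firewall when the output was captured.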

 
  • SECOND SYMPTOM: The offending packet survives deletion of the session.
    1. Clearing the offending session in PAN-OS does not resolve the issue:
admin@PA-5250(active)> clear session id 2965526
session 2965526 cleared

 
    2. The flow re-spawns in the ingress-backlogs output with the exact same tuple but a different session ID, even if the source has stopped sending the traffic:
admin@PA-5250(active)> show running resource-monitor ingress-backlogs

-- SLOT: s1, DP: dp0 --
USAGE - ATOMIC: 75% TOTAL: 89%

TOP SESSIONS:
SESS-ID        PCT       GRP-ID   COUNT
1296096        42%       1        1727
3350962        12%       1        505
3438752        10%       1        411
1306279        3%        1        148

SESSION DETAILS
SESS-ID        PROTO    SZONE      SRC             SPORT   DST             DPORT  IGR-IF   EGR-IF   TYPE    APP
1296096        17       Internet   186.154.32.60   13237   170.80.96.17    4501   unknown  unknown  FORW    undecided
1306279        17       TP-GP      10.142.106.185  38211   10.250.50.136   5062   unknown  unknown  FORW    undecided
3350962        17       TP-GP      10.142.78.209   38211   10.250.50.136   5062   unknown  unknown  FORW    undecided
3438752        17       TP-GP      10.159.155.106  38211   10.250.50.137   5062   unknown  unknown  FORW    undecided

-- SLOT: s1, DP: dp1 --
USAGE - ATOMIC: 2% TOTAL: 9%
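
The re-spawn check in the second symptom (same tuple, different session ID) can be automated by comparing two ingress-backlogs captures. The sketch below is a hedged illustration, assuming the SESSION DETAILS column order shown above and zone names without spaces. The function names parse_session_details and respawned_flows are hypothetical helpers, not PAN-OS features.

```python
def parse_session_details(output):
    """Map session ID -> (proto, src, sport, dst, dport) from SESSION DETAILS rows."""
    sessions = {}
    in_details = False
    for line in output.splitlines():
        stripped = line.strip()
        fields = stripped.split()
        if stripped.startswith("SESSION DETAILS"):
            in_details = True
        elif stripped.startswith("-- SLOT"):
            in_details = False  # new slot/DP block; wait for its own details table
        elif in_details and len(fields) >= 7 and fields[0].isdigit():
            # Assumed columns: SESS-ID PROTO SZONE SRC SPORT DST DPORT ...
            sess_id, proto, _zone, src, sport, dst, dport = fields[:7]
            sessions[sess_id] = (proto, src, sport, dst, dport)
    return sessions


def respawned_flows(before, after):
    """Return (flow, old_id, new_id) for flows seen in both captures under different IDs."""
    old_ids = {flow: sid for sid, flow in parse_session_details(before).items()}
    return [
        (flow, old_ids[flow], sid)
        for sid, flow in parse_session_details(after).items()
        if flow in old_ids and old_ids[flow] != sid
    ]


# Rows copied from the two captures in this article (before and after 'clear session id').
before = """SESSION DETAILS
SESS-ID        PROTO  SZONE      SRC              SPORT   DST             DPORT   IGR-IF      EGR-IF   TYPE    APP
1306279        17     TP-GP      10.142.106.185   38211   10.250.50.136   5062    unknown     unknown  FORW    undecided
2965526        17     Internet   186.154.32.60    13237   170.80.96.17    4501    unknown     unknown  FORW    undecided
"""
after = """SESSION DETAILS
SESS-ID        PROTO    SZONE      SRC             SPORT   DST             DPORT  IGR-IF   EGR-IF   TYPE    APP
1296096        17       Internet   186.154.32.60   13237   170.80.96.17    4501   unknown  unknown  FORW    undecided
1306279        17       TP-GP      10.142.106.185  38211   10.250.50.136   5062   unknown  unknown  FORW    undecided
"""
for flow, old_id, new_id in respawned_flows(before, after):
    print(f"{flow} re-spawned: session {old_id} -> {new_id}")
```

Sessions that keep the same ID across both captures (such as 1306279) are not reported; only a tuple that returns under a new ID after being cleared matches the symptom.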

 


Environment


  • PA-5200 series
  • PAN-OS versions 10.1.7, 10.1.8
  • High-Availability setup is Active-Passive
  • High-Availability setup is Active-Active


Cause


The First Packet Processor (FPP) and the Data Plane (DP) are out of sync on session state: the FPP has a flow entry, but the DP does not have the corresponding active flow session.

Resolution


  1. This issue is tracked as PAN-210327. Upgrade PAN-OS to version 11.0.1, 10.2.4, or 10.1.9-h1, or to a later release.
  2. Workarounds:
    1. In an Active-Passive setup, perform a failover.
    2. Reboot the firewall, which deletes the looping packet.
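
To check whether a given firewall already runs a release that contains the fix, the version list above can be encoded as a small comparison. This is a rough sketch, assuming version strings of the form X.Y.Z or X.Y.Z-hN and that hotfix numbers order numerically. The names FIXED, parse_version, and has_fix are illustrative. Release trains not named in this article return None instead of a guess.

```python
import re

# Minimum fixed (maintenance, hotfix) per release train, from the Resolution section.
FIXED = {(11, 0): (1, 0), (10, 2): (4, 0), (10, 1): (9, 1)}


def parse_version(version):
    """Split 'X.Y.Z' or 'X.Y.Z-hN' into ((X, Y), (Z, N)); N is 0 when there is no hotfix."""
    m = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)(?:-h(\d+))?", version)
    if not m:
        raise ValueError(f"unrecognized PAN-OS version: {version!r}")
    major, minor, maint = int(m.group(1)), int(m.group(2)), int(m.group(3))
    hotfix = int(m.group(4) or 0)
    return (major, minor), (maint, hotfix)


def has_fix(version):
    """True/False for trains listed in the article, None for any other train."""
    train, release = parse_version(version)
    if train not in FIXED:
        return None
    return release >= FIXED[train]


for v in ("10.1.7", "10.1.8", "10.1.9", "10.1.9-h1", "10.2.4", "11.0.1"):
    print(v, has_fix(v))
```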


Additional Information


  • The PA-5200 DP CPU runs at 100% even with no traffic flowing.
  • The PA-5200 DP CPU runs at 100% even with the Data Plane ports removed.
  • The PA-5200 DP CPU runs at 100% with an incorrect, abnormally high packet rate: the packet rate reported by the firewall is higher than the packet rate seen on the connected switch port.

