How To Troubleshoot High Management Plane CPU Usage

Created On 02/17/22 23:17 PM - Last Modified 04/03/24 21:21 PM


Objective


High Management Plane (MP) CPU usage can disrupt normal firewall and Panorama operations. This article gives general guidance for troubleshooting a PAN-OS device that is experiencing high MP CPU usage.

Environment


  • PAN-OS (NGFW and VM)
  • AIOps
  • MP CPU


Procedure


Logging and Reporting

One of the major causes of high management plane CPU usage is excessive logging and reporting on the firewall or Panorama. The following steps are recommended to reduce the load that those two functions place on the management plane:
  1. Restrict logging to the security rules that handle traffic of interest:
    1. Some non-user traffic, such as DNS, NetBIOS, dynamic routing protocols, SNMP, and ICMP, might not be worth monitoring. If so, create a separate security rule for that traffic and disable logging on it.
    2. Create separate rules for chatty applications and disable logging on those rules. Chatty applications create many sessions, and logging each of those sessions increases the load on the MP CPU. Check the ACC tab to see which applications create the most sessions.
    3. Make sure that logging is disabled on the default security rules. The intrazone-default and interzone-default rules are not logged by default, because they are catch-all rules that usually handle traffic the firewall administrator is not interested in monitoring.
    4. For the security rules that handle traffic your organization needs to monitor, disable Log at Session Start and use only Log at Session End.
  2. Minimize the logs that are being forwarded to an external destination.
  3. Reduce the retention time of your device logs by setting a value for Max Days under Device > Setup > Management > Logging and Reporting Settings > Log Storage.
  4. Check that the debug level of every service is set to its default, and restore the default where it is not. The general command is available only on the firewall:
    > debug software logging-level show level service all-services
    On Panorama, use the individual command for each process:
    > debug <daemon name> show
  5. Report generation can add CPU load on the device.
    1. Avoid scheduled custom reports that cover a long time period, and reduce the number of such reports. Whenever possible, generate reports on Panorama instead of the firewall: on Panorama, specify “Panorama Data” as the source of the report.
    2. For ACC reports, avoid querying over a long time period.
    3. Disable unnecessary predefined reports on the device from Device > Setup > Management > Logging and Reporting Settings > Pre-Defined Reports. By default, all predefined reports are enabled, although many of them may not be useful to the organization.
    4. During times of high load, do not keep the ACC or log monitoring UI tabs open with auto refresh enabled. Auto refresh periodically queries the log database and recompiles the output. Instead, set the page refresh to Manual under Monitor > Logs.
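As a rough illustration of step 1.2 above (finding chatty applications before creating no-log rules for them), the sketch below ranks applications by their share of total sessions. It assumes the per-application session counts have already been read off the ACC tab or exported by some other means. The function name `chatty_applications`, the input dictionary, and the 10% threshold are hypothetical choices for this example, not PAN-OS features.

```python
def chatty_applications(session_counts, share_threshold=0.10):
    """Return (application, share) pairs whose share of all sessions
    meets the threshold, largest share first.

    session_counts: dict mapping application name -> number of sessions.
    share_threshold: hypothetical cut-off; tune it for your environment.
    """
    total = sum(session_counts.values())
    if total == 0:
        return []
    chatty = [
        (app, count / total)
        for app, count in session_counts.items()
        if count / total >= share_threshold
    ]
    return sorted(chatty, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    sample = {"dns": 52000, "ssl": 21000, "web-browsing": 18000,
              "ntp": 6000, "ssh": 3000}
    for app, share in chatty_applications(sample):
        print(f"{app}: {share:.1%} of sessions")
```

Applications that dominate the session count but carry little monitoring value are the natural candidates for a dedicated rule with logging disabled.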

Automated Processes
Another top contributor to elevated MP CPU usage is automated processes running in the background.

  1. Run backup processes on the device during non-peak hours.
  2. Minimize admin logins during business hours (log refreshes and ACC views consume resources).
  3. When possible (see note 2 below), consider reducing the frequency of FQDN refreshes by increasing the refresh time value (in seconds) under Device (for firewalls) or Panorama (for Panorama) > Setup > Services > DNS settings > Minimum FQDN Refresh Time (sec), or by using the CLI, where <value> is in the range 0-14399 seconds:
    # set deviceconfig system fqdn-refresh-time <value>
    # commit
    1. Note: The default is 30 seconds.
    2. Important: Be careful in environments with dynamic IP addresses. This approach can cause traffic drops if the IP address associated with an FQDN object changes before the next refresh.
  4. User Identification, and more specifically Group Mapping, can put significant strain on the management plane when a large number of group objects is loaded onto the firewall.
    1. Limit the number of groups by configuring the Group Include List under Device > User Identification > Group Mapping Settings > Group Include List. You can also use group filters.
    2. Unknown User-ID IP mappings can cause high CPU usage. Excluding unwanted zones from User-IP mapping can help; see UNKNOWN IP RATE LIMIT MITIGATION FOR USER-ID MAPPINGS.

Processes

  1. Check which process is consuming the most CPU:
    > show system resources follow
    Depending on which process is the culprit, investigate it accordingly to mitigate the problem.
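
When the output of show system resources has been saved to a text file, a short script can rank the processes by CPU usage. The sketch below assumes the output follows the usual Linux top layout, with a header row that starts with PID and contains %CPU and COMMAND columns. The function name `top_cpu_processes` and the sample text are invented for illustration.

```python
def top_cpu_processes(top_output, limit=3):
    """Return up to `limit` (command, cpu_percent) pairs, highest CPU first.

    Assumes top-style text: a header row beginning with PID that names
    the %CPU and COMMAND columns, followed by one row per process.
    """
    lines = top_output.splitlines()
    header_index = next(
        (i for i, line in enumerate(lines) if line.split()[:1] == ["PID"]),
        None,
    )
    if header_index is None:
        return []
    columns = lines[header_index].split()
    cpu_col = columns.index("%CPU")
    cmd_col = columns.index("COMMAND")

    rows = []
    for line in lines[header_index + 1:]:
        fields = line.split()
        if len(fields) <= max(cpu_col, cmd_col):
            continue  # blank or truncated row
        try:
            cpu = float(fields[cpu_col])
        except ValueError:
            continue  # not a process row
        rows.append((fields[cmd_col], cpu))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    # Invented sample in top format; real output will differ.
    sample = """\
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 2345 root      20   0  512340  98120  10240 S  12.0  2.4  12:34.56 mgmtsrvr
 2411 root      20   0  401200  65000   8000 R  87.5  1.6  45:10.02 logrcvr
 2502 root      20   0  120000  12000   4000 S   0.3  0.3   0:05.10 sshd
"""
    for command, cpu in top_cpu_processes(sample):
        print(f"{command}: {cpu}% CPU")
```

Column positions are taken from the header row instead of being hard-coded, so the sketch tolerates layouts with a different number of columns as long as %CPU and COMMAND are present.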

General health checks

  1. Check whether disk-space usage on the root partition is high, and if so, reduce it to an acceptable percentage.
  2. Check that all processes are running:
    > show system software status
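
For the disk-space check in step 1, the sketch below flags filesystems whose usage is at or above a threshold. It assumes the usage figures were captured with show system disk-space and that the output follows the standard df -h layout (Filesystem, Size, Used, Avail, Use%, Mounted on). The 90% threshold and the function name `filesystems_over` are assumptions for this example; the article itself only asks for an "acceptable percentage".

```python
def filesystems_over(df_output, threshold_percent=90):
    """Return (mount_point, used_percent) pairs at or above the threshold.

    Assumes df-style text: one header row, then rows whose fifth field
    is the usage percentage and whose sixth field is the mount point.
    """
    flagged = []
    for line in df_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        try:
            used = int(fields[4].rstrip("%"))
        except ValueError:
            continue
        if used >= threshold_percent:
            flagged.append((fields[5], used))
    return flagged


if __name__ == "__main__":
    # Invented sample in df -h format; real output will differ.
    sample = """\
Filesystem      Size  Used Avail Use% Mounted on
/dev/root       6.9G  6.4G  200M  97% /
/dev/sda5        16G  4.0G   11G  27% /opt/pancfg
"""
    for mount, used in filesystems_over(sample):
        print(f"{mount} is {used}% full")
```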

Capacity planner

  1. If it is not possible to reduce activity on the device, consider upgrading to a higher-capacity platform with more management CPU capacity. Refer to product selection to compare your firewall to other platforms.
  2. For VM-Series firewalls, verify that the VM has the correct capacity (size) per the VM-Series performance and capacity documentation: https://docs.paloaltonetworks.com/vm-series/10-1/vm-series-performance-capacity/vm-series-performance-capacity/vm-series-on-azure-models-and-vms

For further help mitigating high MP CPU usage, open a support ticket; see HOW TO OPEN A CASE FOR HIGH MANAGEMENT PLANE CPU.


