How to troubleshoot CTD processing values spiked or running high in > debug dataplane pow performance

Created On 06/06/23 21:01 PM - Last Modified 08/23/23 18:40 PM


Objective


The CLI command > debug dataplane pow performance shows how long (in microseconds) the CPU takes to perform various software functions while processing traffic as it passes through the firewall. When CTD (Content & Threat Detection) resources approach maximum usage, values in this output may run higher or spike compared to normal working times. Symptoms can include high CPU %, traffic or application slowness or failures, and high packet descriptor (on-chip) and packet buffer usage. The functions below are commonly responsible for performing CTD inspection on traffic as it passes through the firewall:
> debug dataplane pow performance
func               max-μs       avg-μs     count  total-μs  ac-max-μs  ac-avg-μs  ac-count  ac-total-μs
sml_vm                  0  276491782.0         0         0         65        1.1    312208       369019
ctd_token               0         36.0  27197221         0        180       20.8     18000       374440
detector_run_p1         0          0.0         0         0         66        4.9      8509        42053
detector_run_p2         0          0.0         0         0         27        1.5      8786        13669
regex_lookup            0          0.0         0         0       1093        7.0     11494        81464
Note: (μs in the above output stands for microseconds)
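For repeated comparisons it can help to turn this output into structured data. The following is a minimal Python sketch, not a Palo Alto Networks tool. It assumes each data row is a function name followed by the eight numeric columns shown in the header above; the names parse_pow_performance, COLUMNS, and SAMPLE are made up for illustration.

```python
from typing import Dict

# Column names taken from the header line of the output shown above.
COLUMNS = ["max_us", "avg_us", "count", "total_us",
           "ac_max_us", "ac_avg_us", "ac_count", "ac_total_us"]


def parse_pow_performance(text: str) -> Dict[str, Dict[str, float]]:
    """Parse rows of the form '<func> <8 numeric columns>' into a dict keyed by function."""
    rows: Dict[str, Dict[str, float]] = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the prompt line and anything that does not have a name plus 8 columns.
        if len(parts) != len(COLUMNS) + 1:
            continue
        try:
            values = [float(p) for p in parts[1:]]
        except ValueError:
            # The header line has the right number of fields but is not numeric.
            continue
        rows[parts[0]] = dict(zip(COLUMNS, values))
    return rows


# Two rows copied from the sample output above.
SAMPLE = """\
func  max-μs avg-μs count total-μs ac-max-μs ac-avg-μs ac-count ac-total-μs
regex_lookup 0 0.0 0 0 1093 7.0 11494 81464
ctd_token 0 36.0 27197221 0 180 20.8 18000 374440
"""

stats = parse_pow_performance(SAMPLE)
print(stats["regex_lookup"]["ac_max_us"])  # 1093.0
```

The parser skips any line it cannot read as one name plus eight numbers, so pasted output that still contains the prompt line or the header does not need to be cleaned first.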

Example: If count suddenly goes high but avg-μs stays low, the CPU is still inspecting packets quickly; there was simply more traffic (count) coming into the firewall at that moment. For this reason, interpret each value individually and compare it to a working time when the issue was not present to determine whether it is related to the root cause (for example, values that went high only while the issue was occurring may be contributing to it). In the case where count goes high but avg-μs stays low, assess incoming traffic to see which new traffic flows caused the spike in count, and determine whether they were the root cause of the issue.
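The reasoning in this example can be sketched as a small Python helper. This is an illustration only: the function name classify_change, the factor of 2.0, and the "current" numbers are assumptions chosen for the example, not values or thresholds defined by PAN-OS.

```python
def classify_change(baseline: dict, current: dict, factor: float = 2.0) -> str:
    """Compare one function's 'count' and 'avg_us' between a working time and now.

    'factor' is an arbitrary illustrative threshold, not a PAN-OS value.
    """
    count_up = current["count"] > factor * baseline["count"]
    avg_up = current["avg_us"] > factor * baseline["avg_us"]
    if avg_up:
        # Each inspection is taking longer: CTD itself is under pressure.
        return "per-packet cost rose"
    if count_up:
        # Inspections are still fast; there is simply more traffic to inspect.
        return "traffic volume rose"
    return "no significant change"


# Hypothetical snapshots for a single function such as ctd_token.
working = {"count": 18000, "avg_us": 20.8}
during_issue = {"count": 90000, "avg_us": 21.0}

print(classify_change(working, during_issue))  # traffic volume rose
```

A result of "traffic volume rose" points toward reviewing which new flows arrived, while "per-packet cost rose" points toward the inspection load itself.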

It is very important to take a baseline of the above values under normal, working, non-issue conditions so that abnormally high values stand out if an issue arises. To see historical values for this output and compare working and non-working times, view dp-monitorX.log using the command: > less mp-log dp-monitor<1-5>.log
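One way to make a baseline usable is to record several samples from working times and compare a current reading against their median. The sketch below assumes the averages have already been collected by hand (for example, from the historical log) into plain dictionaries. The function name flag_against_baseline, the factor of 3.0, and all sample numbers are illustrative assumptions.

```python
from statistics import median


def flag_against_baseline(history, current, factor=3.0):
    """Return functions whose current average exceeds factor x the baseline median.

    history: list of dicts mapping function name -> average microseconds (working times)
    current: dict mapping function name -> average microseconds (during the issue)
    factor:  arbitrary illustrative threshold, not a PAN-OS value
    """
    flagged = {}
    for func, value in current.items():
        samples = [h[func] for h in history if func in h]
        if not samples:
            continue  # no baseline recorded for this function
        base = median(samples)
        if base > 0 and value > factor * base:
            flagged[func] = (base, value)
    return flagged


# Hypothetical readings noted down at three working times.
history = [
    {"sml_vm": 1.1, "ctd_token": 20.8},
    {"sml_vm": 1.3, "ctd_token": 19.5},
    {"sml_vm": 1.0, "ctd_token": 22.0},
]
# Hypothetical reading taken while the issue is occurring.
current = {"sml_vm": 1.2, "ctd_token": 95.0}

print(flag_against_baseline(history, current))  # {'ctd_token': (20.8, 95.0)}
```

Using the median rather than a single snapshot keeps one unusually busy moment in the baseline from hiding a real change.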


Environment


  • PAN-OS
  • CTD (Content & Threat Detection)


Procedure


The most common scenarios that can cause CTD resource usage to spike or run high involve recent changes in the network, traffic profile/pattern, or a specific new large traffic flow, such as:
  • A certain new traffic flow was introduced that is large or heavy on the firewall's inspection engine such as: unknown-udp, unknown-tcp, ms-ds-smb, mssql-db, etc.
  • An abnormally high quantity or rate of a heavier type of traffic is going through the firewall such as: unknown-udp, unknown-tcp, ms-ds-smb, mssql-db, etc.
  • All or a large amount of traffic goes through one Security Policy allow rule with 'strict' Security Profile(s) attached (especially when the amount of traffic is approaching that model's maximum Performance and Capacity - in this case, sml_vm, ctd_token, detector_run_p1, and detector_run_p2 being high would be expected).
  • All or most traffic goes through a single broad Security Policy rule that has many heavy or strict Security Profiles configured on it, especially if there are high amounts of unclassified traffic (unknown-udp, unknown-tcp) or high data rate traffic (ms-ds-smb, mssql-db, etc.) passing through the firewall.
  • An inefficient Custom URL Filtering, Custom Application, or Custom Threat signature/object was configured which uses too many wildcards, uses nothing but a single wildcard, is too broad, etc. (this misconfiguration will often cause the function regex_lookup to go high specifically). To resolve this, remove any performance-impacting, inefficient wildcards or patterns in URL Filtering Objects, Custom Applications, or Custom Signatures - see Nested Wildcard(*) in URLs May Severely Affect Performance for more details
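For the last scenario, a quick review pass over custom patterns can surface the obvious offenders before they are examined by hand. The sketch below is a rough heuristic written for illustration: the function name review_pattern, the limit of two wildcards, and the example patterns are all assumptions. It does not reproduce how PAN-OS evaluates patterns, and it checks only the two conditions named above (a lone wildcard, and many wildcards).

```python
def review_pattern(pattern: str, max_wildcards: int = 2) -> list:
    """Return a list of concerns for one pattern (an empty list means none found).

    max_wildcards is an arbitrary illustrative limit, not a PAN-OS rule.
    """
    issues = []
    stripped = pattern.strip()
    wildcards = stripped.count("*")
    if stripped == "*":
        issues.append("pattern is a single wildcard")
    elif wildcards > max_wildcards:
        issues.append(f"{wildcards} wildcards (more than {max_wildcards})")
    return issues


# Hypothetical patterns to review.
for candidate in ["*", "*.*.*.example.com", "www.example.com/*"]:
    print(candidate, "->", review_pattern(candidate) or "ok")
```

Patterns flagged this way are only candidates for review. Whether a given pattern is too broad still depends on the traffic it is meant to match.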
If CTD processing values were low but then suddenly went abnormally high (and you are having an issue such as high CPU, packet drops, or latency/traffic slowness), perform the steps below:
  1. Take a baseline of these values before the issue occurred (or check historical values). If the current values for sml_vm, ctd_token, detector_run_p1, and detector_run_p2 are much higher than previous values seen, then they might be the culprit of the high CPU or traffic issue. If the current values are around the same as old values when the issue was not occurring, then they might not be the culprit or root cause of the issue. If the value(s) are abnormally higher than the baseline values before the issue started occurring, then proceed to Step 2 below.
  2. Identify which new traffic flow (by Source IP, Source Port, Destination IP, Destination Port, Application) was introduced that caused these value(s) to increase:
    1. Navigate to ACC > Network Activity tab > view the Source IP Activity, Destination IP Activity, and Application Usage charts
    2. Review and assess any traffic flows in the above charts that stand out as outliers taking up a high amount of: bytes, sessions, packet rate, etc. (especially new traffic flows that have shown up only since the issue began occurring). Look for one specific traffic flow or application which is showing up in the charts only when the issue is occurring and is not showing up when the issue is not occurring
    3. Run the CLI Command: > show running application statistics to check if any applications have an abnormally high value for any column (especially for heavier data rate applications such as: web-browsing, unknown-udp, unknown-tcp, dns, ms-ds-smb, ms-ds-smbv2, ms-ds-smbv3, and mssql-db, or any Custom Applications)
    4. If any traffic flows or applications stand out as possible offenders in the above steps (i.e. their values are higher than at normal, working, non-issue times), temporarily halt that traffic flow through the firewall and check whether the CTD-related values in > debug dataplane pow performance (sml_vm, ctd_token, detector_run_p1, or detector_run_p2) return to normal levels and whether the traffic issue is gone
  3. Reduce any inspections done on that specific traffic flow by:
    1. Create a Custom Application for that traffic (so that it does not get classified as unknown-udp or unknown-tcp)
    2. Send that traffic instead through a Security Policy rule which does not have any Security Profile(s) attached (or use a less strict profile with fewer signatures enabled)
    3. Enable DSRI (Disable Server Response Inspection) on the Security Policy rule which that traffic takes
    4. Create an Application Override for that traffic (only use if the traffic is trusted by your organization)
Note: Options 2, 3, and 4 under Step 3 above reduce the amount of security inspection done on the traffic. Use only Option 1 whenever possible.
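The comparison in Step 2 (which applications are present or much busier only while the issue occurs) can be sketched in Python once per-application numbers have been noted down from a working time and from the issue time. The function name rank_suspect_apps, the ratio of 2.0, and the session counts are assumptions for illustration. The sketch does not parse any CLI output.

```python
def rank_suspect_apps(working: dict, issue: dict, min_ratio: float = 2.0) -> list:
    """Rank applications by growth between a working time and the issue time.

    working / issue: dicts mapping application name -> a counter such as sessions
    min_ratio:       arbitrary illustrative threshold for "abnormally higher"
    """
    suspects = []
    for app, now in issue.items():
        before = working.get(app, 0)
        if before == 0:
            # Application was not seen at the working time: treat as new traffic.
            if now > 0:
                suspects.append((app, float("inf")))
            continue
        ratio = now / before
        if ratio >= min_ratio:
            suspects.append((app, ratio))
    # Largest growth first; new applications sort to the top.
    return sorted(suspects, key=lambda item: item[1], reverse=True)


# Hypothetical session counts per application.
working = {"web-browsing": 1000, "dns": 500, "ms-ds-smb": 200}
issue = {"web-browsing": 1100, "dns": 520, "ms-ds-smb": 1400, "unknown-udp": 900}

for app, ratio in rank_suspect_apps(working, issue):
    print(app, ratio)
```

With these sample numbers, unknown-udp (new since the working time) and ms-ds-smb (seven times higher) would be the first flows to examine and, if appropriate, temporarily halt as described in Step 2.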
 


Additional Information


How to interpret output of "debug dataplane pow performance" during troubleshooting high DP CPU
How to mitigate High DP CPU issue due to High Application Usage
How to Troubleshoot High Dataplane CPU
Nested Wildcard(*) in URLs May Severely Affect Performance
Tips & Tricks: Custom Vulnerability
 

