How to troubleshoot high DP CPU caused by IPSEC traffic
Created On 12/13/18 08:58 AM - Last Modified 04/11/22 05:11 AM
Objective
A high volume of IPSEC traffic can sometimes cause dataplane (DP) resource issues. IPSEC traffic is handled differently depending on the platform and the type of IPSEC traffic. This article explains how to verify whether IPSEC traffic is causing the issue.
Procedure
Case 1: IPSEC Passthrough
- For IPSEC passthrough on the PA-3000 and PA-5000 Series, the traffic always ends up on the DP, since the offload processor cannot match sessions for the incoming ESP packets, which carry SPI values but no ports.
- For IPSEC passthrough on the PA-7000, PA-5200 and PA-3200 Series, the offload processors can handle ESP traffic because they can match the SPI values to flows, so offload works. The same applies to GRE and AH traffic.
- On platforms where the offload processor sends ESP traffic to the DP, it is easy to run at capacity when the traffic volume is high, even though the platform has an offload processor available.
- In such cases, the port 20033 sessions may be very long-lived and hence may not appear in the traffic logs, so they can elude a network admin who is using ACC reports to determine the top applications.
- Check for sessions that are long-lived or have transferred a large number of bytes:
> show session all filter min-kb 100000 (sessions that have transferred more than ~100 MB of data)
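- To gauge whether passthrough ESP traffic makes up a large share of the session table, the sessions can also be counted directly. These filters are examples only; the exact filter keywords can vary slightly between PAN-OS versions:
> show session all filter application ipsec-esp count yes
> show session all filter protocol 50 count yes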
Case 2: IPSEC Terminated on the Palo Alto Networks Firewall
- In this case, the IPSEC sessions will always have port numbers derived from the SPIs, and all packets come to the DP anyway, since encryption and decryption must be performed there.
- Additionally, you will not always get traffic logs for IPSEC-ESP sessions.
- If the amount of traffic through the IPSEC tunnels is high, it is easy to run into capacity issues. Refer to the platform datasheets for the various capacity numbers.
- As in Case 1, traffic logs and the ACC may not be very helpful, since these logs are only generated once a session ends and IPSEC tunnels can be very long-lived. For example, in the ACC snapshot taken for this case, most traffic was shown as SSL while the encrypted-tunnel application accounted for a comparatively small amount of traffic.
- In this case, the DP CPU is around 60%:
2018-09-19 16:14:40.823 +1000 --- panio
:
:Resource monitoring sampling data (per second):
:
:CPU load sampling by group:
:flow_lookup : 60%
:flow_fastpath : 58%
:flow_slowpath : 60%
:flow_forwarding : 60%
:flow_mgmt : 49%
:flow_ctrl : 49%
:nac_result : 60%
:flow_np : 58%
:dfa_result : 60%
:module_internal : 60%
:aho_result : 60%
:zip_result : 60%
:pktlog_forwarding : 59%
:lwm : 0%
:flow_host : 57%
:
:CPU load (%) during last 15 seconds:
:core 0 1 2 3 4 5
: 0 48 59 59 59 58
: 0 48 59 60 59 59
: 0 54 64 64 64 64
: 0 57 67 67 67 67
: 0 53 63 64 64 64
: 0 49 61 62 61 61
: 0 46 60 60 60 60
: 0 39 53 54 53 53
: 0 43 55 55 55 55
: 0 49 60 60 60 60
: 0 49 61 60 60 60
: 0 44 57 57 57 57
: 0 38 53 52 52 52
: 0 39 54 54 54 54
: 0 45 59 59 59 59
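- The sampling shown above comes from the dataplane resource-monitoring logs; roughly the same per-second data can also be pulled live from the CLI (the output layout differs slightly by platform):
> show running resource-monitor second last 15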
- The easiest way to confirm this is to check the output of "debug dataplane pow performance", which summarizes, per function, the maximum and average processing time (in microseconds) and how many times it was called. In this case, look at the output from one instance:
:func max. proc us ave. proc us count
:dfa_match 0 0 0
:policy_lookup 21343 15 2605518
:user_group_policy 5639 1 876821
:get_gid 5038 0 1257441
:regex_lookup 3215 10 4028529
:regex_postfpga 462 6 827546
:reg_expr_match 0 0 0
:sml_vm 199 4 44650282 <<<<<<<<<<< Related to Layer 7
:detector_run_p1 1203 13 1336837
:detector_run_p2 11610 19 1369137
:ssl_proxy_proc 8512 17 158104
:ssl_encode 76 27 98267
:rsa_operation_pub_enc 0 0 0
:rsa_operation_pub_dec 0 0 0
:rsa_operation_priv_enc 0 0 0
:rsa_operation_priv_dec 8282 8179 56
:rsa_key_gen 0 0 0
:rsa_RSA_sign 0 0 0
:rsa_RSA_verify 0 0 0
:ecdhe_key_gen 0 0 0
:ecdhe_key_gen_mul 0 0 0
:ecdhe_key_compute_mul 0 0 0
:ecdhe_key_compute 0 0 0
:ecdhe_generate_xchg_key 0 0 0
:ecdhe_gen_key_exchange_msg 0 0 0
:ecdhe_get_key_exchange_msg 0 0 0
:ecdhe_parse_server_key_exchange_msg 0 0 0
:ecdhe_gen_server_key_exchange_msg 0 0 0
:dhe_gen_para 0 0 0
:dhe_gen_key 0 0 0
:dhe_key_compute 0 0 0
:bn_mod_exp 8252 8151 56
:cipher_enc 51 4 211819
:zip_deflate 36 1 164908
:pktlog_log 13 12 2
:pktlog_send 0 0 0
:tunnel_encap 67 10 14498723 <<<<<<<<<<< Related to tunnel
:tunnel_decap 605 28 10978906 <<<<<<<<<<< Related to tunnel
:tunnel_esp 51 2 8831828
:tunnel_esp_decap 587 13 10705421 <<<<<<<<<<< Related to tunnel
:tunnel_ah_decap 0 0 0
:tunnel_ah 61 4 8978189 <<<<<<<<<<< Related to tunnel
:tunnel_fwd 17 2 6426
:tunnel_prepare 44 1 9010166 <<<<<<<<<<< Related to tunnel
:tunnel_post 51 1 8905132
:appid_result 0 0 0
:ctd_token 16291 79 3910573
:dos_update 0 0 399222
:urlcache_update 10010 225 411
:urlcache_lookup 2433 89 100202
:urlcache_insert 0 0 0
:urlcache_delete 0 0 0
:urlcache_lru 0 0 0
:session_install 53 6 399309
:session_age 39 0 1639088
:session_delete 64 9 399215
:session_purge 47 3 395786
:inline_switch 0 0 0
:age_arp 15 3 600
:age_pbf_ret_mac 1 0 600
:stack_trace 0 0 0
:ctd_pw_check 0 0 0
:ctd_pw_md4 0 0 0
:ctd_pw_extract 0 0 0
:prl_rewrite 0 0 0
:pkt_rewrite 0 0 0
:vpn_send_to_client 0 0 0
:vpn_cookie_response 0 0 0
:raven_match 39 3 16839
:raven_match_header 7 3 2272
:raven_match_body 8 3 32
:raven_sig_get_action 0 0 0
:age_mfib 1 0 600
- From this output we can see that most of the CPU time is going into tunnel handling, along with Layer 7 processing.
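- As a rough cross-check while the issue is occurring, the global counters can be filtered for tunnel-related activity; counter names differ between releases, so the match string below is only an illustration:
> show counter global filter delta yes | match tunnel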
- On further checking, it was found that the majority of the SSL traffic in this case was going into the IPSEC tunnels, so it is enough to count the amount of application traffic going into the tunnels and compare it with the platform's capacity numbers.
> show session all filter to proxy count yes <<<<<< "proxy" is the name of the zone containing the tunnel interfaces in this case; adjust the filter as required
Number of sessions that match filter: 42118
> show session all filter count yes
Number of sessions that match filter: 53523
- We can see above that around 80% of the total sessions (42118 of 53523, roughly 79%) were going via IPSEC.
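- To see which of the tunneled sessions carry the most data, the zone filter can be combined with a byte threshold (again, "proxy" is only the tunnel zone name used in this example):
> show session all filter to proxy min-kb 100000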
- Throughput as checked from "show session info":
:target-dp: *.dp0
:--------------------------------------------------------------------------------
:Number of sessions supported: 524286
:Number of allocated sessions: 57011
:Number of active TCP sessions: 51355
:Number of active UDP sessions: 3736
:Number of active ICMP sessions: 111
:Number of active GTPc sessions: 0
:Number of active GTPu sessions: 0
:Number of pending GTPu sessions: 0
:Number of active BCAST sessions: 0
:Number of active MCAST sessions: 0
:Number of active predict sessions: 317
:Session table utilization: 10%
:Number of sessions created since bootup: 178684165
:Packet rate: 93096/s
:Throughput: 703278 kbps <<<<<<< i.e. around 700 mbps
:New connection establish rate: 542 cps
- Since ~80% of the sessions are via IPSEC, we can approximate that around 500 to 550 Mbps of data may be going via IPSEC (roughly 0.79 x 703 Mbps, assuming bytes scale with session count). Comparing this with the PA-3060 capacity numbers, the rated IPSEC VPN throughput is around 500 Mbps.
- On top of that, Layer 7 inspection and other processing are also consuming DP cycles in this case.
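- To attribute the load to specific tunnels instead of approximating from session counts, the per-tunnel encap/decap counters can also be reviewed; the tunnel name below is a placeholder taken from the first command's output:
> show vpn flow
> show vpn flow name <tunnel-name>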
Additional Information
Read "Processing IPSec pass-through traffic on the Palo Alto Networks firewall" and "What do the port numbers in an IPSEC-ESP session represent?"