GlobalProtect users see connectivity issues to the GP Portal or Gateway while the appweb3 (sslvpn) process on PAN-OS causes abnormally high MP CPU
Symptom
The PAN-OS WebUI shows high Management Plane (MP) CPU, and CLI commands show the appweb3 (sslvpn) process at the top of the list with abnormally high CPU utilization.
For example, the show system resources follow command shows the appweb3 process (PID: 12267) with an abnormally high value under %CPU that stays high over a sustained period:
> show system resources follow
2021-03-10 08:43:51.992 top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2021-03-10 08:43:51.992 top 12267 nobody 20 0 968920 371696 7920 D 475.0 1.1 12903:41 appweb3
2021-03-10 09:03:52.039 top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2021-03-10 09:03:52.039 top 12267 nobody 20 0 970072 361596 8328 S 1265 1.1 13165:59 appweb3
2021-03-10 09:23:52.572 top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2021-03-10 09:23:52.572 top 12267 nobody 20 0 970468 355832 8384 S 1550 1.1 13440:29 appweb3
2021-03-10 09:43:52.570 top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2021-03-10 09:43:52.570 top 12267 nobody 20 0 970412 359812 8432 S 1256 1.1 13718:35 appweb3
The show system software status command shows that PID 12267 is actually the sslvpn process, which is used by the GlobalProtect Portal and Gateway:
> show system software status | match "12267"
Process sslvpn running (pid: 12267)
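When many samples have been captured, checking them by eye becomes tedious. The following is a minimal Python sketch, not a Palo Alto Networks tool, that parses timestamped top rows in the layout shown above and reports whether appweb3 stayed above a threshold in every sample. The function names and the 100% threshold are assumptions chosen for illustration; pick a threshold that suits the platform.

```python
SAMPLE = """\
2021-03-10 08:43:51.992 top PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2021-03-10 08:43:51.992 top 12267 nobody 20 0 968920 371696 7920 D 475.0 1.1 12903:41 appweb3
2021-03-10 09:03:52.039 top 12267 nobody 20 0 970072 361596 8328 S 1265 1.1 13165:59 appweb3
2021-03-10 09:23:52.572 top 12267 nobody 20 0 970468 355832 8384 S 1550 1.1 13440:29 appweb3
2021-03-10 09:43:52.570 top 12267 nobody 20 0 970412 359812 8432 S 1256 1.1 13718:35 appweb3
"""


def parse_samples(text, command="appweb3"):
    """Return (timestamp, pid, cpu_percent) for each row of the given command."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Layout: date time "top" PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) != 15 or fields[2] != "top" or not fields[3].isdigit():
            continue  # skips header rows and anything that is not a process row
        if fields[14] != command:
            continue
        samples.append((fields[0] + " " + fields[1], int(fields[3]), float(fields[11])))
    return samples


def sustained_high_cpu(samples, threshold=100.0):
    """True if there is at least one sample and all samples meet the threshold.

    The default threshold is an assumption for this sketch, not a vendor value.
    """
    return bool(samples) and all(cpu >= threshold for _, _, cpu in samples)


if __name__ == "__main__":
    for ts, pid, cpu in parse_samples(SAMPLE):
        print(ts, pid, cpu)
    print("sustained:", sustained_high_cpu(parse_samples(SAMPLE)))
```

Save the CLI output to a text file and pass its contents to parse_samples in place of SAMPLE.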
Environment
PAN-OS
GlobalProtect App
Cause
The appweb3 (sslvpn) process is used by the GlobalProtect Portal and Gateway on the Management Plane (MP) of PAN-OS. A high %CPU value (i.e. high CPU utilization) for appweb3 means it is processing a high rate of incoming requests. For example:
Scenario#1: In a large GlobalProtect deployment with 25000 users, the appweb3 process shows a very high %CPU value at the start of the business day, but no users are impacted and everything works well.
Scenario#2: In a large GlobalProtect deployment with 25000 users, the appweb3 process shows a very high %CPU value at the start of the business day and some users are unable to connect to the GP Portal or Gateway.
By design, the appweb3 (sslvpn) process supports a maximum rate of 100 requests/second.
When users are impacted by abnormally high appweb3 CPU utilization, the cause may be that more than 100 requests arrive per second. For example, if 120 requests reach the appweb3 (sslvpn) process in one second, 100 requests saturate the process (hence the high CPU utilization) and the remaining 20 requests are queued. If any of these requests times out, the user who sent it is impacted.
In this scenario, narrow down the reason for the high request rate, which could be:
a. A high number of legitimate users trying to connect to the GP Portal/Gateway at the same time at the start of the business day
b. A network change event on the client side (e.g. a branch office with 300 users who all send a login request to the GP Portal at the same time due to a network change)
c. Internal users connecting to a GP Internal Gateway during network changes. For an Internal Gateway, users are logged out and logged back in on network changes (e.g. route changes)
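The effect of exceeding the 100 requests/second limit can be illustrated with a short calculation. The sketch below is a deliberate simplification, not a model of appweb3 internals: it assumes a constant arrival rate, exactly 100 requests served per second, and no timeouts or retries. The function name is hypothetical.

```python
def backlog_per_second(arrival_rate, seconds, capacity=100):
    """Return the number of queued requests at the end of each second.

    capacity defaults to the 100 requests/second limit stated in this article.
    """
    queued = 0
    depths = []
    for _ in range(seconds):
        # New arrivals join the queue; up to `capacity` requests are served.
        queued = max(0, queued + arrival_rate - capacity)
        depths.append(queued)
    return depths


if __name__ == "__main__":
    print(backlog_per_second(120, 3))  # [20, 40, 60]
    print(backlog_per_second(80, 3))   # [0, 0, 0]
```

At a steady 120 requests/second the backlog grows by 20 every second, so the wait for a queued request keeps increasing until requests start to time out. Below the limit, the queue stays empty.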
Scenario#3: In a small to medium scale GlobalProtect deployment with 50 users, the appweb3 process shows a very high %CPU value due to a high number of requests.
One possible cause is an attack that floods appweb3 with a large number of requests per second.
Scenario#4: In a small to medium scale GlobalProtect deployment with 50 users, the appweb3 process shows a very high %CPU value even when the rate of requests to appweb3 is very low (e.g. 1/second).
This is considered an anomaly, and a process memory dump and trace help identify the reason for the issue.
Resolution
Scenario#1: This is a normal scenario where appweb3 causes high MP CPU but nothing is impacted.
Scenario#2:
a. This is expected behavior when the request rate exceeds the capacity limit of 100 requests/second
b. Resolve the client-side network instability
c. Deploy GP Portals and Internal Gateways on multiple PAN Firewalls to divide the user load
Scenario#3: If the source IP addresses and the type of requests seen on appweb3 are not legitimate, block the source addresses on the Data Plane (DP) to deny the traffic.
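To decide which sources to review for blocking, it can help to rank source addresses by how many requests they send within a single second. The following Python sketch assumes the request records have already been exported as (epoch_seconds, source_ip) pairs. That input format, the function name, and the per-second limit are hypothetical and are not a PAN-OS log format.

```python
from collections import Counter


def sources_over_rate(events, max_per_second):
    """Return the sorted source IPs that exceed max_per_second in any one second.

    events: iterable of (epoch_seconds, source_ip) pairs (hypothetical input).
    """
    # Count requests per (whole second, source IP) bucket.
    per_bucket = Counter((int(ts), ip) for ts, ip in events)
    return sorted({ip for (_, ip), count in per_bucket.items() if count > max_per_second})


if __name__ == "__main__":
    # Documentation-range addresses used purely as example data.
    events = [(0.1, "198.51.100.7")] * 50 + [(0.5, "203.0.113.5")] * 2 + [(1.2, "203.0.113.5")]
    print(sources_over_rate(events, max_per_second=10))  # ['198.51.100.7']
```

An address that appears in the result is only a candidate. Confirm that it is not a legitimate source, such as a NAT address shared by a branch office, before blocking it.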
Scenario#4: Collect the core file and backtrace of the process using the following commands to help analyze the issue:
> debug software core sslvpn-web-server
> show system files
/opt/dpfs/var/cores/:
total 4.0K
drwxr-xr-x 2 root root 4.0K Aug 26 2020 crashinfo

/opt/dpfs/var/cores/crashinfo:
total 0

/opt/panlogs/cores/:
total 4.0K
drwxrwxrwx 2 root root 4.0K Nov 26 11:32 crashinfo

/opt/panlogs/cores/crashinfo:
total 0

/var/cores/:
total 218M
-rw-rw-rw- 1 root root 217M Apr 13 14:44 sslvpn_10.0.0_0.core
drwxr-xr-x 2 root root 4.0K Apr 13 14:45 crashinfo

/var/cores/crashinfo:
total 32K
-rw-rw-rw- 1 root root 29K Apr 13 14:45 sslvpn_10.0.0_0.info