How to mitigate an abnormal increase in "tcp_alloc_wqe_failed" global counter
Created On 11/03/23 20:27 PM - Last Modified 04/22/24 04:34 AM
Objective
To mitigate an abnormal increase in the tcp_alloc_wqe_failed global counter.
Counter's description:
The tcp_alloc_wqe_failed counter increments when the allocation of a TCP (Transmission Control Protocol) WQE (Work Queue Entry) fails, most likely because the WQE Pool of the dataplane (DP) is depleted.
This allocation failure can have various causes, such as dataplane resource constraints, memory limitations or software issues.
Environment
- Next Generation Firewall
- tcp_alloc_wqe_failed
Procedure
- Check the current dataplane resources and in particular the WQE Pool:
> debug dataplane pool statistics

Hardware Pools
[21] Blast Pool       :   1024/1024     0x800000040f21d600 0
[22] LWM Pool         :   1024/1024     0x800000040f25e900 0
[23] Timer Pool       :   4093/4096     0x800000040f29fc00 0
[24] DFA Pool         :   4096/4096     0x800000040f6a4800 0
[25] Output Buffer Po :   1024/1024     0x800000040faa9400 0
[26] WQE Pool         :  60103/552960   0x800000038b0b1f80 0 <<<<<<<<<
[27] ZIP Pool         :   1023/1024     0x800000040fdfe800 0
[28] Dma Cmd Buffers  :   1016/1024     0x800000038f440000 0
[29] PKI POOL DFLT    :  53246/53248    0x800000038f568800 0
[30] PKO3 AURA        :  34926/35506    0x8000000395dbd000 0
[31] SSO AURA         :   1794/1890     0x800000039e896000 0
Note: This command returns the status of all the buffer pools used by the system:
The number on the left indicates how much of the buffer is still available.
The number on the right indicates the total size.
If the number on the left drops to 0, the buffer is depleted.
- Track the WQE pool usage using CLI:
grep pattern "Work Queue Entries" dp-log dp-monitor.log
Where dp-log can be replaced with mp-log, dp0-log, dp1-log or dp2-log, depending on the location of dp-monitor.log and the type of NGFW platform.
- This helps isolate the issue: whether there is a resource leak, or whether high WQE utilization occurs during periods of heavy traffic or when specific types of traffic are being received.
- Use the ACC tab to check the type of traffic received on the firewall during the time of WQE Pool high usage.
- Use the Traffic logs: Monitor > Logs > Traffic tab to check the type of traffic received on the firewall during the time of WQE Pool high usage.
- Review the overall dataplane resource usage and monitor the number of sessions received and the type of traffic received:
show running resource-monitor
show session info
show system statistics session
- For PA-VM firewalls, check that the VM-Series System Requirements are met; for hardware NGFW platforms, consider upgrading to a higher-capacity platform if necessary.
- Ensure that your network traffic is properly distributed across your network devices and, on multi-DP platforms, across your firewall dataplanes.
- Optimize the flow of your network traffic to reduce out-of-order TCP packets and packet fragmentation, as these two can be the main contributors to WQE Pool depletion.
- Ensure that the firewall is running one of the PAN-OS versions indicated as a preferred release; refer to Support PAN-OS Software Release Guidance.
- If the firewall continues to experience issues with WQE allocation failures despite following the steps above, then collect the firewall's techsupport file and consider contacting the Palo Alto Networks customer support team.
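The pool check from the first step can also be scripted when the output of debug dataplane pool statistics has been saved to a file or captured over SSH. The following is a minimal sketch in Python, not an official tool. It assumes the output lines follow the "[index] name : available/total address flag" layout shown in the sample above. The names parse_pools and low_pools and the 20% threshold are hypothetical choices made for illustration, not values taken from Palo Alto Networks documentation.

```python
import re
from typing import List, NamedTuple

# Matches "[26] WQE Pool : 60103/552960 ..." (layout assumed from the sample output).
POOL_RE = re.compile(r"\[(\d+)\]\s+([^:\[\]]+?)\s*:\s*(\d+)/(\d+)")


class Pool(NamedTuple):
    index: int
    name: str
    available: int  # number on the left: buffers still available
    total: int      # number on the right: total pool size

    @property
    def free_pct(self) -> float:
        """Percentage of the pool that is still available."""
        return 100.0 * self.available / self.total if self.total else 0.0


def parse_pools(text: str) -> List[Pool]:
    """Extract every pool entry found in the command output."""
    return [
        Pool(int(idx), name.strip(), int(avail), int(total))
        for idx, name, avail, total in POOL_RE.findall(text)
    ]


def low_pools(pools: List[Pool], threshold_pct: float = 20.0) -> List[Pool]:
    """Return pools whose free share is below threshold_pct (hypothetical default)."""
    return [p for p in pools if p.free_pct < threshold_pct]


# Three lines copied from the sample output in this article.
SAMPLE = """\
[25] Output Buffer Po : 1024/1024 0x800000040faa9400 0
[26] WQE Pool : 60103/552960 0x800000038b0b1f80 0
[27] ZIP Pool : 1023/1024 0x800000040fdfe800 0
"""

if __name__ == "__main__":
    for pool in low_pools(parse_pools(SAMPLE)):
        print(f"{pool.name}: {pool.available}/{pool.total} available ({pool.free_pct:.1f}%)")
```

With the sample lines, only the WQE Pool falls below the illustrative threshold. Running the same check on periodic captures gives a rough trend of how quickly the pool drains under load; adjust the threshold to whatever is appropriate for your platform.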
Additional Information
Work Queue Entry (WQE)
This data structure is extremely important: it represents a unit of work and contains the information that determines how the packet is scheduled and processed by the CPU cores.