How to troubleshoot LACP going down or flapping
Created On 08/30/22 18:37 PM - Last Modified 07/29/24 15:25 PM
Objective
Troubleshoot LACP going down or flapping
Environment
- Palo Alto Firewall
- LACP Configured
Procedure
- Check the system logs for LACP events. In the web interface, go to Monitor > Logs > System and apply the filter ( subtype eq lacp ). From the CLI:
show log system direction equal backward subtype equal lacp
- Check l2ctrld.log around the time of the issue identified in step 1:
less mp-log l2ctrld.log
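When l2ctrld.log is long, the partner state values in the mux state change messages can be pulled out with a small helper before decoding them. The following is a minimal C sketch, assuming the message text follows the sample format used in this article (`... partner state 0x37`); the function name `parse_partner_state` is hypothetical and not part of PAN-OS.

```c
#include <stdlib.h>
#include <string.h>

/* Extract the hex value that follows "partner state " in one l2ctrld.log line.
 * Returns 1 and stores the value in *state on success; returns 0 if the
 * marker is missing or is not followed by an 8-bit hex number.
 * ASSUMPTION: the marker text matches the sample messages in this article. */
int parse_partner_state(const char *line, unsigned int *state)
{
    static const char marker[] = "partner state ";
    const char *p = strstr(line, marker);
    if (p == NULL)
        return 0;
    p += sizeof(marker) - 1;                 /* skip past the marker text */

    char *end;
    unsigned long v = strtoul(p, &end, 16);  /* base 16 accepts an optional 0x prefix */
    if (end == p || v > 0xFF)                /* no digits, or wider than the 8-bit field */
        return 0;

    *state = (unsigned int)v;
    return 1;
}
```

Lines that carry no partner state, such as the `current_while expired.` message, simply return 0 and can be skipped.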
- If an Ethernet interface was moved out of the aggregate group and you see messages similar to the following:
mp l2ctrld.log ethernet1/1 idx 64, current_while expired.
mp l2ctrld.log ethernet1/1 idx 64, rx state change CURRENT=>EXPIRED
mp l2ctrld.log ethernet1/1 idx 64 mux state change RX_TX=>ATTACHED, select_state Selected, partner state 0x37
This sequence indicates that no LACPDU was received from the partner before the current_while timer expired, so the receive state moved from CURRENT to EXPIRED and the mux state moved from RX_TX back to ATTACHED.
state 0x37 is 00110111 in binary. Using the bit definitions under Additional Information, this partner state means the partner is active, uses a short timeout, is aggregatable, is out of sync, is collecting incoming frames, and is distributing outgoing frames; the partner information was taken from a received LACPDU, and the actor rxm is not in the EXPIRED state.
0x37 (00110111): bit0 = 1 (active); bit1 = 1 (short timeout); bit2 = 1 (aggregatable); bit3 = 0 (out of sync); bit4 = 1 (collecting incoming frames); bit5 = 1 (distributing outgoing frames); bit6 = 0 (partner info taken from received LACPDU); bit7 = 0 (actor rxm not in EXPIRED state)
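The decoding above is a plain bit-mask test against the PAN_LACP_* values listed under Additional Information. A minimal C sketch follows; the struct and function names are hypothetical and only illustrate the technique.

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit values copied from the PAN_LACP_* definitions under Additional Information. */
#define PAN_LACP_ACTIVITY        0x01 /* 1 = active, 0 = passive */
#define PAN_LACP_TIMEOUT         0x02 /* 1 = short, 0 = long */
#define PAN_LACP_AGGREGATION     0x04 /* 1 = aggregatable, 0 = individual */
#define PAN_LACP_SYNCHRONIZATION 0x08 /* 1 = in sync, 0 = out of sync */
#define PAN_LACP_COLLECTING      0x10 /* 1 = collecting incoming frames */
#define PAN_LACP_DISTRIBUTING    0x20 /* 1 = distributing outgoing frames */
#define PAN_LACP_DEFAULTED       0x40 /* 1 = partner info is default */
#define PAN_LACP_EXPIRED         0x80 /* 1 = actor rxm in EXPIRED state */

/* Hypothetical container for the eight decoded flags. */
struct lacp_state_bits {
    bool active, short_timeout, aggregatable, in_sync;
    bool collecting, distributing, defaulted, expired;
};

/* Split an 8-bit LACP state byte into named flags. */
struct lacp_state_bits lacp_state_decode(unsigned int state)
{
    struct lacp_state_bits b = {
        .active        = (state & PAN_LACP_ACTIVITY) != 0,
        .short_timeout = (state & PAN_LACP_TIMEOUT) != 0,
        .aggregatable  = (state & PAN_LACP_AGGREGATION) != 0,
        .in_sync       = (state & PAN_LACP_SYNCHRONIZATION) != 0,
        .collecting    = (state & PAN_LACP_COLLECTING) != 0,
        .distributing  = (state & PAN_LACP_DISTRIBUTING) != 0,
        .defaulted     = (state & PAN_LACP_DEFAULTED) != 0,
        .expired       = (state & PAN_LACP_EXPIRED) != 0,
    };
    return b;
}

/* Print the state in binary (bit7 down to bit0), then the decoded flags. */
void lacp_state_print(unsigned int state)
{
    struct lacp_state_bits b = lacp_state_decode(state);
    printf("0x%02X = ", state & 0xFFu);
    for (int bit = 7; bit >= 0; bit--)
        putchar(((state >> bit) & 1u) ? '1' : '0');
    printf("\n  %s, %s timeout, %s, %s\n",
           b.active ? "active" : "passive",
           b.short_timeout ? "short" : "long",
           b.aggregatable ? "aggregatable" : "individual",
           b.in_sync ? "in sync" : "out of sync");
    printf("  collecting=%d distributing=%d defaulted=%d expired=%d\n",
           b.collecting, b.distributing, b.defaulted, b.expired);
}
```

For example, `lacp_state_print(0x37)` prints `0x37 = 00110111` followed by the same flags listed above.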
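- If you see messages similar to the following:
mp l2ctrld.log ethernet2/13 idx 140 received pdu partner does not match local actor
mp l2ctrld.log Recved LACPDU actor:
mp l2ctrld.log sys_pri 4000, system_mac 00:23:04:ee:be:78, key 32793, port_pri 32768, port_num 313, state 0x45 partner
state 0x45 is 01000101 in binary. This state is carried in the received LACPDU, so it reflects what the peer is reporting. Using the bit definitions under Additional Information, it decodes as active, long timeout, aggregatable, out of sync, not collecting incoming frames, not distributing outgoing frames, partner information defaulted (not taken from a received LACPDU), and actor rxm not in the EXPIRED state. The first message indicates that the partner information in the received LACPDU does not match the firewall's local actor information.
0x45 (01000101): bit0 = 1 (active); bit1 = 0 (long timeout); bit2 = 1 (aggregatable); bit3 = 0 (out of sync); bit4 = 0 (not collecting incoming frames); bit5 = 0 (not distributing outgoing frames); bit6 = 1 (partner info is default); bit7 = 0 (actor rxm not in EXPIRED state)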
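The "partner does not match local actor" check can be pictured as a field-by-field comparison between the partner information in a received LACPDU and the firewall's own actor information, using the fields printed in the log. The sketch below is a hypothetical model of that comparison, not the l2ctrld implementation; the struct layout and the LACP_MISMATCH_* flag names are assumptions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical identity record, modeled on the fields printed in
 * l2ctrld.log (sys_pri, system_mac, key, port_pri, port_num).
 * This is NOT the PAN-OS internal structure. */
struct lacp_port_info {
    uint16_t sys_pri;
    uint8_t  system_mac[6];
    uint16_t key;
    uint16_t port_pri;
    uint16_t port_num;
};

/* Hypothetical flags naming which field differs. */
enum {
    LACP_MISMATCH_SYS_PRI  = 1 << 0,
    LACP_MISMATCH_SYS_MAC  = 1 << 1,
    LACP_MISMATCH_KEY      = 1 << 2,
    LACP_MISMATCH_PORT_PRI = 1 << 3,
    LACP_MISMATCH_PORT_NUM = 1 << 4,
};

/* Compare the partner info carried in a received LACPDU (the peer's view
 * of us) with our own actor info. Returns 0 when every field matches,
 * otherwise a bitmask of LACP_MISMATCH_* flags for the differing fields. */
unsigned int lacp_partner_mismatch(const struct lacp_port_info *rx_partner,
                                   const struct lacp_port_info *local_actor)
{
    unsigned int m = 0;
    if (rx_partner->sys_pri != local_actor->sys_pri)
        m |= LACP_MISMATCH_SYS_PRI;
    if (memcmp(rx_partner->system_mac, local_actor->system_mac,
               sizeof rx_partner->system_mac) != 0)
        m |= LACP_MISMATCH_SYS_MAC;
    if (rx_partner->key != local_actor->key)
        m |= LACP_MISMATCH_KEY;
    if (rx_partner->port_pri != local_actor->port_pri)
        m |= LACP_MISMATCH_PORT_PRI;
    if (rx_partner->port_num != local_actor->port_num)
        m |= LACP_MISMATCH_PORT_NUM;
    return m;
}
```

Returning a mask rather than a boolean shows which field (for example the key) to compare against the peer's configuration.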

- Check the aggregate group status from the CLI:
show lacp aggregate-ethernet all
Note: At least one side must be in active LACP mode; if both sides are passive, neither side starts the LACPDU exchange.
- If the transmission rate is set to Slow (the default), LACPDUs are exchanged every 30 seconds.
- If the transmission rate is set to Fast, LACPDUs are exchanged every second.
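The two rates correspond to the periodic and timeout timers defined in IEEE 802.1AX: an LACPDU every 1 second (Fast) or 30 seconds (Slow), with the receive timeout (the current_while timer seen in l2ctrld.log) set to three intervals, that is 3 or 90 seconds. The sketch below encodes those standard values, assuming the PAN-OS Fast/Slow setting corresponds to the standard's short/long timeout; the function names are hypothetical.

```c
#include <stdbool.h>

/* Timer values from IEEE 802.1AX (LACP). */
#define LACP_FAST_PERIODIC_SEC   1  /* transmission rate Fast */
#define LACP_SLOW_PERIODIC_SEC  30  /* transmission rate Slow (default) */
#define LACP_SHORT_TIMEOUT_SEC   3  /* 3 x fast periodic interval */
#define LACP_LONG_TIMEOUT_SEC   90  /* 3 x slow periodic interval */

/* Seconds between LACPDUs for the selected rate. */
int lacp_tx_interval(bool fast)
{
    return fast ? LACP_FAST_PERIODIC_SEC : LACP_SLOW_PERIODIC_SEC;
}

/* Seconds without a received LACPDU before current_while expires. */
int lacp_rx_timeout(bool fast)
{
    return fast ? LACP_SHORT_TIMEOUT_SEC : LACP_LONG_TIMEOUT_SEC;
}

/* Two passive ports never start the exchange, so at least one side
 * must be active for LACP to come up. */
bool lacp_can_negotiate(bool local_active, bool peer_active)
{
    return local_active || peer_active;
}
```

With Fast, three consecutive missed LACPDUs are enough to expire the port, which is why Fast tends to react (and flap) sooner than Slow.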
- For other checks, refer to Configure an Aggregate Interface Group.
- Verify whether the physical link went down before LACP did; a link-down event removes the interface from the aggregate group. Check brdagent.log and the system logs for link-change events:
less mp-log brdagent.log
show log system direction equal backward
show log system subtype equal port eventid equal link-change direction equal backward
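To decide whether the physical link caused the LACP event, compare the link-change timestamp with the LACP event timestamp from the system log. A minimal sketch, assuming both timestamps have already been converted to `time_t`; `link_down_explains_lacp` and the `window_sec` tolerance are hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/* Returns true if the link-down event happened at or before the LACP
 * down event, and no more than window_sec seconds earlier. That ordering
 * suggests the physical link, not LACP itself, removed the port from
 * the aggregate group. window_sec is a hypothetical tolerance. */
bool link_down_explains_lacp(time_t link_down, time_t lacp_down, double window_sec)
{
    double gap = difftime(lacp_down, link_down); /* seconds from link-down to LACP-down */
    return gap >= 0.0 && gap <= window_sec;
}
```

If the function returns false because LACP went down first, focus on LACPDU delivery (debugs, packet capture, counters) rather than the physical layer.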
- If the issue is ongoing, enable debug-level logging for l2ctrld during the troubleshooting window:
debug l2ctrld global on debug
debug l2ctrld lacp on debug
Then collect a packet capture on the dataplane from the CLI:
debug dataplane packet-diag set filter match lacp
debug dataplane packet-diag set filter on
debug dataplane packet-diag set capture on
When you have finished collecting the packet capture, set the debug level back to info:
debug l2ctrld global on info
debug l2ctrld lacp on info
- To identify when and where LACP packets are missing, check the global counters:
show clock
show counter global | match lacp
Collect this output at least twice per second when the transmission rate is Fast, and at least twice per 30 seconds when it is Slow.
- If the collected information shows that the issue is on the firewall side (for example, the firewall is not sending LACPDUs in a timely manner), check the firewall resources at the time of the issue: management plane (MP) CPU, dataplane (DP) CPU, packet descriptors, and packet buffers.
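The counter sampling step above amounts to checking whether a receive-side LACP counter advanced by at least one LACPDU per interval between two samples. The sketch below assumes one LACPDU per interval for each member port, no counter reset between samples, and a counter you pick from the `show counter global | match lacp` output (no specific counter name is assumed); `lacp_missing_pdus` and `struct counter_sample` are hypothetical.

```c
#include <stdint.h>

/* One sample: the time from `show clock` (converted to seconds) and the
 * value of the chosen LACP counter from `show counter global`. */
struct counter_sample {
    double   t_sec;
    uint64_t value;
};

/* Estimate how many LACPDUs are missing between samples a and b, given the
 * expected interval (1 s for Fast, 30 s for Slow) and the number of member
 * ports. Returns 0 if the counter advanced at least as much as expected,
 * or if the inputs are unusable (time not advancing, non-positive interval,
 * counter decreased). */
uint64_t lacp_missing_pdus(struct counter_sample a, struct counter_sample b,
                           double interval_sec, unsigned int member_ports)
{
    double elapsed = b.t_sec - a.t_sec;
    if (elapsed <= 0.0 || interval_sec <= 0.0 || b.value < a.value)
        return 0;

    /* Whole intervals elapsed, times one LACPDU per member port. */
    uint64_t expected = (uint64_t)(elapsed / interval_sec) * member_ports;
    uint64_t seen = b.value - a.value;
    return seen >= expected ? 0 : expected - seen;
}
```

A non-zero result across several sample pairs narrows down when LACPDUs stopped arriving; the packet capture then shows whether they reached the dataplane at all.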
Additional Information
The LACP state is an 8-bit field carried in every LACPDU. The actor state is the local state; the partner state is the peer's state.
/* state in lacpdu */
#define PAN_LACP_ACTIVITY        0x01 /* 1 = active, 0 = passive */
#define PAN_LACP_TIMEOUT         0x02 /* 1 = short, 0 = long */
#define PAN_LACP_AGGREGATION     0x04 /* 1 = aggregatable, 0 = individual */
#define PAN_LACP_SYNCHRONIZATION 0x08 /* 1 = in sync, 0 = out of sync */
#define PAN_LACP_COLLECTING      0x10 /* 1 = collecting incoming frames */
#define PAN_LACP_DISTRIBUTING    0x20 /* 1 = distributing outgoing frames */
#define PAN_LACP_DEFAULTED       0x40 /* 1 = partner info is default, 0 = partner info taken from received lacpdu */
#define PAN_LACP_EXPIRED         0x80 /* 1 = actor rxm in EXPIRED state, 0 otherwise */
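When a port flaps, comparing the state byte logged before and after the event shows which bit changed. A minimal sketch using XOR, with bit names in the same order as the definitions above; the function name `lacp_state_diff` and the 0x3F "before" value in the usage example are hypothetical.

```c
#include <stdio.h>

/* Names of bits 0..7, in the order of the PAN_LACP_* definitions. */
static const char *const lacp_bit_names[8] = {
    "ACTIVITY", "TIMEOUT", "AGGREGATION", "SYNCHRONIZATION",
    "COLLECTING", "DISTRIBUTING", "DEFAULTED", "EXPIRED"
};

/* Return a mask of the bits that differ between two LACP state bytes,
 * and print the name plus old and new value of each changed bit. */
unsigned int lacp_state_diff(unsigned int before, unsigned int after)
{
    unsigned int changed = (before ^ after) & 0xFFu;
    for (int bit = 0; bit < 8; bit++) {
        if (changed & (1u << bit))
            printf("%s: %u -> %u\n", lacp_bit_names[bit],
                   (before >> bit) & 1u, (after >> bit) & 1u);
    }
    return changed;
}
```

For example, `lacp_state_diff(0x3F, 0x37)` returns `0x08` and prints `SYNCHRONIZATION: 1 -> 0`, showing that only the sync bit dropped.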