How to troubleshoot LACP going down or flapping

Created On 08/30/22 18:37 PM - Last Modified 07/29/24 15:25 PM


Objective


 Troubleshoot an LACP link that goes down or flaps.

Environment


  • Palo Alto Networks firewall
  • LACP configured


Procedure


  1. Check the system logs in the web UI under Monitor > Logs > System with the filter set to (subtype eq lacp), or run the following from the CLI:
    show log system direction equal backward subtype equal lacp
  2. Check l2ctrld.log around the timestamp of the issue identified in step 1:
    less mp-log l2ctrld.log
    1. If the Ethernet interface was moved out of the aggregate interface, look for messages similar to the following:
mp        l2ctrld.log         ethernet1/1 idx 64, current_while expired.
mp        l2ctrld.log         ethernet1/1 idx 64, rx state change CURRENT=>EXPIRED
mp        l2ctrld.log         ethernet1/1 idx 64 mux state change RX_TX=>ATTACHED, select_state Selected, partner state 0x37

Partner state 0x37 is 00110111 in binary. Per the bit definitions under Additional Information, the partner is active, uses a short timeout, is aggregatable, is out of sync, is collecting incoming frames, and is distributing outgoing frames. The partner info is taken from a received LACPDU, and the actor rxm is not in the EXPIRED state.
State of 0x37

0x37 (00110111): bit0 (1 = active); bit1 (1 = short); bit2 (1 = aggregatable); bit3 (0 = out of sync); bit4 (1 = collecting incoming frames); bit5 (1 = distributing outgoing frames); bit6 (0 = partner info taken from received lacpdu); bit7 (0 = actor rxm is not in EXPIRED state)
    2. If messages similar to the following are seen:
mp l2ctrld.log  ethernet2/13 idx 140 received pdu partner does not match local actor
mp l2ctrld.log  Recved LACPDU actor:
mp l2ctrld.log  sys_pri 4000, system_mac 00:23:04:ee:be:78, key 32793, port_pri 32768, port_num 313, state 0x45
partner
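State 0x45 is 01000101 in binary. Per the bit definitions under Additional Information, the partner believes that the local firewall is active, uses a long timeout, is aggregatable, is out of sync, is not collecting incoming frames, and is not distributing outgoing frames. The partner info is default, and the actor rxm is not in the EXPIRED state. This reading applies because the l2ctrld message reports state 0x45 in the LACPDU received from the partner.
State of 0x45

0x45 (01000101): bit0 (1 = active); bit1 (0 = long); bit2 (1 = aggregatable); bit3 (0 = out of sync); bit4 (0 = not collecting incoming frames); bit5 (0 = not distributing outgoing frames); bit6 (1 = partner info is default); bit7 (0 = actor rxm is not in EXPIRED state)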
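
When l2ctrld.log is long, it can help to pull the rx and mux state changes out programmatically. The following is a minimal Python sketch that assumes the log lines look like the samples quoted above. The function name parse_l2ctrld_line and the regular expressions are illustrative, not part of any PAN-OS tooling, and real log lines may carry prefixes (timestamps, severity) or fields that this sketch does not account for.

```python
import re

# Patterns derived only from the sample lines quoted in this article.
# Assumption: real l2ctrld.log lines keep the same wording after any prefix.
MUX_RE = re.compile(
    r"(?P<port>ethernet\d+/\d+) idx (?P<idx>\d+),? mux state change "
    r"(?P<old>\w+)=>(?P<new>\w+), select_state (?P<select>\w+), "
    r"partner state (?P<state>0x[0-9a-fA-F]+)"
)
RX_RE = re.compile(
    r"(?P<port>ethernet\d+/\d+) idx (?P<idx>\d+),? rx state change "
    r"(?P<old>\w+)=>(?P<new>\w+)"
)


def parse_l2ctrld_line(line):
    """Return a dict for an rx or mux state change, or None for other lines."""
    match = MUX_RE.search(line)
    if match:
        event = match.groupdict()
        event["kind"] = "mux"
        event["state"] = int(event["state"], 16)  # partner state as an integer
        return event
    match = RX_RE.search(line)
    if match:
        event = match.groupdict()
        event["kind"] = "rx"
        return event
    return None


sample_lines = [
    "mp l2ctrld.log ethernet1/1 idx 64, current_while expired.",
    "mp l2ctrld.log ethernet1/1 idx 64, rx state change CURRENT=>EXPIRED",
    "mp l2ctrld.log ethernet1/1 idx 64 mux state change RX_TX=>ATTACHED, "
    "select_state Selected, partner state 0x37",
]
for sample in sample_lines:
    parsed = parse_l2ctrld_line(sample)
    if parsed is not None:
        print(parsed["port"], parsed["kind"], parsed["old"], "=>", parsed["new"])
```

Lines that are neither rx nor mux state changes, such as the current_while message, return None and are skipped.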
  1. Check the output of the following CLI command:
     show lacp aggregate-ethernet all
    
    Note:
    1. At least one side must be in active mode.
    2. With the transmission rate set to slow (the default), LACPDUs are exchanged every 30 seconds.
    3. With the transmission rate set to fast, LACPDUs are exchanged every second.
    4. For other checks, refer to Configure an Aggregate Interface Group.
  2. Verify whether the physical link went down before LACP went down, which would cause the interface to be moved out of the aggregate group:
    less mp-log brdagent.log
    show log system direction equal backward
    show log system subtype equal port eventid equal link-change direction equal backward
  3. If the issue is ongoing, enable debug-level logging during the troubleshooting window:
    debug l2ctrld global on debug
    debug l2ctrld lacp on debug
    Then collect a packet capture on the dataplane using the CLI:
    debug dataplane packet-diag set filter match lacp
    debug dataplane packet-diag set filter on
    debug dataplane packet-diag set capture on
    Once the packet capture has been collected, set the debug level back to info:
    debug l2ctrld global on info
    debug l2ctrld lacp on info
  4. To identify when and where the LACP packets are missing, use the global counters:
    show clock
    show counter global | match lacp
    
    Collect the above at least twice per second for the fast transmission rate, and at least twice every 30 seconds for the slow transmission rate.
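
To make the comparison between two snapshots concrete, the following Python sketch estimates how many LACPDUs appear to be missing between two readings. It is a rough illustration only: the function name missing_lacpdus and its parameters are hypothetical, the timestamps are assumed to come from show clock, and the counts are assumed to be the value of whichever LACP receive or transmit counter appears in the show counter global output. It relies only on the 1-second (fast) and 30-second (slow) intervals stated above and ignores timer jitter.

```python
from datetime import datetime

# Exchange intervals stated in this article for each transmission rate.
PERIODIC_INTERVAL_SECONDS = {"fast": 1, "slow": 30}


def missing_lacpdus(t1, count1, t2, count2, rate, ports=1):
    """Estimate LACPDUs expected but not counted between two snapshots.

    t1, t2: datetime values taken from 'show clock' (assumption).
    count1, count2: values of one LACP counter at t1 and t2 (assumption).
    rate: "fast" or "slow"; ports: number of member ports sharing the counter.
    """
    elapsed = (t2 - t1).total_seconds()
    if elapsed <= 0:
        raise ValueError("second snapshot must be later than the first")
    expected = int(elapsed // PERIODIC_INTERVAL_SECONDS[rate]) * ports
    observed = count2 - count1
    return max(expected - observed, 0)


first = datetime(2024, 1, 1, 10, 0, 0)
second = datetime(2024, 1, 1, 10, 1, 30)
# 90 seconds at the slow rate: 3 LACPDUs expected, 2 counted, so 1 missing.
print(missing_lacpdus(first, 100, second, 102, "slow"))
```

A non-zero result over several consecutive snapshot pairs suggests narrowing the window and checking the packet capture for the same period.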
  5. If the collected information indicates that the issue lies on the Palo Alto Networks firewall side (for instance, the firewall is not sending LACPDUs in a timely manner), check the firewall resources: MP CPU, DP CPU, packet descriptors, and packet buffers. Determine whether resource utilization on the firewall was high at the time of the issue.
  6. Additional Information: The LACP state is an 8-bit field carried in every LACPDU. The actor state is the local state; the partner state is the peer's state.

    /* state in lacpdu */
    #define PAN_LACP_ACTIVITY        0x01 /* 1 = active, 0 = passive */
    #define PAN_LACP_TIMEOUT         0x02 /* 1 = short, 0 = long */
    #define PAN_LACP_AGGREGATION     0x04 /* 1 = aggregatable, 0 = individual */
    #define PAN_LACP_SYNCHRONIZATION 0x08 /* 1 = in sync, 0 = out of sync */
    #define PAN_LACP_COLLECTING      0x10 /* 1 = collecting incoming frames */
    #define PAN_LACP_DISTRIBUTING    0x20 /* 1 = distributing outgoing frames */
    #define PAN_LACP_DEFAULTED       0x40 /* 1 = partner info is default, 0 = partner info taken from received lacpdu */
    #define PAN_LACP_EXPIRED         0x80 /* 1 = actor rxm in EXPIRED state, 0 otherwise */
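
The bit definitions above can be applied mechanically to any state value seen in l2ctrld.log. The following Python sketch decodes a state byte into readable text. The table mirrors the PAN_LACP_* definitions, while the names LACP_STATE_BITS and decode_lacp_state are illustrative and do not come from PAN-OS.

```python
# Each entry: (bit mask, field name, meaning when set, meaning when clear).
# Masks and meanings follow the PAN_LACP_* definitions listed above.
LACP_STATE_BITS = [
    (0x01, "activity", "active", "passive"),
    (0x02, "timeout", "short", "long"),
    (0x04, "aggregation", "aggregatable", "individual"),
    (0x08, "synchronization", "in sync", "out of sync"),
    (0x10, "collecting", "collecting incoming frames",
     "not collecting incoming frames"),
    (0x20, "distributing", "distributing outgoing frames",
     "not distributing outgoing frames"),
    (0x40, "defaulted", "partner info is default",
     "partner info taken from received lacpdu"),
    (0x80, "expired", "actor rxm in EXPIRED state",
     "actor rxm not in EXPIRED state"),
]


def decode_lacp_state(state):
    """Map an 8-bit LACP state value to a dict of field name -> meaning."""
    if not 0 <= state <= 0xFF:
        raise ValueError("LACP state must fit in 8 bits")
    return {
        name: (when_set if state & mask else when_clear)
        for mask, name, when_set, when_clear in LACP_STATE_BITS
    }


for value in (0x37, 0x45):
    print(f"0x{value:02x} ({value:08b})")
    for field, meaning in decode_lacp_state(value).items():
        print(f"  {field}: {meaning}")
```

Running it against 0x37 and 0x45 reproduces the two readings given in step 2 of the procedure.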

