How to troubleshoot OSPF adjacency stuck in INIT or EXSTART or EXCHANGE or LOADING States

Created On 04/13/19 06:48 AM - Last Modified 01/17/24 17:04 PM


Objective


An OSPF neighborship goes through multiple states before the router becomes fully adjacent with its neighbor. This document explains those states and the troubleshooting procedure to follow if the neighborship is not established correctly.

 


Environment


  • PAN-OS
  • OSPF


Procedure


(Assume the Palo Alto Networks firewall is trying to establish an adjacency with a peer router, and that the configuration has been verified to be correct.)

PA ===== Switch ====== OSPF Router

[Figure: OSPF neighbor states leading up to full adjacency]



The above diagram shows the steps that occur before the Palo Alto Networks firewall becomes an OSPF neighbor of another router.

1. The OSPF process starts and the firewall begins sending multicast Hello packets (destination 224.0.0.5).

2. At this point no OSPF neighbor appears in the neighbor list, because the firewall has not yet received the peer's Hello packets.

3. If you see no progress at this point for a long time, the peer's OSPF Hello packets are either being dropped before they reach the routed process on the management plane (MP), or are being ignored by the routed process because of a parameter mismatch.
 
Things to check:
a. Check that Layer 3 connectivity between the two routers is working (ping source x host y).
b. Check that the devices are in the same OSPF area.
c. Check that the devices have the same authentication configuration (authentication type and password or key).
d. Check that the devices are on the same subnet.
e. Check that the hello and dead intervals match.
f. Check that the devices have matching stub flags.
g. Once the parameters match on both sides, check the routed logs to confirm whether the firewall is receiving any OSPF Hellos or logging any mismatch, by running the CLI command “tail follow yes mp-log routed.log”.
Example:

tail follow yes mp-log routed.log:

**** EXCEPTION 0x3e02 - 50 (0000) **** I:000009d2 F:00000010
qonmhllo.c 306 :at 10:55:42, 23 September 2018 (137584 ms)
OSPF 1 Hello packet with mismatched dead interval received from router 10.3.0.128.
My Dead Interval = 200
Neighboring Dead Interval = 20
 
h. If the routed logs show no activity for the peer's Hellos, check whether the PA firewall has a policy that allows OSPF communication in both directions. Remember that this is an intrazone policy (same zone to same zone). Unless you have specifically blocked it, the default intrazone-default policy should allow this traffic.
i. If the policy review reveals nothing, take packet captures and check global counters for the OSPF traffic (use specific filters with the interface and protocol number 89). See the article "Getting Started: Packet Capture".
j. If you cannot see OSPF packets from the peer even at the receive stage, check whether the peer router or the switch in the middle is dropping the multicast Hello packets sent to destination 224.0.0.5.
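
The parameter checks in items b through f can be sketched as a simple comparison. This is a minimal illustration only: the HelloParams class, its field names and the sample addresses are hypothetical and are not part of PAN-OS or any OSPF library. The dead-interval values mirror the log excerpt above.

```python
from dataclasses import dataclass
from ipaddress import ip_interface


@dataclass(frozen=True)
class HelloParams:
    """Hello-related settings of one OSPF interface (hypothetical model)."""
    address: str         # interface address with prefix length
    area: str
    auth_type: str       # e.g. "none", "password" or "md5"
    auth_key: str
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    stub: bool


def hello_mismatches(local: HelloParams, peer: HelloParams) -> list[str]:
    """Return the names of the settings that differ between the two sides."""
    problems = []
    # Both interfaces must sit in the same IP subnet.
    if ip_interface(local.address).network != ip_interface(peer.address).network:
        problems.append("subnet")
    for name in ("area", "auth_type", "auth_key",
                 "hello_interval", "dead_interval", "stub"):
        if getattr(local, name) != getattr(peer, name):
            problems.append(name)
    return problems


# Illustrative values: only the dead interval differs (200 vs 20).
firewall = HelloParams("10.3.0.1/24", "0.0.0.0", "none", "", 10, 200, False)
router = HelloParams("10.3.0.128/24", "0.0.0.0", "none", "", 10, 20, False)
print(hello_mismatches(firewall, router))  # ['dead_interval']
```

An empty list means none of the modeled parameters would cause the Hello to be ignored. It says nothing about packets being dropped in transit, which items h through j cover.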

4. If the peer's Hello packets reach the routed process and the parameters match, adjacency formation starts and the firewall moves the neighbor to the "INIT" state. INIT means that the firewall has received a Hello from the peer, but the firewall's Router ID is not yet listed in the peer's Hello message (meaning the firewall's Hello has not reached the peer).

5. If the adjacency with the peer is stuck in the INIT state for a long time, find out why the firewall's Hellos are not reaching the peer.
 
Things to check:

a) Use the same steps as in 3 above to check whether the firewall is dropping the outbound OSPF packets.
b) If the firewall is seen transmitting OSPF Hellos, check whether the peer router is receiving them on its interface.
c) If the packets are not seen on the peer, the switch or another device in between (for example a vwire device) could be dropping them. This needs to be checked.
d) If the packets are being received on the peer, check the peer's logs to see whether it is dropping them for any reason. (Normally the parameters should match, since the firewall accepted the peer's Hello.)
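
The difference between INIT and 2-Way comes down to one test on the neighbor list carried in the peer's Hello. The sketch below illustrates that test. The function name and the Router IDs are hypothetical, and the sketch assumes the Hello has already passed the parameter checks from step 3.

```python
def state_after_hello(my_router_id: str, neighbors_in_peer_hello: list[str]) -> str:
    """Return the neighbor state implied by an accepted Hello from the peer."""
    if my_router_id in neighbors_in_peer_hello:
        # The peer has seen our Hello, so communication is bidirectional.
        return "2-Way"
    # We hear the peer, but the peer has not (yet) heard us.
    return "Init"


print(state_after_hello("10.3.0.1", []))            # Init
print(state_after_hello("10.3.0.1", ["10.3.0.1"]))  # 2-Way
```

A neighbor that stays in Init therefore points at the firewall-to-peer direction, which is why the checks above focus on outbound Hellos.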

6. Once Hellos are received in both directions, with each router's Router ID in the neighbor list of the other's Hello packet, OSPF will move to 2-WAY. If you see the neighborship stuck in the 2-Way state, there are two possibilities:

a)  In network segments with multiple OSPF routers, two routers are elected as DR and BDR based on their priority, with the Router ID as the tie-breaker. All other OSPF routers form adjacencies only with the DR and BDR. For example, if there are four routers in a network segment, two will be the DR and BDR, and the other two will form FULL neighborships with the DR and BDR. Those other two routers will see each other in the 2-Way state, because they still receive Hellos from each other but do not proceed to the DBD exchange. It is therefore expected that these neighbors always remain in the 2-Way state with each other.

b)  If the neighborship with the DR or BDR is stuck in the 2-Way state, check that unicast communication between these routers is not blocked, because the DBD exchange uses unicast.
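
The four-router example in 6(a) can be tabulated with a short sketch. The Router IDs and the choice of DR and BDR are assumed purely for illustration, and the election itself is not modeled.

```python
from itertools import combinations


def expected_state(a: str, b: str, dr: str, bdr: str) -> str:
    """Expected steady state between routers a and b on one broadcast segment."""
    if {a, b} & {dr, bdr}:
        # At least one side is the DR or BDR, so a full adjacency is formed.
        return "Full"
    # Neither side is DR or BDR: Hellos are exchanged, but no DBD exchange starts.
    return "2-Way"


routers = ["1.1.1.1", "2.2.2.2", "3.3.3.3", "4.4.4.4"]
dr, bdr = "4.4.4.4", "3.3.3.3"  # assumed election result

for a, b in combinations(routers, 2):
    print(a, b, expected_state(a, b, dr, bdr))
```

With these assumptions only the pair 1.1.1.1 and 2.2.2.2 is reported as 2-Way. Any other pair that stays in 2-Way falls under case (b).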

7. After 2-Way, the neighborship moves to the EXSTART state. In this state the peers establish a master/slave relationship: the router with the higher Router ID becomes the master and starts the exchange of DBD (Database Description) packets. These are unicast packets between the two peers.

8. One of the parameters that must match in the DBD headers is the MTU. If the MTU does not match, the neighborship can remain in the EXSTART state. See the article "OSPF Neighborship Stuck in ExStart State".
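
Steps 7 and 8 can be sketched as two small checks: which side becomes master, and whether the MTU values advertised in the DBD packets agree. The function name and sample values below are hypothetical.

```python
from ipaddress import IPv4Address


def exstart_check(local_id: str, local_mtu: int,
                  peer_id: str, peer_mtu: int) -> tuple[str, bool]:
    """Return (Router ID of the master, whether the two MTU values match)."""
    # Router IDs are compared as 32-bit numbers, not as text.
    master = max(local_id, peer_id, key=lambda rid: int(IPv4Address(rid)))
    return master, local_mtu == peer_mtu


master, mtu_ok = exstart_check("10.3.0.9", 1500, "10.3.0.128", 9000)
print(master)  # 10.3.0.128
print(mtu_ok)  # False: a candidate reason for staying in EXSTART
```

The numeric comparison matters: compared as plain strings, "10.3.0.9" would sort above "10.3.0.128" and the wrong router would be picked.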

9. Once the master/slave relationship is established, the peers start exchanging DBD packets that contain LSA headers describing the contents of their link-state databases. This state is EXCHANGE.

10. After the exchange, the OSPF process identifies which LSAs are missing or out of date and requests them from the peer with Link State Request packets, which the peer answers with Link State Updates. This state is called LOADING.
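
The comparison behind the LOADING state can be sketched as follows. This is a simplified illustration: LSAs are keyed by (LS type, link-state ID, advertising router), sequence numbers are modeled as small plain integers, and the checksum and age comparisons that real OSPF also performs are left out. All names and values are hypothetical.

```python
LsaKey = tuple[int, str, str]  # (LS type, link-state ID, advertising router)


def lsas_to_request(local_lsdb: dict[LsaKey, int],
                    peer_headers: dict[LsaKey, int]) -> list[LsaKey]:
    """Return the LSAs that are missing locally or older than the peer's copy."""
    return sorted(
        key for key, peer_seq in peer_headers.items()
        if key not in local_lsdb or local_lsdb[key] < peer_seq
    )


local_lsdb = {
    (1, "1.1.1.1", "1.1.1.1"): 5,
    (1, "2.2.2.2", "2.2.2.2"): 3,
}
peer_headers = {
    (1, "1.1.1.1", "1.1.1.1"): 5,     # same version, nothing to request
    (1, "2.2.2.2", "2.2.2.2"): 4,     # peer has a newer copy
    (2, "10.3.0.128", "2.2.2.2"): 1,  # unknown locally
}
print(lsas_to_request(local_lsdb, peer_headers))
```

An empty result means there is nothing to request and the neighbor can move straight on to FULL.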

11. If the state is stuck in EXCHANGE or LOADING, it means that either the LSA headers or the LSA updates are not being exchanged.
 
Things to Check:
a) Make sure the MTU matches on the OSPF peers and also on any device (switch or transparent firewall) in between.
b) Make sure bidirectional communication is allowed all the way, that is, on both the PA firewall and the OSPF peer and also on any device in between. There should not be any ACLs blocking these packets.
c) Fragmentation can be an issue when LSA updates are large and a device in between (or the firewall interface itself) has a smaller MTU, so make sure there are no settings that block fragments.
d) On the Palo Alto Networks firewall, debugs, pcaps and global counters can again help to verify whether packets are being sent and received, but simultaneous captures on the peer are important as well.
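
Item (c) is a matter of simple arithmetic, sketched below. The sketch assumes an IPv4 header of 20 bytes without options. The function name and the packet and MTU sizes are hypothetical.

```python
IPV4_HEADER_LEN = 20  # bytes, assuming no IP options


def fragmentation_needed(ospf_packet_len: int, hop_mtus: list[int]) -> bool:
    """True if the IP packet carrying the OSPF packet exceeds the smallest MTU."""
    return IPV4_HEADER_LEN + ospf_packet_len > min(hop_mtus)


# A 1480-byte OSPF packet becomes a 1500-byte IP packet.
print(fragmentation_needed(1480, [1500, 1500, 1500]))  # False
print(fragmentation_needed(1480, [1500, 1400, 1500]))  # True
```

When the result is True, the update only gets through if every device on the path forwards IP fragments.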

12. The last state is the “FULL” state. In this state the routers are fully adjacent with each other, all router and network LSAs have been exchanged, and the routers' link-state databases are fully synchronized. The OSPF process then calculates the best routes using Dijkstra's algorithm, and each device installs the calculated routes in its routing table.
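
For readers unfamiliar with the route calculation mentioned in step 12, the following is a minimal sketch of Dijkstra's algorithm on a small topology. The node names and link costs are made up, and this generic textbook form is not the firewall's actual implementation.

```python
import heapq


def shortest_path_costs(graph: dict[str, dict[str, int]], root: str) -> dict[str, int]:
    """Return the lowest total cost from root to every reachable node."""
    costs = {root: 0}
    queue = [(0, root)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > costs.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbor, link_cost in graph.get(node, {}).items():
            new_cost = cost + link_cost
            if new_cost < costs.get(neighbor, float("inf")):
                costs[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return costs


# Hypothetical topology: keys are routers, values are outgoing link costs.
topology = {
    "PA": {"R1": 10, "R2": 5},
    "R1": {"PA": 10, "R3": 1},
    "R2": {"PA": 5, "R1": 3, "R3": 9},
    "R3": {},
}
print(shortest_path_costs(topology, "PA"))
```

In this example the best path from PA to R1 runs through R2 (cost 8) rather than over the direct link (cost 10).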

The above procedure should help to resolve most of the issues with OSPF neighborship formation. 

