Palo Alto Networks VM-Series Firewall restarts unexpectedly due to disk I/O performance


Created On 12/02/19 08:56 AM - Last Modified 01/21/20 21:33 PM


Symptom


  • The VM-Series Firewall restarts unexpectedly. The restart may happen repeatedly.
  • The system may contain crashinfo and core files from crashed processes.
  • In the tech-support file, /var/log/messages shows hung-task warnings followed by a kernel call trace, as in the following example.
/var/log/messages: 

Aug 21 15:33:15 mgmt kernel: INFO: task kjournald:654 blocked for more than 120 seconds.
Aug 21 15:33:15 mgmt kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 21 15:33:15 mgmt kernel: kjournald D 0000000000000000 0 654 2 0x00000000
Aug 21 15:33:15 mgmt kernel: ffff8803eaeb7c68 0000000000000046 ffff8803eaeb7fd8 ffff8803eaeb7fd8
Aug 21 15:33:15 mgmt kernel: 0000000000012340 ffff8803fb3a0cc0 ffff8803eaeb7ba8 ffffffff8120f2c9
Aug 21 15:33:15 mgmt kernel: ffff8803eaeb7bb8 ffffffff8108759a ffff8803eaeb7bc8 ffffffff81089d0a
Aug 21 15:33:15 mgmt kernel: Call Trace:
Aug 21 15:33:15 mgmt kernel: [<ffffffff8120f2c9>] ? radix_tree_lookup+0xb/0xd
Aug 21 15:33:15 mgmt kernel: [<ffffffff8108759a>] ? irq_to_desc+0x12/0x14
Aug 21 15:33:15 mgmt kernel: [<ffffffff81089d0a>] ? irq_get_irq_data+0x9/0xb
Aug 21 15:33:15 mgmt kernel: [<ffffffff812935cd>] ? info_for_irq+0x9/0x18
Aug 21 15:33:15 mgmt kernel: [<ffffffff810087c8>] ? xen_clocksource_read+0x20/0x22
Aug 21 15:33:15 mgmt kernel: [<ffffffff810087d3>] ? xen_clocksource_get_cycles+0x9/0xb
Aug 21 15:33:56 mgmt kernel: [<ffffffff8106da7f>] ? ktime_get_ts+0x4f/0xb5
Aug 21 15:33:56 mgmt kernel: [<ffffffff810087c8>] ? xen_clocksource_read+0x20/0x22
Aug 21 15:33:56 mgmt kernel: [<ffffffff810087d3>] ? xen_clocksource_get_cycles+0x9/0xb
Aug 21 15:33:56 mgmt kernel: [<ffffffff81116694>] ? generic_block_bmap+0x40/0x40
Aug 21 15:33:56 mgmt kernel: [<ffffffff814a1458>] schedule+0x64/0x66
Aug 21 15:33:56 mgmt kernel: [<ffffffff814a1642>] io_schedule+0x8a/0xc8
Aug 21 15:33:56 mgmt kernel: [<ffffffff8111669d>] sleep_on_buffer+0x9/0xd
Aug 21 15:33:56 mgmt kernel: [<ffffffff8149fcd1>] __wait_on_bit+0x41/0x71
Aug 21 15:33:56 mgmt kernel: [<ffffffff8149fd77>] out_of_line_wait_on_bit+0x76/0x81
Aug 21 15:33:56 mgmt kernel: [<ffffffff81116694>] ? generic_block_bmap+0x40/0x40
Aug 21 15:33:56 mgmt kernel: [<ffffffff810596da>] ? autoremove_wake_function+0x2f/0x2f
Aug 21 15:33:56 mgmt kernel: [<ffffffff81116735>] __wait_on_buffer+0x21/0x23
Aug 21 15:33:56 mgmt kernel: [<ffffffff811a4109>] journal_commit_transaction+0x94a/0xf4d
Aug 21 15:33:56 mgmt kernel: [<ffffffff811a752a>] kjournald+0xd7/0x23f
Aug 21 15:33:56 mgmt kernel: [<ffffffff810596ab>] ? wake_up_bit+0x25/0x25
Aug 21 15:33:56 mgmt kernel: [<ffffffff811a7453>] ? commit_timeout+0xb/0xb
Aug 21 15:33:56 mgmt kernel: [<ffffffff81058c8a>] kthread+0xb5/0xbd
Aug 21 15:33:56 mgmt kernel: [<ffffffff81058bd5>] ? kthread_create_on_node+0x10e/0x10e
Aug 21 15:33:56 mgmt kernel: [<ffffffff814a8c98>] ret_from_fork+0x58/0x90
Aug 21 15:33:56 mgmt kernel: [<ffffffff81058bd5>] ? kthread_create_on_node+0x10e/0x10e

The firewall then restarted:
/var/log/messages: 

Aug 21 15:39:55 mgmt shutdown[31324]: shutting down for system reboot
Aug 21 15:39:55 mgmt init: Switching to runlevel: 6
Aug 21 15:40:38 mgmt mountd[1892]: Caught signal 15, un-registering and exiting.
Aug 21 15:40:42 mgmt kernel: nfsd: last server has exited, flushing export cache
Aug 21 15:40:42 mgmt xinetd[1858]: Exiting...
Aug 21 15:40:42 mgmt rpc.statd[1734]: Caught signal 15, un-registering and exiting.
Aug 21 15:40:42 mgmt kernel: Kernel logging (proc) stopped.
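
To relate the two excerpts above, it can help to measure how long after the first hung-task warning the reboot began. The following is a minimal sketch, assuming a messages file extracted from the tech-support bundle in the syslog format shown above; the function names are hypothetical and not part of any Palo Alto Networks tooling.

```python
from datetime import datetime

def parse_time(line):
    """Parse the leading syslog timestamp, e.g. 'Aug 21 15:33:15'.

    Syslog lines carry no year, so a placeholder year is assumed. This is
    fine for gaps within one log but not across a year boundary.
    """
    return datetime.strptime("2000 " + line[:15], "%Y %b %d %H:%M:%S")

def stall_to_reboot_gaps(lines):
    """Return seconds from the first hung-task warning to each following reboot."""
    first_stall = None
    gaps = []
    for line in lines:
        if "blocked for more than" in line:
            if first_stall is None:
                first_stall = parse_time(line)
        elif "shutting down for system reboot" in line and first_stall is not None:
            gaps.append((parse_time(line) - first_stall).total_seconds())
            first_stall = None  # start looking for the next stall
    return gaps

# Two lines taken from the excerpts in this article.
sample = [
    "Aug 21 15:33:15 mgmt kernel: INFO: task kjournald:654 blocked for more than 120 seconds.",
    "Aug 21 15:39:55 mgmt shutdown[31324]: shutting down for system reboot",
]
print(stall_to_reboot_gaps(sample))
```

A reboot that consistently follows a hung-task warning within a few minutes supports the I/O explanation. A reboot with no preceding warning suggests a different cause.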


Environment


  • Palo Alto Networks VM-Series Firewall


Cause


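The hung-task warning in /var/log/messages shows that the kjournald kernel thread was blocked for more than 120 seconds. The call trace (journal_commit_transaction, __wait_on_buffer, io_schedule) indicates that it was waiting for a disk write to complete.
On a Palo Alto Networks VM-Series Firewall, this points to poor disk I/O performance on the underlying host or storage.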
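
A short script can list every hung-task warning in a messages file, which shows whether the stall was a one-off or recurring and which tasks were affected. This is a minimal sketch: the regular expression is an assumption based only on the log format shown in this article, and `find_hung_tasks` is a hypothetical helper name.

```python
import re

# Matches the kernel hung-task warning as it appears in this article, e.g.
# "Aug 21 15:33:15 mgmt kernel: INFO: task kjournald:654 blocked for more than 120 seconds."
HUNG_TASK = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: "
    r"INFO: task (?P<task>\S+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)

def find_hung_tasks(lines):
    """Return (timestamp, task, pid, seconds) for each hung-task warning."""
    hits = []
    for line in lines:
        m = HUNG_TASK.match(line)
        if m:
            hits.append((m["when"], m["task"], int(m["pid"]), int(m["secs"])))
    return hits

# Lines taken from the excerpt in this article; only the first is a warning.
sample_lines = [
    "Aug 21 15:33:15 mgmt kernel: INFO: task kjournald:654 blocked for more than 120 seconds.",
    "Aug 21 15:33:15 mgmt kernel: Call Trace:",
]
for hit in find_hung_tasks(sample_lines):
    print(hit)
```

To scan a real file, pass an open file object instead of the sample list, for example `find_hung_tasks(open("messages", errors="replace"))`.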


Resolution


  1. Check the host's disk logs for errors or warnings. If any are present, fix the underlying disk problem.
  2. Move the firewall to a different host and check whether the problem persists.
  3. If the firewall runs in a public cloud, ask the cloud provider to check the underlying disk environment for errors or failures.
  4. On some public cloud providers, stopping and then starting the VM instance moves it to a different host, which may fix the problem.
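
When comparing hosts as in steps 1 and 2, a rough write-latency probe can supplement the disk logs. The sketch below is an assumption-laden illustration, not a Palo Alto Networks procedure: it is meant to run on a Linux host or on another general-purpose VM that shares the same storage, not on the firewall itself. It times small synchronous writes. This article does not define a latency threshold, so compare the numbers against a host known to be healthy.

```python
import os
import tempfile
import time

def fsync_latencies(directory, rounds=20, size=4096):
    """Write `size` bytes and fsync, `rounds` times; return per-round seconds."""
    payload = b"\0" * size
    latencies = []
    # The temporary file is created in `directory` and deleted on close.
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        for _ in range(rounds):
            start = time.perf_counter()
            tmp.write(payload)
            tmp.flush()             # push Python's buffer to the OS
            os.fsync(tmp.fileno())  # force the OS to commit to storage
            latencies.append(time.perf_counter() - start)
    return latencies

# Example run against the system temp directory; point `directory` at the
# datastore or volume under suspicion instead.
samples = fsync_latencies(tempfile.gettempdir(), rounds=5)
print(f"max fsync latency: {max(samples) * 1000:.1f} ms")
```

Synchronous writes are used because the stalled kjournald thread in the trace was waiting on a journal commit, which is the same kind of operation.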

