Palo Alto Networks VM-Series Firewall restarts unexpectedly due to disk I/O performance
Created On 12/02/19 08:56 AM - Last Modified 01/21/20 21:33 PM
Symptom
- The VM-Series Firewall restarts unexpectedly. The restart may happen repeatedly.
- The system may have crashinfo and core files from crashed processes.
- /var/log/messages in the tech-support file shows hung-task messages with kernel call traces, as in the following example.
/var/log/messages:
Aug 21 15:33:15 mgmt kernel: INFO: task kjournald:654 blocked for more than 120 seconds.
Aug 21 15:33:15 mgmt kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Aug 21 15:33:15 mgmt kernel: kjournald D 0000000000000000 0 654 2 0x00000000
Aug 21 15:33:15 mgmt kernel: ffff8803eaeb7c68 0000000000000046 ffff8803eaeb7fd8 ffff8803eaeb7fd8
Aug 21 15:33:15 mgmt kernel: 0000000000012340 ffff8803fb3a0cc0 ffff8803eaeb7ba8 ffffffff8120f2c9
Aug 21 15:33:15 mgmt kernel: ffff8803eaeb7bb8 ffffffff8108759a ffff8803eaeb7bc8 ffffffff81089d0a
Aug 21 15:33:15 mgmt kernel: Call Trace:
Aug 21 15:33:15 mgmt kernel: [<ffffffff8120f2c9>] ? radix_tree_lookup+0xb/0xd
Aug 21 15:33:15 mgmt kernel: [<ffffffff8108759a>] ? irq_to_desc+0x12/0x14
Aug 21 15:33:15 mgmt kernel: [<ffffffff81089d0a>] ? irq_get_irq_data+0x9/0xb
Aug 21 15:33:15 mgmt kernel: [<ffffffff812935cd>] ? info_for_irq+0x9/0x18
Aug 21 15:33:15 mgmt kernel: [<ffffffff810087c8>] ? xen_clocksource_read+0x20/0x22
Aug 21 15:33:15 mgmt kernel: [<ffffffff810087d3>] ? xen_clocksource_get_cycles+0x9/0xb
Aug 21 15:33:56 mgmt kernel: [<ffffffff8106da7f>] ? ktime_get_ts+0x4f/0xb5
Aug 21 15:33:56 mgmt kernel: [<ffffffff810087c8>] ? xen_clocksource_read+0x20/0x22
Aug 21 15:33:56 mgmt kernel: [<ffffffff810087d3>] ? xen_clocksource_get_cycles+0x9/0xb
Aug 21 15:33:56 mgmt kernel: [<ffffffff81116694>] ? generic_block_bmap+0x40/0x40
Aug 21 15:33:56 mgmt kernel: [<ffffffff814a1458>] schedule+0x64/0x66
Aug 21 15:33:56 mgmt kernel: [<ffffffff814a1642>] io_schedule+0x8a/0xc8
Aug 21 15:33:56 mgmt kernel: [<ffffffff8111669d>] sleep_on_buffer+0x9/0xd
Aug 21 15:33:56 mgmt kernel: [<ffffffff8149fcd1>] __wait_on_bit+0x41/0x71
Aug 21 15:33:56 mgmt kernel: [<ffffffff8149fd77>] out_of_line_wait_on_bit+0x76/0x81
Aug 21 15:33:56 mgmt kernel: [<ffffffff81116694>] ? generic_block_bmap+0x40/0x40
Aug 21 15:33:56 mgmt kernel: [<ffffffff810596da>] ? autoremove_wake_function+0x2f/0x2f
Aug 21 15:33:56 mgmt kernel: [<ffffffff81116735>] __wait_on_buffer+0x21/0x23
Aug 21 15:33:56 mgmt kernel: [<ffffffff811a4109>] journal_commit_transaction+0x94a/0xf4d
Aug 21 15:33:56 mgmt kernel: [<ffffffff811a752a>] kjournald+0xd7/0x23f
Aug 21 15:33:56 mgmt kernel: [<ffffffff810596ab>] ? wake_up_bit+0x25/0x25
Aug 21 15:33:56 mgmt kernel: [<ffffffff811a7453>] ? commit_timeout+0xb/0xb
Aug 21 15:33:56 mgmt kernel: [<ffffffff81058c8a>] kthread+0xb5/0xbd
Aug 21 15:33:56 mgmt kernel: [<ffffffff81058bd5>] ? kthread_create_on_node+0x10e/0x10e
Aug 21 15:33:56 mgmt kernel: [<ffffffff814a8c98>] ret_from_fork+0x58/0x90
Aug 21 15:33:56 mgmt kernel: [<ffffffff81058bd5>] ? kthread_create_on_node+0x10e/0x10e
The firewall then restarted:
/var/log/messages:
Aug 21 15:39:55 mgmt shutdown[31324]: shutting down for system reboot
Aug 21 15:39:55 mgmt init: Switching to runlevel: 6
Aug 21 15:40:38 mgmt mountd[1892]: Caught signal 15, un-registering and exiting.
Aug 21 15:40:42 mgmt kernel: nfsd: last server has exited, flushing export cache
Aug 21 15:40:42 mgmt xinetd[1858]: Exiting...
Aug 21 15:40:42 mgmt rpc.statd[1734]: Caught signal 15, un-registering and exiting.
Aug 21 15:40:42 mgmt kernel: Kernel logging (proc) stopped.
Environment
- Palo Alto Networks VM-Series Firewall
Cause
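The hung-task message in /var/log/messages indicates that the kjournald task was blocked for more than 120 seconds. The call trace (io_schedule, __wait_on_buffer, journal_commit_transaction) shows that it was stuck waiting for disk I/O to complete.
On a Palo Alto Networks VM-Series Firewall, this points to poor disk I/O performance on the underlying host or storage.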
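To check whether a given /var/log/messages file contains the same symptom, the log can be scanned for the hung-task line quoted in the Symptom section. The following is a minimal Python sketch, not a Palo Alto Networks tool. The function name find_hung_tasks and the HungTask record are hypothetical names chosen for illustration, and the pattern assumes the exact message format shown in the excerpt above.

```python
import re
from typing import Iterable, List, NamedTuple

# Pattern for the kernel hung-task line, based only on the format seen in
# the log excerpt in this article (an assumption, not a vendor-defined format).
HUNG_TASK = re.compile(
    r"kernel: INFO: task (?P<task>.+?):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds"
)


class HungTask(NamedTuple):
    """Hypothetical record describing one hung-task report."""
    task: str
    pid: int
    seconds: int
    line: str


def find_hung_tasks(lines: Iterable[str]) -> List[HungTask]:
    """Return one HungTask per log line that reports a blocked task."""
    hits = []
    for line in lines:
        match = HUNG_TASK.search(line)
        if match:
            hits.append(
                HungTask(
                    task=match["task"],
                    pid=int(match["pid"]),
                    seconds=int(match["secs"]),
                    line=line.rstrip("\n"),
                )
            )
    return hits


# Sample lines copied from the log excerpt in this article.
sample_lines = [
    "Aug 21 15:33:15 mgmt kernel: INFO: task kjournald:654 blocked for more than 120 seconds.",
    "Aug 21 15:33:15 mgmt kernel: Call Trace:",
]

for hit in find_hung_tasks(sample_lines):
    print(hit.task, hit.pid, hit.seconds)
```

To scan a real file extracted from a tech-support bundle, pass an open file object instead of the sample list, for example find_hung_tasks(open(path, errors="replace")), where path is wherever the file was extracted. Hits for kjournald or other storage-related tasks shortly before a reboot are consistent with the cause described above.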
Resolution
- Check the host's disk logs for errors or warnings. If any are found, fix the underlying disk or storage problem.
- Move the VM to a different host and check whether the problem still occurs.
- If the environment is a public cloud, ask the cloud provider to check the underlying disk environment for errors or failures.
- With some public cloud providers, stopping and then starting the VM instance moves it to a different host, which may fix the problem.
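Where the host or a test VM on the same storage can run Python, a rough latency probe can complement the log checks above. The following is a sketch under stated assumptions. It is meant for a general-purpose system that shares the storage, not for the firewall itself. The function name fsync_latencies is hypothetical, and this article does not define a latency threshold, so results are only useful for comparison, for example before and after moving hosts.

```python
import os
import tempfile
import time
from typing import List


def fsync_latencies(directory: str, samples: int = 20,
                    block_size: int = 4096) -> List[float]:
    """Write small blocks to a temporary file in `directory` and
    return the time in seconds that each fsync call took."""
    payload = b"\0" * block_size
    latencies = []
    # The temporary file is deleted automatically when the block exits.
    with tempfile.NamedTemporaryFile(dir=directory) as tmp:
        fd = tmp.fileno()
        for _ in range(samples):
            os.write(fd, payload)
            start = time.perf_counter()
            os.fsync(fd)  # force the data to be flushed to storage
            latencies.append(time.perf_counter() - start)
    return latencies


# Example run against the system temp directory; point `directory` at a
# path on the storage under investigation instead (an assumption: that
# such a path is writable from where this script runs).
results = fsync_latencies(tempfile.gettempdir(), samples=5)
print(f"max fsync latency: {max(results) * 1000:.2f} ms")
```

Flush times that are consistently long, or occasional multi-second stalls, would be consistent with the blocked kjournald task seen in the firewall log, but they do not replace the host or provider checks listed above.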