Palo Alto Networks Knowledgebase: Procedure to Replace a Failed M-500 in Hybrid Panorama - Log Collector

Created On 07/18/19 03:01 AM - Last Updated 07/18/19 03:03 AM

Resolution

Procedure applies to PAN-OS versions

  • 8.0 and earlier


Scenario

  • A standalone M-500 Panorama in Hybrid mode (Panorama device management and a local Log Collector configured) has a hardware issue that requires chassis replacement.
  • The M-500 uses 8 disk pairs for storing the logs received from its managed devices.


Naming convention

  • The faulty M-500 device to be replaced will be called "Old-M-500".
  • Newly received replacement device will be called "New-M-500".

Note: These names will be referenced throughout the procedure for easier understanding of the operations.


Requirements

  • In order to replace the faulty chassis, the original Old-M-500 configuration needs to be exported and then imported into the New-M-500. Export the configuration as described in the Panorama documentation; a minimal CLI sketch is shown after this list.
  • The Old-M-500 has 8 disk pairs that will be moved to the New-M-500.
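
For reference, a minimal export sketch over SCP (the destination host, path, and file name are placeholders for illustration):

admin@Old-M-500> scp export configuration from running-config.xml to admin@192.0.2.10:/backups/Old-M-500.xml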


Procedure details

  1. Power down the failed M-500 platform (Old-M-500). Refer to the Shutdown Panorama documentation.
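If the CLI of the Old-M-500 is still reachable, the platform can also be shut down gracefully with the standard PAN-OS shutdown command:

admin@Old-M-500> request shutdown system
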
  2. Configure the New-M-500
  • Place the New-M-500 in Panorama mode.
  • Install the same PAN-OS version that the Old-M-500 was running.
  • Import the Old-M-500 configuration file into the New-M-500.
  • Load the named imported configuration on the New-M-500.
  • Modify the Hostname from Old-M-500 to New-M-500.
  • Commit the configuration to Panorama.
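
For reference, a minimal CLI sketch of this step (the PAN-OS version, SCP host, path, and file name are placeholders for illustration; the device reboots when changing the system mode and when installing software):

admin@New-M-500> request system system-mode panorama
admin@New-M-500> request system software download version 8.0.0
admin@New-M-500> request system software install version 8.0.0
admin@New-M-500> scp import configuration from admin@192.0.2.10:/backups/Old-M-500.xml
admin@New-M-500> configure
admin@New-M-500# load config from Old-M-500.xml
admin@New-M-500# set deviceconfig system hostname New-M-500
admin@New-M-500# commit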
  3. Take the Primary disks from the Old-M-500 (A1, B1, C1, D1, E1, F1, G1, H1) and move them to the same Primary positions in the New-M-500 (A1, B1, C1, D1, E1, F1, G1, H1). Refer to the M-500 Hardware Reference Guide for correct identification of the disks.
The picture below shows the physical positioning of the drives inside the M-500 devices.

[Image: physical drive slot positions (A1-H2) in the M-500 chassis]

On the New-M-500, add the Primary Log disks to the RAID using the CLI commands below.

Using the "force" and "no-format" option:

  • The "force" option associates the disk pair that is previously associated with another Log Collector.
  • The “no-format” option keeps the logs by not formatting the disk storage.

In this step, we add the Primary log disks only. The Secondary log disks will be added towards the end of the procedure, because they serve as a data backup and are not needed until the log migration has completed successfully.

In this scenario there are 8 active RAID pairs (A, B, C, D, E, F, G, H). The commands to add the 8 primary disks are:

request system raid add A1 force no-format
request system raid add B1 force no-format
request system raid add C1 force no-format
request system raid add D1 force no-format
request system raid add E1 force no-format
request system raid add F1 force no-format
request system raid add G1 force no-format
request system raid add H1 force no-format

 
  4. Check the status of the disks being added by verifying the RAID status with the CLI command show system raid detail

Example output for the 8 primary disks after the add operation has completed successfully:

admin@New-M-500> show system raid detail

Disk Pair A                           Available
   Status                       clean, degraded
   Disk id A1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id A2                           Missing

Disk Pair B                           Available
   Status                       clean, degraded
   Disk id B1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id B2                           Missing
....

Disk Pair G                           Available
   Status                       clean, degraded
   Disk id G1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id G2                           Missing

Disk Pair H                           Available
   Status                       clean, degraded
   Disk id H1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id H2                           Missing


Ensure that all disks have been added successfully by checking with the CLI command tail lines 120 mp-log raid.log
This command shows the last 120 lines generated by the RAID process.
(Note: It is important to wait until ALL disks have finished being added, as indicated in raid.log.)

Sample output:

Mar 20 00:01:37 DEBUG: raid_util: argv: ['GetArrayId', 'A1']
Mar 20 00:01:37 DEBUG: raid_util: argv: ['Add', 'A1', 'force', 'no-format', 'verify']
Mar 20 00:01:37 DEBUG: Verifying drive A1 to be added.
Mar 20 00:01:37 DEBUG: create_md 1, sdb
Mar 20 00:01:38 DEBUG: raid_util: argv: ['Add', 'A1', 'force', 'no-format']
Mar 20 00:01:38 INFO: Adding drive A1 (sdb)
Mar 20 00:01:38 DEBUG: create_md 1, sdb
Mar 20 00:01:38 DEBUG: create_md_paired_drive 1, sdb, no_format=True
Mar 20 00:01:38 DEBUG: Mounting Disk Pair A (/dev/md1)
Mar 20 00:01:38 DEBUG: set_drive_pairing_one 1
Mar 20 00:01:38 INFO: New Disk Pair A detected.
Mar 20 00:01:38 DEBUG: Created Disk Pair A (/dev/md1) from A1 (/dev/sdb1)
Mar 20 00:01:38 INFO: Done Adding drive A1
...
Mar 20 00:02:41 DEBUG: raid_util: argv: ['GetArrayId', 'H1']
Mar 20 00:02:41 DEBUG: raid_util: argv: ['Add', 'H1', 'force', 'no-format', 'verify']
Mar 20 00:02:41 DEBUG: Verifying drive H1 to be added.
Mar 20 00:02:41 DEBUG: create_md 8, sdp
Mar 20 00:02:41 DEBUG: raid_util: argv: ['Add', 'H1', 'force', 'no-format']
Mar 20 00:02:41 INFO: Adding drive H1 (sdp)
Mar 20 00:02:41 DEBUG: create_md 8, sdp
Mar 20 00:02:41 DEBUG: create_md_paired_drive 8, sdp, no_format=True
Mar 20 00:02:42 DEBUG: Mounting Disk Pair H (/dev/md8)
Mar 20 00:02:42 DEBUG: set_drive_pairing_one 8
Mar 20 00:02:42 INFO: New Disk Pair H detected.
Mar 20 00:02:42 DEBUG: Created Disk Pair H (/dev/md8) from H1 (/dev/sdp1)
Mar 20 00:02:42 INFO: Done Adding drive H1

 

  5. Regenerate the Log Disks' metadata for each RAID disk slot:
request metadata-regenerate slot 1
request metadata-regenerate slot 2
request metadata-regenerate slot 3
request metadata-regenerate slot 4
request metadata-regenerate slot 5
request metadata-regenerate slot 6
request metadata-regenerate slot 7
request metadata-regenerate slot 8

(Note: Depending on the amount of data stored on the disks, these commands can take a long time to complete, as they rebuild all the log indexes.)

Sample output:

admin@New-M-500> request metadata-regenerate slot 1
Bringing down vld: vld-0-0
Process 'vld-0-0' executing STOP
Removing old metadata from /opt/pancfg/mgmt/vld/vld-0
Process 'vld-0-0' executing START
Done generating metadata for LD:1
....

admin@New-M-500> request metadata-regenerate slot 8
Bringing down vld: vld-7-0
Process 'vld-7-0' executing STOP
Removing old metadata from /opt/pancfg/mgmt/vld/vld-7
Process 'vld-7-0' executing START
Done generating metadata for LD:8


To check the status of the metadata regeneration, open a new CLI session and run the command tail lines 100 follow yes mp-log vldmgr.log
This command shows the last 100 lines and then follows the log file vldmgr.log.

Sample output:

2017-03-19 23:38:42.836 -0700 sysd send 'stop LD:1 became unavailable' to 'vld-0-0' vldmgr:vldmgr
2017-03-19 23:38:43.185 -0700 Error:  _process_fd_event(pan_vld_mgr.c:2113): connection failed on fd:13 for cs:vld-0-0
2017-03-19 23:38:43.185 -0700 Sending to MS new status for slot 0, vldid 1280: not online
2017-03-19 23:38:43.185 -0700 setting LD refcount in var:runtime.ld-refcount.LD1 to 0. create:false
2017-03-19 23:38:46.186 -0700 vldmgr vldmgr diskinfo cb from sysd
....
2017-03-20 00:20:56.792 -0700 setting LD refcount in var:runtime.ld-refcount.LD7 to 2. create:false
2017-03-20 00:20:56.792 -0700 Sending to MS new status for slot 6, vldid 1286: online
2017-03-20 00:20:56.905 -0700 connection failed for err 111 with vld-7-0. Will start retry 3 in 2000
2017-03-20 00:20:58.907 -0700 connection failed for err 111 with vld-7-0. Will start retry 4 in 2000
2017-03-20 00:21:00.908 -0700 Connection to vld-7-0 established
2017-03-20 00:21:00.908 -0700 connect(2) succeeded on fd:20 for cs:vld-7-0
2017-03-20 00:21:00.908 -0700 setting LD refcount in var:runtime.ld-refcount.LD8 to 2. create:false
2017-03-20 00:21:00.908 -0700 Sending to MS new status for slot 7, vldid 1287: online

 

  6. On the New-M-500, add a new Local Log Collector.
  • Navigate to Panorama > Managed Collectors
  • Click Add
  • Under the General tab, enter the Collector serial number of the New-M-500 device 

Note: We will add the disks to the New-M-500 Log Collector in a later step.

[Image: Panorama > Managed Collectors > General tab with the New-M-500 serial number]
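
A hypothetical config-mode equivalent of adding the Local Log Collector (the serial number matches the sample output in the next step; confirm the exact node path with "?" on your PAN-OS version):

admin@New-M-500> configure
admin@New-M-500# set log-collector 007307000539
admin@New-M-500# commit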

  7. Check the status of the new Log Collector using the CLI command show log-collector serial-number <serial-number-of-New-M-500>
In the output of the command, check for the following:
  • Connected status displaying “yes”
  • Disk capacity displaying the correct size
  • Disk pair displaying as “Disabled” (This is expected behavior at this stage in the RMA process)

Sample output:

admin@New-M-500> show log-collector serial-number 007307000539 

Serial           CID      Hostname           Connected    Config Status    SW Version         IPv4 - IPv6                                                     
---------------------------------------------------------------------------------------------------------
007307000539     0        M-500_LAB          yes          Out of Sync      7.1.7              10.193.81.241 - unknown

Redistribution status:       none
Last commit-all: commit succeeded, >>>>>>>>current ring version 0<<<<<<<<
md5sum  updated at ?

Raid disks
DiskPair A: Disabled,  Status: Present/Available,  Capacity: 870 GB
DiskPair B: Disabled,  Status: Present/Available,  Capacity: 870 GB
DiskPair C: Disabled,  Status: Present/Available,  Capacity: 870 GB
DiskPair D: Disabled,  Status: Present/Available,  Capacity: 870 GB
DiskPair E: Disabled,  Status: Present/Available,  Capacity: 870 GB
DiskPair F: Disabled,  Status: Present/Available,  Capacity: 870 GB
DiskPair G: Disabled,  Status: Present/Available,  Capacity: 870 GB
DiskPair H: Disabled,  Status: Present/Available,  Capacity: 870 GB
 
  8. Add the disks to the New-M-500 Log Collector.
  • On the GUI, navigate to Panorama > Managed Collectors
  • Select the name of the Log Collector (i.e. New-M-500)
  • Click on the tab "Disks"
  • Click Add and select all the disks that were moved to the New-M-500 device (i.e. A, B, C, D, E, F, G, H)
[Images: adding the disk pairs on the Disks tab and the resulting list of disk pairs A-H]

 

  9. Add the New-M-500 to the existing Collector Group that the Old-M-500 was a part of.
  • On the GUI, navigate to Panorama > Collector Groups 
  • Select the name of the Collector Group (In the current example, the Old-M-500 log collector was part of the "default" Collector Group)
  • Click on the "Device Log Forwarding" tab
  • Click on Add and select the New-M-500 log collector 
[Image: Device Log Forwarding tab with the New-M-500 Log Collector added to the Collector Group]
  10. In the same tab as above (step 9), delete the failed Log Collector from the Collector Group.
Delete all references to the serial number of the failed Old-M-500.
[Image: Collector Group with all references to the Old-M-500 removed]
 
  11. Issue a Panorama Commit only.
[Image: Commit to Panorama dialog]
 
  12. Next, issue a Commit to Collector Group only.
[Image: Commit to Collector Group dialog]
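
For reference, assumed CLI equivalents of these two commits (the Collector Group name "default" follows this example; verify the exact commit-all syntax with "?" on your PAN-OS version):

admin@New-M-500> configure
admin@New-M-500# commit
admin@New-M-500# commit-all log-collector-config log-collector-group default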
 
  13. Check that the old logs are visible under Monitor > Logs > Traffic.
[Image: Monitor > Logs > Traffic showing the historical logs]
 
  14. Once the log migration has completed successfully, add the spare disks to the RAID to restore full RAID redundancy.
  • Physically move the disks A2, B2, C2, D2, E2, F2, G2, H2 from the Old-M-500 to the same positions (A2, B2, C2, D2, E2, F2, G2, H2) in the New-M-500.
  • Check that the disks are available to be added to the RAID using the CLI command show system raid detail:

The newly inserted disks will show the state "Present" and the status "not in use".

Sample output:

admin@New-M-500> show system raid detail
Disk Pair A                           Available
   Status                       clean, degraded
   Disk id A1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id A2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : not in use
....

Disk Pair H                           Available
   Status                       clean, degraded
   Disk id H1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id H2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : not in use

 
  15. Add the secondary disks (A2, B2, C2, D2, E2, F2, G2, H2) to the RAID using the following commands:
request system raid add A2 force
request system raid add B2 force 
request system raid add C2 force 
request system raid add D2 force 
request system raid add E2 force 
request system raid add F2 force 
request system raid add G2 force 
request system raid add H2 force

Note: Each command displays the prompt "Executing this command may delete all data on the drive being added. Do you want to continue? (y or n)"
Press "y" to accept.


After running these commands, the RAID enters a "Spare Rebuild" operation. Note that this can be a lengthy operation; it runs in the background until it completes. During this time, logging to the Log Collector Group is on hold. Once the rebuild operation completes, log forwarding to the New-M-500 resumes.

To check the status of the rebuild operation, run the CLI command show system raid detail

Sample Output:

> show system raid detail 

Disk Pair A                           Available
   Status     clean, degraded, recovering (2% complete)
   Disk id A1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id A2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : spare rebuilding
....

Disk Pair H                           Available
   Status     clean, degraded, recovering (0% complete)
   Disk id H1                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : active sync
   Disk id H2                           Present
       model        : ST91000640NS    
       size         : 953869 MB
       status       : spare rebuilding
 
  16. Once the Spare Rebuild operation finishes, each Disk Pair should report a "clean" status with both disks in "active sync". The New-M-500 is then in a fully operational state and the RMA process is complete.

