How to Design and Size Panorama Log Collector Environments

Created On 12/11/20 22:00 PM - Last Modified 02/14/24 10:55 AM


Objective


The Panorama solution consists of two overall functions:
 

  1. Configuration and Device Management: This includes activities such as configuration management and deployment, deployment of Palo Alto Networks firewalls, software upgrades, and content updates.
 
  2. Log Collection: This includes collecting logs from one or multiple firewalls, either to a single Panorama or to a distributed log collection infrastructure. In addition to collecting logs from deployed firewalls, reports can be generated based on that log data, whether it resides locally on the Panorama (e.g. a single M-Series or VM appliance) or on a distributed logging infrastructure.


The Panorama solution allows for flexibility in design by assigning these functions to different physical pieces of the management infrastructure. For example: Device management may be performed from a VM Panorama, while the firewalls forward their logs to collocated dedicated log collectors:



 User-added image

In the example above, the device management and reporting functions are performed on a VM Panorama appliance. There are three log collector groups. Group A contains two log collectors and receives logs from three standalone firewalls. Group B consists of a single collector and receives logs from a pair of firewalls in an Active/Passive high availability (HA) configuration. Group C also contains two log collectors and receives logs from two HA pairs of firewalls. The number of log collectors in any given location depends on several factors, which are covered in the design considerations below. Note: Any platform, including VM and M-Series, can be Management-Only and can also act as a logger.

 



Environment


  • Any physical or virtual Panorama that supports the Log Collection feature.
  • PAN-OS 8.1 and above 


Procedure


Log Collection

Managed Devices

While all current Panorama platforms have an upper limit of 5000 devices for management purposes (5000 firewalls using a single or M-600 since PAN-OS 9.0), it is important for Panorama sizing to understand what the incoming log rate will be from all managed devices. To start with, take an inventory of the total firewall appliances that will be managed by Panorama.
 
Use the following table as an example to take an inventory of your devices that need to forward logs:

Model    PAN-OS (Major Branch)    Location    Measured Avg Log Rate Per Second (LPS)
PA-7080    8.1.x    US    9800 LPS
PA-5260    8.1.x    Singapore    5200 LPS
PA-3250    8.1.x    NL    2660 LPS
PA-220    8.1.x    HK    408 LPS
 

Refer to the following article about how to determine the log rate: How to Determine Log Rate on Panorama Devices with a Log Collector
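
As a minimal sketch of how such an inventory can be turned into a sizing input, the snippet below sums the measured rates from the example table. The function names and the per-location grouping are illustrative assumptions, not part of any Palo Alto Networks tool.

```python
# Rows mirror the example inventory table: (model, location, measured avg LPS).
inventory = [
    ("PA-7080", "US", 9800),
    ("PA-5260", "Singapore", 5200),
    ("PA-3250", "NL", 2660),
    ("PA-220", "HK", 408),
]


def aggregate_lps(rows):
    """Return the total measured log rate across all inventoried firewalls."""
    return sum(lps for _model, _location, lps in rows)


def lps_by_location(rows):
    """Return a dict of location -> summed log rate, useful when placing collectors."""
    totals = {}
    for _model, location, lps in rows:
        totals[location] = totals.get(location, 0) + lps
    return totals


print(aggregate_lps(inventory))    # 18068
print(lps_by_location(inventory))
```

The aggregate figure feeds the ingestion sizing, while the per-location totals help with the device location factor discussed in the next section.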



Logging Requirements

This section will cover the information needed to properly size and deploy Panorama logging infrastructure to support customer requirements. There are three main factors when determining the amount of total storage required and how to allocate that storage via Distributed Log Collectors. These factors are:

  1. Log Ingestion Requirements: This is the total number of logs that will be sent per second to the Panorama infrastructure.
 
  2. Log Storage Requirements: This is the time frame for which the customer needs to retain logs on the management platform. There are different driving factors for this, including both policy-based and regulatory compliance motivators.
 
  3. Device Location: The physical location of the firewalls can drive the decision to place DLC appliances at remote locations based on factors such as WAN bandwidth.

Each of these factors is discussed in the sections below:

Log Ingestion Requirements
 
The aggregated log forwarding rate for managed devices needs to be understood in order to avoid a design where more logs are regularly being sent to Panorama than it can receive, process, and write to disk. The table below outlines the maximum number of logs per second that each hardware platform can forward to Panorama and can be used when designing a solution to calculate the maximum number of logs that can be forwarded to Panorama in the customer environment.

Device Log Forwarding Limits

Platform    Supported Logs Per Second (LPS)
PA-200    250
PA-220    1200
PA-500    652
PA-800 Series    10,000
PA-3000 Series    10,000
PA-3220    7,000
PA-3250    15,000
PA-3260    24,000
PA-5050 / PA-5060    10,000
PA-5220    30,000
PA-5250    55,000
PA-5260    90,000
PA-7K Series    70,000
VM-50    1,250
VM-100 / VM-200    2,500
VM-300 / VM-500 / VM-1000-HV    8,000
VM-700    10,000


The log ingestion rate on Panorama is influenced by the platform and mode in use (mixed mode vs logger mode). The table below shows the ingestion rates for Panorama on the different available platforms and modes of operation. The numbers in parentheses next to VM denote the number of CPUs and gigabytes of RAM assigned to the VM.


Panorama Supported Log Ingestion Rates

Platform    Panorama (Mixed) Mode    Logger Mode
Panorama VM (16 CPU+32GB RAM)*    10,000 LPS    15,000 LPS
M-100    10,000 LPS    10,000 LPS
M-200    10,000 LPS    28,000 LPS
M-300    16,500 LPS    33,000 LPS
M-500    20,000 LPS    30,000 LPS
M-600    25,000 LPS    50,000 LPS
M-700    36,500 LPS    73,000 LPS


The above numbers are all maximum values. In live deployments, the actual log rate is generally some fraction of the supported maximum. Determining actual log rate is heavily dependent on the customer's traffic mix and isn't necessarily tied to throughput. For example, a single offloaded SMB session will show high throughput but only generate one traffic log. Conversely, you can have a smaller throughput consisting of thousands of UDP DNS queries that each generate a separate traffic log. For sizing, a rough correlation can be drawn between connections per second and logs per second.

*On Panorama VMs, additional capabilities can be achieved with more resource allocations. Please refer to Setup Prerequisites for the Panorama Virtual Appliance for more information.
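
The ingestion table can be used to estimate how many collectors of a given platform are needed for a measured aggregate log rate. The sketch below is illustrative only: the function name and the `headroom` safety margin are assumptions, and the capacities are the maximum values from the table, so results should be treated as a lower bound.

```python
import math

# Maximum ingestion rates in LPS, copied from the table above: (mixed mode, logger mode).
INGESTION_LPS = {
    "Panorama VM (16 CPU+32GB RAM)": (10_000, 15_000),
    "M-100": (10_000, 10_000),
    "M-200": (10_000, 28_000),
    "M-300": (16_500, 33_000),
    "M-500": (20_000, 30_000),
    "M-600": (25_000, 50_000),
    "M-700": (36_500, 73_000),
}


def collectors_needed(aggregate_lps, platform, logger_mode=True, headroom=0.0):
    """Minimum number of collectors of one platform needed to absorb aggregate_lps.

    headroom is a hypothetical planning margin: 0.2 means only 80% of the
    published maximum is counted as usable capacity.
    """
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in the range [0, 1)")
    mixed, logger = INGESTION_LPS[platform]
    usable = (logger if logger_mode else mixed) * (1.0 - headroom)
    return math.ceil(aggregate_lps / usable)


print(collectors_needed(18_068, "M-100"))  # 2
print(collectors_needed(18_068, "M-600"))  # 1
```

Because the published numbers are maximums, leaving some headroom is a reasonable design choice; the right margin depends on the customer's traffic mix.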




Methods for Determining Log Rate

New Customers:

  • Leverage information from existing customer sources. Many customers have a third-party logging solution in place such as Splunk, ArcSight, QRadar, etc. The number of logs sent from their existing firewall solution can be pulled from those systems. When using this method, get a log count from the third-party solution for a full day and divide by 86,400 (the number of seconds in a day). Do this for several days to get an average. Be sure to include both business and non-business days, as there is usually a large variance in log rate between the two.
 
  • Use data from evaluation devices. This information can provide a very useful starting point for sizing purposes and, with input from the customer, data can be extrapolated for other sites in the same design. This method has the advantage of yielding an average over several days. A script (with instructions) to assist with calculating this information is attached to this document. To use it, download the file named "ts_lps.zip" from Design Log-Collector Documents.rar. Unpack the zip file and reference the README.txt for instructions.
 
  • If no information is available, use the Device Log Forwarding table above as a reference point. This will be the least accurate method for any particular customer.
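
The first method above (daily log count divided by 86,400, averaged over several days) can be sketched as follows. The daily counts are made-up illustrative values, not measurements, and the function name is an assumption.

```python
SECONDS_PER_DAY = 86_400


def average_lps(daily_log_counts):
    """Average logs per second over several full days of log counts."""
    if not daily_log_counts:
        raise ValueError("at least one daily log count is required")
    per_day_lps = [count / SECONDS_PER_DAY for count in daily_log_counts]
    return sum(per_day_lps) / len(per_day_lps)


# Hypothetical week: five business days at 1500 LPS and two weekend days at 500 LPS.
week = [129_600_000] * 5 + [43_200_000] * 2
print(round(average_lps(week)))  # 1214
```

Note how the weekend days pull the weekly average well below the business-day rate. The average suits storage sizing, but ingestion capacity should still be checked against the busiest days.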


Existing Customers:

For existing customers, we can leverage data gathered from their existing firewalls and log collectors:

  • To check the log rate of a single firewall, download the file named "Device.zip" from Design Log-Collector Documents.rar, unpack the zip file, and reference the README.txt file for instructions. This package will query a single firewall over a specified period of time (you can choose how many samples) and give an average number of logs per second for that period. At minimum, this script should be run for 24 consecutive hours on a business day. Running the script for a full week will help capture the cyclical ebb and flow of the network. If the customer does not have a log collector, this process will need to be run against each firewall in the environment.
 
  • If the customer has a log collector (or log collectors), download the file named "lc_lps.zip" from Design Log-Collector Documents.rar, unpack the zip file, and reference the README.txt file for instructions. This package will query the log collector MIB to take a sample of the incoming log rate over a specified period.
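
The internals of the attached packages are not described here, so the following is only a generic sketch of sample-based averaging. It assumes each sample is a (timestamp in seconds, cumulative log count) pair, which is a hypothetical data shape; how the samples are obtained (API, SNMP, CLI) is deliberately left out.

```python
def average_lps_from_samples(samples):
    """Average log rate between the first and last sample.

    samples: list of (timestamp_seconds, cumulative_log_count), oldest first.
    The cumulative-counter shape is an assumption made for illustration.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    (t_first, count_first), (t_last, count_last) = samples[0], samples[-1]
    elapsed = t_last - t_first
    if elapsed <= 0:
        raise ValueError("samples must span a positive time interval")
    return (count_last - count_first) / elapsed


# Hypothetical samples taken one minute apart.
samples = [(0, 1_000), (60, 91_000), (120, 181_000)]
print(average_lps_from_samples(samples))  # 1500.0
```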




Log Storage Requirements
 

Factors Affecting Log Storage Requirements:

There are several factors that drive log storage requirements. Most of these requirements are regulatory in nature. Customers may need to meet compliance requirements for HIPAA, PCI, or Sarbanes-Oxley.


There are other governmental and industry standards that may need to be considered. Additionally, some companies have internal requirements, for example that a certain number of days' worth of logs be maintained on the original management platform. Ensure that all of these requirements are addressed with the customer when designing a log storage solution.
 
The focus here is on the minimum number of days' worth of logs that must be stored. If there is a maximum number of days required (due to regulation or policy), you can set the maximum number of days to keep logs in the quota configuration.
 

Calculating Required Storage:

Calculating required storage space based on a given customer's requirements is a fairly straightforward process but can be labor intensive when achieving higher degrees of accuracy. With PAN-OS 9.1, the average size across all log types is 489 Bytes*. This number accounts for total log size stored on the disk. 
 
* Average log size might vary depending on the traffic/logging mix and features enabled.

Note that Panorama may not be the customer's logging solution for long-term archival. In these cases, suggest Syslog forwarding for archival purposes.

The equation to determine the storage requirements for particular log type:

User-added image

Example: Customer wants to be able to keep 30 days worth of traffic logs with a log rate of 1500 logs per second:

User-added image

The result of the above calculation accounts for ElasticSearch detailed logs only. The default quota settings reserve 60% of the available storage for detailed logs, so the calculated number represents 60% of the storage used by ElasticSearch. To calculate the total storage required for ElasticSearch, divide this number by 0.60:
User-added image
One third (~33%) of the available disk space is allocated to logd-formatted logs, which are stored to support upgrades and downgrades and to help repair database corruption. To calculate the total storage that will need to be purchased, divide the storage required for ElasticSearch by 0.66:
User-added image
Default log quotas for Panorama 8.0 and later:

Log Type    % Storage
Detailed Firewall Logs    60
Summary Firewall Logs    30
Infrastructure and Audit Logs    5
Palo Alto Networks Platform Logs    .1
3rd Party External Logs    .1
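
The storage steps described above can be sketched in code. This is an illustration under stated assumptions: the base formula is taken to be log rate x 86,400 seconds x retention days x average log size (the original equation images are not reproduced in this text), the 489-byte average and the 0.60 and 0.66 divisors come from the text, and reporting in decimal terabytes is a choice made here.

```python
SECONDS_PER_DAY = 86_400
AVG_LOG_SIZE_BYTES = 489     # PAN-OS 9.1 average across all log types (from the text)
DETAILED_LOG_QUOTA = 0.60    # default quota share for detailed firewall logs
ELASTICSEARCH_SHARE = 0.66   # remainder after ~1/3 goes to logd-formatted logs


def detailed_log_bytes(lps, retention_days, avg_log_size=AVG_LOG_SIZE_BYTES):
    """Bytes needed to retain detailed logs (assumed formula, see lead-in)."""
    return lps * SECONDS_PER_DAY * retention_days * avg_log_size


def total_storage_bytes(lps, retention_days):
    """Total storage to purchase at the default quota settings."""
    elasticsearch_total = detailed_log_bytes(lps, retention_days) / DETAILED_LOG_QUOTA
    return elasticsearch_total / ELASTICSEARCH_SHARE


# Example from the text: 30 days of logs at 1500 logs per second.
detailed = detailed_log_bytes(1500, 30)
total = total_storage_bytes(1500, 30)
print(detailed)                   # 1901232000000
print(round(total / 1e12, 2))     # total in decimal terabytes
```

For higher accuracy, the average log size can be replaced with a value measured in the customer's environment, since it varies with the logging mix and enabled features.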

Please refer to the article below to learn more about space allocation on Panorama:

How Disk Space is Allocated on Log Collectors

In addition, the attached worksheet will take into account the default quota on Panorama and provide a total amount of storage required.



Log Availability

There are two methods for achieving redundancy for logs when using a log collector infrastructure (either dedicated or in mixed mode).
 

Log Redundancy:

PAN-OS 8.1 and later include an explicit option to write each log to two log collectors in the log collector group. With this option enabled, a device sends its log to its primary log collector, which then replicates the log to another collector in the same group:

User-added image


Log duplication ensures that there are two copies of any given log in the log collector group. This is a good option for customers who need to guarantee log availability at all times. Things to consider:

  1. The replication only takes place within a log collector group.
 
  2. The overall available storage space is halved (because each log is written twice).
 
  3. The overall log ingestion rate will be reduced by up to 50%.
 
  4. Latency should be <10ms between the log collectors within the same collector group to avoid inter-LC issues.
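
The storage and ingestion effects listed above can be folded into a rough capacity estimate for a collector group. The sketch below assumes capacity scales linearly with the number of collectors and applies the worst-case 50% ingestion reduction; both are simplifying assumptions, and the example figures are hypothetical.

```python
def group_capacity(collectors, lps_per_collector, storage_tb_per_collector,
                   redundancy=False):
    """Return (usable ingestion LPS, usable storage TB) for a collector group.

    Assumes linear scaling across collectors. With redundancy enabled, storage
    is halved and ingestion is reduced by the worst-case 50%.
    """
    ingest = collectors * lps_per_collector
    storage = collectors * storage_tb_per_collector
    if redundancy:
        ingest *= 0.5
        storage *= 0.5
    return ingest, storage


# Hypothetical group: two collectors, 50,000 LPS and 16 TB each.
print(group_capacity(2, 50_000, 16))
print(group_capacity(2, 50_000, 16, redundancy=True))
```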

 
  
Collector Group Preference List:

The second method is to place multiple log collectors into a group. In this scenario, the firewall can be configured with a preference list so that if the primary log collector goes down, the second collector on the list will receive and store the logs.

The best practice for log forwarding to Log Collectors is to have a Log-Collector Preference List. 

For more information please refer to Caveats for a Collector Group with Multiple Log Collectors.

In the architecture shown below, Firewall A & Firewall B are configured to send their logs to Log Collector 1 primarily, with Log Collector 2 as a backup. If Log Collector 1 becomes unreachable, the devices will send their logs to Log Collector 2.

User-added image




Considerations for Log Collector Group design
 
There are three primary reasons for configuring log collectors in a group:
 

  1. Greater log retention is required for a specific firewall (or set of firewalls) than can be provided by a single log collector (to scale retention).
 
  2. Greater ingestion capacity is required for a specific firewall than can be provided by a single log collector (to scale ingestion).
 
  3. Requirement for log redundancy.

 

When considering the use of log collector groups there are a couple of considerations that need to be addressed at the design stage:
 

  1. Spread ingestion across the available collectors: Multiple device forwarding preference lists can be created. This allows ingestion to be handled by multiple collectors in the collector group. For example, preference list 1 will have half of the firewalls and list collector 1 as the primary and collector 2 as the secondary. Preference list 2 will have the remainder of the firewalls and list collector 2 as the primary and collector 1 as the secondary.
 
  2. Latency matters: Network latency between collectors in a log collector group is an important factor in performance. A general design guideline is to keep all collectors that are members of the same group close together. The following table provides an idea of what you can expect at different latency measurements with redundancy enabled and disabled. In this case, 'Log Delay' is the undesired result of high latency: logs don't show up in the UI until well after they are sent to Panorama.

NOTE: Latency should be <10ms between the log collectors within the same collector group to avoid inter-LC issues like the one mentioned here.
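
The preference-list approach from the first consideration above can be sketched as a simple round-robin assignment. The data structure and names below are purely illustrative and do not correspond to Panorama configuration syntax.

```python
def build_preference_lists(firewalls, collectors):
    """Spread firewalls across rotated collector orderings.

    List i starts with collector i as primary and falls back to the others in
    order, so each collector is primary for roughly an equal share of firewalls.
    """
    if not collectors:
        raise ValueError("at least one collector is required")
    count = len(collectors)
    lists = [
        {"collectors": collectors[i:] + collectors[:i], "firewalls": []}
        for i in range(count)
    ]
    for index, firewall in enumerate(firewalls):
        lists[index % count]["firewalls"].append(firewall)
    return lists


# Hypothetical device and collector names.
plan = build_preference_lists(
    ["fw-1", "fw-2", "fw-3", "fw-4"], ["collector-1", "collector-2"]
)
for entry in plan:
    print(entry["collectors"], entry["firewalls"])
```

Round-robin by device count is the simplest split; where firewalls have very different log rates, balancing by measured LPS instead would spread ingestion more evenly.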


 
Using The Sizing Worksheet:

The information that you will need includes desired retention period and average log rate.

User-added image

Retention Period: Number of days that logs need to be kept.

Average Log Rate: The measured or estimated aggregate log rate.

Redundancy Required: Check this box if log redundancy is required.

Storage for Detailed Logs: The amount of storage (in gigabytes) required to meet the retention period for detailed logs.

Total Storage Required: The storage (in gigabytes) to be purchased. This accounts for all log types at the default quota settings.



EXAMPLE USE CASES

User-added image

User-added image

User-added image

User-added image

 



Additional Information


  • If you have any additional questions or need help with the design and deployment of your logging environment, please reach out to the account team.
 
  • The documents mentioned above are zipped and attached to this article as Design Log-Collector Documents, available for download.
   

