Panorama Sizing and Design Guide
Panorama Management and Logging Overview
The Panorama solution consists of two overall functions: Device Management and Log Collection/Reporting. A brief overview of these two main functions follows:
Device Management: This includes activities such as configuration management and deployment, as well as deployment of PAN-OS and content updates.
Log Collection: This includes collecting logs from one or multiple firewalls, either to a single Panorama or to a distributed log collection infrastructure. In addition to collecting logs from deployed firewalls, Panorama can generate reports based on that log data, whether it resides locally on the Panorama (e.g., a single M-Series or VM appliance) or on a distributed logging infrastructure.
The Panorama solution allows for flexibility in design by assigning these functions to different physical pieces of the management infrastructure. For example, device management may be performed from a VM Panorama, while the firewalls forward their logs to colocated dedicated log collectors:
In the example above, the device management and reporting functions are performed on a VM Panorama appliance. There are three log collector groups. Group A contains two log collectors and receives logs from three standalone firewalls. Group B consists of a single collector and receives logs from a pair of firewalls in an Active/Passive high availability (HA) configuration. Group C also contains two log collectors and receives logs from two HA pairs of firewalls. The number of log collectors in any given location depends on a number of factors, which are covered in the design considerations below. Note: Any platform can be a dedicated manager, but only M-Series appliances can be dedicated log collectors.
All current Panorama platforms can manage up to 1000 devices, but sizing Panorama also requires understanding the incoming log rate from all managed devices. To start, take an inventory of all firewall appliances that will be managed by Panorama.
Use the following table to take an inventory of the devices that need to store logs:
|MODEL||PAN-OS (Major Branch #)||Location||Measured Average Log Rate|
|Ex: 5060||Ex: 6.1.0||Ex: Main Data Center||Ex: 2500 logs/s|
This section will cover the information needed to properly size and deploy Panorama logging infrastructure to support customer requirements. There are three main factors when determining the amount of total storage required and how to allocate that storage via Distributed Log Collectors. These factors are:
- Log Ingestion Requirements: This is the total number of logs that will be sent per second to the Panorama infrastructure.
- Log Storage Requirements: This is the timeframe for which the customer needs to retain logs on the management platform. There are different driving factors for this including both policy based and regulatory compliance motivators.
- Device Location: The physical location of the firewalls can drive the decision to place DLC appliances at remote locations based on WAN bandwidth etc.
Each of these factors is discussed in the sections below.
Log Ingestion Requirements
The aggregate log forwarding rate for managed devices needs to be understood in order to avoid a design where more logs are regularly being sent to Panorama than it can receive, process, and write to disk. The table below outlines the maximum number of logs per second that each hardware platform can forward to Panorama and can be used when designing a solution to calculate the maximum number of logs that can be forwarded to Panorama in the customer environment.
Device Log Forwarding
|Platform||Supported Logs per Second (LPS)|
|PA-5260||To Be Tested|
The log ingestion rate on Panorama is influenced by the platform and mode in use (mixed mode versus logger mode). The table below shows the ingestion rates for Panorama on the different available platforms and modes of operation. The numbers in parentheses next to VM denote the number of CPUs and the gigabytes of RAM assigned to the VM.
Panorama Log Ingestion
The above numbers are all maximum values. In live deployments, the actual log rate is generally some fraction of the supported maximum. The actual log rate depends heavily on the customer's traffic mix and isn't necessarily tied to throughput. For example, a single offloaded SMB session will show high throughput but generate only one traffic log. Conversely, a lower throughput made up of thousands of UDP DNS queries generates a separate traffic log for each query. For sizing, a rough correlation can be drawn between connections per second and logs per second.
Methods for Determining Log Rate
- Leverage information from existing customer sources. Many customers have a third-party logging solution in place, such as Splunk, ArcSight, or QRadar. The number of logs sent from their existing firewall solution can be pulled from those systems. When using this method, get a log count from the third-party solution for a full day and divide it by 86,400 (the number of seconds in a day). Do this for several days to get an average. Be sure to include both business and non-business days, as there is usually a large variance in log rate between the two.
- Use data from an evaluation device. This information can provide a very useful starting point for sizing purposes and, with input from the customer, data can be extrapolated for other sites in the same design. This method has the advantage of yielding an average over several days. A script (with instructions) to assist with calculating this information is attached to this document. To use it, download the file named "ts_lps.zip", unpack the zip file, and refer to README.txt for instructions.
- If no information is available, use the Device Log Forwarding table above as a reference point. This is the least accurate method for any particular customer.
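A minimal Python sketch of the third-party method: divide each full day's log count by 86,400 and average the per-day rates. The daily counts below are hypothetical placeholders, not measurements; mixing business days with a weekend shows how much the average drops once non-business days are included.

```python
SECONDS_PER_DAY = 86_400


def average_log_rate(daily_log_counts: list[int]) -> float:
    """Average logs per second over several full days of log counts.

    Each entry is the total number of logs the third-party system received
    from the existing firewalls during one full day.
    """
    if not daily_log_counts:
        raise ValueError("need at least one full day of log counts")
    # Per-day rate = daily count / seconds in a day; then average the days.
    per_day_rates = [count / SECONDS_PER_DAY for count in daily_log_counts]
    return sum(per_day_rates) / len(per_day_rates)


# Hypothetical counts: five business days followed by a weekend.
counts = [216_000_000, 207_360_000, 198_720_000, 224_640_000, 190_080_000,
          43_200_000, 34_560_000]
print(round(average_log_rate(counts)))  # prints 1843
```

The business days alone average 2,400 logs/s, so leaving out the weekend would overstate the sustained rate by roughly 30% in this made-up example.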
For existing customers, we can leverage data gathered from their existing firewalls and log collectors:
- To check the log rate of a single firewall, download the attached file named "Device.zip", unpack the zip file and reference the README.txt file for instructions. This package will query a single firewall over a specified period of time (you can choose how many samples) and give an average number of logs per second for that period. At minimum this script should be run for 24 consecutive hours on a business day. Running the script for a full week will help capture the cyclical ebb and flow of the network. If the customer does not have a log collector, this process will need to be run against each firewall in the environment.
- If the customer has a log collector (or log collectors), download the attached file named "lc_lps.zip", unpack the zip file, and refer to the README.txt file for instructions. This package will query the log collector MIB to take a sample of the incoming log rate over a specified period.
Log Storage Requirements
Factors Affecting Log Storage Requirements
There are several factors that drive log storage requirements. Most of these requirements are regulatory in nature. Customers may need to meet compliance requirements for HIPAA, PCI, or Sarbanes-Oxley.
There are other governmental and industry standards that may need to be considered. Additionally, some companies have internal requirements, for example, that a certain number of days' worth of logs be maintained on the original management platform. Ensure that all of these requirements are addressed with the customer when designing a log storage solution.
Focus on the minimum number of days' worth of logs that need to be stored. If there is a maximum number of days required (due to regulation or policy), you can set the maximum number of days to keep logs in the quota configuration.
Calculating Required Storage
Calculating the required storage space for a given customer's requirements is a fairly straightforward process, but it can be labor intensive when a higher degree of accuracy is needed. With PAN-OS 8.0, a single size of 500 bytes per log applies to all log types. This number accounts for both the logs themselves and the associated indices. The Threat database is the data source for Threat logs as well as URL, WildFire Submissions, and Data Filtering logs.
Note that we may not be the logging solution for long-term archival. In these cases, suggest Syslog forwarding for archival purposes.
The equation to determine the storage requirement for a particular log type is:
Storage (bytes) = Log rate (logs/s) × 86,400 (s/day) × Retention (days) × 500 (bytes/log)
Example: The customer wants to keep 30 days' worth of traffic logs at a log rate of 1500 logs per second:
1500 × 86,400 × 30 × 500 bytes = 1,944,000,000,000 bytes = 1,944 GB (using 1 GB = 10^9 bytes)
The result of this calculation accounts for detailed logs only. The default quota settings reserve 60% of the available storage for detailed logs, so the calculated number represents 60% of the total storage that will need to be purchased. To calculate the total storage required, divide this number by 0.60:
1,944 GB / 0.60 = 3,240 GB
Default log quotas for Panorama 8.0 and later are as follows:
|Log Type||% Storage|
|Detailed Firewall Logs||60|
|Summary Firewall Logs||30|
|Infrastructure and Audit Logs||5|
|Palo Alto Networks Platform Logs||.1|
|3rd Party External Logs||.1|
The attached worksheet will take into account the default quota on Panorama and provide a total amount of storage required.
Calculating Required Storage For Logging Service
There are three different cases for sizing log collection using the Logging Service. For in-depth sizing guidance, refer to Sizing Storage For The Logging Service.
- Log collection for Palo Alto Networks Next Generation Firewalls
- Log collection for GlobalProtect Cloud Service Mobile User
- Log collection for GlobalProtect Cloud Service Remote Office
Log Collection for Palo Alto Networks Next Generation Firewalls
The log sizing methodology for firewalls logging to the Logging Service is the same as when sizing for on-premises log collectors. The only difference is the size of the log on disk: in the Logging Service, both threat and traffic logs can be calculated using a size of 1500 bytes.
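Because only the per-log size changes, the same formula can be parameterized by log size. This is a sketch assuming decimal gigabytes (1 GB = 10^9 bytes); the default quota adjustment from the on-premises section is deliberately not applied, since the text does not say the Logging Service uses the same quotas.

```python
SECONDS_PER_DAY = 86_400
BYTES_PER_GB = 10**9  # assumption: decimal gigabytes


def detailed_storage_gb(log_rate: float, retention_days: int, log_size_bytes: int) -> float:
    """Storage for detailed logs: rate x seconds/day x days x bytes per log."""
    return log_rate * SECONDS_PER_DAY * retention_days * log_size_bytes / BYTES_PER_GB


on_prem = detailed_storage_gb(1500, 30, 500)            # on-premises collectors (PAN-OS 8.0)
logging_service = detailed_storage_gb(1500, 30, 1500)   # Logging Service
print(f"{on_prem:.0f} GB on premises, {logging_service:.0f} GB in the Logging Service")
```

With identical log rate and retention, the Logging Service figure is three times the on-premises detailed-log figure, purely because of the 1500-byte versus 500-byte log size.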
Log Collection for GlobalProtect Cloud Service Mobile User
Per-user log generation depends heavily on both the type of user and the workloads being executed in that environment. On average, 1 TB of storage on the Logging Service provides 30 days of retention for 5000 users. An advantage of the Logging Service is that adding storage is much simpler than in a traditional on-premises distributed collection environment. This means that if your environment is significantly busier than average, it is a simple matter to add whatever storage is necessary to meet your retention requirements.
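A rough scaling sketch, assuming storage grows linearly with both user count and retention period; the text only gives the single data point of 1 TB for 30 days and 5000 users. The `busyness` multiplier is a hypothetical knob for environments that are busier than average.

```python
# Rule of thumb from the text: about 1 TB covers 30 days for 5000 users.
REFERENCE_TB = 1.0
REFERENCE_USERS = 5000
REFERENCE_DAYS = 30


def mobile_user_storage_tb(users: int, retention_days: int, busyness: float = 1.0) -> float:
    """Estimate Logging Service storage (TB) for GPCS mobile users.

    Assumes linear scaling with users and retention; `busyness` is a
    hypothetical multiplier where 1.0 means an average environment.
    """
    return (REFERENCE_TB * (users / REFERENCE_USERS)
            * (retention_days / REFERENCE_DAYS) * busyness)


print(mobile_user_storage_tb(12_500, 90))  # 2.5x the users, 3x the days
```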
Log Collection for GlobalProtect Cloud Service Remote Office
GlobalProtect Cloud Service (GPCS) for remote offices is sold based on bandwidth. While log rate is largely driven by connection rate and traffic mix, in sample enterprise environments log generation occurs at a rate of approximately 1.5 logs per second per megabit of throughput. The attached sizing worksheet uses this rate and takes into account busy/off hours in order to provide an estimated average log rate.
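The worksheet's exact busy/off-hours weighting is not described here, so the following is only a sketch under a hypothetical model: the link runs at its full licensed bandwidth for `busy_hours` per day and at `off_hours_utilization` of that bandwidth for the rest of the day. Both parameters and their defaults are assumptions, not values from the worksheet.

```python
LOGS_PER_SEC_PER_MBPS = 1.5  # sample-enterprise rate quoted in the text


def remote_office_avg_lps(bandwidth_mbps: float, busy_hours: float = 10,
                          off_hours_utilization: float = 0.1) -> float:
    """Estimate the average log rate for a GPCS remote office.

    Hypothetical model: full bandwidth for `busy_hours` per day, and
    `off_hours_utilization` of the bandwidth for the remaining hours.
    """
    if not 0 <= busy_hours <= 24:
        raise ValueError("busy_hours must be between 0 and 24")
    peak_lps = LOGS_PER_SEC_PER_MBPS * bandwidth_mbps
    busy_fraction = busy_hours / 24
    duty_cycle = busy_fraction + (1 - busy_fraction) * off_hours_utilization
    return peak_lps * duty_cycle


print(round(remote_office_avg_lps(100), 2))
```

For a 100 Mbps office this model yields about 71 logs/s on average, versus 150 logs/s at full utilization; the gap shows why averaging over busy and off hours matters for storage sizing.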
LogDB Storage Quotas
Storage quotas were simplified starting in PAN-OS version 8.0. Detail and summary logs each have their own quota, regardless of type (traffic/threat):
|Log Type||% Storage|
|Detailed Firewall Logs||60|
|Summary Firewall Logs||30|
|Infrastructure and Audit Logs||5|
|Palo Alto Networks Platform Logs||.1|
|3rd Party External Logs||.1|
The last design consideration for logging infrastructure is the location of the firewalls relative to the Panorama platform they are logging to. If the device is separated from Panorama by a low-speed network segment (e.g., T1/E1), it is recommended to place a Dedicated Log Collector (DLC) on site with the firewall. This confines log forwarding to the higher-speed LAN segment while still allowing Panorama to query the log collector when needed. For reference, the following tables show bandwidth usage for log forwarding at different log rates. This includes both the logs sent to Panorama and the acknowledgements from Panorama to the firewall. Note that for both the 7000 Series and 5200 Series, logs are compressed during transmission.
Log Forwarding Bandwidth
|Log Rate (LPS)||Bandwidth Used|
|16000||52.8 - 140.8 Mbps (avg. 96.8 Mbps)|
Log Forwarding Bandwidth - 7000 and 5200 Series
|Log Rate (LPS)||Bandwidth Used|
|16000||5 - 10 Mbps|
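To judge whether a WAN link can carry a site's log forwarding, the 16,000 LPS reference rows in the tables above can be scaled to another log rate. This sketch assumes bandwidth scales linearly with log rate, which the tables do not confirm; the 2,000 LPS site is hypothetical, and 1.544 Mbps is the standard T1 line rate.

```python
# Reference points from the 16,000 LPS rows in the tables above.
# Assumes bandwidth scales linearly with log rate (roughly 3,300-8,800
# bits per log uncompressed, 312.5-625 bits per log compressed).
REFERENCE_LPS = 16_000
UNCOMPRESSED_MBPS = (52.8, 140.8)   # platforms without compression
COMPRESSED_MBPS = (5.0, 10.0)       # 7000 and 5200 Series
T1_MBPS = 1.544


def forwarding_bandwidth_mbps(log_rate: float, compressed: bool = False) -> tuple[float, float]:
    """Return a (low, high) estimate of log forwarding bandwidth in Mbps."""
    low, high = COMPRESSED_MBPS if compressed else UNCOMPRESSED_MBPS
    scale = log_rate / REFERENCE_LPS
    return low * scale, high * scale


low, high = forwarding_bandwidth_mbps(2_000)
print(f"2,000 LPS: {low:.1f}-{high:.1f} Mbps; exceeds a T1: {low > T1_MBPS}")
```

Even the low end of the uncompressed estimate for this modest site is several times a T1, which is the situation where an on-site DLC makes sense.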
There are several factors to consider when choosing a platform for a Panorama deployment. Initial factors include:
- How many concurrent administrators need to be supported?
- Does the customer have VMware virtualization infrastructure that the security team has access to?
- Does the customer require dual power supplies?
- What is the estimated configuration size?
- Will the device handle log collection as well?
Panorama Virtual Appliance
This platform operates as a virtual M-100 and shares the same log ingestion rate. Adding resources allows the virtual Panorama appliance to scale both its ingestion rate and its management capabilities. The minimum requirements for a Panorama virtual appliance running 8.0 are 8 vCPUs and 16 GB of vRAM.
When to choose Virtual Appliance?
- The customer has a large VMware infrastructure that the security team has access to
- The customer is using dedicated log collectors and is not in mixed mode
When not to choose Virtual Appliance?
- The server team and security team are separate and do not want to share infrastructure
- Customer has no virtual infrastructure
M-100 Hardware Platform
This platform has dedicated hardware and can handle up to 15 concurrent administrators. When in mixed mode, it is capable of ingesting 10,000 - 15,000 logs per second.
When to choose M-100?
- The customer needs a dedicated platform, but is very price sensitive
- The customer is using dedicated log collectors and is not in mixed mode, but does not have VM infrastructure
When not to choose M-100?
- If dual power supplies are required
- Mixed mode with more than 10k log/s or more than 8TB required for log retention
- The customer has more than 15 concurrent admins
M-500 Hardware Platform
This platform has the highest log ingestion rate, even when in mixed mode. Its higher resource availability handles larger configurations and more concurrent administrators (15-30). It offers dual power supplies and has a strong growth roadmap.
When to choose M-500?
- The customer needs a dedicated platform, and has a large or growing deployment
- The customer is using mixed mode with more than 10k log/s
- The customer wants to future-proof their investment
- Customer needs a dedicated appliance but has more than 15 concurrent admins
- Requires dual power supplies
When not to choose M-500?
- If the customer has a VM-first environment and does not need more than 48 TB of log storage
- The customer is very price sensitive
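The "when to choose" lists above can be read as a rough decision procedure. The sketch below is one possible encoding, not an official selection tool: it covers only the thresholds stated above (dual power supplies, 15 concurrent admins, 10k logs/s and 8 TB in mixed mode, VMware access, price sensitivity) and ignores configuration size and the 48 TB VM-first caveat. The field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    concurrent_admins: int
    needs_dual_power: bool
    mixed_mode: bool                   # Panorama also collects logs
    mixed_mode_lps: float = 0          # only meaningful when mixed_mode is True
    retention_tb: float = 0            # only meaningful when mixed_mode is True
    vmware_available_to_security: bool = False
    price_sensitive: bool = False


def suggest_platform(req: Requirements) -> str:
    """One simplified reading of the platform guidance above."""
    # Only the M-500 is described as offering dual power supplies.
    if req.needs_dual_power:
        return "M-500"
    # Virtual appliance: VMware the security team can use, no mixed mode.
    if req.vmware_available_to_security and not req.mixed_mode:
        return "Virtual Appliance"
    # M-100 limits: 15 concurrent admins; in mixed mode, 10k logs/s and 8 TB.
    m100_ok = req.concurrent_admins <= 15 and not (
        req.mixed_mode and (req.mixed_mode_lps > 10_000 or req.retention_tb > 8))
    if m100_ok and req.price_sensitive:
        return "M-100"
    # Otherwise favor the platform with headroom for growth.
    return "M-500"


print(suggest_platform(Requirements(concurrent_admins=10, needs_dual_power=False,
                                    mixed_mode=True, mixed_mode_lps=5000,
                                    retention_tb=4, price_sensitive=True)))
```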
This section will address design considerations when planning for a high availability deployment. Panorama high availability is Active/Passive only and both appliances need to be fully licensed. There are two aspects to high availability when deploying the Panorama solution. These aspects are Device Management and Logging. The two aspects are closely related, but each has specific design and configuration requirements.
Device Management HA: The ability to retain device management capabilities upon the loss of a Panorama device (either an M-series or virtual appliance).
Logging HA or Log Redundancy: The ability to retain firewall logs upon the loss of a Panorama device (M-series only).
Device Management HA
When deploying the Panorama solution in a high availability design, many customers choose to place HA peers in separate physical locations. From a design perspective, there are two factors to consider when deploying a pair of Panorama appliances in a High Availability configuration. These concerns are network latency and throughput.
The latency of intervening network segments affects the control traffic between the HA members. HA-related timers can be adjusted to the needs of the customer deployment. The maximum recommended value is 1000 ms.
- Preemption Hold Time: If the Preemptive option is enabled, the Preemption Hold Time is the amount of time the passive device will wait before taking the active role. In this case, both devices are up, and the timer applies to the device with the "Primary" priority.
- Promotion Hold Time: The promotion hold timer specifies the interval that the Secondary device will wait before assuming the active role. In this case, there has been a failure of the primary device and this timer applies to the Secondary device.
- Hello Interval: This timer defines the number of milliseconds between Hello packets to the peer device. Hello packets are used to verify that the peer device is operational.
- Heartbeat Interval: This timer defines the number of milliseconds between ICMP messages sent to the peer. Heartbeat packets are used to verify that the peer device is reachable.
Relation between network latency and Heartbeat interval
Because the heartbeat is used to determine reachability of the HA peer, the Heartbeat interval should be set higher than the latency of the link between the HA members.
HA Timer Presets
While customers can set their HA timers specifically to suit their environment, Panorama also has two sets of preconfigured timers that the customer can use. These presets cover a majority of customer deployments.
|Preemption Hold Time||1|
|Monitor Fail Hold Up Time||0|
|Additional Master Hold Up Time||7000|
|Preemption Hold Time||500|
|Monitor Fail Hold Up Time||0|
|Additional Master Hold Up Time||5000|
HA Sync Process
The HA sync process occurs on Panorama when a change is made to the configuration on one of the members of the HA pair. When a change is made and committed on the Active-Primary, it sends a message to the Passive-Secondary indicating that the configuration needs to be synchronized. The Passive-Secondary sends back an acknowledgement that it is ready. The Active-Primary then sends the configuration to the Passive-Secondary, which merges it and enqueues a job to commit the changes. This process must complete within three minutes of the HA-Sync message being sent from the Active-Primary Panorama. The main concerns are the size of the configuration being sent and the effective throughput of the network segment(s) that separate the HA members.
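A back-of-the-envelope check of the three-minute window: the transfer time alone is the configuration size divided by the effective throughput between the peers. This sketch ignores the acknowledgement, merge, and commit-queueing steps, so the result is only a lower bound; the 50% safety margin and the 50 MB / 2 Mbps figures are hypothetical planning inputs.

```python
HA_SYNC_LIMIT_SECONDS = 180  # sync must complete within three minutes


def config_transfer_seconds(config_size_mb: float, effective_mbps: float) -> float:
    """Seconds to send the configuration across the HA link.

    config_size_mb is in megabytes; effective_mbps is the measured effective
    throughput in megabits per second. Ignores handshake, merge, and commit
    time, so treat the result as a lower bound on total sync time.
    """
    return config_size_mb * 8 / effective_mbps


def fits_sync_window(config_size_mb: float, effective_mbps: float,
                     safety_factor: float = 0.5) -> bool:
    """True if the transfer uses at most `safety_factor` of the sync window.

    The 0.5 default is a hypothetical planning margin, not a product value.
    """
    transfer = config_transfer_seconds(config_size_mb, effective_mbps)
    return transfer <= HA_SYNC_LIMIT_SECONDS * safety_factor


print(config_transfer_seconds(50, 2.0))  # 50 MB over a 2 Mbps effective path
print(fits_sync_window(50, 2.0))
```

Here the transfer alone takes 200 seconds, already past the three-minute limit, which is the situation where the peers should be placed on a faster path or the configuration size reduced.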
The other piece of the Panorama High Availability solution is providing availability of logs in the event of a hardware failure. There are two methods for achieving this when using a log collector infrastructure (either dedicated or in mixed mode).
PAN-OS 7.0 and later include an explicit option to write each log to two log collectors in the log collector group. When this option is enabled, a device sends its logs to its primary log collector, which then replicates each log to another collector in the same group:
Log duplication ensures that there are two copies of any given log in the log collector group. This is a good option for customers who need to guarantee log availability at all times. Things to consider:
1. The replication only takes place within a log collector group.
2. The overall available storage space is halved (because each log is written twice).
3. The overall log ingestion rate will be reduced by up to 50%.
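The storage and ingestion effects listed above can be folded into a quick capacity estimate for one collector group. This sketch assumes group capacity is the sum of its members' capacities and models the worst case of the "up to 50%" ingestion reduction. The per-collector storage and ingestion figures are hypothetical inputs, not platform specifications.

```python
def group_capacity(collector_storage_tb: list[float], collector_max_lps: list[float],
                   redundancy: bool) -> tuple[float, float]:
    """Usable storage (TB) and worst-case ingestion (logs/s) for one collector group.

    Assumes capacities add across group members. With log redundancy every
    log is written twice, so usable storage is halved, and ingestion is
    reduced by the full 50% (the worst case stated in the text).
    """
    storage = sum(collector_storage_tb)
    ingestion = sum(collector_max_lps)
    if redundancy:
        storage /= 2
        ingestion *= 0.5
    return storage, ingestion


# Hypothetical group: two collectors with 8 TB and 20,000 logs/s each.
print(group_capacity([8.0, 8.0], [20_000, 20_000], redundancy=False))
print(group_capacity([8.0, 8.0], [20_000, 20_000], redundancy=True))
```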
Firewalls require an acknowledgement from the Panorama platform to which they forward logs. This means that if a firewall's primary log collector becomes unavailable, its logs are buffered and sent when the collector comes back online. There are two methods to buffer logs. The first method is to configure a separate log collector group for each log collector:
In this situation, if Log Collector 1 goes down, Firewall A and Firewall B will each store their logs on their own local log partition until the collector is brought back up. The local log partition sizes for current firewall models are:
|Model||Log Partition Size (GB)|
The second method is to place multiple log collectors into a group. In this scenario, the firewall can be configured with a priority list. If the primary log collector goes down, the next collector on the list buffers the logs until all of the collectors in the group know that the primary collector is down; at that point, new logs stop being assigned to the down collector.
In the architecture shown below, Firewall A & Firewall B are configured to send their logs to Log Collector 1 primarily, with Log Collector 2 as a backup. If Log Collector 1 becomes unreachable, the devices will send their logs to Log Collector 2. Collector 2 will buffer logs that are to be stored on Collector 1 until it can pull Collector 1 out of the rotation.
Considerations for Log Collector Group design
There are three primary reasons for configuring log collectors in a group:
- Greater log retention is required for a specific firewall (or set of firewalls) than can be provided by a single log collector (to scale retention).
- Greater ingestion capacity is required for a specific firewall than can be provided by a single log collector (to scale ingestion).
- Requirement for log redundancy.
When considering the use of log collector groups there are a couple of considerations that need to be addressed at the design stage:
- Spread ingestion across the available collectors: Multiple device forwarding preference lists can be created. This allows ingestion to be handled by multiple collectors in the collector group. For example, preference list 1 contains half of the firewalls and lists collector 1 as the primary and collector 2 as the secondary. Preference list 2 contains the remaining firewalls and lists collector 2 as the primary and collector 1 as the secondary.
- Latency matters: Network latency between collectors in a log collector group is an important factor in performance. A general design guideline is to keep all collectors that are members of the same group close together. The following table provides an idea of what you can expect at different latency measurements with redundancy enabled and disabled. In this case, 'Log Delay' is the undesired result of high latency: logs don't show up in the UI until well after they are sent to Panorama.
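The two mirrored preference lists described above can be planned ahead of time. The sketch below splits firewalls by measured log rate rather than by count (a variation on the half-and-half example, since firewalls rarely log at equal rates): each firewall, busiest first, goes to whichever list currently carries less load. The collector names "LC1"/"LC2" and the firewall rates are hypothetical.

```python
def build_preference_lists(firewall_lps: dict[str, float]) -> dict[tuple[str, str], list[str]]:
    """Split firewalls between two mirrored preference lists.

    Keys are (primary, secondary) collector pairs. Greedy balancing by
    measured log rate: the busiest remaining firewall goes to the list
    with the lower total load so far.
    """
    lists: dict[tuple[str, str], list[str]] = {("LC1", "LC2"): [], ("LC2", "LC1"): []}
    load = {key: 0.0 for key in lists}
    for name, lps in sorted(firewall_lps.items(), key=lambda item: item[1], reverse=True):
        target = min(load, key=load.get)  # list with the least load so far
        lists[target].append(name)
        load[target] += lps
    return lists


plan = build_preference_lists({"fw-a": 3000, "fw-b": 2500, "fw-c": 2000, "fw-d": 1500})
for (primary, secondary), members in plan.items():
    print(f"primary {primary}, secondary {secondary}: {members}")
```

In this example both lists end up carrying 4,500 logs/s, whereas a naive split of the first two and last two firewalls would put 5,500 logs/s on one collector and 3,500 on the other.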
|Inter LC Latency (ms)||Log Rate||Redundancy enabled||Log Delay|
Using The Sizing Worksheet
The information that you will need includes the desired retention period, the average log rate, and whether log redundancy is required.
Retention Period: Number of days that logs need to be kept.
Average Log Rate: The measured or estimated aggregate log rate.
Redundancy Required: Check this box if log redundancy is required.
Storage for Detailed Logs: The amount of storage (in Gigabytes) required to meet the retention period for detailed logs.
Total Storage Required: The storage (in gigabytes) to be purchased. This accounts for all log types at the default quota settings.
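The worksheet's three inputs and two outputs can be approximated with the storage formula from the Calculating Required Storage section. This is a sketch, not the worksheet itself; it assumes decimal gigabytes, the PAN-OS 8.0 size of 500 bytes per log, the default 60% detailed-log quota, and that log redundancy doubles the detailed-log requirement because each log is written twice.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400
LOG_SIZE_BYTES = 500      # PAN-OS 8.0 per-log size, including indices
DETAILED_QUOTA = 0.60     # default share of storage for detailed logs
BYTES_PER_GB = 10**9      # assumption: decimal gigabytes


@dataclass
class SizingResult:
    detailed_gb: float    # "Storage for Detailed Logs"
    total_gb: float       # "Total Storage Required"


def size_storage(retention_days: int, avg_log_rate: float, redundancy: bool) -> SizingResult:
    """Approximate the worksheet outputs from its three inputs."""
    detailed = avg_log_rate * SECONDS_PER_DAY * retention_days * LOG_SIZE_BYTES / BYTES_PER_GB
    if redundancy:
        detailed *= 2  # assumption: each log is stored on two collectors
    return SizingResult(detailed_gb=detailed, total_gb=detailed / DETAILED_QUOTA)


for redundant in (False, True):
    r = size_storage(30, 1500, redundancy=redundant)
    print(f"redundancy={redundant}: {r.detailed_gb:.0f} GB detailed, {r.total_gb:.0f} GB total")
```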
Example Use Cases