Panorama Sizing and Design Guide - Knowledge Base - Palo Alto Networks

Panorama Sizing and Design Guide

Created On 09/25/18 19:43 PM - Last Modified 05/24/24 19:00 PM

Symptom

This article contains a brief overview of the Panorama solution, which comprises two overall functions: Device Management and Log Collection/Reporting.

Note: For log collector design and sizing, a newer version of this article is available at How to Design and Size Panorama Log Collector Environments.

Environment

Any Panorama Device
PAN-OS 7.1 and above.
Device Management and Log Collection

Resolution

Panorama Management and Logging Overview

The Panorama solution comprises two overall functions: Device Management and Log Collection/Reporting. A brief overview of these two main functions follows:

Device Management: This includes activities such as configuration management and deployment, as well as deployment of PAN-OS and content updates.

Log Collection: This includes collecting logs from one or multiple firewalls, either to a single Panorama or to a distributed log collection infrastructure. In addition to collecting logs from deployed firewalls, reports can be generated based on that log data, whether it resides locally on the Panorama (e.g. a single M-Series or VM appliance) or on a distributed logging infrastructure.

The Panorama solution allows for flexibility in design by assigning these functions to different physical pieces of the management infrastructure. For example: Device management may be performed from a VM Panorama, while the firewalls forward their logs to colocated dedicated log collectors:

In the example above, the device management function and reporting are performed on a VM Panorama appliance. There are three log collector groups. Group A contains two log collectors and receives logs from three standalone firewalls. Group B consists of a single collector and receives logs from a pair of firewalls in an Active/Passive high availability (HA) configuration. Group C also contains two log collectors and receives logs from two HA pairs of firewalls. The number of log collectors in any given location depends on a number of factors. The design considerations are covered below.

Note:
As of PAN-OS 8.1, any platform can be configured not only as a dedicated manager but also as a dedicated log collector. For further details, refer to the following TechDoc: Admin Guide Setup The Panorama Virtual Appliance as a Log Collector.

Log Collection

Managed Devices

While all current Panorama platforms have an upper limit of 1000 devices for management purposes (5000 firewalls using a single or M-600 since PAN-OS 9.0), it is important for Panorama sizing to understand what the incoming log rate will be from all managed devices. To start with, take an inventory of the total firewall appliances that will be managed by Panorama.

Use the following spreadsheet to take an inventory of your devices that need to store logs:

MODEL	PAN-OS (Major Branch #)	Location	Measured Average Log Rate
Ex: 5060	Ex: 6.1.0	Ex: Main Data Center	Ex: 2500 logs/s
Ex: 3050	Ex: 8.x	Ex: HK	Ex: 98 logs/s
Ex: 3060	Ex: 8.x	Ex: Sing	Ex: 93 logs/s
Ex: 3250	Ex: 8.x	Ex: NL	Ex: 266 logs/s
Ex: 3260	Ex: 8.x	Ex: US	Ex: 408 logs/s

Read the following article to learn how to determine the log rate yourself:
How to Determine Log Rate on VM Panorama or M-100 with a Log-Collector
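
As a rough illustration, the inventory can be kept as structured data so the aggregate log rate is easy to total overall and per site. This is a minimal sketch: the `Firewall` class and the helper names are hypothetical, and the rows are simply the example rows from the table above.

```python
from dataclasses import dataclass


@dataclass
class Firewall:
    model: str
    pan_os: str
    location: str
    avg_lps: float  # measured average log rate, in logs per second


# Example rows copied from the inventory table above.
inventory = [
    Firewall("5060", "6.1.0", "Main Data Center", 2500),
    Firewall("3050", "8.x", "HK", 98),
    Firewall("3060", "8.x", "Sing", 93),
    Firewall("3250", "8.x", "NL", 266),
    Firewall("3260", "8.x", "US", 408),
]


def total_lps(firewalls):
    """Aggregate log rate across all inventoried firewalls."""
    return sum(fw.avg_lps for fw in firewalls)


def lps_by_location(firewalls):
    """Aggregate log rate per site, useful when placing collectors."""
    totals = {}
    for fw in firewalls:
        totals[fw.location] = totals.get(fw.location, 0) + fw.avg_lps
    return totals


print(total_lps(inventory))  # 3365
```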

Logging Requirements

This section will cover the information needed to properly size and deploy Panorama logging infrastructure to support customer requirements. There are three main factors when determining the amount of total storage required and how to allocate that storage via Distributed Log Collectors. These factors are:

Log Ingestion Requirements: This is the total number of logs that will be sent per second to the Panorama infrastructure.
Log Storage Requirements: This is the timeframe for which the customer needs to retain logs on the management platform. There are different driving factors for this including both policy based and regulatory compliance motivators.
Device Location: The physical location of the firewalls can drive the decision to place DLC appliances at remote locations based on WAN bandwidth etc.

Each of these factors is discussed in the sections below:

Log Ingestion Requirements

The aggregate log forwarding rate for managed devices needs to be understood in order to avoid a design where more logs are regularly being sent to Panorama than it can receive, process, and write to disk. The table below outlines the maximum number of logs per second that each hardware platform can forward to Panorama and can be used when designing a solution to calculate the maximum number of logs that can be forwarded to Panorama in the customer environment.

Device Log Forwarding

Platform	Supported Logs per Second (LPS)
PA-200	250
PA-220	1,200
PA-500	625
PA-820/850	10,000
PA-3000 series	10,000
PA-3220	7,000
PA-3250	15,000
PA-3260	24,000
PA-5050/60	10,000
PA-5220	30,000
PA-5250	55,000
PA-5260	To Be Tested
PA-7050/7080	70,000
VM-50	1,250
VM-100/200	2,500
VM-300/1000-HV	8,000
VM-500	8,000
VM-700	10,000

The above numbers are all maximum values. In live deployments, the actual log rate is generally some fraction of the supported maximum. Determining actual log rate is heavily dependent on the customer's traffic mix and isn't necessarily tied to throughput. For example, a single offloaded SMB session will show high throughput but only generate one traffic log. Conversely, you can have a smaller throughput made up of thousands of UDP DNS queries that each generate a separate traffic log. For sizing, a rough correlation can be drawn between connections per second and logs per second.

Panorama Log Ingestion

The log ingestion rate on Panorama is influenced by the platform and mode in use (mixed mode versus logger mode). The table below shows the ingestion rates for Panorama on the different available platforms and modes of operation. The numbers in parentheses next to VM denote the number of CPUs and gigabytes of RAM assigned to the VM.

Platform	Mixed	Dedicated
VM (8/16)	10,000	18,000
M-200	10,000	28,000
M-300	16,500	33,000
M-500	15,000	30,000
M-600	25,000	50,000
M-700	36,500	77,000
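
The Mixed/Dedicated table above can be used as a quick filter when shortlisting platforms for a given aggregate log rate. The sketch below copies the table values into a dictionary; the function name is hypothetical, and the check deliberately ignores storage, redundancy and growth headroom.

```python
# Ingestion rates (logs per second) copied from the Mixed/Dedicated table above.
PANORAMA_LPS = {
    "VM (8/16)": {"mixed": 10_000, "dedicated": 18_000},
    "M-200": {"mixed": 10_000, "dedicated": 28_000},
    "M-300": {"mixed": 16_500, "dedicated": 33_000},
    "M-500": {"mixed": 15_000, "dedicated": 30_000},
    "M-600": {"mixed": 25_000, "dedicated": 50_000},
    "M-700": {"mixed": 36_500, "dedicated": 77_000},
}


def platforms_that_fit(required_lps, mode):
    """Return platforms whose rated ingestion in `mode` covers required_lps."""
    return [
        name
        for name, rates in PANORAMA_LPS.items()
        if rates[mode] >= required_lps
    ]


print(platforms_that_fit(20_000, "mixed"))  # ['M-600', 'M-700']
```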

Methods for Determining Log Rate

New Customer:

Leverage information from existing customer sources. Many customers have a third-party logging solution in place such as Splunk, ArcSight, Qradar, etc. The number of logs sent from their existing firewall solution can be pulled from those systems. When using this method, get a log count from the third-party solution for a full day and divide by 86,400 (the number of seconds in a day). Do this for several days to get an average. Be sure to include both business and non-business days, as there is usually a large variance in log rate between the two.
Use data from an evaluation device. This information can provide a very useful starting point for sizing purposes and, with input from the customer, data can be extrapolated for other sites in the same design. This method has the advantage of yielding an average over several days. A script (with instructions) to assist with calculating this information is attached to this document. To use it, download the file named "ts_lps.zip", unpack the zip file, and reference the README.txt for instructions.
If no information is available, use the Device Log Forwarding table above as a reference point. This will be the least accurate method for any particular customer.
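
The daily-count method in the first item above amounts to a short calculation. The following is a minimal sketch, not the attached script: the function name is hypothetical and the daily totals are invented purely to show a mix of business and non-business days.

```python
SECONDS_PER_DAY = 86_400


def average_lps(daily_log_counts):
    """Average logs per second over several full days of log counts."""
    if not daily_log_counts:
        raise ValueError("need at least one daily count")
    per_day = [count / SECONDS_PER_DAY for count in daily_log_counts]
    return sum(per_day) / len(per_day)


# Hypothetical daily totals: five business days and two weekend days.
counts = [129_600_000] * 5 + [43_200_000] * 2
print(round(average_lps(counts)))  # 1214
```

In this made-up week the business days run at 1,500 logs per second and the weekend at 500, which is why sampling only business days would overstate the average.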

Existing Customer:

For existing customers, we can leverage data gathered from their existing firewalls and log collectors:

To check the log rate of a single firewall, download the attached file named "Device.zip", unpack the zip file, and reference the README.txt file for instructions. This package will query a single firewall over a specified period of time (you can choose how many samples) and give an average number of logs per second for that period. At minimum, this script should be run for 24 consecutive hours on a business day. Running the script for a full week will help capture the cyclical ebb and flow of the network. If the customer does not have a log collector, this process will need to be run against each firewall in the environment.
If the customer has a log collector (or log collectors), download the attached file named "lc_lps.zip", unpack the zip file, and reference the README.txt file for instructions. This package will query the log collector MIB to take a sample of the incoming log rate over a specified period.

Log Storage Requirements

Factors Affecting Log Storage Requirements

There are several factors that drive log storage requirements. Most of these requirements are regulatory in nature. Customers may need to meet compliance requirements for HIPAA, PCI, or Sarbanes-Oxley.

There are other governmental and industry standards that may need to be considered. Additionally, some companies have internal requirements, for example that a certain number of days' worth of logs be maintained on the original management platform. Ensure that all of these requirements are addressed with the customer when designing a log storage solution.

The focus is on the minimum number of days' worth of logs that need to be stored. If there is a maximum number of days required (due to regulation or policy), you can set the maximum number of days to keep logs in the quota configuration.

Calculating Required Storage

Calculating required storage space based on a given customer's requirements is a fairly straightforward process, but it can be labor intensive when higher degrees of accuracy are needed. With PAN-OS 8.0, the aggregated size of all log types is 500 bytes. This number accounts for both the logs themselves and the associated indices. The Threat database is the data source for Threat logs as well as URL, WildFire Submissions, and Data Filtering logs.

Note that Panorama may not be the logging solution for long-term archival. In these cases, suggest syslog forwarding for archival purposes.

The equation to determine the storage requirements for particular log type is:

Storage Requirement Calculation

Example: Customer wants to be able to keep 30 days worth of traffic logs with a log rate of 1500 logs per second:

Retention Calc Example.png

The result of the above calculation accounts for detailed logs only. Default quota settings reserve 60% of the available storage for detailed logs. This means that the calculated number represents 60% of the total storage that will need to be purchased. To calculate the total storage required, divide this number by 0.60:

Total Storage Example.png
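
Because the two calculation figures survive here only as image names, the sketch below reconstructs the arithmetic from the surrounding text: log rate, 86,400 seconds per day, retention days, 500 bytes per log, and the 60% detailed-log quota. The function names are hypothetical, and reporting in decimal terabytes is an assumption; the original figures may use a different unit conversion and therefore show slightly different numbers.

```python
SECONDS_PER_DAY = 86_400
LOG_SIZE_BYTES = 500        # aggregated size per log with PAN-OS 8.0 (from the text)
DETAILED_LOG_QUOTA = 0.60   # default share of storage for detailed firewall logs


def detailed_log_bytes(lps, retention_days, log_size=LOG_SIZE_BYTES):
    """Bytes needed to retain detailed logs for the requested period."""
    return lps * SECONDS_PER_DAY * retention_days * log_size


def total_storage_bytes(lps, retention_days, quota=DETAILED_LOG_QUOTA):
    """Total storage to purchase, given the detailed-log quota share."""
    return detailed_log_bytes(lps, retention_days) / quota


# The example from the text: 30 days of logs at 1500 logs per second.
detailed = detailed_log_bytes(1500, 30)
total = total_storage_bytes(1500, 30)
# Decimal terabytes (1 TB = 1e12 bytes) are an assumption of this sketch.
print(f"detailed: {detailed / 1e12:.3f} TB, total: {total / 1e12:.2f} TB")
# detailed: 1.944 TB, total: 3.24 TB
```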

Default log quotas for Panorama 8.0 and later are as follows:

Log Type	% Storage
Detailed Firewall Logs	60
Summary Firewall Logs	30
Infrastructure and Audit Logs	5
Palo Alto Networks Platform Logs	.1
3rd Party External Logs	.1

The attached worksheet will take into account the default quota on Panorama and provide a total amount of storage required.

Calculating Required Storage For Logging Service

There are three different cases for sizing log collection using the Logging Service. For in depth sizing guidance, refer to Sizing Storage For The Logging Service.

Log collection for Palo Alto Networks Next Generation Firewalls
Log collection for GlobalProtect Cloud Service Mobile User
Log collection for GlobalProtect Cloud Service Remote Office

Log Collection for Palo Alto Next Generation Firewalls

The log sizing methodology for firewalls logging to the Logging Service is the same as when sizing for on-premises log collectors. The only difference is the size of the log on disk. In the Logging Service, both threat and traffic logs can be calculated using a size of 1500 bytes.

Log Collection for GlobalProtect Cloud Service Mobile User

Per-user log generation depends heavily on both the type of user and the workloads being executed in that environment. On average, 1 TB of storage on the Logging Service will provide 30 days of retention for 5000 users. An advantage of the Logging Service is that adding storage is much simpler than in a traditional on-premises distributed collection environment. This means that if your environment is significantly busier than the average, it is a simple matter to add whatever storage is necessary to meet your retention requirements.

Log Collection for GlobalProtect Cloud Service Remote Office

GlobalProtect Cloud Service (GPCS) for remote offices is sold based on bandwidth. While log rate is largely driven by connection rate and traffic mix, in sample enterprise environments log generation occurs at a rate of approximately 1.5 logs per second per megabit of throughput. The attached sizing work sheet uses this rate and takes into account busy/off hours in order to provide an estimated average log rate.
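
The three Logging Service cases above reduce to simple rules of thumb. The sketch below is only an approximation of them, not the attached worksheet: the function names are hypothetical, linear scaling of the mobile-user figure is an assumption, decimal terabytes are assumed, and the remote-office estimate omits the busy/off-hours weighting the worksheet applies.

```python
SECONDS_PER_DAY = 86_400
LS_LOG_SIZE_BYTES = 1500          # threat/traffic log size in the Logging Service
TB_PER_5000_USERS_30_DAYS = 1.0   # average figure quoted for mobile users
LPS_PER_MBPS = 1.5                # sample-enterprise figure for remote offices


def firewall_storage_tb(lps, retention_days):
    """Storage for firewall logs, in decimal TB (an assumption)."""
    return lps * SECONDS_PER_DAY * retention_days * LS_LOG_SIZE_BYTES / 1e12


def mobile_user_storage_tb(users, retention_days):
    """Scale the 1 TB / 5000 users / 30 days average linearly (an assumption)."""
    return TB_PER_5000_USERS_30_DAYS * (users / 5000) * (retention_days / 30)


def remote_office_lps(bandwidth_mbps):
    """Estimated log rate from purchased bandwidth, with no busy/off-hours weighting."""
    return LPS_PER_MBPS * bandwidth_mbps


print(mobile_user_storage_tb(10_000, 90))  # 6.0
print(remote_office_lps(200))              # 300.0
```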

LogDB Storage Quotas

Storage quotas were simplified starting in PAN-OS version 8.0. Detail and summary logs each have their own quota, regardless of type (traffic/threat):

Log Type	Quota (%)
Detailed Firewall Logs	60
Summary Firewall Logs	30
Infrastructure and Audit Logs	5
Palo Alto Networks Platform Logs	.1
3rd Party External Logs	.1
Total	95.2

Device Location

The last design consideration for logging infrastructure is the location of the firewalls relative to the Panorama platform they are logging to. If the device is separated from Panorama by a low-speed network segment (e.g. T1/E1), it is recommended to place a Dedicated Log Collector (DLC) on site with the firewall. This allows log forwarding to be confined to the higher-speed LAN segment while allowing Panorama to query the log collector when needed. For reference, the following tables show bandwidth usage for log forwarding at different log rates. This includes both logs sent to Panorama and the acknowledgement from Panorama to the firewall. Note that for both the 7000 series and 5200 series, logs are compressed during transmission.

Log Forwarding Bandwidth

Log Rate (LPS)	Bandwidth Used
1300	8 Mbps
8000	56 Mbps
10000	64 Mbps
16000	52.8 - 140.8 Mbps (96.8)

Log Forwarding Bandwidth - 7000 and 5200 Series

Log Rate (LPS)	Bandwidth Used
1300	.6 Mbps
8000	4 Mbps
10000	4.5 Mbps
16000	5 - 10 Mbps
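
One conservative way to use the two bandwidth tables above is to round a site's log rate up to the next tabulated row. This is a sketch under stated assumptions: the function name is hypothetical, the upper end of each quoted range is used for the 16000 LPS rows, and no interpolation between rows is attempted.

```python
import bisect

# (log rate in LPS, bandwidth in Mbps) copied from the tables above. For the
# 16000 LPS rows the upper end of the quoted range is used to stay conservative.
UNCOMPRESSED = [(1300, 8.0), (8000, 56.0), (10000, 64.0), (16000, 140.8)]
COMPRESSED = [(1300, 0.6), (8000, 4.0), (10000, 4.5), (16000, 10.0)]  # 7000/5200 series


def bandwidth_ceiling_mbps(lps, compressed=False):
    """Bandwidth of the smallest tabulated log rate that is >= lps."""
    table = COMPRESSED if compressed else UNCOMPRESSED
    rates = [rate for rate, _ in table]
    idx = bisect.bisect_left(rates, lps)
    if idx == len(table):
        raise ValueError("log rate exceeds the tabulated range")
    return table[idx][1]


print(bandwidth_ceiling_mbps(5000))                   # 56.0
print(bandwidth_ceiling_mbps(5000, compressed=True))  # 4.0
```

Even the lowest uncompressed row (8 Mbps) exceeds a T1 link's 1.544 Mbps, which is consistent with the recommendation to place a collector on site behind such links.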

Device Management

There are several factors to consider when choosing a platform for a Panorama deployment. Initial factors include:

How many concurrent administrators need to be supported?
Does the customer have VMware virtualization infrastructure that the security team has access to?
Does the customer require dual power supplies?
What is the estimated configuration size?
Will the device handle log collection as well?

Panorama Virtual Appliance

This platform operates as a virtual M-100 and shares the same log ingestion rate. Adding resources allows the virtual Panorama appliance to scale both its ingestion rate and its management capabilities. The minimum requirements for a Panorama virtual appliance running 8.1, 9.0, or 9.1 are 16 vCPUs and 32 GB vRAM.

When to choose Virtual Appliance?

The customer has a large VMware infrastructure that the security team has access to
The customer is using dedicated log collectors and is not in mixed mode

When not to choose Virtual Appliance?

Server team and Security team are separate and do not want to share
Customer has no virtual infrastructure

M-100 Hardware Platform

This platform has dedicated hardware and can handle up to 15 concurrent administrators. In mixed mode, it is capable of ingesting 10,000 - 15,000 logs per second.

When to choose M-100?

The customer needs a dedicated platform, but is very price sensitive
The customer is using dedicated log collectors, is not in mixed mode, and does not have VM infrastructure

When not to choose M-100?

If dual power supplies are required
Mixed mode with more than 10k logs/s or more than 8 TB required for log retention
The customer has more than 15 concurrent admins

M-500 Hardware Platform

This platform has the highest log ingestion rate, even when in mixed mode. The higher resource availability will handle larger configurations and more concurrent administrators (15-30). It offers dual power supplies and has a strong growth roadmap.

When to choose M-500?

The customer needs a dedicated platform, and has a large or growing deployment
The customer is using dual mode with more than 10k logs/s
The customer wants to future-proof their investment
Customer needs a dedicated appliance but has more than 15 concurrent admins
Requires dual power supplies

When not to choose M-500?

If the customer has VM first environment and does not need more than 48 TB of log storage
The customer is very price sensitive

High Availability

This section will address design considerations when planning for a high availability deployment. Panorama high availability is Active/Passive only and both appliances need to be fully licensed. There are two aspects to high availability when deploying the Panorama solution. These aspects are Device Management and Logging. The two aspects are closely related, but each has specific design and configuration requirements.

Device Management HA: The ability to retain device management capabilities upon the loss of a Panorama device (either an M-series or virtual appliance).

Logging HA or Log Redundancy: The ability to retain firewall logs upon the loss of a Panorama device (M-series only).

Device Management HA

When deploying the Panorama solution in a high availability design, many customers choose to place HA peers in separate physical locations. From a design perspective, there are two factors to consider when deploying a pair of Panorama appliances in a High Availability configuration. These concerns are network latency and throughput.

Network Latency

The latency of intervening network segments affects the control traffic between the HA members. HA related timers can be adjusted to the need of the customer deployment. The maximum recommended value is 1000 ms.

Preemption Hold Time: If the Preemptive option is enabled, the Preemption Hold Time is the amount of time the passive device will wait before taking the active role. In this case, both devices are up, and the timer applies to the device with the "Primary" priority.
Promotion Hold Time: The promotion hold timer specifies the interval that the Secondary device will wait before assuming the active role. In this case, there has been a failure of the primary device and this timer applies to the Secondary device.
Hello Interval: This timer defines the number of milliseconds between Hello packets to the peer device. Hello packets are used to verify that the peer device is operational.
Heartbeat Interval: This timer defines the number of milliseconds between ICMP messages sent to the peer. Heartbeat packets are used to verify that the peer device is reachable.

Relation between network latency and Heartbeat interval

Because the heartbeat is used to determine reachability of the HA peer, the Heartbeat interval should be set higher than the latency of the link between the HA members.

HA Timer Presets

While customers can set their HA timers specifically to suit their environment, Panorama also has two sets of preconfigured timers that the customer can use. These presets cover a majority of customer deployments.

Recommended:

Timer	Setting
Preemption Hold Time	1
Hello Interval	8000
Heartbeat Interval	2000
Monitor Fail Hold Up Time	0
Additional Master Hold Up Time	7000

Aggressive:

Timer	Setting
Preemption Hold Time	500
Hello Interval	8000
Heartbeat Interval	1000
Monitor Fail Hold Up Time	0
Additional Master Hold Up Time	5000
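
The guideline that the Heartbeat interval should exceed the latency between HA members can be checked against the two presets above. This is a minimal sketch: the function name is hypothetical, and it assumes the measured latency is expressed in milliseconds, the same unit the text gives for the Heartbeat interval.

```python
# Heartbeat intervals in milliseconds, copied from the preset tables above.
PRESET_HEARTBEAT_MS = {"recommended": 2000, "aggressive": 1000}


def presets_suitable_for(latency_ms):
    """Presets whose heartbeat interval is higher than the measured link latency."""
    return [
        name
        for name, heartbeat_ms in PRESET_HEARTBEAT_MS.items()
        if heartbeat_ms > latency_ms
    ]


print(presets_suitable_for(300))   # ['recommended', 'aggressive']
print(presets_suitable_for(1200))  # ['recommended']
```

An empty result would suggest that neither preset fits and that custom timers are needed for that link.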

Configuration Sync

HA Sync Process

HA Config Sync

The HA sync process occurs on Panorama when a change is made to the configuration on one of the members in the HA pair. When a change is made and committed on the Active-Primary, it will send a message to the Active-Secondary that the configuration needs to be synchronized. The Active-Secondary will send back an acknowledgement that it is ready. The Active-Primary will then send the configuration to the Active-Secondary. The Active-Secondary will merge the configuration sent by the Active-Primary and enqueue a job to commit the changes. This process must complete within three minutes of the HA-Sync message being sent from the Active-Primary Panorama. The main concern is the size of the configuration being sent and the effective throughput of the network segment(s) that separate the HA members.
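
A back-of-the-envelope check of the three-minute window compares configuration size against effective throughput. The sketch below covers only the transfer step; the handshake, merge, and job enqueue also consume part of the window, so passing this check is necessary but not sufficient. The function names and the example sizes are hypothetical.

```python
SYNC_WINDOW_SECONDS = 180  # three minutes, from the text


def transfer_seconds(config_size_mb, throughput_mbps):
    """Time to send the configuration: megabytes -> megabits, then / Mbps."""
    return config_size_mb * 8 / throughput_mbps


def transfer_fits_window(config_size_mb, throughput_mbps,
                         window=SYNC_WINDOW_SECONDS):
    """True if the raw transfer alone completes inside the sync window."""
    return transfer_seconds(config_size_mb, throughput_mbps) < window


# Hypothetical 40 MB configuration over two different links.
print(transfer_fits_window(40, 1.5))  # False (about 213 seconds)
print(transfer_fits_window(40, 10))   # True (32 seconds)
```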

Log Availability

The other piece of the Panorama High Availability solution is providing availability of logs in the event of a hardware failure. There are two methods for achieving this when using a log collector infrastructure (either dedicated or in mixed mode).

Log Redundancy

PAN-OS 7.0 and later include an explicit option to write each log to two log collectors in the log collector group. When this option is enabled, a device sends its logs to its primary log collector, which then replicates each log to another collector in the same group:

Log Redundancy

Log duplication ensures that there are two copies of any given log in the log collector group. This is a good option for customers who need to guarantee log availability at all times. Things to consider:

1. The replication only takes place within a log collector group.

2. The overall available storage space is halved (because each log is written twice).

3. Overall log ingestion rate will be reduced by up to 50%.
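
The second and third considerations can be folded into a sizing estimate. This is a minimal sketch with a hypothetical function name: it treats the "up to 50%" ingestion reduction as a worst case, so the returned log rate is a floor rather than a prediction.

```python
def effective_capacity(raw_storage_tb, raw_lps, redundancy_enabled):
    """Return (usable storage in TB, worst-case ingestion in LPS) for a group."""
    if not redundancy_enabled:
        return raw_storage_tb, raw_lps
    # Each log is written twice, so usable storage halves. Ingestion may drop
    # by as much as 50%, so the halved rate is a worst-case floor.
    return raw_storage_tb / 2, raw_lps / 2


# Hypothetical collector group with 16 TB raw storage and 60,000 LPS rated ingestion.
print(effective_capacity(16, 60_000, redundancy_enabled=True))  # (8.0, 30000.0)
```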

Log Buffering

Firewalls require an acknowledgement from the Panorama platform that they are forwarding logs to. This means that in the event that the firewall's primary log collector becomes unavailable, the logs will be buffered and sent when the collector comes back online. There are two methods to buffer logs. The first method is to configure separate log collector groups for each log collector:

Log Buffering

In this situation, if Log Collector 1 goes down, Firewall A & Firewall B will each store their logs on their own local log partition until the collector is brought back up. The local log partition sizes for current firewall models are:

Model	Log Partition Size (GB)
PA-200	2.4
PA-220	32
PA-800 Series	172
PA-3000 Series	90
PA-3200 Series	125
PA-5000 Series	88
PA-5200 Series	1800
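
The partition sizes above give a rough upper bound on how long a firewall could buffer logs during a collector outage. The sketch rests on several assumptions that the source does not state: the whole partition is available for buffering, each log occupies 500 bytes on the firewall (the figure quoted earlier for Panorama), and GB means 10^9 bytes. The function name is hypothetical, and real buffering time will be shorter.

```python
# Log partition sizes in GB, copied from the table above.
PARTITION_GB = {
    "PA-200": 2.4,
    "PA-220": 32,
    "PA-800 Series": 172,
    "PA-3000 Series": 90,
    "PA-3200 Series": 125,
    "PA-5000 Series": 88,
    "PA-5200 Series": 1800,
}


def buffer_hours(model, lps, bytes_per_log=500):
    """Upper-bound hours of buffering; bytes_per_log is an assumed value."""
    capacity_bytes = PARTITION_GB[model] * 1e9  # decimal GB assumed
    return capacity_bytes / (lps * bytes_per_log) / 3600


print(buffer_hours("PA-3000 Series", 1000))  # 50.0
```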

The second method is to place multiple log collectors into a group. In this scenario, the firewall can be configured with a priority list, so if the primary log collector goes down, the second collector on the list buffers the logs until all of the collectors in the group know that the primary collector is down, at which time new logs stop being assigned to the down collector.

In the architecture shown below, Firewall A & Firewall B are configured to send their logs to Log Collector 1 primarily, with Log Collector 2 as a backup. If Log Collector 1 becomes unreachable, the devices will send their logs to Log Collector 2. Collector 2 will buffer logs that are to be stored on Collector 1 until it can pull Collector 1 out of the rotation.

Collector Group - No Log Redundancy

Considerations for Log Collector Group design

There are three primary reasons for configuring log collectors in a group:

Greater log retention is required for a specific firewall (or set of firewalls) than can be provided by a single log collector (to scale retention).
Greater ingestion capacity is required for a specific firewall than can be provided by a single log collector (to scale ingestion).
Requirement for log redundancy.

When considering the use of log collector groups, a couple of considerations need to be addressed at the design stage:

Spread ingestion across the available collectors: Multiple device forwarding preference lists can be created. This allows ingestion to be handled by multiple collectors in the collector group. For example, preference list 1 will have half of the firewalls and list collector 1 as the primary and collector 2 as the secondary. Preference list 2 will have the remainder of the firewalls and list collector 2 as the primary and collector 1 as the secondary.
Latency matters: Network latency between collectors in a log collector group is an important factor in performance. A general design guideline is to keep all collectors that are members of the same group close together. The following table provides an idea of what you can expect at different latency measurements with redundancy enabled and disabled. In this case, 'Log Delay' is the undesired result of high latency - logs don't show up in the UI until well after they are sent to Panorama.

Inter LC Latency (ms)	Log Rate	Redundancy enabled	Log Delay
50	10K	No	No
100	5K	No	No
100	10K	No	Yes
50	5K	Yes	No
50	10K	Yes	Yes
100	5K	Yes	No
150	3K	Yes	No
150	5K	Yes	Yes
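
The preference-list split described under "Spread ingestion across the available collectors" can be sketched as follows. The function, the collector names, and the firewall names are all hypothetical; the code only illustrates the half-and-half assignment with mirrored primary and secondary collectors.

```python
def build_preference_lists(firewalls, collectors=("collector-1", "collector-2")):
    """Split firewalls into two lists with mirrored collector priority order."""
    first, second = collectors
    half = (len(firewalls) + 1) // 2  # first list takes the extra device if odd
    return [
        {"firewalls": firewalls[:half], "order": [first, second]},
        {"firewalls": firewalls[half:], "order": [second, first]},
    ]


lists = build_preference_lists(["fw-a", "fw-b", "fw-c", "fw-d"])
for pref in lists:
    print(pref["firewalls"], "->", pref["order"])
```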