High Availability Considerations on AWS and Azure
As customers begin using the VM-Series to protect their business-critical applications and data in the public cloud, the question “Do you support high availability in AWS or Azure?” arises. The original November 2016 post (below) did not answer the question clearly. The answer is yes: you can deploy an architecture with the VM-Series on AWS and Azure that delivers the high availability and resiliency required for enterprise application deployments. However, the devil is in the implementation details.
VM-Series Active-Passive High Availability on AWS
On AWS, the VM-Series supports active-passive high availability using two VM-Series firewalls (active and passive) deployed within a single AWS Availability Zone. If a failure occurs, the AWS ENI that is linked to the active VM-Series firewall is moved to the passive VM-Series firewall. The ENI move is done via an API call to AWS that typically takes up to 60 seconds but sometimes longer. The delay is a byproduct of how the AWS fabric functions, and not controlled by the VM-Series. During that time, some sessions may be lost, yet state is maintained.
Using active-passive in this manner does deliver high availability in the traditional definition, but with caveats. In addition to the failover lag time, this active-passive HA cannot span multiple Availability Zones because AWS does not allow an ENI to move across AZs. Additionally, both VM-Series licenses are active, as are the AWS resources required to keep them running, which adds cost.
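The failover behavior and the single-AZ constraint described above can be illustrated with a small model. This is only a sketch in plain Python: the `Firewall` class, the `fail_over` function, and the ENI and AZ names are hypothetical and do not call any AWS or PAN-OS API.

```python
from dataclasses import dataclass, field


@dataclass
class Firewall:
    name: str
    az: str                                   # Availability Zone the firewall runs in
    enis: list = field(default_factory=list)  # ENIs currently attached
    healthy: bool = True


def fail_over(active: Firewall, passive: Firewall) -> Firewall:
    """Move every ENI from the failed active firewall to its passive peer."""
    if active.az != passive.az:
        # Models the constraint in the text: an ENI cannot move across AZs.
        raise ValueError("ENI move cannot span Availability Zones")
    passive.enis.extend(active.enis)
    active.enis.clear()
    active.healthy = False
    return passive


fw_a = Firewall("fw-a", "us-east-1a", ["eni-untrust", "eni-trust"])
fw_b = Firewall("fw-b", "us-east-1a")
new_active = fail_over(fw_a, fw_b)
print(new_active.name, new_active.enis)
```

In a real deployment the move is an API call to AWS and takes the time noted above; the model only shows why both peers have to sit in the same Availability Zone.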
VM-Series High Availability on AWS (Inbound using Auto Scaling & ELB Integration)
An alternative approach to deliver data center level high availability is to utilize the cloud fabric to build HA that can span multiple AZs into your deployment. The VM-Series Auto Scaling and ELB integration allows you to accomplish this end goal for inbound traffic.
Auto Scaling for the VM-Series on AWS deploys multiple firewalls across two Availability Zones within a VPC. If any one of the VM-Series firewalls fails, two things happen: First, the AWS Load Balancer detects the failure and diverts traffic to the remaining, healthy VM-Series firewalls – this typically happens in a few seconds depending on the health probe settings. Second, the AWS Auto Scaling Groups will automatically remove unhealthy firewalls and replace them with new, bootstrapped VM-Series firewalls that come up fully configured, licensed, and ready to handle traffic.
Depending on the balance of performance and cost sensitivity, the health checks and Auto Scaling Groups can be tuned to be very aggressive or very conservative about detecting and replacing failed components. This allows you to make your own cost/benefit decision when designing your deployment. The VM-Series not only automatically scales in and out, it also is self-healing providing an overall, highly available solution across multiple Availability Zones.
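As a rough illustration of that tuning trade-off, the time a load balancer needs to mark a firewall unhealthy can be approximated as the health check interval multiplied by the number of consecutive failed checks required. The function name and the two example settings below are assumptions chosen for illustration, not recommended values.

```python
def detection_seconds(interval_s: int, unhealthy_threshold: int) -> int:
    """Approximate time to declare a target unhealthy.

    Ignores per-check timeouts and assumes the failure happens
    right after a successful check, so treat it as an estimate.
    """
    return interval_s * unhealthy_threshold


aggressive = detection_seconds(interval_s=5, unhealthy_threshold=2)
conservative = detection_seconds(interval_s=30, unhealthy_threshold=5)

print(f"aggressive:   ~{aggressive} s")    # ~10 s
print(f"conservative: ~{conservative} s")  # ~150 s
```

Aggressive settings shorten the outage but raise the chance of replacing a firewall that was only briefly slow, which is the cost/benefit decision mentioned above.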
VM-Series on Azure Active/Passive High Availability
For customers that are moving data center applications to Azure, traditional active/passive high availability for the VM-Series on Azure is supported using PAN-OS 9.0. High availability is achieved using floating IP addresses combined with secondary IP addresses. When the active firewall goes down, the floating IP addresses move from the active to the passive firewall, so the passive firewall can seamlessly secure traffic as soon as it becomes the active peer.
Using active passive in this manner does deliver high availability in the traditional definition. It’s important to note that the movement of the interfaces and traffic redirection are all done on the Azure fabric and as such can take up to three minutes. An added consideration is the fact that both VM-Series licenses are active as are the Azure resources required to keep them running, resulting in increased costs.
VM-Series High Availability on Azure (Inbound using Application Gateway & Load Balancer Integration)
VM-Series high availability on Azure can be achieved using Azure Availability Sets combined with Application Gateway and Load Balancer integration. Availability Sets address the need for high availability and resiliency by distributing the workloads across different hosts, which minimizes or eliminates the negative impact that Azure infrastructure maintenance or system faults may have on your business. Deployed as a load balancer sandwich, the Application Gateway acts as the external load balancer front-ending the application, while the Load Balancer acts as the internal traffic distribution mechanism, distributing traffic to your web app.
Traffic is distributed to the two VM-Series firewalls, each assigned to a different availability set. If a VM-Series firewall fails, the traffic is redirected to the remaining healthy VM-Series firewalls by the Azure App Gateway. When the VM-Series is repaired (by Availability Set functionality), traffic is then re-distributed. This architecture not only delivers scalability, but also delivers Resiliency and High Availability through support for Azure Availability Sets.
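The redirect-and-redistribute behavior described above can be sketched as a round-robin balancer that skips unhealthy backends. This is a simplified stand-in written in plain Python; the `LoadBalancer` class and the firewall names are hypothetical and do not represent how the Azure Application Gateway is implemented.

```python
from itertools import cycle


class LoadBalancer:
    """Round-robin distribution that skips backends marked unhealthy."""

    def __init__(self, backends):
        self.health = {name: True for name in backends}
        self._ring = cycle(backends)

    def route(self) -> str:
        # Try each backend at most once per request.
        for _ in range(len(self.health)):
            backend = next(self._ring)
            if self.health[backend]:
                return backend
        raise RuntimeError("no healthy backends")


lb = LoadBalancer(["fw-1", "fw-2"])
normal = [lb.route() for _ in range(4)]

lb.health["fw-1"] = False           # fw-1 fails its health probe
during_failure = [lb.route() for _ in range(3)]

lb.health["fw-1"] = True            # fw-1 is repaired
after_repair = [lb.route() for _ in range(2)]

print(normal, during_failure, after_repair)
```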
VM-Series High Availability on Azure (Inbound & Outbound using Application Gateway & Load Balancer Integration)
To address the need for both inbound and outbound high availability on Azure, the community-based ARM template can be used to deploy separate load-balanced firewall sets for inbound and outbound traffic. Each set consists of two or more VM-Series firewalls in an availability set, so the sets can be independently managed and scaled in or out to accommodate load. Inbound traffic from the application gateway is received by the inbound load balancer, which distributes the load to an instance of the inbound VM-Series firewall. The firewall applies security policy and routes secure traffic to the backend load balancer, which distributes the load to an instance of the backend web workload.
-----------Original Post: November, 2016-----------
As customers look to move their applications and data to the public cloud, it is not uncommon to hear questions about traditional data center constructs such as high availability (HA). The question is often posed as “how do you support HA in AWS or Azure?” A more cloud-centric way to pose the question would be “do we need HA in the public cloud?”
To answer the question, we first need to precisely define what we mean by HA. If the question is whether we need a fully redundant, highly available solution for securing public cloud applications, then the answer is definitely yes. But if the question is whether we need PAN-OS stateful HA failover just like we did in the private cloud, then the answer is probably not.
The public cloud is all about leveraging shared resources and deploying applications that can survive a failure anywhere in the architecture. This includes but is not limited to a failure of:
- a virtual router
- a virtual firewall
- a network switch
- an application instance
- a load balancer instance
- an availability zone
- even an entire region
Customers are probably using dozens or even hundreds of applications on their laptops, tablets, and smartphones that rely on infrastructure that has had a failure of some type. And 99% of the time, they have no idea it happened. Some load balancer, switch, or routing process bypassed the failure, and the application silently tried again with little or no interruption to the user. So the focus for integrating our VM-Series firewall security into public cloud applications should be on native cloud services like auto scaling groups, elastic load balancing, routing, etc., and not on PAN-OS HA.
The VM-Series does support HA for AWS, but it generally isn't needed if the customer uses the public cloud migration as an opportunity to update their applications to take advantage of native cloud services and build a resilient architecture that maximizes uptime. Many customers will begin their migration to the public cloud adhering to a traditional data center hardware requirements list (redundant switches, routers, firewalls, etc.), which may limit the ability to leverage the power of the cloud. Using the requirement for redundancy as the driver, and then leveraging the cloud to achieve it, will allow customers to: a) improve application uptime and b) reduce costs. I know this isn't always possible, but try to leave that baggage behind.
For customers that have no choice but to move a legacy application to the public cloud, we do have HA for AWS and we are investigating HA for Azure. But it comes at a cost. Not only will they need a passive firewall up and running at all times (and the bill that goes with that), but HA in the public cloud relies on API calls that can take much longer than what we can do in hardware on dedicated network infrastructure. For example, in AWS, our HA solution relies on an API call to move interfaces (ENI) from a failed firewall to the passive firewall. In practice, this takes 30 - 45 seconds but sometimes longer. Chances are, the sessions that HA was meant to save will already need to be reestablished in that timeframe.
Instead, focus on load balancing and dynamic routing, which can converge much faster, and let the application deal with session re-establishment. Public clouds are developing new architectural patterns that align well with general trends in IT application architectures:
- Use of horizontal scaling (aka scale out) to deal with larger loads and availability.
- Growth of web/HTTP-based architectures that are less stateful overall; any state information like a session cookie can be rebuilt easily or is made redundant.
- Use of service oriented architectures (SOA) like micro-services, often built on containers, to decompose the application stack across multiple tiers that are independently scaled behind load balancers.
In these environments session data is stored in reliable database services, like Amazon DynamoDB or Amazon ElastiCache, and shared by the application servers. For example, in a shopping cart service the user's cookie session may be sync'ed/stored between web servers so that failure of a single web server has no impact on the user experience. The focus of new architectures, in public cloud and on-premises private clouds, is on service reliability and not session reliability.
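The shopping cart example can be sketched as follows. The in-memory `SessionStore` stands in for a reliable shared service such as DynamoDB or ElastiCache, and `send` stands in for a load balancer; all class, function, and session names here are hypothetical and chosen only for illustration.

```python
class SessionStore:
    """Stand-in for a shared, reliable session database."""

    def __init__(self):
        self._carts = {}

    def add_item(self, session_id, item):
        self._carts.setdefault(session_id, []).append(item)

    def cart(self, session_id):
        return list(self._carts.get(session_id, []))


class WebServer:
    """Stateless web server: all session data lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.alive = True

    def add_to_cart(self, session_id, item):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        self.store.add_item(session_id, item)
        return self.store.cart(session_id)


def send(servers, session_id, item):
    """Stand-in for a load balancer: use the first server that responds."""
    for server in servers:
        try:
            return server.add_to_cart(session_id, item)
        except ConnectionError:
            continue
    raise RuntimeError("no web server available")


store = SessionStore()
web = [WebServer("web-1", store), WebServer("web-2", store)]

send(web, "session-42", "book")         # served by web-1
web[0].alive = False                    # web-1 fails
cart = send(web, "session-42", "lamp")  # served by web-2
print(cart)                             # ['book', 'lamp']
```

Because neither web server holds the cart itself, losing one of them costs a retry rather than the user's session, which is the sense in which the service, not the session, is what has to be reliable.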
Using Auto Scaling for the VM-Series on AWS to Achieve HA
As mentioned above, any AWS deployment must be architected for resiliency to eliminate the negative impact that an infrastructure component failure may have. If a failure does occur, the solution must be able to detect and route around a failure. This is true for the security of the solution as well. Auto Scaling for the VM-Series on AWS delivers HA using native cloud services.
Auto Scaling for the VM-Series on AWS deploys multiple firewalls across two or more Availability Zones within a VPC. If any one of the VM-Series firewalls fails, two things happen: First, the AWS Load Balancer detects the failure and diverts traffic to the remaining, healthy VM-Series firewalls. Second, the AWS Auto Scaling Groups will automatically remove unhealthy firewalls and replace them with new, bootstrapped VM-Series firewalls that come up fully configured and ready to handle traffic.
Depending on the balance of performance and cost sensitivity, the health checks and Auto Scaling Groups can be tuned to be very aggressive or very conservative about detecting and replacing failed components. This allows you to make your own cost/benefit decision when designing your deployment. The VM-Series not only automatically scales in and out, independently of workloads, it also is self-healing providing an overall, highly available solution across multiple Availability Zones.
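The self-healing behavior can be pictured as a reconcile loop: drop unhealthy firewalls, then launch replacements until each Availability Zone is back at its desired count. The sketch below is a plain-Python model under that assumption; `launch`, `reconcile`, the AZ names, and the desired count are hypothetical and are not the AWS Auto Scaling API.

```python
import itertools

_ids = itertools.count(1)


def launch(az):
    """Model a new, bootstrapped firewall coming up in the given AZ."""
    return {"id": f"fw-{next(_ids)}", "az": az, "healthy": True}


def reconcile(group, desired_per_az, azs):
    """Remove unhealthy firewalls and top each AZ back up to the desired count."""
    survivors = [fw for fw in group if fw["healthy"]]
    for az in azs:
        running = sum(1 for fw in survivors if fw["az"] == az)
        missing = desired_per_az - running
        survivors.extend(launch(az) for _ in range(missing))
    return survivors


azs = ["az-1", "az-2"]
group = reconcile([], desired_per_az=2, azs=azs)   # initial deployment

failed = group[0]
failed["healthy"] = False                          # one firewall fails
group = reconcile(group, desired_per_az=2, azs=azs)

print([(fw["id"], fw["az"]) for fw in group])
```

After the second pass the failed firewall is gone and a new one has taken its place in the same zone, so capacity per Availability Zone is restored without manual intervention.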
- Watch the Auto Scaling the VM-Series for AWS Lightboard and Demo
- Access the Auto Scaling the VM-Series on AWS Deployment Resources on Github
Using the VM-Series Azure Application Gateway and Load Balancer Integration to Achieve HA
The VM-Series enables you to deploy a managed scale-out solution for your inbound web application workload traffic using a load balancer “sandwich.” The Application Gateway acts as the external load balancer, front-ending the application and serving as an internet gateway for the entire service. It provides application delivery controller (ADC) as a service and includes Layer 7 load balancing for HTTP and HTTPS, along with features such as SSL offload and content-based routing. The VM-Series firewalls deployed behind the Application Gateway provide full next-generation security, protecting Azure deployments from attacks by known and unknown threats. After security inspection by the firewall, traffic is sent to the Azure Load Balancer acting as the internal load balancer, which distributes traffic to your web applications. This architecture not only delivers scalability, but also delivers Resiliency and High Availability through support for Azure Availability Sets.
The Application Gateway and Load Balancer deal with any traffic disruptions, while Availability Sets provide protection against planned and unplanned maintenance of the Azure infrastructure. By distributing the workloads across different hosts, this addresses the need for resiliency and availability, minimizing or eliminating the negative impact that Azure infrastructure maintenance or system faults may have on your business.
- Read the VM-Series Scalability and Resiliency for Azure Tech Brief
- Access the VM-Series Scalability and Resiliency Deployment Resources on Github
- Slide 45 - AWS re:Invent 2013 - High Availability Application Architectures in Amazon VPC (ARC202) | …
- Page 11 - Amazon Web Services - Architecting for The Cloud: Best Practices - https://media.amazonwebservices.com/AWS_Cloud_Best_Practices.pdf