From the Experts: URL filtering implementation and troubleshooting
In this article, we'll present a few methods of implementing URL filtering and troubleshooting any issues related to it in PAN-OS 7.0.
On Palo Alto Networks devices, PAN-DB URL Filtering is applied on 2 major protocols: HTTP and HTTPS (SSL).
PAN-DB uses a URL filtering database that lists millions of websites, each assigned to a URL category as documented at https://urlfiltering.paloaltonetworks.com/CategoryList.aspx. In addition to the standard URL categories, there are 3 more system-defined categories:
- not-resolved - The website was not found in the local URL filtering database and cloud connectivity was not possible. We will talk later in this document about the connectivity to the cloud and the way it works.
- private-ip-address - Either the website is a single domain, the IP address is in a private IP range, or the URL Root domain is unknown to the cloud.
- unknown - The website has not been categorized yet.
To retrieve a category, the firewall must connect to a database cloud, either to populate its local database or to identify the category of a website. There are 2 types of clouds:
- Public cloud - Used if the firewall has Internet connectivity.
- Private cloud - Used if the firewall does not have Internet connectivity. A solution in this case is to have an M-500 device running in PAN-URL-DB mode.
URL filtering can be used for 2 major actions:
- Matching certain traffic.
- Blocking or allowing certain traffic (used for security rules).
Matching traffic using URL filtering
Use this option to match traffic that goes to a particular type of website. For example, you can use QoS policies to limit the bandwidth for traffic going to any 'streaming-media' website. This option is found under 'Service/URL Category' when configuring a policy.
Blocking or allowing certain traffic
Use this option to allow or block traffic based on a URL profile that specifies an action for each category. Information about configuring URL profiles can be found in the Administrator's Guide https://paloaltonetworks.com/documentation/70/pan-os/pan-os
As previously mentioned, URL filtering works on 2 major protocols: HTTP and HTTPS (SSL). To identify the category of a website, the firewall performs queries in the following order:
- It checks its local data plane cache.
- If no match is found, it checks its local management plane cache.
- If no match is found, it performs a query to the cloud (public or private).
For the query to be made and a response received, the firewall must first extract the website name from the TCP communication. The following information is checked or parsed:
For HTTP traffic, the firewall looks primarily at the HTTP GET message. In the example below, we have sample traffic for www.paloaltonetworks.com. Observe that inside the HTTP GET message, the 'Host' field shows the paloaltonetworks.com website. From this, the firewall knows the website and performs a category check for paloaltonetworks.com.
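The 'Host' field extraction described above can be illustrated with a minimal Python sketch. It is not the firewall's parser; the function name `extract_host` and the sample request bytes are hypothetical and chosen only for illustration.

```python
def extract_host(raw_request: bytes):
    """Return the value of the Host header from a raw HTTP request, or None.

    Illustrative only: a real parser also handles folded headers,
    absolute-form request targets, and malformed input.
    """
    # Keep only the header section (everything before the blank line).
    head = raw_request.split(b"\r\n\r\n", 1)[0]
    lines = head.split(b"\r\n")
    # lines[0] is the request line, e.g. b"GET / HTTP/1.1".
    for line in lines[1:]:
        name, sep, value = line.partition(b":")
        if sep and name.strip().lower() == b"host":
            return value.strip().decode("ascii", errors="replace").lower()
    return None


# Hypothetical sample request, written by hand for this example.
sample = (
    b"GET / HTTP/1.1\r\n"
    b"Host: paloaltonetworks.com\r\n"
    b"User-Agent: example\r\n"
    b"\r\n"
)
host = extract_host(sample)
```

With the sample above, `host` holds the name that would then be submitted for a category lookup.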
For HTTPS traffic, since the protocol is encrypted, the firewall usually looks at data inside the Server Certificate that is presented to the client during the SSL handshake. If the traffic is decrypted, it is treated as normal HTTP traffic when it comes to identifying the category.
As we can see in the screenshot below, we have captured traffic when going to https://a.ssl.fastly.net. Inside the Server Certificate field, the common name is 'a.ssl.fastly.net'; therefore, the firewall will try to resolve this website to obtain a category. For SSL, there are more fields and options that the firewall looks at; the common name is one that is widely used.
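As a rough illustration of reading the common name from a server certificate, the sketch below walks a certificate dictionary in the shape returned by Python's `ssl.SSLSocket.getpeercert()`. The function name `candidate_names` and the hand-written dictionary are hypothetical. The source does not say which other certificate fields the firewall inspects, so the use of Subject Alternative Name entries here is purely an assumption for illustration.

```python
def candidate_names(cert: dict):
    """Collect host names from a certificate dict (getpeercert() layout).

    The common name is listed first; DNS entries from subjectAltName are
    appended afterwards. Including subjectAltName is an assumption, not a
    statement about what the firewall actually checks.
    """
    names = []
    # "subject" is a tuple of RDNs, each a tuple of (key, value) pairs.
    for rdn in cert.get("subject", ()):
        for key, value in rdn:
            if key == "commonName":
                names.append(value)
    for kind, value in cert.get("subjectAltName", ()):
        if kind == "DNS" and value not in names:
            names.append(value)
    return names


# Hand-written example dict; not a capture of any real certificate.
example_cert = {
    "subject": ((("commonName", "a.ssl.fastly.net"),),),
    "subjectAltName": (("DNS", "a.ssl.fastly.net"), ("DNS", "*.example.net")),
}
names = candidate_names(example_cert)
```

The first entry of `names` is the common name, which corresponds to the value that is looked up for a category in the description above.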
To identify a category, as previously mentioned, the firewall performs the following queries:
- It checks its local data plane cache.
- If no match is found, it checks its local management plane cache.
- If no match is found, it performs a query to the cloud (public or private).
The first 2 queries are local on the device and the last one is done on an external source.
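The lookup order can be sketched as a tiered cache in Python. This is a simplified model, not PAN-OS internals: the names `lookup_category`, `dp_cache`, `mp_cache`, and `cloud_query` are hypothetical, and copying a result into the faster caches after a hit is an assumption. Returning 'not-resolved' when nothing is cached and the cloud cannot be reached follows the category description given earlier.

```python
def lookup_category(host, dp_cache, mp_cache, cloud_query):
    """Return (category, source) using the three-step order.

    dp_cache / mp_cache: dicts standing in for the data plane and
    management plane caches. cloud_query: callable that returns a
    category or raises ConnectionError when the cloud is unreachable.
    """
    # Step 1: data plane cache.
    if host in dp_cache:
        return dp_cache[host], "dataplane"
    # Step 2: management plane cache.
    if host in mp_cache:
        category = mp_cache[host]
        dp_cache[host] = category  # assumption: promote to the faster cache
        return category, "management"
    # Step 3: cloud (public or private).
    try:
        category = cloud_query(host)
    except ConnectionError:
        return "not-resolved", "none"
    mp_cache[host] = category
    dp_cache[host] = category
    return category, "cloud"


def unreachable_cloud(host):
    """Stand-in for a cloud that cannot be contacted."""
    raise ConnectionError("cloud not connected")


dp, mp = {}, {"www.google.com": "search-engines"}
first = lookup_category("www.google.com", dp, mp, unreachable_cloud)
second = lookup_category("www.google.com", dp, mp, unreachable_cloud)
missing = lookup_category("unseen.example", dp, mp, unreachable_cloud)
```

In this run, the first lookup is answered by the management plane cache, the second by the data plane cache, and the last one falls through to 'not-resolved' because the stand-in cloud is unreachable.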
To test a particular URL from the firewall's CLI, use the following command, which checks the management plane cache as well as the cloud categorization:
> test url www.google.com
www.google.com search-engines (Base db) expires in 0 seconds
www.google.com cloud-unavailable (Cloud db)
Base db - The response that came from the management plane.
Cloud db - The response that came from the cloud. In this particular test, we see that the cloud is not available, a scenario that we are going to cover in the next steps.
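If the output of this command is collected by a script, the two result lines can be split apart with a small parser. The sketch below assumes the line layout shown in the sample above; the name `parse_test_url` is hypothetical, and output from other PAN-OS versions may be formatted differently.

```python
import re

# Matches lines such as:
#   www.google.com search-engines (Base db) expires in 0 seconds
#   www.google.com cloud-unavailable (Cloud db)
LINE_RE = re.compile(
    r"^(?P<url>\S+)\s+(?P<category>\S+)\s+\((?P<source>Base|Cloud) db\)"
)


def parse_test_url(output: str):
    """Map each source ("Base" or "Cloud") to the category it reported."""
    results = {}
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            results[match.group("source")] = match.group("category")
    return results


sample_output = """\
www.google.com search-engines (Base db) expires in 0 seconds
www.google.com cloud-unavailable (Cloud db)
"""
parsed = parse_test_url(sample_output)
cloud_down = parsed.get("Cloud") == "cloud-unavailable"
```

Here `cloud_down` is true for the sample, which matches the situation examined next.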
In order to confirm the status of the cloud connectivity, the following command can be used:
> show url-cloud status
PAN-DB URL Filtering
License : valid
Cloud connection : not connected
URL database version - device : 2016.02.24.236
URL protocol version - device : pan/0.0.2
We can see that, in this case, the license for PAN-DB is valid; however, the cloud connection has not been established. This can have several causes, including any of the following:
- DNS resolution failure
- SSL traffic not allowed between the firewall and the cloud
- Routing issues in the network
- Intermediate devices throughout the network dropping traffic
- If a proxy is configured, possible proxy issues with the SSL handshake
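The first few causes in the list above can be narrowed down from a host on the same network path with a short Python sketch that separates a DNS failure from a TCP connection failure. This runs on a workstation, not on the firewall, and it is only a rough aid: the function name `check_cloud_reachability` is hypothetical, the cloud host name is left as a parameter because it is not given in this article, and port 443 is assumed because the traffic is SSL.

```python
import socket


def check_cloud_reachability(host, port=443, timeout=5.0,
                             resolve=socket.getaddrinfo,
                             connect=socket.create_connection):
    """Return (stage, detail), where stage is "dns", "tcp", or "ok".

    resolve and connect are injectable so the logic can be exercised
    without network access.
    """
    # Stage 1: DNS resolution.
    try:
        resolve(host, port)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return ("dns", str(exc))
    # Stage 2: TCP connection (routing, intermediate devices, policy).
    try:
        conn = connect((host, port), timeout=timeout)
    except OSError as exc:
        return ("tcp", str(exc))
    conn.close()
    return ("ok", "")
```

A "dns" result points at name resolution, while a "tcp" result points at routing, intermediate devices, or a policy blocking the connection. An "ok" result only shows that a TCP connection could be opened; it does not validate the SSL handshake or any proxy in the path.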