Prisma Cloud Compute: Taints and Tolerations - Defenders fail to install on all nodes in the cluster

Created On 07/21/22 08:36 AM - Last Modified 04/30/24 09:39 AM


Symptom


  • The Twistlock DaemonSet Defender deployment installs Defenders on some nodes in the cluster but not on others.


Environment


  • Prisma Cloud Compute (SaaS and Self-Hosted)


Cause


  • Taints configured on the Kubernetes nodes prevent the DaemonSet Defender pods from being scheduled on every node.


Resolution


  • Add the following tolerations to the pod spec (spec.template.spec) of the Defender DaemonSet in the YAML file:
tolerations:
- effect: NoSchedule
  operator: Exists
- effect: NoExecute
  operator: Exists
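
If the Defender DaemonSet is already deployed, the same tolerations could also be applied in place with kubectl patch. The following is a minimal sketch, not an official Prisma Cloud procedure. The function name add_defender_tolerations is hypothetical, and the default namespace (twistlock) and DaemonSet name (twistlock-defender-ds) are assumptions taken from the example output in this article. A JSON merge patch replaces any tolerations list already present on the DaemonSet.

```shell
# Hypothetical helper: add catch-all tolerations to a deployed Defender DaemonSet.
# Usage: add_defender_tolerations [namespace] [daemonset-name]
add_defender_tolerations() {
  ns="${1:-twistlock}"               # assumed namespace, from the article's example
  ds="${2:-twistlock-defender-ds}"   # assumed DaemonSet name, from the article's example
  # With no key set, "operator: Exists" tolerates every taint that has the given effect.
  patch='{"spec":{"template":{"spec":{"tolerations":[{"effect":"NoSchedule","operator":"Exists"},{"effect":"NoExecute","operator":"Exists"}]}}}}'
  # A JSON merge patch replaces the whole tolerations list in the pod template.
  kubectl -n "$ns" patch daemonset "$ds" --type merge -p "$patch"
}
```

Calling add_defender_tolerations with no arguments targets the names used in this article. The YAML file should still be updated with the same tolerations so that a later kubectl apply stays consistent with the running DaemonSet.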

 


Additional Information



What are Taints and Tolerations?

  • Taints allow a node to repel a set of pods.
  • Tolerations are applied to pods. Tolerations allow the scheduler to schedule pods with matching taints. Tolerations allow scheduling but don't guarantee scheduling: the scheduler also evaluates other parameters as part of its function.
  • Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
  • Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/


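What is a Node Pool (in AKS/GKE) or Node Group (in EKS)?

  • A node pool (or node group) is a set of nodes within a cluster that share the same configuration, such as VM size, labels, and taints.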

Example:

Below is an example of the Prisma Cloud Compute Daemonset defender installation on AKS with Taints and Tolerations.
  • Environment: 
    • AKS with:
  1. One System node pool named 'agentpool'
  2. One User node pool named 'nodepool1'

Note: Each node pool has only one node (for this example)

  • Below is the taint configured on the User node pool:
    • This places a taint on nodepool1.
    • The taint has key testkey, value palotest, and taint effect NoSchedule
  • Verify the Taints configured on the Cluster as described below:
1. Check the nodes:
kubectl get nodes -o wide
cloud@Azure:~$ kubectl get nodes
NAME                                STATUS   ROLES   AGE    VERSION
aks-agentpool-42262098-vmss000000   Ready    agent   144m   v1.22.11
aks-nodepool1-42262098-vmss000000   Ready    agent   144m   v1.22.11

2. Describe the nodes to find the Taint configured for it.
kubectl describe node <NODE_NAME>
cloud@Azure:~$ kubectl describe node aks-nodepool1-42262098-vmss000000 | grep Taints
Taints:             testkey=palotest:NoSchedule
cloud@Azure:~$ kubectl describe node aks-agentpool-42262098-vmss000000 | grep Taints
Taints:             <none>
3. In this example, the User node pool 'nodepool1' has a taint configured and the System node pool 'agentpool' has none.

4. If the cluster runs on a managed Kubernetes service from a cloud provider, the taint configuration can also be verified in the provider's GUI. In AKS (Azure Kubernetes Service), navigate to Kubernetes Cluster > Node Pools > Select the Node Pool > Taints and labels.
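
On clusters with many nodes, describing each node one at a time is slow. The following sketch lists the taint keys and effects of every node in one call, using kubectl's custom-columns output. The function name list_node_taints is hypothetical.

```shell
# Hypothetical helper: print every node with the keys and effects of its taints.
# Nodes without taints show <none> in the taint columns.
list_node_taints() {
  kubectl get nodes \
    -o custom-columns='NAME:.metadata.name,TAINT_KEYS:.spec.taints[*].key,TAINT_EFFECTS:.spec.taints[*].effect'
}
```

Running list_node_taints against the example cluster should show the testkey taint with effect NoSchedule on the 'nodepool1' node only.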

 
  • Deploying the DaemonSet Defender with the default YAML file from the Prisma Cloud Compute Console does not install Defenders on nodes that have taints configured ('nodepool1' in this example). Defenders are installed only on nodes without taints.
kubectl apply -f defender.yaml 
 
cloud@Azure:~$ kubectl apply -f defender.yaml 
clusterrole.rbac.authorization.k8s.io/twistlock-view unchanged
clusterrolebinding.rbac.authorization.k8s.io/twistlock-view-binding configured
secret/twistlock-secrets unchanged
serviceaccount/twistlock-service configured
daemonset.apps/twistlock-defender-ds configured
service/defender unchanged
 
  • This installs defender on the node without Taints configured ('agentpool'):
 
cloud@Azure:~$ kubectl get pods -o wide -n twistlock
NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE                                NOMINATED NODE   READINESS GATES
twistlock-defender-ds-bbwg9   1/1     Running   0          12s   10.224.0.4   aks-agentpool-42262098-vmss000000   <none>           <none>
 
  • The Console shows only one defender connected.
  • The defender is installed on the System node pool ('agentpool') but not on the User node pool ('nodepool1') because:
    • The DaemonSet YAML from the Prisma Cloud Console doesn't include any toleration configuration by default.
    • The System node pool doesn't have any taints, so applying the default YAML (with no tolerations) installs the defender on it successfully.
    • The User node pool has the taint testkey=palotest:NoSchedule, and the YAML has no matching toleration, so no defender pod is scheduled on it.
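
A quick way to spot this condition is to compare the number of nodes the DaemonSet targets with the number of nodes in the cluster. The sketch below assumes the namespace and DaemonSet name from the example output in this article; the function name defender_coverage is hypothetical. A DaemonSet typically does not count nodes whose NoSchedule or NoExecute taints it does not tolerate, so a desired count lower than the node count suggests missing tolerations (a node selector or node affinity can cause the same gap).

```shell
# Hypothetical helper: compare the DaemonSet's desired pod count with the node count.
# Usage: defender_coverage [namespace] [daemonset-name]
defender_coverage() {
  ns="${1:-twistlock}"               # assumed namespace, from the article's example
  ds="${2:-twistlock-defender-ds}"   # assumed DaemonSet name, from the article's example
  # Number of nodes the DaemonSet controller wants to run a Defender pod on.
  desired=$(kubectl -n "$ns" get daemonset "$ds" -o jsonpath='{.status.desiredNumberScheduled}')
  # Total number of nodes in the cluster.
  nodes=$(kubectl get nodes --no-headers | wc -l | tr -d ' ')
  echo "defenders desired: $desired, nodes: $nodes"
  # Return non-zero when some nodes are not targeted by the DaemonSet.
  [ "$desired" -eq "$nodes" ]
}
```

For example, defender_coverage || echo "some nodes have no Defender scheduled" prints the warning when the two counts differ.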
 
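  • To install on all nodes (both with and without taints), add the following tolerations to the DaemonSet's pod spec in the YAML. Because no key is specified, operator: Exists tolerates every taint that has the given effect.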
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
 
  • Apply this new YAML with the toleration.
cloud@Azure:~$ kubectl apply -f taintsdefender.yaml 
clusterrole.rbac.authorization.k8s.io/twistlock-view unchanged
clusterrolebinding.rbac.authorization.k8s.io/twistlock-view-binding configured
secret/twistlock-secrets unchanged
serviceaccount/twistlock-service configured
daemonset.apps/twistlock-defender-ds configured
service/defender unchanged
  • This installs defenders on all nodes (the System node pool without a taint and the User node pool with a taint).
cloud@Azure:~$ kubectl get pods -o wide -n twistlock
NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE                                NOMINATED NODE   READINESS GATES
twistlock-defender-ds-p6flv   1/1     Running   0          9s    10.224.0.7   aks-nodepool1-42262098-vmss000000   <none>           <none>
twistlock-defender-ds-qfsfl   1/1     Running   0          5s    10.224.0.4   aks-agentpool-42262098-vmss000000   <none>           <none>
  • The Console shows 2 defenders connected. 
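
Instead of polling kubectl get pods by hand, the rollout can be awaited from the command line. This is a minimal sketch assuming the namespace and DaemonSet name from the example output above; the function name verify_defender_rollout and the 120-second timeout are illustrative choices.

```shell
# Hypothetical helper: wait for the Defender DaemonSet rollout, then list pods per node.
# Usage: verify_defender_rollout [namespace] [daemonset-name]
verify_defender_rollout() {
  ns="${1:-twistlock}"               # assumed namespace, from the article's example
  ds="${2:-twistlock-defender-ds}"   # assumed DaemonSet name, from the article's example
  # Blocks until the rollout completes or the timeout expires; stop here on failure.
  kubectl -n "$ns" rollout status "daemonset/$ds" --timeout=120s || return 1
  # Show which node each Defender pod landed on.
  kubectl -n "$ns" get pods -o wide
}
```

After applying the YAML with tolerations, verify_defender_rollout should list one Defender pod per node.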