Prisma Cloud Compute: Taints and Tolerations - Defenders failed to install on all nodes in the cluster.
Created On 07/21/22 08:36 AM - Last Modified 04/30/24 09:39 AM
Symptom
- The Twistlock Daemonset Defender deployment installed Defenders successfully on some nodes in the cluster but failed on the others.
Environment
- Prisma Cloud Compute (SaaS and Self-Hosted)
Cause
- The taints configured on the Kubernetes nodes prevent the Daemonset Defenders from being scheduled across all nodes.
Resolution
- Add the following toleration configuration to the Daemonset YAML file:
tolerations:
- effect: NoSchedule
  operator: Exists
- effect: NoExecute
  operator: Exists
- Refer to Control Defender deployments with taint for OpenShift clusters.
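For orientation, a minimal sketch of where the tolerations block sits in a Defender DaemonSet manifest. The placement under spec.template.spec is standard Kubernetes; the surrounding fields are abbreviated and the container details are placeholders, not the actual Console-generated YAML:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: twistlock-defender-ds
  namespace: twistlock
spec:
  selector:
    matchLabels:
      app: twistlock-defender
  template:
    metadata:
      labels:
        app: twistlock-defender
    spec:
      # Tolerations belong at the pod-spec level, alongside 'containers'.
      tolerations:
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
      containers:
      - name: twistlock-defender   # placeholder; keep the Console-generated container spec
        image: <defender-image>
```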
Additional Information
What are Taints and Tolerations?
- Taints allow a node to repel a set of pods.
- Tolerations are applied to pods. A toleration allows the scheduler to schedule a pod onto a node with a matching taint. Tolerations allow scheduling but don't guarantee it: the scheduler also evaluates other parameters as part of its function.
- Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints.
- Ref: https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/
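As a concrete illustration of the matching rules described above (the key and value below are chosen for this sketch, not taken from a real cluster):

```yaml
# Taint applied to a node, e.g. via:
#   kubectl taint nodes node1 example-key=example-value:NoSchedule
# A pod can tolerate that taint with either of these tolerations:
tolerations:
- key: "example-key"       # matches the taint's key and value exactly
  operator: "Equal"
  value: "example-value"
  effect: "NoSchedule"
- key: "example-key"       # or: matches any value for this key
  operator: "Exists"
  effect: "NoSchedule"
```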
What is a Node Pool (in AKS/GKE) or Node Group (in EKS)?
- Node Pool is the terminology used in Azure Kubernetes Service (AKS) and Google Kubernetes Engine (GKE).
- Node Group is the terminology used in Amazon Elastic Kubernetes Service (EKS).
- Nodes of the same configuration are grouped together into node pools or node groups. These node pools or node groups contain the underlying VMs that run your applications.
Example:
Below is an example of the Prisma Cloud Compute Daemonset Defender installation on AKS with taints and tolerations.
- Environment:
- AKS with:
- One System node pool named 'agentpool'
- One User node pool named 'nodepool1'
Note: Each node pool has only one node (for this example)
- Below is the taint configured on the User node pool:
- This places a taint on nodepool1.
- The taint has key testkey, value palotest, and taint effect NoSchedule.
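A taint string follows the Kubernetes key=value:effect format. The small shell sketch below splits the example string from this article into its three parts (illustration of the format only, not a kubectl command):

```shell
# Breaking down the taint string used in this example: key=value:effect
TAINT="testkey=palotest:NoSchedule"
KEY="${TAINT%%=*}"                          # part before '='  -> testkey
VALUE="${TAINT#*=}"; VALUE="${VALUE%%:*}"   # between '=' and ':' -> palotest
EFFECT="${TAINT##*:}"                       # part after ':'   -> NoSchedule
echo "key=$KEY value=$VALUE effect=$EFFECT"
```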
1. Verify the taints configured on the cluster as described below:
kubectl get nodes
cloud@Azure:~$ kubectl get nodes
NAME                                STATUS   ROLES   AGE    VERSION
aks-agentpool-42262098-vmss000000   Ready    agent   144m   v1.22.11
aks-nodepool1-42262098-vmss000000   Ready    agent   144m   v1.22.11
2. Describe each node to find the taints configured on it.
kubectl describe node <NODE_NAME>
cloud@Azure:~$ kubectl describe node aks-nodepool1-42262098-vmss000000 | grep Taints
Taints: testkey=palotest:NoSchedule
cloud@Azure:~$ kubectl describe node aks-agentpool-42262098-vmss000000 | grep Taints
Taints: <none>
3. The User node pool 'nodepool1' has a taint configured and the System node pool 'agentpool' has no taints.
4. If the Kubernetes cluster is hosted on a managed Kubernetes service from a cloud provider, the taint configuration can also be verified from the GUI. The navigation path below is for AKS (Azure Kubernetes Service).
Navigate to Kubernetes Cluster > Node Pools > Select the Node Pool > Taints and labels
- Deploying the Daemonset Defender using the YAML file from the Prisma Cloud Compute Console fails to install Defenders on the nodes with taints configured ('nodepool1' in the example above). It installs only on the nodes that have no taints.
kubectl apply -f defender.yaml
cloud@Azure:~$ kubectl apply -f defender.yaml
clusterrole.rbac.authorization.k8s.io/twistlock-view unchanged
clusterrolebinding.rbac.authorization.k8s.io/twistlock-view-binding configured
secret/twistlock-secrets unchanged
serviceaccount/twistlock-service configured
daemonset.apps/twistlock-defender-ds configured
service/defender unchanged
- This installs defender on the node without Taints configured ('agentpool'):
cloud@Azure:~$ kubectl get pods -o wide -n twistlock
NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE                                NOMINATED NODE   READINESS GATES
twistlock-defender-ds-bbwg9   1/1     Running   0          12s   10.224.0.4   aks-agentpool-42262098-vmss000000   <none>           <none>
- The Console shows only one defender connected:
- The Defender is installed on the System node pool ('agentpool') and not on the User node pool ('nodepool1'), because:
- The Daemonset YAML from the Prisma Console doesn't include any toleration configuration by default.
- The System node pool doesn't have any taints, so applying the default YAML (with no tolerations) installs the Defender successfully on it.
- However, it fails to install the defender on the User node pool with taint (testkey=palotest:NoSchedule) since the YAML has no matching toleration configured.
- To install Defenders on all nodes (both with and without taints), add the below tolerations to the YAML.
tolerations:
- effect: NoSchedule
  operator: Exists
- effect: NoExecute
  operator: Exists
- Apply this new YAML with the tolerations.
cloud@Azure:~$ kubectl apply -f taintsdefender.yaml
clusterrole.rbac.authorization.k8s.io/twistlock-view unchanged
clusterrolebinding.rbac.authorization.k8s.io/twistlock-view-binding configured
secret/twistlock-secrets unchanged
serviceaccount/twistlock-service configured
daemonset.apps/twistlock-defender-ds configured
service/defender unchanged
- This installs defenders on all nodes (System Node pool without taint and User Node pool with taint).
cloud@Azure:~$ kubectl get pods -o wide -n twistlock
NAME                          READY   STATUS    RESTARTS   AGE   IP           NODE                                NOMINATED NODE   READINESS GATES
twistlock-defender-ds-p6flv   1/1     Running   0          9s    10.224.0.7   aks-nodepool1-42262098-vmss000000   <none>           <none>
twistlock-defender-ds-qfsfl   1/1     Running   0          5s    10.224.0.4   aks-agentpool-42262098-vmss000000   <none>           <none>
- The Console shows two Defenders connected.