Automatic Node Provisioning

Please note: this EKS and Karpenter workshop version is now deprecated since the launch of Karpenter v1beta, and has been updated to a new home on AWS Workshop Studio here: Karpenter: Amazon EKS Best Practice and Cloud Cost Optimization.

This workshop remains here for reference to those who have used this workshop before, or those who want to reference this workshop for running Karpenter on version v1alpha5.


With Karpenter now active, we can begin to explore how Karpenter provisions nodes. In this section we are going to create some pods using a Deployment and watch Karpenter provision nodes in response.

In this part of the workshop we will use a Deployment with the pause image. If you are not familiar with Pause Pods you can read more about them here.

Run the following command and try to answer the questions below:

cat <<EOF > inflate.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: inflate
spec:
  replicas: 0
  selector:
    matchLabels:
      app: inflate
  template:
    metadata:
      labels:
        app: inflate
    spec:
      nodeSelector:
        intent: apps
      containers:
        - name: inflate
          image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
          resources:
            requests:
              cpu: 1
              memory: 1.5Gi
EOF
kubectl apply -f inflate.yaml

As Karpenter provisions nodes based on resource requests and any scheduling constraints, it is important to apply accurate resource requests to all workloads. Guidance for configuring and sizing resource requests/limits can be found in the reliability section of the Amazon EKS best practices guide.

Challenge

You can use Kube-ops-view or just the plain kubectl CLI to visualize the changes and answer the questions below. In the answers we provide the CLI commands that will help you check the responses. Remember: to get the URL of kube-ops-view you can run the following command: kubectl get svc kube-ops-view | tail -n 1 | awk '{ print "Kube-ops-view URL = http://"$4 }'

Answer the following questions to validate your understanding. After each question you will find a short sketch of CLI commands you can use to check your answer.

1) Why did Karpenter not scale the cluster after making the initial deployment?

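The Deployment was created with replicas: 0, so there are no pending (unschedulable) pods for Karpenter to act on. A quick way to confirm this yourself (a sketch, not the workshop's full answer):

kubectl get deployment inflate
kubectl get pods --selector app=inflate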

2) How would you scale the deployment to 1 replica?

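One way to do this is with kubectl scale, then follow the Karpenter logs to watch the provisioning decision. The log command below assumes Karpenter runs as the karpenter deployment in the karpenter namespace with a controller container; the names may differ depending on how it was installed:

kubectl scale deployment inflate --replicas 1
kubectl logs -f -n karpenter deployment/karpenter -c controller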

3) Which instance type did Karpenter use when increasing the instances? Why that instance type?

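To see which instance type and capacity type Karpenter chose, you can list the nodes together with the well-known labels (kube-ops-view shows the same information visually):

kubectl get nodes -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type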

4) What are the new instance properties and labels?

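To inspect the new node's properties and labels you can describe it or print its labels. The selector below assumes the karpenter.sh/provisioner-name label that Karpenter v1alpha5 applies to the nodes it provisions:

kubectl get nodes -l karpenter.sh/provisioner-name --show-labels
kubectl describe node -l karpenter.sh/provisioner-name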

5) Why was the newly created inflate pod not scheduled onto the managed node group?

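The Deployment uses the nodeSelector intent: apps, so its pods can only be scheduled on nodes that carry that label, which the managed node group nodes presumably do not in this setup. You can compare the label on each node with:

kubectl get nodes -L intent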

6) How would you scale the number of replicas to 10? What do you expect to happen? Which instance types were selected in this case?

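One way to scale up and then observe which instance types were selected (a sketch):

kubectl scale deployment inflate --replicas 10
kubectl get pods --selector app=inflate -o wide --watch
kubectl get nodes -L node.kubernetes.io/instance-type -L karpenter.sh/capacity-type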

7) How would you scale the number of replicas to 0? What do you expect to happen?

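One way to scale back down and watch what happens to the Karpenter-provisioned nodes. They should become empty and, after the Provisioner's ttlSecondsAfterEmpty elapses, be cordoned, drained, and terminated:

kubectl scale deployment inflate --replicas 0
kubectl get nodes --watch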

What have we learned in this section:

In this section we have learned:

  • Karpenter scales up nodes in a group-less approach. Karpenter selects which nodes to provision based on the number of pending pods and the Provisioner configuration. It decides what the best instances for the workload look like, and then provisions those instances. This is unlike Cluster Autoscaler, which first evaluates all existing node groups to find which one is best placed to scale, given the Pod constraints.

  • Karpenter can scale out from zero when applications have pending pods that need capacity, and scale in to zero when there are no running jobs or pods.

  • Provisioners can be set up to define governance and rules that determine how nodes will be provisioned within a cluster partition. We can set up requirements such as karpenter.sh/capacity-type to allow on-demand and Spot instances, or use karpenter.k8s.aws/instance-size to filter out smaller sizes. The full list of supported labels is available here. A minimal Provisioner sketch illustrating these settings follows this list.

  • Karpenter uses cordon and drain best practices to terminate nodes. When an empty node is terminated can be controlled with the ttlSecondsAfterEmpty setting.
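
As an illustration of the kind of rules a Provisioner can encode, here is a minimal v1alpha5 sketch. The requirement values, limits, and TTL below are examples rather than the exact configuration used in this workshop, and the providerRef assumes an AWSNodeTemplate named default defined elsewhere:

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  labels:
    intent: apps                      # matches the nodeSelector used by the inflate Deployment
  requirements:
    - key: karpenter.sh/capacity-type # allow both Spot and On-Demand capacity
      operator: In
      values: ["spot", "on-demand"]
    - key: karpenter.k8s.aws/instance-size
      operator: NotIn                 # filter out the smallest instance sizes
      values: ["nano", "micro", "small"]
  limits:
    resources:
      cpu: 1000                       # cap the total CPU Karpenter may provision
  ttlSecondsAfterEmpty: 30            # terminate nodes 30 seconds after they become empty
  providerRef:
    name: default                     # AWSNodeTemplate with subnet/security group selectors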

The ability to terminate nodes only when they are completely idle is ideal for clusters or provisioners used by batch workloads. This is controlled by the ttlSecondsAfterEmpty setting. In batch workloads you ideally want to let all the Kubernetes Jobs complete and the node become idle before removing it. This behaviour is not ideal in scenarios where the workloads are long-running stateless micro-services. Under those conditions the best approach is to use Karpenter's consolidation functionality. Let's explore how consolidation works in the next section.