Please note: this EKS and Karpenter workshop version is now deprecated since the launch of Karpenter v1beta, and has been updated to a new home on AWS Workshop Studio here: Karpenter: Amazon EKS Best Practice and Cloud Cost Optimization.
This workshop remains here for reference to those who have used this workshop before, or those who want to reference this workshop for running Karpenter on version v1alpha5.
In the previous section we did set the default provisioner configured with a specific ttlSecondsAfterEmpty
. This instructs Karpenter to remove nodes after ttlSecondsAfterEmpty
of a node being empty. Note Karpenter will take Daemonset into consideration.We also know that nodes can be removed when they reach the ttlSecondsUntilExpired
. This is ideal to force node termination on the cluster while bringing new nodes that will pick up the latest AMI’s.
Automated deprovisioning is configured through the ProvisionerSpec .ttlSecondsAfterEmpty
, .ttlSecondsUntilExpired
and .consolidation.enabled
fields. If these are not configured, Karpenter will not default values for them and will not terminate nodes.
There is another way to configure Karpenter to deprovision nodes called Consolidation. This mode is preferred for workloads such as microservices and is imcompatible with setting up the ttlSecondsAfterEmpty
. When set in consolidation mode Karpenter works to actively reduce cluster cost by identifying when nodes can be removed as their workloads will run on other nodes in the cluster and when nodes can be replaced with cheaper variants due to a change in the workloads.
Before we proceed to see how Consolidation works, let’s change the default provisioner configuration:
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: default
spec:
consolidation:
enabled: true
labels:
intent: apps
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand"]
- key: karpenter.k8s.aws/instance-size
operator: NotIn
values: [nano, micro, small, medium, large]
limits:
resources:
cpu: 1000
memory: 1000Gi
ttlSecondsUntilExpired: 2592000
providerRef:
name: default
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
name: default
spec:
subnetSelector:
karpenter.sh/discovery: eksspotworkshop
securityGroupSelector:
karpenter.sh/discovery: eksspotworkshop
tags:
KarpenerProvisionerName: "default"
NodeType: "karpenter-workshop"
IntentLabel: "apps"
EOF
When consolidation is enabled it is recommended you configure requests=limits for all non-CPU resources. As an example, pods that have a memory limit that is larger than the memory request can burst above the request. If several pods on the same node burst at the same time, this can cause some of the pods to be terminated due to an out of memory (OOM) condition. Consolidation can make this more likely to occur as it works to pack pods onto nodes only considering their requests.
You can use Kube-ops-view or just plain kubectl cli to visualize the changes and answer the questions below. In the answers we will provide the CLI commands that will help you check the resposnes. Remember: to get the url of kube-ops-view you can run the following command kubectl get svc kube-ops-view | tail -n 1 | awk '{ print "Kube-ops-view URL = http://"$4 }'
.
Note the current version of Kube-ops-view sometimes takes time to reflect the correct state of the cluster.
Before we start to deep dive into consolidation, lets set up the environment in an initial state. First lets scale the inflate
application to 3 replicas, so that we provision a small node. You can do that by running:
kubectl scale deployment inflate --replicas 3
You can then check the number of nodes using the following command kubectl get nodes
; Once that there are 3 nodes in the cluster you can run again:
kubectl scale deployment inflate --replicas 10
That will create up yet another node. In total there should be now 4 nodes, 2 for the managed nodegroup, and 2 on-demand nodes, one holding 3 of our inflate
replicas of size xlarge and a 2xlarge
Answer the following questions. You can expand each question to get a detailed answer and validate your understanding.
inflate
deployment to 6 replicas, What should happen ?Scaling to 6 replicas should be easy:
kubectl scale deployment inflate --replicas 6
As for what should happen, check out karpenter logs, remember you can read karpenter logs using the following command
alias kl='kubectl -n karpenter logs -l app.kubernetes.io/name=karpenter --all-containers=true -f --tail=20'
kl
Karpenter logs will display the following lines
2022-09-05T08:24:35.548Z INFO controller.consolidation Consolidating via Delete, terminating 1 nodes ip-192-168-31-208.eu-west-1.compute.internal/t3a.xlarge {"commit": "b157d45"}
2022-09-05T08:24:35.600Z INFO controller.termination Cordoned node {"commit": "b157d45", "node": "ip-192-168-31-208.eu-west-1.compute.internal"}
2022-09-05T08:24:36.707Z INFO controller.termination Deleted node {"commit": "b157d45", "node": "ip-192-168-31-208.eu-west-1.compute.internal"}
At this point, Karpenter has issued a consolidation via deletion of one of it’s nodes. Karpenter has two mechanisms for cluster consolidation:
When there are multiple nodes that could be potentially deleted or replaced, Karpenter choose to consolidate the node that overall disrupts your workloads the least by preferring to terminate:
In this case 4 pods were termimnated. This resulted in Karpenter deleting the xlarge node and consolidating the 6 pods remaining in the 2xlarge capacity, resulting in the operation that reduced the cost while disrupting less the cluster.
The log does also show something interesting lines that refer to Cordon
and Deletion
. Karpenter sets a Kubernetes finalizer on each node it provisions. The finalizer specifies additional actions the Karpenter controller will take in response to a node deletion request. By adding the finalizer, Karpenter improves the default Kubernetes process of node deletion. When you run kubectl delete node on a node without a finalizer, the node is deleted without triggering the finalization logic. The instance will continue running in EC2, even though there is no longer a node object for it. The kubelet isn’t watching for its own existence, so if a node is deleted the kubelet doesn’t terminate itself. All the pod objects get deleted by a garbage collection process later, because the pods’ node is gone.
To scale up the deployment run the following command:
kubectl scale deployment inflate --replicas 3
Following the description from the previous question, in this case we expect Karpenter to issue a Replace order and use instead of the 2xlarge instance a consolidation into a xlarge instance. You can follow the changes on kube-ops-view
or by running kubectl get nodes
; The steps that follow are, first a new xlarge instance gets created and started and finally, the 2xlarge instance is terminated.
Karpenter logs will show the sequence of events on the output, similar to the ones below :
2022-09-05T08:36:20.652Z INFO controller.consolidation Consolidating via Replace, terminating 1 nodes ip-192-168-15-83.eu-west-1.compute.internal/t3a.2xlarge and replacing with a node from types c6id.xlarge, c5ad.xlarge, r5ad.xlarge, t3a.xlarge, r6i.xlarge and 26 other(s) {"commit": "b157d45"}
2022-09-05T08:36:20.684Z INFO controller.consolidation Launching node with 3 pods requesting {"cpu":"3125m","memory":"4608Mi","pods":"5"} from types t3a.xlarge, c6a.xlarge, c5a.xlarge, t3.xlarge, c6i.xlarge and 26 other(s) {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:36:20.966Z DEBUG controller.consolidation.cloudprovider Created launch template, Karpenter-eksworkshop-eksctl-10619024032654607850 {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:36:22.694Z INFO controller.consolidation.cloudprovider Launched instance: i-0a4609a533f1dc157, hostname: ip-192-168-73-120.eu-west-1.compute.internal, type: t3a.xlarge, zone: eu-west-1a, capacityType: on-demand {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:37:38.649Z INFO controller.termination Deleted node {"commit": "b157d45", "node": "ip-192-168-15-83.eu-west-1.compute.internal"}
on-demand
and spot
?To scale up the deployment bach to 10 ,run the following command:
kubectl scale deployment inflate --replicas 10
There should not be any surprise here, like in previous cases a new 2xlarge node might be selected to place the extra 7 pods. Note, the provisioned instance type can depend on available spot capacity.
To apply the change to the provisioner, we will re-deploy the default provisioner, this time using both on-demand
and spot
. Run the following command
cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
name: default
spec:
# Enables consolidation which attempts to reduce cluster cost by both removing un-needed nodes and down-sizing those
# that can't be removed. Mutually exclusive with the ttlSecondsAfterEmpty parameter.
consolidation:
enabled: true
labels:
intent: apps
requirements:
- key: karpenter.sh/capacity-type
operator: In
values: ["on-demand","spot"]
- key: karpenter.k8s.aws/instance-size
operator: NotIn
values: [nano, micro, small, medium, large]
limits:
resources:
cpu: 1000
memory: 1000Gi
ttlSecondsUntilExpired: 2592000
providerRef:
name: default
---
apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
name: default
spec:
subnetSelector:
karpenter.sh/discovery: eksspotworkshop
securityGroupSelector:
karpenter.sh/discovery: eksspotworkshop
tags:
KarpenerProvisionerName: "default"
NodeType: "karpenter-workshop"
IntentLabel: "apps"
EOF
For deployments that do not provide a preference, Karpenter prioritizes Spot offerings if the provisioner allows Spot and on-demand instances. Changes in a Provisioner do also result in a re-evaluation of the consolidation policies. As a result of this changes, we should expect for Karpenter to change the nodes in the cluster from on-demand
to Spot. As we learned in the solution to the first challenge, when there are multiple nodes that can be consolidated, Karpenter will start from the one that causes less disruption. In this case we are expecting the xlarge instance to be Replaced first. The replacement follows the same steps that we have seen previously. First a new spot
instance is created and then the on-demand
instance that was selected to be replaced will be terminated. The same sequence of events happens after that with the 2xlarge instance.
Karpenter logs should show a sequence of events similar to the one below.
2022-09-05T08:54:53.601Z INFO controller.consolidation Consolidating via Replace, terminating 1 nodes ip-192-168-73-120.eu-west-1.compute.internal/t3a.xlarge and replacing with a node from types m5n.xlarge, m5d.xlarge, m6i.xlarge, c6a.2xlarge, m5a.xlarge and 44 other(s) {"commit": "b157d45"}
2022-09-05T08:54:53.636Z INFO controller.consolidation Launching node with 3 pods requesting {"cpu":"3125m","memory":"4608Mi","pods":"5"} from types t3a.xlarge, t3.xlarge, c6a.xlarge, c6i.xlarge, c6id.xlarge and 44 other(s) {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:54:54.020Z DEBUG controller.consolidation.cloudprovider Created launch template, Karpenter-eksworkshop-eksctl-6351194516503745500 {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:54:56.072Z INFO controller.consolidation.cloudprovider Launched instance: i-0949483dbf32ca4c2, hostname: ip-192-168-49-94.eu-west-1.compute.internal, type: c5.xlarge, zone: eu-west-1b, capacityType: spot {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:55:51.936Z INFO controller.termination Deleted node {"commit": "b157d45", "node": "ip-192-168-73-120.eu-west-1.compute.internal"}
2022-09-05T08:56:02.531Z INFO controller.consolidation Consolidating via Replace, terminating 1 nodes ip-192-168-8-39.eu-west-1.compute.internal/t3a.2xlarge and replacing with a node from types r6a.4xlarge, c5d.2xlarge, c6a.4xlarge, c6a.2xlarge, z1d.2xlarge and 49 other(s) {"commit": "b157d45"}
2022-09-05T08:56:02.568Z INFO controller.consolidation Launching node with 7 pods requesting {"cpu":"7125m","memory":"10752Mi","pods":"9"} from types inf1.2xlarge, c3.2xlarge, r3.2xlarge, c5a.2xlarge, t3a.2xlarge and 49 other(s) {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:56:02.860Z DEBUG controller.consolidation.cloudprovider Discovered launch template Karpenter-eksworkshop-eksctl-6351194516503745500 {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:56:05.328Z INFO controller.consolidation.cloudprovider Launched instance: i-00e7459e351c4992f, hostname: ip-192-168-29-219.eu-west-1.compute.internal, type: c5a.2xlarge, zone: eu-west-1a, capacityType: spot {"commit": "b157d45", "provisioner": "default"}
2022-09-05T08:57:06.000Z INFO controller.termination Deleted node {"commit": "b157d45", "node": "ip-192-168-8-39.eu-west-1.compute.internal"}
inflate
service to 3 replicas, what should happen ? Click here to show the answer
</span>
</div>
<div class="expand-content" style="display: none;">
<p>Run the following command to set the number of replicas to 3.</p>
</div>
Click here to show the answer
</span>
</div>
<div class="expand-content" style="display: none;">
<p>There are a few cases where requesting to deprovision a Karpenter node will not work. These include <strong>Pod Disruption Budgets</strong> and pods that have the <strong>do-not-evict</strong> annotation set.</p>
</div>
In preparation for the next section, scale replicas to 0 using the following command.
kubectl scale deployment inflate --replicas 0
In this section we have learned:
Karpenter can be configured to consolidate workloads using the . The ttlSecondsAfterEmpty
and .consolidation.enabled
are mutually exclussive within a provisioner.
Consolidation helps to reduce the overal cost of the cluster in two situations. Delete can ocur when the capacity of a node can be safely distributed to other nodes. Replace ocurs when a node can be replace by a smaller node thus reducing the cost in the cluster
Consolidation takes into consideration multiple nodes, but only acts on one node at a time. The node selected is the one that minimises the disruption in the cluster.
Delete Consolidation does also include events where instances are moved from on-demand
to spot
, however karpenter does not trigger Replace to make Spot node smaller as this can have an impact on the level of interruptions.
Karpenter uses cordon and drain best practices to terminate nodes. to make it safer, Karpenter adds a finalizer so that a kubernetes delete node
command, results in a graceful termination that remove the node safely from the cluster.