So far we have scaled the number of replicas manually. We have also built an understanding of how Cluster Autoscaler scales the cluster. In this section we will deploy the Horizontal Pod Autoscaler (HPA) and a rule to scale our application once it reaches a CPU threshold. The Horizontal Pod Autoscaler automatically scales the number of pods in a replication controller, deployment or replica set based on observed CPU or memory utilization.
For HPA to evaluate metrics, we must first deploy Metrics Server.
Metrics Server is a cluster-wide aggregator of resource usage data. These metrics will drive the scaling behavior of the deployments. We will deploy Metrics Server using the Helm installation configured earlier in this workshop.
helm install stable/metrics-server \
    --name metrics-server \
    --version 2.8.3 \
    --namespace metrics
The Horizontal Pod Autoscaler is more versatile than just scaling on CPU and memory. Projects other than Metrics Server can be considered when scaling on other metrics. For example, prometheus-adapter can be used with custom metrics imported from Prometheus.
Return to the terminal in the Cloud9 environment and check that the metrics API is registered:
kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
If all is well, you should see a status message similar to the one below in the response:
status:
  conditions:
  - lastTransitionTime: 2018-10-15T15:13:13Z
    message: all checks passed
    reason: Passed
    status: "True"
    type: Available
We will set up a rule that scales up when CPU utilization exceeds 50% of the CPU requested by the container.
kubectl autoscale deployment monte-carlo-pi-service --cpu-percent=50 --min=3 --max=100
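The same rule can also be expressed declaratively. The following is a minimal sketch, assuming the `autoscaling/v1` API and that the target Deployment uses `apps/v1`; the file name `hpa.yaml` is a hypothetical choice, not something defined by the workshop. It is meant as an alternative to the `kubectl autoscale` command, not an additional step.

```shell
# Write a manifest that mirrors: --cpu-percent=50 --min=3 --max=100
# (hpa.yaml is a hypothetical file name chosen for this sketch)
cat <<'EOF' > hpa.yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: monte-carlo-pi-service
spec:
  scaleTargetRef:
    apiVersion: apps/v1        # assumption: the Deployment is an apps/v1 object
    kind: Deployment
    name: monte-carlo-pi-service
  minReplicas: 3
  maxReplicas: 100
  targetCPUUtilizationPercentage: 50
EOF

# To create the HPA from the file instead of using kubectl autoscale:
# kubectl apply -f hpa.yaml
```

Keeping the rule in a file makes it easier to review and version alongside the rest of the application manifests.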
View the HPA using kubectl. You will probably see <unknown>/50% in the TARGETS column for 1-2 minutes, after which the current CPU utilization should appear in place of <unknown>:
kubectl get hpa
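Once a utilization value is reported, the HPA compares it with the 50% target to decide how many replicas to run. The Kubernetes documentation describes the core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), bounded by the configured minimum and maximum. The sketch below reproduces only that arithmetic; `desired_replicas` is a hypothetical helper, not a kubectl feature, and it ignores the controller's tolerance band and stabilization behavior, so treat it as an approximation.

```shell
# Hypothetical helper approximating the HPA replica calculation.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_CPU_PCT TARGET_CPU_PCT [MIN] [MAX]
# MIN and MAX default to the values used in this section (3 and 100).
desired_replicas() {
  local current=$1 usage=$2 target=$3 min=${4:-3} max=${5:-100}
  # Integer ceiling of (current * usage / target)
  local desired=$(( (current * usage + target - 1) / target ))
  # Clamp the result to the [min, max] range
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

desired_replicas 3 90 50   # 3 replicas at 90% against a 50% target: prints 6
desired_replicas 3 10 50   # low utilization is held at the minimum: prints 3
```

For example, with 3 replicas averaging 90% CPU, the raw result is 3 * 90 / 50 = 5.4, which rounds up to 6 replicas.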