Scaling applications is a core aspect of Kubernetes’ ability to manage workloads efficiently. Kubernetes provides built-in mechanisms like Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to adjust resource usage dynamically based on demand. These tools ensure applications perform reliably during traffic spikes while optimizing resource utilization to reduce costs.
This guide explores horizontal and vertical scaling concepts, setting up the Metrics Server, HPA with CPU and memory metrics, autoscaling with custom metrics, and practical examples.
Table of Contents
- Introduction to Kubernetes Autoscaling
- Setting Up Metrics Server
- HPA with CPU and Memory Metrics
- Autoscaling with Custom Metrics
- Horizontal vs Vertical Scaling
- Final Thoughts
Introduction to Kubernetes Autoscaling
Kubernetes autoscaling adjusts application capacity automatically, ensuring resources match workload requirements dynamically. This prevents over-provisioning (wasting money on unused resources) or under-provisioning (causing downtime).
Horizontal Scaling (HPA)
The Horizontal Pod Autoscaler (HPA) adjusts the number of Pods in a Deployment or ReplicaSet based on utilization metrics like CPU, memory, or custom metrics. HPA adds or removes Pods to handle traffic changes.
When to Use HPA:
- Spiky traffic (e.g., web services)
- Load-balancing workloads across multiple Pods
Example:
If CPU usage exceeds 70%, HPA can scale your Pods from 3 to 10 dynamically.
Vertical Scaling (VPA)
The Vertical Pod Autoscaler (VPA) optimizes resource requests and limits for individual Pods. Instead of adding Pods like HPA, VPA adjusts the CPU and memory allocated to each Pod based on observed usage.
When to Use VPA:
- Resource-efficient workloads with predictable traffic
- Preventing under-provisioned Pods from crashing due to resource shortages
Example:
A data-processing Pod might have its memory allocation increased from 512 MiB to 1024 MiB if its consumption grows consistently.
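As a sketch, assuming the VPA components (recommender, updater, admission controller) are installed in the cluster, a VerticalPodAutoscaler for a hypothetical `data-processor` Deployment could look like this:

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: data-processor-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-processor        # hypothetical workload name
  updatePolicy:
    updateMode: "Auto"          # VPA may evict Pods to apply new requests
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        memory: 256Mi           # never recommend below this
      maxAllowed:
        memory: 2Gi             # cap recommendations at this
```

With `updateMode: "Auto"`, VPA evicts Pods so they restart with the updated requests; use `"Off"` to only record recommendations without applying them.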
(Diagram: HPA scaling Pod count vs. VPA scaling per-Pod resources)
Setting Up Metrics Server
The Metrics Server is essential for both HPA and VPA as it provides resource metrics like CPU and memory. Ensure it’s deployed in your cluster before configuring autoscalers.
Step 1 – Installation
Deploy the Metrics Server using the official Helm chart or YAML manifests:
```shell
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
```
Step 2 – Verify Installation
- Check that the Metrics Server is running:

```shell
kubectl get pods -n kube-system
```

- Test metrics availability:

```shell
kubectl top nodes
```

Expected output:

```
NAME     CPU(cores)   MEMORY(bytes)
node-1   250m         512Mi
node-2   300m         768Mi
```
If no metrics are available, check the aggregation layer and kubelet TLS configuration. Common fixes are setting `--enable-aggregator-routing=true` on the kube-apiserver and, in test clusters without valid kubelet certificates, adding `--kubelet-insecure-tls` to the Metrics Server's own arguments.
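For reference, here is a sketch of the relevant fragment of the metrics-server Deployment spec with those container arguments added (edit it with `kubectl edit deployment metrics-server -n kube-system`); `--kubelet-insecure-tls` disables kubelet certificate verification and belongs in test clusters only:

```yaml
# Fragment of the metrics-server Deployment in kube-system
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        - --kubelet-insecure-tls                         # test clusters only
        - --kubelet-preferred-address-types=InternalIP   # reach kubelets by node IP
```

After saving, wait for the new metrics-server Pod to become Ready and retry `kubectl top nodes`.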
HPA with CPU and Memory Metrics
The Horizontal Pod Autoscaler dynamically adjusts Pod counts to optimize performance and meet target resource utilization.
Example Configuration
Below is an example of configuring HPA for a Deployment based on CPU usage.
Deployment YAML:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx
        resources:
          requests:
            cpu: 200m    # Requested resources
          limits:
            cpu: 500m    # Upper limit
```
HPA YAML:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```
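The `autoscaling/v2` API also supports an optional `behavior` section for tuning how aggressively the HPA reacts; as a sketch, the following fragment (appended under `spec:` of the HPA above) slows scale-down to reduce flapping:

```yaml
# Optional fragment under spec: of the HPA
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 min before scaling down
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60               # remove at most 1 Pod per minute
    scaleUp:
      stabilizationWindowSeconds: 0     # react to load increases immediately
```

Without a `behavior` section, the controller uses its defaults, which already include a 5-minute scale-down stabilization window.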
Applying the Resources
- Deploy the Deployment and HPA:
```shell
kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml
```
- Monitor HPA:
```shell
kubectl get hpa web-app-hpa
```

```
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
web-app-hpa   Deployment/web-app   80%/70%   3         10        5
```
- Simulate Load: Use a tool such as Apache Bench or a simple busybox loop to generate traffic:

```shell
kubectl run -i --tty load-generator --image=busybox -- /bin/sh
# Inside the container:
while true; do wget -q -O- http://web-app.default.svc.cluster.local; done
```
Autoscaling with Custom Metrics
Custom metrics allow scaling on application-specific parameters, such as request latency, active users, or queue length.
Prerequisite – Metrics Adapter
Install a custom metrics adapter such as the community-maintained Prometheus Adapter:

```shell
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install prometheus-adapter prometheus-community/prometheus-adapter
```
Ensure custom metrics are exposed to the cluster.
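As an illustrative sketch (the metric and label names here are assumptions about your application's instrumentation), a Prometheus Adapter rule that derives an `http_requests_per_second` metric from a counter named `http_requests_total` could look like this:

```yaml
# values.yaml fragment for the prometheus-adapter Helm chart
rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'   # assumed app counter
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"                                     # exposed metric name
    metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

Once the adapter is serving the metric, `kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1"` should list it among the available custom metrics.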
Example Custom Autoscaler
Below is an HPA scaling based on Prometheus metrics for HTTP request-per-second (RPS).
HPA YAML:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: 200
```
Simulating Custom Metrics
Expose an application metric through Prometheus instrumentation (e.g., HTTP request count), then observe how HPA adjusts Pod counts based on load.
Horizontal vs Vertical Scaling
| Criteria | Horizontal Pod Autoscaling (HPA) | Vertical Pod Autoscaling (VPA) |
|---|---|---|
| Scaling mechanism | Adds/removes Pods dynamically. | Adjusts CPU and memory for each Pod. |
| Use case | Stateless apps handling spikes, such as APIs. | Stateful apps with predictable resource use. |
| Limitations | Cannot scale individual Pod resources. | May require Pod restarts during adjustment. |
| Impact on downtime | No downtime; Pods are added or removed behind the Service. | Risk of restart-induced downtime. |
HPA and VPA complement each other in modern Kubernetes environments, but avoid having both act on the same CPU or memory metric for the same workload, as they will conflict; a common pattern is HPA on custom metrics combined with VPA for resource sizing.
Final Thoughts
Autoscaling in Kubernetes, powered by HPA and VPA, ensures your applications always have the resources they need while maintaining cost efficiency. Horizontal scaling is ideal for handling sudden traffic bursts, while vertical scaling is perfect for optimizing resource allocation for steady-state workloads. Combine these strategies with custom metrics for complete autoscaling flexibility.
Use this guide to set up and manage autoscaling in your Kubernetes clusters, ensuring your system scales gracefully as demands evolve!