Kubernetes

Complete guide on Auto-Scaling in Kubernetes (HPA/VPA)

By shoomer
Last updated: June 11, 2025

Scaling applications is a core aspect of Kubernetes’ ability to manage workloads efficiently. Kubernetes provides built-in mechanisms like Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to adjust resource usage dynamically based on demand. These tools ensure applications perform reliably during traffic spikes while optimizing resource utilization to reduce costs.


This guide explores Horizontal and Vertical scaling concepts, setting up metrics for HPA, custom autoscaling with metrics, and practical examples.

Table of Contents

  1. Introduction to Kubernetes Autoscaling
    • Horizontal Scaling (HPA)
    • Vertical Scaling (VPA)
  2. Setting Up Metrics Server
  3. HPA with CPU and Memory Metrics
  4. Autoscaling with Custom Metrics
  5. Horizontal vs Vertical Scaling
  6. Final Thoughts

Introduction to Kubernetes Autoscaling

Kubernetes autoscaling adjusts application capacity automatically, ensuring resources match workload requirements dynamically. This prevents over-provisioning (wasting money on unused resources) or under-provisioning (causing downtime).

Horizontal Scaling (HPA)

The Horizontal Pod Autoscaler (HPA) adjusts the number of Pods in a Deployment or ReplicaSet based on utilization metrics like CPU, memory, or custom metrics. HPA adds or removes Pods to handle traffic changes.

When to Use HPA:

  • Spiky traffic (e.g., web services)
  • Load-balancing workloads across multiple Pods

Example:
If CPU usage exceeds 70%, HPA can scale your Pods from 3 to 10 dynamically.
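Under the hood, the HPA control loop uses the formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the min/max bounds. A minimal sketch of that calculation (the function and numbers are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Replica count the HPA control loop would aim for."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# CPU at 140% of requests with a 70% target doubles the Pod count
print(desired_replicas(3, 140, 70, min_replicas=3, max_replicas=10))  # → 6
```

Note that the result is always clamped: even if utilization collapses, the HPA never drops below minReplicas.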

Vertical Scaling (VPA)

The Vertical Pod Autoscaler (VPA) optimizes resource requests and limits for individual Pods. Instead of adding Pods like HPA, VPA adjusts the CPU and memory allocated to each Pod based on observed usage.

When to Use VPA:

  • Resource-efficient workloads with predictable traffic
  • Preventing under-provisioned Pods from crashing due to resource shortages

Example:
A data-processing Pod might have its memory allocation increased from 512 MiB to 1024 MiB if its consumption grows consistently.
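Unlike HPA, the VerticalPodAutoscaler is not part of core Kubernetes; its custom resource becomes available after installing the VPA components from the kubernetes/autoscaler project. A minimal manifest for the data-processing scenario above might look like this (the data-processor Deployment name is illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: data-processor-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-processor
  updatePolicy:
    updateMode: "Auto"        # use "Off" to receive recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        memory: 256Mi
      maxAllowed:
        memory: 2Gi

With updateMode "Off", the VPA only writes recommendations into its status, which is a safe way to evaluate it before letting it evict and resize Pods.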

[Diagram: Horizontal vs Vertical Scaling (HPA vs VPA)]


Setting Up Metrics Server

The Metrics Server is essential for both HPA and VPA as it provides resource metrics like CPU and memory. Ensure it’s deployed in your cluster before configuring autoscalers.

Step 1 – Installation

Deploy the Metrics Server using the official Helm chart or YAML manifests:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Step 2 – Verify Installation

  1. Check that the Metrics Server Pod is running:

kubectl get pods -n kube-system

  2. Test metrics availability:

kubectl top nodes
NAME     CPU(cores)   MEMORY(bytes)
node-1   250m         512Mi
node-2   300m         768Mi

If no metrics are available, enable aggregator routing on the kube-apiserver and, in clusters where the kubelet serves a self-signed certificate, allow insecure TLS on the Metrics Server:

--enable-aggregator-routing=true   # kube-apiserver flag
--kubelet-insecure-tls             # Metrics Server container argument

HPA with CPU and Memory Metrics

The Horizontal Pod Autoscaler dynamically adjusts Pod counts to optimize performance and meet target resource utilization.

Example Configuration

Below is an example of configuring HPA for a Deployment based on CPU usage.

Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx
        resources:
          requests:
            cpu: 200m # Requested resources
          limits:
            cpu: 500m # Upper limit

HPA YAML:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Applying the Resources

  1. Deploy the Deployment and HPA:

kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml

  2. Monitor the HPA:

kubectl get hpa web-app-hpa
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
web-app-hpa   Deployment/web-app   80%/70%   3         10        5

  3. Simulate load with a tool like Apache Benchmark, or with a simple busybox loop:

kubectl run -i --tty load-generator --image=busybox -- /bin/sh
while true; do wget -q -O- http://web-app.default.svc.cluster.local; done

Autoscaling with Custom Metrics

Custom metrics allow scaling on application-specific parameters, such as request latency, active users, or queue length.

Prerequisite – Metrics Adapter

Install a custom metrics adapter such as the community-maintained Prometheus Adapter:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter

Ensure custom metrics are exposed to the cluster.
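For the adapter to serve a metric like http_requests_per_second, it needs a rule that translates a Prometheus series into a per-Pod custom metric. A sketch of such a rule in the adapter's Helm values (the series name http_requests_total is an assumption about how your application is instrumented):

rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'

The metricsQuery turns the raw counter into a rate, which is what an HPA target of "requests per second" expects.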

Example Custom Autoscaler

Below is an HPA that scales based on a Prometheus metric for HTTP requests per second (RPS).

HPA YAML:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 200

Simulating Custom Metrics

Expose an application metric through Prometheus instrumentation (e.g., HTTP request count), then observe how HPA adjusts Pod counts based on load.
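In practice you would instrument the application with a client library such as prometheus_client, but the exposition format itself is plain text. A stdlib-only sketch of a server that counts requests and serves the counter on /metrics (names and port are illustrative):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

request_count = 0  # incremented on every non-/metrics request

def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP http_requests_total Total HTTP requests served.\n"
        "# TYPE http_requests_total counter\n"
        f"http_requests_total {request_count}\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global request_count
        if self.path == "/metrics":
            body = render_metrics().encode()
        else:
            request_count += 1
            body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# To serve locally: HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

Once Prometheus scrapes this endpoint and the adapter rule is in place, the custom-hpa above can react to the resulting http_requests_per_second values.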


Horizontal vs Vertical Scaling

Criteria             | Horizontal Pod Autoscaling (HPA)              | Vertical Pod Autoscaling (VPA)
Scaling Mechanism    | Adds/removes Pods dynamically.                | Adjusts CPU and memory for each Pod.
Use Case             | Stateless apps handling spikes, such as APIs. | Stateful apps with predictable resource use.
Limitations          | Cannot scale individual Pod resources.        | May require Pod restarts during adjustment.
Impact on Downtime   | Zero downtime with rolling updates.           | Risk of restart-induced downtime.

HPA and VPA complement each other in modern Kubernetes environments. They can be used together for optimized autoscaling, but avoid letting both act on the same resource metric (such as CPU) for the same workload, as their decisions can conflict.


Final Thoughts

Autoscaling in Kubernetes, powered by HPA and VPA, ensures your applications always have the resources they need while maintaining cost efficiency. Horizontal scaling is ideal for handling sudden traffic bursts, while vertical scaling is perfect for optimizing resource allocation for steady-state workloads. Combine these strategies with custom metrics for complete autoscaling flexibility.

Use this guide to set up and manage autoscaling in your Kubernetes clusters, ensuring your system scales gracefully as demands evolve!
