Kubernetes

Complete guide on Auto-Scaling in Kubernetes (HPA/VPA)

By shoomer
Last updated: June 11, 2025

Scaling applications is a core aspect of Kubernetes’ ability to manage workloads efficiently. Kubernetes provides built-in mechanisms like Horizontal Pod Autoscaler (HPA) and Vertical Pod Autoscaler (VPA) to adjust resource usage dynamically based on demand. These tools ensure applications perform reliably during traffic spikes while optimizing resource utilization to reduce costs.


This guide explores Horizontal and Vertical scaling concepts, setting up metrics for HPA, custom autoscaling with metrics, and practical examples.

Table of Contents

  1. Introduction to Kubernetes Autoscaling
    • Horizontal Scaling (HPA)
    • Vertical Scaling (VPA)
  2. Setting Up Metrics Server
  3. HPA with CPU and Memory Metrics
  4. Autoscaling with Custom Metrics
  5. Horizontal vs Vertical Scaling
  6. Final Thoughts

Introduction to Kubernetes Autoscaling

Kubernetes autoscaling adjusts application capacity automatically, ensuring resources match workload requirements dynamically. This prevents over-provisioning (wasting money on unused resources) or under-provisioning (causing downtime).

Horizontal Scaling (HPA)

The Horizontal Pod Autoscaler (HPA) adjusts the number of Pods in a Deployment or ReplicaSet based on utilization metrics like CPU, memory, or custom metrics. HPA adds or removes Pods to handle traffic changes.

When to Use HPA:

  • Spiky traffic (e.g., web services)
  • Load-balancing workloads across multiple Pods

Example:
If CPU usage exceeds 70%, HPA can scale your Pods from 3 to 10 dynamically.
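Under the hood, the HPA control loop uses the formula documented by Kubernetes: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the min/max bounds. A minimal sketch of that calculation (the function and numbers are illustrative):

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     min_replicas: int,
                     max_replicas: int) -> int:
    """Replica count the HPA control loop would aim for."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# CPU at 140% of requests with a 70% target doubles the Pod count
print(desired_replicas(3, 140, 70, min_replicas=3, max_replicas=10))  # → 6
```

Note that the result is always clamped: even if utilization collapses, the HPA never drops below minReplicas.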

Vertical Scaling (VPA)

The Vertical Pod Autoscaler (VPA) optimizes resource requests and limits for individual Pods. Instead of adding Pods like HPA, VPA adjusts the CPU and memory allocated to each Pod based on observed usage.

When to Use VPA:

  • Resource-efficient workloads with predictable traffic
  • Preventing under-provisioned Pods from crashing due to resource shortages

Example:
A data-processing Pod might have its memory allocation increased from 512 MiB to 1024 MiB if its consumption grows consistently.
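Unlike HPA, the VerticalPodAutoscaler is not part of core Kubernetes; its custom resource becomes available after installing the VPA components from the kubernetes/autoscaler project. A minimal manifest for the data-processing scenario above might look like this (the data-processor Deployment name is illustrative):

apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: data-processor-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: data-processor
  updatePolicy:
    updateMode: "Auto"        # use "Off" to receive recommendations only
  resourcePolicy:
    containerPolicies:
    - containerName: "*"
      minAllowed:
        memory: 256Mi
      maxAllowed:
        memory: 2Gi

With updateMode "Off", the VPA only writes recommendations into its status, which is a safe way to evaluate it before letting it evict and resize Pods.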

[Diagram: Horizontal vs Vertical Scaling (HPA vs VPA)]


Setting Up Metrics Server

The Metrics Server is essential for both HPA and VPA as it provides resource metrics like CPU and memory. Ensure it’s deployed in your cluster before configuring autoscalers.

Step 1 – Installation

Deploy the Metrics Server using the official Helm chart or YAML manifests:

kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

Step 2 – Verify Installation

  1. Check that the Metrics Server Pod is running:

kubectl get pods -n kube-system

  2. Test metrics availability:

kubectl top nodes
NAME     CPU(cores)   MEMORY(bytes)
node-1   250m         512Mi
node-2   300m         768Mi

If no metrics are available, enable aggregator routing on the kube-apiserver and, in clusters where the kubelet serves a self-signed certificate, allow insecure TLS on the Metrics Server:

--enable-aggregator-routing=true   # kube-apiserver flag
--kubelet-insecure-tls             # Metrics Server container argument

HPA with CPU and Memory Metrics

The Horizontal Pod Autoscaler dynamically adjusts Pod counts to optimize performance and meet target resource utilization.

Example Configuration

Below is an example of configuring HPA for a Deployment based on CPU usage.

Deployment YAML:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web-app
        image: nginx
        resources:
          requests:
            cpu: 200m # Requested resources
          limits:
            cpu: 500m # Upper limit

HPA YAML:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Applying the Resources

  1. Deploy the Deployment and HPA:

kubectl apply -f deployment.yaml
kubectl apply -f hpa.yaml

  2. Monitor the HPA:

kubectl get hpa web-app-hpa
NAME          REFERENCE            TARGETS   MINPODS   MAXPODS   REPLICAS
web-app-hpa   Deployment/web-app   80%/70%   3         10        5

  3. Simulate load with a tool like Apache Benchmark, or with a simple busybox loop:

kubectl run -i --tty load-generator --image=busybox -- /bin/sh
while true; do wget -q -O- http://web-app.default.svc.cluster.local; done

Autoscaling with Custom Metrics

Custom metrics allow scaling on application-specific parameters, such as request latency, active users, or queue length.

Prerequisite – Metrics Adapter

Install a custom metrics adapter such as the community-maintained Prometheus Adapter:

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install prometheus-adapter prometheus-community/prometheus-adapter

Ensure custom metrics are exposed to the cluster.
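For the adapter to serve a metric like http_requests_per_second, it needs a rule that translates a Prometheus series into a per-Pod custom metric. A sketch of such a rule in the adapter's Helm values (the series name http_requests_total is an assumption about how your application is instrumented):

rules:
  custom:
  - seriesQuery: 'http_requests_total{namespace!="",pod!=""}'
    resources:
      overrides:
        namespace: {resource: "namespace"}
        pod: {resource: "pod"}
    name:
      matches: "^(.*)_total$"
      as: "${1}_per_second"
    metricsQuery: 'rate(<<.Series>>{<<.LabelMatchers>>}[2m])'

The metricsQuery turns the raw counter into a rate, which is what an HPA target of "requests per second" expects.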

Example Custom Autoscaler

Below is an HPA that scales based on a Prometheus metric for HTTP requests per second (RPS).

HPA YAML:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: custom-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 2
  maxReplicas: 20
  metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 200

Simulating Custom Metrics

Expose an application metric through Prometheus instrumentation (e.g., HTTP request count), then observe how HPA adjusts Pod counts based on load.
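In practice you would instrument the application with a client library such as prometheus_client, but the exposition format itself is plain text. A stdlib-only sketch of a server that counts requests and serves the counter on /metrics (names and port are illustrative):

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

request_count = 0  # incremented on every non-/metrics request

def render_metrics() -> str:
    """Render the counter in the Prometheus text exposition format."""
    return (
        "# HELP http_requests_total Total HTTP requests served.\n"
        "# TYPE http_requests_total counter\n"
        f"http_requests_total {request_count}\n"
    )

class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        global request_count
        if self.path == "/metrics":
            body = render_metrics().encode()
        else:
            request_count += 1
            body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo quiet
        pass

# To serve locally: HTTPServer(("", 8000), MetricsHandler).serve_forever()
```

Once Prometheus scrapes this endpoint and the adapter rule is in place, the custom-hpa above can react to the resulting http_requests_per_second values.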


Horizontal vs Vertical Scaling

Criteria             | Horizontal Pod Autoscaling (HPA)              | Vertical Pod Autoscaling (VPA)
Scaling Mechanism    | Adds/removes Pods dynamically.                | Adjusts CPU and memory for each Pod.
Use Case             | Stateless apps handling spikes, such as APIs. | Stateful apps with predictable resource use.
Limitations          | Cannot scale individual Pod resources.        | May require Pod restarts during adjustment.
Impact on Downtime   | Zero downtime with rolling updates.           | Risk of restart-induced downtime.

HPA and VPA complement each other in modern Kubernetes environments. They can be used together for optimized autoscaling, but avoid letting both act on the same resource metric (such as CPU) for the same workload, as their decisions can conflict.


Final Thoughts

Autoscaling in Kubernetes, powered by HPA and VPA, ensures your applications always have the resources they need while maintaining cost efficiency. Horizontal scaling is ideal for handling sudden traffic bursts, while vertical scaling is perfect for optimizing resource allocation for steady-state workloads. Combine these strategies with custom metrics for complete autoscaling flexibility.

Use this guide to set up and manage autoscaling in your Kubernetes clusters, ensuring your system scales gracefully as demands evolve!
