Reducing AWS Costs by Right-Sizing Kubernetes Workloads

Introduction/Issue:
Running Kubernetes on AWS gave us flexibility, but our monthly bills were climbing fast. On closer inspection, we found workloads consuming far more resources than they needed, leading to over-provisioned nodes and wasted spend. As SREs, we made it a priority to optimize resource usage without compromising reliability.

Why it happens/Causes of the issue:
High AWS bills in Kubernetes often come from:

  • Over-Provisioned Requests: Developers request too much CPU/memory “just to be safe.”

  • Idle Workloads: Services running 24/7 when they could scale down.

  • Inefficient Node Sizes: Using larger instance types than necessary.

  • Lack of Autoscaling: No Horizontal Pod Autoscaler (HPA) or Cluster Autoscaler in place.

In our case, workloads had unnecessarily high CPU/memory requests, leading to underutilized EC2 nodes.

How we solved it (Step-by-step):

  1. Audit Current Resource Usage
    We used kubectl top pods and AWS CloudWatch metrics to identify pods with consistently low utilization compared to their requested resources.

    kubectl top pods --all-namespaces
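A quick way to surface right-sizing candidates is to filter the `kubectl top` output for pods using far less CPU than typical requests. Below is a minimal sketch: the pod names, sample data, and the 10m threshold are hypothetical stand-ins for live output from `kubectl top pods --all-namespaces --no-headers`.

```shell
# Stand-in for live "kubectl top pods --all-namespaces --no-headers" output.
# Columns: namespace, pod, CPU (millicores), memory.
kubectl_top_sample() {
cat <<'EOF'
default  api-server-7d9f  45m   180Mi
default  worker-5c2b      3m    64Mi
default  cache-9a1d       2m    48Mi
EOF
}

# Flag pods using under 10m CPU as right-sizing candidates.
# awk's $3+0 coerces "3m" to the number 3 for the comparison.
kubectl_top_sample | awk '$3+0 < 10 {print $2, "is a right-sizing candidate (CPU: " $3 ")"}'
```

Against a real cluster you would pipe `kubectl top pods --all-namespaces --no-headers` into the same awk filter instead of the sample function.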

  2. Right-Size Requests and Limits
    We adjusted resource requests/limits in the deployment YAMLs. For example:

    resources:
      requests:
        memory: "256Mi"
        cpu: "200m"
      limits:
        memory: "512Mi"
        cpu: "500m"

    This replaced the previous request of 2Gi memory and 1 CPU, capacity the pod never came close to using.

  3. Enable Horizontal Pod Autoscaler (HPA)

    kubectl autoscale deployment my-app --cpu-percent=70 --min=2 --max=10

    This allowed workloads to scale dynamically based on real demand.
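The same policy can also be managed declaratively. A sketch of the equivalent manifest using the `autoscaling/v2` API (the name `my-app` matches the command above; everything else is a standard HPA spec):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # scale out when average CPU exceeds 70% of requests
```

Keeping the HPA in version control alongside the Deployment makes the scaling policy reviewable rather than a one-off imperative command.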

  4. Leverage Cluster Autoscaler
    Installed the AWS Cluster Autoscaler so that underutilized nodes are drained and terminated automatically once their pods can be rescheduled elsewhere.
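For reference, a sketch of the container flags we relied on in the cluster-autoscaler Deployment. The cluster name `my-cluster` and the scale-down threshold are placeholders; auto-discovery assumes the Auto Scaling groups carry the standard `k8s.io/cluster-autoscaler/...` tags.

```yaml
# Excerpt from the cluster-autoscaler container spec (values illustrative).
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  # Discover node groups by ASG tags instead of hard-coding them:
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
  - --balance-similar-node-groups
  # Consider a node for removal when it runs below 50% utilization:
  - --scale-down-utilization-threshold=0.5
```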

  5. Use Smaller EC2 Instances
    Shifted from m5.large to burstable t3.medium instances where workload profiles allowed, reducing per-node costs.

  6. Set Up Monitoring & Alerts
    Configured Prometheus + Grafana dashboards to continuously monitor pod and node utilization, ensuring workloads remained efficient.
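To keep over-provisioning from creeping back, an alert can compare actual usage against requests. A sketch of a Prometheus rule, assuming cAdvisor and kube-state-metrics are scraped (the 30% threshold, 24h window, and alert name are illustrative):

```yaml
groups:
  - name: right-sizing
    rules:
      - alert: CPURequestOverProvisioned
        # Fires when a container's actual CPU usage stays below 30% of its
        # CPU request for a full day.
        expr: |
          sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total[5m]))
            < 0.3 * sum by (namespace, pod, container) (kube_pod_container_resource_requests{resource="cpu"})
        for: 24h
        labels:
          severity: info
        annotations:
          summary: "Container requests far more CPU than it uses"
```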

Conclusion:
By right-sizing resource requests, enabling autoscaling, and optimizing EC2 instances, we reduced our AWS Kubernetes costs by nearly 35% without impacting performance. The lesson: Always monitor real resource usage, avoid over-provisioning, and let automation (HPA/Cluster Autoscaler) handle fluctuations. Cost optimization isn’t a one-time task—it’s a continuous SRE responsibility.
