Introduction/Issue:
During a period of increased traffic, our application faced performance bottlenecks due to insufficient resources. Manual intervention for scaling was inefficient, so we decided to implement auto-scaling for our Kubernetes workloads on Google Cloud Platform (GCP).
Why we needed to do it/Cause of the issue:
As traffic grew, the pods in the cluster became overwhelmed, which degraded performance and caused request timeouts. The root cause was the absence of a dynamic scaling mechanism: capacity stayed fixed while demand rose. Without automatic scaling, we risked downtime and a poor user experience.
How we solved it:
We implemented the Horizontal Pod Autoscaler (HPA) and enabled cluster auto-scaling on Google Kubernetes Engine (GKE):
1. Set Resource Requests and Limits:
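We set resources.requests and resources.limits for the application container in the Deployment manifest. Requests are what the scheduler reserves for each pod when placing it on a node, and they are also the baseline the HPA uses to compute CPU utilization. Limits cap what the container may consume: CPU use above the limit is throttled, and a container that exceeds its memory limit is OOM-killed.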
# Container-level settings (spec.template.spec.containers[].resources)
resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1"
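An HPA Utilization target is a percentage of each pod's CPU request, not of its limit or of the node's capacity. The request above therefore sets the point at which scaling starts. The Python sketch below converts the request into an absolute per-pod threshold for the 70% target used in the HPA manifest. The helper names are hypothetical. The quantity parser only handles plain core counts and the `m` (millicore) suffix, not every Kubernetes quantity format.

```python
def cpu_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as "500m" or "1" to millicores.

    Simplified: handles only the "m" suffix and plain (possibly fractional)
    core counts, not every Kubernetes quantity format.
    """
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def utilization_threshold_millicores(request: str, target_percent: int) -> float:
    """Average CPU usage per pod at which an HPA Utilization target is reached."""
    return cpu_millicores(request) * target_percent / 100


request = "500m"  # resources.requests.cpu from the manifest
limit = "1"       # resources.limits.cpu from the manifest

threshold = utilization_threshold_millicores(request, 70)
max_utilization = cpu_millicores(limit) * 100 // cpu_millicores(request)
print(f"70% of a {request} request = {threshold:.0f}m per pod")
print(f"the {limit}-core limit allows up to {max_utilization}% utilization")
```

Utilization can legitimately exceed 100% with these settings, because the limit lets each container use up to twice its request.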
2. Enable HPA:
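We deployed an HPA that scales the my-app Deployment between 2 and 10 replicas. It aims to keep the average CPU utilization across the pods at 70% of their CPU request.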
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
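The Kubernetes documentation describes the HPA controller's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The controller skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default), and the result is bounded by minReplicas and maxReplicas. The Python sketch below reproduces that arithmetic for the manifest above. The function name is hypothetical. The sketch deliberately ignores stabilization windows, scaling policies, and the special handling of not-yet-ready pods, so it approximates the controller rather than reproducing it.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float = 70.0,
                     min_replicas: int = 2, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA scaling rule for a single Utilization metric.

    current_utilization is the average CPU usage across pods as a percentage
    of their CPU request. Defaults mirror app-hpa (70%, 2-10 replicas) and the
    controller's default tolerance of 0.1.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling action
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 140% of their request: ratio 2.0, so 4 -> 8 pods.
print(desired_replicas(4, 140))
# 4 pods at 75%: ratio ~1.07 is within the tolerance, so the count stays at 4.
print(desired_replicas(4, 75))
# 6 pods at 140%: ceil(12) exceeds maxReplicas, so the result is capped at 10.
print(desired_replicas(6, 140))
# 3 pods at 10%: ceil(0.43) = 1, which is raised to minReplicas = 2.
print(desired_replicas(3, 10))
```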
3. Enable the Cluster Autoscaler in GKE:
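We enabled auto-scaling on the GKE cluster. Some pods the HPA creates may not fit on any node, because no node has enough unreserved CPU or memory to satisfy their requests. When that happens, the Cluster Autoscaler adds nodes, up to the configured maximum. When nodes are underused, it removes them again, down to the minimum.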
gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes 1 --max-nodes 5 --zone us-central1-a
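The Cluster Autoscaler makes its decisions from pod requests rather than live usage. It is therefore worth checking that the node ceiling can hold the pod ceiling. The Python sketch below estimates how many nodes this workload alone needs at maxReplicas. The per-node allocatable figures are hypothetical placeholders; real values depend on the machine type and on GKE's system reservations. The estimate also ignores system pods, DaemonSets, and other workloads sharing the nodes, so treat it as a rough lower bound, not a prediction of the autoscaler's behavior.

```python
import math

# Per-pod requests from the Deployment manifest: cpu "500m", memory "512Mi".
POD_CPU_M = 500
POD_MEM_MI = 512

# HYPOTHETICAL allocatable capacity of one node (placeholder values).
# Check the real figures with `kubectl describe node`.
NODE_CPU_M = 1930
NODE_MEM_MI = 5900

MAX_REPLICAS = 10  # maxReplicas in app-hpa
MAX_NODES = 5      # --max-nodes in the gcloud command


def pods_per_node() -> int:
    """How many identical pods fit on one node, judged by requests only."""
    fit = min(NODE_CPU_M // POD_CPU_M, NODE_MEM_MI // POD_MEM_MI)
    if fit == 0:
        raise ValueError("one pod's requests exceed a node's allocatable capacity")
    return fit


def nodes_needed(replicas: int) -> int:
    """Nodes required to schedule `replicas` pods of this workload alone."""
    return math.ceil(replicas / pods_per_node())


needed = nodes_needed(MAX_REPLICAS)
verdict = "fits" if needed <= MAX_NODES else "does NOT fit"
print(f"{MAX_REPLICAS} replicas need {needed} node(s); max-nodes={MAX_NODES}: {verdict}")
```

Counting whole pods per node, rather than dividing total requests by total capacity, reflects that a pod cannot be split across nodes.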
4. Testing the Scaling:
To verify the setup, we load-tested the application with tools such as Locust and observed the pod count increase as expected while the load was applied.
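Locust, mentioned above, is the more capable option, with ramp-up profiles, a web UI, and reports. For a quick check without installing it, the following is a minimal standard-library stand-in. It is not Locust and not necessarily what we ran. It fires concurrent GET requests and reports the status codes and p95 latency. The function names are hypothetical.

```python
import math
import time
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Counter as CounterType, Tuple


def http_get(url: str, timeout: float = 5.0) -> int:
    """Send one GET request and return its HTTP status (0 on network errors)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered with a 4xx/5xx status
    except OSError:
        return 0         # timeout, refused connection, DNS failure, etc.


def run_load(send: Callable[[], int], total_requests: int,
             concurrency: int) -> Tuple[CounterType[int], float]:
    """Call send() total_requests times from `concurrency` worker threads.

    Returns a Counter of status codes and the p95 latency in seconds
    (nearest-rank method).
    """
    def timed_call(_: int) -> Tuple[int, float]:
        start = time.perf_counter()
        status = send()
        return status, time.perf_counter() - start

    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_call, range(total_requests)))

    statuses = Counter(status for status, _ in results)
    latencies = sorted(latency for _, latency in results)
    p95 = latencies[math.ceil(0.95 * len(latencies)) - 1] if latencies else 0.0
    return statuses, p95
```

For example, call `run_load(lambda: http_get("http://my-app.example.com/"), total_requests=2000, concurrency=50)` with a hypothetical URL standing in for the service's real address. Running `kubectl get hpa app-hpa --watch` in another terminal at the same time shows the load and the HPA's reaction side by side.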
Conclusion:
By implementing the HPA and enabling cluster auto-scaling, we built a setup that adjusts both pod and node counts to demand and maintained application performance during traffic spikes. This automated approach eliminated manual intervention and kept the user experience consistent under load.