Scaling Applications in Kubernetes on GCP

Introduction/Issue:
During a period of increased traffic, our application faced performance bottlenecks due to insufficient resources. Manual intervention for scaling was inefficient, so we decided to implement auto-scaling for our Kubernetes workloads on Google Cloud Platform (GCP).

Why we need to do it/Cause of the issue:
As traffic grew, the pods in the cluster became overloaded, leading to degraded performance and timeouts. The root cause was the absence of a dynamic scaling mechanism. Without one, we risked downtime and a poor user experience.

How do we solve it:
We implemented the Horizontal Pod Autoscaler (HPA) and cluster auto-scaling on GCP:

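1. Set Resource Requests and Limits:
We configured resources.requests and resources.limits in the deployment manifest. Requests tell the scheduler how much CPU and memory to reserve for each pod, and they are also the baseline for the HPA: CPU utilization is calculated as a percentage of the requested CPU, not of the limit.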

resources:
  requests:
    memory: "512Mi"
    cpu: "500m"
  limits:
    memory: "1Gi"
    cpu: "1"
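To make the link between requests and the HPA target concrete, here is a minimal Python sketch of the arithmetic. The helper names cpu_to_millicores and utilization_percent are made up for illustration and are not part of any Kubernetes client; the sketch only handles the two CPU notations used in this post ("500m" and whole or fractional cores).

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity such as '500m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    # Plain numbers are whole or fractional cores; 1 core = 1000 millicores.
    return round(float(quantity) * 1000)


def utilization_percent(usage: str, request: str) -> float:
    """CPU utilization relative to the pod's CPU request."""
    return 100 * cpu_to_millicores(usage) / cpu_to_millicores(request)


# A pod using 350m against a 500m request sits exactly at a 70% target.
print(utilization_percent("350m", "500m"))  # 70.0
# A pod running at its 1-CPU limit counts as 200% of its 500m request.
print(utilization_percent("1", "500m"))     # 200.0
```

With the values above, a 70% utilization target corresponds to an average of about 350m CPU per pod, well below the 1-CPU limit.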

2. Enable HPA:
We deployed the HPA to scale pods based on CPU utilization.

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: app-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
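For intuition about how this manifest turns into a replica count, the following Python sketch applies the scaling rule described in the Kubernetes documentation: desired = ceil(current replicas x current utilization / target utilization), skipped when the ratio is within a tolerance of 1.0 (0.1 by default), and bounded by minReplicas and maxReplicas. The function name and signature are illustrative only, and the sketch leaves out real controller behavior such as stabilization windows and pods that are not yet ready.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float = 70,
                     min_replicas: int = 2,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule; defaults mirror the manifest in this post."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


print(desired_replicas(4, 105))  # 6  (scale up)
print(desired_replicas(4, 35))   # 2  (scale down)
print(desired_replicas(8, 140))  # 10 (capped at maxReplicas)
print(desired_replicas(4, 72))   # 4  (within tolerance, no change)
```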

3. Cluster Auto-Scaler in GCP:
We ensured the Kubernetes cluster in GCP had auto-scaling enabled. This allowed the cluster to add nodes dynamically when workloads increased.

gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --min-nodes 1 --max-nodes 5 --zone us-central1-a
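As a rough way to check whether a 1 to 5 node range can hold the HPA's maximum of 10 pods, the Python sketch below estimates the node count from the pod requests. It is a back-of-the-envelope model, not the cluster autoscaler's actual algorithm: it ignores system pods, other workloads, and scheduling constraints. The node allocatable figures (1930m CPU, 5800Mi memory) are assumed values for illustration and do not come from our cluster.

```python
import math


def nodes_needed(pods: int,
                 pod_cpu_m: int, pod_mem_mi: int,
                 node_cpu_m: int, node_mem_mi: int,
                 min_nodes: int = 1, max_nodes: int = 5) -> int:
    """Estimate nodes required to fit `pods` identical pods by their requests."""
    # A node holds as many pods as its scarcest resource allows.
    per_node = min(node_cpu_m // pod_cpu_m, node_mem_mi // pod_mem_mi)
    if per_node == 0:
        raise ValueError("a single pod does not fit on one node")
    wanted = math.ceil(pods / per_node)
    # The autoscaler stays within the configured node range.
    return max(min_nodes, min(max_nodes, wanted))


# Pod requests from this post: 500m CPU, 512Mi memory.
# Node allocatable values are hypothetical.
print(nodes_needed(10, 500, 512, 1930, 5800))  # 4
print(nodes_needed(2, 500, 512, 1930, 5800))   # 1
print(nodes_needed(20, 500, 512, 1930, 5800))  # 5 (capped; extra pods would stay Pending)
```

Under these assumed node sizes, the HPA maximum of 10 pods fits inside the 5-node ceiling. If the estimate exceeded max-nodes, some pods would remain unschedulable until the limit was raised.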

4. Testing the Scaling:
To verify the setup, we performed load testing using tools like Locust and observed pods scaling up as expected.
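We used Locust for the actual tests. For readers who only want a quick smoke test, the sketch below is a standard-library stand-in, not a Locust script: it sends a fixed number of concurrent GET requests and tallies the status codes. The function names, the request count, and the concurrency level are all illustrative assumptions.

```python
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def fetch_status(url: str) -> str:
    """Issue one GET and return the HTTP status code as a string, or 'error'."""
    try:
        with urllib.request.urlopen(url, timeout=5) as resp:
            return str(resp.status)
    except urllib.error.HTTPError as exc:
        # The server answered, but with a 4xx/5xx status.
        return str(exc.code)
    except OSError:
        # Connection refused, DNS failure, timeout, and similar.
        return "error"


def run_load(url: str, total_requests: int, concurrency: int,
             fetch=fetch_status) -> Counter:
    """Send total_requests GETs using `concurrency` threads; count the outcomes."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        return Counter(pool.map(fetch, [url] * total_requests))
```

Calling run_load with the service's address, for example 500 requests at a concurrency of 20, while running kubectl get hpa app-hpa --watch in another terminal shows the replica count rising as CPU utilization passes the target. A growing share of "error" or 5xx results suggests the pods are saturating faster than the autoscaler can react.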

Conclusion:
By implementing HPA and enabling cluster auto-scaling, we achieved a dynamic scaling solution that maintained application performance during traffic spikes. This proactive approach eliminated manual interventions and ensured a seamless user experience.

Dinesh I