GKE Standard Cluster Autoscaling

This tutorial demonstrates how to use the Cluster Autoscaler (CA) in GKE Standard.

The Cluster Autoscaler automatically resizes the number of nodes in a given node pool, based on the demands of your workloads. You don't need to manually add or remove nodes or over-provision your cluster.

Autoscaling Criteria: When does it scale?

The Cluster Autoscaler monitors your cluster and triggers scaling events based on specific criteria.

| Criteria | Description | Scaling Action |
| --- | --- | --- |
| Insufficient Capacity | Pods are Pending because no existing node has enough free CPU or memory to satisfy the Pods' requests. | Scale Up: Adds a new node that matches the requirements. |
| Node Selectors / Affinity | Pods require a node with specific labels (e.g., gpu=true) or affinity rules that current nodes cannot satisfy. | Scale Up: Adds a node with the required labels/taints if a matching node pool exists. |
| Pod Priority | High-priority Pods are pending. | Scale Up: Adds nodes to accommodate high-priority Pods (may preempt lower-priority Pods). |
| Underutilized Nodes | A node has low utilization (e.g., less than 50% of its allocatable resources requested) and its Pods can be moved to other nodes. | Scale Down: Moves Pods to other nodes and deletes the underutilized node to save costs. |
| Max Node Limit | The node pool has reached its configured --max-nodes limit. | No Action: Scaling stops. Pods remain Pending until capacity is freed or limits are increased. |
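
The scale-down criterion can be overridden per workload. As a sketch, the standard cluster-autoscaler safe-to-evict annotation, set on a Deployment's Pod template, marks Pods as not safe to evict, which blocks scale-down of the nodes they run on:

kubectl patch deployment <DEPLOYMENT_NAME> -p \
    '{"spec":{"template":{"metadata":{"annotations":{"cluster-autoscaler.kubernetes.io/safe-to-evict":"false"}}}}}'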

GKE Standard vs. Autopilot

| Feature | GKE Standard | GKE Autopilot |
| --- | --- | --- |
| Autoscaling | Manual setup: you must explicitly enable Cluster Autoscaler and define min/max nodes per node pool. | Automatic: node auto-provisioning is built in. You don't manage node pools or autoscalers. |
| Configuration | Flexible: you can have some pools fixed and others autoscaling. | Fully managed: Google creates nodes as needed. |
| Cost | You pay for the nodes (even if partially empty). The autoscaler helps minimize waste. | You pay for the Pods (resources requested). No cost for empty node space. |
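
The difference is visible at cluster-creation time. A minimal sketch (cluster names are placeholders):

# Standard: autoscaling is opt-in, configured per node pool.
gcloud container clusters create my-standard-cluster \
    --region us-central1 \
    --enable-autoscaling --min-nodes 1 --max-nodes 3

# Autopilot: no node or autoscaler flags at all; provisioning is fully managed.
gcloud container clusters create-auto my-autopilot-cluster \
    --region us-central1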

Autoscaling Strategies in Standard

In GKE Standard, you have two main ways to autoscale nodes:

1. Per-Node Pool Autoscaling (Traditional)

This is the most common approach. You create specific node pools (e.g., "cpu-pool", "gpu-pool") and set a minimum and maximum size for each.

  • Behavior: The autoscaler can only add/remove nodes within existing pools.
  • Limitation: If you deploy a Pod requesting a GPU but you don't have a GPU node pool, the Pod will remain Pending forever. The autoscaler cannot "create" a new type of node pool; it can only resize pools that already exist (see the sketch below).
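
If a workload does need a new node type under this model, you create the pool yourself and then enable autoscaling on it explicitly. A sketch, assuming a pool named gpu-pool already exists (variables as defined in Step 1):

gcloud container clusters update $CLUSTER_NAME \
    --region $REGION \
    --node-pool gpu-pool \
    --enable-autoscaling \
    --min-nodes 0 \
    --max-nodes 4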

2. Node Auto-Provisioning (NAP)

NAP extends the Cluster Autoscaler. It allows GKE to automatically create new node pools based on pending pod specifications.

  • Behavior: If a pending Pod needs a specific resource (like a specific CPU/RAM ratio or a GPU) and no existing pool matches, NAP creates a brand new Node Pool optimized for that Pod.
  • Benefit: You don't need to pre-create every possible node pool type. It reduces management overhead and can optimize costs by picking the "right-sized" machine type.

Enabling NAP:

gcloud container clusters update $CLUSTER_NAME \
    --enable-autoprovisioning \
    --min-cpu 1 \
    --min-memory 1 \
    --max-cpu 100 \
    --max-memory 1024
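
With NAP enabled, a Pod whose requests fit no existing pool no longer stays Pending indefinitely; GKE can create a right-sized pool for it. A hypothetical manifest (the name, image, and sizes are illustrative only):

apiVersion: v1
kind: Pod
metadata:
  name: big-memory-task
spec:
  containers:
  - name: worker
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        cpu: "4"
        memory: 16Gi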

Step 1: Create Cluster with Autoscaling

We will create a cluster whose default node pool has autoscaling enabled.

  1. Set Variables:

    export PROJECT_ID=$(gcloud config get-value project)
    export REGION=us-central1
    export CLUSTER_NAME=gke-autoscaling-demo
    

  2. Create Cluster: We use --enable-autoscaling along with --min-nodes and --max-nodes.

    gcloud container clusters create $CLUSTER_NAME \
        --region $REGION \
        --num-nodes 1 \
        --enable-autoscaling \
        --min-nodes 1 \
        --max-nodes 3 \
        --machine-type e2-medium \
        --enable-ip-alias
    
    • --num-nodes 1: Starts with 1 node per zone.
    • --min-nodes 1: Scale down limit (per zone).
    • --max-nodes 3: Scale up limit (per zone).

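To confirm the settings took effect, you can inspect the default pool's autoscaling block (output shape may vary across gcloud versions):

gcloud container node-pools describe default-pool \
    --cluster $CLUSTER_NAME \
    --region $REGION \
    --format "yaml(autoscaling)"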

Step 2: Configure kubectl Access

Get the authentication credentials for the cluster. This configures kubectl to talk to your new cluster.

gcloud container clusters get-credentials $CLUSTER_NAME --region $REGION
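
A quick check that the context was written (this reads local config only, so it works even before the auth plugin below is installed):

kubectl config current-context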

Install GKE Auth Plugin

Starting with Kubernetes 1.26, kubectl requires the separate gke-gcloud-auth-plugin to authenticate to GKE clusters.

  • Using gcloud components (Recommended):
    gcloud components install gke-gcloud-auth-plugin
    
  • Debian/Ubuntu:
    sudo apt-get install google-cloud-sdk-gke-gcloud-auth-plugin
    
  • Red Hat/CentOS:
    sudo yum install google-cloud-sdk-gke-gcloud-auth-plugin
    
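To verify the plugin is installed and that kubectl can now reach the cluster (on older gcloud releases you may also need to export USE_GKE_GCLOUD_AUTH_PLUGIN=True before running get-credentials):

gke-gcloud-auth-plugin --version
kubectl get nodes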

Step 3: Deploy Sample Application

We will deploy a PHP-Apache application. Crucially, we must define resource requests. Without requests, the Autoscaler doesn't know "how big" a pod is and cannot make scaling decisions.

  1. Create Deployment:

    kubectl create deployment php-apache --image=registry.k8s.io/hpa-example
    

  2. Set Resource Requests: We set a request of 500m CPU per pod. An e2-medium node has 2 vCPUs (approx. 2000m), but GKE reserves part of that for system components, so the allocatable share per node is considerably lower.

    kubectl set resources deployment php-apache --requests=cpu=500m
    
    • Each pod reserves 500 millicores (0.5 vCPU) of node capacity, whether or not it actually uses them; see the equivalent manifest sketched below.
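
The two commands above are equivalent to applying a manifest along these lines (kubectl create deployment generates the app: php-apache label and the hpa-example container name by default):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: php-apache
spec:
  replicas: 1
  selector:
    matchLabels:
      app: php-apache
  template:
    metadata:
      labels:
        app: php-apache
    spec:
      containers:
      - name: hpa-example
        image: registry.k8s.io/hpa-example
        resources:
          requests:
            cpu: 500m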

Step 4: Trigger Scaling (Scale Up)

  1. Check current nodes:

    kubectl get nodes
    
    (We created a regional cluster with --num-nodes 1, so expect 3 nodes: one per zone across the region's three default zones.)

  2. Scale the Deployment: We will scale replicas to a number that exceeds the cluster's current capacity.

    kubectl scale deployment php-apache --replicas=10
    
    • Total request: 10 × 500m = 5000m (5 vCPUs).
    • Current capacity: 3 e2-medium nodes; after GKE's system reservations each typically has well under 2 vCPUs allocatable, so the cluster cannot fit 5 vCPUs of requests.
    • Result: some Pods stay Pending.

  3. Watch Pods:

    kubectl get pods -w
    
    You will see some pods Running and many Pending.

  4. Inspect Pending Pod:

    kubectl describe pod <PENDING_POD_NAME>
    
    Look for the Events section. You should see FailedScheduling with message Insufficient cpu. This is the trigger for the Autoscaler.
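
You can also follow the autoscaler's reaction in the event stream; once it decides to act, a TriggeredScaleUp event is typically recorded for the pending Pods:

kubectl get events --sort-by=.lastTimestamp | grep -E "FailedScheduling|TriggeredScaleUp"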

Step 5: Observe Autoscaler Action

Watch the nodes scaling up.

kubectl get nodes -w

After a minute or so, you will see new nodes joining the cluster. Once they are Ready, the pending Pods are scheduled and transition to Running.
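
To see which nodes belong to the autoscaled pool, filter on the node-pool label that GKE applies to every node:

kubectl get nodes -l cloud.google.com/gke-nodepool=default-pool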

Step 6: Scale Down

The Autoscaler also removes underutilized nodes.

  1. Scale Down Deployment:

    kubectl scale deployment php-apache --replicas=1
    

  2. Wait: The scale-down process is conservative. It waits (default ~10 minutes) to ensure the drop in load is not a temporary spike.

    Eventually, you will see nodes being cordoned (marked unschedulable) and then removed; the log query below shows the autoscaler's reasoning.
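
If your cluster version emits autoscaler visibility logs, you can review the scale-down decisions in Cloud Logging (a sketch; the exact log name and fields may vary):

gcloud logging read \
    'resource.type="k8s_cluster" AND logName:"cluster-autoscaler-visibility"' \
    --limit 10 \
    --format json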

Quiz

  1. What is the primary metric that triggers the Cluster Autoscaler to add a new node?
  2. In GKE Autopilot, how do you enable the Cluster Autoscaler?
  3. What happens if your workload requires more nodes than the --max-nodes limit?

Cleanup

gcloud container clusters delete $CLUSTER_NAME --region $REGION --quiet