What Does Cluster Autoscaler Do? When Does Cluster Autoscaler Decrease Desired Capacity?
What Cluster Autoscaler Does
- The Cluster Autoscaler (commonly used with Kubernetes on AWS, GCP, or Azure) automatically adjusts the size of your node groups (e.g., ASGs in AWS) based on the current workload.
- It increases the desired capacity when:
  - There are unschedulable pods that need resources.
- It decreases the desired capacity when:
  - Nodes are underutilized and their pods can be safely moved elsewhere.
How It Works with ASG's desired_capacity
- The autoscaler directly modifies the `desired_capacity` of the ASG to scale up or down.
- This change happens outside of Terraform, which means:
  - Terraform will see drift if `desired_capacity` in the .tf file is fixed/static.
  - If `ignore_changes = [desired_capacity]` is not set, Terraform will try to revert it during the next apply.
Best Practice with Terraform + Cluster Autoscaler
To avoid conflict between Terraform and Cluster Autoscaler:
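A minimal sketch of such a configuration, assuming an `aws_autoscaling_group` resource (the resource name and sizing values are illustrative):

```hcl
resource "aws_autoscaling_group" "eks_nodes" {
  min_size         = 1
  max_size         = 5
  desired_capacity = 3
  # ... launch template, subnets, and other required arguments omitted

  lifecycle {
    # Let Cluster Autoscaler own desired_capacity at runtime;
    # Terraform will no longer plan a change when the value drifts.
    ignore_changes = [desired_capacity]
  }
}
```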
This tells Terraform:
"Let the Cluster Autoscaler manage the desired capacity dynamically; don't try to enforce the value from the Terraform config."
When Does Cluster Autoscaler Decrease Desired Capacity?
The Cluster Autoscaler will scale down (i.e., reduce desired capacity) when all of the following conditions are met:
- Node is underutilized
  - The CPU and memory requested by the node's pods are below the configured threshold of its allocatable capacity (note: this is based on resource requests, not live usage).
  - Default: requests < 50% of allocatable.
- Pods on the node can be moved
  - All non-DaemonSet, non-static pods running on the node can be rescheduled to other nodes without causing disruption.
  - There must be enough capacity elsewhere in the cluster to move the pods.
- Grace period is met
  - The node has been underutilized for a certain period (default: 10 minutes).
- No scale-down blockers are active, such as:
  - PodDisruptionBudgets (PDBs) that prevent eviction.
  - Pods with local storage.
  - Recently started pods or nodes (default cooldown period: 10 minutes).
  - Non-replicated pods that can't be moved.
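Relatedly, a pod that would otherwise block scale-down (e.g., local storage, no controller) can be marked as evictable with Cluster Autoscaler's well-known `safe-to-evict` annotation. A minimal sketch (the pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: batch-worker  # illustrative name
  annotations:
    # Tells Cluster Autoscaler it may evict this pod even if it would
    # otherwise count as a scale-down blocker.
    cluster-autoscaler.kubernetes.io/safe-to-evict: "true"
spec:
  containers:
    - name: worker
      image: busybox:1.36  # illustrative image
      command: ["sleep", "3600"]
```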
What Happens During Scale Down?
If all conditions are satisfied:
- Cluster Autoscaler marks the node as deletable.
- It evicts the pods.
- It calls the cloud provider API (e.g., the AWS ASG API) to reduce the desired capacity by 1.
- The Auto Scaling Group terminates the instance.

Terraform conflict: If Terraform manages the ASG's `desired_capacity` and doesn't ignore changes, it might revert this scale-down during the next `terraform apply`.
Termination grace period: The pods on the node are gracefully evicted using the standard pod termination process.
Example: AWS with Cluster Autoscaler
If an AWS ASG has `min_size = 1`, `max_size = 5`, and `desired_capacity = 3`:
- Cluster Autoscaler may reduce `desired_capacity` to 2 or 1 based on utilization.
- But it will never go below `min_size` (1 in this case).
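For reference, a sketch of such an ASG in Terraform, including the tags that Cluster Autoscaler's auto-discovery mode matches on (the cluster name and resource name are illustrative):

```hcl
resource "aws_autoscaling_group" "k8s_nodes" {
  min_size         = 1
  max_size         = 5
  desired_capacity = 3
  # ... launch template, subnets, and other required arguments omitted

  # Matched by --node-group-auto-discovery=
  #   asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-cluster
  tag {
    key                 = "k8s.io/cluster-autoscaler/enabled"
    value               = "true"
    propagate_at_launch = true
  }
  tag {
    key                 = "k8s.io/cluster-autoscaler/my-cluster"
    value               = "owned"
    propagate_at_launch = true
  }
}
```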
Key Threshold Flags for Scale Down
You configure these thresholds in the Cluster Autoscaler deployment YAML, under the container's `command:` or `args:` section. Here are the most common flags:
1. `--scale-down-utilization-threshold`
- Default: `0.5` (50%)
- Meaning: If the resources requested on a node are below this fraction of its allocatable CPU and memory, it's considered underutilized.
- Example: `--scale-down-utilization-threshold=0.5`
2. `--scale-down-unneeded-time`
- Default: `10m`
- Meaning: A node must be underutilized for this duration before it's eligible for scale-down.
- Example: `--scale-down-unneeded-time=10m`
3. `--scale-down-delay-after-add`
- Default: `10m`
- Meaning: Time to wait after a scale-up before scale-down evaluation resumes.
- Example: `--scale-down-delay-after-add=10m`
4. `--scale-down-delay-after-delete`
- Default: `0s`
- Meaning: Delay after a scale-down before another scale-down is allowed.
5. `--scale-down-enabled`
- Default: `true`
- Set to `false` if you want to disable scale-down entirely.
Example (Kubernetes Deployment YAML Snippet)
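A minimal sketch of the relevant container spec (the image tag and cloud provider are illustrative):

```yaml
# Excerpt from the cluster-autoscaler Deployment's pod template
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:v1.30.0  # illustrative tag
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws              # illustrative; match your cloud
      - --scale-down-enabled=true
      - --scale-down-utilization-threshold=0.5
      - --scale-down-unneeded-time=10m
      - --scale-down-delay-after-add=10m
      - --scale-down-delay-after-delete=0s
```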
How to Update
- Edit the Cluster Autoscaler deployment (typically `kubectl -n kube-system edit deployment cluster-autoscaler`).
- Modify the `args:` (or `command:`) under the container spec.
- Save and exit; Kubernetes will automatically restart the pod with the new settings.
What is a PodDisruptionBudget (PDB)?
A PDB defines the number of pods in a group (usually from a Deployment or StatefulSet) that must remain available during voluntary disruptions, like:
- Draining a node (e.g., for Cluster Autoscaler scale-down)
- Manual `kubectl drain`
- Rolling updates
- Node upgrades
It does not protect against involuntary disruptions like node crashes.
How PDBs Prevent Eviction
When the Cluster Autoscaler tries to delete a node, it must evict all pods from it. But if evicting even one pod would violate the PDB, the eviction is blocked and the node can't be scaled down.
Example Scenario:
- You have a Deployment with 3 replicas.
- You apply this PDB (see the manifest after this list).
- All 3 pods are running across 3 nodes.
- If Cluster Autoscaler tries to scale down a node with one of these pods:
  - It tries to evict the pod.
  - Evicting it would reduce the available pods to 2, violating `minAvailable: 3`.
  - Eviction fails, and the node remains.
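A sketch of that PDB (the name and label selector are illustrative):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb  # illustrative name
spec:
  minAvailable: 3   # all 3 replicas must stay available
  selector:
    matchLabels:
      app: my-app   # illustrative label
```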
How to Avoid This Problem
- Use a realistic `minAvailable` or `maxUnavailable`
  - Don't set `minAvailable` equal to the total replica count unless you truly require 100% availability.
  - Use `maxUnavailable: 1` for more flexibility.
- Use `kubectl describe pdb` to debug.
- Look for scale-down blockers in the Cluster Autoscaler logs.
- Run `kubectl drain <node> --dry-run=client` to simulate eviction.
This can show you which PDBs would block a node from draining.
Sample Corrected PDB
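A sketch (the name and selector are illustrative, matching the scenario above):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  maxUnavailable: 1  # at most one pod may be disrupted voluntarily
  selector:
    matchLabels:
      app: my-app
```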
This allows one pod to be evicted at a time, enabling scale-down while still protecting service availability.