
    What Does the Cluster Autoscaler Do? When Does It Decrease Desired Capacity?

    What Cluster Autoscaler Does

    • The Cluster Autoscaler (commonly used with Kubernetes on AWS, GCP, or Azure) automatically adjusts the size of your node groups (e.g., ASGs in AWS) based on the current workload.

    • It increases the desired capacity when:

      • There are pending pods that cannot be scheduled because no node has enough free resources.

    • It decreases the desired capacity when:

      • Nodes are underutilized and their pods can be safely moved elsewhere.


    How It Works with ASG's desired_capacity

    • The autoscaler directly modifies the desired_capacity of the ASG to scale up/down.

    • This change happens outside of Terraform, which means:

      • Terraform will detect drift if desired_capacity is set to a fixed value in the .tf file.

      • If ignore_changes = [desired_capacity] is not set, Terraform will try to revert it during the next apply.

    Best Practice with Terraform + Cluster Autoscaler

    To avoid conflict between Terraform and Cluster Autoscaler:

    lifecycle {
      ignore_changes = [desired_capacity]
    }

    This tells Terraform:

    "Let the Cluster Autoscaler manage the desired capacity dynamically; don't try to enforce the value from the Terraform config."

    When Does Cluster Autoscaler Decrease Desired Capacity?

    The Cluster Autoscaler will scale down (i.e., reduce desired capacity) when all of the following conditions are met:

    1. Node is underutilized

      • The CPU and memory requested by the pods on the node are low relative to the node's allocatable capacity (below the configured threshold).

      • Default: requested CPU and memory < 50% of allocatable.

    2. Pods on the node can be moved

      • All non-daemonset, non-static pods running on the node can be rescheduled to other nodes without causing disruption.

      • There must be enough capacity elsewhere in the cluster to move the pods.

    3. Grace period is met

      • The node has been underutilized for a certain period (default: 10 minutes).

    4. No scale-down blockers are active, such as:

      • PodDisruptionBudgets (PDBs) that prevent eviction.

      • Pods with local storage.

      • Recently started pods or nodes (default cooldown period: 10 minutes).

      • Non-replicated pods that can’t be moved.
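    Related to these blockers, a pod can also be explicitly protected from scale-down with the cluster-autoscaler.kubernetes.io/safe-to-evict annotation. A minimal sketch (pod name and image are illustrative):

    apiVersion: v1
    kind: Pod
    metadata:
      name: my-critical-pod            # illustrative
      annotations:
        cluster-autoscaler.kubernetes.io/safe-to-evict: "false"
    spec:
      containers:
        - name: app
          image: nginx:1.25            # illustrative

    A node running such a pod will not be considered for scale-down until the annotation is removed.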

    What Happens During Scale Down?

    If all conditions are satisfied:

    • Cluster Autoscaler marks the node as deletable.

    • It evicts the pods.

    • It calls the cloud provider API (e.g., AWS ASG) to reduce the desired capacity by 1.

    • The Auto Scaling Group terminates the instance.
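    To verify a scale-down from the AWS side, one option (assuming the node group's ASG is named my-node-group) is to query its current desired capacity with the AWS CLI:

    aws autoscaling describe-auto-scaling-groups \
      --auto-scaling-group-names my-node-group \
      --query 'AutoScalingGroups[0].DesiredCapacity'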


  • Terraform conflict: If Terraform manages the ASG’s desired_capacity and ignore_changes is not set, it may revert this scale-down on the next terraform apply.

  • Termination grace period: The pods on the node are gracefully evicted using the standard pod termination process.

  • Example: AWS with Cluster Autoscaler

    If an AWS ASG has min_size = 1, max_size = 5, and desired_capacity = 3:

    • Cluster Autoscaler may reduce desired_capacity to 2 or 1 based on utilization.

    • But it will never go below min_size (1 in this case).

    Key Threshold Flags for Scale Down

    You configure these thresholds in the Cluster Autoscaler deployment YAML under the container's command: or args: section.

    Here are the most common flags:

    1. --scale-down-utilization-threshold

    • Default: 0.5 (50%)

    • Meaning: If the CPU and memory requested by a node’s pods fall below this fraction of the node’s allocatable capacity, the node is considered underutilized.

    • Example:


      - --scale-down-utilization-threshold=0.4

    2. --scale-down-unneeded-time

    • Default: 10m

    • Meaning: Node must be underutilized for this duration before it's eligible for scale-down.

    • Example:


      - --scale-down-unneeded-time=5m

    3. --scale-down-delay-after-add

    • Default: 10m

    • Meaning: Time to wait after adding a new node before considering it for scale-down.

    • Example:


      - --scale-down-delay-after-add=5m

    4. --scale-down-delay-after-delete

    • Default: 0s

    • Meaning: Delay after a scale-down before another scale-down is allowed.

    5. --scale-down-enabled

    • Default: true

    • Set to false if you want to disable scale-down entirely.


    Example (Kubernetes Deployment YAML Snippet)


    spec:
      containers:
        - name: cluster-autoscaler
          image: k8s.gcr.io/autoscaling/cluster-autoscaler:v1.27.0
          command:
            - ./cluster-autoscaler
          args:
            - --cloud-provider=aws
            - --nodes=1:5:my-node-group
            - --scale-down-utilization-threshold=0.4
            - --scale-down-unneeded-time=5m
            - --scale-down-delay-after-add=2m

    How to Update

    1. Edit the Cluster Autoscaler deployment:

      kubectl edit deployment cluster-autoscaler -n kube-system
    2. Modify the args: under the container spec.

    3. Save and exit — Kubernetes will automatically restart the pod with the new settings.
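    To confirm the restart completed, you can watch the rollout (Deployment name and namespace as above):

      kubectl rollout status deployment/cluster-autoscaler -n kube-system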

    What is a PodDisruptionBudget (PDB)?

    A PDB defines the number of pods in a group (usually from a Deployment or StatefulSet) that must remain available during voluntary disruptions, like:

    • Draining a node (e.g., for Cluster Autoscaler scale-down)

    • Manual kubectl drain

    • Rolling updates

    • Node upgrades

    It does not protect against involuntary disruptions like node crashes.

    How PDBs Prevent Eviction

    When the Cluster Autoscaler tries to delete a node, it must evict all pods from it. But if evicting even one pod would violate the PDB, the eviction is blocked and the node can't be scaled down.

    Example Scenario:

    • You have a Deployment with 3 replicas.

    • You apply this PDB:


      apiVersion: policy/v1
      kind: PodDisruptionBudget
      metadata:
        name: my-app-pdb
      spec:
        minAvailable: 3
        selector:
          matchLabels:
            app: my-app
    • All 3 pods are running across 3 nodes.

    • If Cluster Autoscaler tries to scale down a node with one of these pods:

      • It tries to evict the pod.

      • Evicting it would reduce the available pods to 2, violating minAvailable: 3.

      • Eviction fails, and the node remains.

    How to Avoid This Problem

    1. Use a Realistic minAvailable or maxUnavailable

      • Don’t set minAvailable equal to total replicas unless you truly require 100% availability.

      • Use maxUnavailable: 1 for more flexibility.

    2. Use kubectl describe pdb to debug

      kubectl describe pdb <pdb-name>
    3. Look for scale-down blockers in Cluster Autoscaler logs

      Not removing node ip-10-0-0-12 because it has non-evictable pods due to PDB: my-app-pdb
    4. Run kubectl drain --dry-run=client to simulate eviction
      This can show you which PDBs would block a node from draining.
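    For example, reusing the node from the log message above (the extra flags are the usual ones for daemonset and emptyDir pods):

      kubectl drain ip-10-0-0-12 --dry-run=client --ignore-daemonsets --delete-emptydir-data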

     Sample Corrected PDB

    apiVersion: policy/v1
    kind: PodDisruptionBudget
    metadata:
      name: my-app-pdb
    spec:
      maxUnavailable: 1
      selector:
        matchLabels:
          app: my-app

    This allows one pod to be evicted at any time — enabling scale-down while still protecting service availability.

