How to Perform Zero-Downtime Rolling Updates in Kubernetes Deployments

Configure Kubernetes rolling updates with readiness probes, maxSurge, maxUnavailable, and graceful shutdown.

Kubernetes rolling updates can replace Pods without a visible outage, but only if your Deployment and your application agree on when a Pod is ready and how it should shut down. The default strategy helps, but it does not save you from bad readiness probes, incompatible releases, or dropped in-flight requests.

Zero downtime therefore takes more than the default Kubernetes configuration. It depends on coordination between three things: the Deployment manifest, the application's health endpoints, and the graceful termination process.

This guide covers readiness probes, maxSurge, maxUnavailable, and graceful termination so your service keeps capacity during a rollout.

Prerequisites for Zero-Downtime

Before configuring the Kubernetes manifest, the underlying application must adhere to certain principles to support zero-downtime deployments:

  1. Application Backward Compatibility: For the short period when both the old and new versions of the application are running simultaneously, they must be compatible with shared resources (databases, queues, caches).
  2. Idempotency: Operations that might be handled by both versions must be repeatable without negative side effects.
  3. Graceful Termination: The application must be programmed to recognize the SIGTERM signal sent by Kubernetes and gracefully stop accepting new connections while finishing in-flight requests before exiting.
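
As one illustration of the idempotency requirement, the sketch below deduplicates work by an idempotency key, so an operation retried against the other version during a rollout is applied only once. The class name, the key format, and the in-memory dict are hypothetical; a real service would keep the keys in a store that both versions share, such as a database table.

```python
from typing import Callable, Dict


class IdempotentProcessor:
    """Applies each operation at most once per idempotency key."""

    def __init__(self) -> None:
        # Hypothetical stand-in for a shared store (database, cache).
        self._results: Dict[str, int] = {}

    def process(self, key: str, operation: Callable[[], int]) -> int:
        # A repeated key returns the stored result instead of
        # re-running the side effect.
        if key in self._results:
            return self._results[key]
        result = operation()
        self._results[key] = result
        return result
```

Calling process twice with the same key runs the operation once and returns the same result both times.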

Understanding the Kubernetes Rolling Update Strategy

Kubernetes Deployments default to the RollingUpdate strategy. This method ensures that the old application version is not entirely taken down before the new version is operational, managing the transition using two primary parameters:

Parameter: maxSurge
  • Description: Defines the maximum number of Pods that can be created over the desired number of replicas. Can be an absolute number or a percentage (default: 25%).
  • Zero-Downtime Impact: Controls the speed of the rollout and ensures capacity increases temporarily.

Parameter: maxUnavailable
  • Description: Defines the maximum number of Pods that can be unavailable during the update. Can be an absolute number or a percentage (default: 25%).
  • Zero-Downtime Impact: Crucial for zero-downtime. Setting this to 0% means no serving Pods are terminated until the new Pods are fully Ready.

Recommended Strategy for Zero Downtime

For the highest availability, the usual choice is a configuration that never loses serving capacity during a rollout:

  • maxUnavailable: 0 (Ensure capacity never drops).
  • maxSurge: 1 or 25% (Allow capacity to briefly exceed the target, ensuring a new Pod is ready before an old one is killed).
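
To see what these two settings mean in Pod counts, the following sketch computes the rollout bounds for a given replica count. It relies on the documented rounding rules (a percentage maxSurge rounds up, a percentage maxUnavailable rounds down). The function names are illustrative, and rejecting the case where both values resolve to zero is a simplification of what Kubernetes does.

```python
def to_pod_count(value, replicas: int, round_up: bool) -> int:
    """Convert an int or an 'N%' string into an absolute Pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer ceiling or floor of scaled / 100.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas: int, max_surge="25%", max_unavailable="25%"):
    """Return (minimum available Pods, maximum total Pods) during a rollout."""
    surge = to_pod_count(max_surge, replicas, round_up=True)
    unavailable = to_pod_count(max_unavailable, replicas, round_up=False)
    if surge == 0 and unavailable == 0:
        # Simplification: this sketch just refuses the combination,
        # because the rollout could not make progress.
        raise ValueError("maxSurge and maxUnavailable cannot both be zero")
    return replicas - unavailable, replicas + surge


print(rollout_bounds(4, max_surge=1, max_unavailable=0))  # (4, 5)
print(rollout_bounds(4))                                  # (3, 5)
```

With 4 replicas, the recommended settings keep 4 Pods available and allow at most 5 to exist, while the defaults allow availability to drop to 3.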

Step 1: Implementing Readiness Probes

The Readiness Probe is the single most important mechanism for ensuring zero-downtime updates. Kubernetes relies on this probe to decide when a new Pod may start receiving user traffic and whether a running Pod should keep receiving it.

Liveness vs. Readiness

  • Liveness Probe: Tells Kubernetes whether the container is healthy and functional. If it fails, the container is restarted.
  • Readiness Probe: Tells Kubernetes whether the container is ready to serve requests. If it fails, the Pod is removed from the associated Service endpoints, diverting traffic away from it until it becomes ready.

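For rolling updates, the Readiness Probe gates the transition. A new Pod counts as available only after it passes its readiness check, so with maxUnavailable: 0 Kubernetes will not terminate an old Pod until a newly created Pod is Ready. With the default maxUnavailable of 25%, some old Pods can be terminated before their replacements are Ready.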

# deployment.yaml excerpt, inside the Pod template spec
spec:
  containers:
  - name: my-app
    image: myregistry/my-app:v2.0
    ports:
    - containerPort: 8080
    
    # --- Readiness Probe Configuration ---
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 15  # Time to wait before first probe attempt
      periodSeconds: 5         # How often to perform the check
      timeoutSeconds: 3
      failureThreshold: 3      # Number of consecutive failures to mark Pod as not ready

    # --- Liveness Probe Configuration (Standard Health Check) ---
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 10

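Tip: Ensure your application's /health/ready endpoint returns a success code only when initialization, database connections, and other external dependencies are fully operational. Kubernetes treats any HTTP status from 200 to 399 as success, so return an error status such as 503 while the application is not ready.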
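
A minimal sketch of such endpoints, using only the Python standard library, might look like the following. The paths match the probe configuration above; the READY flag, the function names, and the response bodies are assumptions made for illustration, and a real application would set the flag from its own startup code.

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import Tuple

# Set once startup work (config load, DB connection, warm-up) is done.
READY = threading.Event()


def health_response(path: str, ready: bool) -> Tuple[int, bytes]:
    """Map a probe path to an HTTP status code and body."""
    if path == "/health/live":
        return 200, b"alive"  # the process is up, even if still warming
    if path == "/health/ready":
        return (200, b"ready") if ready else (503, b"not ready")
    return 404, b"not found"


class HealthHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        status, body = health_response(self.path, READY.is_set())
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format: str, *args) -> None:
        pass  # keep probe traffic out of the logs


def serve(port: int = 8080) -> None:
    """Blocking helper: serve the health endpoints on the given port."""
    ThreadingHTTPServer(("0.0.0.0", port), HealthHandler).serve_forever()
```

Keeping liveness independent of READY means a slow startup keeps the Pod out of Service endpoints without triggering a restart.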

Step 2: Configuring the Deployment Strategy

To keep capacity constant during a rollout, configure the rolling update strategy explicitly so the number of available replicas never drops.

In this configuration, Kubernetes will first create a new Pod (maxSurge: 1). Once the new Pod passes its readiness probe, only then will Kubernetes terminate an old Pod. Since maxUnavailable is 0, service capacity never dips below the target replica count.

# deployment.yaml excerpt
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Ensures capacity never drops below the desired replica count (4)
      maxUnavailable: 0 
      # Allows one extra Pod to be created during the rollout
      maxSurge: 1 
  template:
    # ... container specification ...

Step 3: Ensuring Graceful Termination

Even with robust readiness probes, if the application shuts down instantly upon receiving the termination signal, it risks dropping in-flight requests.

Kubernetes follows a specific termination sequence:

  1. The Pod is marked as Terminating, and the terminationGracePeriodSeconds countdown (default: 30 seconds) starts.
  2. In parallel, the Pod is removed from the Service endpoints. This update propagates asynchronously, so some traffic can still arrive for a short time.
  3. The preStop hook (if defined) is executed.
  4. Once the hook finishes, the container receives the SIGTERM signal.
  5. Kubernetes waits for the remainder of the grace period. Time spent in the preStop hook counts against it.
  6. If the container is still running when the grace period expires, it receives a non-negotiable SIGKILL.

To ensure graceful shutdown, the application must handle SIGTERM, and terminationGracePeriodSeconds must be long enough to cover both the preStop hook and the time the application needs to finish existing requests.

# deployment.yaml excerpt, inside the Pod template spec
spec:
  terminationGracePeriodSeconds: 45 # Pod-level setting
  containers:
  - name: my-app
    image: myregistry/my-app:v2.0
    lifecycle:
      preStop:
        exec:
          # Gives endpoint updates and external load balancers time to drain.
          command: ["/bin/sh", "-c", "sleep 10"]

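Best Practice: Your application should stop accepting new work when it receives SIGTERM, then finish in-flight requests before exiting. Because the preStop sleep counts against the grace period, the 45 seconds configured above leave about 35 seconds for draining after the 10-second sleep. Raise terminationGracePeriodSeconds, for example to 60 seconds, if your slowest requests need longer, so they are not hard-killed.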
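
The shutdown behavior described here can be sketched in Python as follows. ShutdownCoordinator and its method names are hypothetical; the idea is simply to flip a flag on SIGTERM, refuse new requests while the flag is set, and wait for in-flight requests to finish within a timeout shorter than the remaining grace period.

```python
import signal
import threading
import time


class ShutdownCoordinator:
    """Tracks in-flight work and refuses new work once shutdown begins."""

    def __init__(self) -> None:
        self.stopping = False
        self._lock = threading.Lock()
        self._in_flight = 0

    def install(self) -> None:
        # Must run in the main thread: Python only allows signal
        # handlers to be registered there.
        signal.signal(signal.SIGTERM, self.on_sigterm)

    def on_sigterm(self, signum, frame) -> None:
        # Only flip a flag here; the draining happens in normal code.
        self.stopping = True

    def begin_request(self) -> bool:
        """Return True if the request may start, False while draining."""
        with self._lock:
            if self.stopping:
                return False
            self._in_flight += 1
            return True

    def end_request(self) -> None:
        with self._lock:
            self._in_flight -= 1

    def drain(self, timeout: float, poll: float = 0.05) -> bool:
        """Wait until no work is in flight; False if the timeout expires."""
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            with self._lock:
                if self._in_flight == 0:
                    return True
            time.sleep(poll)
        with self._lock:
            return self._in_flight == 0
```

A server would call install() at startup, wrap each request in begin_request() and end_request(), and call drain() with a timeout below the remaining grace period before exiting.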

Step 4: Performing and Monitoring the Update

Once your Deployment manifest includes the optimized strategy and robust probes, performing the update is straightforward.

  1. Update the Image Tag: Modify your deployment manifest to reflect the new image version (e.g., v2.0 to v2.1).

  2. Apply the Configuration:

    kubectl apply -f deployment.yaml
    

    Alternatively, you can patch the image directly:

    kubectl set image deployment/my-web-deployment my-app=myregistry/my-app:v2.1
    
  3. Monitor the Rollout Status: Watch Kubernetes progress through the stages, verifying that the number of ready Pods never dips below the desired count.

    kubectl rollout status deployment/my-web-deployment
    
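  4. Verify Pod Availability: Observe the Pod status to confirm the old Pods (v2.0) are gracefully terminated only after the new Pods (v2.1) are fully ready. The selector below assumes the Pod template carries the label app=my-web-deployment, which the excerpts above do not show.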

    kubectl get pods -l app=my-web-deployment -w
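
For scripted checks, the same conditions can be evaluated from the Deployment object itself, for example from the output of kubectl get deployment my-web-deployment -o json. The sketch below is an approximation of the completion logic, not the kubectl implementation; the function names are illustrative, while the fields it reads (spec.replicas, metadata.generation, and the status counters) are standard Deployment fields.

```python
def rollout_complete(deployment: dict) -> bool:
    """Approximate check that a rollout has finished."""
    desired = deployment["spec"].get("replicas", 1)
    status = deployment.get("status", {})
    if status.get("observedGeneration", 0) < deployment["metadata"].get("generation", 0):
        return False  # the controller has not seen the latest spec yet
    updated = status.get("updatedReplicas", 0)
    if updated < desired:
        return False  # new Pods are still being created
    if status.get("replicas", 0) > updated:
        return False  # old Pods are still pending termination
    return status.get("availableReplicas", 0) >= updated


def capacity_held(deployment: dict) -> bool:
    """With maxUnavailable: 0, available Pods should not drop below desired."""
    desired = deployment["spec"].get("replicas", 1)
    return deployment.get("status", {}).get("availableReplicas", 0) >= desired


# Sample object: mid-rollout with 4 desired replicas and one surge Pod.
sample = {
    "metadata": {"generation": 2},
    "spec": {"replicas": 4},
    "status": {"observedGeneration": 2, "replicas": 5,
               "updatedReplicas": 2, "availableReplicas": 4},
}
print(rollout_complete(sample), capacity_held(sample))  # False True
```

Polling capacity_held during a rollout gives a simple signal that availability stayed at the desired count.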
    

Advanced Considerations

Using Pod Disruption Budgets (PDBs)

While a deployment strategy manages rollouts, a Pod Disruption Budget (PDB) limits voluntary disruptions such as node drains and some cluster maintenance operations. It does not prevent every unplanned failure, but it gives Kubernetes and automation tools a minimum availability target to respect.

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 75%  # Voluntary disruptions must leave at least 75% of matching Pods available
  selector:
    matchLabels:
      app: my-web-deployment
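
To make the percentage concrete, this sketch works out how many voluntary disruptions such a budget permits. It assumes the documented behavior that a percentage minAvailable is rounded up to a whole number of Pods; the function names are illustrative.

```python
def desired_healthy(expected_pods: int, min_available) -> int:
    """Resolve minAvailable (int or 'N%') to a Pod count, rounding up."""
    if isinstance(min_available, str) and min_available.endswith("%"):
        percent = int(min_available[:-1])
        return -(-expected_pods * percent // 100)  # integer ceiling
    return int(min_available)


def disruptions_allowed(current_healthy: int, expected_pods: int, min_available) -> int:
    """Number of Pods that may be evicted right now without breaking the budget."""
    return max(0, current_healthy - desired_healthy(expected_pods, min_available))


print(disruptions_allowed(4, 4, "75%"))  # 1
print(disruptions_allowed(3, 4, "75%"))  # 0
```

With 4 replicas and minAvailable: 75%, a node drain can evict one Pod at a time and must wait while only 3 are healthy.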

The Importance of Initial Delay

If your application takes time to warm up, tune initialDelaySeconds, periodSeconds, and failureThreshold so readiness reflects real startup behavior. A failing readiness probe keeps the Pod out of Service endpoints; a failing liveness probe is what can restart the container and create a crash loop.
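
A rough way to reason about these settings is to estimate when the last tolerated probe failure would be observed if the application never became healthy. The sketch below uses a simplified model: it assumes the first probe runs right after initialDelaySeconds and ignores timeoutSeconds and scheduling jitter, so treat the results as approximations. The function names are hypothetical.

```python
def failure_deadline_seconds(initial_delay: int, period: int,
                             failure_threshold: int = 3) -> int:
    """Approximate seconds after container start at which the Nth
    consecutive probe failure is observed (simplified model)."""
    return initial_delay + (failure_threshold - 1) * period


def liveness_restart_risk(startup_seconds: int, initial_delay: int,
                          period: int, failure_threshold: int = 3) -> bool:
    """True if startup likely outlasts the liveness probe's tolerance."""
    return startup_seconds > failure_deadline_seconds(
        initial_delay, period, failure_threshold)


# Liveness settings from the excerpt above: 60s delay, 10s period,
# and the default failureThreshold of 3.
print(failure_deadline_seconds(60, 10))   # 80
print(liveness_restart_risk(90, 60, 10))  # True
```

Under this model, an application that needs 90 seconds to start would be restarted by the liveness settings shown earlier, which is the crash-loop scenario to tune away from.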

Roll Out Safely

Achieving true zero-downtime rolling updates in Kubernetes is a combination of robust platform configuration and disciplined application development. By correctly leveraging Readiness Probes to signal operational status, tuning the Deployment strategy (maxUnavailable: 0) to maintain capacity, and implementing graceful termination handlers, you can ensure application updates are performed reliably without disrupting service to your users. Always test your update process thoroughly in a staging environment to validate the termination grace period and probe logic.