How to Perform Zero-Downtime Rolling Updates in Kubernetes Deployments
Introduction
In modern microservice architectures, maintaining continuous availability during application updates is a non-negotiable requirement. Kubernetes Deployments simplify this process by offering automated Rolling Updates, a strategy designed to replace old versions of Pods with new ones incrementally.
Achieving true zero-downtime, however, requires more than just the default Kubernetes configuration. It mandates careful coordination between the Deployment manifest, the application's health endpoints, and the graceful termination process. This guide provides a comprehensive, step-by-step approach to configuring Kubernetes Deployments to ensure application updates are seamless and invisible to the end-user.
We will cover the critical role of readiness probes, how to tune the deployment strategy parameters (maxSurge and maxUnavailable), and best practices for application termination to eliminate service interruptions during deployment transitions.
Prerequisites for Zero-Downtime
Before configuring the Kubernetes manifest, the underlying application must adhere to certain principles to support zero-downtime deployments:
- Application Backward Compatibility: For the short period when both the old and new versions of the application are running simultaneously, they must be compatible with shared resources (databases, queues, caches).
- Idempotency: Operations that might be handled by both versions must be repeatable without negative side effects.
- Graceful Termination: The application must be programmed to recognize the `SIGTERM` signal sent by Kubernetes and gracefully stop accepting new connections while finishing in-flight requests before exiting (see the sketch after this list).
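For illustration, here is a minimal Go sketch of that shutdown pattern for an HTTP service. The port (8080) and the `/work` handler are assumptions made for the example, not values taken from the manifests below:

```go
// main.go — minimal sketch of SIGTERM handling for graceful shutdown.
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Serve in the background so the main goroutine can wait for SIGTERM.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			panic(err)
		}
	}()

	// Block until Kubernetes sends SIGTERM (or Ctrl+C when running locally).
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new connections and let in-flight requests finish,
	// bounded well below terminationGracePeriodSeconds.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	_ = srv.Shutdown(ctx)
}
```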
Understanding the Kubernetes Rolling Update Strategy
Kubernetes Deployments default to the RollingUpdate strategy. This method ensures that the old application version is not entirely taken down before the new version is operational, managing the transition using two primary parameters:
| Parameter | Description | Zero-Downtime Impact |
|---|---|---|
| `maxSurge` | Defines the maximum number of Pods that can be created over the desired number of replicas. Can be an absolute number or a percentage (default: 25%). | Controls the speed of the rollout and ensures capacity increases temporarily. |
| `maxUnavailable` | Defines the maximum number of Pods that can be unavailable during the update. Can be an absolute number or a percentage (default: 25%). | Crucial for zero-downtime. Setting this to 0 means no serving Pods are terminated until the new Pods are fully Ready. |
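A quick worked example of how the percentages resolve: with 10 replicas and the 25% defaults, Kubernetes rounds maxSurge up (2.5 → 3, so up to 13 Pods may exist during the rollout) and rounds maxUnavailable down (2.5 → 2, so at least 8 Pods must remain available).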
Recommended Strategy for Zero Downtime
For the highest availability, the best configuration is the one that guarantees no loss of serving capacity:
- `maxUnavailable: 0` (capacity never drops below the desired replica count).
- `maxSurge: 1` or `25%` (capacity briefly exceeds the target, ensuring a new Pod is Ready before an old one is killed).
Step 1: Implementing Readiness Probes
The Readiness Probe is the single most important mechanism for ensuring zero-downtime updates. Kubernetes relies on this probe to determine when a new Pod is ready to receive user traffic and whether an existing Pod should remain behind its Service endpoints.
Liveness vs. Readiness
- Liveness Probe: Tells Kubernetes whether the container is healthy and functional. If it fails, the container is restarted.
- Readiness Probe: Tells Kubernetes whether the container is ready to serve requests. If it fails, the Pod is removed from the associated Service endpoints, diverting traffic away from it until it becomes ready.
For rolling updates, the Readiness Probe is used to gate the transition. Kubernetes will not proceed to terminate an old Pod until a newly created Pod successfully passes its readiness check.
```yaml
# deployment.yaml excerpt
spec:
  containers:
  - name: my-app
    image: myregistry/my-app:v2.0
    ports:
    - containerPort: 8080
    # --- Readiness Probe Configuration ---
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 15   # Time to wait before the first probe attempt
      periodSeconds: 5          # How often to perform the check
      timeoutSeconds: 3
      failureThreshold: 3       # Consecutive failures needed to mark the Pod as not ready
    # --- Liveness Probe Configuration (Standard Health Check) ---
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 10
```
Tip: Ensure your application's `/health/ready` endpoint returns a success code (HTTP 200-299) only when initialization, database connections, and other external dependencies are fully operational.
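As a rough illustration of that rule, the Go sketch below returns 503 until every dependency responds; the `db` handle and `cacheReady` check are hypothetical stand-ins for your application's real dependencies:

```go
// health.go — illustrative readiness handler for /health/ready.
package health

import (
	"database/sql"
	"net/http"
)

// readyHandler returns 200 only when the app's dependencies respond.
func readyHandler(db *sql.DB, cacheReady func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Fail readiness if the database is unreachable.
		if err := db.PingContext(r.Context()); err != nil {
			http.Error(w, "database not reachable", http.StatusServiceUnavailable)
			return
		}
		// Fail readiness if warm-up (e.g., cache priming) has not finished.
		if !cacheReady() {
			http.Error(w, "cache warming up", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```

Wired into the server from the earlier sketch, it would be registered with something like `mux.Handle("/health/ready", readyHandler(db, cacheReady))`.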
Step 2: Configuring the Deployment Strategy
To guarantee true zero downtime, we explicitly configure the rolling update strategy to prevent any drop in the number of available replicas.
In this configuration, Kubernetes will first create a new Pod (maxSurge: 1). Once the new Pod passes its readiness probe, only then will Kubernetes terminate an old Pod. Since maxUnavailable is 0, service capacity never dips below the target replica count.
```yaml
# deployment.yaml excerpt
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Ensures capacity never drops below the desired replica count (4)
      maxUnavailable: 0
      # Allows one extra Pod to be created during the rollout
      maxSurge: 1
  template:
    # ... container specification ...
```
Step 3: Ensuring Graceful Termination
Even with robust readiness probes, if the application shuts down instantly upon receiving the termination signal, it risks dropping in-flight requests.
Kubernetes follows a specific termination sequence:
1. The Pod is marked as Terminating.
2. The Pod is removed from the Service endpoints, so traffic stops routing to it (this removal propagates asynchronously, in parallel with the steps below).
3. The preStop hook (if defined) is executed.
4. The container receives the `SIGTERM` signal.
5. Kubernetes waits for the duration defined by `terminationGracePeriodSeconds` (default: 30 seconds).
6. If the container is still running, it receives a non-negotiable `SIGKILL`.
To ensure graceful shutdown, the application must handle SIGTERM, and the terminationGracePeriodSeconds must be long enough for the application to finish existing requests.
```yaml
# deployment.yaml excerpt, inside the Pod spec (spec.template.spec)
terminationGracePeriodSeconds: 45   # Increased time for graceful shutdown
containers:
- name: my-app
  image: myregistry/my-app:v2.0
  lifecycle:
    preStop:
      exec:
        # Example: pause briefly so endpoint removal propagates before SIGTERM
        command: ["/bin/sh", "-c", "sleep 10"]
```
Best Practice: The time between the Pod being removed from the Service endpoints (Step 2) and receiving `SIGTERM` (Step 4) is critical. If your application handles `SIGTERM` correctly, setting a slightly longer `terminationGracePeriodSeconds` (e.g., 45 or 60 seconds) prevents hard kills.
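Putting rough numbers to the example above: the 10-second preStop sleep plus a budget of around 30 seconds for draining in-flight requests stays within the 45-second `terminationGracePeriodSeconds`. If your longest requests can exceed that budget, raise the grace period rather than shortening the drain.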
Step 4: Performing and Monitoring the Update
Once your Deployment manifest includes the optimized strategy and robust probes, performing the update is straightforward.
1. Update the Image Tag: Modify your deployment manifest to reflect the new image version (e.g., `v2.0` to `v2.1`).

2. Apply the Configuration:

   ```bash
   kubectl apply -f deployment.yaml
   ```

   Alternatively, you can patch the image directly:

   ```bash
   kubectl set image deployment/my-web-deployment my-app=myregistry/my-app:v2.1
   ```

3. Monitor the Rollout Status: Watch Kubernetes progress through the stages, verifying that the number of ready Pods never dips below the desired count.

   ```bash
   kubectl rollout status deployment/my-web-deployment
   ```

4. Verify Pod Availability: Observe the Pod status to confirm the old Pods (v2.0) are gracefully terminated only after the new Pods (v2.1) are fully ready.

   ```bash
   kubectl get pods -l app=my-web-deployment -w
   ```
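If monitoring reveals a problem mid-rollout, the standard `kubectl rollout` subcommands let you revert to the previous version or inspect what changed, for instance:

```bash
# Revert to the previous ReplicaSet if the new version misbehaves
kubectl rollout undo deployment/my-web-deployment

# Inspect the recorded rollout revisions
kubectl rollout history deployment/my-web-deployment
```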
Advanced Considerations
Using Pod Disruption Budgets (PDBs)
While a deployment strategy manages voluntary updates, a Pod Disruption Budget (PDB) ensures a minimum number of Pods are available even during unplanned disruptions (e.g., node maintenance, cluster upgrades). Although PDBs don't directly control the rolling update speed, they act as a safety net.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 75%   # Ensure at least 75% of replicas are always available
  selector:
    matchLabels:
      app: my-web-deployment
```
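After applying the budget, you can confirm it is tracking the Deployment's Pods and see how many voluntary disruptions are currently allowed:

```bash
kubectl get poddisruptionbudget my-app-pdb
```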
The Importance of Initial Delay
If your application takes time to warm up (e.g., loading large configuration files or establishing caches), ensure the initialDelaySeconds in your Readiness Probe is sufficiently long. If the probe checks too early and keeps failing, the Pod is marked not ready and the rollout stalls; a liveness probe that fires too early can additionally restart the container into a crash loop.
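If warm-up time varies widely, another option worth noting (not shown in the manifests above) is a startupProbe, which suppresses the liveness and readiness probes until the application has started once. A minimal sketch, reusing the `/health/ready` endpoint and the assumed 8080 port:

```yaml
startupProbe:
  httpGet:
    path: /health/ready
    port: 8080
  # Allows up to 30 x 10 = 300 seconds of warm-up before the other probes begin
  failureThreshold: 30
  periodSeconds: 10
```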
Conclusion
Achieving true zero-downtime rolling updates in Kubernetes is a combination of robust platform configuration and disciplined application development. By correctly leveraging Readiness Probes to signal operational status, tuning the Deployment strategy (maxUnavailable: 0) to maintain capacity, and implementing graceful termination handlers, you can ensure application updates are performed reliably without disrupting service to your users. Always test your update process thoroughly in a staging environment to validate the termination grace period and probe logic.