How to Perform Zero-Downtime Rolling Updates in Kubernetes Deployments
Introduction
In modern microservice architectures, maintaining continuous availability during application updates is a non-negotiable requirement. Kubernetes Deployments simplify this process by offering automated Rolling Updates, a strategy designed to replace old versions of Pods with new ones incrementally.
Achieving true zero-downtime, however, requires more than just the default Kubernetes configuration. It mandates careful coordination between the Deployment manifest, the application's health endpoints, and the graceful termination process. This guide provides a comprehensive, step-by-step approach to configuring Kubernetes Deployments to ensure application updates are seamless and invisible to the end-user.
We will cover the critical role of readiness probes, how to tune the deployment strategy parameters (maxSurge and maxUnavailable), and best practices for application termination to eliminate service interruptions during deployment transitions.
Prerequisites for Zero-Downtime
Before configuring the Kubernetes manifest, the underlying application must adhere to certain principles to support zero-downtime deployments:
- Application Backward Compatibility: For the short period when both the old and new versions of the application are running simultaneously, they must be compatible with shared resources (databases, queues, caches).
- Idempotency: Operations that might be handled by both versions must be repeatable without negative side effects.
- Graceful Termination: The application must be programmed to recognize the `SIGTERM` signal sent by Kubernetes and gracefully stop accepting new connections while finishing in-flight requests before exiting (see the sketch after this list).
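For illustration, here is a minimal Go sketch of that shutdown pattern for an HTTP service. The port (8080) and the `/work` handler are assumptions made for the example, not values taken from the manifests below:

```go
// main.go — minimal sketch of SIGTERM handling for graceful shutdown.
package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/work", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})

	srv := &http.Server{Addr: ":8080", Handler: mux}

	// Serve in the background so the main goroutine can wait for SIGTERM.
	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			panic(err)
		}
	}()

	// Block until Kubernetes sends SIGTERM (or Ctrl+C when running locally).
	stop := make(chan os.Signal, 1)
	signal.Notify(stop, syscall.SIGTERM, os.Interrupt)
	<-stop

	// Stop accepting new connections and let in-flight requests finish,
	// bounded well below terminationGracePeriodSeconds.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	_ = srv.Shutdown(ctx)
}
```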
Understanding the Kubernetes Rolling Update Strategy
Kubernetes Deployments default to the RollingUpdate strategy. This method ensures that the old application version is not entirely taken down before the new version is operational, managing the transition using two primary parameters:
| Parameter | Description | Zero-Downtime Impact |
|---|---|---|
| `maxSurge` | Defines the maximum number of Pods that can be created over the desired number of replicas. Can be an absolute number or a percentage (default: 25%). | Controls the speed of the rollout and ensures capacity increases temporarily. |
| `maxUnavailable` | Defines the maximum number of Pods that can be unavailable during the update. Can be an absolute number or a percentage (default: 25%). | Crucial for zero-downtime. Setting this to 0 means no serving Pods are terminated until the new Pods are fully Ready. |
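A quick worked example of how the percentages resolve: with 10 replicas and the 25% defaults, Kubernetes rounds maxSurge up (2.5 → 3, so up to 13 Pods may exist during the rollout) and rounds maxUnavailable down (2.5 → 2, so at least 8 Pods must remain available).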
Recommended Strategy for Zero Downtime
For the highest availability, the best configuration is the one that guarantees no loss of serving capacity:
- `maxUnavailable: 0` (capacity never drops below the desired replica count).
- `maxSurge: 1` or `25%` (capacity briefly exceeds the target, ensuring a new Pod is Ready before an old one is killed).
Step 1: Implementing Readiness Probes
The Readiness Probe is the single most important mechanism for ensuring zero-downtime updates. Kubernetes relies on this probe to determine when a new Pod is ready to receive user traffic and whether an existing Pod should remain behind its Service endpoints.
Liveness vs. Readiness
- Liveness Probe: Tells Kubernetes whether the container is healthy and functional. If it fails, the container is restarted.
- Readiness Probe: Tells Kubernetes whether the container is ready to serve requests. If it fails, the Pod is removed from the associated Service endpoints, diverting traffic away from it until it becomes ready.
For rolling updates, the Readiness Probe is used to gate the transition. Kubernetes will not proceed to terminate an old Pod until a newly created Pod successfully passes its readiness check.
```yaml
# deployment.yaml excerpt
spec:
  containers:
  - name: my-app
    image: myregistry/my-app:v2.0
    ports:
    - containerPort: 8080
    # --- Readiness Probe Configuration ---
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 15   # Time to wait before the first probe attempt
      periodSeconds: 5          # How often to perform the check
      timeoutSeconds: 3
      failureThreshold: 3       # Consecutive failures needed to mark the Pod as not ready
    # --- Liveness Probe Configuration (Standard Health Check) ---
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 60
      periodSeconds: 10
```
Tip: Ensure your application's `/health/ready` endpoint returns a success code (HTTP 200-299) only when initialization, database connections, and other external dependencies are fully operational.
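As a rough illustration of that rule, the Go sketch below returns 503 until every dependency responds; the `db` handle and `cacheReady` check are hypothetical stand-ins for your application's real dependencies:

```go
// health.go — illustrative readiness handler for /health/ready.
package health

import (
	"database/sql"
	"net/http"
)

// readyHandler returns 200 only when the app's dependencies respond.
func readyHandler(db *sql.DB, cacheReady func() bool) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		// Fail readiness if the database is unreachable.
		if err := db.PingContext(r.Context()); err != nil {
			http.Error(w, "database not reachable", http.StatusServiceUnavailable)
			return
		}
		// Fail readiness if warm-up (e.g., cache priming) has not finished.
		if !cacheReady() {
			http.Error(w, "cache warming up", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	}
}
```

Wired into the server from the earlier sketch, it would be registered with something like `mux.Handle("/health/ready", readyHandler(db, cacheReady))`.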
Step 2: Configuring the Deployment Strategy
To guarantee true zero downtime, we explicitly configure the rolling update strategy to prevent any drop in the number of available replicas.
In this configuration, Kubernetes will first create a new Pod (maxSurge: 1). Once the new Pod passes its readiness probe, only then will Kubernetes terminate an old Pod. Since maxUnavailable is 0, service capacity never dips below the target replica count.
```yaml
# deployment.yaml excerpt
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      # Ensures capacity never drops below the desired replica count (4)
      maxUnavailable: 0
      # Allows one extra Pod to be created during the rollout
      maxSurge: 1
  template:
    # ... container specification ...
```
Step 3: Ensuring Graceful Termination
Even with robust readiness probes, if the application shuts down instantly upon receiving the termination signal, it risks dropping in-flight requests.
Kubernetes follows a specific termination sequence:
1. The Pod is marked as Terminating.
2. The Pod is removed from the Service endpoints, so traffic stops routing to it (this removal propagates asynchronously, in parallel with the steps below).
3. The preStop hook (if defined) is executed.
4. The container receives the `SIGTERM` signal.
5. Kubernetes waits for the duration defined by `terminationGracePeriodSeconds` (default: 30 seconds).
6. If the container is still running, it receives a non-negotiable `SIGKILL`.
To ensure graceful shutdown, the application must handle SIGTERM, and the terminationGracePeriodSeconds must be long enough for the application to finish existing requests.
```yaml
# deployment.yaml excerpt, inside the Pod spec (spec.template.spec)
terminationGracePeriodSeconds: 45   # Increased time for graceful shutdown
containers:
- name: my-app
  image: myregistry/my-app:v2.0
  lifecycle:
    preStop:
      exec:
        # Example: pause briefly so endpoint removal propagates before SIGTERM
        command: ["/bin/sh", "-c", "sleep 10"]
```
Best Practice: The time between the Pod being removed from the Service endpoints (Step 2) and receiving `SIGTERM` (Step 4) is critical. If your application handles `SIGTERM` correctly, setting a slightly longer `terminationGracePeriodSeconds` (e.g., 45 or 60 seconds) prevents hard kills.
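Putting rough numbers to the example above: the 10-second preStop sleep plus a budget of around 30 seconds for draining in-flight requests stays within the 45-second `terminationGracePeriodSeconds`. If your longest requests can exceed that budget, raise the grace period rather than shortening the drain.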
Step 4: Performing and Monitoring the Update
Once your Deployment manifest includes the optimized strategy and robust probes, performing the update is straightforward.
1. Update the Image Tag: Modify your deployment manifest to reflect the new image version (e.g., `v2.0` to `v2.1`).

2. Apply the Configuration:

   ```bash
   kubectl apply -f deployment.yaml
   ```

   Alternatively, you can patch the image directly:

   ```bash
   kubectl set image deployment/my-web-deployment my-app=myregistry/my-app:v2.1
   ```

3. Monitor the Rollout Status: Watch Kubernetes progress through the stages, verifying that the number of ready Pods never dips below the desired count.

   ```bash
   kubectl rollout status deployment/my-web-deployment
   ```

4. Verify Pod Availability: Observe the Pod status to confirm the old Pods (v2.0) are gracefully terminated only after the new Pods (v2.1) are fully ready.

   ```bash
   kubectl get pods -l app=my-web-deployment -w
   ```
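If monitoring reveals a problem mid-rollout, the standard `kubectl rollout` subcommands let you revert to the previous version or inspect what changed, for instance:

```bash
# Revert to the previous ReplicaSet if the new version misbehaves
kubectl rollout undo deployment/my-web-deployment

# Inspect the recorded rollout revisions
kubectl rollout history deployment/my-web-deployment
```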
Advanced Considerations
Using Pod Disruption Budgets (PDBs)
While a deployment strategy manages voluntary updates, a Pod Disruption Budget (PDB) ensures a minimum number of Pods are available even during unplanned disruptions (e.g., node maintenance, cluster upgrades). Although PDBs don't directly control the rolling update speed, they act as a safety net.
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 75%   # Ensure at least 75% of replicas are always available
  selector:
    matchLabels:
      app: my-web-deployment
```
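After applying the budget, you can confirm it is tracking the Deployment's Pods and see how many voluntary disruptions are currently allowed:

```bash
kubectl get poddisruptionbudget my-app-pdb
```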
The Importance of Initial Delay
If your application takes time to warm up (e.g., loading large configuration files or establishing caches), ensure the initialDelaySeconds in your Readiness Probe is sufficiently long. If the probe checks too early and keeps failing, the Pod is marked not ready and the rollout stalls; a liveness probe that fires too early can additionally restart the container into a crash loop.
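If warm-up time varies widely, another option worth noting (not shown in the manifests above) is a startupProbe, which suppresses the liveness and readiness probes until the application has started once. A minimal sketch, reusing the `/health/ready` endpoint and the assumed 8080 port:

```yaml
startupProbe:
  httpGet:
    path: /health/ready
    port: 8080
  # Allows up to 30 x 10 = 300 seconds of warm-up before the other probes begin
  failureThreshold: 30
  periodSeconds: 10
```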
Conclusion
Achieving true zero-downtime rolling updates in Kubernetes is a combination of robust platform configuration and disciplined application development. By correctly leveraging Readiness Probes to signal operational status, tuning the Deployment strategy (maxUnavailable: 0) to maintain capacity, and implementing graceful termination handlers, you can ensure application updates are performed reliably without disrupting service to your users. Always test your update process thoroughly in a staging environment to validate the termination grace period and probe logic.