Cloud Infrastructure

Kubernetes Zero-Downtime Deployments [How-To 2026]

Dillip Chowdary
Tech Entrepreneur & Innovator · April 07, 2026 · 8 min read

Zero-downtime deployment in Kubernetes is less about a single magic setting and more about coordinating several moving parts: rollout strategy, health checks, connection draining, and observability. If any one of those is weak, a release that looks safe on paper can still drop requests in production.

This tutorial walks through a practical baseline for stateless web services running on Kubernetes. We will use RollingUpdate, wire in readiness and liveness probes, delay shutdown long enough for in-flight requests to finish, and verify the rollout with concrete commands and expected output.

Key Takeaway

Zero-downtime deployment in Kubernetes depends on one rule: never send production traffic to a pod that is not ready, and never kill a pod before traffic has drained. RollingUpdate is the delivery mechanism, but readiness probes, preStop hooks, and conservative rollout settings are what make it reliable.

Why Zero-Downtime Matters

During a default rolling release, Kubernetes creates new pods and removes old ones over time. That sounds safe, but a real application can still fail during the handoff if startup takes longer than expected, the load balancer keeps sending traffic to terminating pods, or the deployment removes too much capacity at once.

A production-safe rollout should guarantee three things:

  • New pods do not receive traffic until they are actually ready.
  • Old pods keep serving traffic long enough to finish active requests.
  • The deployment never reduces healthy capacity below an acceptable threshold.

If you share manifests in docs or runbooks, keep the YAML formatted consistently so it stays readable and easy to review across teams.

Prerequisites


  • A working Kubernetes cluster and kubectl access.
  • A stateless HTTP application with a /healthz or similar endpoint.
  • An image registry your cluster can pull from.
  • Basic familiarity with Deployments, Services, and Ingress.
  • A staging namespace to test the rollout before production.

Step 1: Prepare the Deployment

Start with a standard Deployment and Service. The important point here is to run more than one replica so Kubernetes has room to rotate pods without taking the app offline.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: ghcr.io/example/api:v1
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: api
spec:
  selector:
    app: api
  ports:
    - port: 80
      targetPort: 8080

Three replicas are not mandatory, but they make rollouts much more forgiving than a single-pod deployment. If one pod is starting and one pod is draining, you still have live capacity.

Step 2: Add Health Checks

The most important protection is the readiness probe. Kubernetes only adds a pod to the Service endpoints after the readiness check passes. Without it, traffic can hit a process that has started but is not yet ready to serve requests.

Add both readiness and liveness probes:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: ghcr.io/example/api:v2
          ports:
            - containerPort: 8080
          readinessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8080
            initialDelaySeconds: 15
            periodSeconds: 10

Both probes point at /healthz here, which is fine only if that endpoint reflects real service readiness. For more complex apps, readiness should verify the dependencies needed to serve traffic, while liveness should answer a narrower question: is the process stuck and in need of a restart?
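For example, a service that checks its database connection before accepting traffic could split the two concerns across separate endpoints. This is a sketch, not a default: the /readyz and /livez paths are illustrative names your application would need to implement.

```yaml
readinessProbe:
  httpGet:
    path: /readyz    # verifies dependencies needed to serve traffic
    port: 8080
  periodSeconds: 5
  failureThreshold: 3
livenessProbe:
  httpGet:
    path: /livez     # only answers: is the process responsive?
    port: 8080
  periodSeconds: 10
```

With this split, a temporary database outage takes the pod out of rotation without triggering a restart loop.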

Step 3: Configure the Update Strategy

Now set a deployment strategy that preserves capacity during updates. The safest baseline for many APIs is maxUnavailable: 0, which tells Kubernetes not to take any pod out of rotation until a replacement is ready.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  minReadySeconds: 10
  selector:
    matchLabels:
      app: api
  template:
    metadata:
      labels:
        app: api
    spec:
      containers:
        - name: api
          image: ghcr.io/example/api:v2
          ports:
            - containerPort: 8080

Here is what those settings do:

  1. maxSurge: 1 lets Kubernetes create one extra pod above the desired replica count during the rollout.
  2. maxUnavailable: 0 prevents the controller from reducing available capacity while replacements are still warming up.
  3. minReadySeconds: 10 requires a pod to stay healthy for 10 seconds before Kubernetes treats it as truly available.

That extra wait helps catch applications that pass readiness briefly and then fail under real startup load.

Step 4: Handle Traffic Draining

Readiness protects startup. You also need a clean shutdown path. When Kubernetes terminates a pod, there is a short window where upstream traffic may still reach it. A preStop hook plus a sane terminationGracePeriodSeconds gives the load balancer and app time to drain.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: api
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 30
      containers:
        - name: api
          image: ghcr.io/example/api:v2
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 10"]

This pattern is intentionally simple, and the manifest above shows only the fields to merge into the full Deployment from the earlier steps. The container pauses for 10 seconds after receiving the termination signal, which gives kube-proxy, Service endpoints, and any Ingress or external load balancer time to stop routing new requests to that pod.

For higher-traffic services, replace the fixed sleep with application-aware draining logic if your runtime supports it. For example, a web server can stop accepting new connections immediately while keeping active requests alive until completion.

Step 5: Deploy and Monitor

Apply the manifest and watch the rollout rather than pushing blind.

kubectl apply -f deployment.yaml
kubectl rollout status deployment/api
kubectl get pods -l app=api -w

To trigger an image update directly:

kubectl set image deployment/api api=ghcr.io/example/api:v2
kubectl rollout status deployment/api

In production, pair this with metrics on request latency, error rate, and pod readiness transitions. A rollout is only zero-downtime if user-facing signals remain healthy throughout the change window.

Verification and Expected Output

After the deployment begins, verify that new pods become ready before old ones terminate.

kubectl describe deployment api
kubectl get endpoints api
kubectl get pods -l app=api

Expected signs of success:

  • kubectl rollout status ends with deployment "api" successfully rolled out.
  • kubectl get pods shows new pods entering Running and 1/1 ready before old pods disappear.
  • kubectl get endpoints api continuously lists healthy pod IPs, with no empty endpoint set during the rollout.

If you have a synthetic check or load test, run it during deployment. Even a simple loop from a jump box can expose brief 502 or 503 spikes:

while true; do
  curl -sf https://api.example.com/healthz || echo "request failed"
  sleep 1
done

Troubleshooting Top 3

1. New pods never become ready

This usually means the readiness probe is too strict, points to the wrong path, or the app is not binding to the expected port. Check pod events and container logs first:

kubectl describe pod <pod-name>
kubectl logs <pod-name>

If startup legitimately takes longer, adjust initialDelaySeconds, or better, optimize startup so the deployment stays predictable.
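For genuinely slow starters, a startupProbe (stable since Kubernetes 1.18) is usually cleaner than a long initialDelaySeconds, because it suspends the readiness and liveness probes until the app has booted. A sketch, reusing the /healthz endpoint from earlier; the thresholds here are examples, not recommendations:

```yaml
startupProbe:
  httpGet:
    path: /healthz
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # tolerates up to ~150s of startup time
```

Once the startup probe succeeds, the regular probes take over with their normal, tighter timings.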

2. Requests fail during termination

If users still see failures when pods shut down, your app may be exiting before traffic stops. Increase terminationGracePeriodSeconds, keep the preStop delay, and confirm the application handles SIGTERM correctly.

3. Rollouts stall or take too long

A rollout can hang when cluster capacity is too tight for maxSurge, image pulls are slow, or a pod becomes ready and then flaps unhealthy. Review node capacity, image size, and readiness stability. In some environments, using a smaller image or pre-pulled base layer cuts rollout time materially.

When manifests include sample data, configs, or logs, sanitize secrets and personally identifiable information before sharing them in tickets or docs.

What's Next

This walkthrough gives you a strong default for stateless services, but it is only the baseline. From here, the next maturity steps are:

  1. Introduce PodDisruptionBudgets so voluntary disruptions do not reduce capacity during maintenance.
  2. Use canary or blue-green patterns when you need tighter control than a standard rolling update.
  3. Add automated rollback based on SLO signals, not just container health.
  4. Test failure paths in staging by injecting slow startup, failed readiness, and long-running requests.
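As a starting point for item 1, a PodDisruptionBudget for the three-replica api Deployment above might look like this; minAvailable: 2 is an example threshold, not a universal default:

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: api
```

With this in place, voluntary disruptions such as node drains during upgrades will never take more than one api pod down at a time.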

Zero-downtime deployment is ultimately a systems discipline, not a checkbox. Once you have probes, surge settings, and graceful termination working together, Kubernetes can deliver frequent releases without the usual production drama.
