Mastering kubectl logs and describe for Efficient Pod Debugging

This guide provides expert techniques for mastering the essential Kubernetes debugging commands: `kubectl logs` and `kubectl describe`. Learn the critical flags, such as `-f`, `--tail`, `-c`, and `--previous`, required for efficient troubleshooting. We detail how to interpret the crucial 'Events' section in `describe` to diagnose scheduling and configuration issues, and how to use `logs` to extract runtime errors from crashing or multi-container pods, accelerating your debugging workflow.

When a Kubernetes pod fails, kubectl describe and kubectl logs usually tell you where to look next. describe explains what Kubernetes tried to do. logs shows what the container wrote before, during, or after the failure.

These commands provide different, yet complementary, views into the state and history of a Kubernetes Pod. kubectl describe gives you the Pod's metadata, status, environment variables, and, crucially, a history of system events. kubectl logs provides the standard output (stdout) and standard error (stderr) streams generated by the containerized application itself.

The trick is using them in the right order. If the pod was never scheduled, logs will not help. If the pod starts and exits, events may only tell you that it crashed; the application logs usually explain why.


The Three-Step Pod Debugging Workflow

Before diving into the commands, it's helpful to understand the typical debugging workflow:

  1. Check Status: Use kubectl get pods to identify the failure state (Pending, CrashLoopBackOff, ImagePullBackOff, etc.).
  2. Get Context and Events: Use kubectl describe pod to understand why the state transition occurred (e.g., scheduler failed, liveness probe failed, volume failed to mount).
  3. Inspect Application Output: Use kubectl logs to examine the application's runtime behavior (e.g., configuration errors, database connection failures, stack traces).
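
As a rough sketch of how step 1 routes you into steps 2 and 3, the shell helper below maps a STATUS value from kubectl get pods to the place where the explanation usually lives. The function name likely_source and the mapping itself are illustrative assumptions, not kubectl features.

```shell
# Hypothetical helper (not part of kubectl): given a STATUS value from
# `kubectl get pods`, print where the explanation usually lives.
likely_source() {
  case "$1" in
    Pending|ContainerCreating|ImagePullBackOff|ErrImagePull)
      # No container has produced output yet; only events can explain it.
      echo "describe (Events)" ;;
    CrashLoopBackOff|Error)
      # The container ran and died; the crashed instance holds the error.
      echo "describe, then logs --previous" ;;
    Running|Completed)
      echo "logs" ;;
    *)
      # Unknown state: start with the system view.
      echo "describe" ;;
  esac
}
```

For example, likely_source CrashLoopBackOff prints "describe, then logs --previous". Treat the mapping as a rule of thumb rather than a diagnosis.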

1. kubectl describe: The System Triage Tool

kubectl describe is the first command you should run when a Pod is behaving poorly. It doesn't show application output, but it provides the critical metadata and history that Kubernetes itself has recorded about the Pod.

Basic Usage

The fundamental usage requires only the Pod name:

kubectl describe pod my-failing-app-xyz789

Pass the namespace explicitly with -n when the Pod is outside your current context's namespace:

kubectl describe pod my-failing-app-xyz789 -n payments

If you only know the deployment or label, find the pod first:

kubectl get pods -n payments -l app=checkout -o wide

Key Sections in the Output

When reviewing the output of describe, focus on these critical sections:

A. Status and State

Look at the Status field at the top, and then review the individual container states within the Pod. This tells you if the container is Running, Waiting, or Terminated, and provides the reason for that state.

| Field  | Common Status/Reason               | Meaning |
| ------ | ---------------------------------- | ------- |
| Status | Pending                            | Pod is waiting to be scheduled or has missing resources. |
| Reason | ContainerCreating                  | Container runtime is pulling the image or running setup. |
| State  | Waiting / Reason: CrashLoopBackOff | The container started and exited repeatedly. |
| State  | Terminated / Exit Code             | The container finished execution. Non-zero exit codes usually indicate errors. |

B. Container Configuration

Use this section to verify that the environment variables, resource requests/limits, volume mounts, and liveness/readiness probes on the live Pod match the manifest you applied.

C. The Events Section (Crucial)

The Events section, located at the bottom of the output, is arguably the most valuable part. It provides a chronological log of what the Kubernetes control plane did to and for the Pod, including warnings and errors.

Common Errors revealed by Events:

  • Scheduling Issues: Warning FailedScheduling: Indicates the scheduler couldn't find a suitable node (e.g., due to resource constraints, node taints, or affinity rules).
  • Image Pull Failures: Warning Failed: ImagePullBackOff: Indicates the image name is wrong, the tag doesn't exist, or Kubernetes lacks credentials to pull from a private registry.
  • Volume Errors: Warning FailedAttachVolume or FailedMount: Indicates problems attaching or mounting storage.

Tip: If the Events section is clean and the container has started, the problem is often application-related: a bad environment variable, failed migration, missing secret, unreachable dependency, or a process that exits immediately.
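
On a busy Pod the Events table can be long. The sketch below filters describe output down to the Warning rows only. The function name describe_warnings is hypothetical, and it assumes the output contains an "Events:" header line followed by rows whose first column is the event type.

```shell
# Hypothetical filter: read `kubectl describe pod` output on stdin and
# print only the Warning rows of the Events section.
describe_warnings() {
  awk '
    /^Events:/ { in_events = 1; next }   # the Events header starts the section
    in_events && $1 == "Warning"         # no action given, so awk prints the line
  '
}
```

Usage would look like: kubectl describe pod my-failing-app-xyz789 | describe_warnings. If you prefer to filter on the server side, kubectl get events accepts --field-selector type=Warning.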

2. kubectl logs: Inspecting Application Output

If describe shows the Pod was scheduled successfully and containers attempted to run, the next step is checking the standard output streams using kubectl logs.

Basic Log Retrieval and Real-time Streaming

To view the current logs for the primary container in a Pod:

# Retrieve all logs up to the current moment
kubectl logs my-failing-app-xyz789

# Stream logs in real-time (useful for monitoring startup)
kubectl logs -f my-failing-app-xyz789

Handling Multi-Container Pods

For pods that use the sidecar pattern or other multi-container designs, specify which container's logs you want with the -c (--container) flag. Without it, recent kubectl versions fall back to a default container, which may not be the one you care about.

# View logs for the 'sidecar-proxy' container within the Pod
kubectl logs my-multi-container-pod -c sidecar-proxy

# Stream logs for the main application container
kubectl logs -f my-multi-container-pod -c main-app

Debugging Restarting Containers (--previous)

One of the most common debugging scenarios is the CrashLoopBackOff state. After a restart, kubectl logs shows only the output of the current container instance, which often contains little more than startup messages.

To view the logs from the previous, terminated instance, which usually contains the error that caused the exit, use the --previous flag (-p):

# View logs from the previous, crashed container instance
kubectl logs my-crashloop-pod --previous

# Combine with container specification if needed
kubectl logs my-crashloop-pod -c worker --previous
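
When you do not know whether the container has restarted yet, a small wrapper can try the previous instance first and fall back to the current one. This is a minimal sketch: crash_logs is a hypothetical name, and it relies on kubectl logs --previous exiting non-zero when no previous instance exists.

```shell
# Hypothetical wrapper: prefer the crashed instance, fall back to the
# current one if there is no previous container to read from.
crash_logs() {
  # $1 = pod, $2 = namespace, $3 = container name
  kubectl logs "$1" -n "$2" -c "$3" --previous --tail=100 2>/dev/null ||
    kubectl logs "$1" -n "$2" -c "$3" --tail=100
}
```

Usage would look like: crash_logs my-crashloop-pod default worker. The error from the failed --previous attempt is discarded, so you only see one set of logs.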

Limiting Output

For high-volume logs, retrieving the entire history can be slow or overwhelming. Use --tail to limit the output to the last N lines.

# Show only the last 50 lines of the container logs
kubectl logs my-high-traffic-app --tail=50

You can also add timestamps and a time window:

kubectl logs my-high-traffic-app --tail=100 --timestamps
kubectl logs my-high-traffic-app --since=10m
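
With --timestamps, every line starts with an RFC 3339 timestamp, which makes it easy to see when errors cluster. The sketch below counts error lines per minute. The function name errors_per_minute is hypothetical, and the ERROR and FATAL keywords are assumptions about your application's log format.

```shell
# Hypothetical filter: read `kubectl logs --timestamps` output on stdin
# and print "<minute> <count>" for lines containing ERROR or FATAL.
errors_per_minute() {
  grep -E 'ERROR|FATAL' |        # keep only error lines (assumed keywords)
    cut -c1-16 |                 # YYYY-MM-DDTHH:MM identifies the minute
    sort | uniq -c |             # count lines per minute
    awk '{ print $2, $1 }'       # normalize to "minute count"
}
```

Usage would look like: kubectl logs my-high-traffic-app --since=10m --timestamps | errors_per_minute.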

For a deployment, recent kubectl versions can fetch logs through the workload name. By default kubectl picks a single pod from the deployment:

kubectl logs deploy/checkout-api -n payments --tail=100

To read from every matching pod, use a label selector:

kubectl logs -n payments -l app=checkout --all-containers=true --tail=50

3. Combining Techniques for Advanced Diagnosis

Effective debugging often involves rapidly switching between describe and specific logs commands.

Case Study: Diagnosing Liveness Probe Failure

Imagine a Pod that reports Running but restarts occasionally, causing disruption.

Step 1: Check describe for the system view.

kubectl describe pod web-server-dpl-abc

Output shows in the Events section:

Type     Reason      Age   From               Message
----     ------      ----  ----               -------
Warning  Unhealthy   2s    kubelet, node-a01  Liveness probe failed: HTTP probe failed with statuscode: 503

Conclusion from Step 1: The container is running, but the Liveness probe is failing with a 503 error. Once the probe's failure threshold is reached, the kubelet restarts the container.

Step 2: Check logs for application context.

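Now, investigate why the application is returning a 503 status, which is an application-level failure. If the container has already been restarted, add --previous to read the instance that failed the probe.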

kubectl logs web-server-dpl-abc --tail=200

Log output reveals:

2023-10-26 14:01:15 ERROR Database connection failure: Timeout connecting to DB instance 192.168.1.10

Final Conclusion: The Pod is restarting due to a failing Liveness probe, and the probe is failing because the application cannot connect to the database. The issue is external networking or database configuration, not the container itself.

Case Study: Pending Pod With No Logs

A pod in Pending often has no useful container logs because no container has started yet.

kubectl get pod report-worker-6f9c7b9b7d-f2q8m -n analytics

Output:

NAME                                  READY   STATUS    RESTARTS   AGE
report-worker-6f9c7b9b7d-f2q8m        0/1     Pending   0          4m

Go straight to describe:

kubectl describe pod report-worker-6f9c7b9b7d-f2q8m -n analytics

Events might show:

Warning  FailedScheduling  default-scheduler  0/6 nodes are available: 6 Insufficient memory.

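That is not an application bug. The pod's memory request is larger than what any node has left to allocate. The scheduler works from requests, not live usage, so kubectl top nodes is only a rough guide; the Allocated resources section of kubectl describe node shows what is already reserved. The next step is to inspect the deployment requests, cluster capacity, node taints, and autoscaler behavior: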

kubectl get deploy report-worker -n analytics -o yaml
kubectl top nodes
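
A FailedScheduling message can pack several reasons into one line. The sketch below splits such a message into one reason per line. The function name scheduling_reasons is hypothetical, and it assumes reasons are comma-separated after "nodes are available:" and that any trailing "preemption:" clause can be ignored.

```shell
# Hypothetical parser: read a FailedScheduling message on stdin and
# print each "<count> <reason>" pair on its own line.
scheduling_reasons() {
  sed -e 's/ preemption: .*$//' \
      -e 's/^.*nodes are available: //' \
      -e 's/\.$//' |
    tr ',' '\n' |      # one reason per line
    sed 's/^ *//'      # trim the space left after each comma
}
```

Fed the event line from this case study, it prints "6 Insufficient memory". A reason that itself contains a comma would be split incorrectly, so read the raw event when the output looks odd.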

Case Study: ImagePullBackOff

For ImagePullBackOff, logs are usually unavailable because the container never started. describe gives the useful error:

kubectl describe pod api-7dfb9c8b7f-bd2p9 -n staging

Common event messages include an image tag that does not exist, an authentication failure against a private registry, or a DNS/network problem reaching the registry. The fix might be as simple as correcting the tag:

kubectl set image deploy/api api=registry.example.com/api:2026-05-24 -n staging

Or it might require checking the image pull secret:

kubectl get secret regcred -n staging
kubectl describe serviceaccount default -n staging

Case Study: Multi-Container Pod With a Quiet App

Sidecars can hide the signal if you look at the wrong container. First list container names:

kubectl get pod checkout-84f7c9d7bf-px5mx -n payments \
  -o jsonpath='{.spec.containers[*].name}{"\n"}'

Then inspect each one deliberately:

kubectl logs checkout-84f7c9d7bf-px5mx -n payments -c checkout --tail=100
kubectl logs checkout-84f7c9d7bf-px5mx -n payments -c envoy --tail=100

If the app logs are quiet but the proxy logs show upstream connection failures, the pod may be healthy from Kubernetes' point of view while traffic still fails through service mesh or proxy configuration.
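
To avoid skipping a container by accident, the two steps above can be combined into a loop. This is a sketch under the assumption that container names contain no whitespace; logs_per_container is a hypothetical name.

```shell
# Hypothetical helper: print a labelled tail of every container's logs.
logs_per_container() {
  # $1 = pod, $2 = namespace
  for name in $(kubectl get pod "$1" -n "$2" \
      -o jsonpath='{.spec.containers[*].name}'); do
    echo "=== $name ==="
    kubectl logs "$1" -n "$2" -c "$name" --tail=20
  done
}
```

Usage would look like: logs_per_container checkout-84f7c9d7bf-px5mx payments. The jsonpath expression covers regular containers only, so init containers would need a separate query on .spec.initContainers.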

Best Practices and Warnings

| Practice | Command | Rationale |
| -------- | ------- | --------- |
| Always check previous logs | kubectl logs --previous | Necessary for diagnosing CrashLoopBackOff. The critical error is almost always in the previous run. |
| Specify containers | kubectl logs -c <name> | Avoids ambiguity in multi-container Pods and prevents fetching logs from unintended sidecars. |
| Use labels for bulk operations | kubectl logs -l app=frontend -f | Streams logs from multiple Pods matching a selector simultaneously (useful for rolling updates). kubectl caps the number of concurrent streams (--max-log-requests, default 5). |
| Warning: Log Rotation | N/A | The kubelet rotates container logs, typically based on file size rather than age, and a Pod's logs disappear once the Pod is removed from the node. Anything rotated out is unavailable via kubectl logs. Use an external centralized logging solution (e.g., Fluentd, Loki, Elastic Stack) for long-term retention. |

Things These Commands Cannot Tell You

kubectl logs only shows container stdout and stderr as retained by the node. If the application writes to a file inside the container, kubectl logs may show nothing. That is a logging design issue, not a kubectl issue.

kubectl describe shows Kubernetes object state and recent events, but events are not permanent audit logs. Old events age out. For long-running investigations, copy the relevant output into the incident notes.
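
Because events and rotated logs disappear, it can help to capture them the moment you start investigating. The sketch below writes describe output, namespace events, and timestamped logs into a directory. The function name snapshot_pod and the file layout are hypothetical choices, not a standard.

```shell
# Hypothetical helper: save the current evidence for a pod to files so it
# survives event expiry and log rotation.
snapshot_pod() {
  # $1 = pod, $2 = namespace, $3 = output directory
  mkdir -p "$3" || return 1
  kubectl describe pod "$1" -n "$2" > "$3/describe.txt"
  kubectl get events -n "$2" --sort-by=.lastTimestamp > "$3/events.txt"
  kubectl logs "$1" -n "$2" --all-containers=true --timestamps > "$3/logs.txt"
}
```

Usage would look like: snapshot_pod checkout-84f7c9d7bf-px5mx payments ./incident-notes, where the directory name is whatever your incident process expects.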

Neither command replaces metrics. A pod can be running and logging normally while CPU throttling, memory pressure, or downstream latency causes user-visible problems. After describe and logs, the next commands are often:

kubectl top pod -n payments
kubectl top node
kubectl get events -n payments --sort-by=.lastTimestamp

Use describe first when Kubernetes could not create or keep the pod running. Use logs first when the pod is running but the application is misbehaving. Switch between them until you can separate platform symptoms from application symptoms.

A Debugging Flow You Can Reuse

When you are under pressure, use the same flow every time:

kubectl get pod <pod> -n <namespace> -o wide
kubectl describe pod <pod> -n <namespace>
kubectl logs <pod> -n <namespace> --all-containers=true --tail=100
kubectl logs <pod> -n <namespace> --all-containers=true --previous --tail=100
kubectl get events -n <namespace> --sort-by=.lastTimestamp

The order matters. get pod -o wide tells you the node, pod IP, restart count, and age. describe tells you scheduling, image, volume, probe, and container state details. Current logs show what the running container is doing now. Previous logs catch the crash that already happened. Events show whether the same problem is happening across the namespace.

For deployments, add a rollout check:

kubectl rollout status deploy/<name> -n <namespace>
kubectl describe deploy/<name> -n <namespace>
kubectl get rs -n <namespace> -l app=<label>

Sometimes the pod you are debugging is from an old ReplicaSet during a rollout. If you fix the deployment but keep reading logs from an old terminating pod, you can chase the wrong issue for half an hour.

Reading Restarts Correctly

The RESTARTS column is a clue, not a diagnosis. A restart count of 1 after a node drain may be harmless. A restart count that increases every minute is a live failure. Use describe to check the last state:

Last State:     Terminated
  Reason:       Error
  Exit Code:    1
  Started:      Sun, 24 May 2026 10:14:02 +0800
  Finished:     Sun, 24 May 2026 10:14:07 +0800

An exit code of 1 usually means the process exited with a general application error. 137 often means the process was killed, commonly because it exceeded memory limits, though you should confirm with the Reason field and node/container runtime context. 143 often appears when a process receives SIGTERM during normal termination. Do not treat every non-zero exit code as the same kind of failure.
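
The 137 and 143 values follow the common convention that a process killed by a signal reports 128 plus the signal number (9 for SIGKILL, 15 for SIGTERM). The sketch below applies that arithmetic. The function name explain_exit_code is hypothetical, and an application can still choose to exit with a code above 128 on its own, so read the result as a hint.

```shell
# Hypothetical helper: interpret a container exit code using the
# "128 + signal number" convention.
explain_exit_code() {
  code=$1
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -gt 128 ]; then
    sig=$((code - 128))
    case "$sig" in
      9)  echo "killed by signal 9 (SIGKILL)" ;;
      15) echo "killed by signal 15 (SIGTERM)" ;;
      *)  echo "killed by signal $sig" ;;
    esac
  else
    echo "application error (exit code $code)"
  fi
}
```

For example, explain_exit_code 137 prints "killed by signal 9 (SIGKILL)" and explain_exit_code 1 prints "application error (exit code 1)".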

When memory is suspected, look for:

Reason:       OOMKilled
Exit Code:    137

Then compare the container limit with actual usage:

kubectl describe pod <pod> -n <namespace> | grep -E -A5 'Limits|Requests'
kubectl top pod <pod> -n <namespace>

If metrics-server is not installed, kubectl top will not work. In that case, use your platform's metrics system or node-level tooling.
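
Once you have both numbers, comparing them is simple arithmetic. The sketch below expresses current memory usage as a percentage of the limit. The function names to_mib and usage_percent are hypothetical, and the code assumes whole-number quantities with Ki, Mi, or Gi suffixes; other Kubernetes quantity forms (decimal suffixes, fractional values) are rejected rather than converted.

```shell
# Hypothetical helper: convert a memory quantity with a binary suffix
# (Ki, Mi, Gi) to whole MiB. Other suffixes are not supported.
to_mib() {
  case "$1" in
    *Gi) echo $(( ${1%Gi} * 1024 )) ;;
    *Mi) echo "${1%Mi}" ;;
    *Ki) echo $(( ${1%Ki} / 1024 )) ;;
    *)   echo "unsupported quantity: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: print usage as an integer percentage of the limit.
usage_percent() {
  # $1 = current usage (e.g. 480Mi), $2 = limit (e.g. 512Mi)
  used=$(to_mib "$1") || return 1
  limit=$(to_mib "$2") || return 1
  [ "$limit" -gt 0 ] || return 1
  echo $(( used * 100 / limit ))
}
```

For example, usage_percent 480Mi 512Mi prints 93. A value that sits close to 100 shortly before each restart is consistent with OOMKilled, though it does not prove it.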