Mastering kubectl logs and describe for Efficient Pod Debugging

Debugging applications in a distributed environment like Kubernetes can be challenging. When a pod fails to start, enters a restarting loop, or exhibits unexpected behavior, the two most critical tools in a Kubernetes operator's toolkit are kubectl describe and kubectl logs.

These commands provide different, yet complementary, views into the state and history of a Kubernetes Pod. kubectl describe gives you the Pod's metadata, status, environment variables, and, crucially, a history of system events. kubectl logs provides the standard output (stdout) and standard error (stderr) streams generated by the containerized application itself.

Knowing the flags and techniques for these commands is essential for diagnosing and resolving Pod issues quickly.

The Three-Step Pod Debugging Workflow

Before diving into the commands, it's helpful to understand the typical debugging workflow:

Check Status: Use kubectl get pods to identify the failure state (Pending, CrashLoopBackOff, ImagePullBackOff, etc.).
Get Context and Events: Use kubectl describe pod to understand why the state transition occurred (e.g., scheduler failed, liveness probe failed, volume failed to mount).
Inspect Application Output: Use kubectl logs to examine the application's runtime behavior (e.g., configuration errors, database connection failures, stack traces).
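The three steps above can be strung together in a small shell helper. This is a minimal sketch rather than a standard tool: the function name `triage_pod`, the fallback to the `default` namespace, and the 50-line tail are assumptions chosen for illustration. It relies only on `kubectl get`, `kubectl describe`, and `kubectl logs`.

```shell
# triage_pod: hypothetical helper that runs the three debugging steps in order.
# Usage: triage_pod <pod-name> [namespace]
# Assumption: the namespace falls back to "default" when not given.
triage_pod() {
  pod="$1"
  ns="${2:-default}"

  echo "== 1. Status =="
  kubectl get pod "$pod" -n "$ns" -o wide

  echo "== 2. Context and events =="
  kubectl describe pod "$pod" -n "$ns"

  echo "== 3. Application output (last 50 lines) =="
  kubectl logs "$pod" -n "$ns" --tail=50
}
```

For example, `triage_pod my-failing-app-xyz789 staging` prints all three views for that Pod in one pass.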

1. `kubectl describe`: The System Triage Tool

kubectl describe is the first command you should run when a Pod is behaving poorly. It doesn't show application output, but it provides the critical metadata and history that Kubernetes itself has recorded about the Pod.

Basic Usage

The basic form needs only the Pod name (add -n <namespace> if the Pod is not in your current namespace):

kubectl describe pod my-failing-app-xyz789

Key Sections in the Output

When reviewing the output of describe, focus on these critical sections:

A. Status and State

Look at the Status field at the top, and then review the individual container states within the Pod. This tells you if the container is Running, Waiting, or Terminated, and provides the reason for that state.

Field	Common Status/Reason	Meaning
`Status`	`Pending`	The Pod has been accepted by the cluster but is not running yet: it is waiting to be scheduled, or its images are still being pulled.
`State`	`Waiting` / `Reason: ContainerCreating`	The container runtime is pulling the image or setting up the container.
`State`	`Waiting` / `Reason: CrashLoopBackOff`	The container keeps starting and exiting; the kubelet is backing off before the next restart.
`State`	`Terminated` / `Exit Code`	The container has exited. A non-zero exit code usually indicates an error.
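When only the container states are of interest, the same fields can be read without scanning the full describe output. The sketch below uses `kubectl get` with a JSONPath template; the helper name `container_states` is hypothetical, and the exact rendering of the state object can differ between kubectl versions.

```shell
# container_states: hypothetical helper printing one line per container:
# its name, its restart count, and the raw state object reported by the kubelet.
# Usage: container_states <pod-name>
container_states() {
  kubectl get pod "$1" -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}restarts={.restartCount}{"\t"}{.state}{"\n"}{end}'
}
```

A high restart count next to a `waiting` state is a quick indicator that the container is crash-looping.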

B. Container Configuration

Use this section to verify that the environment variables, resource requests/limits, volume mounts, and liveness/readiness probes stored in the cluster match the manifest you applied.

C. The `Events` Section (Crucial)

The Events section, located at the bottom of the output, is arguably the most valuable part. It provides a chronological log of what the Kubernetes control plane did to and for the Pod, including warnings and errors.

Common Errors revealed by Events:

Scheduling Issues: Warning FailedScheduling: Indicates the scheduler couldn't find a suitable node (e.g., due to resource constraints, node taints, or affinity rules).
Image Pull Failures: Warning Failed: ImagePullBackOff: Indicates the image name is wrong, the tag doesn't exist, or Kubernetes lacks credentials to pull from a private registry.
Volume Errors: Warning FailedAttachVolume: Indicates issues connecting external storage.

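Tip: If the Events section is clean, the problem is usually application-related (runtime crash, failed initialization, configuration error), so move on to kubectl logs next. Keep in mind that events are retained only for a limited time (one hour by default), so an empty Events section on a long-running Pod may simply mean the relevant events have expired.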
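Events can also be queried directly, which helps when the describe output is long or when only warnings matter. The following is a sketch under a few assumptions: the helper names `pod_events` and `pod_warnings` are made up for this example, and the namespace falls back to `default` when omitted. It uses `kubectl get events` with field selectors and sorts by the last-seen timestamp.

```shell
# pod_events: hypothetical helper listing all events for one Pod, oldest first.
# Usage: pod_events <pod-name> [namespace]
pod_events() {
  pod="$1"
  ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod" \
    --sort-by=.lastTimestamp
}

# pod_warnings: same query, restricted to events of type Warning.
pod_warnings() {
  pod="$1"
  ns="${2:-default}"
  kubectl get events -n "$ns" \
    --field-selector "involvedObject.kind=Pod,involvedObject.name=$pod,type=Warning" \
    --sort-by=.lastTimestamp
}
```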

2. `kubectl logs`: Inspecting Application Output

If describe shows the Pod was scheduled successfully and containers attempted to run, the next step is checking the standard output streams using kubectl logs.

Basic Log Retrieval and Real-time Streaming

To view the current logs for the primary container in a Pod:

# Retrieve all logs up to the current moment
kubectl logs my-failing-app-xyz789

# Stream logs in real-time (useful for monitoring startup)
kubectl logs -f my-failing-app-xyz789

Handling Multi-Container Pods

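For Pods that use the sidecar pattern or other multi-container designs, specify which container's logs you want with the -c (--container) flag. Without it, recent kubectl versions fall back to a default container (normally the first one in the spec) and print a notice, while older versions return an error asking you to choose.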

# View logs for the 'sidecar-proxy' container within the Pod
kubectl logs my-multi-container-pod -c sidecar-proxy

# Stream logs for the main application container
kubectl logs -f my-multi-container-pod -c main-app
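If the container names are not known in advance, they can be looked up first, or logs can be pulled from every container at once. This is a small sketch with hypothetical helper names (`list_containers`, `all_container_logs`) and an assumed default of 20 lines per container; the `--all-containers` and `--prefix` flags are standard `kubectl logs` options.

```shell
# list_containers: print the names of the regular containers in a Pod.
# Usage: list_containers <pod-name>
list_containers() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[*].name}'
}

# all_container_logs: recent logs from every container in the Pod, with each
# line prefixed by its source (pod and container name).
# Usage: all_container_logs <pod-name> [lines-per-container]
all_container_logs() {
  kubectl logs "$1" --all-containers --prefix --tail="${2:-20}"
}
```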

Debugging Restarting Containers (`--previous`)

One of the most common debugging scenarios is the CrashLoopBackOff state. By default, kubectl logs returns output from the most recent container instance. If that instance has only just been restarted, the output often contains little more than a startup message and not the error that caused the earlier crash.

To view the logs from the previous, terminated instance, which usually contains the error that caused the exit, use the --previous flag (-p):

# View logs from the previous, crashed container instance
kubectl logs my-crashloop-pod --previous

# Combine with container specification if needed
kubectl logs my-crashloop-pod -c worker --previous
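The exit code and termination reason of the previous instance are recorded in the Pod status, and reading them alongside the previous logs gives a fuller picture. The helper below is a sketch: `crash_report` is a hypothetical name, the 100-line tail is an arbitrary choice, and it assumes the container name is passed explicitly.

```shell
# crash_report: hypothetical helper for a restarting container.
# Prints the restart count, the last exit code and reason, and then the logs
# of the previous (terminated) instance.
# Usage: crash_report <pod-name> <container-name>
crash_report() {
  pod="$1"
  container="$2"
  # JSONPath prefix selecting the status entry for the named container.
  base=".status.containerStatuses[?(@.name==\"$container\")]"

  restarts="$(kubectl get pod "$pod" -o jsonpath="{${base}.restartCount}")"
  if [ "${restarts:-0}" -eq 0 ]; then
    echo "No restarts recorded for $container; showing current logs."
    kubectl logs "$pod" -c "$container" --tail=100
    return 0
  fi

  echo "Restarts: $restarts"
  echo "Last exit code: $(kubectl get pod "$pod" -o jsonpath="{${base}.lastState.terminated.exitCode}")"
  echo "Last reason: $(kubectl get pod "$pod" -o jsonpath="{${base}.lastState.terminated.reason}")"
  kubectl logs "$pod" -c "$container" --previous --tail=100
}
```

As a reading aid, exit code 137 corresponds to SIGKILL (128 + 9), which is commonly, though not only, the result of an out-of-memory kill.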

Limiting Output

For high-volume logs, retrieving the entire history can be slow or overwhelming. Use --tail to limit the output to the last N lines.

# Show only the last 50 lines of the container logs
kubectl logs my-high-traffic-app --tail=50
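Output can also be limited by time instead of line count, using the `--since` flag, and `--timestamps` adds a timestamp to each line. The sketch below combines both with `grep`; the helper name `recent_errors`, the 15-minute default window, and the default search pattern are assumptions for illustration only.

```shell
# recent_errors: hypothetical helper that fetches timestamped log lines from a
# recent time window and keeps only the lines matching a pattern.
# Usage: recent_errors <pod-name> [window] [extended-regex]
recent_errors() {
  pod="$1"
  window="${2:-15m}"
  pattern="${3:-error|exception|fatal}"
  kubectl logs "$pod" --since="$window" --timestamps | grep -iE "$pattern"
}
```

For example, `recent_errors my-high-traffic-app 5m` shows only error-like lines from the last five minutes.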

3. Combining Techniques for Advanced Diagnosis

Effective debugging often involves rapidly switching between describe and specific logs commands.

Case Study: Diagnosing Liveness Probe Failure

Imagine a Pod that reports Running but restarts occasionally, causing disruption.

Step 1: Check describe for the system view.

kubectl describe pod web-server-dpl-abc

Output shows in the Events section:

Type     Reason      Age   From               Message
----     ------      ----  ----               -------
Warning  Unhealthy   2s    kubelet, node-a01  Liveness probe failed: HTTP probe failed with statuscode: 503

Conclusion from Step 1: The container is running, but the liveness probe is receiving a 503 response. Once the probe fails failureThreshold times in a row (3 by default), the kubelet restarts the container.

Step 2: Check logs for application context.

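Now investigate why the application is returning a 503 status, which is an application-level failure. If the container has already been restarted, add --previous to read the instance that was failing the probe.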

kubectl logs web-server-dpl-abc --tail=200

Log output reveals:

2023-10-26 14:01:15 ERROR Database connection failure: Timeout connecting to DB instance 192.168.1.10

Final Conclusion: The Pod is restarting due to a failing Liveness probe, and the probe is failing because the application cannot connect to the database. The issue is external networking or database configuration, not the container itself.
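To double-check a diagnosis like this one, it can help to look at how the probe is configured and how often it has failed. The two helpers below are a sketch with hypothetical names (`show_liveness_probe`, `probe_failures`); the first assumes the probed container is the first one in the Pod spec, and the second assumes the Pod is in the current namespace.

```shell
# show_liveness_probe: print the liveness probe definition of the first
# container (assumption: the probed container is at index 0).
# Usage: show_liveness_probe <pod-name>
show_liveness_probe() {
  kubectl get pod "$1" -o jsonpath='{.spec.containers[0].livenessProbe}'
}

# probe_failures: list the "Unhealthy" events recorded for the Pod.
# Usage: probe_failures <pod-name>
probe_failures() {
  kubectl get events --field-selector "involvedObject.name=$1,reason=Unhealthy"
}
```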

Best Practices and Warnings

Practice	Command	Rationale
Always check previous logs	`kubectl logs --previous`	Necessary for diagnosing `CrashLoopBackOff`. The critical error is often in the previous run, not the current one.
Specify containers	`kubectl logs -c <name>`	Avoids ambiguity in multi-container Pods and prevents fetching logs from unintended sidecars.
Use labels for bulk operations	`kubectl logs -l app=frontend -f`	Streams logs from multiple Pods matching a selector simultaneously (useful during rolling updates). With a selector, `--tail` defaults to 10 lines per Pod, and following is limited to 5 concurrent streams unless you raise `--max-log-requests`.
Warning: Log Rotation	N/A	The kubelet rotates container logs on each node, by default by size (10Mi per file, up to 5 files per container) rather than by age. `kubectl logs` returns only what is still on the node, and logs are lost when the Pod is deleted or evicted. Use an external centralized logging solution (e.g., Fluentd, Loki, Elastic Stack) for long-term retention.
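Two of the practices above translate directly into small helpers. This is a sketch only: the names `follow_by_label` and `save_logs`, the 20-line tail, and the limit of 10 concurrent streams are assumptions for the example, while `--prefix`, `--max-log-requests`, `--all-containers`, and `--timestamps` are existing `kubectl logs` flags.

```shell
# follow_by_label: stream logs from every Pod matching a label selector,
# prefixing each line with its source so interleaved output stays readable.
# Usage: follow_by_label <selector>      e.g. follow_by_label app=frontend
follow_by_label() {
  kubectl logs -l "$1" -f --prefix --tail=20 --max-log-requests=10
}

# save_logs: hypothetical helper that snapshots a Pod's current logs to a file
# before rotation or Pod deletion makes them unavailable.
# Usage: save_logs <pod-name> [directory]
save_logs() {
  pod="$1"
  dir="${2:-.}"
  kubectl logs "$pod" --all-containers --prefix --timestamps > "$dir/$pod.log"
}
```

A snapshot like this is a stopgap for an incident in progress, not a replacement for centralized logging.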

Conclusion

kubectl describe and kubectl logs are the core commands for debugging Pods in Kubernetes. Treat describe as the system status report (configuration, events, and scheduling errors) and logs as the application execution stream (code errors and runtime behavior). Working from one to the other lets you systematically narrow down the cause of almost any Pod failure and reduce Mean Time To Resolution (MTTR).