Mastering kubectl logs and describe for Efficient Pod Debugging
Debugging applications in a distributed environment like Kubernetes can be challenging. When a pod fails to start, enters a restart loop, or exhibits unexpected behavior, the two most critical tools in a Kubernetes operator's toolkit are kubectl describe and kubectl logs.
These commands provide different, yet complementary, views into the state and history of a Kubernetes Pod. kubectl describe gives you the Pod's metadata, status, environment variables, and, crucially, a history of system events. kubectl logs provides the standard output (stdout) and standard error (stderr) streams generated by the containerized application itself.
Mastering the flags and techniques associated with these commands is essential for rapidly diagnosing and resolving issues, significantly improving your overall cluster troubleshooting efficiency.
The Three-Step Pod Debugging Workflow
Before diving into the commands, it's helpful to understand the typical debugging workflow:
- Check Status: Use kubectl get pods to identify the failure state (Pending, CrashLoopBackOff, ImagePullBackOff, etc.), as in the example below.
- Get Context and Events: Use kubectl describe pod to understand why the state transition occurred (e.g., scheduler failed, liveness probe failed, volume failed to mount).
- Inspect Application Output: Use kubectl logs to examine the application's runtime behavior (e.g., configuration errors, database connection failures, stack traces).
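For example, a quick status check might look like this (the pod names, restart counts, and ages are illustrative):

# Step 1: identify the failing Pod and its state
kubectl get pods

NAME                    READY   STATUS             RESTARTS   AGE
web-server-dpl-abc      1/1     Running            3          12m
my-failing-app-xyz789   0/1     CrashLoopBackOff   5          8m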
1. kubectl describe: The System Triage Tool
kubectl describe is the first command you should run when a Pod is behaving poorly. It doesn't show application output, but it provides the critical metadata and history that Kubernetes itself has recorded about the Pod.
Basic Usage
The fundamental usage requires only the Pod name:
kubectl describe pod my-failing-app-xyz789
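If the Pod lives outside your current namespace, add the -n flag (the namespace name here is illustrative):

# Describe a Pod in a specific namespace
kubectl describe pod my-failing-app-xyz789 -n production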
Key Sections in the Output
When reviewing the output of describe, focus on these critical sections:
A. Status and State
Look at the Status field at the top, and then review the individual container states within the Pod. This tells you if the container is Running, Waiting, or Terminated, and provides the reason for that state.
| Field | Common Status/Reason | Meaning |
|---|---|---|
| Status | Pending | Pod is waiting to be scheduled or has missing resources. |
| Reason | ContainerCreating | Container runtime is pulling the image or running setup. |
| State | Waiting / Reason: CrashLoopBackOff | The container started and exited repeatedly. |
| State | Terminated / Exit Code | The container finished execution. Non-zero exit codes usually indicate errors. |
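If you want just the state details without the full describe output, kubectl get with a JSONPath expression can extract them directly. A minimal sketch, reusing the example Pod name from above:

# Print the current state of the first container (Running/Waiting/Terminated details)
kubectl get pod my-failing-app-xyz789 -o jsonpath='{.status.containerStatuses[0].state}'

# Print the last terminated state, including the exit code
kubectl get pod my-failing-app-xyz789 -o jsonpath='{.status.containerStatuses[0].lastState.terminated}'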
B. Container Configuration
This section verifies that your environment variables, resource requests/limits, volume mounts, and liveness/readiness probes are correctly defined, matching the manifest you applied.
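For reference, an excerpt of this section might look like the following (the image, resource values, probe settings, and environment variable are hypothetical):

Containers:
  main-app:
    Image:      registry.example.com/main-app:1.4.2
    Limits:
      cpu:     500m
      memory:  256Mi
    Requests:
      cpu:     250m
      memory:  128Mi
    Liveness:   http-get http://:8080/health delay=10s timeout=1s period=10s
    Environment:
      DB_HOST:  192.168.1.10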
C. The Events Section (Crucial)
The Events section, located at the bottom of the output, is arguably the most valuable part. It provides a chronological log of what the Kubernetes control plane did to and for the Pod, including warnings and errors.
Common Errors revealed by Events:
- Scheduling Issues: Warning FailedScheduling: Indicates the scheduler couldn't find a suitable node (e.g., due to resource constraints, node taints, or affinity rules).
- Image Pull Failures: Warning Failed: ImagePullBackOff: Indicates the image name is wrong, the tag doesn't exist, or Kubernetes lacks credentials to pull from a private registry.
- Volume Errors: Warning FailedAttachVolume: Indicates issues attaching external storage to the node.
Tip: If the Events section is clean, the problem is usually application-related (runtime crash, failed initialization, configuration error), directing you to use kubectl logs next.
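Events are also available outside of describe. When you need them filtered or sorted, kubectl get events works well; the selector below references the example Pod from earlier:

# List events for a specific Pod, ordered by most recent timestamp
kubectl get events --field-selector involvedObject.name=my-failing-app-xyz789 --sort-by='.lastTimestamp'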
2. kubectl logs: Inspecting Application Output
If describe shows the Pod was scheduled successfully and containers attempted to run, the next step is checking the standard output streams using kubectl logs.
Basic Log Retrieval and Real-time Streaming
To view the current logs for the primary container in a Pod:
# Retrieve all logs up to the current moment
kubectl logs my-failing-app-xyz789
# Stream logs in real-time (useful for monitoring startup)
kubectl logs -f my-failing-app-xyz789
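When correlating log lines with the timestamps shown in Events, the --timestamps flag prefixes each line with its capture time:

# Stream logs with per-line timestamps
kubectl logs -f my-failing-app-xyz789 --timestamps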
Handling Multi-Container Pods
For pods utilizing the Sidecar pattern or other multi-container designs, you must specify which container's logs you wish to view using the -c or --container flag.
# View logs for the 'sidecar-proxy' container within the Pod
kubectl logs my-multi-container-pod -c sidecar-proxy
# Stream logs for the main application container
kubectl logs -f my-multi-container-pod -c main-app
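If you are not sure which container is at fault, you can fetch logs from every container at once. The --prefix flag labels each line with its source, which keeps the interleaved output readable:

# Fetch logs from all containers in the Pod, labeling each line with its origin
kubectl logs my-multi-container-pod --all-containers=true --prefix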
Debugging Restarting Containers (--previous)
One of the most common debugging scenarios is the CrashLoopBackOff state. When a container restarts, kubectl logs only shows the output of the current (failed) attempt, which often contains only the startup message before the crash.
To view the logs from the previous, terminated instance—which contains the actual error that caused the exit—use the --previous flag (-p):
# View logs from the previous, crashed container instance
kubectl logs my-crashloop-pod --previous
# Combine with container specification if needed
kubectl logs my-crashloop-pod -c worker --previous
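Before reaching for --previous, it can help to confirm that restarts actually occurred and to note the last exit code. A small JSONPath sketch, reusing the example Pod name:

# Check how many times the container has restarted
kubectl get pod my-crashloop-pod -o jsonpath='{.status.containerStatuses[0].restartCount}'

# Show the exit code of the last terminated run
kubectl get pod my-crashloop-pod -o jsonpath='{.status.containerStatuses[0].lastState.terminated.exitCode}'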
Limiting Output
For high-volume logs, retrieving the entire history can be slow or overwhelming. Use --tail to limit the output to the last N lines.
# Show only the last 50 lines of the container logs
kubectl logs my-high-traffic-app --tail=50
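Time-based limits are also available via the --since flag, which accepts durations such as 15m or 1h:

# Show only log entries from the last 15 minutes
kubectl logs my-high-traffic-app --since=15m

# Combine time and line limits for a quick recent snapshot
kubectl logs my-high-traffic-app --since=1h --tail=100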
3. Combining Techniques for Advanced Diagnosis
Effective debugging often involves rapidly switching between describe and specific logs commands.
Case Study: Diagnosing Liveness Probe Failure
Imagine a Pod that reports Running but occasionally restarts, causing disruption.
Step 1: Check describe for the system view.
kubectl describe pod web-server-dpl-abc
Output shows in the Events section:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 2s kubelet, node-a01 Liveness probe failed: HTTP GET http://10.42.0.5:8080/health failed: 503 Service Unavailable
Conclusion from Step 1: The container is running, but the Liveness probe is failing with a 503 error, causing Kubernetes to restart the container.
Step 2: Check logs for application context.
Now, investigate why the application is returning a 503 status, which is an application-level failure.
kubectl logs web-server-dpl-abc --tail=200
Log output reveals:
2023-10-26 14:01:15 ERROR Database connection failure: Timeout connecting to DB instance 192.168.1.10
Final Conclusion: The Pod is restarting due to a failing Liveness probe, and the probe is failing because the application cannot connect to the database. The issue is external networking or database configuration, not the container itself.
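To confirm the database hypothesis, you could probe connectivity from inside the Pod with kubectl exec. This assumes the image ships a tool like nc (many minimal images do not), and the port 5432 is purely illustrative:

# Test TCP reachability of the database host from inside the container
kubectl exec web-server-dpl-abc -- nc -zv 192.168.1.10 5432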
Best Practices and Warnings
| Practice | Command | Rationale |
|---|---|---|
| Always check previous logs | kubectl logs --previous | Necessary for diagnosing CrashLoopBackOff. The critical error is almost always in the previous run. |
| Specify containers | kubectl logs -c <name> | Avoids ambiguity in multi-container Pods and prevents fetching logs from unintended sidecars. |
| Use labels for bulk operations | kubectl logs -l app=frontend -f | Allows streaming logs from multiple Pods matching a selector simultaneously (useful for rolling updates). |
| Warning: Log rotation | N/A | Kubernetes nodes perform log rotation. Logs older than the node's configured retention policy (often a few days, or a size limit) will be pruned and unavailable via kubectl logs. Use an external centralized logging solution (e.g., Fluentd, Loki, Elastic Stack) for long-term retention. |
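As a sketch of the label-selector technique from the table: when following logs across many Pods, kubectl limits the number of concurrent streams (five by default), so busy Deployments may need the --max-log-requests flag raised:

# Stream logs from all Pods labeled app=frontend, allowing up to 10 concurrent streams
kubectl logs -l app=frontend -f --max-log-requests=10 --prefix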
Conclusion
kubectl describe and kubectl logs are the indispensable core commands for debugging in Kubernetes. By treating describe as the system status report (focusing on configuration, events, and scheduling errors) and logs as the application execution stream (focusing on code errors and runtime behavior), you can systematically narrow down the cause of almost any Pod failure, significantly reducing Mean Time To Resolution (MTTR) within your cluster.