Troubleshooting: Diagnosing Common Docker Container Errors Fast
Startup failures are a routine part of working with Docker containers. When a container exits unexpectedly, finding the root cause quickly keeps deployments moving. These failures are often cryptic, marked only by a non-zero exit code.
This guide walks through a structured diagnostic process built on three core Docker commands: docker ps, docker logs, and docker inspect. The goal is to identify and resolve the most frequent container startup issues quickly, replacing guesswork with actionable fixes.
Phase 1: Initial Triage and State Assessment
The first step in diagnosing any container failure is determining its current and recent state. The default docker ps command only shows running containers, which is unhelpful when a container has exited immediately upon startup.
Using docker ps -a to Find Failures
The crucial command for initial triage is docker ps -a (list all containers, running or stopped). This allows you to view the status, exit code, and age of the stopped container.
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
2d3f4b5c6e7a my-app:latest "/usr/bin/start.sh" 5 minutes ago Exited (127) 3 minutes ago web-service
d8c9a0b1c2d3 nginx:latest "nginx -g 'daemon..." 10 minutes ago Up 8 minutes 80/tcp active-proxy
Key Status Indicators:
- Exited (0): The container's main process finished successfully (often a batch job completing). There is usually nothing to diagnose unless the container was expected to keep running.
- Exited (Non-Zero): A failure occurred. Common codes are 1 (general application error), 126 (command found but not executable, often a permission problem), and 127 (command not found).
- Created: The container was created but never started, either because it was made with docker create or because the daemon could not start it (for example, a host port conflict).
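The status indicators above can be turned into a quick filter. The following is a minimal sketch, assuming the docker CLI is on PATH and the daemon is reachable. The function names list_failed_containers and explain_exit_code are illustrative and not part of Docker, and the signal interpretation follows the common convention that codes above 128 mean 128 plus a signal number.

```shell
# List stopped containers, hiding those that exited cleanly with code 0.
list_failed_containers() {
  docker ps -a --filter status=exited --format '{{.Names}}: {{.Status}}' \
    | grep -v 'Exited (0)'
}

# Translate an exit code into its conventional meaning.
explain_exit_code() {
  case "$1" in
    0)   echo "clean exit" ;;
    1)   echo "general application error" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$1" -gt 128 ] 2>/dev/null; then
        echo "terminated by signal $(( $1 - 128 ))"
      else
        echo "application-defined error"
      fi
      ;;
  esac
}
```

For the web-service container in the listing above, explain_exit_code 127 prints "command not found".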
Phase 2: Diving Deep with Container Logs
Once you have the container ID or name, the single most valuable tool for diagnosis is the logging mechanism. Docker captures the standard output (stdout) and standard error (stderr) streams from the container's primary process.
Retrieving Historical Logs
Use the docker logs command to retrieve all captured output from the failing container. This output often contains the precise error message (e.g., stack trace, configuration error, or missing file warning) that caused the container to halt.
# Retrieve logs for the failed container
$ docker logs web-service
# --- Example Log Output ---
Standardizing environment...
Error: Configuration file not found at /etc/app/config.json
Application initialization failed. Exiting.
Advanced Log Filtering Tips:
| Command Option | Purpose | Example |
|---|---|---|
| -f, --follow | Stream logs in real-time (useful if the container starts and crashes quickly). | docker logs -f web-service |
| --tail N | Display only the last N lines of logs. | docker logs --tail 50 web-service |
| -t, --timestamps | Show timestamps for each log entry (useful for correlating events). | docker logs -t web-service |
| --since | Show logs generated after a specific time or duration (e.g., 1h, 15m). | docker logs --since 15m web-service |
Best Practice: Always check logs immediately after a failure. If logs are empty, the failure occurred before the main application process could start, often indicating an issue with the Docker ENTRYPOINT or CMD configuration itself.
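When the log is long, it can help to pull out only the lines that look like failures. The sketch below is one way to do that, assuming the container was started without a TTY, in which case docker logs replays the container's stderr on its own stderr and a 2>&1 is needed before piping. The function name scan_logs and the keyword list are illustrative choices, not a Docker feature.

```shell
# Print recent log lines that look like errors.
# $1 = container name or ID, $2 = number of trailing lines to scan (default 200)
scan_logs() {
  docker logs --tail "${2:-200}" "$1" 2>&1 \
    | grep -iE 'error|fatal|exception|denied|not found'
}
```

Calling scan_logs web-service 50 limits the scan to the last 50 lines. The keyword list is a starting point and should be adjusted to the application's own log format.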
Phase 3: Analyzing State and Configuration with docker inspect
When logs are insufficient (e.g., showing a generic error or nothing at all), you need to analyze the container's internal configuration and execution environment.
Reviewing the Full State Object
docker inspect provides a comprehensive JSON object detailing everything about the container, from network settings to resource limits, and crucially, the final state and error message.
$ docker inspect web-service
Focus on the following key JSON paths within the output:
1. State Information
This section holds the detailed exit information, including the start and finish times and any error the Docker daemon recorded while trying to start the container.
...
"State": {
"Status": "exited",
"Running": false,
"Paused": false,
"Restarting": false,
"OOMKilled": false,
"Dead": false,
"Pid": 0,
"ExitCode": 127,
"Error": "", // Often empty, but can contain the daemon's start error (e.g., executable not found)
"StartedAt": "2023-10-26T14:30:00.123456789Z",
"FinishedAt": "2023-10-26T14:30:00.223456789Z"
},
...
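Rather than reading the full JSON, the --format flag of docker inspect can extract just the State fields of interest. The following is a hedged sketch: state_summary is an illustrative helper name, and the wording of its messages is an assumption about what is useful to print, not Docker output.

```shell
# Summarize why a container stopped, using fields from the State object.
# $1 = container name or ID
state_summary() {
  fields="$(docker inspect \
    --format '{{.State.ExitCode}}|{{.State.OOMKilled}}|{{.State.Error}}' "$1")" || return 1
  # Split "code|oom|error" by hand; the error text may itself contain "|".
  code="${fields%%|*}"
  rest="${fields#*|}"
  oom="${rest%%|*}"
  err="${rest#*|}"
  if [ "$oom" = "true" ]; then
    echo "exit $code: killed by the kernel OOM killer"
  elif [ -n "$err" ]; then
    echo "exit $code: daemon reported: $err"
  else
    echo "exit $code: no daemon error recorded; check the container logs"
  fi
}
```

Running state_summary web-service gives a one-line answer to the first question in any diagnosis: did the daemon fail to start the process, did the kernel kill it, or did the application exit on its own?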
2. Entrypoint and Command
If the container exited with code 127 (command not found) or 126 (command not executable), verify the top-level Path and Args fields, which record the command Docker actually ran, along with Entrypoint and Cmd under Config. Confirm that the primary process is correctly specified and that its path exists within the image.
...
"Config": {
"Entrypoint": [
"/usr/bin/start.sh"
],
"Cmd": [
"--mode=production"
],
...
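Docker records the resolved command in the Path and Args fields of the inspect output, so the effective command line can be printed directly. This is a small sketch, assuming the docker CLI is available; effective_command is an illustrative name, and the absolute-path check is only a hint, since a relative name can still be valid if it is on PATH inside the container.

```shell
# Show the command a container runs and flag PATH-dependent executables.
# $1 = container name or ID
effective_command() {
  path="$(docker inspect --format '{{.Path}}' "$1")" || return 1
  args="$(docker inspect --format '{{join .Args " "}}' "$1")" || return 1
  echo "command: $path $args"
  case "$path" in
    /*) echo "path is absolute" ;;
    *)  echo "path is relative; it is resolved through PATH inside the container" ;;
  esac
}
```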
3. Mounts and Volumes
If the application failed due to missing files or permission errors, check the Mounts section to confirm that host volumes were correctly mapped, are accessible, and possess the necessary permissions.
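A quick way to check the Mounts section is to list each bind mount and test whether its source exists on the host. The sketch below assumes it runs on the Docker host itself (not against a remote daemon) and that mount paths contain no spaces; check_bind_sources is an illustrative name.

```shell
# Report whether the host side of each bind mount exists.
# $1 = container name or ID
check_bind_sources() {
  docker inspect \
    --format '{{range .Mounts}}{{println .Type .Source .Destination}}{{end}}' "$1" |
  while read -r type src dst; do
    # Named volumes are managed by Docker; only bind mounts are checked here.
    [ "$type" = "bind" ] || continue
    if [ -e "$src" ]; then
      echo "ok: $src -> $dst"
    else
      echo "missing on host: $src -> $dst"
    fi
  done
}
```

A missing source is worth fixing first, since the application inside the container will otherwise see an empty or absent directory where it expects its files.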
Phase 4: Common Startup Failure Scenarios and Resolutions
By combining logs and inspection data, you can categorize the failure and apply a targeted fix.
Scenario 1: Port Already Allocated (Bind Error)
This occurs when the host port you are trying to map (-p 8080:80) is already in use by another process (either another container or a process running on the host machine).
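Diagnosis: docker run or docker start fails immediately with an error such as port is already allocated or bind: address already in use. Because the main process never starts, docker logs is usually empty and docker ps -a shows the container as Created.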
Resolution:
1. Stop the conflicting process or container.
2. Change the host port mapping (e.g., -p 8081:80).
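To find the conflicting container, docker ps can filter on published ports. This is a minimal sketch with an illustrative function name, find_port_owner; when no container holds the port, the owner is a host process and a tool such as ss or lsof (assumed to be installed) is needed instead.

```shell
# Identify which running container, if any, publishes a given host port.
# $1 = host port number
find_port_owner() {
  owner="$(docker ps --filter "publish=$1" --format '{{.Names}}')"
  if [ -n "$owner" ]; then
    echo "port $1 is published by container: $owner"
  else
    echo "no container publishes port $1; check host processes (e.g. ss -ltnp or lsof -i :$1)"
  fi
}
```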
Scenario 2: Command Not Found (Exit Code 127)
This means the command specified in the ENTRYPOINT or CMD directive could not be found, either by the Docker runtime or by a shell inside the container.
Diagnosis: Check docker logs (which might be empty) and verify the Config section using docker inspect.
Resolution:
1. Ensure the executable path is correct (e.g., /usr/local/bin/app, not just app).
2. Verify the executable exists in the image. You may need to run a temporary debugging container to examine the image filesystem:
# Temporarily run the image, overriding the failing command
# (use /bin/sh instead if the image does not ship bash)
$ docker run --rm -it --entrypoint /bin/bash my-app:latest
# Now inside the container, check: ls -l /usr/bin/start.sh
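The same check can be done non-interactively, which is handy in scripts. The sketch below assumes the image ships /bin/sh; check_entrypoint is an illustrative name. Note that the failure branch also fires if docker run itself fails, for example because the image does not exist locally and cannot be pulled.

```shell
# Check that a path exists and is executable inside an image,
# bypassing the image's own entrypoint.
# $1 = image, $2 = path inside the image
check_entrypoint() {
  if docker run --rm --entrypoint /bin/sh "$1" -c 'test -x "$1"' sh "$2"; then
    echo "ok: $2 exists and is executable in $1"
  else
    echo "problem: $2 is missing or not executable in $1"
  fi
}
```

For the example container, the call would be check_entrypoint my-app:latest /usr/bin/start.sh.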
Scenario 3: Permission Denied (Exit Code 126 or Volume Errors)
This typically occurs when the container user lacks permission to access a required file, directory, or volume mount point.
Diagnosis: Logs show errors like Permission denied or cannot open file.
Resolution:
1. Volume Permissions: If using host mounts (-v /host/data:/container/data), ensure the host folder has read/write permissions for the user ID the container runs as (often UID 1000 or root).
2. Entrypoint Permissions: Ensure the script specified in ENTRYPOINT has the executable flag set within the Dockerfile (RUN chmod +x /path/to/script).
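For host mounts, a common first check is whether the host directory is owned by the UID the container runs as. The following sketch compares the two; owner_uid and check_owner are illustrative names. It only checks ownership, so a directory can still be usable through group or world permissions even when it reports a mismatch. The container's configured user can be read with docker inspect --format '{{.Config.User}}', where an empty value normally means root (UID 0).

```shell
# Print the numeric UID that owns a host path (uses POSIX ls -n).
owner_uid() {
  ls -ldn "$1" | awk '{print $3}'
}

# Compare a host directory's owner with the UID the container runs as.
# $1 = host directory, $2 = numeric UID used inside the container
check_owner() {
  have_uid="$(owner_uid "$1")"
  if [ "$have_uid" = "$2" ]; then
    echo "ok: $1 is owned by UID $2"
  else
    echo "mismatch: $1 is owned by UID $have_uid, container runs as UID $2"
  fi
}
```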
Scenario 4: Out of Memory (OOMKilled)
This is a system-level failure where the kernel terminates the container's main process due to excessive memory consumption.
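Diagnosis: Check docker ps -a for STATUS Exited (137), which means the process received SIGKILL (128 + 9). Because docker kill and a timed-out docker stop produce the same code, confirm the cause by running docker inspect [id] and looking for the "OOMKilled": true field in the State object.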
Resolution:
1. Increase the container's memory limit using the -m (--memory) flag (e.g., --memory 2g).
2. Optimize the application to reduce memory usage.
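Before raising the limit, it helps to know what the limit was. The sketch below reads the OOMKilled flag together with HostConfig.Memory, which docker inspect reports in bytes with 0 meaning no limit. The function name oom_report is illustrative, and the remark about host memory pressure is an inference, not something Docker reports.

```shell
# Report whether a container was OOM-killed and at what memory limit.
# $1 = container name or ID
oom_report() {
  info="$(docker inspect \
    --format '{{.State.OOMKilled}} {{.HostConfig.Memory}}' "$1")" || return 1
  oom="${info% *}"
  limit="${info#* }"
  if [ "$oom" != "true" ]; then
    echo "not OOM-killed"
  elif [ "$limit" -eq 0 ]; then
    echo "OOM-killed with no container memory limit; the host was likely under memory pressure"
  else
    echo "OOM-killed at a memory limit of $(( limit / 1048576 )) MiB"
  fi
}
```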
Summary and Next Steps
Efficient Docker troubleshooting relies on a structured approach: start with docker ps -a to assess the failure, use docker logs as your primary investigative tool, and reserve docker inspect for deeper configuration and environmental issues. By understanding the meaning of exit codes and knowing where to look within the container's state, you can drastically reduce the time spent resolving common startup failures.
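The three-step approach can be wrapped in a single helper for repeated use. This is a sketch rather than a definitive tool: triage is an illustrative name, the 50-line log window is an arbitrary default, and the name filter of docker ps matches substrings, so it may list more than one container.

```shell
# Run the status, logs, and state checks for one container in order.
# $1 = container name
triage() {
  echo "== status =="
  docker ps -a --filter "name=$1" --format '{{.Names}}: {{.Status}}'
  echo "== last 50 log lines =="
  docker logs --tail 50 "$1" 2>&1
  echo "== state =="
  docker inspect \
    --format 'exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' "$1"
}
```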
Further Actions:
- If the issue relates to the image, rebuild the image with temporary debugging steps (e.g., printing environment variables) included in the Dockerfile.
- If logs are scarce, temporarily switch the container's initialization to use bash or sh to manually navigate the file system and test commands within the environment.