Troubleshooting: Diagnosing Common Docker Container Errors Fast

Master the art of rapid Docker container troubleshooting with this essential guide. Learn the structured process for diagnosing startup failures using core Docker commands. We detail how to leverage `docker ps -a` to identify crashes, extract critical information using `docker logs`, and perform advanced configuration analysis with `docker inspect`. This article provides practical examples and targeted resolutions for frequent issues, including exit code 127 errors, port conflicts, and OOMKilled events, ensuring you can quickly identify the root cause and restore service.

When a Docker container exits immediately, do not start by rebuilding the image or changing random flags. Start by finding what Docker knows: the container state, the exit code, the logs, and the exact command Docker tried to run. Those four pieces usually narrow the problem quickly.

A container is just a process with isolation around it. If the main process exits, the container exits. That can be a crash, a missing executable, a completed batch job, a failed health dependency, or the kernel killing the process because it used too much memory. The commands below help you tell those cases apart.

Find the stopped container first

docker ps only shows running containers. A container that failed at startup has already exited, so it stays hidden unless you ask for all containers:

docker ps -a

Look at STATUS, COMMAND, and NAMES:

CONTAINER ID   IMAGE          COMMAND              STATUS                      NAMES
2d3f4b5c6e7a   my-app:latest  "/usr/bin/start"     Exited (127) 2 minutes ago  web-service
91aa34c0db22   worker:latest  "python worker.py"   Exited (0) 10 minutes ago   nightly-worker

Exited (0) often means the process completed successfully. That is normal for one-shot jobs. For a web service, it may mean the command ran and finished instead of staying in the foreground.

Non-zero exit codes point to failure, but treat them as clues rather than final answers. Exit code 127 commonly means the command was not found, and 126 commonly means it was found but could not be executed. Codes above 128 usually mean the process died from a signal (128 plus the signal number), so 137 means SIGKILL (signal 9); in containers that is frequently, but not always, related to memory pressure. Always confirm with logs and inspect output.
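The conventions above are easy to encode in a few lines. This is a minimal sketch, not an official mapping: explain_exit_code is a hypothetical helper name, and the messages only restate the shell and signal conventions described in this section.

```shell
#!/bin/sh
# explain_exit_code: hypothetical helper that turns an exit code into a hint.
# The result is a clue, not a diagnosis; confirm with logs and inspect.
explain_exit_code() {
  code=$1
  case "$code" in
    0)   echo "exited cleanly (normal for one-shot jobs)" ;;
    126) echo "command found but not executable" ;;
    127) echo "command not found" ;;
    *)
      if [ "$code" -gt 128 ] 2>/dev/null; then
        # By convention, 128 + N means the process was ended by signal N.
        echo "killed by signal $((code - 128))"
      else
        echo "application-defined failure; read the logs"
      fi
      ;;
  esac
}

# Example use against a real container (assumes one named web-service):
#   explain_exit_code "$(docker inspect -f '{{.State.ExitCode}}' web-service)"
```

Signal 9 is SIGKILL and signal 15 is SIGTERM, so 137 and 143 are the two signal codes you are most likely to see.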

Read logs before changing anything

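With the default logging driver, Docker captures whatever the container's main process writes to stdout and stderr. An application that logs only to a file inside the container will show nothing here. Use: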

docker logs web-service

Useful options:

docker logs --tail 100 web-service
docker logs --since 15m web-service
docker logs -t web-service
docker logs -f web-service

If the logs say config file not found, check mounts and environment. If they show an application stack trace, debug the application. If they are empty, the process may have failed before it produced output, or the image entrypoint may be wrong.

For a crash loop, avoid docker logs -f as your only tool. Following the stream shows output scrolling by, but not the exit code, the OOM flag, or the restart count. Pair logs with docker inspect.

Inspect the container state

docker inspect returns a large JSON document. You rarely need all of it. Start with formatted fields:

docker inspect -f 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' web-service

Then inspect command and image configuration:

docker inspect -f 'entrypoint={{json .Config.Entrypoint}} cmd={{json .Config.Cmd}} user={{.Config.User}}' web-service

Check mounts when the error involves files:

docker inspect -f '{{json .Mounts}}' web-service

If the container was killed for memory, .State.OOMKilled is the important field. If it is true, increasing memory may help, but the better next question is why memory grew. A larger limit can hide a leak long enough to fail later.
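If you check state often, you can turn the output of the first formatted inspect command above into a one-line verdict. The sketch below assumes the line has exactly the shape status=... exit=... oom=... error=... produced by that format string; triage_state is a hypothetical name, and the wording of the verdicts is illustrative.

```shell
#!/bin/sh
# triage_state: hypothetical helper. Expects one line shaped like
#   status=<s> exit=<n> oom=<bool> error=<free text>
# as printed by the formatted docker inspect command in this section.
triage_state() {
  line=$1
  status=${line#status=}; status=${status%% *}
  rest=${line#* exit=};   exit_code=${rest%% *}
  rest=${line#* oom=};    oom=${rest%% *}
  error=${line#* error=}  # last field, may contain spaces

  if [ "$oom" = "true" ]; then
    echo "OOM-killed: check the memory limit and why usage grew"
  elif [ -n "$error" ]; then
    # A non-empty State.Error means the runtime itself reported a problem.
    echo "runtime could not start the process: $error"
  elif [ "$status" = "running" ] || [ "$status" = "restarting" ]; then
    echo "container is $status: check logs, health, and restart count"
  else
    echo "process exited with code $exit_code: read the logs"
  fi
}

# Example use (assumes a container named web-service):
#   triage_state "$(docker inspect -f 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' web-service)"
```

The OOM check comes first on purpose: an OOM kill also produces exit code 137, and the flag is more specific than the code.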

Reproduce with an interactive shell when possible

If the image contains a shell, override the entrypoint and inspect the filesystem:

docker run --rm -it --entrypoint /bin/sh my-app:latest

Some images have Bash:

docker run --rm -it --entrypoint /bin/bash my-app:latest

Inside, check the files and command paths:

ls -l /usr/bin/start
id
env

Minimal images may not include a shell. In that case, use the image's available tools, rebuild a temporary debug variant, or inspect the Dockerfile and build output. Do not permanently add debug packages to a production image just because troubleshooting was inconvenient once.

Command not found: exit 127

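Exit 127 usually means Docker could not find the executable named by ENTRYPOINT or CMD, or a startup script tried to run a missing command. In the first case the runtime refuses to start the container and typically records the reason in .State.Error; in the second, the shell's "not found" message shows up in docker logs.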

Common causes:

  • The executable was never copied into the image.
  • The path is correct on the host but not inside the image.
  • The script uses /bin/bash, but the image only has /bin/sh.
  • The command depends on PATH, and PATH differs from what you expect.

Check the image command:

docker inspect -f '{{json .Config.Entrypoint}} {{json .Config.Cmd}}' web-service

If the entrypoint is a script, check its shebang and line endings. A script with Windows CRLF endings can fail with confusing “not found” messages because the interpreter path effectively contains a carriage return.
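Both problems are quick to check on the host before you rebuild. The following is a small sketch rather than a complete linter: check_script is a hypothetical name, and it only looks for a shebang on the first line and for carriage returns anywhere in the file.

```shell
#!/bin/sh
# check_script: hypothetical helper that flags two common entrypoint
# script problems. Prints one line per problem; returns 1 if any were found.
check_script() {
  file=$1
  problems=0
  cr=$(printf '\r')   # a literal carriage return

  case "$(head -n 1 "$file")" in
    '#!'*) ;;
    *) echo "$file: first line is not a shebang"; problems=1 ;;
  esac

  if grep -q "$cr" "$file"; then
    echo "$file: contains CRLF line endings"; problems=1
  fi

  return "$problems"
}

# Example use (assumes start.sh is in the current directory):
#   check_script start.sh
```

If CRLF endings are reported, tr -d '\r' < start.sh > start.unix.sh produces a cleaned copy, and configuring your editor or Git checkout to keep LF endings prevents the problem from returning.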

Permission denied: exit 126 or file errors

Exit 126 often means Docker found the command but could not execute it. For scripts, the file may lack the executable bit:

COPY start.sh /usr/local/bin/start.sh
RUN chmod 0755 /usr/local/bin/start.sh
ENTRYPOINT ["/usr/local/bin/start.sh"]

For volume-mounted files, remember that host permissions apply. If a container runs as UID 1000 and the host directory is owned by root with no write permission, the container cannot write there just because it is “inside Docker.”

Check the runtime user:

docker inspect -f 'user={{.Config.User}}' web-service

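If it is blank, the container's main process starts as root (UID 0). Many images leave it blank, but not all: some official and security-hardened images set a non-root USER, and docker run --user overrides either default. An entrypoint script can also drop privileges after startup, so confirm with id inside the container when it matters.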
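For the volume case, you can compare the container's UID with the host directory before starting anything. This is a rough sketch under loud assumptions: can_uid_write is a hypothetical helper, it reads only the owner and "other" permission bits from ls -nd, and it ignores group membership, ACLs, SELinux labels, and user-namespace or rootless remapping.

```shell
#!/bin/sh
# can_uid_write: hypothetical, deliberately simplified check.
# Returns 0 if a process with the given numeric UID could probably write
# to the directory, 1 if probably not, 2 if the directory cannot be listed.
can_uid_write() {
  dir=$1
  uid=$2
  [ "$uid" -eq 0 ] && return 0            # root normally bypasses mode bits

  listing=$(ls -nd "$dir") || return 2    # e.g. "drwxr-xr-x 2 1000 1000 ..."
  mode=${listing%% *}
  owner=$(printf '%s\n' "$listing" | awk '{print $3}')

  if [ "$owner" = "$uid" ]; then
    case "$mode" in ??w*) return 0 ;; esac          # owner write bit
  else
    case "$mode" in ????????w*) return 0 ;; esac    # "other" write bit
  fi
  return 1
}

# Example use (assumes the container runs as UID 1000 and mounts ./data):
#   can_uid_write ./data 1000 || echo "UID 1000 probably cannot write to ./data"
```

A "probably not" result is a reason to look at ownership with ls -ln, not proof; group permissions alone can make the write succeed.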

Port already allocated

A bind error usually appears when you publish a host port that is already in use:

docker run -p 8080:80 nginx

Docker may report something like port is already allocated when another container holds the port, or bind: address already in use when a host process does. Find the conflict:

docker ps --format 'table {{.Names}}\t{{.Ports}}'
lsof -iTCP:8080 -sTCP:LISTEN

Then stop the conflicting process or choose another host port:

docker run -p 8081:80 nginx

The container port can stay the same. The host port is the part before the colon.
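The publish flag has more than one shape, which makes "the part before the colon" easy to misread when an IP address is included. The sketch below pulls the host port out of the common forms; host_port_of is a hypothetical helper and does not validate the spec.

```shell
#!/bin/sh
# host_port_of: hypothetical helper that extracts the host port from a
# -p/--publish value. Prints an empty line when no host port is given,
# in which case Docker picks one.
host_port_of() {
  spec=${1%/*}                 # drop an optional /tcp or /udp suffix
  case "$spec" in
    *:*:*)                     # ip:hostPort:containerPort
      rest=${spec%:*}
      printf '%s\n' "${rest##*:}"
      ;;
    *:*)                       # hostPort:containerPort
      printf '%s\n' "${spec%%:*}"
      ;;
    *)                         # containerPort only
      printf '\n'
      ;;
  esac
}

# Example: check the port you are about to publish.
#   lsof -iTCP:"$(host_port_of 8080:80)" -sTCP:LISTEN
```

With 127.0.0.1:8080:80 the host port is the middle field, so the conflict to look for is still on 8080.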

Missing files and bad mounts

If logs say a config file is missing, compare what the application expects with what Docker mounted:

docker inspect -f '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' web-service

A common mistake is mounting a host directory over a path that already had files in the image. The mount hides the image contents at that destination. If the image contains /app/config/default.yml and you mount an empty host directory over /app/config, the default file disappears from the container's view.

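Also check relative paths. -v ./config:/app/config is resolved against the directory where you ran docker run, not the directory where the Dockerfile lives. Older Docker versions may reject relative host paths in -v altogether, so an absolute path such as "$(pwd)/config" is the safer choice.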
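To find out whether a file you expected from the image is sitting underneath a mount, compare its path with the mount destinations. This is a minimal sketch: shadowing_mount is a hypothetical helper that reads one destination per line on stdin and does plain path-prefix matching, nothing more.

```shell
#!/bin/sh
# shadowing_mount: hypothetical helper. Reads mount destinations from stdin
# (one per line) and prints the first one that covers the given path.
# Returns 1 if no mount covers it.
shadowing_mount() {
  path=$1
  while IFS= read -r dest; do
    [ -n "$dest" ] || continue
    case "$path" in
      "$dest"|"${dest%/}"/*)
        printf '%s\n' "$dest"
        return 0
        ;;
    esac
  done
  return 1
}

# Example use (assumes a container named web-service):
#   docker inspect -f '{{range .Mounts}}{{println .Destination}}{{end}}' web-service \
#     | shadowing_mount /app/config/default.yml
```

If a destination is printed, the file the application sees comes from the mount source, and whatever the image shipped at that path is hidden.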

Health check failures are not always container crashes

A container can be running but unhealthy:

docker ps

You might see Up 2 minutes (unhealthy). Inspect health output:

docker inspect -f '{{json .State.Health}}' web-service

Health checks often fail because the app is listening on a different port, binds only to 127.0.0.1, takes longer to start than the health check allows, or needs a database that is not ready yet. Do not confuse an unhealthy container with an exited one; the diagnostic path is different.

A fast troubleshooting sequence

Use this order when you need an answer quickly:

  1. docker ps -a to find the container and exit code.
  2. docker logs --tail 100 <name> to read the application error.
  3. docker inspect -f ... to check state, command, user, and mounts.
  4. Run a temporary shell in the image if the command or filesystem is suspect.
  5. Check host conflicts for ports and mounted directory permissions.
  6. Rebuild only after you know whether the issue is image content, runtime flags, or application configuration.

That sequence keeps the investigation grounded. Docker usually has enough evidence; the trick is reading it before changing the scene.
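The first three steps are worth wrapping in a function so you run them the same way every time. This is a sketch that reuses only the commands and format strings already shown in this article; triage is a hypothetical name, and the section labels are arbitrary.

```shell
#!/bin/sh
# triage: hypothetical wrapper around steps 1 to 3 of the sequence.
# It only reads state; it does not start, stop, or remove anything.
triage() {
  name=$1

  echo "== state =="
  # Note: the name filter matches substrings, so similar names may also appear.
  docker ps -a --filter "name=$name" --format 'table {{.Names}}\t{{.Status}}'

  echo "== last 100 log lines =="
  docker logs --tail 100 "$name" 2>&1

  echo "== inspect =="
  docker inspect -f 'status={{.State.Status}} exit={{.State.ExitCode}} oom={{.State.OOMKilled}} error={{.State.Error}}' "$name"
  docker inspect -f 'entrypoint={{json .Config.Entrypoint}} cmd={{json .Config.Cmd}} user={{.Config.User}}' "$name"
}

# Example use (assumes a container named web-service):
#   triage web-service
```

Saving the output to a file, for example triage web-service > triage.txt, gives you a snapshot to compare against after you change something.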

Check restart policy before trusting what you see

A restart policy can make a container look like it is constantly failing or constantly recovering. Check it:

docker inspect -f 'restart={{json .HostConfig.RestartPolicy}}' web-service

If the policy is always or unless-stopped, Docker may restart the container after each crash. docker ps might show it as running for a few seconds, then restarting again. In that case, use logs with timestamps and inspect the restart count:

docker inspect -f 'restarts={{.RestartCount}} started={{.State.StartedAt}} finished={{.State.FinishedAt}}' web-service

A high restart count usually means the main process exits quickly. The fix is rarely “change the restart policy.” The policy is only revealing the underlying failure.

Distinguish build-time problems from run-time problems

If a file is missing inside the container, ask when it should have appeared. Files copied in the Dockerfile are build-time concerns. Files mounted with -v or Compose volumes are run-time concerns.

Build-time checks:

docker image inspect my-app:latest
docker run --rm --entrypoint /bin/sh my-app:latest -c 'ls -la /app'

Run-time checks:

docker inspect -f '{{range .Mounts}}{{println .Source "->" .Destination}}{{end}}' web-service

This split saves time. Rebuilding the image will not fix a bad host mount. Changing a volume flag will not fix a Dockerfile that never copied the binary.

Environment variables and secrets can fail quietly

Many applications exit because a required environment variable is missing, but the Docker error only says the process exited with code 1. Inspect the configured environment carefully:

docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' web-service

Be careful where you run that command; it can print secrets. In shared logs, print only the variable names:

docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' web-service | sed 's/=.*//'

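If you use --env-file, check for CRLF line endings, a missing file, and quotes or spaces you did not mean to include. Docker env files are not shell scripts: docker run reads KEY=value lines literally, treats lines starting with # as comments, and performs no shell expansion, so quotes around a value generally end up inside the value. Compose reads its own env files with somewhat different rules, so test with the tool you actually use.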
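A quick lint pass catches most of these before the container starts. The sketch below is illustrative and assumes the docker run style of env file; lint_env_file is a hypothetical helper, and the three checks (CRLF endings, whitespace in the variable name, a value that starts with a quote) are a starting point rather than a full parser.

```shell
#!/bin/sh
# lint_env_file: hypothetical helper. Prints "line N: problem" for each
# suspicious line and returns 1 if anything was reported.
lint_env_file() {
  file=$1
  cr=$(printf '\r')
  n=0
  status=0
  while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    case "$line" in
      *"$cr") echo "line $n: CRLF line ending"; status=1 ;;
    esac
    case "$line" in
      ''|'#'*) continue ;;     # blank lines and comments are fine
    esac
    case "$line" in
      *=*)
        key=${line%%=*}
        value=${line#*=}
        case "$key" in
          # Also catches shell syntax such as a leading "export ".
          *' '*) echo "line $n: whitespace in variable name"; status=1 ;;
        esac
        case "$value" in
          \"*|\'*) echo "line $n: value starts with a quote"; status=1 ;;
        esac
        ;;
    esac
  done < "$file"
  return "$status"
}

# Example use (assumes an env file named app.env):
#   lint_env_file app.env && docker run --env-file app.env my-app:latest
```

The function prints line numbers and problem types only, never values, so its output is safe to paste into a shared log.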

A real-world review pass before you ship

Before calling a script or container setup finished, read it once as if you are the next person who has to debug it at 2 a.m. That changes what you notice. A prompt that made sense while writing the script may be ambiguous when it appears in a CI log. A Docker service name that felt obvious may not match the variable name in the application. A Bash default may be safe for development and dangerous for production.

I like to do a short dry run with deliberately awkward values. Use a path with spaces. Use an empty optional value. Try a filename that starts with a dash. Run the script from a different working directory. Start the container without one expected environment variable. These tests are not fancy, but they catch the assumptions that usually break first.

Also check the failure message. If the only output is "failed", the advice here has not made it into the implementation. A useful failure says what value was used, what check failed, and what the operator can change. That does not mean dumping every environment variable or printing secrets. It means being specific where specificity helps: the config path, the missing command name, the network name, the service hostname, or the port the process tried to bind.

The final habit is to keep examples close to the way the system is actually run. If production uses Compose, test with Compose. If a script is launched by systemd, test it with systemd or with a similarly minimal environment. If a command is supposed to be safe for copy and paste, include the quoting, -- separators, and validation in the example itself. Readers copy working patterns more often than they copy warnings.

That review pass is not bureaucracy. It is how small automation stays boring. Boring is what you want from shell prompts, config loaders, variable expansion, container diagnostics, and Docker networking. The less surprising the behavior is, the easier it is for the next operator to trust it.

For Docker failures, save the exact command that created the container when you can. docker inspect can show the current configuration, but an incident note that includes the original docker run command, Compose service, image tag, and env file name is much easier to reproduce later. Avoid using latest during serious debugging. A moving tag can turn one failure into two different investigations because the image changes while you are still reading logs.

If you are diagnosing a production-like issue locally, pull the same image digest or tag, use the same volume layout, and pass the same non-secret environment shape. Reproduction beats speculation.
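When the original docker run command is gone, you can still capture most of what an incident note needs from the container itself. This is a sketch built from the inspect fields used earlier plus .Config.Image and .Image; incident_note is a hypothetical name, and it prints environment variable names only so the note is safe to share.

```shell
#!/bin/sh
# incident_note: hypothetical helper that prints a reproducible summary
# of how a container was configured. Read-only; values of env vars are omitted.
incident_note() {
  name=$1
  echo "container: $name"
  echo "captured:  $(date -u +%Y-%m-%dT%H:%M:%SZ)"

  # The tag or reference the container was created from, and the image ID
  # it resolved to. The ID stays fixed even if the tag moves later.
  docker inspect -f 'image_ref={{.Config.Image}}' "$name"
  docker inspect -f 'image_id={{.Image}}' "$name"

  docker inspect -f 'entrypoint={{json .Config.Entrypoint}} cmd={{json .Config.Cmd}}' "$name"
  docker inspect -f 'restart={{json .HostConfig.RestartPolicy}} restarts={{.RestartCount}}' "$name"
  docker inspect -f '{{range .Mounts}}{{println "mount:" .Source "->" .Destination}}{{end}}' "$name"

  # Variable names only, never values.
  docker inspect -f '{{range .Config.Env}}{{println .}}{{end}}' "$name" | sed 's/=.*//'
}

# Example use (assumes a container named web-service):
#   incident_note web-service > incident-web-service.txt
```

Recording the image ID next to the tag is what lets you tell later whether latest still pointed at the same image when the failure happened.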