Resolving Docker Build Failures: A Comprehensive Troubleshooting Guide
Debug Docker build failures caused by bad paths, missing packages, cache surprises, network issues, permissions, or disk space.
Docker build failures are easier to fix when you treat the build output as a transcript. Docker tells you which step failed, what command ran, and what the command printed. The useful work is finding the first real error, not the last line that says the build failed.
Run the build with plain progress when the default output hides too much:
docker build --progress=plain -t my-app:debug .
Look for the failing step number, such as #8, and the instruction beside it. If the failing instruction is COPY, you probably have a build context or path problem. If it is RUN apt-get install, you have a package, network, repository, or architecture problem. If it is RUN npm ci or pip install, read the package manager error before changing Docker settings.
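When the log is long, saving it and jumping to the first suspicious line is quicker than scrolling up from the bottom. The sketch below assumes the log was saved with a command such as docker build --progress=plain -t my-app:debug . 2>&1 | tee build.log (BuildKit writes its progress output to stderr, hence the 2>&1). The function name first_error and its keyword list are hypothetical heuristics, not a Docker feature, and they can also match harmless lines, such as package names that contain the word "error".

```shell
# Hypothetical helper: print the first line of a saved build log that looks like
# a real error, prefixed with its line number. The keyword list is a heuristic,
# not an official Docker log format.
first_error() (
  log=${1:-build.log}
  grep -n -i -E 'error|failed|not found|unable to|denied|no such file' "$log" | head -n 1
)

# Usage:
#   docker build --progress=plain -t my-app:debug . 2>&1 | tee build.log
#   first_error build.log
```

The first match is usually the package manager or compiler message, which is the line worth reading before the final "failed to solve" summary.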
COPY failed: the file is not in the build context
One of the most common build errors is also one of the simplest:
COPY failed: file not found in build context or excluded by .dockerignore
Docker can only copy files inside the build context, which is usually the final argument to docker build:
docker build -t my-app .
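Here . is the context. COPY source paths are resolved relative to the context, not to the Dockerfile's location, so a Dockerfile in a subdirectory still cannot copy ../secret.txt from outside that context. Docker blocks this on purpose: the builder only receives the context, and builds should be reproducible from it.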
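When a COPY source uses .. or passes through symlinks, it can be hard to tell by eye whether it still points inside the context. The hypothetical helper in_context below resolves the source's directory relative to the context and reports whether it stays inside. It is a sketch under stated assumptions: it only checks where the path points, not whether the file exists, and it resolves symlinks with pwd -P, so a symlinked directory that points elsewhere on disk is reported as outside.

```shell
# Hypothetical helper: report whether a COPY source path stays inside the context.
# Usage: in_context CONTEXT_DIR SOURCE_PATH   (SOURCE_PATH relative to the context)
# Only the location is checked, not whether the file exists.
in_context() (
  unset CDPATH                       # keep cd from printing or searching other dirs
  ctx=$(cd "$1" && pwd -P) || exit 2
  dir=$(cd "$1" && cd "$(dirname "$2")" 2>/dev/null && pwd -P) || {
    echo "no such directory: $(dirname "$2")" >&2
    exit 1
  }
  case "$dir/" in
    "$ctx"/*) echo "inside context: $2" ;;
    *) echo "outside context: $2" >&2; exit 1 ;;
  esac
)

# Usage: in_context . ../secret.txt   # reports "outside context: ../secret.txt"
```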
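Check three things: the directory you are building from, whether the file is actually there, and which Dockerfile and context you pass to docker build: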
pwd
ls -la
docker build --progress=plain -f path/to/Dockerfile .
If your Dockerfile lives in docker/Dockerfile but the app is at the repository root, build from the root and point to the Dockerfile with -f:
docker build -f docker/Dockerfile -t my-app .
Also inspect .dockerignore. It may exclude the file you are trying to copy. This often happens with dist, target, .env, or generated files. If the Dockerfile expects a file, do not ignore it unless the file is created inside the build.
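To see exactly which files survive .dockerignore, one common trick is to build a throwaway image that copies the whole context and lists it. The sketch below is a hypothetical helper named list_context; it assumes Docker can pull the small busybox image, reads the temporary Dockerfile from stdin with -f -, and leaves an untagged image behind that docker image prune can remove.

```shell
# Hypothetical helper: list the files Docker actually receives as build context,
# after .dockerignore has been applied. Requires Docker and the busybox image.
list_context() (
  ctx=${1:-.}
  # "-f -" reads this throwaway Dockerfile from stdin while $ctx is the context.
  # --no-cache forces the RUN step to execute so its output is printed.
  printf '%s\n' 'FROM busybox' 'COPY . /ctx' 'RUN find /ctx -type f | sort' |
    docker build --no-cache --progress=plain -f - "$ctx"
)

# Usage: list_context .    then look for the file your COPY expects.
```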
Package install failures
Package manager failures usually fall into a few buckets: stale package indexes, wrong package names, missing repositories, network problems, or using commands from the wrong Linux distribution.
This fails on Alpine because Alpine does not use apt-get:
FROM alpine:3.20
RUN apt-get update && apt-get install -y curl
Use the package manager for the base image:
FROM alpine:3.20
RUN apk add --no-cache curl
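If you are unsure which distribution a base image is built on, its /etc/os-release file says so. The hypothetical helper pkg_manager_for maps the ID field of a saved os-release file to the package manager that distribution normally uses; the table covers only a few common cases and is a starting point, not a complete list.

```shell
# Hypothetical helper: map an os-release file to the usual package manager.
# Covers a few common distributions only; older CentOS/RHEL releases use yum.
pkg_manager_for() (
  # os-release is made of shell-style KEY=value lines, so it can be sourced.
  # The function body runs in a subshell, so its variables do not leak out.
  case $1 in
    */*) file=$1 ;;
    *) file=./$1 ;;   # "." searches PATH for names without a slash
  esac
  unset ID ID_LIKE
  . "$file"
  case "${ID:-}" in
    alpine) echo apk ;;
    debian|ubuntu) echo apt-get ;;
    fedora|rhel|centos|rocky|almalinux) echo dnf ;;
    *) echo "unknown: ID=${ID:-unset} ID_LIKE=${ID_LIKE:-unset}" ;;
  esac
)

# Usage:
#   docker run --rm alpine:3.20 cat /etc/os-release > os-release.txt
#   pkg_manager_for os-release.txt    # prints apk
```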
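For Debian or Ubuntu images, keep update and install in the same RUN instruction, so a cached apt-get update layer cannot leave the install step with a stale package index: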
RUN apt-get update && apt-get install -y --no-install-recommends curl ca-certificates && rm -rf /var/lib/apt/lists/*
If apt-get install says it cannot locate a package, confirm the package name for that distribution version. Package names differ between Debian, Ubuntu, Alpine, Fedora, and language-specific images. Minimal images may also lack tools you assume are present, such as bash, curl, git, tar, or ca-certificates.
If HTTPS downloads fail with certificate errors, install CA certificates before using curl, wget, git over HTTPS, npm, pip, or other language package managers:
RUN apt-get update && apt-get install -y --no-install-recommends ca-certificates && rm -rf /var/lib/apt/lists/*
Cache surprises
Docker's cache is usually helpful, but it can make a broken build look inconsistent. If you suspect stale cache, run:
docker build --no-cache --progress=plain -t my-app:debug .
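If the build fails only without cache, you may have been relying on old layers that can no longer be reproduced. If it fails only with cache, suspect a stale layer: Docker reuses a RUN layer as long as its instruction text and the layers before it are unchanged, so it does not notice when a remote package index or download has changed, or when the step depends on a generated file or lock file that is not copied in before it.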
For dependency installs, copy lock files before the full source tree:
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
For Python:
WORKDIR /app
COPY requirements.txt ./
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
This pattern is not only faster. It makes build failures easier to understand because dependency installation depends on dependency files, not on every source change.
Network and registry failures
A build may fail before your Dockerfile runs if Docker cannot pull the base image:
docker pull python:3.12-slim
If that fails, fix registry access first. Check authentication for private registries, corporate proxy settings, DNS, and firewall rules. Behind a proxy, the Docker daemon needs proxy configuration; setting HTTP_PROXY only in your interactive shell may not be enough.
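On Linux hosts where systemd runs the Docker daemon, Docker's documentation describes giving the daemon its proxy through a systemd drop-in file. The hypothetical helper proxy_dropin prints such a file; the proxy URL and the NO_PROXY default are placeholders for your own values. Newer Docker Engine releases can also read proxy settings from daemon.json, and RUN steps that need the proxy can receive it through the predefined HTTP_PROXY and HTTPS_PROXY build arguments.

```shell
# Hypothetical helper: print a systemd drop-in that gives the Docker daemon
# proxy settings. The proxy URL and NO_PROXY list are placeholders.
proxy_dropin() (
  proxy=$1
  no_proxy=${2:-localhost,127.0.0.1}
  printf '%s\n' '[Service]'
  printf 'Environment="HTTP_PROXY=%s"\n' "$proxy"
  printf 'Environment="HTTPS_PROXY=%s"\n' "$proxy"
  printf 'Environment="NO_PROXY=%s"\n' "$no_proxy"
)

# Usage on a systemd-based Linux host:
#   sudo mkdir -p /etc/systemd/system/docker.service.d
#   proxy_dropin http://proxy.example.com:3128 | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf
#   sudo systemctl daemon-reload && sudo systemctl restart docker
# For RUN steps that download through the proxy:
#   docker build --build-arg HTTPS_PROXY=http://proxy.example.com:3128 -t my-app .
```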
For downloads inside RUN steps, test the URL from the host and then from a temporary container on the same network path:
curl -I https://example.com/file.tar.gz
docker run --rm curlimages/curl -I https://example.com/file.tar.gz
Do not depend on unpinned remote scripts if you can avoid it. A Dockerfile that curls install.sh from a moving branch can break because the remote script changed. Prefer versioned downloads and checksum verification for binaries:
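RUN curl -fsSLo tool.tar.gz https://example.com/tool-1.2.3-linux-amd64.tar.gz && echo '<sha256>  tool.tar.gz' | sha256sum -c - && tar -xzf tool.tar.gz -C /usr/local/bin && rm tool.tar.gz
Replace <sha256> with the real checksum from the project release page. Keep two spaces between the checksum and the file name; that is the standard format sha256sum -c expects, and not every implementation accepts a single space.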
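Before pasting the checksum into the Dockerfile, it can help to confirm it on the host against the file you actually downloaded. The hypothetical helper verify_sha256 compares a file's SHA-256 with an expected value; it uses sha256sum when available and falls back to shasum -a 256, which macOS ships.

```shell
# Hypothetical helper: check a downloaded file against an expected SHA-256.
# Uses sha256sum (GNU coreutils, BusyBox) when present, else shasum -a 256 (macOS).
verify_sha256() (
  expected=$(printf '%s' "$1" | tr 'A-F' 'a-f')   # in case the published value is uppercase
  file=$2
  if command -v sha256sum >/dev/null 2>&1; then
    actual=$(sha256sum "$file" | cut -d ' ' -f 1)
  else
    actual=$(shasum -a 256 "$file" | cut -d ' ' -f 1)
  fi
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (expected $expected, got $actual)" >&2
    exit 1
  fi
)

# Usage: verify_sha256 '<sha256>' tool-1.2.3-linux-amd64.tar.gz
```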
Architecture mismatches
On Apple Silicon or mixed CI fleets, build failures can come from architecture. An image or downloaded binary may be amd64 while the builder is arm64, or the other way around. Symptoms include exec format error, missing packages for an architecture, or binaries that fail during build.
Check your host and target:
docker version
docker buildx ls
Build for a specific platform when needed:
docker buildx build --platform linux/amd64 -t my-app:amd64 .
Be careful: cross-platform builds can be slower when emulation is involved. For CI, native builders for each platform are often faster and less surprising.
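To line up a machine's architecture with the names Docker uses in --platform, a small lookup is enough. The hypothetical helper docker_platform below covers only common uname -m values. Inside a Dockerfile built with BuildKit, declaring ARG TARGETARCH gives the target architecture (for example amd64 or arm64) directly, which is usually the better way to choose an architecture-specific download.

```shell
# Hypothetical helper: translate a uname -m value into a Docker platform string.
# Only common machine types are covered.
docker_platform() (
  arch=${1:-$(uname -m)}
  case "$arch" in
    x86_64|amd64) echo linux/amd64 ;;
    aarch64|arm64) echo linux/arm64 ;;
    armv7l) echo linux/arm/v7 ;;
    *) echo "unknown machine type: $arch" >&2; exit 1 ;;
  esac
)

# Usage: build natively for the current machine.
#   docker buildx build --platform "$(docker_platform)" -t my-app:native .
```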
Permission errors during build
Permissions fail in builds when files are copied with unexpected ownership, scripts are not executable, or the Dockerfile switches to a non-root user before setup is complete.
If a script fails with permission denied, check its mode on the host before changing the Dockerfile:
ls -l scripts/start.sh
Then fix it either in git, by committing the executable bit, or in the image:
COPY scripts/start.sh /usr/local/bin/start.sh
RUN chmod +x /usr/local/bin/start.sh
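To find every script that is missing its executable bit (which often happens when scripts are created on Windows), a find query is enough. The helper name non_executable_scripts and the *.sh filter are assumptions; adjust them to your layout. For the git fix, git update-index --chmod=+x records the bit even where core.fileMode is disabled, and with BuildKit, COPY --chmod=755 can set the mode during the copy instead of a separate RUN chmod.

```shell
# Hypothetical helper: list *.sh files under a directory that lack the owner
# execute bit. "-perm -100" is the octal form of "owner may execute".
non_executable_scripts() (
  find "${1:-.}" -type f -name '*.sh' ! -perm -100
)

# Usage:
#   non_executable_scripts scripts
#   chmod +x scripts/start.sh
#   git update-index --chmod=+x scripts/start.sh   # record the bit in git
```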
If you use a non-root runtime user, create directories and set ownership before switching users:
RUN useradd -r -u 10001 appuser && mkdir -p /app/data && chown -R appuser:appuser /app
USER appuser
COPY --chown=appuser:appuser . . is often cleaner than copying as root and running a broad recursive chown later. That later chown also copies every file it touches into a new layer, which makes the image larger.
Disk space and build cache cleanup
Large builds can fail because the Docker host runs out of disk space. Check Docker's usage:
docker system df
Remove unused build cache when appropriate:
docker builder prune
Be more careful with broad cleanup commands. docker system prune -a removes stopped containers, networks not used by any container, unused build cache, and every image not used by a container, which can force large re-pulls or break workflows that rely on local images. Use it only when you understand the impact.
If builds regularly fill the disk, the better fix is usually smaller build contexts, multi-stage builds, and avoiding huge temporary files in layers. Clean temporary artifacts in the same RUN instruction that creates them.
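To find what is making the build context large, plain disk usage is a good first look. The hypothetical helper largest_in_context below lists the biggest files and directories under a path. It does not apply .dockerignore rules, so treat its output as candidates to ignore rather than as what Docker actually sends; the first line is always the directory total.

```shell
# Hypothetical helper: show the largest entries (in KiB) under a directory.
# Raw disk usage only; .dockerignore rules are NOT applied.
largest_in_context() (
  dir=${1:-.}
  count=${2:-10}
  du -a -k "$dir" | sort -rn | head -n "$count"
)

# Usage: largest_in_context . 20
```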
Debug a failing RUN step interactively
When a long RUN line fails, split it temporarily:
RUN apt-get update
RUN apt-get install -y --no-install-recommends packageA packageB
RUN some-command-that-fails
Once you find the failing command, you can combine related commands again for a cleaner image.
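Another option that keeps the RUN line intact is to start it with set -eux, as in RUN set -eux; apt-get update; apt-get install -y --no-install-recommends packageA packageB; some-command-that-fails. Here -e stops at the first failing command, -u treats unset variables as errors, and -x prints each command with a + prefix before running it, so the log shows which command failed. The hypothetical run_traced function below reproduces that behavior locally, since a shell-form RUN uses /bin/sh -c by default on Linux images.

```shell
# Hypothetical helper: run a command string the way a shell-form RUN line does
# (/bin/sh -c), with set -eux so each command is echoed before it runs and the
# first failure stops the script.
run_traced() {
  /bin/sh -c "set -eux; $1"
}

# Usage: the trace ends at "+ false", the command that failed.
#   run_traced 'echo step-one; false; echo never-runs'
```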
Another useful trick is to stop at a known good stage. If your Dockerfile has named stages, build only up to one of them with --target:
docker build --target builder -t my-app-builder .
docker run --rm -it my-app-builder sh
From there you can inspect files, run the failing command by hand, check environment variables, and see what the filesystem actually contains.
For images without a shell, add a temporary debug stage rather than polluting the production image:
FROM builder AS debug
RUN apt-get update && apt-get install -y --no-install-recommends bash curl
Build with --target debug, investigate, then remove the debug stage or leave it out of production builds when you are done.
Keep builds predictable
A reliable Docker build is boring in the best way. Use versioned base images. Keep dependency lock files in source control. Avoid latest in production Dockerfiles unless your process intentionally rebuilds and tests against moving tags. Keep .dockerignore tight. Make network downloads versioned and verified. Put frequently changing source code after dependency installation.
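To catch moving tags before they cause a surprise, a rough check can list FROM lines that use latest or no tag at all. The awk sketch below is a hypothetical helper named unpinned_bases, not a full Dockerfile parser: it skips flags such as --platform, earlier stage names, scratch, and FROM lines that use ARG variables, and it treats a digest (@sha256:...) or any explicit non-latest tag as pinned.

```shell
# Hypothetical helper: list FROM lines whose base image uses "latest" or no tag.
# A rough check, not a Dockerfile parser. It skips --platform style flags,
# earlier stage names, scratch, and images given through ARG variables.
unpinned_bases() {
  awk '
    toupper($1) == "FROM" {
      i = 2
      while ($i ~ /^--/) i++             # skip flags such as --platform=linux/amd64
      img = $i
      key = tolower(img)
      if (!(key in stages) && key != "scratch" && img !~ /\$/) {
        ref = img
        sub(/^.*\//, "", ref)            # drop registry and namespace so host:port is not read as a tag
        if (ref !~ /@/ && (ref !~ /:/ || ref ~ /:latest$/))
          print FILENAME ":" FNR ": " img
      }
      if (toupper($(i + 1)) == "AS") stages[tolower($(i + 2))] = 1
    }
  ' "$@"
}

# Usage: unpinned_bases Dockerfile docker/Dockerfile
```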
When a build fails, do not rewrite the whole Dockerfile at once. Identify the failing instruction, reproduce with plain logs, isolate the command, and fix the smallest real cause. That approach is faster, and it leaves you with a Dockerfile the next person can still understand.