Optimize Docker Images with Multi-Stage Builds: A Comprehensive Guide
Master Docker multi-stage builds to dramatically shrink your image sizes, accelerate deployments, and enhance security. This comprehensive guide provides step-by-step instructions, practical examples for Go and Node.js, and essential best practices. Learn how to optimize your Dockerfiles by separating build dependencies, ensuring only necessary components reach your final runtime image. Essential reading for anyone looking to build efficient and secure containerized applications.
Multi-stage builds solve a very ordinary Docker problem: the tools you need to build an application are usually not the tools you need to run it.
A Go compiler, Node package cache, Maven repository, test framework, and build headers are useful during the image build. They are dead weight in the runtime image. They make pulls slower, increase the amount of software you have to patch, and make it harder to understand what is actually running in production.
With a multi-stage Dockerfile, you build in one stage and copy only the finished artifact into a smaller runtime stage. The final image does not inherit the build stage unless you explicitly copy files from it.
The Problem with Single-Stage Images
Consider a typical Go application. You need the Go toolchain to compile it. Once you have a Linux binary, the compiler is no longer needed. A single-stage image keeps it anyway:
FROM golang:1.21-alpine
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN go build -o myapp
EXPOSE 8080
CMD ["./myapp"]
This runs, but the final image still contains the Go toolchain and build cache. The same pattern shows up with Node, Java, Rust, Python packages with native extensions, and frontend builds.
The cost is practical:
- Larger images: more data to pull, push, and store.
- Slower deployments: larger images take longer to transfer.
- More security exposure: unnecessary software enlarges the attack surface and the patching burden.
- A murkier runtime environment: it is harder to tell what the application actually needs.
Smaller images are not automatically faster at runtime, but they are faster to move through CI, registries, and deployment systems. They also make security review less noisy.
What Multi-Stage Builds Do
Each FROM instruction starts a new stage. You can name a stage and copy files from it later:
FROM golang:1.21-alpine AS builder
# build files here
FROM alpine:3.20
COPY --from=builder /app/myapp /app/myapp
The second stage starts fresh. It does not contain /usr/local/go, source files, package caches, or build tools from the first stage unless you copy them.
A Clean Go Example
Here is a small application:
package main
import (
"fmt"
"log"
"net/http"
)
func handler(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "Hello from optimized Docker image!")
}
func main() {
http.HandleFunc("/", handler)
log.Println("Server starting on :8080...")
log.Fatal(http.ListenAndServe(":8080", nil))
}
The multi-stage Dockerfile:
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -ldflags='-w -s' -o myapp
FROM alpine:3.20
WORKDIR /app
COPY --from=builder /app/myapp /app/myapp
EXPOSE 8080
CMD ["/app/myapp"]
The go.mod and go.sum files are copied before the full source tree so Docker can reuse the dependency download layer when only application code changes. CGO_ENABLED=0 is useful when you want a static binary. If your application depends on C libraries, you may need a runtime image that includes those libraries instead of forcing static builds.
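For the C-library case, one possible shape is to build with cgo on a glibc-based builder and run on the matching slim Debian image. This is a sketch under assumptions: the libsqlite3 packages are only a stand-in for whatever library your application links against, and the Debian-based tags are chosen here so the builder and runtime share the same glibc.

```dockerfile
# Sketch only: the libsqlite3 packages stand in for whatever C library
# your application actually links against.
FROM golang:1.21-bookworm AS builder
WORKDIR /app
# Development headers are needed at build time only.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libsqlite3-dev \
    && rm -rf /var/lib/apt/lists/*
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=1 GOOS=linux go build -o myapp

# Same Debian release as the builder, so the glibc versions match.
FROM debian:bookworm-slim
# Only the shared library itself is installed here, not the headers.
RUN apt-get update \
    && apt-get install -y --no-install-recommends libsqlite3-0 ca-certificates \
    && rm -rf /var/lib/apt/lists/*
COPY --from=builder /app/myapp /app/myapp
EXPOSE 8080
CMD ["/app/myapp"]
```

The runtime stage is larger than Alpine, but the compiler, headers, and Go toolchain still stay behind in the builder.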
Build the image, then inspect its size and layers:
docker build -t go-app:multi-stage .
docker images go-app:multi-stage
docker history go-app:multi-stage
Do not rely on a blog's example size. Check your own image. Dependency choices, base image versions, debug symbols, certificates, timezone data, and native libraries all affect the result.
Runtime Base Image Choices
alpine is popular because it is small, but small is not always the same as compatible. Alpine uses musl libc, while many common Linux distributions use glibc. Statically linked Go binaries generally run fine. Software that ships prebuilt glibc binaries, such as some Python wheels, Node native addons, and JVM native libraries, may fail to load or have to be compiled from source.
Common runtime options:
| Runtime base | Good fit | Tradeoff |
|---|---|---|
| alpine | Small images, simple binaries | musl compatibility differences |
| debian:bookworm-slim | Broad Linux compatibility | Larger than Alpine |
| Distroless images | Production runtimes with fewer tools | Harder to debug inside the container |
| scratch | Static binaries only | No shell, CA certs, or package manager unless copied |
If the app calls HTTPS endpoints, make sure the final image includes CA certificates. A scratch image without certs can fail in a way that looks like a network problem.
# Assumes the builder stage from the Go example above, built with CGO_ENABLED=0.
FROM alpine:3.20 AS certs
RUN apk add --no-cache ca-certificates
FROM scratch
COPY --from=builder /app/myapp /myapp
COPY --from=certs /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/
CMD ["/myapp"]
Multi-Stage Builds for Other Languages/Frameworks
The same idea works anywhere there is a build step.
For a Node frontend:
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM nginx:1.27-alpine
COPY --from=builder /app/dist /usr/share/nginx/html
For a Node API, do not copy node_modules from a development install if it includes dev dependencies:
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY --from=deps /app/node_modules ./node_modules
# Keep node_modules in .dockerignore, or a host copy will overwrite the one above.
COPY . .
CMD ["node", "server.js"]
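When the API also has a compile step (TypeScript, for example), the same rule can be kept by splitting the work into three stages. The sketch below is illustrative and rests on assumptions not stated above: that `npm run build` writes its output to `dist`, and that the entry point is `dist/server.js`.

```dockerfile
# Production dependencies only.
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci --omit=dev

# Full install, including dev dependencies, for the compile step.
FROM node:20-alpine AS build
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
# Assumed to emit compiled files into /app/dist.
RUN npm run build

# Runtime: production node_modules plus compiled output, no sources.
FROM node:20-alpine
WORKDIR /app
ENV NODE_ENV=production
COPY package.json ./
COPY --from=deps /app/node_modules ./node_modules
COPY --from=build /app/dist ./dist
CMD ["node", "dist/server.js"]
```

The dev dependencies exist only in the build stage, so the compiler and test tooling never reach the final image.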
For Java:
FROM maven:3.9-eclipse-temurin-21 AS builder
WORKDIR /src
COPY pom.xml .
RUN mvn -q -DskipTests dependency:go-offline
COPY src ./src
RUN mvn -q -DskipTests package
FROM eclipse-temurin:21-jre
WORKDIR /app
# Assumes the build produces target/app.jar (for example via the finalName setting in pom.xml).
COPY --from=builder /src/target/app.jar /app/app.jar
CMD ["java", "-jar", "/app/app.jar"]
Build Cache Matters Too
Multi-stage builds reduce final image size, but the Dockerfile order still controls cache behavior. Put stable dependency files before volatile source files. Use npm ci instead of npm install in reproducible builds. Pin base image versions instead of relying on latest in production.
With BuildKit, cache mounts can speed up package downloads without baking caches into the final image:
# syntax=docker/dockerfile:1.7
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN --mount=type=cache,target=/go/pkg/mod go mod download
COPY . .
RUN --mount=type=cache,target=/root/.cache/go-build \
CGO_ENABLED=0 GOOS=linux go build -o myapp
That cache is for the build machine, not the runtime image.
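The same approach should carry over to npm. A minimal sketch, assuming the build runs as root in the node image so that npm's default cache directory is /root/.npm:

```dockerfile
# syntax=docker/dockerfile:1.7
FROM node:20-alpine AS deps
WORKDIR /app
COPY package.json package-lock.json ./
# /root/.npm is npm's default cache directory when running as root.
# The mount persists between builds but is not written into any image layer.
RUN --mount=type=cache,target=/root/.npm npm ci --omit=dev
```

If the lockfile changes, `npm ci` still runs again, but packages already in the cache mount do not have to be downloaded a second time.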
What to Copy, and What Not to Copy
Copy the smallest complete runtime set. For a compiled service, that may be one binary plus config templates and CA certs. For a frontend, it may be a dist directory. For Java, it may be a jar plus a JRE.
Do not copy source code, package manager caches, test fixtures, local .env files, SSH keys, or build output you do not run. Use a .dockerignore file so these files do not enter the build context in the first place:
.git
node_modules
coverage
dist
*.log
.env
The .dockerignore file does not replace careful COPY instructions, but it prevents accidental context bloat and secret leaks.
Debugging Multi-Stage Builds
Name your stages. A named stage is easier to target:
docker build --target builder -t app-builder .
docker run --rm -it app-builder sh
This is useful when the build succeeds but the final image fails because a file was copied to the wrong path or a runtime library is missing.
You can also inspect files copied into the final image:
docker run --rm -it --entrypoint sh my-image
If the image has no shell, temporarily switch the final stage to a debug-friendly base while diagnosing, then put the production base back.
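One way to avoid editing the production stage while diagnosing is to keep a debug variant as its own named stage. The sketch below assumes the Go application from earlier; the stage names `debug` and `runtime` and the `go-app:debug` tag are illustrative choices.

```dockerfile
FROM golang:1.21-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o myapp

# Debug variant: the same binary on a base that has a shell.
# Build it with: docker build --target debug -t go-app:debug .
FROM alpine:3.20 AS debug
COPY --from=builder /app/myapp /myapp
CMD ["/myapp"]

# Production variant. Because it is the last stage, a plain
# `docker build` produces this image.
FROM scratch AS runtime
COPY --from=builder /app/myapp /myapp
CMD ["/myapp"]
```

After building the debug target, `docker run --rm -it go-app:debug sh` opens a shell next to the same binary, and the production stage stays untouched.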
A Practical Rule
Use one stage for each distinct job: dependencies, build, test, runtime. Keep the runtime stage boring. If someone opens the final stage of the Dockerfile, they should be able to answer one question quickly: what files does this container actually need to run?
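As a rough illustration of that split, here is a four-stage sketch for the Go service used earlier. The stage names are arbitrary, and running tests inside the image build is an assumption about your CI setup, not a requirement.

```dockerfile
# Dependencies: re-run only when go.mod or go.sum change.
FROM golang:1.21-alpine AS deps
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download

# Build: compile the binary from the full source tree.
FROM deps AS build
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o myapp

# Test: run in CI with `docker build --target test .`
FROM deps AS test
COPY . .
RUN go test ./...

# Runtime: only what the container needs to run.
FROM alpine:3.20 AS runtime
COPY --from=build /app/myapp /app/myapp
CMD ["/app/myapp"]
```

With BuildKit, a default build skips the test stage because the runtime stage does not depend on it, so CI has to target it explicitly. The runtime stage is three lines long, which answers the question above at a glance.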
That is the real value of multi-stage builds. Smaller images are nice. Clear runtime boundaries are better.