Nginx Performance Tuning: Optimize Worker Processes and Connections

Nginx performance tuning often starts with two settings: worker processes and worker connections. These settings decide how many connections your server can hold open at the same time, so small mistakes can show up as slow pages, stalled downloads, or connection errors during traffic spikes.

The good news is that you do not need to guess wildly. You can tune Nginx by matching its worker model to your CPU, file limits, traffic pattern, and upstream application behavior.

How Nginx Workers Handle Traffic

Nginx uses a master process and one or more worker processes. The master process reads configuration, starts workers, and handles reloads. The workers do the actual request handling.

Each worker can handle many connections because Nginx uses an event-driven model. That is different from older web servers that often needed one process or thread per connection. One Nginx worker can keep thousands of idle keepalive connections open if the operating system allows it.

The two core directives are usually placed in /etc/nginx/nginx.conf:

worker_processes auto;

events {
    worker_connections 1024;
}

worker_processes auto; tells Nginx to create a worker for each available CPU core. For most modern Linux servers, this is the right starting point. It avoids hardcoding a value that becomes wrong when you resize a virtual machine.

worker_connections sets the maximum number of simultaneous connections each worker can open. The rough upper limit is:

worker_processes * worker_connections

If you have 4 workers and 4096 worker connections, the theoretical maximum is 16,384 connections. In real life, the usable number is lower because reverse proxy traffic can use both client-side and upstream connections.

For example, if Nginx proxies traffic to a Node.js app, one user request may consume one client connection plus one upstream connection. That means 16,384 open connections might support closer to 8,000 active proxied requests, depending on keepalive and request timing.
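
The arithmetic above can be sketched as a small shell function. The function name is hypothetical, and the figure of two connections per proxied request (one client, one upstream) is a rough assumption; keepalive reuse and request timing will shift the real number.

```shell
# Rough capacity estimate. Hypothetical helper, not part of Nginx.
# Args: worker count, worker_connections, connections used per request
# (defaults to 1; 2 is a rough assumption for proxied requests).
estimate_capacity() {
    local workers=$1 connections=$2 per_request=${3:-1}
    echo $(( workers * connections / per_request ))
}

estimate_capacity 4 4096      # theoretical maximum: 16384
estimate_capacity 4 4096 2    # rough proxied ceiling: 8192
```

Treat the result as a ceiling to stay well below, not a target to reach.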

Choosing Worker Process and Connection Values

Start with worker_processes auto unless you have a specific reason not to. Manually setting this value higher than the CPU count rarely helps. It can increase context switching and make performance worse under load.

Then tune worker_connections based on expected concurrency. A quiet internal tool may be fine at 1024. A public website behind a load balancer may need 4096, 8192, or more.

A practical baseline for many production servers looks like this:

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    multi_accept on;
}

worker_rlimit_nofile raises the file descriptor limit available to Nginx workers. This matters because every network socket uses a file descriptor. If the operating system limit stays low, increasing worker_connections alone will not help.
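
A quick way to sanity-check that relationship is to compare the descriptor limit with the connection count. The sketch below uses a hypothetical helper and assumes a rule of thumb of two descriptors per connection (a socket plus, for example, a file being served); that factor is an assumption, not an Nginx requirement.

```shell
# Hypothetical check: does the per-worker descriptor limit leave room for
# the configured connections? The factor of 2 is a rule-of-thumb assumption.
# Args: descriptor limit (worker_rlimit_nofile), worker_connections.
check_fd_budget() {
    local nofile=$1 connections=$2
    local needed=$(( connections * 2 ))
    if [ "$nofile" -ge "$needed" ]; then
        echo "ok: limit $nofile covers $needed descriptors"
    else
        echo "too low: limit $nofile, need about $needed"
    fi
}

check_fd_budget 65535 4096   # the baseline shown above
check_fd_budget 1024 4096    # a low soft limit such as 1024
```

With the baseline values the first call reports that the limit is sufficient, while a limit of 1024 falls short of 4096 connections.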

You should also check the service manager limit. On systemd systems, Nginx may need an override such as:

[Service]
LimitNOFILE=65535

After changing systemd limits, reload systemd and restart Nginx. For a broader command reference, see Nginx service control commands.

Be careful with multi_accept on. It allows a worker to accept as many new connections as possible after receiving a readiness notification. This can help during bursts, but it is not magic. If your upstream app is slow, accepting connections faster may only fill queues faster.

Operating System Limits That Affect Nginx

Nginx settings sit on top of Linux limits. If those limits are too small, Nginx will hit a ceiling even when its own configuration looks generous.

Check these areas when tuning:

Open file limit for the Nginx process
Kernel network backlog settings
Ephemeral port availability for heavy proxy traffic
Upstream keepalive behavior
Load balancer idle timeout values

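The open file limit is the most common blocker. The two usual log messages point at different limits: "worker_connections are not enough" means a worker reached its worker_connections cap, while "too many open files" means the process ran out of file descriptors. In the second case, look at both the Nginx setting (worker_rlimit_nofile) and the systemd limit.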

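Backlog settings matter when many clients connect at once. If the kernel accept queue fills, users may see connection timeouts even though CPU usage looks normal. Values such as net.core.somaxconn and net.ipv4.tcp_max_syn_backlog are often reviewed during high-traffic tuning. Nginx also requests its own backlog on each listening socket (511 by default on Linux), so a larger net.core.somaxconn only takes effect if the listen directive asks for a larger backlog as well.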
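
Before changing either kernel value, it helps to record what the system currently uses. The following read-only sketch assumes a Linux host where sysctl keys are exposed under /proc/sys; the helper name is hypothetical.

```shell
# Convert a sysctl key to its /proc/sys path (Linux only).
sysctl_path() {
    echo "/proc/sys/$(echo "$1" | tr . /)"
}

# Print the current values without changing anything.
for key in net.core.somaxconn net.ipv4.tcp_max_syn_backlog; do
    path=$(sysctl_path "$key")
    if [ -r "$path" ]; then
        echo "$key = $(cat "$path")"
    else
        echo "$key is not readable on this system"
    fi
done
```

Keeping the printed values in your notes gives you something to roll back to if a later change makes things worse.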

Do not copy large kernel values from random examples without testing. A small team running one API server does not need the same settings as a CDN edge node. Tune in steps, measure, and keep notes.

There is another detail that trips people up: the Nginx connection limit is not the only connection limit in the path. A cloud load balancer has idle timeouts. A container runtime may have network address translation limits. The backend app may have its own worker pool or database connection pool. If Nginx can accept 20,000 connections but the app can process only 200 concurrent requests, users will still wait.

This is why a connection tuning change should include a quick end-to-end check. Run a small load test from a host outside the server, watch Nginx active connections, and also watch the backend. If backend latency climbs sharply while Nginx stays calm, the proxy is doing its job and the next limit is behind it.

Tuning for Reverse Proxy Workloads

Many Nginx servers act as reverse proxies in front of application servers. In that role, upstream behavior matters as much as Nginx capacity.

Use upstream keepalive when Nginx repeatedly talks to the same backend pool:

upstream app_backend {
    server 127.0.0.1:3000;
    keepalive 32;
}

server {
    location / {
        proxy_http_version 1.1;
        proxy_set_header Connection "";
        proxy_pass http://app_backend;
    }
}

This reduces the cost of opening new backend connections. It is especially useful when your app receives many small requests, such as API calls from a dashboard.
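
One detail worth keeping in mind when sizing this: the keepalive value is a cache of idle upstream connections kept by each worker process, not a server-wide total. A minimal sketch of the arithmetic, using a hypothetical helper and assuming 4 workers:

```shell
# Upper bound on idle upstream connections kept open across all workers.
# Hypothetical helper; args: worker count, upstream "keepalive" value.
idle_upstream_max() {
    echo $(( $1 * $2 ))
}

idle_upstream_max 4 32   # prints 128
```

The directive also does not cap the total number of upstream connections; it only limits how many idle ones are kept for reuse, so the backend can still see more connections than this under load.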

Also check your timeout values. Very long proxy timeouts can keep worker connections occupied after clients disappear or applications stop responding. Very short timeouts can break legitimate slow requests. Match timeout values to the workload instead of using one default everywhere.

A practical scenario: your site is fast most of the day but slows during a newsletter send. CPU is only 35%, but Nginx logs show connection warnings. That points away from raw CPU and toward connection limits, file descriptors, or upstream queueing. Raising worker connections may help, but only if the app and OS can support the extra load.

Another common scenario is a dashboard app that makes many small API calls from each browser tab. Ten people may create hundreds of short requests. In that case, upstream keepalive often matters more than simply raising worker_connections, because repeated TCP setup to the backend becomes unnecessary overhead.

For a file download service, the story is different. A small number of users can hold connections open for a long time while downloading large files. You may need enough worker connections for long-lived transfers, but you should also check sendfile, disk throughput, network throughput, and client timeout behavior.

For WebSocket or long-polling apps, idle connections are normal. A high Waiting number is not automatically bad. The question is whether those idle connections leave enough capacity for new requests and whether memory use stays predictable.

Reading stub_status While Tuning

The stub_status module gives you a quick view of connection behavior:

Active connections: 291
server accepts handled requests
 1162447 1162447 4496426
Reading: 6 Writing: 17 Waiting: 268

Reading means Nginx is reading request headers. A sustained high number can point to slow clients, large headers, or an attack pattern. Writing means Nginx is sending responses. This can rise when clients are slow to receive data or when responses are large. Waiting means idle keepalive connections. That number can be high on healthy sites.

The accepts and handled counters should usually move together. If accepted connections climb but handled connections lag or errors appear, check worker limits and file descriptor limits. Also check whether the kernel is dropping connections before Nginx can handle them.

These counters are basic, but they are useful because they separate connection pressure from CPU pressure. If active connections are low and CPU is high, the problem is probably not worker_connections. If active connections are high and CPU is low, connection limits, keepalive behavior, upstream queueing, or slow clients become more likely.
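
The counters can be turned into a few derived numbers with a short awk filter. This is a sketch under stated assumptions: the function name is hypothetical, the input must be in the stub_status text format shown above, and the status URL in the comment is only an example of where such a page might be exposed.

```shell
# Summarize stub_status text read on standard input.
# Prints dropped connections (accepts minus handled), average requests per
# handled connection, and the share of active connections that are idle.
summarize_stub_status() {
    awk '
        /^Active connections:/ { active = $3 }
        /^ *[0-9]+ +[0-9]+ +[0-9]+ *$/ { accepts = $1; handled = $2; requests = $3 }
        /^Reading:/ { waiting = $6 }
        END {
            printf "dropped=%d\n", accepts - handled
            printf "requests_per_connection=%.2f\n", (handled ? requests / handled : 0)
            printf "waiting_share=%.0f%%\n", (active ? 100 * waiting / active : 0)
        }
    '
}

# Example with a hypothetical status URL:
#   curl -s http://127.0.0.1/nginx_status | summarize_stub_status
```

For the sample output above this reports dropped=0, requests_per_connection=3.87, and waiting_share=92%. A nonzero dropped value is the signal to check worker and descriptor limits.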

A Safe Baseline Configuration

For a small production server, I would rather start conservative and measure:

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    worker_connections 4096;
    multi_accept on;
}

http {
    keepalive_timeout 30s;
    keepalive_requests 1000;
}

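This is not a universal best configuration. It is a reasonable starting point for many normal reverse proxy workloads. A very small VM may need less. A busy edge proxy may need much more. The important part is that worker_connections and worker_rlimit_nofile are aligned: the descriptor limit should sit comfortably above worker_connections, because every connection holds a socket and a worker also needs descriptors for log files and any files it serves.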

After applying a baseline, compare before and after metrics during similar traffic. Do not judge a tuning change by one lucky minute after reload. Look at p95 or p99 latency, error rate, CPU, memory, and backend queueing over enough time to see real behavior.
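
For the latency comparison, a nearest-rank percentile is easy to compute from a column of request times. The sketch below is a hypothetical helper that reads one number per line; it assumes an integer percentile, and the usage comment assumes a custom log format whose last field is $request_time, which the default combined format does not include.

```shell
# Nearest-rank percentile of numbers read on standard input, one per line.
# Arg: integer percentile such as 95 or 99.
percentile() {
    LC_ALL=C sort -n | awk -v p="$1" '
        { v[NR] = $1 }
        END {
            if (NR == 0) exit 1
            rank = int((p * NR + 99) / 100)   # ceil(p * NR / 100) for integer p
            if (rank < 1) rank = 1
            print v[rank]
        }
    '
}

printf '%s\n' 0.12 0.08 0.95 0.10 | percentile 95   # prints 0.95

# Example, assuming the last log field is $request_time:
#   awk '{ print $NF }' /var/log/nginx/access.log | percentile 95
```

Run it over comparable time windows before and after the change so that the two numbers describe similar traffic.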

Mistakes That Make Connection Tuning Look Random

The first mistake is counting requests instead of connections. A browser can reuse one connection for multiple requests, and HTTP/2 can multiplex many requests over one connection. A slow client can also hold a connection open while doing very little useful work. That means "we only get 500 requests per second" does not tell you how many connections Nginx needs.

The second mistake is forgetting upstream connections. If Nginx serves static files, most connections are client-facing. If Nginx proxies to an app, active requests often need backend sockets too. If you use keepalive to the upstream, some backend connections stay open for reuse. This is good, but it still consumes file descriptors on both sides.

The third mistake is raising Nginx limits without checking the application. Suppose Nginx can now accept 12,000 simultaneous connections, but the application has 16 worker processes and a database pool of 50 connections. Nginx will accept more work than the app can finish. Users may see fewer immediate connection errors, but latency can get worse because requests wait longer in queues.
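
A back-of-envelope estimate shows how quickly that queueing adds up. The sketch below is illustrative only: the helper name is hypothetical, it assumes each app worker handles one request at a time, and the 50 ms service time is an assumed figure rather than a measurement.

```shell
# Rough time to drain a backlog of waiting requests.
# Args: app workers, average service time in ms, requests waiting.
drain_seconds() {
    awk -v w="$1" -v ms="$2" -v q="$3" \
        'BEGIN { printf "%.1f\n", q * ms / (w * 1000) }'
}

drain_seconds 16 50 12000   # prints 37.5
```

Under those assumptions, 16 workers finish about 320 requests per second, so the last of 12,000 waiting requests is served roughly 37.5 seconds after it arrived, which is longer than many client and load balancer timeouts.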

The fourth mistake is using long keepalive timeouts everywhere. Keepalive is useful because it avoids repeated TCP and TLS setup. But a very long timeout can leave many idle sockets open after a traffic spike. On a memory-rich edge proxy this may be fine. On a small VM it can crowd out active work. If you see a huge Waiting count and low request reuse, try a shorter keepalive_timeout and measure again.

Troubleshooting Examples

If the error log says worker_connections are not enough, check the configured value, the number of workers, and the process file limit:

grep -RE "worker_connections|worker_processes|worker_rlimit_nofile" /etc/nginx/nginx.conf /etc/nginx/conf.d
grep "open files" /proc/$(pgrep -o nginx)/limits

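The pgrep -o nginx command usually finds the oldest Nginx process, which is often the master. On some systems you may prefer systemctl status nginx to see the main PID. Keep in mind that worker_rlimit_nofile applies to worker processes, so the master's limit reflects what systemd or the shell granted; check a worker PID as well to see the limit that actually governs connections.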
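
To compare the limit across the master and its workers in one pass, the limits file can be filtered for each running process. This is a sketch assuming a Linux host with /proc and pgrep available; the helper name is hypothetical.

```shell
# Print the soft "Max open files" limit from a /proc/<pid>/limits listing
# read on standard input. The soft limit is the fourth field of that row.
soft_open_files() {
    awk '/^Max open files/ { print $4 }'
}

# List every running nginx process with its soft descriptor limit.
# Prints nothing if no nginx process is running.
for pid in $(pgrep nginx 2>/dev/null); do
    echo "$pid $(soft_open_files < "/proc/$pid/limits")"
done
```

If the workers show a lower number than you configured, the change has probably not been applied to the running processes yet.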

If the error log says too many open files, do not only raise worker_connections. The process is hitting its descriptor limit. Add or adjust LimitNOFILE for the systemd service, reload systemd, and restart Nginx so the new limit is actually applied:

sudo systemctl edit nginx
sudo systemctl daemon-reload
sudo systemctl restart nginx

If users see timeouts but Nginx has spare CPU and no connection warnings, look behind Nginx. Check upstream response time in access logs. Check the app worker pool. Check database connections. A reverse proxy can accept traffic smoothly while the real bottleneck is a saturated backend.

If a spike causes connection resets before requests appear in access logs, the problem may be earlier than Nginx request handling. Look at kernel backlog settings, load balancer logs, firewall state tables, and SYN flood protection. Nginx cannot log a request it never received.

How to Test Changes Safely

Never tune production by editing and hoping. Test the syntax first:

sudo nginx -t

Then reload Nginx so active connections are handled gracefully:

sudo systemctl reload nginx

Watch the error log after every change:

sudo tail -f /var/log/nginx/error.log
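
The test and reload steps above can be chained so that a failed syntax check never reaches the reload. The wrapper below is a hypothetical sketch, roughly equivalent to running sudo nginx -t && sudo systemctl reload nginx; the commands are parameters only so that the function can be tried with stand-ins.

```shell
# Run the config test and reload only if it passes.
# Args (optional): test command, reload command. The defaults are the
# commands shown above.
safe_reload() {
    local test_cmd="${1:-sudo nginx -t}"
    local reload_cmd="${2:-sudo systemctl reload nginx}"
    if $test_cmd; then
        $reload_cmd
    else
        echo "config test failed; not reloading" >&2
        return 1
    fi
}

# Dry run with stand-in commands; call safe_reload with no arguments
# on a real host.
safe_reload true "echo reload skipped in dry run"
```

Because the function returns a nonzero status when the test fails, it can also be used inside deployment scripts that stop on errors.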

You should also monitor request latency, 4xx and 5xx rates, active connections, CPU, memory, and upstream response time. A tuning change that increases connection capacity but raises application latency may not be a real win.

For deeper validation steps, see testing Nginx configurations.

When to Bring in a Specialist

Call an experienced DevOps engineer or web performance specialist when Nginx errors continue after basic tuning, when traffic spikes affect revenue, or when you are changing kernel network settings on a production system. The same applies if you are tuning Nginx in front of payment flows, login systems, or critical APIs.

Professional help is also useful when the bottleneck is unclear. Nginx may look like the problem when the real issue is a slow database, exhausted upstream app pool, overloaded TLS termination layer, or load balancer timeout mismatch.

The key takeaway is simple: tune worker processes to match CPU, tune worker connections to match concurrency, and make sure Linux file limits support both. Change one layer at a time, test the config before reloading, and measure real traffic instead of trusting theoretical maximums.