How to Diagnose and Resolve Nginx 502 Bad Gateway Errors

Nginx is a powerful and popular web server and reverse proxy, often used to serve static content, load balance traffic, and forward requests to various upstream application servers like PHP-FPM, Node.js, Python Gunicorn, or Apache Tomcat. When Nginx encounters an issue communicating with one of these upstream servers, it typically responds with a "502 Bad Gateway" error.

This article provides a comprehensive, step-by-step guide to understanding, diagnosing, and resolving Nginx 502 Bad Gateway errors. We'll explore the common causes, equip you with practical troubleshooting techniques using command-line tools, and offer actionable solutions to get your web services back online quickly. Whether you're a system administrator, developer, or managing your own server, this guide will help you effectively tackle one of the most common Nginx errors.

Understanding the Nginx 502 Bad Gateway Error

A 502 Bad Gateway error indicates that Nginx, acting as a reverse proxy, could not obtain a valid response from an upstream server. Either the connection attempt itself failed (for example, it was refused), or Nginx connected but received no response, an incomplete response, or a response it couldn't understand. In most cases the problem is not with Nginx itself, but with the service Nginx is trying to communicate with or with how Nginx is configured to reach it.

Common upstream servers include:

PHP-FPM: For PHP applications (e.g., WordPress, Laravel).
Gunicorn/uWSGI: For Python applications (e.g., Django, Flask).
Node.js: For JavaScript applications.
Apache Tomcat: For Java applications.
Other web servers: Such as Apache HTTP Server serving specific content.

The 502 error is a crucial indicator that your application's backend is not functioning correctly or is inaccessible to Nginx.

Step-by-Step Diagnosis

The key to resolving a 502 error is systematic diagnosis. Start with the most likely culprits and progressively investigate further.

1. Check Nginx Error Logs First

Your Nginx error logs are the primary source of information. They often contain specific details about why Nginx couldn't communicate with the upstream server.

Location: Typically found at /var/log/nginx/error.log.
Command: Use tail -f to monitor the logs in real-time while trying to reproduce the error.

tail -f /var/log/nginx/error.log

What to look for:
* connect() failed (111: Connection refused): Indicates the upstream server is not listening on the specified address/port or a firewall is blocking the connection.
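* upstream timed out: The upstream server took too long to respond. Nginx normally returns this to the client as a 504 Gateway Timeout rather than a 502, but it often appears alongside 502s when a backend is struggling.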
* upstream prematurely closed connection: The upstream server closed the connection before sending a complete response.
* no live upstreams while connecting to upstream: Every server in the upstream group is currently marked as unavailable, usually after repeated failed connection attempts.
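
When the log is busy, it can help to tally which of the messages above dominates. The following is a minimal sketch, not an Nginx feature: classify_upstream_error is a hypothetical helper name, the hint strings are paraphrases of the list above, and the log path in the usage comment assumes the default location.

```bash
# Map one Nginx error-log line to the likely cause from the list above.
# The function name and hint wording are illustrative assumptions.
classify_upstream_error() {
    case "$1" in
        *"Connection refused"*)          echo "upstream not listening, or blocked by a firewall" ;;
        *"upstream timed out"*)          echo "upstream too slow to respond" ;;
        *"upstream prematurely closed"*) echo "upstream closed the connection early" ;;
        *"no live upstreams"*)           echo "no upstream server currently available" ;;
        *)                               echo "unrecognized; read the full log line" ;;
    esac
}

# Tally the last 500 log lines by cause (log path is an assumption):
# tail -n 500 /var/log/nginx/error.log | while IFS= read -r line; do
#     classify_upstream_error "$line"
# done | sort | uniq -c | sort -rn
```

The most frequent category is usually the best place to start the checks in the following steps.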

2. Verify Upstream Server Status

Once you have clues from the Nginx error logs, check the status of your upstream application server.

For PHP-FPM:

systemctl status phpX.X-fpm  # Replace X.X with your PHP version, e.g., php7.4-fpm
sudo service phpX.X-fpm status

For Node.js/Python/Other Custom Apps:
Check if the process is running.

ps aux | grep node
ps aux | grep gunicorn

If using a process manager like PM2 (Node.js) or Supervisor (general), check its status.

pm2 status
sudo supervisorctl status

If the service is not running, try starting it and check its own logs for errors.

systemctl start phpX.X-fpm
# Or
sudo service phpX.X-fpm start
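
For systemd-managed services, the status check and the log check can be combined. This is a sketch under assumptions: check_unit is a hypothetical helper name, and it relies on systemctl and journalctl being available on the host.

```bash
# Report whether a systemd unit is active; if not, show its recent log entries.
# Returns 0 when running, 1 when not running, 2 when systemctl is unavailable.
check_unit() {
    local unit="$1"
    if ! command -v systemctl >/dev/null 2>&1; then
        echo "systemctl not available on this host" >&2
        return 2
    fi
    if systemctl is-active --quiet "$unit"; then
        echo "$unit is running"
    else
        echo "$unit is NOT running; recent log entries:"
        journalctl -u "$unit" -n 20 --no-pager
        return 1
    fi
}

# Example (replace X.X with your PHP version):
# check_unit phpX.X-fpm
```

The journal output often shows the reason for a failed start, such as a configuration syntax error or a port that is already in use.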

3. Check Network Connectivity to Upstream

Ensure Nginx can reach the upstream server on the configured port or socket path.

For TCP/IP connections (e.g., 127.0.0.1:8000):
Use telnet or nc (netcat) to test port connectivity from the Nginx server.

telnet 127.0.0.1 8000
nc -vz 127.0.0.1 8000

A successful connection should show Connected to 127.0.0.1. or succeeded!. If it shows Connection refused, nothing is listening on that address and port; if it hangs, a firewall is likely dropping the packets.

For Unix sockets (e.g., unix:/run/php/phpX.X-fpm.sock):
Verify the socket file exists and has correct permissions.

ls -l /run/php/phpX.X-fpm.sock

Nginx needs read/write permissions on this socket file. The Nginx user (e.g., www-data) must either own the socket or belong to the group that owns it (e.g., www-data or php-fpm). For PHP-FPM, the socket's owner, group, and mode are set with listen.owner, listen.group, and listen.mode in the pool configuration.
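
Both checks above can be scripted when telnet or nc is not installed. The sketch below is illustrative: check_tcp and check_socket are hypothetical helper names, check_tcp relies on bash's /dev/tcp feature and the coreutils timeout command, and the 3-second limit is an arbitrary choice.

```bash
# Succeeds if a TCP connection to host:port can be opened within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so neither telnet nor nc is required.
check_tcp() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$1" "$2" 2>/dev/null
}

# Succeeds if the path is a Unix socket the current user can read and write.
check_socket() {
    [ -S "$1" ] && [ -r "$1" ] && [ -w "$1" ]
}

# Examples (address, port, and socket path are placeholders):
# check_tcp 127.0.0.1 8000 && echo "port reachable" || echo "port NOT reachable"
# check_socket /run/php/phpX.X-fpm.sock && echo "socket usable" || echo "socket NOT usable"
```

Because permissions depend on who runs the check, run check_socket as the Nginx user (for example through sudo -u www-data) to see what Nginx itself would see.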

Common Causes and Solutions

Based on your diagnostic steps, here are the most frequent causes of 502 errors and how to resolve them.

1. Upstream Server Not Running or Crashed

Cause: The application Nginx is trying to proxy to (e.g., PHP-FPM, Gunicorn, Node.js app) is not running or has crashed.

Solution: Start or restart the upstream service.

# Example for PHP-FPM
systemctl start phpX.X-fpm
# If it's already running and you suspect a crash, restart it:
systemctl restart phpX.X-fpm

# For custom applications, use their specific start/restart commands

Tip: Ensure your upstream services are configured to start automatically on system boot. For systemd services, use systemctl enable phpX.X-fpm.

2. Upstream Server Overload / Resource Exhaustion

Cause: The upstream server is overwhelmed, running out of memory, CPU, or hitting process limits, causing it to stop responding or refuse new connections.

Symptoms: Nginx error logs might show connection refused or upstream timed out intermittently, especially under load. System monitoring tools (top, htop, free -h) show high resource usage.

Solutions:

* For PHP-FPM: Adjust PHP-FPM pool settings in its configuration file (e.g., /etc/php/X.X/fpm/pool.d/www.conf).
  - pm.max_children: The maximum number of children that can be alive at the same time.
  - pm.start_servers: The number of children created on startup.
  - pm.min_spare_servers, pm.max_spare_servers: Control how many idle children are kept.

; Example for dynamic process management
pm = dynamic
pm.max_children = 50
pm.start_servers = 10
pm.min_spare_servers = 5
pm.max_spare_servers = 20

* Increase memory_limit in php.ini if scripts are exhausting memory.
* For other applications: Increase the number of worker processes, threads, or allocate more memory if possible. Monitor your application's specific metrics.
* Nginx Timeouts: Increase Nginx's proxy_connect_timeout, proxy_send_timeout, and proxy_read_timeout directives in your Nginx configuration, but understand this merely delays the error if the backend is truly struggling.

http {
    ...
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    ...
}
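
For the pm.max_children setting discussed above, a common rule of thumb (an assumption here, not a PHP-FPM requirement) is to divide the memory you can dedicate to PHP-FPM by the average size of one worker. The sketch below shows that arithmetic; estimate_max_children is a hypothetical helper name and the process name in the comment is only an example.

```bash
# Rough sizing heuristic: memory reserved for PHP-FPM divided by the average
# worker size. Both arguments are in MB; integer division rounds down.
estimate_max_children() {
    echo $(( $1 / $2 ))
}

# Average resident memory (MB) of running PHP-FPM workers. The process name
# "php-fpm7.4" is an example; it varies by distribution and PHP version.
# ps -C php-fpm7.4 -o rss= | awk '{ sum += $1; n++ } END { if (n) printf "%d\n", sum / n / 1024 }'
```

For example, estimate_max_children 4096 64 prints 64: with 4096 MB set aside for PHP-FPM and workers averaging 64 MB, about 64 children fit. Treat the result as a starting point and leave headroom for the database, Nginx, and the operating system.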

3. Incorrect Upstream Configuration in Nginx

Cause: Nginx is configured to connect to the wrong IP address, port, or Unix socket path for the upstream server.

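Symptoms: Nginx error logs show connect() failed (111: Connection refused) immediately after a request when the address or port is wrong, or connect() to unix:... failed (2: No such file or directory) when the socket path does not exist.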

Solution: Carefully review your Nginx server block configuration (/etc/nginx/sites-available/your_site.conf).

For HTTP/HTTPS upstreams:

location /app {
    proxy_pass http://127.0.0.1:8000;  # Ensure IP and port are correct
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}

For PHP-FPM via Unix socket:

location ~ \.php$ {
    fastcgi_pass unix:/run/php/phpX.X-fpm.sock;  # Verify this path exactly matches PHP-FPM config
    fastcgi_index index.php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}

For PHP-FPM via TCP/IP:

location ~ \.php$ {
    fastcgi_pass 127.0.0.1:9000;  # Verify IP and port
    fastcgi_index index.php;
    include fastcgi_params;
    fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
}

After making changes, always test your Nginx configuration and reload/restart Nginx:

nginx -t
systemctl reload nginx # Run only if nginx -t succeeds; a reload is normally sufficient and a full restart is rarely needed
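
The test-then-reload sequence can be wrapped so that a failed syntax check never reaches the reload step. This is a minimal sketch: reload_if_valid is a hypothetical helper name, and the two commands are passed as optional arguments only so the function can be dry-run without touching a live server.

```bash
# Reload Nginx only when the configuration test passes.
# $1: test command (default "nginx -t"), $2: reload command (default "systemctl reload nginx").
reload_if_valid() {
    local test_cmd="${1:-nginx -t}"
    local reload_cmd="${2:-systemctl reload nginx}"
    # The variables are intentionally unquoted so that each command
    # string is split into a command name and its arguments.
    if $test_cmd; then
        $reload_cmd
    else
        echo "Configuration test failed; Nginx was not reloaded." >&2
        return 1
    fi
}

# Typical use, run as root:
# reload_if_valid
```

The one-line equivalent is nginx -t && systemctl reload nginx; the function form simply adds an explicit message when the test fails.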

4. PHP-FPM `request_terminate_timeout` Exceeded

Cause: A PHP script takes longer to execute than the request_terminate_timeout setting in PHP-FPM. Nginx waits for the response, but PHP-FPM terminates the script, causing Nginx to receive an incomplete response.

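Symptoms: Nginx error logs typically show upstream prematurely closed connection or recv() failed (104: Connection reset by peer) while reading the response header. If they show upstream timed out instead, Nginx's own fastcgi_read_timeout expired before PHP-FPM's limit did. PHP-FPM logs usually contain an execution timed out message for the script, followed by the child exiting on a signal, for example child XX exited on signal 15 (SIGTERM). An exit on signal 9 (SIGKILL) can also appear and may point to the kernel's out-of-memory killer instead.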

Solution:
* Increase request_terminate_timeout: In your PHP-FPM pool configuration (www.conf), find and adjust this directive. Setting it to 0 disables the timeout, but this is generally not recommended as long-running scripts can hang resources.

```ini
; Increase to 5 minutes (300 seconds)
request_terminate_timeout = 300
```

* Increase fastcgi_read_timeout in Nginx: This Nginx timeout should be equal to or greater than request_terminate_timeout.

location ~ \.php$ {
    ...
    fastcgi_read_timeout 300s;  # Must be >= PHP-FPM's request_terminate_timeout
    ...
}

Warning: While increasing timeouts can resolve the 502 error, it might mask underlying performance issues. The best long-term solution is to optimize the slow PHP script.

5. Firewall Issues

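Cause: A firewall (either on the Nginx server or the upstream server if they are separate) is blocking connections to the upstream port. This applies only to TCP/IP upstreams; problems reaching a Unix socket are permission issues, not firewall issues.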

Solution:
* Check firewall status:

```bash
sudo ufw status # For UFW (Ubuntu/Debian)
sudo firewall-cmd --list-all # For firewalld (CentOS/RHEL)
sudo iptables -L # For iptables
```

* Open necessary ports: Ensure the port Nginx uses to connect to the upstream (e.g., 9000 for PHP-FPM via TCP/IP) is open.

sudo ufw allow from 127.0.0.1 to any port 9000  # Allow localhost to connect to 9000
sudo firewall-cmd --permanent --add-port=9000/tcp  # For firewalld
sudo firewall-cmd --reload

* Temporarily disable the firewall for testing purposes only in a controlled environment, then re-enable and configure it properly.

6. SELinux or AppArmor Interference

Cause: Security enhancements like SELinux (on RHEL/CentOS) or AppArmor (on Ubuntu/Debian) might be preventing Nginx from accessing the upstream socket or making network connections, even if file permissions and firewalls are correctly configured.

Symptoms: Logs might show permission denied or similar messages, especially in /var/log/audit/audit.log (for SELinux).

Solution:
* Check audit.log:

```bash
sudo grep nginx /var/log/audit/audit.log
```

* Temporarily set SELinux to permissive mode: sudo setenforce 0. If the error resolves, SELinux is the culprit. You'll then need to generate and apply appropriate SELinux policies (e.g., audit2allow). Remember to set it back to enforcing (sudo setenforce 1).
* Check AppArmor status: sudo aa-status. If AppArmor is active, you may need to adjust the Nginx profile.
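
Before switching modes, it is worth confirming what SELinux is currently doing. The sketch below is illustrative: selinux_mode is a hypothetical helper name, and the commented commands mention the httpd_can_network_connect boolean, which is not covered above but is a commonly used, narrower alternative to permissive mode when Nginx is denied outbound TCP connections to an upstream.

```bash
# Print the current SELinux mode (Enforcing, Permissive, or Disabled),
# or "unavailable" on systems without the SELinux tools installed.
selinux_mode() {
    if command -v getenforce >/dev/null 2>&1; then
        getenforce
    else
        echo "unavailable"
    fi
}

# If the audit log shows Nginx being denied TCP connections to an upstream,
# inspect and, if appropriate, enable the relevant boolean:
# getsebool httpd_can_network_connect
# sudo setsebool -P httpd_can_network_connect 1
```

If selinux_mode prints Permissive, Disabled, or unavailable, SELinux is not blocking anything and the cause lies elsewhere.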

7. Large Request/Response Bodies (Proxy Buffering)

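Cause: Nginx's default proxy buffers might be too small for the upstream's response, most often for large response headers (for example, many or very long cookies), so Nginx rejects the response.

Symptoms: Nginx error logs typically show upstream sent too big header while reading response header from upstream. Messages such as upstream prematurely closed connection while reading response header from upstream or while reading response body from upstream can appear with large responses too, but they more often indicate that the backend crashed or hit a timeout.

Solution: Adjust Nginx proxy buffering directives in your http, server, or location block. For FastCGI upstreams such as PHP-FPM, use the corresponding fastcgi_buffer_size, fastcgi_buffers, and fastcgi_busy_buffers_size directives instead.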

http {
    ...
    proxy_buffer_size   128k; # Size of the buffer for the first part of the response
    proxy_buffers   4 256k; # Number and size of buffers for the rest of the response
    proxy_busy_buffers_size   256k; # Max size of busy buffers
    proxy_temp_file_write_size 256k; # Size for writing to temporary files if buffering overflows
    ...
}

Note: These settings consume more memory. Adjust them cautiously based on your server's resources and the typical size of your application's responses.
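
To put the memory note in numbers, the buffer settings can be turned into a rough per-connection upper bound. This is a back-of-the-envelope sketch, not an Nginx formula: proxy_buffer_kib is a hypothetical helper name, and it assumes the worst case is proxy_buffer_size plus all of the proxy_buffers being in use at once.

```bash
# Rough upper bound, in KiB, on response-buffer memory for one proxied connection:
# proxy_buffer_size + (number of proxy_buffers * size of each buffer).
# Arguments: proxy_buffer_size_kib, buffer_count, buffer_size_kib
proxy_buffer_kib() {
    echo $(( $1 + $2 * $3 ))
}
```

With the values shown above, proxy_buffer_kib 128 4 256 prints 1152, so 1,000 concurrent proxied responses could use roughly 1.1 GiB in the worst case. Buffers are allocated as needed, so typical usage is lower.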

General Troubleshooting Tips

Review all relevant logs: Besides Nginx error logs, also check Nginx access logs, upstream application logs (PHP-FPM, Gunicorn, Node.js app logs), and system logs (/var/log/syslog, dmesg).
Test Nginx Configuration: Before reloading, validate your Nginx configuration syntax: nginx -t.
Reload Nginx: After any configuration change, reload Nginx so it takes effect: systemctl reload nginx. A full systemctl restart nginx is rarely required and drops active connections.
Isolate the Problem: Try to bypass Nginx and access the upstream application directly. For example, if your Node.js app is on localhost:3000, use curl http://localhost:3000 from the server's command line. If this also fails, the problem is definitely with your application, not Nginx.
Check Disk Space: A full disk can prevent applications from writing temporary files or logs, leading to crashes or failures. Use df -h to check disk usage.
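
The isolation tip above can be made repeatable by comparing status codes from the backend and from Nginx. This is a sketch under assumptions: http_status is a hypothetical helper name, curl must be installed, and both URLs in the usage comment are placeholders.

```bash
# Print only the HTTP status code for a URL. curl reports 000 when no HTTP
# response was received at all (for example, connection refused).
# --noproxy '*' ignores proxy environment variables so the request goes
# straight to the target.
http_status() {
    curl -s -o /dev/null --noproxy '*' --max-time 5 -w '%{http_code}' "$1" || true
}

# Compare the backend directly with the same path through Nginx:
# echo "direct:    $(http_status http://localhost:3000/)"
# echo "via nginx: $(http_status http://localhost/)"
```

If the direct request returns 000 or a 5xx code, the application is at fault; if it returns 200 while the request through Nginx returns 502, focus on the Nginx upstream configuration, socket permissions, or security policy.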

Conclusion

Nginx 502 Bad Gateway errors are common but almost always point to an issue with the backend application Nginx is trying to connect to, not Nginx itself. By systematically checking your Nginx error logs, verifying upstream server status, confirming network connectivity, and then addressing common configuration or resource issues, you can effectively diagnose and resolve these problems.

Remember to approach troubleshooting methodically, starting with the most basic checks and progressively digging deeper. Always test your Nginx configuration after making changes and monitor your application's and server's health to prevent future occurrences. With these strategies, you'll be well-equipped to keep your services running smoothly.