Troubleshooting Common Jenkins Agent Connectivity Problems and Solutions
Encounter 'offline' or 'connection refused' issues with your Jenkins agents? This comprehensive guide provides step-by-step solutions for common connectivity problems. Learn to troubleshoot network, firewall, JNLP, SSH, and agent configuration issues, ensuring your Jenkins build executors are always available and running efficiently. Includes practical tips and log analysis for faster resolution.
Jenkins agents, also called nodes, are where most build work actually runs. When one goes offline, the symptom is obvious: jobs sit in the queue, labels cannot be satisfied, and teams start rerunning builds that were never going to start. The useful work is figuring out which layer broke: network reachability, SSH, inbound remoting, Java, credentials, disk, or the controller itself.
Understanding why an agent might become unreachable is the first step to effective troubleshooting. These problems can stem from network misconfigurations, incorrect agent setup, firewall restrictions, or issues with the Jenkins controller itself. By systematically checking these areas, you can quickly identify the root cause and implement a solution.
Common Causes of Jenkins Agent Disconnection
Several factors can lead to an agent becoming offline. Identifying the specific symptom is key to narrowing down the potential causes:
- Agent unreachable: The Jenkins controller cannot establish a connection to the agent.
- Connection refused: The agent machine actively rejects the connection attempt from the controller.
- Agent reports offline after successful connection: The agent was connected but has since dropped its connection.
- JSch errors (for SSH-based agents): Specific errors related to the Java Secure Channel library used for SSH connections.
Network and Firewall Issues
Network connectivity is the most frequent culprit for agent connection problems. Ensuring that the Jenkins controller can reach the agent machine and vice-versa is paramount.
Verifying Network Reachability
Before diving into Jenkins-specific configurations, confirm basic network connectivity:
- Ping the agent: From the Jenkins controller machine, try pinging the IP address or hostname of the agent machine. A failed ping is only a hint, because many networks block ICMP.
ping <agent-hostname-or-ip>
- Telnet to the listening port: Test whether the port used for the connection is open and listening. For SSH agents, this is the SSH port on the agent (default 22), tested from the controller. For JNLP (inbound) agents the direction is reversed: the agent connects to the controller's inbound agent port (often fixed at 50000), so run the test from the agent against the controller.
telnet <target-hostname-or-ip> <port>
If the connection times out or is refused, there's likely a network or firewall issue blocking the port.
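Where telnet is not installed, the same port test can be scripted. The following is a minimal sketch, assuming bash and the coreutils timeout command are available; probe_port is a hypothetical helper name and the hostnames in the usage comments are placeholders.

```shell
# probe_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection can be opened, non-zero otherwise.
probe_port() {
    local host="$1" port="$2" wait="${3:-5}"
    # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT.
    # timeout bounds the attempt so silently dropped packets do not hang the check.
    timeout "$wait" bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}

# Example usage (placeholder hostnames):
# probe_port agent01.example.com 22 && echo "SSH port reachable"
# probe_port jenkins.example.com 50000 || echo "inbound port closed or filtered"
```

A fast failure usually means the port is closed, while a failure only after the full timeout suggests a firewall is dropping packets.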
Firewall Configuration
Firewalls on either the Jenkins controller, the agent machine, or intermediate network devices can block the necessary ports.
- Jenkins Controller Firewall: For SSH agents, ensure the controller can initiate outbound connections to the agent's SSH port. For JNLP (inbound) agents, ensure the controller accepts incoming connections on its inbound agent port from the agent machines.
- Agent Machine Firewall: For SSH agents, ensure the agent machine's firewall (e.g., ufw, firewalld, Windows Firewall) allows incoming connections on the SSH port from the Jenkins controller's IP address.
- Network Firewalls: If your network has internal firewalls, verify that traffic is permitted between the controller and agent.
Example: Allowing Port 50000 on the Controller for JNLP Agents (Linux with ufw)
# JNLP (inbound) agents connect to the controller, so the rule belongs on the controller host
# Allow connections from a specific IP (the agent machine)
sudo ufw allow from <agent-ip> to any port 50000 proto tcp
# Or allow from any IP (less secure)
sudo ufw allow 50000/tcp
# Reload firewall rules
sudo ufw reload
Example: Allowing Port 22 on an Agent (Linux with firewalld)
# Allow SSH service permanently from a specific source IP
sudo firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="<jenkins-controller-ip>" port protocol="tcp" port="22" accept'
# Reload firewall rules
sudo firewall-cmd --reload
Tip: Always prioritize allowing connections from specific IP addresses for better security.
Jenkins Agent Configuration Issues
Misconfigurations within Jenkins or on the agent itself are common sources of connectivity problems.
JNLP Agent Configuration
Java Network Launch Protocol (JNLP) agents, called inbound agents in current Jenkins releases, connect from the agent machine to a dedicated TCP port on the Jenkins controller. The primary configuration involves the agent's launch method and the controller's inbound agent port.
Agent is Offline in Jenkins UI
If an agent appears offline in the Jenkins UI, it means the controller could not establish or maintain a connection.
- Check Agent Launch Method: Ensure the agent is configured to launch correctly. Common methods include:
- Launch agent by connecting it to the controller (labeled "master" in older releases): This requires manual initiation from the agent side.
- Launch agents via SSH: Configured through SSH credentials and host settings.
- Launch agent via execution of command on the controller: Available when the Command Agent Launcher plugin is installed, for specific scenarios.
- Verify JNLP Port Availability: The Jenkins controller needs to listen on a fixed, known inbound agent port (commonly 50000). Navigate to Manage Jenkins -> Security -> Agents -> TCP port for inbound agents (under Configure Global Security in older releases) and ensure it is set to a fixed port rather than random or disabled, and that the port is reachable from the agent.
"Connection refused" when launching JNLP Agent
This often means the JNLP port (default 50000) on the Jenkins controller is not open or accessible from the agent machine. Verify firewall rules on the controller and ensure the port is correctly configured.
Tip: Restarting the Jenkins controller can sometimes resolve transient JNLP port issues.
SSH Agent Configuration
When using SSH to connect to agents, several factors can cause issues:
- Incorrect SSH Credentials: Verify the username, password, or private key configured in Jenkins for the SSH connection. Ensure the private key is correctly formatted (e.g., PEM format) and has the correct permissions.
- SSH Server Not Running on Agent: Ensure the SSH daemon (sshd) is running on the agent machine.
# On the agent machine
sudo systemctl status sshd
# or
sudo service ssh status
If not running, start it:
sudo systemctl start sshd
sudo systemctl enable sshd
- SSH Port Mismatch: Ensure the port configured in Jenkins for SSH matches the port the SSH server is listening on (default 22).
- Agent Hostname/IP Resolution: The Jenkins controller must be able to resolve the agent's hostname or IP address.
- SSH Key Permissions: On the agent machine, the ~/.ssh/authorized_keys file for the user Jenkins connects as must have the correct permissions (usually 600, with the ~/.ssh directory itself at 700).
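The permission check in the last item can be automated on the agent. This is a sketch under assumptions: check_ssh_perms is a hypothetical helper, it relies on the GNU coreutils form of stat (BSD and macOS use different flags), and it enforces the conventional 700/600 modes, which are stricter than what sshd strictly requires.

```shell
# check_ssh_perms HOME_DIR
# Reports whether HOME_DIR/.ssh and authorized_keys use the conventional modes.
check_ssh_perms() {
    local ssh_dir="$1/.ssh"
    local keys="$ssh_dir/authorized_keys"
    local dir_mode key_mode status=0

    [ -f "$keys" ] || { echo "missing: $keys"; return 1; }

    # stat -c '%a' prints the octal mode (GNU coreutils syntax).
    dir_mode=$(stat -c '%a' "$ssh_dir")
    key_mode=$(stat -c '%a' "$keys")

    [ "$dir_mode" = "700" ] || { echo "$ssh_dir is $dir_mode, expected 700"; status=1; }
    [ "$key_mode" = "600" ] || { echo "$keys is $key_mode, expected 600"; status=1; }
    return "$status"
}

# Example usage on the agent, for a hypothetical user named jenkins:
# check_ssh_perms /home/jenkins && echo "SSH permissions look fine"
```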
Example: Testing SSH Connection Manually
From the Jenkins controller machine, try to SSH into the agent using the same credentials and port configured in Jenkins:
ssh -p <ssh-port> <jenkins-user>@<agent-hostname-or-ip>
If this manual SSH command fails, the problem lies outside of Jenkins' SSH configuration, likely in network, firewall, or SSH server settings on the agent.
Agent Working Directory Permissions
Jenkins requires specific permissions to operate on the agent's file system. The user that Jenkins uses to connect to the agent (or the user running the agent process) needs write permissions to the agent's configured working directory.
- Verify owner and permissions: On the agent, check the ownership and permissions of the Jenkins home directory and its subdirectories.
ls -ld /path/to/jenkins/agent/home
ls -l /path/to/jenkins/agent/home
- Grant permissions (if necessary): Ensure the user Jenkins connects as has read and write access. Use chown and chmod cautiously.
Jenkins Controller Issues
Sometimes, the problem might not be with the agent but with the Jenkins controller itself.
Controller Overload
If the Jenkins controller is under heavy load (many jobs running, high CPU/memory usage), it might struggle to manage agent connections. Monitor the controller's resource utilization.
JNLP Port Conflicts
If the JNLP port (default 50000) is already in use by another process on the Jenkins controller, agents will fail to connect.
- Check port usage: On the controller machine, use netstat or ss to see which process is using the port.
sudo netstat -tulnp | grep 50000
# or
sudo ss -tulnp | grep 50000
If another process is using it, you'll need to reconfigure either Jenkins or the other application to use different ports.
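A plain grep for 50000 also matches unrelated ports and PIDs that happen to contain those digits. The sketch below filters on the local address column only. It assumes the column layout of ss -H -tlnp (state, two queue counters, local address, peer address, process); port_listeners is a hypothetical helper name.

```shell
# port_listeners PORT
# Reads `ss -H -tlnp` output on stdin and prints the local address and
# owning process for sockets listening on exactly PORT.
port_listeners() {
    awk -v port="$1" '$4 ~ (":" port "$") { print $4, $6 }'
}

# Example usage on the controller:
# sudo ss -H -tlnp | port_listeners 50000
```

If the process shown is not the Jenkins Java process, another service owns the port.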
Advanced Troubleshooting and Logs
When standard checks don't reveal the issue, deeper investigation is needed.
Jenkins Controller Logs
Review the Jenkins controller logs for errors related to agent connections. These logs can provide specific error messages.
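- Location: Typically found in $JENKINS_HOME/jenkins.log or accessible via Manage Jenkins -> System Log. Package-based Linux installations may instead write to /var/log/jenkins/jenkins.log or to the systemd journal (journalctl -u jenkins).
- Look for: Messages mentioning the agent's hostname, IP address, connection attempts, JSch exceptions, or Connection refused errors.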
Agent Logs
If the agent is running but reporting offline, check its logs for any errors.
- JNLP Agents: The agent process itself might output logs to its console or a designated log file.
- SSH Agents: Logs might be in $JENKINS_HOME/agent.log on the agent machine, or related to sshd if the connection fails at the SSH level.
Enable Debug Logging
For very persistent issues, temporarily enabling debug logging for relevant Jenkins components can provide more granular information.
- JNLP/Agent Communication: You might need to adjust Java system properties or use Jenkins' logging configuration (Manage Jenkins -> System Log -> Log Recorders) to increase verbosity for hudson.slaves or related packages.
Practical Habits That Prevent Repeat Outages
Troubleshooting Jenkins agent connectivity requires a systematic approach, starting with basic network checks and progressing to Jenkins-specific configurations.
- Verify Network: Always start with ping and telnet/nc to ensure basic network reachability and port access.
- Check Firewalls: Ensure firewalls on both the controller and agent, as well as any network firewalls, permit traffic on the required ports.
- Validate Credentials: Double-check SSH keys, usernames, and passwords.
- Confirm Agent Service: For SSH agents, ensure sshd is running and accessible.
- Monitor Jenkins Logs: Controller logs are your primary source for understanding connection failures.
- Use Specific IPs: Where possible, configure firewalls and Jenkins to use specific IP addresses rather than broad ranges or 0.0.0.0.
By following these steps, you can effectively diagnose and resolve most common Jenkins agent connectivity problems, keeping your CI/CD pipelines running smoothly.
Reading the Offline Message Without Guessing
The word "offline" is too broad to troubleshoot by itself. Before changing Jenkins settings, open the agent page and read the exact reason Jenkins gives. There is a big difference between "connection refused", "permission denied", "host key verification failed", "JNLP agent rejected", and "channel was closed". They all end with an offline node, but they point to different layers.
I usually write the symptom down in plain language: "controller cannot reach TCP port 22", "SSH login works but Java cannot start", "inbound agent starts but cannot call back to the controller", or "agent connects and then drops during builds." That one sentence keeps the investigation focused.
If the agent has never connected, suspect configuration, DNS, firewall, credentials, or launch command. If it connected for months and started failing today, check recent changes: rotated SSH keys, a Jenkins upgrade, a plugin update, a new firewall rule, an expired certificate, an agent image rebuild, or a cloud networking change. The timeline is often more useful than the error text.
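The habit of turning the offline message into a layer can be sketched as a small lookup. The mapping and the layer names below are illustrative assumptions, not an official Jenkins taxonomy, and classify_offline_reason is a hypothetical helper.

```shell
# classify_offline_reason "MESSAGE"
# Maps an offline message to the layer worth checking first.
classify_offline_reason() {
    local msg
    # Lower-case the message so the match is case-insensitive.
    msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$msg" in
        *"connection refused"*|*"timed out"*|*"no route to host"*)
            echo "network" ;;
        *"host key verification failed"*)
            echo "host-key" ;;
        *"permission denied"*)
            echo "credentials" ;;
        *"rejected"*)
            echo "inbound-handshake" ;;
        *channel*closed*)
            echo "channel-dropped" ;;
        *)
            echo "unknown" ;;
    esac
}

# Example usage:
# classify_offline_reason "java.net.ConnectException: Connection refused"
```

An "unknown" result is a cue to read the full launch log rather than guess.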
SSH Agents: Separate Login Problems from Launch Problems
For SSH-based agents, test the same path Jenkins uses. From the controller host, connect as the Jenkins-configured user:
ssh -vvv jenkins-agent-user@agent-hostname
The verbose output tells you whether the failure happens before authentication, during authentication, or after login. If SSH never reaches the server, Jenkins cannot fix that. Check routing, security groups, network ACLs, host firewalls, and the SSH daemon. If SSH reaches the server but rejects the key, check the credential in Jenkins, the user's authorized_keys, file permissions, and whether the account is locked.
If SSH login works manually but Jenkins still fails, look at the remote root directory and Java startup. Jenkins needs a writable directory for the remoting files, and the agent user needs permission to create files there. A common mistake is pointing the remote root to a path owned by root or cleaned by another process.
Run these checks on the agent:
whoami
pwd
java -version
test -w /path/to/jenkins-agent && echo writable
df -h /path/to/jenkins-agent
Java version matters because modern Jenkins controllers require compatible Java versions on agents. The exact requirement depends on your Jenkins release, so check the Jenkins documentation for your version instead of assuming an old agent image is still valid.
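To compare an agent's Java against whatever your Jenkins release requires, it helps to reduce the java -version output to a single major version number. This is a rough sketch: java_major is a hypothetical helper, and the minimum version in the usage comment is a placeholder to replace with the value from the Jenkins documentation.

```shell
# java_major
# Reads `java -version` output on stdin and prints the major version.
java_major() {
    awk -F'"' '/version/ {
        # The quoted token looks like 17.0.9, 21, or the legacy 1.8.0_392.
        split($2, v, /[._-]/)
        # In the legacy scheme the major version is the second component.
        print (v[1] == "1" ? v[2] : v[1])
        exit
    }'
}

# Example usage on the agent (java -version writes to stderr):
# required=17   # placeholder, check the docs for your Jenkins release
# [ "$(java -version 2>&1 | java_major)" -ge "$required" ] || echo "Java too old"
```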
Inbound Agents: The Callback Path Is the Usual Trap
Inbound agents are often used when the controller cannot initiate SSH to the agent, such as agents behind NAT or in restricted networks. The agent process starts outside Jenkins and connects back to the controller. That means the network path is reversed: the agent must resolve and reach the Jenkins URL.
On the agent host, test the Jenkins URL exactly as configured:
curl -I https://jenkins.example.com/
If Jenkins is behind a reverse proxy, confirm the public URL in Manage Jenkins > System is correct. A wrong Jenkins URL can make generated agent commands point to an internal hostname the agent cannot resolve. If WebSocket mode is enabled for inbound agents, make sure the proxy supports WebSocket upgrade headers. If you use the TCP inbound agent port instead, confirm the fixed port is configured and reachable from the agent network.
TLS issues can look like Jenkins issues. If the agent runs in a minimal container image, it may not have your internal CA certificate. curl will usually expose that quickly. Install the CA certificate into the agent image rather than disabling certificate verification.
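For the TCP inbound port, the controller can be asked which port it advertises. The sketch below assumes the controller exposes the tcpSlaveAgentListener endpoint and reports the port in an X-Jenkins-JNLP-Port response header; verify both against your Jenkins version. inbound_port_from_headers is a hypothetical helper and jenkins.example.com is the placeholder URL used above.

```shell
# inbound_port_from_headers
# Reads HTTP response headers on stdin and prints the advertised inbound
# agent port, or nothing if the header is absent.
inbound_port_from_headers() {
    # Strip carriage returns first: HTTP header lines end in CRLF.
    tr -d '\r' | awk -F': *' 'tolower($1) == "x-jenkins-jnlp-port" { print $2; exit }'
}

# Example usage on the agent host:
# curl -s -o /dev/null -D - https://jenkins.example.com/tcpSlaveAgentListener/ \
#   | inbound_port_from_headers
```

Empty output suggests the TCP port is disabled or a proxy is stripping the header. Either points toward WebSocket mode or a controller configuration change.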
Agents That Disconnect During Builds
An agent that connects successfully and then drops during a build is usually not a basic connectivity problem. Look at resource pressure and process lifecycle.
Check whether the operating system killed the agent process:
dmesg -T | grep -i -E 'killed process|out of memory'
journalctl -u jenkins-agent --since '2 hours ago'
Also check disk space. Jenkins remoting, checkout, test reports, and archived artifacts all need space. A full workspace volume can make an agent appear unreliable because the remoting process cannot write temporary files or logs.
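A simple threshold check on the workspace volume can catch this before the agent starts dropping. This is a minimal sketch using the POSIX output format of df; the function names and the 90 percent threshold in the usage comment are illustrative assumptions.

```shell
# disk_usage_pct PATH
# Prints the used percentage (digits only) of the filesystem holding PATH.
disk_usage_pct() {
    # df -P guarantees one line per filesystem; column 5 is the capacity.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# workspace_has_room PATH MAX_PCT
# Returns 0 if usage is below MAX_PCT, 1 if not, 2 if usage is unreadable.
workspace_has_room() {
    local used
    used=$(disk_usage_pct "$1")
    [ -n "$used" ] || return 2
    [ "$used" -lt "$2" ]
}

# Example usage on the agent:
# workspace_has_room /path/to/jenkins-agent 90 || echo "workspace volume nearly full"
```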
If disconnections happen during large console output, artifact archiving, or test report publishing, look at network stability and controller load. The agent channel is a live connection. Long garbage collection pauses on the controller, overloaded proxies, idle connection timeouts, and packet loss can all close it. For agents crossing load balancers or corporate proxies, verify idle timeout settings and keepalive behavior.
DNS and Host Key Problems
DNS changes are easy to miss. Jenkins may connect to build-agent-01, while your manual test uses an IP address. Test the hostname from the controller:
getent hosts build-agent-01
nc -vz build-agent-01 22
If the hostname resolves to the wrong address, fix DNS or the agent configuration. Avoid long-term /etc/hosts patches unless you have a clear ownership process, because they become invisible infrastructure.
For SSH agents, host key verification protects Jenkins from connecting to an unexpected machine. If an agent was rebuilt, its host key may have changed. Do not blindly disable verification. Confirm the rebuild, remove the old key from the controller user's known_hosts, and accept the new key through the configured Jenkins strategy.
A Recovery Checklist for Production Agents
When several agents go offline at once, avoid fixing them one by one before finding the shared cause. Ask:
- Did the Jenkins controller restart or upgrade?
- Did a shared credential rotate?
- Did a base agent image change?
- Did a firewall, proxy, VPN, or DNS change roll out?
- Are all failed agents in the same subnet, cloud account, Kubernetes namespace, or availability zone?
If only one agent fails, inspect that host. If a whole group fails together, inspect the common dependency. This saves a lot of time in larger Jenkins fleets.
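Finding the common dependency is easier with a quick tally of the failed agents. The sketch below assumes a hypothetical inventory with one offline agent per line and whitespace-separated fields (name, subnet, image); adapt the field number to whatever attributes you track.

```shell
# count_by_field FIELD_NUMBER
# Reads inventory lines on stdin and prints "COUNT VALUE" for the chosen
# field, most frequent value first.
count_by_field() {
    awk -v f="$1" '{ n[$f]++ } END { for (k in n) print n[k], k }' | sort -rn
}

# Example usage with a hypothetical offline-agents.txt such as:
#   build-agent-01 10.0.1.0/24 img-v7
#   build-agent-02 10.0.1.0/24 img-v6
#   build-agent-07 10.0.2.0/24 img-v7
# count_by_field 2 < offline-agents.txt   # tally by subnet
# count_by_field 3 < offline-agents.txt   # tally by image
```

If one value accounts for most of the failures, start with that shared dependency.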