Troubleshooting Common Jenkins Agent Connectivity Problems and Solutions

Seeing 'offline' or 'connection refused' errors from your Jenkins agents? This guide gives step-by-step solutions for common connectivity problems across the network, firewall, inbound (JNLP), SSH, and agent configuration layers, with practical tips and log analysis to help you keep your build executors available.


Jenkins agents, also called nodes, are where most build work actually runs. When one goes offline, the symptom is obvious: jobs sit in the queue, labels cannot be satisfied, and teams start rerunning builds that were never going to start. The useful work is figuring out which layer broke: network reachability, SSH, inbound remoting, Java, credentials, disk, or the controller itself.

Understanding why an agent might become unreachable is the first step to effective troubleshooting. These problems can stem from network misconfigurations, incorrect agent setup, firewall restrictions, or issues with the Jenkins controller itself. By systematically checking these areas, you can quickly identify the root cause and implement a solution.

Common Causes of Jenkins Agent Disconnection

Several factors can lead to an agent becoming offline. Identifying the specific symptom is key to narrowing down the potential causes:

  • Agent unreachable: The Jenkins controller cannot establish a connection to the agent.
  • Connection refused: The target host is reachable but rejects the TCP connection, usually because nothing is listening on that port or a firewall rule rejects it.
  • Agent reports offline after successful connection: The agent was connected but has since dropped its connection.
  • SSH library errors (for SSH-based agents): Exceptions raised by the Java SSH library the launcher uses, for example authentication or key exchange failures.

Network and Firewall Issues

Network connectivity is one of the most common causes of agent connection problems. Which direction must work depends on the agent type: for SSH agents the controller connects to the agent, while inbound agents connect to the controller.

Verifying Network Reachability

Before diving into Jenkins-specific configurations, confirm basic network connectivity:

  1. Ping the agent: From the Jenkins controller machine, ping the agent's hostname or IP address. Many networks block ICMP, so a failed ping is a hint rather than proof that the host is unreachable.
    ping <agent-hostname-or-ip>
    
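  2. Test the TCP port: Check that the port Jenkins depends on is open and listening, testing from the side that opens the connection. For SSH agents, test from the controller to the agent's SSH port (default 22). For inbound (JNLP) agents the direction is reversed: test from the agent to the controller's inbound agent TCP port (commonly 50000), or to the controller's HTTP(S) port if the agent uses WebSocket. If telnet is not installed, nc -vz does the same check.
    telnet <target-host> <port>
    nc -vz <target-host> <port>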
    
    If the connection times out or is refused, there's likely a network or firewall issue blocking the port.
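Where neither telnet nor nc is installed, a small shell function can run the same TCP check. This is a minimal sketch, assuming bash (for its /dev/tcp redirection) and the coreutils timeout command exist on the host you test from; probe_port is a hypothetical helper, not a Jenkins tool. Like telnet, it cannot tell a refused connection from a silently dropped one.

```shell
# probe_port HOST PORT [TIMEOUT_SECONDS]
# Hypothetical helper: tries to open a TCP connection and reports the result.
# Relies on bash's /dev/tcp redirection and the coreutils `timeout` command.
probe_port() {
  local host="$1" port="$2" secs="${3:-5}"
  # The redirection fails fast if the connection is refused;
  # timeout stops the attempt if packets are silently dropped.
  if timeout "$secs" bash -c ': > "/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
    echo "open: $host:$port"
    return 0
  fi
  echo "closed or filtered: $host:$port"
  return 1
}

# Run from the side that opens the connection, for example:
#   probe_port build-agent-01 22            # SSH agent, from the controller
#   probe_port jenkins.example.com 50000    # TCP inbound agent, from the agent
```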

Firewall Configuration

Firewalls on either the Jenkins controller, the agent machine, or intermediate network devices can block the necessary ports.

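  • Jenkins Controller Firewall: For SSH agents, ensure the controller can open outbound connections to the agent's SSH port. For inbound agents, ensure the controller accepts incoming connections on the inbound agent TCP port (or its HTTP(S) port for WebSocket agents) from the agent network.
  • Agent Machine Firewall: For SSH agents, ensure the agent machine's firewall (e.g., ufw, firewalld, Windows Firewall) allows incoming connections on the SSH port from the Jenkins controller's IP address. Inbound agents only need outbound access to the controller.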
  • Network Firewalls: If your network has internal firewalls, verify that traffic is permitted between the controller and agent.

Example: Allowing the Inbound Agent Port 50000 on the Controller (Linux with ufw)

# Allow inbound agent connections only from the agent network
sudo ufw allow proto tcp from <agent-ip-or-subnet> to any port 50000

# Or allow from any IP (less secure)
sudo ufw allow 50000/tcp

# Reload firewall rules (ufw normally applies new rules immediately)
sudo ufw reload

Example: Allowing Port 22 on an Agent (Linux with firewalld)

# Allow SSH service permanently from a specific source IP
sudo firewall-cmd --permanent --zone=public --add-rich-rule='rule family="ipv4" source address="<jenkins-controller-ip>" port protocol="tcp" port="22" accept'

# Reload firewall rules
sudo firewall-cmd --reload

Tip: Always prioritize allowing connections from specific IP addresses for better security.

Jenkins Agent Configuration Issues

Misconfigurations within Jenkins or on the agent itself are common sources of connectivity problems.

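Inbound (JNLP) Agent Configuration

Inbound agents (historically called JNLP agents, after the Java Network Launch Protocol) open the connection from the agent to the controller, either over a dedicated TCP port or over WebSocket through the controller's regular HTTP(S) port. The primary configuration involves the agent's launch method and the controller's inbound agent settings.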

Agent is Offline in Jenkins UI

If an agent appears offline in the Jenkins UI, it means the controller could not establish or maintain a connection.

  1. Check Agent Launch Method: Ensure the agent is configured to launch correctly. Common methods include:
    • Launch agent by connecting it to the controller (labeled "master" in older versions): The agent process must be started from the agent side, manually or as a service.
    • Launch agent via SSH: Configured through SSH credentials and host settings.
    • Launch agent via execution of command on the controller: The controller runs a command you supply to start the agent, for specific scenarios.
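  2. Verify the Inbound Agent Port: For agents that use the TCP transport, the controller must listen on a port the agent can reach. Check Manage Jenkins -> Security -> Agents -> TCP port for inbound agents; if a firewall sits between agent and controller, choose Fixed and use the port the firewall allows (50000 is the conventional choice) rather than Random. Agents using WebSocket do not need this port.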

"Connection refused" when launching JNLP Agent

This usually means nothing is listening on the inbound agent TCP port on the controller (the port may be disabled, random, or different from what the agent expects) or that a firewall rejects the connection from the agent machine. Verify the port setting and the firewall rules on the controller.

Tip: Restarting the Jenkins controller can clear a transient inbound port problem, but it interrupts running builds, so confirm the port setting and firewall rules first.

SSH Agent Configuration

When using SSH to connect to agents, several factors can cause issues:

  1. Incorrect SSH Credentials: Verify the username, password, or private key configured in Jenkins for the SSH connection. Ensure the private key is correctly formatted (e.g., PEM format) and has the correct permissions.
  2. SSH Server Not Running on Agent: Ensure the SSH daemon (sshd) is running on the agent machine.
    # On the agent machine
    sudo systemctl status sshd
    # or
    sudo service ssh status
    
    If not running, start it and enable it at boot (on Debian and Ubuntu the unit is named ssh, so use ssh instead of sshd there):
    sudo systemctl enable --now sshd
    
  3. SSH Port Mismatch: Ensure the port configured in Jenkins for SSH matches the port the SSH server is listening on (default 22).
  4. Agent Hostname Resolution: If Jenkins is configured with a hostname, the controller must resolve it to the agent's current IP address.
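  5. SSH Key Permissions: On the agent machine, the ~/.ssh/authorized_keys file for the user Jenkins connects as should be 600 and ~/.ssh should be 700. With the default StrictModes setting, sshd also refuses key authentication if the home directory, ~/.ssh, or authorized_keys is group- or world-writable.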
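To check those permission conditions in one pass on the agent, a find-based test is enough. This is a sketch assuming a POSIX shell and find; check_ssh_perms and writable_by_others are hypothetical helpers. It only inspects mode bits, not ownership, which StrictModes also checks.

```shell
# writable_by_others PATH: true if PATH has the group- or other-write bit set.
writable_by_others() {
  [ -n "$(find "$1" -prune \( -perm -020 -o -perm -002 \) -print)" ]
}

# check_ssh_perms HOME: reports paths whose mode bits sshd's StrictModes
# would reject. Ownership, which StrictModes also checks, is not inspected.
check_ssh_perms() {
  local home="$1" status=0 p
  for p in "$home" "$home/.ssh" "$home/.ssh/authorized_keys"; do
    if [ ! -e "$p" ]; then
      echo "missing: $p"; status=1
    elif writable_by_others "$p"; then
      echo "group/world-writable: $p"; status=1
    fi
  done
  [ "$status" -eq 0 ] && echo "ok"
  return "$status"
}

# On the agent, pass the home directory of the account Jenkins uses
# (/home/jenkins is a hypothetical example):
#   check_ssh_perms /home/jenkins
```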

Example: Testing SSH Connection Manually

From the Jenkins controller machine, try to SSH into the agent using the same user, private key, and port configured in Jenkins:

ssh -i <private-key-file> -p <ssh-port> <jenkins-user>@<agent-hostname-or-ip>

If this manual SSH command fails, the problem lies outside of Jenkins' SSH configuration, likely in network, firewall, or SSH server settings on the agent.

Agent Working Directory Permissions

Jenkins requires specific permissions to operate on the agent's file system. The user that Jenkins uses to connect to the agent (or the user running the agent process) needs write permissions to the agent's configured working directory.

  • Verify owner and permissions: On the agent, check the ownership and permissions of the agent's remote root directory and its subdirectories.
    ls -ld /path/to/jenkins/agent/home
    ls -l /path/to/jenkins/agent/home
    
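  • Grant permissions (if necessary): Ensure the user Jenkins connects as has read and write access. Use chown and chmod cautiously, and scope them to the remote root directory, for example:
    sudo chown -R <agent-user>:<agent-group> /path/to/jenkins/agent/home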

Jenkins Controller Issues

Sometimes, the problem might not be with the agent but with the Jenkins controller itself.

Controller Overload

If the Jenkins controller is under heavy load (many jobs running, high CPU or memory usage, long garbage collection pauses), it may struggle to keep agent connections alive. Monitor the controller's CPU, memory, and Java heap usage, and avoid running builds on the built-in node.

JNLP Port Conflicts

If the inbound agent TCP port (commonly 50000) is already in use by another process on the Jenkins controller, Jenkins cannot listen on it and agents will fail to connect.

  • Check port usage: On the controller machine, use netstat or ss to see which process is using the port.
    sudo netstat -tulnp | grep 50000
    # or
    sudo ss -tulnp | grep 50000
    
    If another process is using it, you'll need to reconfigure either Jenkins or the other application to use different ports.
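A plain grep 50000 can also match a PID or an unrelated number such as 150000. A sketch in POSIX shell and awk that matches the port exactly; listeners_on_port is a hypothetical name. It reads ss or netstat listening-socket output on stdin and prints lines where some column ends in :PORT, so it works whether or not ss adds a Netid column.

```shell
# listeners_on_port PORT: reads `ss -tulnp` or `netstat -tulnp` output on
# stdin and prints lines where a column ends exactly in ":PORT".
listeners_on_port() {
  awk -v p=":$1" '{
    for (i = 1; i <= NF; i++) {
      n = length($i) - length(p) + 1
      # n > 1 requires an address before the colon, e.g. "*:50000"
      if (n > 1 && substr($i, n) == p) { print; next }
    }
  }'
}

# On the controller:
#   sudo ss -tulnp | listeners_on_port 50000
```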

Advanced Troubleshooting and Logs

When standard checks don't reveal the issue, deeper investigation is needed.

Jenkins Controller Logs

Review the Jenkins controller logs for errors related to agent connections. These logs can provide specific error messages.

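  • Location: Depends on the installation. Linux packages running under systemd log to the journal (journalctl -u jenkins), older packages write /var/log/jenkins/jenkins.log, and containers log to standard output (for example, docker logs <container>). The same messages are available under Manage Jenkins -> System Log, and each agent's page in Jenkins has its own launch log.
  • Look for: Messages mentioning the agent's name, hostname, or IP address, connection attempts, SSH exceptions, or Connection refused errors.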
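To narrow a long controller log to one agent, filter on the agent name first and then on failure keywords. A minimal sketch assuming GNU, BSD, or BusyBox grep (for -w, which keeps build-agent-01 from also matching build-agent-010); agent_log_hits is a hypothetical helper, and the keyword list is an illustrative guess rather than a catalogue of Jenkins log messages.

```shell
# agent_log_hits AGENT: reads log text on stdin and prints lines that mention
# AGENT as a whole word and contain a failure-looking keyword.
# The keyword list is illustrative; extend it with messages you actually see.
agent_log_hits() {
  grep -F -w -- "$1" | grep -i -E 'exception|refused|timed out|closed|denied|severe|warning'
}

# Examples (log locations depend on your installation):
#   journalctl -u jenkins --since '2 hours ago' | agent_log_hits build-agent-01
#   agent_log_hits build-agent-01 < /var/log/jenkins/jenkins.log
```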

Agent Logs

If the agent is running but reporting offline, check its logs for any errors.

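  • Inbound (JNLP) Agents: The agent process writes to its console or to the log of the service that runs it. If it was started with -workDir, remoting also keeps logs under the work directory (typically in remoting/logs).
  • SSH Agents: The launch log appears on the agent's page in Jenkins and is stored on the controller (typically under $JENKINS_HOME/logs/slaves/<agent-name>/). If the connection fails at the SSH level, check sshd's logs on the agent, for example journalctl -u sshd (or -u ssh on Debian and Ubuntu), /var/log/auth.log, or /var/log/secure.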

Enable Debug Logging

For very persistent issues, temporarily enabling debug logging for relevant Jenkins components can provide more granular information.

  • Agent communication: Create a log recorder under Manage Jenkins -> System Log and set loggers such as hudson.slaves, jenkins.slaves, or hudson.plugins.sshslaves (for SSH agents) to FINE or finer. Remove the recorder when you are done, because verbose logging adds load on the controller.

Practical Habits That Prevent Repeat Outages

Troubleshooting Jenkins agent connectivity requires a systematic approach, starting with basic network checks and progressing to Jenkins-specific configurations.

  • Verify Network: Always start with ping and telnet/nc to ensure basic network reachability and port access.
  • Check Firewalls: Ensure firewalls on both the controller and agent, as well as any network firewalls, permit traffic on the required ports.
  • Validate Credentials: Double-check SSH keys, usernames, and passwords.
  • Confirm Agent Service: For SSH agents, ensure sshd is running and accessible.
  • Monitor Jenkins Logs: Controller logs are your primary source for understanding connection failures.
  • Use Specific IPs: Where possible, configure firewalls and Jenkins to use specific IP addresses rather than broad ranges or 0.0.0.0.

By following these steps, you can effectively diagnose and resolve most common Jenkins agent connectivity problems, keeping your CI/CD pipelines running smoothly.

Reading the Offline Message Without Guessing

The word "offline" is too broad to troubleshoot by itself. Before changing Jenkins settings, open the agent page and read the exact reason Jenkins gives. There is a big difference between "connection refused", "permission denied", "host key verification failed", "JNLP agent rejected", and "channel was closed". They all end with an offline node, but they point to different layers.

I usually write the symptom down in plain language: "controller cannot reach TCP port 22", "SSH login works but Java cannot start", "inbound agent starts but cannot call back to the controller", or "agent connects and then drops during builds." That one sentence keeps the investigation focused.
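That habit can be written down as a lookup from message text to layer. The sketch below is a POSIX shell illustration of the mapping this guide describes; the patterns are examples based on the messages mentioned above, not an exhaustive list of Jenkins or SSH errors, and classify_offline_reason is a hypothetical name.

```shell
# classify_offline_reason MESSAGE: maps an offline or launch message to the
# layer it most likely points at. Patterns are illustrative, not exhaustive.
classify_offline_reason() {
  local msg
  msg=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$msg" in
    *"host key verification failed"*)
      echo "ssh-host-key: host key changed or unknown; confirm the rebuild first" ;;
    *"connection refused"*)
      echo "network-port: nothing listening on the port, or a firewall rejecting it" ;;
    *"timed out"*|*"no route to host"*)
      echo "network-path: packets dropped by a firewall, security group, or routing" ;;
    *"permission denied"*|*"auth fail"*)
      echo "credentials: SSH user, key, or authorized_keys problem" ;;
    *"unknownhost"*|*"unknown host"*|*"name or service not known"*)
      echo "dns: hostname does not resolve from this side" ;;
    *"rejected"*)
      echo "inbound-handshake: controller refused the agent (name, secret, or protocol)" ;;
    *"channel"*"closed"*|*"closedchannel"*)
      echo "live-channel: dropped after connecting; check resources and network stability" ;;
    *"java: command not found"*|*"java: not found"*)
      echo "agent-runtime: Java missing or not on PATH for the agent user" ;;
    *)
      echo "unknown: read the full agent log before changing settings" ;;
  esac
}
```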

If the agent has never connected, suspect configuration, DNS, firewall, credentials, or launch command. If it connected for months and started failing today, check recent changes: rotated SSH keys, a Jenkins upgrade, a plugin update, a new firewall rule, an expired certificate, an agent image rebuild, or a cloud networking change. The timeline is often more useful than the error text.

SSH Agents: Separate Login Problems from Launch Problems

For SSH-based agents, test the same path Jenkins uses. From the controller host, connect as the Jenkins-configured user:

ssh -vvv jenkins-agent-user@agent-hostname

The verbose output tells you whether the failure happens before authentication, during authentication, or after login. If SSH never reaches the server, Jenkins cannot fix that. Check routing, security groups, network ACLs, host firewalls, and the SSH daemon. If SSH reaches the server but rejects the key, check the credential in Jenkins, the user's authorized_keys, file permissions, and whether the account is locked.

If SSH login works manually but Jenkins still fails, look at the remote root directory and Java startup. Jenkins needs a writable directory for the remoting files, and the agent user needs permission to create files there. A common mistake is pointing the remote root to a path owned by root or cleaned by another process.

Run these checks on the agent:

whoami
pwd
java -version
test -w /path/to/jenkins-agent && echo writable
df -h /path/to/jenkins-agent
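The directory and disk checks can be bundled so they run the same way every time. A sketch in POSIX shell; remote_root_ok is a hypothetical name, and its default free-space threshold is an arbitrary placeholder, not a Jenkins requirement. Run it as the account Jenkins uses, since root passes the writability test almost everywhere.

```shell
# remote_root_ok DIR [MIN_FREE_MB]
# Checks that DIR exists, is writable by the current user, and has at least
# MIN_FREE_MB megabytes free (default 1024, an arbitrary placeholder).
remote_root_ok() {
  local dir="$1" min_mb="${2:-1024}" avail_kb
  if [ ! -d "$dir" ]; then
    echo "missing: $dir"; return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable by $(id -un): $dir"; return 1
  fi
  # df -P prints one line per filesystem; with -k, column 4 is free 1K blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  if [ "$avail_kb" -lt $((min_mb * 1024)) ]; then
    echo "low space: $((avail_kb / 1024)) MB free under $dir"; return 1
  fi
  echo "ok: $dir"
}

# As the agent user:
#   remote_root_ok /path/to/jenkins-agent 2048
```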

Java version matters because modern Jenkins controllers require compatible Java versions on agents. The exact requirement depends on your Jenkins release, so check the Jenkins documentation for your version instead of assuming an old agent image is still valid.
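Comparing versions by eye is error-prone because older JDKs report 1.8.0_392 while newer ones report 17.0.9. A sketch in POSIX shell that extracts the major version from java -version output; java_major is a hypothetical helper, and MIN_JAVA in the usage comment is a placeholder you should take from the Jenkins documentation for your release.

```shell
# java_major: reads `java -version` output on stdin and prints the major version.
# Handles both the legacy "1.8.0_392" style and the modern "17.0.9" style.
java_major() {
  local ver
  # Take the quoted version string from the first line that has one.
  ver=$(sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$ver" in
    "") echo "unknown"; return 1 ;;
    1.*) ver=${ver#1.}; echo "${ver%%.*}" ;;  # 1.8.0_392 -> 8
    *) echo "${ver%%[.+-]*}" ;;                # 17.0.9 -> 17, 21-ea -> 21
  esac
}

# On the agent (MIN_JAVA is a placeholder, not an official minimum):
#   MIN_JAVA=17
#   major=$(java -version 2>&1 | java_major)
#   if [ "$major" != unknown ] && [ "$major" -ge "$MIN_JAVA" ]; then echo "java $major ok"; else echo "java $major: too old or missing"; fi
```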

Inbound Agents: The Callback Path Is the Usual Trap

Inbound agents are often used when the controller cannot initiate SSH to the agent, such as agents behind NAT or in restricted networks. The agent process starts outside Jenkins and connects back to the controller. That means the network path is reversed: the agent must resolve and reach the Jenkins URL.

On the agent host, test the Jenkins URL exactly as configured:

curl -I https://jenkins.example.com/

If Jenkins is behind a reverse proxy, confirm the public URL in Manage Jenkins > System is correct. A wrong Jenkins URL can make generated agent commands point to an internal hostname the agent cannot resolve. If WebSocket mode is enabled for inbound agents, make sure the proxy supports WebSocket upgrade headers. If you use the TCP inbound agent port instead, confirm the fixed port is configured and reachable from the agent network.
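For the TCP option, it helps to see which port the controller actually advertises to agents. My understanding is that Jenkins answers on the /tcpSlaveAgentListener/ path with an X-Jenkins-JNLP-Port response header; treat both names as assumptions and confirm them on your controller. This sketch extracts that header from curl output so you can compare it with the port your firewall allows; inbound_port_from_headers is a hypothetical helper.

```shell
# inbound_port_from_headers: reads HTTP response headers on stdin and prints
# the X-Jenkins-JNLP-Port value (assumed header name; verify on your controller).
# Exits non-zero when the header is absent.
inbound_port_from_headers() {
  awk -F': *' '
    tolower($1) == "x-jenkins-jnlp-port" {
      sub(/\r$/, "", $2)   # HTTP header lines end in CRLF
      print $2
      found = 1
      exit
    }
    END { exit !found }'
}

# On the agent host (-D - dumps headers to stdout, -o /dev/null drops the body):
#   curl -s -D - -o /dev/null https://jenkins.example.com/tcpSlaveAgentListener/ | inbound_port_from_headers
```

If the header is missing, the fixed TCP port may be disabled, which is expected when agents use WebSocket.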

TLS issues can look like Jenkins issues. If the agent runs in a minimal container image, it may not have your internal CA certificate. curl will usually expose that quickly. Install the CA certificate into the agent image rather than disabling certificate verification.

Agents That Disconnect During Builds

An agent that connects successfully and then drops during a build is usually not a basic connectivity problem. Look at resource pressure and process lifecycle.

Check whether the operating system killed the agent process:

dmesg -T | grep -i -E 'killed process|out of memory'
journalctl -u jenkins-agent --since '2 hours ago'

Also check disk space. Jenkins remoting, checkout, test reports, and archived artifacts all need space. A full workspace volume can make an agent appear unreliable because the remoting process cannot write temporary files or logs.

If disconnections happen during large console output, artifact archiving, or test report publishing, look at network stability and controller load. The agent channel is a live connection. Long garbage collection pauses on the controller, overloaded proxies, idle connection timeouts, and packet loss can all close it. For agents crossing load balancers or corporate proxies, verify idle timeout settings and keepalive behavior.

DNS and Host Key Problems

DNS changes are easy to miss. Jenkins may connect to build-agent-01, while your manual test uses an IP address. Test the hostname from the controller:

getent hosts build-agent-01
nc -vz build-agent-01 22
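When you know which address the agent should have, checking the answer explicitly beats reading it by eye across several hosts. A sketch in POSIX shell and awk; dns_answer_matches is a hypothetical helper, and 10.0.4.17 in the usage comment is a placeholder for the agent's real address.

```shell
# dns_answer_matches EXPECTED_IP: reads `getent hosts NAME` output on stdin and
# succeeds only if EXPECTED_IP is among the returned addresses.
dns_answer_matches() {
  awk -v want="$1" '
    {
      seen = seen (seen == "" ? "" : " ") $1
      if ($1 == want) found = 1
    }
    END {
      if (found) print "ok: resolves to " want
      else print "mismatch: expected " want ", got " (seen == "" ? "nothing" : seen)
      exit !found
    }'
}

# From the controller:
#   getent hosts build-agent-01 | dns_answer_matches 10.0.4.17
```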

If the hostname resolves to the wrong address, fix DNS or the agent configuration. Avoid long-term /etc/hosts patches unless you have a clear ownership process, because they become invisible infrastructure.

For SSH agents, host key verification protects Jenkins from connecting to an unexpected machine. If an agent was rebuilt, its host key may have changed. Do not blindly disable verification. Confirm the rebuild, remove the old key from the controller user's known_hosts, and accept the new key through the configured Jenkins strategy.

A Recovery Checklist for Production Agents

When several agents go offline at once, avoid fixing them one by one before finding the shared cause. Ask:

  1. Did the Jenkins controller restart or upgrade?
  2. Did a shared credential rotate?
  3. Did a base agent image change?
  4. Did a firewall, proxy, VPN, or DNS change roll out?
  5. Are all failed agents in the same subnet, cloud account, Kubernetes namespace, or availability zone?

If only one agent fails, inspect that host. If a whole group fails together, inspect the common dependency. This saves a lot of time in larger Jenkins fleets.
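Question 5 is easy to answer by eye for three agents and tedious for thirty. A rough sketch, assuming you have collected the offline agents into a plain text list of name and IPv4 address pairs (the file name offline-agents.txt is hypothetical, as is the helper count_by_subnet24). Grouping by /24 is only a stand-in for "same subnet" and will mislead on networks that use other prefix lengths.

```shell
# count_by_subnet24: reads "agent-name ipv4-address" lines on stdin and prints
# "count subnet/24 name,name,..." with the largest groups first.
count_by_subnet24() {
  awk '
    NF >= 2 {
      split($2, o, "[.]")
      key = o[1] "." o[2] "." o[3] ".0/24"
      count[key]++
      names[key] = names[key] (names[key] == "" ? "" : ",") $1
    }
    END { for (k in count) print count[k], k, names[k] }
  ' | sort -k1,1nr -k2,2
}

# Usage:
#   count_by_subnet24 < offline-agents.txt
```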