Troubleshooting Linux Services with systemctl and journalctl
A practical workflow for debugging failed or unhealthy Linux services with systemctl and journalctl.
When a Linux service fails, the fastest path is usually not a web search. It is three local checks: what systemd thinks happened, what the service logged, and what changed before the failure. systemctl and journalctl give you those answers without guessing.
This guide uses common service failures as examples: a service that will not start, a process that is running but not doing useful work, and a service that exits after it looked healthy. The commands apply to most systemd-managed services, but exact unit names and log locations vary by distribution and package.
Understanding systemctl and journalctl
Before diving into troubleshooting, it's crucial to understand the roles of these two primary tools:
- systemctl: This command is the central utility for controlling and querying the systemd system and service manager. It allows you to start, stop, restart, check the status of, and enable/disable services.
- journalctl: This command is used to query the systemd journal, which is a centralized logging system. It collects logs from the kernel, system services, and applications, providing a unified view of system events. journalctl is invaluable for understanding why a service failed or behaved unexpectedly.
Common Troubleshooting Scenarios and Solutions
Let's explore typical problems and how to tackle them:
1. Service Failed to Start
This is perhaps the most common issue. You try to start a service, and it immediately fails.
Step 1: Check the Service Status
Use systemctl status to get an immediate overview of the service's state and recent log entries.
sudo systemctl status apache2.service
Expected Output (Illustrative - yours may vary):
● apache2.service - The Apache HTTP Server
Loaded: loaded (/lib/systemd/system/apache2.service; enabled; vendor preset: enabled)
Active: failed (Result: exit-code) since Fri 2023-10-27 10:00:00 UTC; 1min ago
Docs: https://httpd.apache.org/docs/2.4/
Process: 12345 ExecStart=/usr/sbin/apachectl start (code=exited, status=1/FAILURE)
Main PID: 12345 (code=exited, status=1/FAILURE)
Oct 27 10:00:00 your-server systemd[1]: Starting The Apache HTTP Server...
Oct 27 10:00:00 your-server apachectl[12345]: AH00526: Syntax error on line 123 of /etc/apache2/apache2.conf:
Oct 27 10:00:00 your-server apachectl[12345]: Invalid Mutex directory in argument file: '/var/run/apache2/'
Oct 27 10:00:00 your-server systemd[1]: apache2.service: Main process exited, code=exited, status=1/FAILURE
Oct 27 10:00:00 your-server systemd[1]: Failed to start The Apache HTTP Server.
Oct 27 10:00:00 your-server systemd[1]: apache2.service: Unit entered failed state.
Analysis: The systemctl status output clearly shows Active: failed and provides a snippet of the error message: Invalid Mutex directory in argument file: '/var/run/apache2/'. This suggests a configuration problem.
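When you want the same facts in a form a script can act on, systemctl show prints unit properties as key=value lines. The helper below is a minimal sketch: show_prop is a hypothetical name (not part of systemd), and it only parses text from standard input, so it can be tried on saved output as well as a live unit.

```shell
# Hypothetical helper: print the value of one property from
# `systemctl show` style key=value lines read on stdin.
show_prop() {
    # $1 = property name, e.g. ActiveState, Result, ExecMainStatus
    awk -v key="$1" 'index($0, key "=") == 1 {
        print substr($0, length(key) + 2)   # everything after "key="
        exit
    }'
}

# Usage against the example unit from this guide:
#   systemctl show apache2.service -p ActiveState -p Result -p ExecMainStatus | show_prop Result
```

For a unit that failed the way the sample output shows, the Result property would be expected to read exit-code, which matches the Active: line in systemctl status.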
Step 2: Investigate Logs with journalctl
For more detailed information, use journalctl to view logs specifically for the failed service. The -u flag specifies the unit (service).
sudo journalctl -u apache2.service -xe
- -u apache2.service: Filters logs for the apache2.service unit.
- -x: Adds explanations for some log messages.
- -e: Jumps to the end of the journal, showing the most recent entries.
Potential Findings: The journalctl output might reveal more context about the configuration error, permission issues, or dependency problems.
Step 3: Check Configuration Files
Based on the error message, examine the relevant configuration files. In the example above, it points to /etc/apache2/apache2.conf and the directory /var/run/apache2/.
sudo nano /etc/apache2/apache2.conf
Solution: Issues like this often come from a missing runtime directory, a packaging change, or a configuration file that references a path that no longer exists. Do not blindly create directories from an example on the internet. First confirm what the application expects on your distribution, then fix the missing path or configuration. A possible repair might look like this:
sudo mkdir -p /var/run/apache2/
sudo chown www-data:www-data /var/run/apache2/
sudo systemctl start apache2.service
If the error mentions a syntax problem, run the application's own configuration test before restarting again:
sudo apachectl configtest
sudo nginx -t
sudo sshd -t
Application-specific validators catch mistakes that systemd cannot understand. Systemd knows whether the process exited. It does not know whether your Nginx server block points to the wrong certificate file or whether an Apache directive belongs in a different context.
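One way to make "validate first, restart second" a habit is to wrap both steps so the restart cannot run when the validator fails. This is a sketch under stated assumptions: restart_if_valid is a hypothetical helper, and both arguments are command strings that you supply and trust.

```shell
# Hypothetical helper: run a validator command and run the restart
# command only if the validator exits successfully.
#   $1 = validator command string, $2 = restart command string
restart_if_valid() {
    if sh -c "$1"; then
        sh -c "$2"
    else
        echo "validation failed, restart skipped: $1" >&2
        return 1
    fi
}

# Usage (run as root or prefix the inner commands with sudo):
#   restart_if_valid "nginx -t" "systemctl restart nginx.service"
#   restart_if_valid "apachectl configtest" "systemctl restart apache2.service"
```

Because the command strings are passed to sh -c, this is only appropriate for commands you type yourself, not for untrusted input.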
2. Service is Running but Not Responding
Sometimes, systemctl status shows a service as active (running), but it's not performing its intended function (e.g., a web server isn't serving pages).
Step 1: Verify Service Status and PID
Confirm it's actually running and has a Process ID (PID).
sudo systemctl status nginx.service
If it shows active (running), note the PID.
Step 2: Examine Service Logs for Errors
Even if running, the service might be encountering internal errors that prevent it from functioning correctly.
sudo journalctl -u nginx.service -f
-f: Follows the log output in real-time. This is useful if you can trigger the issue (e.g., try to access the web page) while journalctl is running.
Step 3: Check Application-Specific Logs
Many services write their own logs in addition to systemd's journal. For web servers like Nginx or Apache, check their typical log locations (e.g., /var/log/nginx/error.log, /var/log/apache2/error.log).
sudo tail -n 50 /var/log/nginx/error.log
Step 4: Check Resource Utilization
An overloaded system can cause services to become unresponsive.
top
htop
free -h
Look for high CPU, memory, or disk I/O by the service's processes.
Also check whether the service is listening where you expect:
sudo ss -ltnp
sudo ss -lunp
For a web service, seeing nginx active in systemctl is only half the story. You still need to know whether it is bound to 0.0.0.0:80, 127.0.0.1:8080, an IPv6 socket, or no socket at all. A firewall rule, reverse proxy mismatch, or bad bind address can make a healthy process look broken from the outside.
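To answer the "where is it bound?" question quickly, you can filter the ss listing for one port. The function below is a rough sketch: listeners_on_port is a hypothetical name, it assumes the column layout of ss -ltn (state in the first column, local address in the fourth), and it only covers TCP sockets in the LISTEN state.

```shell
# Hypothetical helper: from `ss -ltn` output on stdin, print the local
# addresses that are listening on the given TCP port.
listeners_on_port() {
    awk -v port="$1" '$1 == "LISTEN" {
        n = split($4, parts, ":")        # the port is the last ":"-separated piece
        if (parts[n] == port) print $4
    }'
}

# Usage:
#   ss -ltn | listeners_on_port 80
```

No output means nothing is listening on that port. A result of 127.0.0.1:8080 where you expected 0.0.0.0:8080 points at a bind address problem, not at systemd.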
Solution: If logs indicate issues or resources are strained, you might need to:
- Optimize configurations.
- Restart the service (sudo systemctl restart <service_name>.service).
- Investigate underlying system resource issues.
- Increase system resources if necessary.
3. Service Stops Unexpectedly
If a service that was previously running suddenly stops, it's often due to an unhandled exception or a watchdog timeout.
Step 1: Check Recent History with journalctl
Use journalctl to see what happened just before the service stopped. The --since and --until flags can be helpful if you know the approximate time.
sudo journalctl -u <service_name>.service --since "1 hour ago"
Or, to see all logs related to the service since the last boot:
sudo journalctl -u <service_name>.service -b
Step 2: Look for Core Dumps or Crash Reports
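If the service crashed, the system might have generated a core dump or a crash report. Where these land depends on the distribution: Ubuntu's Apport writes reports to /var/crash/, while systems that use systemd-coredump list captured dumps through coredumpctl.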
ls -l /var/crash/
Step 3: Review systemd Service Unit File
Examine the service's unit file (usually in /etc/systemd/system/ or /lib/systemd/system/) for Restart= directives and WatchdogSec= settings. An incorrect Restart= configuration or a WatchdogSec= that's too short could cause unexpected restarts or failures.
systemctl cat <service_name>.service
Solution: Address the root cause identified in the logs. This might involve fixing code bugs, adjusting systemd unit file parameters, or increasing resource limits.
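If adjusting unit parameters is the right fix, a drop-in override is usually safer than editing the packaged unit file. The sketch below writes one into a directory you choose, so you can inspect it before installing it. write_restart_dropin and the file name restart.conf are hypothetical, and every value shown is an illustrative assumption, not a recommendation for your service.

```shell
# Hypothetical helper: write a restart-policy drop-in into directory $1.
# For a real unit the directory would be /etc/systemd/system/<unit>.d/
# (root required), followed by `systemctl daemon-reload`.
write_restart_dropin() {
    mkdir -p "$1" || return 1
    cat > "$1/restart.conf" <<'EOF'
[Unit]
# Illustrative values: allow at most 5 starts within 60 seconds.
StartLimitIntervalSec=60
StartLimitBurst=5

[Service]
# Illustrative values: restart only on failure, after a 5 second pause.
Restart=on-failure
RestartSec=5s
EOF
}

# Usage (writes to a scratch directory first so the result can be reviewed):
#   write_restart_dropin /tmp/my-app.service.d && cat /tmp/my-app.service.d/restart.conf
```

Running sudo systemctl edit <service_name>.service reaches the same result interactively. The scripted form is mainly useful when the change has to be reviewed or repeated on several hosts.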
If you see repeated restarts, check whether systemd has rate-limited the unit:
systemctl status <service_name>.service
journalctl -u <service_name>.service --since "30 minutes ago"
Messages about Start request repeated too quickly usually mean the service crashed several times in a short window. After fixing the underlying problem, clear the failed state:
sudo systemctl reset-failed <service_name>.service
sudo systemctl start <service_name>.service
4. systemctl enable or systemctl disable Issues
While not a runtime failure, problems enabling or disabling services can occur.
Problem: A service is enabled but doesn't start on boot, or vice versa.
Check Status:
sudo systemctl is-enabled <service_name>.service
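This command typically outputs enabled or disabled. Other values are possible, such as static (the unit has no [Install] section and cannot be enabled directly) or masked (the unit is blocked from starting).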
Troubleshooting:
- Ensure the service unit file itself is valid and placed correctly (e.g., /etc/systemd/system/).
- After making changes to a unit file, always run sudo systemctl daemon-reload.
- Check logs for the service (journalctl -u <service_name>.service) for any startup errors that might prevent it from becoming active even if enabled.
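The state printed by systemctl is-enabled is easier to act on when it is spelled out. The function below is a small sketch: explain_enabled_state is a hypothetical name, and it covers only the most common states, with everything else reported as-is.

```shell
# Hypothetical helper: turn a `systemctl is-enabled` state into a hint.
explain_enabled_state() {
    case "$1" in
        enabled|enabled-runtime)
            echo "enabled: install symlinks exist, check the journal for start errors" ;;
        disabled)
            echo "disabled: has an [Install] section but is not enabled" ;;
        static)
            echo "static: no [Install] section, starts only as a dependency or manually" ;;
        masked|masked-runtime)
            echo "masked: blocked from starting until unmasked" ;;
        *)
            echo "other: $1" ;;
    esac
}

# Usage:
#   explain_enabled_state "$(systemctl is-enabled <service_name>.service)"
```

A unit that reports enabled but never comes up at boot is usually a startup failure, not an enablement problem, so the next stop is the journal for that boot.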
Tips for Effective Troubleshooting
- Start with systemctl status: Always begin here. It provides a quick snapshot and often points you in the right direction.
- Use journalctl -u <service>: This is your primary tool for understanding why something is happening.
- -f flag with journalctl: Extremely useful for real-time monitoring when trying to reproduce an issue.
- systemctl restart <service>: After making configuration changes, always restart the service to apply them.
- systemctl daemon-reload: Crucial after modifying any .service unit files.
- Check Dependencies: Sometimes a service fails because a service it depends on hasn't started or is failing itself. systemctl status will often show this.
- Permissions: Many service failures are due to incorrect file or directory permissions. Ensure the user the service runs as has the necessary access.
- Network Issues: If the service relies on the network, check network connectivity, firewall rules, and port availability.
A Troubleshooting Order That Holds Up
When the pressure is on, use the same order every time:
systemctl status <service>.service
journalctl -u <service>.service -b --no-pager
systemctl cat <service>.service
systemctl list-dependencies <service>.service
Start with the current state, then read logs from the current boot, then inspect the unit exactly as systemd sees it, then check dependencies. If the service is network-facing, add ss -ltnp and a local curl or client test. If it reads a config file, run the service's own config validator.
The point is to avoid random restarts. Restarting can be a valid fix after a config change or a stuck process, but it also destroys evidence. Read enough of the journal first that you know what you are changing and why.
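Since a restart destroys evidence, it can help to capture the four views into a file before touching anything. The following is a sketch, not a standard tool: snapshot_unit and run_logged are hypothetical names, and the output path is whatever you pass in.

```shell
# Hypothetical helper: print a header, then run the command, and keep
# going even if the command fails or is missing.
run_logged() {
    printf '===== %s =====\n' "$*"
    "$@" 2>&1 || true
}

# Hypothetical helper: save status, current-boot logs, the effective
# unit file, and dependencies for unit $1 into file $2.
snapshot_unit() {
    {
        run_logged systemctl --no-pager status "$1"
        run_logged journalctl -u "$1" -b --no-pager
        run_logged systemctl --no-pager cat "$1"
        run_logged systemctl --no-pager list-dependencies "$1"
    } > "$2"
}

# Usage:
#   snapshot_unit nginx.service /tmp/nginx-before-restart.txt
```

Failures of individual commands are recorded in the file and do not abort the snapshot, so a missing unit or an unreadable journal still shows up as evidence.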
Reading Journal Output Without Getting Lost
journalctl can be noisy, especially on busy servers. Start narrow, then widen only when you need to.
For one service in the current boot:
journalctl -u <service>.service -b --no-pager
For the last few minutes:
journalctl -u <service>.service --since "15 minutes ago" --no-pager
For the previous boot:
journalctl -u <service>.service -b -1 --no-pager
That previous-boot view is useful when a service failed during startup and then recovered, or when the whole machine rebooted before you could inspect it. You can list boots with:
journalctl --list-boots
If you need unambiguous timestamps, for example to correlate entries with other logs, use short ISO timestamps:
journalctl -u <service>.service -o short-iso --no-pager
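On a noisy unit, one way to start narrow is to rank repeated messages so the dominant error stands out. This sketch assumes message-only input, such as journalctl produces with -o cat. top_messages is a hypothetical name.

```shell
# Hypothetical helper: count identical lines on stdin and print the
# most frequent ones first. $1 = how many to show (default 5).
top_messages() {
    sort | uniq -c | sort -rn | head -n "${1:-5}"
}

# Usage (-o cat prints only the message text, -p err keeps errors and worse):
#   journalctl -u <service>.service -b -o cat --no-pager | top_messages 5
#   journalctl -u <service>.service -b -p err -o cat --no-pager | top_messages 5
```

Messages that embed request IDs or timestamps will not collapse into one line, so treat the ranking as a starting point, not a complete summary.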
When you need to share logs, remove secrets, tokens, internal hostnames, and customer data. Service logs often include environment-derived settings, URLs, headers, or connection strings. A clean troubleshooting habit includes redaction before pasting output anywhere.
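A first pass of that redaction can be scripted. The filter below is only a sketch with assumed patterns: redact_logs is a hypothetical name, the key names it looks for (password, token, and so on) are guesses about what your logs contain, and it does not touch internal hostnames or customer data. Always read the result before pasting it anywhere.

```shell
# Hypothetical filter: mask a few common secret shapes in text on stdin.
# The patterns are assumptions and will not catch everything.
redact_logs() {
    sed -E \
        -e 's/(password|passwd|token|secret|api_key)=[^[:space:]&]+/\1=[REDACTED]/g' \
        -e 's/(Authorization: (Bearer|Basic)) [^[:space:]]+/\1 [REDACTED]/g' \
        -e 's#(://[^:/@[:space:]]+):[^@[:space:]]+@#\1:[REDACTED]@#g'
}

# Usage:
#   journalctl -u <service>.service -b --no-pager | redact_logs > shareable.log
```

The three expressions cover key=value pairs, HTTP Authorization headers, and passwords embedded in URLs. Anything in a different shape passes through unchanged.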
When systemctl Says "Active" but Users Still See Failure
An active (running) state only means systemd has a process that matches the unit's expectations. It does not prove the application is healthy. A web application can be running while returning HTTP 500. A worker can be active while stuck on a bad queue message. A database proxy can be running while all backend connections fail.
For network services, test from the same layers your users depend on:
curl -v http://127.0.0.1:8080/health
curl -v http://localhost/health
curl -v https://service.example.com/health
Those three checks answer different questions. The first checks the local app port. The second may include a local reverse proxy. The third checks DNS, TLS, routing, firewall rules, and the public-facing path.
For worker services, look at the thing they consume or produce. A queue worker may need a queue depth check. A backup service may need a recent output file. A metrics collector may need a query against the metrics backend. systemctl tells you whether supervision is working; application checks tell you whether the service is useful.
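For HTTP services, the layered checks can be reduced to a status code per layer. The sketch below makes several assumptions: http_code and classify_http_code are hypothetical names, the /health path comes from the examples in this guide and may not exist on your service, and treating only 2xx as healthy is a simplification.

```shell
# Hypothetical helper: print the HTTP status code for a URL, or 000 if
# the request could not be completed (or curl is unavailable).
http_code() {
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 "$1" 2>/dev/null) || code=000
    echo "${code:-000}"
}

# Hypothetical helper: map a status code to a coarse verdict.
classify_http_code() {
    case "$1" in
        2??) echo healthy ;;
        3??) echo redirect ;;
        000) echo unreachable ;;
        *)   echo failing ;;
    esac
}

# Usage: walk outward from the app port to the public name.
#   for url in http://127.0.0.1:8080/health http://localhost/health https://service.example.com/health; do
#       echo "$url $(classify_http_code "$(http_code "$url")")"
#   done
```

The first layer that stops reporting healthy tells you where to look: the application itself, the local proxy, or the path from outside.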
Fix One Variable at a Time
When a unit fails after a deployment, it is tempting to change several things and restart. That can hide the real cause. Prefer one change at a time:
systemctl cat my-app.service
journalctl -u my-app.service --since "30 minutes ago" --no-pager
sudo systemctl edit my-app.service
sudo systemctl daemon-reload
sudo systemctl restart my-app.service
Then check the result before changing the next thing. If the failure is a missing file, fix the file path. If it is a permission error, fix ownership or mode. If it is a dependency, fix the unit relationship or the application retry behavior. Slow, boring troubleshooting is often faster than a restart loop with five untracked edits.