Troubleshooting Systemd Service Failures: A Step-by-Step Guide
Diagnose systemd service failures with status checks, journal logs, unit file review, dependency fixes, and environment debugging.
Systemd service failures are easier to debug when you slow down and follow the evidence. A failed unit usually leaves three useful clues: the state systemd recorded, the command it tried to run, and the logs written by either systemd or the application. If you read those in order, you avoid the common trap of editing a unit file before you know whether the problem is the unit, the application, a dependency, or the host.
The examples below use a fictional mywebapp.service, but the same workflow applies to database helpers, queue consumers, backup jobs, exporters, and internal daemons.
The First Line of Defense: systemctl status
When a service fails to start, the very first command you should run is systemctl status <service_name>. This command provides a snapshot of the service's current state, including whether it is loaded and active and, crucially, its most recent log lines. This often provides enough information to quickly identify the problem.
Let's say your web application service, mywebapp.service, isn't starting:
systemctl status mywebapp.service
Example Output Interpretation:
● mywebapp.service - My Web Application
Loaded: loaded (/etc/systemd/system/mywebapp.service; enabled; vendor preset: disabled)
Active: failed (Result: exit-code) since Mon 2023-10-26 10:30:05 UTC; 10s ago
Process: 12345 ExecStart=/usr/local/bin/mywebapp-start.sh (code=exited, status=1/FAILURE)
Main PID: 12345 (code=exited, status=1/FAILURE)
CPU: 10ms
Oct 26 10:30:05 hostname systemd[1]: Started My Web Application.
Oct 26 10:30:05 hostname mywebapp-start.sh[12345]: Error: Port 8080 already in use
Oct 26 10:30:05 hostname systemd[1]: mywebapp.service: Main process exited, code=exited, status=1/FAILURE
Oct 26 10:30:05 hostname systemd[1]: mywebapp.service: Failed with result 'exit-code'.
From this output, we can immediately see:
- The service mywebapp.service is in the failed state.
- It failed with Result: exit-code, meaning the ExecStart command exited with a non-zero status.
- The Process line shows that the command mywebapp-start.sh failed with status=1/FAILURE.
- Crucially, the log lines indicate: Error: Port 8080 already in use. This is a clear indicator of the problem.
This command is your first diagnostic tool, often pointing directly to the cause or narrowing down where to look next.
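If you check many hosts, it can help to pull the same facts out of the status text programmatically. The following Python sketch parses output shaped like the example above with regular expressions. The names summarize_status and StatusSummary are hypothetical, and because the status layout is written for people rather than scripts, treat this as fragile; systemctl show with explicit -p properties (used later in this guide) is the sturdier source for automation.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Patterns for the human-readable `systemctl status` layout shown above.
# The layout may vary between systemd versions, so this is a sketch.
ACTIVE_RE = re.compile(r"Active:\s+(\S+)(?:\s+\(Result:\s+([^)]+)\))?")
MAIN_PID_RE = re.compile(r"Main PID:\s+\d+\s+\(code=\w+,\s+status=([^)]+)\)")
# Journal lines: "Mon DD HH:MM:SS host identifier[pid]: message"
JOURNAL_RE = re.compile(r"^\w{3} \d{1,2} [\d:]{8} \S+ ([^\[:]+)(?:\[\d+\])?: (.*)$")


@dataclass
class StatusSummary:
    active_state: Optional[str]  # e.g. "failed"
    result: Optional[str]        # e.g. "exit-code"
    exit_status: Optional[str]   # e.g. "1/FAILURE"
    app_messages: List[str]      # journal lines not written by systemd itself


def summarize_status(text: str) -> StatusSummary:
    """Extract state, result, exit status, and application log lines."""
    active = ACTIVE_RE.search(text)
    main_pid = MAIN_PID_RE.search(text)
    app_messages = []
    for line in text.splitlines():
        match = JOURNAL_RE.match(line.strip())
        # systemd's own lines come from "systemd[1]"; everything else is the service.
        if match and match.group(1) != "systemd":
            app_messages.append(match.group(2))
    return StatusSummary(
        active_state=active.group(1) if active else None,
        result=active.group(2) if active else None,
        exit_status=main_pid.group(1) if main_pid else None,
        app_messages=app_messages,
    )
```

For the example output, the application message list contains only the port-conflict line, which is the one worth reading first.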
Diving Deep with journalctl
While systemctl status provides a quick summary, journalctl is your go-to command for detailed logging. It queries the systemd journal, which collects logs from all parts of the system, including services.
Basic Log Review
To view all logs for a specific service, including historical entries:
journalctl -u mywebapp.service
This will show all log entries associated with mywebapp.service. If the service fails repeatedly, you'll see entries from each failed attempt.
Filtering and Time-Based Queries
To narrow down the results, especially after a recent failure, you can use flags like --since and --priority:
- Show logs since a specific time:
journalctl -u mywebapp.service --since "10 minutes ago"
journalctl -u mywebapp.service --since "2023-10-26 10:00:00"
- Show only error-level messages or higher (err, crit, alert, and emerg):
journalctl -u mywebapp.service -p err
- Add -x for explanatory text and -e to jump to the end of the output:
journalctl -u mywebapp.service -xe --since "5 minutes ago"
-x can add explanatory text for some systemd messages. Treat those explanations as hints, not as a replacement for the unit-specific logs.
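journalctl can also print one JSON object per entry with -o json, which makes the -p err filter easy to reproduce or extend in a script. This Python sketch assumes that input format (for example piped from journalctl -u mywebapp.service -o json) and keeps entries whose PRIORITY is err (3) or more severe. The function name filter_by_priority is hypothetical.

```python
import json
from typing import Iterable, List, Tuple

# syslog priority levels as used by journalctl -p (lower number = more severe)
PRIORITIES = {"emerg": 0, "alert": 1, "crit": 2, "err": 3,
              "warning": 4, "notice": 5, "info": 6, "debug": 7}


def filter_by_priority(json_lines: Iterable[str],
                       max_level: str = "err") -> List[Tuple[int, str]]:
    """Keep (priority, message) pairs at max_level or more severe, like -p err."""
    limit = PRIORITIES[max_level]
    kept = []
    for raw in json_lines:
        raw = raw.strip()
        if not raw:
            continue
        entry = json.loads(raw)
        # Assumption: entries without PRIORITY are treated as info (6).
        priority = int(entry.get("PRIORITY", 6))
        message = entry.get("MESSAGE")
        if isinstance(message, list):
            # Non-UTF-8 messages are emitted as arrays of byte values.
            message = bytes(message).decode("utf-8", errors="replace")
        elif message is None:
            message = ""
        if priority <= limit:
            kept.append((priority, message))
    return kept
```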
Understanding Log Messages
Look for keywords like Error, Failed, Warning, or application-specific messages that indicate what went wrong. Pay attention to timestamps to understand the sequence of events leading up to the failure.
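When a unit fails repeatedly, the first matching line is usually more informative than the last, because later messages are often consequences of it. A small Python sketch of that idea, assuming chronological input such as the output of journalctl -u mywebapp.service -o short-iso. The keyword list mirrors the one above, and first_problem is a hypothetical helper.

```python
import re
from typing import Iterable, Optional, Tuple

# Keywords from the checklist above; add application-specific ones as needed.
KEYWORDS = re.compile(r"\b(error|failed|warning)\b", re.IGNORECASE)


def first_problem(lines: Iterable[str]) -> Tuple[Optional[str], int]:
    """Return the earliest line mentioning a keyword and how many lines matched.

    Assumes lines are already in chronological order, as journalctl prints them.
    """
    first = None
    count = 0
    for line in lines:
        if KEYWORDS.search(line):
            count += 1
            if first is None:
                first = line
    return first, count
```

A high count with an early first match is a hint that you are looking at a restart loop, where only the first occurrence tells the real story.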
Tip: If your service's ExecStart script prints to standard output or standard error, those messages are usually captured by journalctl. Ensure your scripts log descriptive error messages.
Inspecting the Unit File: The Blueprint of Your Service
Every systemd service is defined by a unit file (e.g., mywebapp.service). Misconfigurations in this file are a common source of startup failures. You need to understand what the service is trying to do.
Retrieving the Unit File
To view the active unit file for your service:
systemctl cat mywebapp.service
This command shows the exact unit file that systemd is using, including any overrides.
Key Directives to Check
Focus on the [Service] section for execution-related issues and [Unit] for dependencies.
- ExecStart: This is the command systemd executes to start your service. Verify that the path is correct, that the file is executable, and that the command runs successfully when invoked manually as the User specified in the unit.
ExecStart=/usr/local/bin/mywebapp-start.sh
- Type: Defines how systemd decides that startup has finished. Common types include:
  - simple (default): The process started by ExecStart is the main process, and systemd considers the service started as soon as it has been launched.
  - forking: ExecStart forks a child process and the parent exits. Systemd waits for the parent to exit and then treats the child as the main process.
  - oneshot: ExecStart runs to completion. Systemd waits for the command to exit before considering the start finished; without RemainAfterExit=yes, the unit returns to inactive once the command exits.
  - notify: The service sends a readiness notification to systemd (via sd_notify) when it is ready.
  An incorrect Type can lead systemd to think a service failed when it actually started, or vice versa.
- User/Group: The user and group under which the service runs. Permission issues often stem from the service trying to access files or resources this user has no rights to.
User=mywebappuser
Group=mywebappgroup
- WorkingDirectory: The directory the service executes from. Relative paths in the command's arguments, and relative paths the application opens, are resolved against it.
- Restart: Defines when the service should be restarted. If set to on-failure or always, a failing service might restart constantly, making it harder to catch the initial failure.
- TimeoutStartSec/TimeoutStopSec: How long systemd waits for the service to start or stop. If a service takes longer to initialize than TimeoutStartSec, systemd kills it and reports a failure.
Common Unit File Issues
- Incorrect paths: A typo in ExecStart or other file paths.
- Missing environment variables: Services often require specific environment variables (e.g., PATH) that might not be present in systemd's clean environment (see below).
- Permissions: The specified User doesn't have execute permission for the script or read/write permission for necessary data files.
- Syntax errors: Simple typos in the unit file itself.
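Path and permission mistakes in ExecStart= can be caught before a restart. The Python sketch below is a deliberately minimal unit-file reader (it ignores line continuations and specifiers such as %i) plus a check that each ExecStart= executable exists and is executable. parse_unit and check_exec_start are hypothetical names, and os.access answers for the user running the script, not for the unit's User=, so run it as that user for a faithful result.

```python
import os
import shlex
from typing import Dict, List


def parse_unit(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Minimal unit-file reader: section -> key -> list of values.

    Repeated keys are kept in order because systemd allows several
    ExecStart=/Environment= lines. Continuations and specifiers are not handled.
    """
    sections: Dict[str, Dict[str, List[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            key, value = line.split("=", 1)
            current.setdefault(key.strip(), []).append(value.strip())
    return sections


def check_exec_start(text: str) -> List[str]:
    """Report ExecStart= executables that are missing or not executable."""
    commands: List[str] = []
    for value in parse_unit(text).get("Service", {}).get("ExecStart", []):
        # An empty ExecStart= (common in drop-ins) clears earlier values.
        commands = [] if not value else commands + [value]

    problems = []
    for cmd in commands:
        # Strip common special prefixes such as "-" (ignore failure) or "+".
        tokens = shlex.split(cmd.lstrip("-@:+!"))
        if not tokens:
            continue
        binary = tokens[0]
        if not os.path.isabs(binary):
            problems.append(f"{binary}: not absolute; systemd will not use your shell's PATH")
        elif not os.path.exists(binary):
            problems.append(f"{binary}: does not exist")
        elif not os.access(binary, os.X_OK):
            problems.append(f"{binary}: not executable by the current user")
    return problems
```

Feeding it the output of systemctl cat mywebapp.service checks the merged unit, including any drop-in that replaces ExecStart=.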
To test ExecStart manually:
Switch to the service's user and try running the command directly:
sudo -u mywebappuser /usr/local/bin/mywebapp-start.sh
This often reproduces the error seen in journalctl directly in your terminal, making debugging easier.
Dependency Management: When Services Can't Start Alone
Services often rely on other services or system components to be active before they can start themselves. Systemd uses Wants, Requires, After, and Before directives to manage these dependencies.
Identifying Dependencies
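Use systemctl list-dependencies <service_name> to see the tree of units a service pulls in through Requires=, Wants=, and similar directives, including default dependencies systemd adds on its own. Add --after to see ordering dependencies instead.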
systemctl list-dependencies mywebapp.service
Common directives in the [Unit] section:
- After=: Orders this service after the listed units. It only affects ordering and does not pull those units in. If a listed unit fails, this service will still attempt to start (unless Requires= is also used).
- Requires=: Pulls in the listed units. If any of them fail to start, this service will not start, provided it is also ordered After= them; with Requires= alone, both units start in parallel and the failure does not reliably stop this service.
- Wants=: A weaker form of Requires=. If a wanted unit fails, this service will still attempt to start.
Example:
[Unit]
Description=My Web Application
After=network.target mysql.service
Requires=mysql.service
Here, mywebapp.service is ordered after network.target and mysql.service, and it requires mysql.service to be started successfully. If mysql.service fails, mywebapp.service will not start.
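The difference between ordering and requirement is easier to see in a toy model. This Python sketch walks units in After= order using the standard-library graphlib module and marks a unit as "dependency failed" only when a unit it both Requires= and is ordered After= did not start. It is an illustration under simplifying assumptions, not how systemd's transaction engine works, and start_plan is a hypothetical name.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set


def start_plan(after: Dict[str, Set[str]], requires: Dict[str, Set[str]],
               failed: Set[str]) -> Dict[str, str]:
    """Toy model of Requires=/After= outcomes.

    after[u]    = units that u is ordered After=
    requires[u] = units that u Requires=
    failed      = units whose own start command fails
    """
    units = set(after) | set(requires) | set(failed)
    for deps in list(after.values()) + list(requires.values()):
        units |= deps
    # TopologicalSorter expects node -> predecessors, which is exactly After=.
    graph = {unit: after.get(unit, set()) for unit in units}

    outcome: Dict[str, str] = {}
    for unit in TopologicalSorter(graph).static_order():
        # Only a required unit that is also ordered earlier can block this one.
        blocking = requires.get(unit, set()) & after.get(unit, set())
        if any(outcome[dep] != "started" for dep in blocking):
            outcome[unit] = "dependency failed"
        elif unit in failed:
            outcome[unit] = "failed"
        else:
            outcome[unit] = "started"
    return outcome
```

With the example [Unit] section and a failing mysql.service, the model reports mywebapp.service as "dependency failed". If you drop mysql.service from the After= set, the model lets mywebapp.service start, because systemd starts units related only by Requires= in parallel rather than waiting on the outcome.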
Resolving Dependency Conflicts
If a service fails due to a dependency issue, journalctl will usually indicate which dependency couldn't be met. For example, it might state Dependency failed for My Web Application followed by details about mysql.service's failure.
Steps to resolve:
- Check the dependency itself: Run systemctl status <dependency> (e.g., systemctl status mysql.service) and journalctl -u <dependency> to troubleshoot its failure first.
- Verify After= and Requires= directives: Ensure they correctly reflect the desired startup order and strictness. Sometimes a service needs to wait for a specific port to be open, not just for another unit's start job to finish. For narrow checks, ExecStartPre= can help. For network daemons, socket activation or application-level retry logic is often more reliable.
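For the narrow "wait until the port is open" case, an ExecStartPre= helper can poll the port with a timeout. A minimal Python sketch; the script name, its location, and the argument order are assumptions for illustration. Note that an open port only shows that something is listening, not that the dependency is fully ready.

```python
import socket
import time
from typing import List


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Poll host:port until a TCP connect succeeds (True) or timeout expires (False)."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Each attempt gets its own short timeout so a silent host cannot hang us.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def main(argv: List[str]) -> int:
    """Entry point for ExecStartPre=: argv is [host, port, timeout_seconds]."""
    host, port, timeout = argv[0], int(argv[1]), float(argv[2])
    return 0 if wait_for_port(host, port, timeout) else 1
```

Saved as a hypothetical /usr/local/bin/wait-for-port.py with import sys and raise SystemExit(main(sys.argv[1:])) added at the bottom, it could be wired in as ExecStartPre=/usr/bin/python3 /usr/local/bin/wait-for-port.py 127.0.0.1 3306 30. A non-zero exit makes systemd treat the start as failed, so keep the timeout below TimeoutStartSec.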
Environment Variables and Paths: The Hidden Gotchas
Systemd services run in a very clean and minimal environment. This often leads to issues where commands that work perfectly in a user's shell fail when run by systemd because crucial environment variables (like PATH) are missing.
Systemd's Clean Environment
When systemd starts a system service, it doesn't inherit the environment of the user who ran systemctl start. The PATH variable, for instance, is set to a short fixed default (typically /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin), so commands like python or node might not be found if they are installed outside those standard locations, for example in a virtualenv or under /opt.
Symptom: ExecStart=/usr/local/bin/myscript.sh fails with python: command not found, node: command not found, a missing library error, or an application message saying a required setting is empty.
Fix: Make the service environment explicit.
[Service]
WorkingDirectory=/opt/mywebapp
Environment="APP_ENV=production"
Environment="PATH=/opt/mywebapp/venv/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"
ExecStart=/opt/mywebapp/venv/bin/gunicorn app:app
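To confirm a PATH problem without restarting anything, compare where a command resolves under your shell's PATH and under the service's. This Python sketch uses shutil.which with an explicit path. The SERVICE_PATH value is an assumption (the example unit's PATH without the virtualenv entry), so replace it with whatever your unit actually sets; path_mismatches is a hypothetical name.

```python
import os
import shutil
from typing import Dict, Iterable, Optional, Tuple

# Assumed service PATH: the example unit's PATH without the virtualenv entry.
SERVICE_PATH = "/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin"


def path_mismatches(
    commands: Iterable[str],
    service_path: str = SERVICE_PATH,
    shell_path: Optional[str] = None,
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Return {command: (service_hit, shell_hit)} for commands that resolve differently."""
    if shell_path is None:
        shell_path = os.environ.get("PATH", "")
    report = {}
    for cmd in commands:
        in_service = shutil.which(cmd, path=service_path)
        in_shell = shutil.which(cmd, path=shell_path)
        if in_service != in_shell:
            report[cmd] = (in_service, in_shell)
    return report
```

Calling path_mismatches(["python", "node", "gunicorn"]) from your login shell lists each command that your shell finds somewhere the service would not, or finds a different binary for.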
For many variables, use an environment file:
[Service]
EnvironmentFile=/etc/mywebapp/mywebapp.env
ExecStart=/opt/mywebapp/bin/server
Keep that file simple. EnvironmentFile= is not a Bash script. Use KEY=value lines, not export KEY=value, command substitution, or shell conditionals. Also set restrictive permissions if the file contains secrets:
sudo chown root:mywebappgroup /etc/mywebapp/mywebapp.env
sudo chmod 0640 /etc/mywebapp/mywebapp.env
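A quick lint can catch shell-isms before systemd misreads the file. The Python sketch below flags export, command substitution, shell control flow, and lines that are not KEY=value assignments, and warns if the file is world-readable. The rule list is a hypothetical starting point based on the guidance above, not a reimplementation of systemd's parser, and lint_env_file is a hypothetical name.

```python
import os
import re
import stat
from typing import List

# Patterns that suggest the file was written as a shell script.
SHELLISMS = [
    (re.compile(r"^export\s+"), "uses 'export'; write KEY=value"),
    (re.compile(r"\$\(|`"), "uses command substitution, which is not executed"),
    (re.compile(r"^(if|then|else|fi|for|while)\b"), "looks like shell control flow"),
]
KEY_VALUE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*=")


def lint_env_file(path: str) -> List[str]:
    problems = []
    with open(path, encoding="utf-8") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.strip()
            if not line or line.startswith(("#", ";")):
                continue  # blank lines and comments are fine
            for pattern, message in SHELLISMS:
                if pattern.search(line):
                    problems.append(f"line {lineno}: {message}")
                    break
            else:
                if not KEY_VALUE.match(line):
                    problems.append(f"line {lineno}: not a KEY=value assignment")
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & stat.S_IROTH:
        problems.append(f"file is world-readable (mode {mode:04o})")
    return problems
```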
Permissions: Reproduce the Failure as the Service User
Permissions problems are common because manual testing often happens as root or as your login user, while the unit runs as a dedicated service account.
Check the configured user:
systemctl show mywebapp.service -p User -p Group
Then run the same command as that user:
sudo -u mywebappuser /usr/local/bin/mywebapp-start.sh
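If the app needs a working directory, include it. Use a plain shell rather than a login shell (bash -l), because login shells read profile files that systemd never loads:
sudo -u mywebappuser sh -c 'cd /opt/mywebapp && /usr/local/bin/mywebapp-start.sh'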
Look beyond the executable. The service user may need read access to /etc/mywebapp/config.yml, write access to /var/lib/mywebapp, execute access on every parent directory, or permission to create a Unix socket under /run/mywebapp. A quick check can save a lot of guessing:
sudo -u mywebappuser test -r /etc/mywebapp/config.yml
sudo -u mywebappuser test -w /var/lib/mywebapp
namei -l /var/lib/mywebapp/uploads
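namei -l shows the permission bits on every path component; the same traversal check can be scripted. This Python sketch walks from / down to the target and reports directories whose execute (search) bit is missing for a given uid and group set. It only considers classic owner/group/other bits (no ACLs, capabilities, or SELinux/AppArmor), and blocked_components is a hypothetical name.

```python
import os
import stat
from typing import List, Set


def blocked_components(path: str, uid: int, gids: Set[int]) -> List[str]:
    """Return directories on the way to `path` that uid/gids cannot traverse.

    Only classic mode bits are checked, so an empty result means
    "probably fine", not "proven fine".
    """
    if uid == 0:
        return []  # root normally bypasses these permission bits

    # Collect the path itself and every parent up to "/".
    chain = []
    current = os.path.abspath(path)
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:
            break
        current = parent

    blocked = []
    for directory in reversed(chain):
        if not os.path.isdir(directory):
            continue  # the final component may be a regular file
        st = os.stat(directory)
        if st.st_uid == uid:
            bit = stat.S_IXUSR
        elif st.st_gid in gids:
            bit = stat.S_IXGRP
        else:
            bit = stat.S_IXOTH
        if not st.st_mode & bit:
            blocked.append(directory)
    return blocked
```

To check the service account, look up its IDs with pwd.getpwnam("mywebappuser") and os.getgrouplist, then call blocked_components("/var/lib/mywebapp/uploads", uid, set(gids)).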
If the service fails only when binding to a low port such as 80 or 443, do not immediately run it as root. A reverse proxy, socket activation, or a targeted capability such as AmbientCapabilities=CAP_NET_BIND_SERVICE may be safer, depending on the service.
Start Limits and Restart Loops
A service that crashes repeatedly may stop with a message like start request repeated too quickly. That means systemd's rate limit kicked in. The original failure happened earlier, so do not focus only on the rate-limit message.
Use:
journalctl -u mywebapp.service --since "30 minutes ago"
systemctl show mywebapp.service -p NRestarts -p Restart -p StartLimitBurst -p StartLimitIntervalUSec
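To see why a crash loop trips the limit, it can help to model it. This Python sketch assumes a fixed window: the window opens at the first attempt, allows up to burst starts within interval seconds, and then resets. It is a simplification of systemd's rate limiting, not its exact bookkeeping; plug in the burst and interval reported by systemctl show, converting the interval to seconds. simulate_start_limit is a hypothetical name.

```python
from typing import List


def simulate_start_limit(attempts: List[float], burst: int, interval: float) -> List[bool]:
    """Simplified fixed-window model of start rate limiting.

    attempts: start times in seconds. Returns, per attempt (sorted by time),
    whether the start would be allowed.
    """
    allowed = []
    window_start = None
    count = 0
    for t in sorted(attempts):
        # Open a new window on the first attempt or once the old one has expired.
        if window_start is None or t - window_start > interval:
            window_start, count = t, 0
        count += 1
        allowed.append(count <= burst)
    return allowed
```

A service that crashes once per second with a burst of 5 in a 10-second window is refused on the sixth attempt, which is when start request repeated too quickly appears, long after the first real error.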
After fixing the root cause, clear the failed state:
sudo systemctl reset-failed mywebapp.service
sudo systemctl start mywebapp.service
Be careful with Restart=always. It is useful for resilient daemons, but during debugging it can flood the journal and hide the first clear error. You can temporarily stop the unit, review the logs, and start it manually once you have changed one thing.
Validate the Unit Before Reloading
Before you restart a service after editing a unit file, validate the file and reload systemd:
sudo systemd-analyze verify /etc/systemd/system/mywebapp.service
sudo systemctl daemon-reload
sudo systemctl restart mywebapp.service
If the service has drop-in overrides, inspect the merged version:
systemctl cat mywebapp.service
systemctl show mywebapp.service -p FragmentPath -p DropInPaths -p ExecStart
This catches the awkward cases: you edited a file under /usr/lib/systemd/system, but a drop-in under /etc/systemd/system/mywebapp.service.d/override.conf still changes ExecStart; or you fixed a copied unit file that is not the one systemd loaded.
A Practical Order of Operations
When a production service is down, use a short, repeatable loop:
- Run systemctl status mywebapp.service --no-pager.
- Read journalctl -u mywebapp.service --since "15 minutes ago".
- Inspect systemctl cat mywebapp.service.
- Check the command, user, working directory, environment, and dependencies.
- Reproduce the command as the service user.
- Make one change.
- Run systemctl daemon-reload if the unit changed.
- Restart and check the journal again.
That order keeps the investigation grounded. If the journal says Permission denied, fix permissions. If it says No such file or directory, check paths from systemd's point of view. If it says Dependency failed, debug the dependency first. If it says the process exited with status 0/SUCCESS but the service is failed, check Type= and whether the application daemonizes or exits immediately.
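The matching step can be written down as a small lookup table. This Python sketch encodes the examples from the paragraph above, plus the rate-limit and command not found cases from earlier sections. The rules and the triage name are illustrative, so extend them with your application's own messages.

```python
from typing import List, Tuple

# (substring to look for, layer to investigate), checked in order.
# The rate-limit rule comes first because that message hides an earlier failure.
RULES: List[Tuple[str, str]] = [
    ("start request repeated too quickly", "restart loop: scroll back to the first failure"),
    ("Dependency failed", "dependency: debug the required unit first"),
    ("Permission denied", "permissions: reproduce as the service user"),
    ("No such file or directory", "paths: check them from systemd's point of view"),
    ("command not found", "environment: PATH or missing variables"),
    ("status=0/SUCCESS", "Type=: the process may daemonize or exit immediately"),
]


def triage(message: str) -> str:
    """Map a journal message to the layer most likely responsible (case-insensitive)."""
    lowered = message.lower()
    for needle, layer in RULES:
        if needle.lower() in lowered:
            return layer
    return "unknown: read the surrounding journal lines"
```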
The goal is not to memorize every systemd directive. It is to keep matching the failure message to the layer that produced it.