Understanding Exit Codes: Effective Error Handling with $? and exit

Use Bash exit codes, $?, exit, set -e, and pipefail to make script failures clear and controlled.

When a Bash script fails, its exit code tells the caller what to do next: continue, retry, alert, or stop. Understanding exit codes, $?, and exit is the difference between automation that hides failures and automation that reports them clearly.

This guide shows how Bash tracks command status and how you can use that status for simple, reliable error handling.

The Concept of Exit Statuses

Every command or program executed in a Unix-like shell environment—whether it's a built-in command like cd, an external utility like grep, or another shell script—returns an integer value upon completion. This integer is the exit code, which signals the outcome of the operation to the calling process.

The Standard Convention

Unix-like systems follow a common convention for exit codes:

  • 0 (Zero): Signifies success. The command executed exactly as expected, and no errors occurred.
  • 1 to 255: Signify failure or specific error conditions. These non-zero values indicate that something went wrong. Higher numbers often correspond to specific types of errors (e.g., file not found, permission denied, syntax error), though the exact meaning depends on the specific program.

Note on Range: Exit codes are 8-bit values (0-255), though most shell scripts only distinguish 0 for success from non-zero for failure. A value outside that range is reduced modulo 256 before the caller sees it.
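As a quick check of that wrap-around, the sketch below runs exit in child Bash processes (so the current shell keeps running) and records what the parent sees. The variable names are illustrative only.

```shell
# Each exit runs in a child bash process; "|| var=$?" records a non-zero status.
in_range=0;      bash -c 'exit 3'   || in_range=$?
wrapped_256=0;   bash -c 'exit 256' || wrapped_256=$?
wrapped_300=0;   bash -c 'exit 300' || wrapped_300=$?

echo "exit 3   -> $in_range"      # 3
echo "exit 256 -> $wrapped_256"   # 0, because 256 mod 256 = 0
echo "exit 300 -> $wrapped_300"   # 44, because 300 mod 256 = 44
```

Note that exit 256 looks like success to the caller, which is one reason to keep custom codes well inside the 1-255 range.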

Inspecting the Last Exit Code: The $? Variable

The special shell variable $? (dollar question mark) is central to monitoring command status. Immediately after any command executes, the shell stores its exit code in $?.

How to Use $?

You must check $? immediately after the command you are interested in, as any subsequent command (even echoing the variable) will overwrite its value.
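A minimal sketch of why the order matters: save $? into a variable of your own (status and after_echo are just illustrative names) before running anything else.

```shell
false                           # 'false' does nothing and exits with status 1
status=$?                       # capture it right away
echo "Checking the result..."   # this echo succeeds, so $? is now 0
after_echo=$?

echo "saved status: $status"             # saved status: 1
echo "\$? after the echo: $after_echo"   # $? after the echo: 0
```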

Example 1: Checking Success and Failure

# 1. A successful command
echo "Success test" > /dev/null
echo "Exit code for success: $?"

# 2. A failing command (e.g., trying to list a non-existent file)
ls /non/existent/path
echo "Exit code for failure: $?"

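Expected Output (with GNU ls; the message and the failure code vary by platform, for example BSD and macOS ls exit with 1):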

Exit code for success: 0
ls: cannot access '/non/existent/path': No such file or directory
Exit code for failure: 2

Implementing Conditional Error Checking

Simply knowing the exit code isn't enough; the power comes from using this information to control script flow. This is typically done using if statements or short-circuit operators (&& and ||).

Using if Statements

This is the most explicit way to handle errors:

if grep -q "important data" logfile.txt; then
    echo "Data found successfully."
else
    # $? still holds grep's status here: 1 means no match, 2 or higher means grep itself failed
    LAST_STATUS=$?
    echo "Error: grep exited with status $LAST_STATUS. Data not found." >&2
    # Consider exiting here if the script cannot proceed
fi

In the example above, grep -q suppresses output (-q) and returns 0 only if a match is found. The if structure checks the exit status automatically, but explicitly capturing $? inside the else block is useful for detailed logging.
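Because grep uses 1 for "no match" and a higher status for real errors such as an unreadable file, it can help to treat those cases separately. The following is a sketch only; check_pattern is a hypothetical helper, not a standard command.

```shell
# check_pattern is a hypothetical helper: it reports whether PATTERN occurs
# in FILE and passes grep's own exit status back to the caller.
check_pattern() {
    local pattern=$1 file=$2 status=0
    # "|| status=$?" records a non-zero status from grep
    grep -q -- "$pattern" "$file" 2>/dev/null || status=$?
    case $status in
        0) echo "match found" ;;
        1) echo "no match" ;;
        *) echo "grep could not search $file (status $status)" >&2 ;;
    esac
    return "$status"
}

# Demo against a throwaway file
demo_log=$(mktemp)
echo "important data" > "$demo_log"
check_pattern "important data" "$demo_log"                              # prints: match found
check_pattern "missing text" "$demo_log" || echo "helper returned $?"   # no match, then: helper returned 1
rm -f "$demo_log"
```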

Using Short-Circuit Logic (&& and ||)

For simple sequential checks, short-circuit operators provide concise error handling:

  • && (AND): The command following && only executes if the preceding command succeeded (returned 0).
  • || (OR): The command following || only executes if the preceding command failed (returned non-zero).

Example 2: Concise Error Handling

# 1. Only run 'process_data.sh' IF 'fetch_data.sh' succeeds
./fetch_data.sh && ./process_data.sh

# 2. Log a message ONLY IF the primary operation fails
rsync -a source/ dest/ || echo "RSync failed on $(date)" >> /var/log/rsync_errors.log
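When the failure branch needs more than one command, a brace group keeps them together. This is a minimal sketch that assumes a hypothetical helper named backup_file. It uses return so it can be called from a larger script; a top-level script would typically use exit in the same position.

```shell
# backup_file is a hypothetical helper that copies SRC to DEST.
backup_file() {
    local src=$1 dest=$2
    cp -- "$src" "$dest" 2>/dev/null || {
        # Both commands inside the braces run only when cp fails
        echo "Error: could not copy $src to $dest" >&2
        return 1
    }
    echo "Backed up $src"
}
```

One caution: a && b || c is not a substitute for if/else, because c also runs when a succeeds and b fails.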

Controlling Script Termination with exit

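The exit command immediately terminates the current shell, and with it the running script, and returns the specified exit status to the caller (which might be another script or the user's terminal). This holds even when exit is called inside a function; to leave only the function, use return.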
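The difference between exit and return inside a function can be seen in a small sketch. The function names are made up for illustration, and the subshell is there only so that the demonstration does not end your current shell.

```shell
leave_function() {
    return 3      # ends only this function; the caller keeps running
}

leave_shell() {
    exit 4        # ends the whole shell that called the function
}

return_status=0
leave_function || return_status=$?
echo "return gave $return_status; the script is still running"

# Parentheses start a subshell, so exit ends only that subshell here.
subshell_status=0
( leave_shell; echo "never printed" ) || subshell_status=$?
echo "subshell ended with $subshell_status"
```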

Syntax and Usage

The syntax is simply exit [status_code].

If no status is provided, exit uses the status of the most recently executed command. If exit is called with no status before any command has run, the script returns 0.
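To see the default in action, the sketch below runs three tiny scripts in child Bash processes and records the status each one hands back. The variable names are illustrative.

```shell
# Each snippet runs in a child bash process so the current shell survives.
after_false=0; bash -c 'false; exit'   || after_false=$?
after_true=0;  bash -c 'true; exit'    || after_true=$?
explicit=0;    bash -c 'false; exit 7' || explicit=$?

echo "exit after false: $after_false"   # 1, inherited from 'false'
echo "exit after true:  $after_true"    # 0, inherited from 'true'
echo "explicit exit 7:  $explicit"      # 7, the argument wins
```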

Example 3: Exiting on Pre-Condition Failure

This script ensures a required configuration file exists before proceeding.

CONFIG_FILE="/etc/app/config.conf"

if [[ ! -f "$CONFIG_FILE" ]]; then
    # Send the error message to stderr so it is not mixed into normal output
    echo "Error: Configuration file not found at $CONFIG_FILE." >&2
    # Terminate script immediately with a specific error code (e.g., 20)
    exit 20
fi

echo "Configuration loaded. Continuing script..."
# ... rest of script
exit 0

Best Practice: Using Meaningful Exit Codes

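While 0 and 1 cover most basic cases, using different non-zero codes helps the calling script diagnose the exact problem. Bash itself reports 126 (command found but not executable), 127 (command not found), and 128 plus the signal number (terminated by a signal), so keep custom codes clear of those values. For example: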

Code    Meaning (Example)
0       Success
1       General catch-all error
2-10    Syntax errors, argument parsing issues
20      Missing prerequisite (e.g., file not found)
30      Permission issue
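One way to keep such codes readable is to give them names and let the caller branch on them. The sketch below reuses the example values 20 and 30 from the table; the variable names and the check_config and report_config helpers are assumptions made for illustration.

```shell
# Hypothetical names for the example codes in the table above.
EXIT_OK=0
EXIT_MISSING_PREREQ=20
EXIT_PERMISSION=30

# check_config is an illustrative helper, not part of any real tool.
check_config() {
    local file=$1
    [[ -e $file ]] || return "$EXIT_MISSING_PREREQ"
    [[ -r $file ]] || return "$EXIT_PERMISSION"
    return "$EXIT_OK"
}

# The caller branches on the specific code and passes it along.
report_config() {
    local status=0
    check_config "$1" || status=$?
    case $status in
        "$EXIT_OK")             echo "config ok" ;;
        "$EXIT_MISSING_PREREQ") echo "config missing" ;;
        "$EXIT_PERMISSION")     echo "config unreadable" ;;
        *)                      echo "unexpected status $status" ;;
    esac
    return "$status"
}
```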

Making Scripts Fail Fast: The set Command

For complex scripts, a common practice is to enable stricter error checking globally with set options at the top of the script:

#!/bin/bash

# Exit immediately if a command exits with a non-zero status.
set -e

# Treat unset variables as an error when substituting.
set -u

# Pipefail: a pipeline returns the status of the rightmost command that failed, or 0 if every command succeeded.
set -o pipefail

# (Optional but helpful) Print commands as they are executed for debugging
# set -x

# With set -e, the script stops at the first of these commands that fails.
# They are on separate lines because set -e ignores failures inside an
# && or || list, except for the command after the final && or ||.
ls /valid/path
grep pattern file.txt
./next_step.sh

# The following line will ONLY run if all preceding commands succeeded.
echo "All steps complete."

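When set -e is active, many unhandled non-zero statuses stop the script before later commands run on bad assumptions. It has exceptions, though: it does not trigger for a command tested by if, while, or until, for a command negated with !, for any command in an && or || list except the last one, or for any command in a pipeline except the last one (unless pipefail is set). So still handle expected failures explicitly.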
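These behaviors are easy to check in isolation. The sketch below runs each case in a child Bash process so that set -e never affects your current shell; the variable names are illustrative.

```shell
# 1. A plain failing command stops a set -e script.
plain=0
bash -c 'set -e; false; echo "never printed"' || plain=$?
echo "plain failure: status $plain"          # plain failure: status 1

# 2. A failure inside an && list (not the last command) does not stop it.
list_out=$(bash -c 'set -e; false && true; echo "still running"')
echo "after && list: $list_out"              # after && list: still running

# 3. Without pipefail, only the last command in a pipeline counts.
no_pipefail=0
bash -c 'false | true' || no_pipefail=$?
with_pipefail=0
bash -c 'set -o pipefail; false | true' || with_pipefail=$?
echo "false | true: $no_pipefail without pipefail, $with_pipefail with it"   # 0 and 1
```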

For example, grep returns 1 when it finds no match. That may be a normal result, not a fatal error:

if grep -q "READY" status.txt; then
    echo "Service is ready."
else
    echo "Service is not ready yet."
fi
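When you need the actual status of a command that is allowed to fail, one common pattern under set -e is to capture it with ||. This is a sketch under stated assumptions: the temporary file stands in for status.txt, and grep_status and state are illustrative names.

```shell
set -e

status_file=$(mktemp)          # throwaway file standing in for status.txt
echo "STARTING" > "$status_file"

# "|| grep_status=$?" makes the non-zero status part of an || list,
# so set -e does not stop the script, and the value is kept for later.
grep_status=0
grep -q "READY" "$status_file" || grep_status=$?

case $grep_status in
    0) state="ready" ;;
    1) state="not ready" ;;
    *) state="unknown" ;;      # grep itself failed (e.g., unreadable file)
esac
echo "Service state: $state"   # Service state: not ready

rm -f "$status_file"
set +e                         # restore the default for anything that follows
```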

Takeaway

Check critical commands where they run, write errors to stderr, and exit with a non-zero status when the script cannot continue safely. Use set -euo pipefail for fail-fast scripts, but do not rely on it as your only error-handling strategy.
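These habits can be bundled into a small helper. The sketch below is one possible shape, not a standard: die and require_file are hypothetical names, and the status 20 simply reuses the example value from earlier in this guide.

```shell
# die is a hypothetical helper: print a message to stderr, then exit
# with the given status.
die() {
    local status=$1
    shift
    echo "Error: $*" >&2
    exit "$status"
}

# require_file is also illustrative; it stops the script when FILE is missing.
require_file() {
    [[ -f $1 ]] || die 20 "required file not found: $1"
}
```

A script would call require_file with the path it depends on near the top, before doing any work that assumes the file exists.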