Advanced Bash Scripting: Best Practices for Error Handling

Improve Bash error handling with strict mode, explicit checks, cleanup traps, clear exit codes, and stderr logging.

Advanced Bash Scripting: Best Practices for Error Handling

Bash scripting error handling is what keeps a small automation mistake from becoming a messy production problem. If a backup fails, an API call returns an error, or a temp file is left behind, your script should stop clearly and leave the system in a known state.

Use these patterns when your script changes files, deploys code, talks to remote services, or runs without someone watching the terminal.

The Foundation: Understanding Exit Codes

Every command executed in Bash, whether successful or failed, returns an exit status (or exit code). This is the fundamental mechanism for signaling command outcomes.

  • Exit Code 0: Indicates successful execution. By convention, zero means success.
  • Exit Code 1-255 (Non-zero): Indicates an error, failure, or warning. Specific non-zero codes often denote specific error types (e.g., 1 generally means generic error, 2 often means misuse of shell command).

The most recent exit status is stored in the special variable $?.

# Successful command
ls /tmp
echo "Status: $?"
# Status: 0

# Failed command (non-existent file)
cat /nonexistent_file
echo "Status: $?"
# Status: 1 (or higher, depending on the error)

Mandatory Best Practice: Implementing Strict Mode

For any serious Bash script, three directives should be placed immediately after the shebang line. Collectively, these are often called "strict mode." They push the script toward failing early instead of continuing after a broken prerequisite.

1. Exit Immediately on Error (set -e)

The set -e or set -o errexit command instructs Bash to immediately exit the script if any command exits with a non-zero status. This prevents cascading failures.

Warning: set -e is ignored in conditional tests (if, while) or if a command is part of an && or || list. The failure status must be explicitly used by the surrounding structure.

2. Treat Unset Variables as Errors (set -u)

The set -u or set -o nounset command causes the script to exit immediately if it attempts to use a variable that has not been set (e.g., mistyping $FIELNAME instead of $FILENAME). This prevents hard-to-debug bugs resulting from empty or unintended variables.

3. Handle Errors in Pipelines (set -o pipefail)

By default, if a series of commands is piped together (e.g., cmd1 | cmd2 | cmd3), Bash reports only the exit status of the last command (cmd3). If cmd1 fails, the script might continue executing successfully.

set -o pipefail ensures that the exit status of the pipeline is the exit status of the last command that failed, or zero if all commands succeeded. This is critical for reliable data processing.

Standard Strict Mode Header

Always start advanced scripts with this robust header:

#!/bin/bash

set -euo pipefail

Some older templates also set IFS=$'\n\t'. Use that only when you understand how it affects word splitting in the rest of the script. Quoting variables and reading input with while IFS= read -r line is usually clearer.

Conditional Error Checking

While set -e handles unexpected errors, you often need to check specific conditions or provide custom error messages.

Using if Statements and Custom Functions

Instead of relying solely on set -e, use if blocks to handle known potential failures gracefully and provide descriptive output.

# Define a custom error function for consistency
error_exit() {
    printf '[FATAL] %s\n' "$1" >&2
    exit 1
}

TEMP_DIR="/tmp/data_processing_$(date +%s)"

# Check if directory creation was successful
if ! mkdir -p "$TEMP_DIR"; then
    error_exit "Failed to create temporary directory: $TEMP_DIR"
fi

echo "Temporary directory created successfully: $TEMP_DIR"

# Example of checking if a file exists before processing
FILE_TO_PROCESS="input.csv"

if [[ ! -f "$FILE_TO_PROCESS" ]]; then
    error_exit "Input file not found: $FILE_TO_PROCESS"
fi

Short-Circuit Logic (&& and ||)

For simple, sequential operations, use short-circuit operators. This is highly readable and concise.

  • Success Chain (&&): The second command runs only if the first succeeds.
  • Failure Catch (||): The second command runs only if the first fails.
# Execute setup and then process, failing if setup fails
setup_environment && process_data

# Try to connect, otherwise exit gracefully with a message
ssh user@server || { echo "Connection failed, check network settings." >&2; exit 2; }

Graceful Termination and Cleanup with trap

The trap command allows the script to catch signals (like Ctrl+C, system termination, or script exit) and execute a specified command or function before terminating. This is essential for cleanup tasks.

The cleanup Function

Define a dedicated function to reverse any changes (e.g., deleting temporary files, resetting configurations) and use trap to ensure it runs regardless of how the script ends.

# Global variable for cleanup function to check
TEMP_FILE=""

cleanup() {
    printf '%s\n' "--- Running Cleanup Procedures ---"
    if [[ -f "$TEMP_FILE" ]]; then
        rm -f "$TEMP_FILE"
        echo "Deleted temporary file: $TEMP_FILE"
    fi
    # Optionally, provide final exit status report
}

# 1. Trap EXIT: Runs cleanup regardless of success, failure, or signal.
trap cleanup EXIT

# 2. Trap signals (INT=Ctrl+C, TERM=Kill signal)
trap 'printf "%s\n" "Script interrupted by user or system signal." >&2; exit 130' INT
trap 'printf "%s\n" "Script terminated." >&2; exit 143' TERM

# --- Main Script Logic ---
TEMP_FILE=$(mktemp)
echo "Temporary content" > "$TEMP_FILE"
# If the script fails or is interrupted here, cleanup() is guaranteed to run

Why Use trap cleanup EXIT?

Setting a trap on EXIT guarantees that the cleanup function will run whether the script finishes normally (exit 0), explicitly exits with an error (exit 1), or is forced to terminate due to set -e.

Advanced Error Reporting

Standard error messages (command not found) often lack context. Advanced scripts should report what failed, where it failed, and why.

Logging Line Numbers

When a function like error_exit is called, you can determine the line number within the script where the error occurred using the BASH_LINENO array or the caller command (though caller is often restricted to functions).

For simple reporting outside of functions, use the LINENO variable:

# Example of immediate failure reporting
(some_risky_command) || {
    echo "[ERROR $LINENO] some_risky_command failed with status $?" >&2
    exit 3
}

Distinguishing Output

Always send informational messages to standard output (stdout) and error/warning messages to standard error (stderr). This is crucial if your script's output is being piped to another program or logged externally.

  • echo "Informational message" (goes to stdout)
  • echo "[WARNING] Configuration override" >&2 (goes to stderr)

Put the Pattern Together

For most production scripts, the practical pattern is simple: start with set -euo pipefail, validate inputs before doing work, wrap expected failures in if ! command; then ...; fi, and add trap cleanup EXIT before you create temporary state.

That gives you useful failures instead of mystery failures. The next time a job breaks at 2 a.m., the log should show what failed, where to look, and whether cleanup ran.