Troubleshooting Common Bash Script Configuration Issues
Master the art of troubleshooting configuration issues in Bash scripts. This guide details essential debugging techniques, focusing on environmental dependencies, common syntax pitfalls like improper quoting and word splitting, and critical execution failures. Learn how to use robust flags (`set -euo pipefail`), handle argument parsing errors, and resolve common problems like DOS line endings and incorrect PATH variables, ensuring your automation scripts run reliably in any environment.
Bash configuration problems usually show up as something vague: a script works from your terminal but fails in cron, a deploy script cannot find kubectl, or a config file path with a space breaks only for one customer. The bug is often not in the main logic. It is in the assumptions around environment, arguments, quoting, permissions, or the shell that actually ran the file.
When I troubleshoot a Bash script, I first try to answer four questions: Which shell is running it? What environment did it receive? What inputs did it parse? Which command failed first? That order keeps you from chasing symptoms.
Confirm the shell and execution context
A script that starts with Bash syntax but runs under sh can fail in strange ways. Arrays, [[ ... ]], source, process substitution, and set -o pipefail are Bash features. If the file uses them, the shebang should say Bash:
#!/usr/bin/env bash
Then run it the same way your automation runs it. These are not equivalent:
./deploy.sh
bash deploy.sh
sh deploy.sh
./deploy.sh uses the shebang. bash deploy.sh forces Bash. sh deploy.sh may use dash, BusyBox ash, or another shell depending on the system. If production calls sh deploy.sh, a perfect Bash shebang will not help.
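One way to make that mismatch fail loudly is a guard at the top of the script. This is a minimal sketch: the helper name require_bash is hypothetical, and it relies only on the fact that Bash sets BASH_VERSION while shells such as dash and BusyBox ash normally do not.

```bash
# Hypothetical guard: refuse to continue when the interpreter is not Bash.
# Written in POSIX sh syntax so that it still parses under dash or ash.
require_bash() {
  if [ -z "${BASH_VERSION:-}" ]; then
    printf 'This script needs Bash, but it was started by another shell.\n' >&2
    return 1
  fi
}

require_bash || exit 1
```

The guard has to come before the first line that uses Bash-only syntax, otherwise the other shell may hit a syntax error before it ever reaches the check.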
Cron, systemd, CI runners, SSH forced commands, and Docker entrypoints all provide different environments. A script that works interactively may fail because your login shell set PATH, AWS_PROFILE, NVM_DIR, or a language version manager before you ran it.
Add a temporary diagnostic block near the top:
printf 'shell=%s\n' "${BASH_VERSION:-not bash}" >&2
printf 'user=%s pwd=%s\n' "$(id -un)" "$PWD" >&2
printf 'PATH=%s\n' "$PATH" >&2
Remove or gate this once you have the answer. Diagnostics are useful, but leaking environment values into logs can expose secrets.
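If you would rather gate the diagnostics than delete them, something like the following can work. It is a sketch under assumptions: DEPLOY_DEBUG and debug_context are hypothetical names, not part of any tool.

```bash
# DEPLOY_DEBUG is a hypothetical switch; pick any name that fits your project.
debug_context() {
  # Stay silent unless the operator explicitly asked for diagnostics.
  [[ ${DEPLOY_DEBUG:-0} == 1 ]] || return 0
  printf 'shell=%s\n' "${BASH_VERSION:-not bash}" >&2
  printf 'user=%s pwd=%s\n' "$(id -un)" "$PWD" >&2
  printf 'PATH=%s\n' "$PATH" >&2
}

debug_context
```

Running the job once with DEPLOY_DEBUG=1 in the cron entry or CI variables then shows the context without leaving the output on permanently.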
Use strict mode carefully, not blindly
set -euo pipefail is a strong default for many automation scripts, but it has edge cases. set -u catches missing variables. pipefail makes pipeline failures visible. set -e stops after many command failures, though it behaves differently inside conditionals, pipelines, and compound commands than new Bash users expect.
A practical starting point is:
set -Eeuo pipefail
trap 'printf "Error on line %s: %s\n" "$LINENO" "$BASH_COMMAND" >&2' ERR
Use it when a failed command should stop the script. Do not use it casually in scripts that intentionally probe commands and continue. For expected failures, write the condition explicitly:
if ! grep -q '^enabled=true$' "$config_file"; then
  printf 'Feature is disabled.\n'
fi
That is clearer than letting grep fail under set -e and wondering why the script exited.
Validate arguments before reading files
A common configuration bug is treating $1 as present when it is not. Under set -u, referencing a missing $1 exits immediately. Without set -u, it becomes an empty string.
Use a small usage block:
usage() {
  printf 'Usage: %s <config-file> [environment]\n' "${0##*/}" >&2
}
if (( $# < 1 )); then
  usage
  exit 2
fi
config_file=$1
environment=${2:-dev}
if [[ ! -r $config_file ]]; then
  printf 'Config file is not readable: %s\n' "$config_file" >&2
  exit 1
fi
Notice the default for environment, but not for config_file. Defaults are helpful for optional values and dangerous for required values. A script should not silently fall back to ./config.yml for a production deployment unless that behavior is very deliberate.
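When a script grows past two positional arguments, named options tend to be easier to validate. Here is a sketch using the getopts builtin. The option letters -c and -e and the function name parse_args are assumptions for illustration; the same rule applies, with a default for the optional value and none for the required one.

```bash
# Hypothetical option parser: -c <config-file> is required, -e <environment> is optional.
parse_args() {
  local opt OPTIND=1
  config_file=
  environment=dev
  # The leading colon puts getopts in silent mode so we can print our own errors.
  while getopts ':c:e:' opt; do
    case $opt in
      c) config_file=$OPTARG ;;
      e) environment=$OPTARG ;;
      :) printf 'Option -%s requires a value\n' "$OPTARG" >&2; return 2 ;;
      \?) printf 'Unknown option: -%s\n' "$OPTARG" >&2; return 2 ;;
    esac
  done
  if [[ -z $config_file ]]; then
    printf 'Missing required option: -c <config-file>\n' >&2
    return 2
  fi
}

parse_args -c ./app.conf -e staging
printf 'config_file=%s environment=%s\n' "$config_file" "$environment"
```

In a real script the call would be parse_args "$@" || exit 2, which keeps the usage exit code consistent with the positional version.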
Quote paths and values from configuration
Most Bash scripts eventually read a path from a config file or environment variable. If that value is unquoted, Bash performs word splitting and glob expansion.
backup_dir="/mnt/backups/May reports"
# Broken: becomes multiple arguments.
cp $backup_dir/latest.tar.gz /restore/
# Correct.
cp "$backup_dir/latest.tar.gz" /restore/
The same rule applies to command substitutions:
release_name=$(git describe --tags --always)
printf 'Deploying %s\n' "$release_name"
If you intentionally need multiple arguments, use an array instead of a string:
rsync_opts=(-a --delete --exclude '.git')
rsync "${rsync_opts[@]}" "$src/" "$dest/"
This avoids the brittle pattern of opts="-a --delete" followed by rsync $opts ....
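Counting arguments makes the difference concrete. In this small demonstration, count_args is a throwaway helper and the exclude pattern is made up; the point is only how many words the command receives.

```bash
# Throwaway helper: report how many arguments were received.
count_args() { printf '%s\n' "$#"; }

# String form: the inner quotes are literal characters, not shell quoting,
# so the unquoted expansion splits into --exclude, 'My, Files', and -a.
opts_string="--exclude 'My Files' -a"
string_count=$(count_args $opts_string)

# Array form: each element stays exactly one argument.
opts_array=(--exclude 'My Files' -a)
array_count=$(count_args "${opts_array[@]}")

printf 'string form: %s arguments\n' "$string_count"
printf 'array form:  %s arguments\n' "$array_count"
```

The string form hands the command four words, two of them with stray quote characters, while the array form hands it the three you meant.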
Check PATH and external command dependencies
command not found is usually a context problem. Your terminal may find aws at /opt/homebrew/bin/aws, while cron only has /usr/bin:/bin.
At startup, check required tools:
require_cmd() {
  command -v "$1" >/dev/null 2>&1 || {
    printf 'Required command not found: %s\n' "$1" >&2
    exit 127
  }
}
require_cmd docker
require_cmd jq
require_cmd aws
For critical system utilities, absolute paths can be fine. For developer tools installed in different places, a dependency check with a clear error is usually easier to maintain.
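A variation I find useful on fresh machines is to report every missing tool in one run instead of stopping at the first. This is a sketch; missing_cmds is a hypothetical helper name.

```bash
# Hypothetical helper: print each missing command on its own line and
# return non-zero if at least one was missing.
missing_cmds() {
  local cmd rc=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      printf '%s\n' "$cmd"
      rc=1
    fi
  done
  return "$rc"
}

# Typical use near the top of a script:
#   if ! missing=$(missing_cmds docker jq aws); then
#     printf 'Required commands not found:\n%s\n' "$missing" >&2
#     exit 127
#   fi
```

The operator then installs everything in one pass rather than rerunning the script once per missing dependency.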
If a script is launched by systemd, set the environment in the unit or an environment file instead of relying on a user's .bashrc. Non-interactive shells do not necessarily read the same startup files as your terminal.
Parse environment variables explicitly
Environment-driven configuration is convenient, but empty and unset are not always the same thing. Bash parameter expansion lets you be precise:
: "${APP_ENV:?APP_ENV must be set}"
log_level=${LOG_LEVEL:-INFO}
${APP_ENV:?message} fails if the variable is unset or empty. ${LOG_LEVEL:-INFO} uses a default if unset or empty. If an empty string is meaningful in your script, use the forms without the colon, such as ${VAR-default}.
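A short experiment shows the four combinations side by side. DEMO_VAR is a throwaway name used only for this illustration.

```bash
unset -v DEMO_VAR
unset_plain=${DEMO_VAR-fallback}    # unset: the fallback is used
unset_colon=${DEMO_VAR:-fallback}   # unset: the fallback is used

DEMO_VAR=''
empty_plain=${DEMO_VAR-fallback}    # set but empty: the empty value is kept
empty_colon=${DEMO_VAR:-fallback}   # set but empty: the fallback is used

printf 'unset: plain=[%s] colon=[%s]\n' "$unset_plain" "$unset_colon"
printf 'empty: plain=[%s] colon=[%s]\n' "$empty_plain" "$empty_colon"
```

Only the third case differs, and that is exactly the case where an operator deliberately set a variable to an empty string.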
Avoid dumping the whole environment into logs while troubleshooting. It is too easy to print tokens, database passwords, or cloud credentials.
Watch for CRLF line endings and invisible characters
A script edited on Windows may contain CRLF endings. The classic symptom is an error containing ^M, or a shebang failure that looks like the interpreter does not exist.
Check with:
file deploy.sh
sed -n 'l' deploy.sh | head
Fix with one of these:
dos2unix deploy.sh
# or, if dos2unix is unavailable (GNU sed syntax; BSD/macOS sed needs -i '' and may not interpret \r):
sed -i 's/\r$//' deploy.sh
Also check copied configuration values for trailing spaces. A variable that looks like prod but is actually "prod " (with a trailing space) can miss a case branch and send you in circles.
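Both checks can be scripted. The two helpers below are hypothetical names for a minimal sketch: has_crlf detects carriage returns in a file, and trim strips leading and trailing whitespace (including a stray carriage return) from a value before you compare it.

```bash
# Hypothetical helper: succeed if the file contains at least one carriage return.
has_crlf() {
  grep -q $'\r' -- "$1"
}

# Hypothetical helper: print the value without leading or trailing whitespace.
trim() {
  local s=$1
  s="${s#"${s%%[![:space:]]*}"}"   # drop leading whitespace
  s="${s%"${s##*[![:space:]]}"}"   # drop trailing whitespace
  printf '%s' "$s"
}

app_env=$(trim 'prod ')
case $app_env in
  prod) printf 'matched the prod branch\n' ;;
  *)    printf 'no branch matched: [%s]\n' "$app_env" ;;
esac
```

Whether silently trimming is acceptable depends on the setting; for some values it is safer to reject the input and tell the operator about the stray whitespace.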
Debug the first failing command
set -x shows commands after expansion. That is exactly what you need for quoting and config bugs:
PS4='+ ${BASH_SOURCE}:${LINENO}: '
set -x
# failing section here
set +x
Do not enable xtrace around secrets. If your script handles passwords, tokens, signed URLs, or private keys, trace only the narrow section you need.
For configuration files, print the resolved value and the test you are about to apply:
printf 'Using config_file=%q\n' "$config_file" >&2
[[ -r $config_file ]] || exit 1
%q is useful for debugging because it makes whitespace visible in a shell-friendly way.
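To see what %q does with awkward values, here is a small sketch. show_value is a hypothetical helper for temporary debugging output, and the values are made up.

```bash
# Hypothetical helper: print NAME=VALUE to stderr with the value shell-quoted.
show_value() {
  printf '%s=%q\n' "$1" "$2" >&2
}

# A trailing space is printed as an escaped space instead of disappearing.
show_value APP_ENV 'prod '

# Spaces inside a path are escaped the same way.
show_value config_file '/mnt/backups/May reports/app.conf'
```

With plain %s both lines would look correct at a glance, which is the whole problem.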
Handle permissions as configuration too
Sometimes the script is correct, but the account running it cannot read the config, execute the helper, or write the output directory.
Check the actual user:
id
namei -l "$config_file"
namei -l is especially useful because every directory in the path needs execute permission. A readable file inside an inaccessible parent directory is still inaccessible.
For executable scripts, set permissions and line endings together during packaging or image build:
chmod 0755 /usr/local/bin/deploy
If a script only works with sudo, identify which file or command needs privilege. Do not run the entire script as root just to paper over one bad ownership setting.
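A pre-flight check along these lines can name the exact file or directory that needs attention. It is a sketch: check_access is a hypothetical helper, and it only covers a readable config file and a writable output directory.

```bash
# Hypothetical pre-flight check: report every access problem, then return
# non-zero if any were found.
check_access() {
  local config=$1 out_dir=$2 rc=0
  if [[ ! -r $config ]]; then
    printf 'Cannot read config as %s: %s\n' "$(id -un)" "$config" >&2
    rc=1
  fi
  if [[ ! -d $out_dir || ! -w $out_dir ]]; then
    printf 'Cannot write output directory as %s: %s\n' "$(id -un)" "$out_dir" >&2
    rc=1
  fi
  return "$rc"
}

# Typical use before the main logic:
#   check_access "$config_file" "$output_dir" || exit 1
```

Including the user name in the message matters because the account that cron or systemd uses is often not the one you tested with.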
A reliable troubleshooting pass
When a Bash configuration issue is unclear, run this pass in order:
- Confirm the script is running under Bash if it uses Bash features.
- Print the working directory, user, and PATH for the failing context.
- Validate required arguments and config files before main logic.
- Quote every expansion unless you intentionally want splitting.
- Check required external commands with command -v.
- Use set -x only around the failing section, with secrets protected.
- Check permissions and line endings before changing business logic.
That sequence catches most real-world failures without turning the script into a mystery novel. Bash is small, but its execution context is large; troubleshoot the context first.
Separate configuration loading from execution
A script is easier to troubleshoot when loading config is its own step. Do not read a file, export variables, create directories, and restart services all in one long block. First resolve the values. Then validate them. Then run the work.
load_config() {
  local file=$1 key value
  [[ -r $file ]] || {
    printf 'Cannot read config: %s\n' "$file" >&2
    return 1
  }
  # Example for a deliberately simple KEY=VALUE file.
  # Do not source files you do not fully trust.
  # The extra test keeps a final line that has no trailing newline.
  while IFS='=' read -r key value || [[ -n $key ]]; do
    [[ -z $key || $key == \#* ]] && continue
    case $key in
      APP_PORT) APP_PORT=$value ;;
      APP_ENV) APP_ENV=$value ;;
      *) printf 'Ignoring unknown config key: %s\n' "$key" >&2 ;;
    esac
  done < "$file"
}
Sourcing a config file with . config.env is common, but it executes shell code. That is acceptable only when the file is trusted and owned like code. For user-editable config, parse only the keys you support.
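The validation step can be its own function too. In the sketch below, validate_config is a hypothetical name, and the accepted environments (dev, staging, prod) and the port range are assumptions you would replace with your own rules.

```bash
# Hypothetical validator for the two keys used in this article.
validate_config() {
  local rc=0 port_re='^[0-9]{1,5}$'
  # 10# forces base 10 so a value like 0080 is not read as octal.
  if [[ ! "${APP_PORT:-}" =~ $port_re ]] || (( 10#$APP_PORT < 1 || 10#$APP_PORT > 65535 )); then
    printf 'APP_PORT must be an integer from 1 to 65535, got: %q\n' "${APP_PORT:-}" >&2
    rc=1
  fi
  case ${APP_ENV:-} in
    dev|staging|prod) ;;
    *)
      printf 'APP_ENV must be dev, staging, or prod, got: %q\n' "${APP_ENV:-}" >&2
      rc=1
      ;;
  esac
  return "$rc"
}

APP_PORT=8080
APP_ENV=dev
validate_config && printf 'config looks valid\n'
```

Keeping the checks separate from loading means the main script can read as three calls: load, validate, run.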
Make failures actionable for the next operator
A good error message says what failed and what value caused it. Compare these:
printf 'Error\n' >&2
and:
printf 'Cannot write backup directory: %s\n' "$backup_dir" >&2
The second message gives the next person something to check. This matters in DevOps scripts because the person seeing the failure may not be the author. They may be on call, half awake, and looking at CI logs from a failed deployment.
Exit codes can also carry meaning. Use 2 for usage problems, 1 for general runtime failures, and tool-specific codes when you have a documented reason. Do not spend all day inventing a taxonomy, but avoid returning success after a failed validation just because the script printed a warning.
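A tiny helper keeps the message and the exit code together. This is a minimal sketch; die is a common convention rather than a builtin, and the usage lines are shown as comments because they would end the script.

```bash
# Print a message to stderr and exit with the given status.
die() {
  local code=$1
  shift
  printf '%s\n' "$*" >&2
  exit "$code"
}

# Typical use:
#   (( $# >= 1 ))          || die 2 "Usage: ${0##*/} <config-file> [environment]"
#   [[ -w $backup_dir ]]   || die 1 "Cannot write backup directory: $backup_dir"
```

Because every failure path goes through one function, it is hard to print a warning and then accidentally exit with status 0.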
Test the failing context, not your favorite context
If systemd runs the script, test with systemd. If cron runs it, test with a stripped environment. A quick approximation is:
env -i HOME="$HOME" PATH=/usr/bin:/bin bash ./script.sh config.env
That removes the comfort blanket of your interactive shell. Missing exports and PATH assumptions show up quickly.
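You can confirm what the stripped environment really contains before blaming the script. In this sketch, run_minimal is a hypothetical wrapper around the same env -i call, and DEMO_TOKEN is a made-up variable standing in for something your login shell exports.

```bash
# Hypothetical wrapper: run a command with only HOME and a minimal PATH.
run_minimal() {
  env -i HOME="${HOME:-/}" PATH=/usr/bin:/bin "$@"
}

# Something the interactive shell exports but cron would never see.
export DEMO_TOKEN=example-value

run_minimal sh -c 'printf "DEMO_TOKEN=%s\n" "${DEMO_TOKEN-<unset>}"'
run_minimal sh -c 'printf "PATH=%s\n" "$PATH"'
```

If the script needs a variable that shows up as unset here, it has to come from the crontab, the unit file, or the script itself.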
For Docker entrypoint scripts, run the image with the same environment and mounts as production as closely as you can:
docker run --rm --env-file app.env -v "$PWD/config:/config:ro" my-image:tag
If it fails only in CI, print the CI runner's working directory and the exact command line. Many CI Bash failures are just wrong relative paths after checkout, not deep shell issues.
A real-world review pass before you ship
Before calling a script or container setup finished, read it once as if you are the next person who has to debug it at 2 a.m. That changes what you notice. A prompt that made sense while writing the script may be ambiguous when it appears in a CI log. A Docker service name that felt obvious may not match the variable name in the application. A Bash default may be safe for development and dangerous for production.
I like to do a short dry run with deliberately awkward values. Use a path with spaces. Use an empty optional value. Try a filename that starts with a dash. Run the script from a different working directory. Start the container without one expected environment variable. These tests are not fancy, but they catch the assumptions that usually break first.
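A dry run like that can be scripted in a few lines. The following is a sketch with made-up file names, run inside a subshell so the directory change does not affect the caller; awkward_dry_run is a hypothetical name.

```bash
# Hypothetical dry run: exercise a path with spaces, a file name that starts
# with a dash, and a different working directory, all in a scratch directory.
awkward_dry_run() (
  scratch=$(mktemp -d) || exit 1
  mkdir -- "$scratch/May reports"
  printf 'enabled=true\n' > "$scratch/May reports/-config.env"

  # Running from another directory exposes relative-path assumptions.
  cd -- "$scratch/May reports" || exit 1

  # `--` ends option parsing, so -config.env is treated as a file name.
  cp -- -config.env copy.env
  grep -q -- '^enabled=true$' copy.env && printf 'copy ok\n'

  cd / && rm -rf -- "$scratch"
)

awkward_dry_run
```

Replace the cp and grep lines with a call to your own script and the same scaffolding becomes a quick regression check.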
Also check the failure message. If the only output is "failed", the advice above has not made it into the implementation. A useful failure says what value was used, what check failed, and what the operator can change. That does not mean dumping every environment variable or printing secrets. It means being specific where specificity helps: the config path, the missing command name, the network name, the service hostname, or the port the process tried to bind.
The final habit is to keep examples close to the way the system is actually run. If production uses Compose, test with Compose. If a script is launched by systemd, test it with systemd or with a similarly minimal environment. If a command is supposed to be safe for copy and paste, include the quoting, -- separators, and validation in the example itself. Readers copy working patterns more often than they copy warnings.
That review pass is not bureaucracy. It is how small automation stays boring. Boring is what you want from shell prompts, config loaders, variable expansion, container diagnostics, and Docker networking. The less surprising the behavior is, the easier it is for the next operator to trust it.