A Practical Guide to Debugging Failed Shell and Command Modules
Debug Ansible shell and command failures with register, stdout, stderr, rc, failed_when, and changed_when examples.
The Ansible command and shell modules are useful when no purpose-built module exists, but they can be awkward to debug. A failed task may only show a return code unless you capture the command output yourself.
This guide shows you how to debug failed shell and command modules by checking rc, stdout, and stderr, then using failed_when and changed_when to make Ansible report the real result.
Command vs. Shell: Understanding the Difference
Before debugging, it helps to understand the difference between the two modules, because each one's execution environment affects how it fails.
ansible.builtin.command
This module executes the command directly, without passing it through a shell. This makes it safer and more predictable, but shell features such as variable expansion, globbing, pipes (|), and redirection (>) are not available.
Best Practice: Use command whenever the task is simple and does not require shell features.
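As a minimal sketch of this behavior, the task below passes its arguments as a list through the module's argv parameter, so no shell interprets them. The directory /opt/myapp and the variable name ls_result are hypothetical placeholders.

```yaml
- name: List a directory without involving a shell
  ansible.builtin.command:
    argv:
      - /bin/ls
      - -l
      - /opt/myapp   # hypothetical path, used only for illustration
  register: ls_result
  changed_when: false   # a listing never modifies the host
```

Because no shell is involved, a character such as | or > in any argument would be handed to ls as literal text rather than acting as a pipe or redirect.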
ansible.builtin.shell
This module executes the command via the remote host's standard shell (/bin/sh or equivalent). This is necessary for complex operations, environment variables, or when using standard shell syntax (e.g., cd /tmp && ls -l).
Warning: Since shell relies on the environment, it is more prone to unpredictable failures related to PATH configuration, hidden environment variables, or complex quoting.
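The following sketch shows one case where shell is needed: a pipeline. It assumes bash is installed at /bin/bash on the target, because set -o pipefail is not guaranteed to work in every /bin/sh. The variable name account_names is illustrative.

```yaml
- name: List local account names with a pipeline
  ansible.builtin.shell: |
    # Without pipefail, only the exit status of the last command (sort) is reported
    set -o pipefail
    cut -d: -f1 /etc/passwd | sort
  args:
    executable: /bin/bash   # assumption: bash is available on the remote host
  register: account_names
  changed_when: false
```

With pipefail set, a failure in cut (for example, an unreadable file) produces a non-zero return code for the whole task instead of being masked by a successful sort.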
The Anatomy of an Ansible Command Failure
By default, Ansible determines the success or failure of a command or shell module task based on the process's return code (RC).
| Return Code (RC) | Interpretation |
|---|---|
| rc = 0 | Success (Task continues) |
| rc != 0 | Failure (Task immediately stops, host marked failed) |
However, this simple check often doesn't capture the nuance of real-world scripts. A command might return an RC of 0 but still produce an unwanted result (a logical failure), or a command might return an expected non-zero RC (e.g., grep returns 1 if it finds no matches).
To handle these nuances, we must capture the output and conditionally control the failure state.
Step 1: Capturing Command Output with register
The first step in effective debugging is capturing all available output streams into an Ansible variable using the register keyword. This allows inspection of the return code, standard output, and standard error.
To prevent the playbook from halting immediately upon a non-zero return code during initial testing, it is often useful to temporarily use ignore_errors: yes.
- name: Execute a potentially unreliable command and capture results
  # Run the script directly so stderr stays separate and the real return code is preserved
  ansible.builtin.command: /usr/local/bin/check_config.sh
  register: cmd_output
  ignore_errors: yes  # Temporarily allow RC != 0 to proceed
Once registered, the cmd_output variable will contain several useful keys, most notably:
- cmd_output.rc: The integer return code.
- cmd_output.stdout: The standard output stream.
- cmd_output.stderr: The standard error stream.
- cmd_output.failed: A boolean indicating if Ansible currently considers the task failed.
Step 2: Inspecting Captured Data with debug
Use the debug module immediately after the failed task to inspect the contents of the registered variable. This helps distinguish between a true technical failure (e.g., command not found) and a logical failure (e.g., script ran but reported an internal error).
- name: Display full captured output for debugging
  ansible.builtin.debug:
    var: cmd_output
  # Use 'when' to only show this if the task failed, cleaning up output
  when: cmd_output.failed is defined and cmd_output.failed

- name: Highlight stderr contents
  ansible.builtin.debug:
    msg: "Captured STDERR: {{ cmd_output.stderr }}"
  when: cmd_output.stderr | length > 0
By inspecting the full output, you can pinpoint the specific error message or pattern that indicates a true failure.
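Because ignore_errors lets the play continue past the failure, it can help to stop the play explicitly once the output has been displayed. A minimal sketch, assuming the cmd_output variable registered in Step 1; the message text is illustrative:

```yaml
- name: Stop the play once the output has been inspected
  ansible.builtin.fail:
    # default() guards against results that carry no rc key
    msg: "check_config.sh failed with rc={{ cmd_output.rc | default('unknown') }}"
  when: cmd_output is failed
```

This keeps the debugging output visible while still preventing later tasks from running against a host in an unknown state.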
Step 3: Overriding Default Failure Behavior with failed_when
The failed_when conditional is the most powerful tool for debugging and managing complex shell module results. It allows you to define custom logic, using Jinja2 expressions, to determine if a task should be marked as failed, regardless of the default return code.
Scenario A: Handling an Expected Non-Zero Return Code
Some utilities return a non-zero code for an expected result. For example, grep returns 1 when it finds no match and greater than 1 for actual errors.
- name: Check whether a setting exists, but do not fail when absent
  ansible.builtin.command: grep -q '^feature_enabled=true' /etc/myapp.conf
  register: grep_result
  failed_when: grep_result.rc > 1
  changed_when: false
Scenario B: Failing on Logical Errors (RC=0, but Bad Output)
If a script always returns RC=0 even when an internal error occurs, but prints a specific error string to stdout or stderr, use failed_when to catch that string.
- name: Validate database connectivity script
  ansible.builtin.shell: /opt/scripts/db_connect_test.sh
  register: db_result
  # Check both stdout and stderr for common error phrases
  failed_when: >
    ('Connection refused' in db_result.stderr) or
    ('Authentication failure' in db_result.stdout)
Scenario C: Combining RC and Output Checks
For robust checks, combine the return code and content checks using logical operators (and, or, parentheses).
- name: Check deployment logs
  ansible.builtin.shell: tail -n 50 /var/log/deployment.log
  register: log_check
  # Fail if the RC is non-zero OR if the output contains the word 'FATAL'
  failed_when: log_check.rc != 0 or 'FATAL' in log_check.stdout
Tip: When using failed_when, you should generally remove ignore_errors: yes unless you explicitly want the failure to be logged but the play to continue.
Best Practices for Reliable Command Execution
To minimize the need for complex debugging, follow these standards when writing tasks that use command or shell:
1. Always Use Absolute Paths
Do not rely on the remote user's $PATH. Always specify the full path to the executable (e.g., /usr/bin/python, not just python). This avoids failures caused by inconsistent environments or subtle differences in the execution path.
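A short sketch of the idea, assuming the interpreter really lives at /usr/bin/python on the target host (the location varies between distributions):

```yaml
- name: Report the interpreter version using an absolute path
  ansible.builtin.command: /usr/bin/python --version   # assumed location; verify on your hosts
  register: python_version
  changed_when: false
```

If the path is wrong, the task fails with a clear "No such file or directory" error instead of silently running a different binary found earlier in $PATH.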
2. Leverage Conditionals over Shell Logic
Instead of using complex shell logic like || or && inside the shell module, utilize Ansible's native conditionals (when:, failed_when:, changed_when:) and the register keyword. This keeps the playbook logic transparent and easier to debug.
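As an illustration, a shell one-liner of the form "check || repair" can be split into two tasks. This is only a sketch: repair_config.sh is a hypothetical script, and config_check is an illustrative variable name.

```yaml
- name: Run the configuration check and record its result
  ansible.builtin.command: /usr/local/bin/check_config.sh
  register: config_check
  failed_when: false    # a non-zero rc is handled by the next task
  changed_when: false   # the check itself modifies nothing

- name: Repair the configuration only when the check reported a problem
  ansible.builtin.command: /usr/local/bin/repair_config.sh   # hypothetical script
  when: config_check.rc != 0
```

Each step now appears as its own line in the play output, so it is obvious whether the check failed, the repair ran, or both.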
3. Explicitly Control Change Detection (changed_when)
By default, command and shell report a task as changed every time the command runs, because Ansible cannot tell whether the command modified anything. If your script runs but makes no changes to the system (e.g., a simple status check), you should manually define when the task results in a change using changed_when.
- name: Check disk space (should not result in 'changed')
  ansible.builtin.command: df -h /data
  changed_when: false
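changed_when also accepts a condition, which is useful when a command sometimes modifies the system. The sketch below assumes a hypothetical script, apply_migrations.sh, that prints the word "Applied" only when it actually changes something.

```yaml
- name: Apply pending migrations
  ansible.builtin.command: /opt/scripts/apply_migrations.sh   # hypothetical script
  register: migrate_result
  # Report 'changed' only when the script says it did work (assumed marker string)
  changed_when: "'Applied' in migrate_result.stdout"
```

The right marker string depends entirely on what your own script prints, so check its real output before relying on a condition like this.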
4. Use State Modules Where Possible
If you find yourself using shell to check for file existence, start/stop services, or install packages, stop and look for a dedicated Ansible module (e.g., ansible.builtin.stat, ansible.builtin.service, ansible.builtin.package). Dedicated modules handle idempotency and error checking internally, reducing debugging effort significantly.
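For example, a shell test such as "test -f /etc/myapp.conf" can be replaced with ansible.builtin.stat. A minimal sketch; the variable name conf_stat and the message text are illustrative:

```yaml
- name: Check whether the config file exists
  ansible.builtin.stat:
    path: /etc/myapp.conf
  register: conf_stat

- name: Report a missing config file
  ansible.builtin.debug:
    msg: "/etc/myapp.conf is missing"
  when: not conf_stat.stat.exists
```

The stat task never reports changed and does not fail when the file is absent, so no failed_when or changed_when override is needed.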
Final Takeaway
When a shell or command task fails, capture the result first, inspect rc, stdout, and stderr, then encode the real success condition in failed_when. Once the task is stable, add changed_when so status checks do not show false changes in every playbook run.