A Comprehensive Guide to Ansible Fact Caching Configuration
Ansible's ability to gather facts about managed nodes is crucial for dynamic inventory, conditional execution, and detailed reporting. However, running gather_facts: true for every playbook execution can significantly increase overall playbook runtime, especially in environments with hundreds or thousands of hosts. This performance bottleneck is addressed effectively through Ansible Fact Caching.
Fact caching allows Ansible to store the gathered facts from a previous run and reuse them instantly for subsequent runs, bypassing the time-consuming SSH connections and data collection process. This guide details how to configure and leverage fact caching using two primary methods: JSON files and Redis, enabling substantial performance improvements in your automation workflows.
Understanding Ansible Facts and Performance Impact
Ansible gathers facts using the setup module (or implicitly via gather_facts: true). These facts include operating system details, network interfaces, hardware and storage information, and more. While invaluable, gathering these facts over SSH can be slow, particularly over high-latency connections or when managing a large fleet of machines.
Key Performance Benefit: By enabling caching, subsequent playbook runs read facts from a local cache (JSON file) or a fast in-memory store (Redis) instead of executing the setup module on remote hosts.
Configuration Methods for Fact Caching
Ansible supports several caching mechanisms configured via the ansible.cfg file. The two most common and reliable methods are JSON file caching and Redis caching.
1. JSON File Caching (Local Storage)
JSON caching is the simplest method, storing fact data as serialized files on the control machine. It requires no external services.
Configuring JSON Caching in ansible.cfg
To enable JSON caching, you must define the cache plugin and specify the location where the files will be stored.
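[defaults]
# Reuse cached facts instead of re-gathering on every run
gathering = smart
# Specify the caching plugin to use (the built-in JSON file plugin is named jsonfile)
fact_caching = jsonfile
# Specify the directory where fact files will be stored
fact_caching_connection = /path/to/ansible_facts_cache
# Set the cache expiration time (in seconds). 0 means never expire.
fact_caching_timeout = 600
Explanation of Parameters:
- gathering = smart: Tells Ansible to skip remote fact gathering for hosts that already have valid cached facts. Under the default policy (implicit), facts are re-gathered on every play even when a cache is configured.
- fact_caching = jsonfile: Activates the built-in JSON file caching plugin.
- fact_caching_connection: The directory where the cache files are written. It must be writable by the user executing Ansible.
- fact_caching_timeout: In this example, facts are considered stale and will be re-gathered after 600 seconds (10 minutes).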
Best Practice: Ensure the cache directory is located on fast local storage (like an NVMe drive) for optimal read/write performance.
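To see what the JSON cache currently holds, a small script can list the cached hosts and how old each entry is. The following is a minimal sketch, not part of Ansible: the function name summarize_fact_cache is hypothetical, and it assumes the plugin writes one JSON file per host, named after the inventory hostname, and judges freshness by file modification time.

```python
import json
import os
import time


def summarize_fact_cache(cache_dir, timeout_seconds, now=None):
    """Return {hostname: (age_in_seconds, is_fresh)} for each cache file.

    Assumption: one JSON file per host, named after the inventory
    hostname, with freshness judged by the file's modification time.
    """
    now = time.time() if now is None else now
    summary = {}
    for entry in sorted(os.listdir(cache_dir)):
        path = os.path.join(cache_dir, entry)
        if not os.path.isfile(path):
            continue
        try:
            with open(path, encoding="utf-8") as handle:
                json.load(handle)  # confirm the file parses as JSON
        except (OSError, ValueError):
            continue  # skip unreadable or non-JSON files
        age = now - os.path.getmtime(path)
        # A timeout of 0 means entries never expire.
        is_fresh = timeout_seconds == 0 or age <= timeout_seconds
        summary[entry] = (age, is_fresh)
    return summary
```

Calling summarize_fact_cache("/path/to/ansible_facts_cache", 600) with the directory and timeout from ansible.cfg shows which hosts would likely be re-gathered on the next run.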
2. Redis Caching (Shared, High-Performance Storage)
Redis is an in-memory data structure store often used as a high-performance cache or message broker. Using Redis for fact caching is ideal for team environments where multiple users or CI/CD pipelines need to access the same cache rapidly and consistently.
Prerequisites for Redis Caching
- A running Redis server accessible from the Ansible control machine.
- The Python redis library must be installed on the control machine: pip install redis.
Configuring Redis Caching in ansible.cfg
When using Redis, fact_caching_connection defines the Redis connection parameters (host, port, database index, and optionally a password) as a colon-separated string.
[defaults]
# Reuse cached facts instead of re-gathering on every run
gathering = smart
# Specify the caching plugin to use
fact_caching = redis
# Connection string format: <host>:<port>:<db_number>[:<password>]
# If running on the same machine on default port:
fact_caching_connection = 127.0.0.1:6379:0
# Set the cache expiration time (in seconds). Highly recommended for Redis.
fact_caching_timeout = 3600
Note on Redis Database: The third field (0 in this example) specifies the Redis database index to use. Dedicate this index to Ansible facts to prevent conflicts if Redis is used for other purposes.
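The redis cache plugin documents its connection string as colon-separated fields (host:port:db:password). The sketch below shows how such a string breaks down into its parts, which can help when validating a value before putting it in ansible.cfg. The names RedisTarget and parse_redis_connection are hypothetical and this is not the plugin's own parsing code; the defaults for a missing port or database are assumptions of this helper.

```python
from typing import NamedTuple, Optional


class RedisTarget(NamedTuple):
    host: str
    port: int
    db: int
    password: Optional[str]


def parse_redis_connection(value, default_port=6379, default_db=0):
    """Split 'host[:port[:db[:password]]]' into its parts (illustrative helper)."""
    # Split into at most four fields so that, in this helper,
    # a password containing ':' stays in one piece.
    parts = value.split(":", 3)
    host = parts[0]
    if not host:
        raise ValueError("connection string must start with a host")
    port = int(parts[1]) if len(parts) > 1 and parts[1] else default_port
    db = int(parts[2]) if len(parts) > 2 and parts[2] else default_db
    password = parts[3] if len(parts) > 3 else None
    return RedisTarget(host, port, db, password)
```

For example, parse_redis_connection("127.0.0.1:6379:0") yields host 127.0.0.1, port 6379, database 0, and no password.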
Integrating Caching into Playbooks
Configuring ansible.cfg sets the default behavior. To utilize caching effectively, you must ensure two things in your playbooks:
- The cache is populated by running a play that gathers facts.
- Subsequent plays rely on the cache rather than re-gathering.
Enforcing Fact Gathering for Initial Population
When you run a playbook for the first time, or after the timeout, Ansible will execute the fact gathering process.
- name: Play 1 - Gather Facts and Execute Tasks
  hosts: webservers
  gather_facts: true  # This populates the cache initially
  tasks:
    - name: Use gathered facts
      debug:
        msg: "OS Family is {{ ansible_os_family }}"
Utilizing the Cache on Subsequent Runs
If fact_caching is configured and gathering = smart is set under [defaults] in ansible.cfg, subsequent runs reuse the cached data when gather_facts is true (or left unset) and the facts are within the timeout period. Under the default gathering policy (implicit), Ansible still runs the setup module on every play and only refreshes the cache.
To skip fact gathering entirely and rely only on the cache, set gather_facts: false after the initial population. Ansible still loads any valid cached facts for each host.
If no valid cached data exists, the play proceeds without facts, and tasks that reference them fail with undefined-variable errors.
Crucial Behavior: With gathering = smart, gather_facts: true triggers remote fact gathering only for hosts whose cached facts are expired or missing.
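The interaction between the gathering policy in ansible.cfg, the play-level gather_facts keyword, and the cache can be summarized as a small decision function. This is a simplified model for reasoning about the behavior, not Ansible's actual code; the function name should_gather and its parameters are hypothetical, and it treats "valid cached facts exist" as a single boolean.

```python
def should_gather(gathering, gather_facts, has_valid_cache):
    """Simplified model of when a play runs the setup module for a host.

    gathering: the ansible.cfg policy ("implicit", "explicit", or "smart").
    gather_facts: the play keyword (True, False, or None when unset).
    has_valid_cache: whether unexpired cached facts exist for the host.
    """
    if gathering not in ("implicit", "explicit", "smart"):
        raise ValueError(f"unknown gathering policy: {gathering!r}")
    # An unset gather_facts counts as enabled, except under "explicit".
    implied = gather_facts is None or bool(gather_facts)
    if gathering == "implicit":
        return implied
    if gathering == "explicit":
        return bool(gather_facts)
    # "smart": gather only when nothing usable is cached for the host.
    return implied and not has_valid_cache
```

Under this model, should_gather("implicit", True, True) is True while should_gather("smart", True, True) is False, which is why the smart policy matters for getting a speedup from the cache.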
Managing the Fact Cache
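It is sometimes necessary to manually clear the cache, forcing Ansible to gather fresh data from all hosts. For a one-off refresh, ansible-playbook also accepts the --flush-cache option, which clears the fact cache for every host in the inventory before the run.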
Clearing JSON Cache
If using JSON caching, simply delete the contents of the directory specified in fact_caching_connection.
# Example using the path defined earlier
rm -rf /path/to/ansible_facts_cache/*
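Deleting the whole directory forces every host to be re-gathered. When only a few hosts have changed, removing just their cache files is cheaper. The following is a minimal sketch under the assumption that each host's facts live in a file named after its inventory hostname; the function name clear_cached_hosts is hypothetical.

```python
import os


def clear_cached_hosts(cache_dir, hostnames):
    """Delete cache files for the given hosts; return the names actually removed."""
    removed = []
    for name in hostnames:
        # Reject empty names and anything that would escape the cache directory.
        if not name or name in (".", "..") or os.path.basename(name) != name:
            continue
        path = os.path.join(cache_dir, name)
        if os.path.isfile(path):
            os.remove(path)
            removed.append(name)
    return removed
```

For example, clear_cached_hosts("/path/to/ansible_facts_cache", ["web1", "web2"]) removes only those two entries and leaves the rest of the cache in place.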
Clearing Redis Cache
If using Redis, you can selectively clear keys related to Ansible or clear the entire database used by Ansible.
To clear every key in the database dedicated to Ansible facts:
# Connect to redis-cli and flush the entire database (DB 0 in this example)
redis-cli -n 0 FLUSHDB
Warning: FLUSHDB or FLUSHALL in Redis should be used with extreme caution, as it deletes all data in the specified database or the entire Redis instance, respectively.
Summary of Best Practices
- Choose Wisely: Use JSON caching for simple, single-user setups or when external dependencies are restricted. Use Redis for collaborative environments or large-scale CI/CD integration.
- Set Realistic Timeouts: Configure fact_caching_timeout to balance performance gains against data freshness. A timeout of 1 to 24 hours is common for environments where configurations change infrequently.
- Verify Configuration: Run ansible --version to confirm which ansible.cfg is in use and ansible-config dump --only-changed to confirm the cache settings, then check the output of your first cached run to confirm the cache plugin is active and functioning.
- Inventory Dependence: Cache entries are keyed by inventory hostname, so fact caching works best when hostnames are stable. If a dynamic inventory renames or recycles hosts frequently, the benefit of caching may be negated by staleness or errors.
By correctly implementing fact caching, you remove most of the per-run cost of fact gathering, which matters most when managing large fleets or working over high-latency connections.