Best Practices for Tuning Linux Memory Swappiness and Cache Behavior
Tune Linux swappiness and VFS cache behavior carefully, with workload examples, sysctl commands, and validation steps.
Linux will use free RAM. That surprises people the first time they see a server with very little "free" memory in free -h. Most of that memory may be page cache, dentries, and inode cache, not a leak. The hard part is knowing when the kernel is making good use of RAM and when memory reclaim is starting to hurt your applications.
Two sysctl settings often come up during Linux memory tuning: vm.swappiness and vm.vfs_cache_pressure. They are useful, but they are not magic performance switches. A bad value can hide the real problem or push pain from one workload to another.
Understanding Linux Memory Management Parameters
Linux uses heuristics to decide which memory pages to reclaim when the system needs more free RAM. The two main areas controlled by kernel parameters are swapping (moving inactive memory pages to disk) and caching (keeping file system metadata and data in RAM).
1. vm.swappiness
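vm.swappiness controls how willing the kernel is to reclaim anonymous memory (process heap and stack pages) by writing it to swap, relative to dropping file-backed page cache. Older kernels accept values from 0 to 100; since Linux 5.8 the range is 0 to 200.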
- Higher value (for example, 60, a common default): The kernel is more willing to reclaim anonymous memory and use swap as part of normal memory management. This can preserve page cache, but it can also hurt latency-sensitive services if active pages are pushed out.
- Lower value (for example, 10 or less): The kernel prefers to reclaim page cache before it swaps out anonymous memory. This keeps application memory resident longer, which helps responsiveness, but file I/O can slow down if the system keeps dropping cache pages it needs again.
- Value of 0: The kernel avoids swapping as much as it reasonably can, but this does not disable swap. If you need to disable swap, use swapoff and understand the out-of-memory risk first.
Practical Application of vm.swappiness
The optimal setting depends heavily on the workload:
| Workload Type | Recommended swappiness Range | Rationale |
|---|---|---|
| Database Servers, High-Performance Computing (HPC) | 1 - 10 | Often a good starting point when swap latency is worse than dropping cache. Validate with real workload metrics. |
| General Purpose Servers, Desktops | 30 - 60 | Usually reasonable unless you have evidence that swap behavior is hurting you. |
| File-serving or cache-heavy systems | 60 or higher in some cases | Can preserve page cache, but only makes sense if occasional swapping is acceptable for the workload. |
How to Check the Current Value:
cat /proc/sys/vm/swappiness
How to Change the Value Temporarily (until reboot):
To set swappiness to 10:
sudo sysctl vm.swappiness=10
How to Change the Value Permanently:
Edit the /etc/sysctl.conf file and add or modify the line:
# /etc/sysctl.conf
vm.swappiness = 10
After saving, apply changes without rebooting using:
sudo sysctl -p
For memory-intensive databases, 1 to 10 is a common starting point. Do not treat it as a rule. A database that already has its own buffer cache, such as PostgreSQL or MySQL/InnoDB, usually benefits from avoiding swap. A file server may prefer a larger page cache. A small VM with too little RAM will suffer no matter what number you choose.
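Before adding a persistent value, it can help to see whether the key is already assigned somewhere else, because a later file may override the one you edit. The following is a minimal sketch, not a definitive tool: find_sysctl_key is a hypothetical helper name, and it assumes plain "key = value" lines written with dots (it does not handle the slash form such as vm/swappiness).

```shell
# Hypothetical helper: list every uncommented assignment of a sysctl key
# in the given files or directories.
# Usage: find_sysctl_key KEY PATH...
find_sysctl_key() {
    key=$1
    shift
    # Escape dots so "vm.swappiness" is matched literally.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    # -r recurses into directories, -H always prints the file name,
    # -n prints the line number. Commented-out lines do not match because
    # the pattern is anchored to the start of the line.
    grep -rHnE "^[[:space:]]*${pattern}[[:space:]]*=" "$@" 2>/dev/null
}

# Example call (paths are the usual locations read by `sysctl --system`):
# find_sysctl_key vm.swappiness /etc/sysctl.conf /etc/sysctl.d /usr/lib/sysctl.d /run/sysctl.d
```

The output is in grep's file:line:text form, so more than one result for the same key is a hint to consolidate the setting in a single place.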
2. vm.vfs_cache_pressure
vfs_cache_pressure controls how aggressively the kernel reclaims memory used for directory and inode metadata (the VFS cache).
- The minimum is 0, and values above 100 are allowed. The kernel documentation warns that values far above 100 can hurt performance.
- The default value is typically 100.
At a value of 100, the kernel attempts to reclaim dentries and inodes at a "fair" rate relative to page cache and swap cache reclaim. Lower values make it prefer to keep the dentry and inode caches; higher values make it prefer to reclaim them.
Adjusting vfs_cache_pressure
- Increasing the Value (e.g., > 100): Makes the kernel more aggressive about reclaiming VFS cache memory. This frees up RAM faster but can lead to slower subsequent file system lookups, as the metadata needs to be read from disk again.
- Decreasing the Value (e.g., < 100): Makes the kernel more conservative about reclaiming VFS cache. This keeps directory and inode information in memory longer, speeding up repeated file system operations.
When to Decrease vfs_cache_pressure:
If your system frequently accesses the same large directory structures (common in complex applications, container orchestration, or specific networking setups), setting this value lower (e.g., 50) can improve performance by keeping metadata readily available in RAM.
When to Increase vfs_cache_pressure:
If your system is suffering from general memory pressure and you want the kernel to reclaim any unused memory quickly, you might raise this value, though this is less common than lowering it.
How to Check the Current Value:
cat /proc/sys/vm/vfs_cache_pressure
How to Change the Value Permanently:
Edit /etc/sysctl.conf:
# /etc/sysctl.conf
vm.vfs_cache_pressure = 50
Apply changes with sudo sysctl -p.
Warning: Very low vfs_cache_pressure values can make the kernel hold directory and inode cache longer than you expect. That may help metadata-heavy workloads, but it can also make memory pressure worse for applications. Avoid extreme values unless you have measured the effect.
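One way to measure that effect is to watch how much memory sits in reclaimable slab, which is where the dentry and inode caches are accounted (alongside other reclaimable kernel objects). The sketch below is illustrative only: slab_reclaimable_share is a hypothetical helper name, and it assumes the standard MemTotal and SReclaimable fields of /proc/meminfo.

```shell
# Hypothetical helper: print reclaimable slab memory in KiB and as a share
# of total RAM. Reads /proc/meminfo-formatted text on stdin.
slab_reclaimable_share() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "SReclaimable:" { slab = $2 }
        END {
            # Fail if the input did not look like /proc/meminfo.
            if (total == 0) exit 1
            printf "SReclaimable: %d KiB (%.1f%% of RAM)\n", slab, slab * 100 / total
        }
    '
}

# Example call:
# slab_reclaimable_share < /proc/meminfo
```

Comparing this figure before and after a change, under the same workload, gives a rough sense of whether the new value is actually shifting memory toward or away from filesystem metadata.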
Comprehensive Tuning Scenarios
Choosing the right combination of these parameters optimizes the trade-off between application stability and file system caching.
Scenario 1: Database Server (Memory Priority)
Goal: Maximize application memory residency; minimize swapping at all costs.
- vm.swappiness = 5
- vfs_cache_pressure = 50 (Keep directory data cached somewhat, but prioritize application memory over VFS metadata if RAM gets tight).
Before changing anything, check whether the database is actually swapping:
free -h
vmstat 1
grep -E 'pswpin|pswpout' /proc/vmstat
If swap-in and swap-out counters are climbing during query latency spikes, lowering swappiness may help. If swap is unused and the database is slow, swappiness is not your problem. Look at query plans, buffer hit ratio, checkpoints, disk latency, and connection pressure instead.
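The pswpin and pswpout counters are cumulative since boot, so what matters is how much they move during the slow period. A minimal sketch of that comparison follows; swap_delta is a hypothetical helper name, and the snapshot file paths in the comments are placeholders.

```shell
# Hypothetical helper: print how many pages were swapped in and out between
# two saved copies of /proc/vmstat.
# Usage: swap_delta BEFORE_FILE AFTER_FILE
swap_delta() {
    awk '
        # First file: remember the starting value of every counter.
        NR == FNR { before[$1] = $2; next }
        # Second file: print the difference for the two swap counters.
        $1 == "pswpin" || $1 == "pswpout" {
            printf "%s delta: %d pages\n", $1, $2 - before[$1]
        }
    ' "$1" "$2"
}

# Example session (file names are placeholders):
# cat /proc/vmstat > /tmp/vmstat.before
# sleep 60
# cat /proc/vmstat > /tmp/vmstat.after
# swap_delta /tmp/vmstat.before /tmp/vmstat.after
```

A delta of zero across a latency spike is reasonable evidence that swap is not the cause of that spike.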
Scenario 2: High Disk I/O Server (Caching Priority)
Goal: Maximize disk performance by keeping frequently accessed file data in the page cache.
- vm.swappiness = 80 (Allows swapping to occur sooner to free up RAM for disk cache expansion).
- vfs_cache_pressure = 100 (Standard balance between inode and page cache).
This is the scenario where people most often overtune. If the server mostly reads the same files repeatedly, page cache matters. But if the system starts swapping active worker processes to preserve cache, users may see worse latency even though filesystem cache looks healthy. Watch application response time, not only cache size.
Scenario 3: Virtualization Host or General Purpose System
Goal: Stable performance across multiple workloads.
- vm.swappiness = 30 (A moderate setting that favors keeping active VMs/processes in RAM slightly longer than the default 60, but still allows controlled swapping).
- vfs_cache_pressure = 100 (Default is often sufficient).
Virtualization hosts need extra caution because guest memory behavior can mislead host-level tuning. Ballooning, overcommit, and guest swap can all interact. A host that swaps guest memory heavily can create painful latency inside VMs even when each guest thinks its own workload is normal.
A Safer Tuning Workflow
Do not start by editing /etc/sysctl.conf. Start by proving the setting is relevant.
1. Capture a baseline during normal load:
free -h
vmstat 1 10
cat /proc/sys/vm/swappiness
cat /proc/sys/vm/vfs_cache_pressure
2. Capture the same data during the slow period. Add process-level memory:
ps -eo pid,comm,rss,vsz,%mem --sort=-rss | head -20
3. Change one value temporarily:
sudo sysctl vm.swappiness=10
4. Run the workload long enough to observe behavior. Look for lower swap activity, better application latency, and no new filesystem slowdown.
5. Make the value persistent only after it survives a realistic test window.
On systems that use /etc/sysctl.d/, a small dedicated file is often cleaner than appending to /etc/sysctl.conf:
sudo tee /etc/sysctl.d/90-memory-tuning.conf >/dev/null <<'EOF'
vm.swappiness = 10
vm.vfs_cache_pressure = 100
EOF
sudo sysctl --system
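After loading the file, it is worth confirming that the running values match what the file says, since another drop-in or a manual sysctl call may have overridden it. The sketch below is one possible check, under stated assumptions: check_sysctl_drift is a hypothetical helper name, it handles only simple single-value "key = value" lines, and the optional second argument exists only so the function can be tried against a scratch directory instead of /proc/sys.

```shell
# Hypothetical helper: compare "key = value" lines in a sysctl config file
# with the values currently in effect.
# Usage: check_sysctl_drift CONF_FILE [PROC_SYS_ROOT]
check_sysctl_drift() {
    conf=$1
    root=${2:-/proc/sys}
    status=0
    while IFS='=' read -r key value; do
        # Strip whitespace around the key and the value.
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        value=$(printf '%s' "$value" | sed 's/^[[:space:]]*//; s/[[:space:]]*$//')
        # Skip blank lines and comments.
        case $key in ''|'#'*|';'*) continue ;; esac
        # vm.swappiness maps to <root>/vm/swappiness.
        path=$root/$(printf '%s' "$key" | tr '.' '/')
        if [ ! -r "$path" ]; then
            echo "MISSING $key"
            status=1
            continue
        fi
        current=$(cat "$path")
        if [ "$current" = "$value" ]; then
            echo "OK $key = $current"
        else
            echo "DRIFT $key: running $current, file says $value"
            status=1
        fi
    done < "$conf"
    return $status
}

# Example call:
# check_sysctl_drift /etc/sysctl.d/90-memory-tuning.conf
```

The non-zero return status on drift makes the function easy to call from a monitoring check or a configuration management test.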
If your configuration management system owns sysctl settings, put the change there instead. Manual sysctl edits on one server are easy to forget and hard to reproduce.
Reading free -h Without Panicking
A typical free -h output might show a small number under free and a large number under buff/cache. That is normal. Linux keeps recently used file data in memory because unused RAM does not help anyone.
Focus on available, swap use, and whether swap activity is happening now. A server can have swap allocated from a past memory spike but no current swap churn. That is less urgent than a server constantly swapping in and out.
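To make the difference between free and available concrete, the two can be printed side by side as a share of total RAM. This is a small sketch, assuming the MemTotal, MemFree, and MemAvailable fields of /proc/meminfo; mem_available_pct is a hypothetical helper name.

```shell
# Hypothetical helper: print MemFree and MemAvailable as percentages of
# MemTotal. Reads /proc/meminfo-formatted text on stdin.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemFree:"      { free = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            # Fail if the input did not look like /proc/meminfo.
            if (total == 0) exit 1
            printf "free: %.1f%%  available: %.1f%%\n", free * 100 / total, avail * 100 / total
        }
    '
}

# Example call:
# mem_available_pct < /proc/meminfo
```

A low free figure next to a high available figure is the normal, healthy case described above: most of the gap is cache the kernel can give back.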
Use:
vmstat 1
If si and so stay near zero under normal load, swap is not actively driving latency at that moment. If they remain nonzero while applications stall, memory pressure is a serious suspect.
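Reading si and so by eye works for a short run; for a longer capture, a small filter can count how many samples showed any swap traffic. The sketch below is an illustration, not a monitoring tool: swap_active_samples is a hypothetical helper name, and it assumes the default vmstat column layout with a header row that starts with r. It skips the first data row because vmstat reports that row as an average since boot.

```shell
# Hypothetical helper: count vmstat samples that show swap traffic.
# Reads the output of `vmstat INTERVAL COUNT` on stdin.
swap_active_samples() {
    awk '
        # Header row: locate the si and so columns by name.
        $1 == "r" {
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        # Data rows start with a number. The first one is the since-boot
        # average, so it is counted but not evaluated.
        si && $1 ~ /^[0-9]+$/ {
            rows++
            if (rows == 1) next
            samples++
            if ($si > 0 || $so > 0) active++
        }
        END { printf "%d of %d samples show swap activity\n", active, samples }
    '
}

# Example call:
# vmstat 1 30 | swap_active_samples
```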
When Not to Tune These Settings
There are several cases where changing swappiness or cache pressure is the wrong first fix.
If the server has no swap configured, vm.swappiness has little practical effect. You may still tune it for policy consistency, but it will not solve memory pressure by itself.
If swap exists only as a tiny emergency partition, the setting also has limited room to help. The kernel can choose when to use swap, but it cannot turn a few hundred megabytes of emergency space into a real memory tier. In that setup, focus on OOM risk and service limits.
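Both of these cases are quick to rule in or out by comparing configured swap with total RAM. A minimal sketch, assuming the MemTotal, SwapTotal, and SwapFree fields of /proc/meminfo; swap_vs_ram is a hypothetical helper name.

```shell
# Hypothetical helper: report configured swap relative to RAM.
# Reads /proc/meminfo-formatted text on stdin.
swap_vs_ram() {
    awk '
        $1 == "MemTotal:"  { ram = $2 }
        $1 == "SwapTotal:" { swap = $2 }
        $1 == "SwapFree:"  { swapfree = $2 }
        END {
            # Fail if the input did not look like /proc/meminfo.
            if (ram == 0) exit 1
            if (swap == 0) { print "no swap configured"; exit 0 }
            printf "swap: %d KiB total (%.1f%% of RAM), %d KiB in use\n", swap, swap * 100 / ram, swap - swapfree
        }
    '
}

# Example call:
# swap_vs_ram < /proc/meminfo
```

If the result is "no swap configured" or a swap area that is only a small fraction of RAM, swappiness tuning is unlikely to change much.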
If a process has a real memory leak, lowering swappiness delays pain. The leak will keep growing. Restarting the service may restore capacity temporarily, but the durable fix is application-level: patch the leak, cap memory, reduce concurrency, or change the workload.
If the disk is slow because of a failing drive, storage throttling, or a saturated cloud volume, memory tuning can reduce some reads but will not fix the storage fault. Check iostat, kernel logs, cloud volume metrics, and SMART/NVMe health.
If the working set is larger than RAM, there is no perfect sysctl value. You need more memory, less concurrency, smaller caches, a different data layout, or a workload split.
Container and Kubernetes Notes
Memory tuning gets trickier in containers. A container can hit its cgroup memory limit even while the host has free RAM. The host's swappiness setting still matters, but the immediate symptom may be an OOM kill inside a pod or container.
Check cgroup and orchestrator signals:
dmesg -T | grep -i 'killed process'
docker stats
kubectl describe pod <pod-name>
For Kubernetes, changing node-level sysctls should be part of node pool configuration, not a one-off shell session. Also remember that some sysctls are namespaced and some are node-level. vm.swappiness and vm.vfs_cache_pressure are host-level settings on typical Linux systems, so changing them affects every workload on that node.
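On hosts that use cgroup v2, the per-cgroup memory files show directly whether a limit, rather than host-wide pressure, is what killed a process. The following is a hedged sketch: cgroup_mem_report is a hypothetical helper name, it assumes the cgroup v2 files memory.current, memory.max, and memory.events, and the cgroup directory for a given container or pod varies by runtime, so the example path is only a placeholder.

```shell
# Hypothetical helper: print current usage, the limit, and the OOM kill
# count for one cgroup v2 directory.
# Usage: cgroup_mem_report CGROUP_DIR
cgroup_mem_report() {
    dir=$1
    if [ ! -r "$dir/memory.events" ]; then
        echo "no cgroup v2 memory files in $dir" >&2
        return 1
    fi
    echo "current: $(cat "$dir/memory.current") bytes"
    # memory.max holds a byte count, or the word "max" when unlimited.
    echo "limit: $(cat "$dir/memory.max")"
    # memory.events holds "name count" pairs; oom_kill counts processes
    # killed because this cgroup ran out of memory.
    awk '$1 == "oom_kill" { print "oom_kill: " $2 }' "$dir/memory.events"
}

# Example call (placeholder path; the real one depends on the runtime):
# cgroup_mem_report /sys/fs/cgroup/system.slice
```

A rising oom_kill count with plenty of available memory on the host points at the container limit, not at swappiness.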
Monitoring and Validation
After applying changes, continuous monitoring is crucial to validate the impact. Use tools like free, vmstat, and system performance monitoring dashboards.
Using vmstat:
Monitor the si (swap in) and so (swap out) columns. A healthy system with low swappiness should show low or zero values for si and so under normal load.
vmstat 5 10
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0 123456 102400 5123456    0    0     0     5   40   70  1  1 98  0  0
If so values remain high after reducing swappiness, the workload likely needs more usable memory or lower memory demand. More RAM is one answer, but not the only one. You may also reduce worker counts, shrink application caches, tune database memory, fix leaks, or split services across hosts.
Treat vm.swappiness and vm.vfs_cache_pressure as workload preferences, not universal upgrades. The practical path is boring but reliable: measure current swap and reclaim behavior, change one setting, test under real load, and keep the change only if application behavior improves.