Comprehensive Guide to Systemd Cgroups for Resource Limiting and Isolation
Use systemd cgroups, slices, and unit properties to limit CPU, memory, and I/O without editing raw cgroup files.
Systemd already puts services into Linux control groups. You do not have to create raw cgroup directories by hand to keep a batch worker from eating the whole machine. In many cases, you can add a few properties to a service or slice, reload systemd, and get CPU, memory, task, and I/O controls that survive reboot and show up in normal systemctl tooling.
The trick is choosing the right kind of limit. A hard memory cap can protect the host but kill the service if you set it too low. CPU weights are gentle until the system is busy. CPU quotas are strict but can add latency. I/O limits depend on the storage stack and cgroup version. Resource control is not a checkbox; it is an operational tradeoff.
Understanding Control Groups (cgroups)
Before diving into systemd's implementation, it's essential to grasp the fundamental concepts of cgroups. Cgroups are a hierarchical mechanism in the Linux kernel that allows you to group processes and then assign resource management policies to these groups. These policies can include:
- CPU: Limiting CPU time, prioritizing CPU access.
- Memory: Setting memory usage limits, so that out-of-memory (OOM) kills are contained to the group that overran its budget.
- I/O: Throttling disk read/write operations.
- Network: Network control is possible through Linux traffic control and related tooling, but systemd's built-in unit properties are mostly focused on CPU, memory, process count, device access, and block I/O.
- Device Access: Controlling access to specific devices.
The kernel exposes cgroup configuration through a virtual file system, typically mounted at /sys/fs/cgroup. On cgroup v1, each controller (e.g., cpu, memory) has its own directory tree. On cgroup v2, which most current distributions use, there is a single unified tree, and each group's directory holds the files for every enabled controller (cpu.max, memory.max, and so on).
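A quick way to tell which layout a host uses is to look at the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU coreutils stat; the helper name cgroup_mode_from_fstype is made up for illustration.

```shell
#!/bin/sh
# Map the filesystem type of /sys/fs/cgroup to a cgroup layout.
# cgroup2fs means the unified v2 hierarchy; tmpfs means a v1 or
# hybrid layout with per-controller mounts underneath.
cgroup_mode_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1-or-hybrid" ;;
        *)         echo "unknown" ;;
    esac
}

# On a live host:
#   cgroup_mode_from_fstype "$(stat -fc %T /sys/fs/cgroup)"
```

The rest of this guide mostly assumes the v2 answer.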
Systemd's Cgroup Management Architecture
Systemd abstracts the complexity of direct cgroup manipulation by providing a structured unit management system. It organizes processes into a hierarchy of units, which are then mapped to cgroup hierarchies. The primary unit types relevant to resource management are:
- Slices: These are abstract containers for services and scopes. Slices form a hierarchy, allowing for the delegation of resources. For example, the slice for user sessions contains one slice per logged-in user. Systemd automatically creates slices for system services, user sessions, and virtual machines/containers.
- Scopes: These wrap groups of processes that were started by something other than systemd, such as login sessions or container runtimes, and then registered with systemd at runtime. They are transient and exist only as long as the processes within them are running.
- Services: These are the fundamental units for managing daemons and applications. When a service unit is started, systemd places its processes into a cgroup hierarchy, usually within a slice. Resource limits can be directly applied to service units.
Systemd's default hierarchy often looks like this:
-.slice (Root slice)
|- system.slice
| |- <service_name>.service
| |- another-service.service
| ...
|- user.slice
| |- user-1000.slice
| | |- session-c1.scope
| | |- user@1000.service (per-user manager; user-started services live under it)
| | ...
| ...
|- machine.slice (for VMs/containers)
...
Applying Resource Limits with Systemd Unit Files
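Systemd allows you to specify cgroup resource limits directly within .service and .slice unit files, under the [Service] or [Slice] section respectively. Scope units accept the same properties, but they are created at runtime rather than from unit files, so their limits are set when the scope is created or afterwards with systemctl set-property.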
CPU Limits
The primary directives for CPU resource control are:
- CPUQuota=: Caps the CPU time the unit can use. The value is a percentage of one CPU and takes a % suffix (e.g., 50% for half a core). Values above 100% are valid on multi-core hosts; 200% allows two full cores. The quota is enforced per period, which defaults to 100ms.
- CPUWeight=: Sets a relative weight for CPU time on cgroup v2 systems (range 1 to 10000, default 100). A unit with a higher weight gets a larger share when there is contention, but it does not reserve CPU when the machine is idle.
- CPUShares=: Older cgroup v1-era weighting, deprecated in current systemd. Prefer CPUWeight= on modern distributions unless you know you need v1 compatibility.
- CPUQuotaPeriodSec=: Sets the period over which CPUQuota= is measured. Default is 100ms.
Example: Limiting a web server to 75% of one CPU core:
Create or edit a service file, for instance, /etc/systemd/system/mywebapp.service:
[Unit]
Description=My Web Application
[Service]
ExecStart=/usr/bin/mywebapp
User=webappuser
Group=webappgroup
# Limit to 75% of one CPU core
CPUQuota=75%
[Install]
WantedBy=multi-user.target
After creating or modifying the service file, reload the systemd daemon and restart the service:
sudo systemctl daemon-reload
sudo systemctl restart mywebapp.service
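To make the quota concrete, here is the arithmetic behind it as a small shell sketch. It assumes a cgroup v2 host, where the kernel stores the limit in the unit's cpu.max file as "quota period" in microseconds. The function name quota_to_cpu_max is hypothetical.

```shell
#!/bin/sh
# quota_to_cpu_max PERCENT [PERIOD_USEC]
# Convert a CPUQuota= percentage into the "<quota> <period>" pair
# (both in microseconds) that cgroup v2 keeps in cpu.max.
# The period defaults to 100000 usec, i.e. the 100ms default.
quota_to_cpu_max() {
    percent=$1
    period_usec=${2:-100000}
    echo "$(( percent * period_usec / 100 )) ${period_usec}"
}

quota_to_cpu_max 75    # 75% of one core
quota_to_cpu_max 200   # two full cores
```

With CPUQuota=75% the service may run for 75 ms out of every 100 ms window, summed across all of its threads. Once the budget is spent, its threads wait for the next window.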
Memory Limits
Memory limits are controlled by directives such as:
- MemoryMax=: Sets a hard limit on the amount of memory the unit's processes can consume. This can be specified in bytes or with the suffixes K, M, G, T (e.g., 512M), which are interpreted in base 1024.
- MemoryLimit=: The cgroup v1 equivalent, retained on some systems for compatibility and deprecated in current systemd. Prefer MemoryMax= on modern systemd releases.
- MemoryHigh=: Sets a throttling threshold rather than a hard limit. When usage goes above it, the unit's processes are slowed down and put under heavy memory reclaim pressure, but they are not killed. Usage can still exceed this value temporarily.
- MemorySwapMax=: Limits the amount of swap space the unit can use.
Example: Limiting a database to 2GB of RAM:
Create or edit a service file, for example, /etc/systemd/system/mydb.service:
[Unit]
Description=My Database Service
[Service]
ExecStart=/usr/bin/mydb
User=dbuser
Group=dbgroup
# Limit memory to 2 Gigabytes
MemoryMax=2G
[Install]
WantedBy=multi-user.target
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart mydb.service
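When checking the result, note that the kernel reports the limit in plain bytes. The sketch below shows the conversion, assuming systemd's base-1024 interpretation of memory suffixes. The helper name size_to_bytes is invented for this example, and it only handles whole-number values.

```shell
#!/bin/sh
# size_to_bytes VALUE
# Expand a memory size such as 512M or 2G into bytes using
# base-1024 suffixes. A value without a suffix is returned as-is.
size_to_bytes() {
    value=$1
    number=${value%[KMGT]}   # strip one trailing suffix letter, if any
    case "$value" in
        *K) echo $(( number * 1024 )) ;;
        *M) echo $(( number * 1024 * 1024 )) ;;
        *G) echo $(( number * 1024 * 1024 * 1024 )) ;;
        *T) echo $(( number * 1024 * 1024 * 1024 * 1024 )) ;;
        *)  echo "$number" ;;
    esac
}

size_to_bytes 2G   # value to expect in memory.max for MemoryMax=2G
```

On a cgroup v2 host, the number printed for 2G is what you should expect to find in the service's memory.max file after the restart.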
I/O Limits
I/O throttling can be controlled using directives like:
- IOWeight=: Sets a relative weight for I/O operations on cgroup v2. Higher values give more I/O priority under contention. Range is 1 to 10000 (default 100).
- IOReadBandwidthMax=: Limits read I/O bandwidth. Specified as <device> <bytes_per_second>. For example, IOReadBandwidthMax=/dev/sda 100M limits read operations on /dev/sda to 100MB/s. Bandwidth suffixes (K, M, G, T) are interpreted in base 1000.
- IOWriteBandwidthMax=: Limits write I/O bandwidth. Same format as IOReadBandwidthMax=.
Example: Limiting a background processing service to 50MB/s on a specific disk:
Create or edit a service file, e.g., /etc/systemd/system/batchproc.service:
[Unit]
Description=Batch Processing Service
[Service]
ExecStart=/usr/bin/batchproc
User=batchuser
Group=batchgroup
# Limit write operations to 50MB/s on /dev/sdb
IOWriteBandwidthMax=/dev/sdb 50M
# Below-default I/O weight (default is 100) so it yields to other services under contention
IOWeight=50
[Install]
WantedBy=multi-user.target
Reload and restart:
sudo systemctl daemon-reload
sudo systemctl restart batchproc.service
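To confirm the throttle reached the kernel, you can read the service's io.max file on a cgroup v2 host. Each line in that file starts with the device's major:minor number, followed by key=value pairs. The sketch below pulls one value out of that format. The function name io_limit is hypothetical, and the 8:16 device number in the usage comment is only the conventional number for /dev/sdb, so check yours first.

```shell
#!/bin/sh
# io_limit MAJ:MIN KEY
# Read io.max content on stdin and print the value of KEY
# (rbps, wbps, riops, or wiops) for the given device.
io_limit() {
    awk -v dev="$1" -v key="$2" '
        $1 == dev {
            for (i = 2; i <= NF; i++) {
                split($i, kv, "=")
                if (kv[1] == key) print kv[2]
            }
        }'
}

# On a live host (the device number is an assumption):
#   lsblk -dno MAJ:MIN /dev/sdb
#   io_limit 8:16 wbps < /sys/fs/cgroup/system.slice/batchproc.service/io.max
```

Because bandwidth suffixes use base 1000, IOWriteBandwidthMax=/dev/sdb 50M should appear as wbps=50000000.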
Managing and Monitoring Cgroups
Systemd provides tools to inspect and manage the cgroups associated with your units.
Inspecting Cgroup Status
The systemctl status command provides information about a unit's cgroup membership and resource usage.
systemctl status mywebapp.service
Look for lines indicating the cgroup path. For example:
● mywebapp.service - My Web Application
Loaded: loaded (/etc/systemd/system/mywebapp.service; enabled; vendor preset: enabled)
Active: active (running) since Tue 2023-10-27 10:00:00 UTC; 1 day ago
Docs: man:mywebapp(8)
Main PID: 12345 (mywebapp)
Tasks: 5 (limit: 4915)
Memory: 15.5M
CPU: 2h 30m 15s
CGroup: /system.slice/mywebapp.service
└─12345 /usr/bin/mywebapp
You can also browse the cgroup tree and watch live usage with systemd's own tools:
systemd-cgls # Displays the cgroup hierarchy managed by systemd
systemd-cgtop # Similar to top, but for cgroups
To see the specific limits applied to a service's cgroup:
# For memory limits on a typical cgroup v2 host
cat /sys/fs/cgroup/system.slice/mywebapp.service/memory.max
# For CPU limits
cat /sys/fs/cgroup/system.slice/mywebapp.service/cpu.max
The exact paths and file names vary by cgroup version and distribution. On cgroup v1 systems, controller-specific paths such as /sys/fs/cgroup/memory/... may still exist. On cgroup v2 systems, the unified hierarchy under /sys/fs/cgroup/... is the normal view.
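Rather than typing each path, a small helper can print whichever limit files exist in a cgroup directory. This is a sketch for cgroup v2 hosts only. The name print_limits is made up, and it skips any file that is missing (for example, when a controller is not enabled for that group).

```shell
#!/bin/sh
# print_limits CGROUP_DIR
# Print the main cgroup v2 limit files found in CGROUP_DIR,
# one "file: value" line each. Missing files are skipped.
print_limits() {
    dir=$1
    for f in cpu.max memory.high memory.max pids.max; do
        if [ -r "$dir/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$dir/$f")"
        fi
    done
}

# On a live host, let systemd report the unit's cgroup path:
#   print_limits "/sys/fs/cgroup$(systemctl show -p ControlGroup --value mywebapp.service)"
```

A value of max means no limit is set for that resource.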
Modifying Cgroup Limits on the Fly
While it's best practice to set limits in unit files, you can adjust them on a running unit with systemctl set-property. The change takes effect immediately, without a restart. Add --runtime to keep it temporary:
sudo systemctl set-property --runtime mywebapp.service CPUQuota=50%
Without --runtime, set-property persists the setting as a drop-in under /etc/systemd/system.control/, so it survives reboots. With --runtime, the drop-in goes under /run/systemd/system.control/ and disappears at the next reboot. Use systemctl cat mywebapp.service and systemctl show mywebapp.service -p CPUQuota -p MemoryMax to confirm what happened. For infrastructure-as-code and peer review, an explicit unit drop-in is usually clearer.
Slices for Resource Delegation
Slices are powerful for managing groups of services or applications. Limits defined on a slice apply to the slice as a whole: the services and scopes inside it do not each inherit a copy of the limit, they share one budget and are collectively constrained by it.
Example: Creating a dedicated slice for resource-intensive batch jobs:
Create a slice file, e.g., /etc/systemd/system/batch.slice:
[Unit]
Description=Batch Processing Slice
[Slice]
# Limit total CPU for all jobs in this slice to 1 core
CPUQuota=100%
# Limit total memory to 4GB
MemoryMax=4G
Now you can configure services to run within this slice using the Slice= directive in their .service files:
[Unit]
Description=Specific Batch Job
[Service]
ExecStart=/usr/bin/mybatchjob
# Place this service into the batch.slice
Slice=batch.slice
[Install]
WantedBy=multi-user.target
Reload systemd and start the service. The slice normally does not need to be enabled or started by hand; systemd activates it automatically when the first unit assigned to it starts.
sudo systemctl daemon-reload
sudo systemctl start mybatchjob.service
This approach allows you to group related processes and manage their collective resource consumption.
Best Practices and Considerations
- Start with Incremental Limits: Begin with generous values based on observed peak usage and tighten them gradually. Aggressive limits can destabilize applications.
- Monitor: Regularly monitor your system's resource usage and the impact of your cgroup settings. Tools like systemd-cgtop, htop, top, and iotop are invaluable.
- Understand Cgroup v1 vs. v2: Older systemd releases support both cgroup v1 and v2, while recent releases are phasing out v1. Many directives are similar, but v2 uses a unified hierarchy and behaves differently in places. Check which version your system is using before you debug a limit that seems to be ignored.
- Prioritization vs. Hard Limits: Use CPUWeight= for prioritization when resources are scarce, and CPUQuota= for strict caps. Similarly, MemoryHigh= applies pressure before the hard limit, and MemoryMax= is the hard limit.
- Service vs. Slice: Use service units for individual applications and slices for managing groups of related applications or resource pools.
- Documentation: Clearly document the resource limits applied to critical services, especially in production environments.
- OOM Killer: When a cgroup reaches its MemoryMax= limit and the kernel cannot reclaim enough memory, the Out-Of-Memory (OOM) killer terminates a process inside that cgroup, even if the host as a whole has free memory. The OOMPolicy= directive controls how systemd reacts when that happens to a service (continue, stop, or kill).
A Safer Way to Roll Out Limits
Start with observation. Before adding limits, look at how the service behaves during normal load and during its worst expected load:
systemctl status mywebapp.service
systemd-cgtop
systemctl show mywebapp.service -p MemoryCurrent -p CPUUsageNSec -p TasksCurrent
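The raw values from systemctl show are in bytes and nanoseconds, which are awkward to read at a glance. As a rough sketch, the filter below converts them to MiB and seconds. The function name summarize_usage is hypothetical, and values are truncated to whole numbers. If accounting is disabled for a property, systemd prints a placeholder instead of a number and this filter reports 0.

```shell
#!/bin/sh
# summarize_usage
# Read KEY=VALUE lines as printed by "systemctl show -p ..." on stdin
# and print memory in MiB, CPU time in seconds, and the task count.
summarize_usage() {
    awk -F= '
        $1 == "MemoryCurrent" { printf "memory_mib=%d\n", $2 / 1048576 }
        $1 == "CPUUsageNSec"  { printf "cpu_seconds=%d\n", $2 / 1000000000 }
        $1 == "TasksCurrent"  { printf "tasks=%d\n", $2 }
    '
}

# On a live host:
#   systemctl show mywebapp.service -p MemoryCurrent -p CPUUsageNSec -p TasksCurrent | summarize_usage
```

Sampling this at idle and at peak gives you the numbers to base the first limits on.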
For memory, a good first move is often MemoryHigh= rather than MemoryMax=:
[Service]
MemoryHigh=1G
MemoryMax=1536M
MemoryHigh= tells the kernel to apply pressure before the service reaches the hard ceiling. MemoryMax= is the wall. If the process crosses it and memory cannot be reclaimed, the kernel may kill a process in the cgroup. That can be exactly what you want for a runaway worker, but it is a bad surprise for a database unless you planned for it.
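On cgroup v2, the kernel counts how often each boundary was hit in the cgroup's memory.events file: the high counter increases when the group is throttled for exceeding memory.high, and oom_kill counts processes killed by the OOM killer. The sketch below reads one counter from such a file. The helper name memory_event is an assumption for illustration.

```shell
#!/bin/sh
# memory_event FILE KEY
# Print the counter named KEY (e.g. high, max, oom_kill) from a
# cgroup v2 memory.events file, which holds "key value" lines.
memory_event() {
    awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# On a live host:
#   memory_event /sys/fs/cgroup/system.slice/mywebapp.service/memory.events high
#   memory_event /sys/fs/cgroup/system.slice/mywebapp.service/memory.events oom_kill
```

A growing high count with oom_kill still at zero suggests MemoryHigh= is doing its job as an early warning. A non-zero oom_kill means the hard ceiling has already cost you a process.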
For CPU, decide whether you want fairness or a hard cap:
[Service]
CPUWeight=50
This lowers priority under contention but still lets the service use idle CPU. For background jobs, that is often better than a quota.
[Service]
CPUQuota=200%
This caps the service at roughly two CPU cores worth of time. That is useful for a noisy batch processor, but it can hurt latency-sensitive applications if worker threads get throttled during traffic spikes.
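One way to check whether a quota is hurting is to look at the throttling counters in the cgroup's cpu.stat file (cgroup v2, with the cpu controller enabled). nr_periods counts enforcement windows and nr_throttled counts the windows in which the group ran out of quota. The sketch below turns them into a percentage. The function name throttle_percent is hypothetical.

```shell
#!/bin/sh
# throttle_percent FILE
# Print the percentage of enforcement periods in which the cgroup
# was throttled, based on nr_periods and nr_throttled in cpu.stat.
throttle_percent() {
    awk '
        $1 == "nr_periods"   { periods = $2 }
        $1 == "nr_throttled" { throttled = $2 }
        END {
            if (periods > 0) printf "%d\n", 100 * throttled / periods
            else print 0
        }' "$1"
}

# On a live host:
#   throttle_percent /sys/fs/cgroup/system.slice/mywebapp.service/cpu.stat
```

The counters are cumulative, so compare two readings taken around a traffic spike rather than trusting a single value.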
For process explosions, add a task limit:
[Service]
TasksMax=200
This protects the host from accidental fork storms. Set it high enough for normal thread counts. Java, database, and browser-like workloads can use more tasks than you expect.
Drop-Ins Instead of Editing Vendor Units
Avoid editing unit files shipped by packages under /usr/lib/systemd/system/ or /lib/systemd/system/. Use a drop-in:
sudo systemctl edit mywebapp.service
Then add:
[Service]
MemoryHigh=1G
MemoryMax=1536M
CPUWeight=80
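systemctl edit reloads the unit configuration when you save and exit, so the explicit daemon-reload below is only required if you wrote the drop-in file by hand. Restart the service and confirm the result: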
sudo systemctl daemon-reload
sudo systemctl restart mywebapp.service
systemctl cat mywebapp.service
systemctl cat shows the vendor unit and your override together. That makes future debugging much easier because the active configuration is visible in one command.
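For configuration management, where an interactive editor is not an option, the same drop-in can be written as a plain file in the unit's .d directory. The following is a minimal sketch. The function name write_dropin and the 50-limits file name are invented, and the root directory is a parameter so the helper can be tried against a scratch directory first.

```shell
#!/bin/sh
# write_dropin ROOT UNIT NAME
# Write stdin to ROOT/UNIT.d/NAME.conf, creating the directory if
# needed. For a real system-wide override, ROOT is /etc/systemd/system.
write_dropin() {
    root=$1
    unit=$2
    name=$3
    dir="$root/$unit.d"
    mkdir -p "$dir" && cat > "$dir/$name.conf"
}

# As root on a live host:
#   printf '[Service]\nMemoryHigh=1G\nMemoryMax=1536M\nCPUWeight=80\n' |
#       write_dropin /etc/systemd/system mywebapp.service 50-limits
#   systemctl daemon-reload && systemctl restart mywebapp.service
```

Unlike systemctl edit, a hand-written file is not picked up until you run daemon-reload.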
Slices for Teams, Tenants, and Workload Classes
Slices become useful when you stop thinking one service at a time. Suppose a host runs the API, a report generator, and several import workers. You may not care which import worker uses CPU, but you do care that all import work together cannot starve the API.
Create a slice:
# /etc/systemd/system/import.slice
[Unit]
Description=Import and backfill workloads
[Slice]
CPUWeight=30
MemoryHigh=4G
MemoryMax=5G
Put import services inside it:
[Service]
Slice=import.slice
ExecStart=/usr/local/bin/import-worker
Now the group has shared pressure. This is cleaner than putting separate hard caps on every worker and hoping the math still works after someone adds a new one.
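To get a feel for what CPUWeight=30 means, the share under full contention can be approximated as the slice's weight divided by the sum of the weights of all busy siblings at the same level of the tree. The sketch below does that arithmetic. The function name weight_share_percent is hypothetical, and the result is a rough guide, not a guarantee.

```shell
#!/bin/sh
# weight_share_percent OWN_WEIGHT [SIBLING_WEIGHT...]
# Approximate the CPU share (in percent, rounded down) that a cgroup
# gets when it and all listed siblings are fully busy.
weight_share_percent() {
    own=$1
    total=0
    for w in "$@"; do          # "$@" includes the cgroup's own weight
        total=$(( total + w ))
    done
    echo $(( 100 * own / total ))
}

weight_share_percent 30 100       # import.slice vs. one busy default-weight sibling
weight_share_percent 30 100 100   # import.slice vs. two busy default-weight siblings
```

When the siblings are idle, the weight has no effect and the import workers can use whatever CPU is free.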
There is one naming detail that catches people: slice names encode hierarchy, with dashes as the separator. customer-a.slice is not a top-level slice; it is a child of customer.slice. Likewise, customer-a-batch.slice is a child of customer-a.slice, and systemd instantiates the parent slices implicitly when they are needed. If you want a flat top-level slice, avoid dashes in its name. Read systemd.slice(5) before designing a large slice tree.
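Following the systemd.slice(5) rule that every dash-separated prefix of a slice name is a parent slice, the position of a slice in the tree can be worked out mechanically. The sketch below expands a name into its path. The function name slice_path is hypothetical, and it does not handle the root slice (-.slice) or escaped dashes.

```shell
#!/bin/sh
# slice_path NAME.slice
# Expand a slice name into its path in the slice tree: each
# dash-separated prefix of the name is a parent slice.
slice_path() {
    name=${1%.slice}
    prefix=""
    path=""
    old_ifs=$IFS
    IFS=-                      # split the name on dashes
    for part in $name; do
        prefix="${prefix:+$prefix-}$part"
        path="${path:+$path/}$prefix.slice"
    done
    IFS=$old_ifs
    echo "$path"
}

slice_path customer-a-batch.slice
slice_path import.slice
```

The first call shows three levels of nesting for a name with two dashes, which is also the directory layout you would find under /sys/fs/cgroup on a cgroup v2 host.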
What Resource Limits Cannot Fix
Cgroups can stop one workload from overwhelming the host, but they cannot make an undersized machine fast. If a database needs more memory for its working set than you allow, it may spend more time reclaiming memory or fail under load. If an API needs short response times, a strict CPU quota can create throttling delays that look like random latency. If a storage device is already saturated, I/O weights may improve fairness but not create throughput.
Treat limits as guardrails. Pair them with application-level settings: database buffer sizes, worker counts, queue concurrency, JVM heap limits, Go GOMEMLIMIT, Node memory flags, or whatever your runtime provides. The best setup is usually both: the application knows its own memory and concurrency model, and systemd protects the rest of the machine if that model breaks.
The Mental Model to Keep
Use service-level limits for one daemon. Use slice-level limits for a group of related workloads. Use weights when you want priority under contention. Use quotas and hard memory caps when you need a firm boundary and are prepared for the consequences. Verify the effective properties with systemctl show, watch behavior with systemd-cgtop, and keep the configuration in drop-ins or unit files that your team can review.