Top Ten Essential Commands for Linux System Monitoring

Learn ten Linux monitoring commands for checking CPU, memory, disk, network sockets, load, and historical system activity.

When a Linux server feels slow, you need commands that tell you whether the pressure is CPU, memory, disk, network, or load. These Linux monitoring commands help you move from "the server is slow" to a specific next step.

The ten tools below give you quick snapshots, interactive views, and historical data. Use them together instead of trusting one number in isolation.

1. top - Real-time Process Activity

The top command provides a dynamic, real-time view of a running Linux system. It displays a summary of system information and a list of processes or threads currently managed by the Linux kernel. It's often the first tool administrators turn to for a quick overview of system activity.

Key Metrics:

  • CPU usage: us (user), sy (system), ni (nice), id (idle), wa (I/O wait), hi (hardware IRQ), si (software IRQ), st (steal time).
  • Memory usage: Total, free, used, buffers/cache.
  • Swap usage: Total, free, used.
  • Process list: PID, User, PR (priority), NI (nice value), VIRT (virtual memory), RES (resident memory), SHR (shared memory), S (status), %CPU, %MEM, TIME+, COMMAND.

Basic Usage:

top

Practical Examples:

  • Sort by CPU usage: While in top, press P.
  • Sort by memory usage: While in top, press M.
  • Show specific user processes: While in top, press u then type the username.
  • Kill a process: While in top, press k, enter the PID, then confirm the signal to send (SIGTERM, signal 15, is the default).

Tips:

  • Press 1 to toggle the display of individual CPU cores.
  • Press q to quit top.
  • Use top -bn1 to get a single snapshot (useful for scripting).
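For scripting, the single-snapshot form can be piped through a small filter. The following is a minimal sketch, assuming procps-style batch output in which the process table begins at a header row whose first field is PID. The function name top_procs is hypothetical and not part of top.

```shell
# top_procs N: read `top -bn1` output on stdin and print the process-table
# header plus the first N process rows (default 5).
# top_procs is a hypothetical helper name used only for this sketch.
top_procs() {
    awk -v n="${1:-5}" '
        found && printed < n { print; printed++ }   # rows after the header
        $1 == "PID"          { found = 1; print }   # header row of the table
    '
}
```

A typical call would be top -bn1 | top_procs 5. The rows come out in whatever order top sorted them, which is by %CPU unless a personal top configuration changes it.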

2. htop - Interactive Process Viewer

htop is an enhanced, interactive, and user-friendly process viewer that offers many advantages over the traditional top command. It presents a more visually appealing and navigable interface, making it easier to monitor and manage processes.

Key Advantages:

  • Visual meters: CPU, memory, and swap usage are displayed graphically.
  • Scrollable list: You can scroll vertically and horizontally to see all processes and their full command lines.
  • Easy process management: Kill, renice, and other actions can be performed directly using function keys without entering PIDs.
  • Tree view: Processes can be displayed in a tree format to show parent-child relationships.

Basic Usage:

# May require installation:
# sudo apt install htop (Debian/Ubuntu)
# sudo yum install htop (RHEL/CentOS; typically requires the EPEL repository)
htop

Practical Examples:

  • Filter processes: Press F4.
  • Kill a process: Select the process, then press F9.
  • Sort by various columns: Use F6.

Tips:

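  • htop is often more convenient for interactive work because you can scroll, filter, and act on processes without typing PIDs. top is still worth knowing because it is installed by default on most systems.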
  • Customize htop's display options (F2) to suit your workflow.

3. vmstat - Virtual Memory Statistics

The vmstat command reports information about processes, memory, paging, block IO, traps, and CPU activity. It's an excellent tool for identifying memory bottlenecks or high disk I/O.

Key Metrics:

  • r: Number of runnable processes (running or waiting for run time).
  • b: Number of processes blocked in uninterruptible sleep (typically waiting for I/O).
  • swpd: Amount of swap space in use.
  • free: Amount of idle memory.
  • si / so: Memory swapped in from disk / swapped out to disk, per second.
  • bi / bo: Blocks received from a block device / blocks sent to a block device, per second.
  • wa: Percentage of CPU time spent waiting for I/O completion.

Basic Usage:

vmstat 1 5 # Report every 1 second, 5 times

Practical Examples:

  • Show active/inactive memory: vmstat -a
  • Display slabinfo: vmstat -m
  • Show disk statistics: vmstat -d

Tips:

  • High si/so values often indicate memory pressure and excessive swapping, which can severely degrade performance.
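  • A consistently high wa percentage suggests an I/O bottleneck.
  • The first line of vmstat output reports averages since boot; judge current behavior from the later samples.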
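The si/so check can be automated with a short filter. This is a sketch under the assumption that the vmstat header row starts with the r column and names the si and so columns, as in the default output. The function name swap_activity is hypothetical.

```shell
# swap_activity: read `vmstat` output on stdin and print every sample whose
# swap-in (si) or swap-out (so) rate is non-zero. Columns are located by
# header name rather than by fixed position.
# swap_activity is a hypothetical helper name used only for this sketch.
swap_activity() {
    awk '
        $1 == "r" {                          # header row naming the columns
            for (i = 1; i <= NF; i++) {
                if ($i == "si") si = i
                if ($i == "so") so = i
            }
            next
        }
        si && so && $1 ~ /^[0-9]+$/ && ($si + $so) > 0 {
            print "swapping: si=" $si " so=" $so
        }
    '
}
```

A typical call would be vmstat 1 5 | swap_activity. No output means none of the samples showed swap traffic.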

4. iostat - I/O Statistics

iostat is part of the sysstat package and reports CPU utilization and I/O statistics for devices and partitions (older releases also covered network file systems). It's crucial for understanding disk performance issues.

Key Metrics:

  • %user, %system, %iowait, %idle: CPU utilization breakdowns.
  • r/s / w/s: Reads/writes per second.
  • rkB/s / wkB/s: Kilobytes read/written per second.
  • await: Average time (in milliseconds) for I/O requests issued to the device to be served, including time spent queued. Newer sysstat releases split this into r_await and w_await.
  • %util: Percentage of elapsed time during which the device had I/O requests in progress.

Basic Usage:

# May require installation:
# sudo apt install sysstat (Debian/Ubuntu)
# sudo yum install sysstat (RHEL/CentOS)
iostat -xz 1 5 # Extended stats, every 1 second, 5 times

Practical Examples:

  • Specific device monitoring: iostat -xz /dev/sda 1
  • Display only CPU utilization: iostat -c
  • Display only device utilization: iostat -d

Tips:

  • A high %util combined with a high await time often points to an I/O bottleneck on that device. On modern SSDs and virtualized storage, confirm with application latency before assuming the disk is saturated.
  • Compare rkB/s and wkB/s with r/s and w/s to understand average I/O size.
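The average I/O size mentioned in the last tip is throughput divided by request rate. As a small sketch, with a hypothetical function name avg_io_kb and values copied by hand from one iostat row:

```shell
# avg_io_kb KB_PER_SEC OPS_PER_SEC: average request size in KB, computed as
# throughput (rkB/s or wkB/s) divided by request rate (r/s or w/s).
# avg_io_kb is a hypothetical helper name used only for this sketch.
avg_io_kb() {
    awk -v kb="$1" -v ops="$2" 'BEGIN {
        if (ops > 0) printf "%.1f\n", kb / ops
        else print "n/a"        # no requests in the interval
    }'
}
```

For example, a device showing 2048 rkB/s at 128 r/s averages 16 KB per read (avg_io_kb 2048 128). Small averages point to random or metadata-heavy I/O, while large ones suggest sequential transfers. Some newer iostat versions print request-size columns directly, in which case this arithmetic is unnecessary.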

5. free - Memory Usage

The free command displays the total amount of free and used physical memory and swap space in the system, as well as the buffers and caches used by the kernel.

Key Metrics:

  • total: Total installed memory.
  • used: Memory in use. Recent procps versions exclude buffers/cache from this figure; very old versions included them.
  • free: Unused memory.
  • shared: Memory used mostly by tmpfs.
  • buff/cache: Memory used by kernel buffers and page cache.
  • available: An estimate of how much memory is available for starting new applications, without swapping.

Basic Usage:

free -h # Human-readable output

Practical Examples:

  • Display memory in megabytes: free -m
  • Continuously update every 5 seconds: watch -n 5 free -h

Tips:

  • The available column is the most important metric for understanding how much memory is genuinely free for new processes.
  • Linux aggressively uses available memory for disk caching, so a low free value is normal and often desirable.
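For alerting, it is often easier to read the same figures from /proc/meminfo than to parse free output. A minimal sketch, assuming a kernel that exposes the MemAvailable field; the function name mem_available_pct is hypothetical:

```shell
# mem_available_pct: read /proc/meminfo-style text on stdin and print
# MemAvailable as a whole-number percentage of MemTotal.
# mem_available_pct is a hypothetical helper name used only for this sketch.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%d\n", avail * 100 / total }
    '
}
```

A typical call would be mem_available_pct < /proc/meminfo. What counts as too low depends on the workload, so any threshold you attach to this number is a local decision rather than a fixed rule.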

6. df - Disk Space Usage

The df command reports the amount of disk space used and available on file systems. It's essential for monitoring storage capacity and preventing disk-full scenarios.

Key Metrics:

  • Filesystem: The name of the file system.
  • Size: Total size of the file system.
  • Used: Amount of disk space used.
  • Avail: Amount of disk space available.
  • Use%: Percentage of disk space used.
  • Mounted on: The mount point of the file system.

Basic Usage:

df -h # Human-readable output

Practical Examples:

  • Show inode usage: df -i (inodes are metadata structures; running out of them can prevent file creation even with free space).
  • Show specific filesystem type: df -hT -t ext4

Tips:

  • Regularly check Use% to prevent file systems from filling up, which can cause application failures and system instability.
  • High inode usage can be an issue with many small files.
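The Use% check is easy to script. The sketch below assumes the POSIX output format of df -P, where the fifth column is the usage percentage and the sixth is the mount point; the function name full_filesystems and the default threshold of 90 are illustrative choices.

```shell
# full_filesystems [THRESHOLD]: read `df -P` output on stdin and print the
# mount point and usage of every filesystem at or above THRESHOLD percent
# (default 90). Mount points containing spaces are not handled.
# full_filesystems is a hypothetical helper name used only for this sketch.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header row
            pct = $5
            sub(/%$/, "", pct)        # "93%" -> "93"
            if (pct + 0 >= limit + 0) print $6, $5
        }
    '
}
```

A typical call would be df -P | full_filesystems 85. Because the filter only looks at column positions, the same idea should carry over to inode checks with df -Pi, though that is worth verifying on your own systems.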

7. du - Disk Usage of Files and Directories

The du command estimates file space usage. While df checks total filesystem usage, du is used to find out the size of specific files or directories, which is critical for identifying what is consuming disk space.

Key Metrics:

  • Total size of specified files or directories.

Basic Usage:

du -sh /var/log # Summary, human-readable for /var/log directory

Practical Examples:

  • Show sizes of all subdirectories (one level deep): du -h --max-depth=1 /home/user
  • Find the largest files/directories: du -ah /path/to/check | sort -rh | head -n 10

Tips:

  • Combine du with sort and head to quickly pinpoint disk space hogs.
  • Be mindful when running du on large directories, as it can be resource-intensive.
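One way to keep a du scan bounded is to stay on a single filesystem and look only one level deep. This is a sketch assuming GNU du (for --max-depth); the function name biggest_dirs is hypothetical.

```shell
# biggest_dirs DIR [N]: list the N largest entries directly under DIR
# (default 10), sizes in KB, largest first. -x keeps du on one filesystem
# so it does not wander into other mounts. The first row is DIR itself.
# biggest_dirs is a hypothetical helper name used only for this sketch.
biggest_dirs() {
    du -xk --max-depth=1 "$1" 2>/dev/null | sort -rn | head -n "${2:-10}"
}
```

A typical call would be biggest_dirs /var 5. Repeating the call on the largest subdirectory narrows the search without scanning the whole tree in detail each time.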

8. sar - System Activity Reporter

sar is a powerful tool from the sysstat package that collects, reports, or saves system activity information. Unlike top or vmstat which show real-time snapshots, sar excels at providing historical data, making it invaluable for long-term performance analysis and capacity planning.

Key Features:

  • CPU statistics: %user, %nice, %system, %iowait, %steal, %idle.
  • Memory statistics: kbmemfree, kbmemused, kbbuffers, kbcached.
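  • Disk I/O: tps plus read/write throughput (rd_sec/s and wr_sec/s on older sysstat versions, rkB/s and wkB/s on newer ones).
  • Network statistics: rxpck/s, txpck/s, and byte throughput (rxbyt/s and txbyt/s on older versions, rxkB/s and txkB/s on newer ones).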
  • Load average, swap activity, kernel activity, and more.

Basic Usage:

# Report CPU utilization every 1 second, 5 times:
sar -u 1 5
# Report disk activity:
sar -d
# Report memory utilization:
sar -r
# Report network statistics:
sar -n DEV

Practical Examples:

  • View a saved CPU activity file: sar -u -f /var/log/sysstat/saDD on many Debian-based systems, or /var/log/sa/saDD on many RHEL-based systems. Replace DD with the day of month.
  • Display all collected data for today: sar -A

Tips:

  • Ensure the sysstat package is installed and configured to collect data regularly for historical analysis.
  • sar can be overwhelming; focus on specific flags (-u, -r, -d, -n) relevant to your investigation.
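Because the saved data files live in different directories depending on the distribution, a small lookup helper can make historical queries more portable. The following is a sketch; the function name sa_file is hypothetical, and the two default directories are simply the ones mentioned above.

```shell
# sa_file DD [DIR...]: print the path of the first existing saDD file found
# in the given directories. With no directories given, the two common
# sysstat locations are tried. Returns non-zero if nothing is found.
# sa_file is a hypothetical helper name used only for this sketch.
sa_file() {
    day=$1
    shift
    [ $# -gt 0 ] || set -- /var/log/sysstat /var/log/sa
    for dir in "$@"; do
        if [ -f "$dir/sa$day" ]; then
            printf '%s\n' "$dir/sa$day"
            return 0
        fi
    done
    return 1
}
```

Combined with sar's -s and -e options, which limit the report to a start and end time, a query for one hour on the 12th of the month could look like sar -u -s 14:00:00 -e 15:00:00 -f "$(sa_file 12)". The day and times here are placeholders.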

9. ss (Socket Statistics) - Network Connections

ss is a utility to investigate sockets. It's a faster and more efficient replacement for the older netstat command, providing more detailed information about TCP, UDP, and other socket types, including their state, local/remote addresses, and process IDs.

Key Metrics:

  • State: ESTAB, LISTEN, TIME-WAIT, CLOSE-WAIT, etc.
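  • Recv-Q / Send-Q: For established connections, bytes received but not yet read by the application / bytes sent but not yet acknowledged by the peer. For listening sockets, the current accept queue length / its configured maximum.
  • Local Address:Port / Peer Address:Port: The local and remote endpoints.
  • Process: The process that owns the socket, shown with -p (usually requires root to see other users' processes).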

Basic Usage:

ss -tuln # TCP and UDP, listening sockets only, numeric addresses and ports

Practical Examples:

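  • List TCP connections: ss -t shows established connections; add -a (ss -ta) to include listening sockets.
  • List UDP sockets: ss -ua
  • Show what is listening on a specific port: ss -tulnp 'sport = :80' (a plain grep 80 also matches ports such as 8080 and any PID containing 80).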
  • Summarize socket statistics: ss -s

Tips:

  • A high number of TIME-WAIT sockets is not automatically bad; it can be normal on busy TCP services. Pair it with port exhaustion, failed connections, or queue growth before treating it as a problem.
  • Monitor Recv-Q and Send-Q for signs of network buffering issues or slow application processing.
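To judge whether a state such as TIME-WAIT is unusually common, it helps to count sockets per state. A minimal sketch, assuming the default ss output where the first column of each row is the state; the function name socket_states is hypothetical.

```shell
# socket_states: read `ss -tan` output on stdin and print a count per TCP
# state, most common first.
# socket_states is a hypothetical helper name used only for this sketch.
socket_states() {
    awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' |
        sort -rn
}
```

A typical call would be ss -tan | socket_states. Comparing the result during an incident with a quiet period gives the counts some context.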

10. uptime - System Uptime and Load Average

The uptime command shows how long the system has been running, the current time, how many users are logged in, and the system load averages for the past 1, 5, and 15 minutes.

Key Metrics:

  • Current time: The system clock time when the command ran.
  • Uptime: How long the system has been running.
  • Users: Number of users currently logged in.
  • Load average: The average number of processes that are either in a runnable or uninterruptible state. This includes processes that are running on the CPU, waiting for CPU, or waiting for disk I/O.
    • 1-minute load average
    • 5-minute load average
    • 15-minute load average

Basic Usage:

uptime

Practical Examples:

  • Run uptime first as a quick check of how busy a server is before reaching for more detailed tools.

Tips:

  • Compare the load average to the number of CPU cores on your system. A load average consistently higher than the number of CPU cores often indicates a CPU or I/O bottleneck.
  • An increasing load average over time (e.g., 1-minute > 5-minute > 15-minute) suggests the system is getting busier.
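The comparison against core count can be reduced to a single ratio. This sketch uses a hypothetical function name load_per_core and takes both numbers as arguments so it can be fed from any source:

```shell
# load_per_core LOAD CORES: print the load average divided by the number of
# CPU cores, to two decimal places. Values near or above 1.00 mean there is
# roughly as much runnable or I/O-blocked work as there are cores.
# load_per_core is a hypothetical helper name used only for this sketch.
load_per_core() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        if (cores > 0) printf "%.2f\n", load / cores
        else print "n/a"
    }'
}
```

On Linux, the inputs can come from /proc/loadavg and nproc, for example load_per_core "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)". A load of 6.00 on 4 cores gives 1.50.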

A Simple Troubleshooting Flow

For a slow server, start with uptime to check load, then use top or htop to find busy processes. Check free -h and vmstat 1 5 for memory pressure, iostat -xz 1 5 for disk latency, and ss -tulnp for listening services or backed-up sockets. If the issue happened earlier, use sar to compare the bad window with a normal one.
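The flow above can be sketched as a small script. The helper names run_if_present and triage are hypothetical. The sketch skips any tool that is not installed and leaves out the interactive top and htop step, which is better done by hand.

```shell
# run_if_present CMD [ARGS...]: print a heading and run CMD, or note that
# the command is not installed.
# run_if_present and triage are hypothetical names used only for this sketch.
run_if_present() {
    if command -v "$1" >/dev/null 2>&1; then
        printf '== %s ==\n' "$*"
        "$@"
    else
        printf '== %s: not installed, skipped ==\n' "$1"
    fi
}

# triage: run the non-interactive first-pass checks in the order described
# above: load, memory, swap and CPU wait, disk latency, then sockets.
triage() {
    run_if_present uptime
    run_if_present free -h
    run_if_present vmstat 1 5
    run_if_present iostat -xz 1 5
    run_if_present ss -tulnp
}
```

Calling triage, and redirecting its output to a file if you want a record, gives a timestamp-free snapshot to read through before deciding where to dig deeper.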

The takeaway is simple: each command answers one part of the story. Your job is to line up CPU, memory, disk, and network evidence before you restart services or resize the machine.