Linux Perf Analysis - Quickly Check Your System's Health (1)

Introduction

I’ve been using Linux for a while. In the early days I had a very cheap, slow computer, and I always wondered why the hell it would suddenly crawl. I was new to Linux, coming from Windows, and had no idea how to even open a terminal and check what was wrong. Eventually, of course, I learned through tutorials and the like. Umm, well, that’s the intro, man; I have nothing more to say. Let’s just get straight into it.

Note: This is a multi-part series. In this part, we’ll cover some basic tools.

Goal

The goal is simple: quickly collect system data and form a rough diagnosis.

We want to identify which kind of problem we're dealing with:

  • CPU issue?
  • Memory issue?
  • Disk issue?
  • Network issue?

uptime

$ uptime
19:41:35 up 10:50,  1 user,  load average: 0.87, 1.06, 1.11
  • uptime is a quick way to check the load averages over time. As you can see (got the reference??) above, there are three numbers. Linux updates them continuously using an exponential moving average.

  • The kernel recalculates them roughly every 5 seconds; the three values (1, 5, and 15 minutes) are just different smoothing windows.

  • So:

    • 1-min load → reacts quickly
    • 5-min load → smoother
    • 15-min load → very slow, has a stable trend
  • See the image below: the 1-min load jumped because I started playing a random 4K video, while the 5-min load rose a bit and the 15-min load by just one.

Image
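
To get a feel for why the three numbers react at different speeds, here's a rough sketch of a single update step of an exponential moving average, assuming the 5-second sampling interval mentioned above. The function name ema_step is made up for this post, and the real kernel does this in fixed-point arithmetic rather than awk, so treat it as an illustration only.

```shell
# One update step of an exponential moving average, as used for load averages.
# Usage: ema_step <previous_load> <active_tasks_now> <window_seconds>
# Assumption: samples arrive every 5 seconds (the interval described above).
ema_step() {
    awk -v prev="$1" -v n="$2" -v window="$3" 'BEGIN {
        decay = exp(-5 / window)          # weight kept from the previous value
        printf "%.2f\n", prev * decay + n * (1 - decay)
    }'
}
```

Starting from an idle system (load 0) with one task that suddenly becomes runnable, `ema_step 0 1 60` prints 0.08, `ema_step 0 1 300` prints 0.02, and `ema_step 0 1 900` prints 0.01. Same event, three different step sizes, which is exactly the "reacts quickly" vs "very slow" behaviour.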

Tip: Use watch to observe load changes live:

$ watch -n 2 uptime

This is especially useful when testing something like this. Learn more about watch using man watch.

Important: Load average is NOT CPU usage. It includes:

  • Running processes
  • Runnable (waiting for CPU)
  • Uninterruptible sleep (usually I/O wait)

What can we infer from this and what to look at?

  • Compare load to CPU count:
    • Load ≈ CPU cores → system is busy but fine
    • Load ≫ CPU cores → contention/bottleneck
  • If load spikes suddenly:
    • Check top or pidstat
  • If load is high but CPU usage is low:
    • Likely an I/O bottleneck → check iostat
  • Trend matters:
    • 1-min ≫ 15-min → recent spike
    • All high → sustained pressure
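
The "compare load to CPU count" check can be scripted. Here's a minimal sketch; load_status is a hypothetical helper, and the 1.0x and 2.0x cut-offs are my own illustrative picks, not official limits.

```shell
# Classify a load average relative to the number of CPU cores.
# Usage: load_status <load_average> <cpu_cores>
# Assumption: the 1.0x and 2.0x cut-offs are illustrative, not official limits.
load_status() {
    awk -v load="$1" -v cores="$2" 'BEGIN {
        ratio = load / cores
        if (ratio > 2.0)       print "contention"
        else if (ratio >= 1.0) print "busy"
        else                   print "ok"
    }'
}

# On a live system: 1-minute load from /proc/loadavg, core count from nproc.
# load_status "$(cut -d ' ' -f 1 /proc/loadavg)" "$(nproc)"
```

With the uptime output above (0.87 on a 12-CPU machine) this prints ok.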

vmstat

$ vmstat 1
procs -----------memory---------- ---swap-- -----io---- -system-- -------cpu-------
r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st gu
1  0 582540 6354152 909084 5165968  0    0   259   408 2377    6  4  1 94  1  0  0
0  0 582540 6337568 909084 5168056  0    0     0     0 2188 7145  1  1 97  0  0  0
0  0 582540 6344640 909084 5168188  0    0     0   944 1445 5891  1  1 98  0  0  0
0  0 582540 6344864 909084 5166072  0    0     0     0 1678 5439  1  1 98  0  0  0
0  0 582540 6356228 909084 5166584  0    0     0     0 2747 10801 2  2 96  0  0  0
1  0 582540 6358864 909092 5166584  0    0     0    92 1777 6262  1  1 98  0  0  0
0  0 582540 6359072 909092 5166072  0    0     0     0 1420 5457  1  1 98  0  0  0
0  0 582540 6356900 909092 5168120  0    0     0  1084 1982 5958  1  1 98  0  0  0
0  0 582540 6357072 909092 5168120  0    0     0     0 1448 5994  1  1 98  0  0  0
0  0 582540 6358244 909092 5166136  0    0     0     0 1676 5427  1  1 98  0  0  0
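  • vmstat = Virtual Memory Statistics (not just memory btw).
  • It gives a compact, real-time view of the whole system: CPU, memory, processes, I/O, and context switching.
  • The first row is an average since boot; every row after that covers one sampling interval (1 second here).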

Procs

  • r: number of processes running on a CPU or waiting for a turn (doesn't include processes waiting on I/O).
  • b: processes blocked in uninterruptible sleep (usually waiting on I/O).

Memory and Swap

  • swpd: swap space in use, in KiB (I have ~569 MiB used).
  • free: free memory, in KiB.
  • buff: memory used as kernel buffers.
  • cache: memory used as filesystem (page) cache (important one).
  • si: memory swapped in from disk per second. so: memory swapped out to disk per second. If either stays non-zero, the system is short on memory (only relevant when a swap device is configured).

I/O

  • bi: blocks read from block devices per second.
  • bo: blocks written to block devices per second.

CPU

  • us: user CPU %
  • sy: kernel CPU %
  • id: idle %
  • wa: % waiting on I/O
  • st: % stolen by the hypervisor (when running inside a VM)
  • gu: % spent running guests (when hosting VMs)

What can we infer from this and what to look at?

  • r > CPU cores → CPU contention
  • b > 0 → I/O blocking → check disk
  • si/so > 0 → memory pressure (bad sign)
  • High wa → disk bottleneck
  • High us → user-space CPU heavy workload
  • High sy → kernel/system overhead
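
If you don't want to eyeball every row, here's a rough sketch that applies the rules above to one vmstat data row. The helper name vmstat_flags and the 20% cut-off for "high" wa are assumptions of mine, and it expects the default column order shown in the output above.

```shell
# Flag the warning signs listed above in a single vmstat data row.
# Usage: vmstat_flags <cpu_cores> "<vmstat data row>"
# Assumptions: default vmstat column order
# (r b swpd free buff cache si so bi bo in cs us sy id wa ...)
# and an illustrative 20% cut-off for "high" wa.
vmstat_flags() {
    printf '%s\n' "$2" | awk -v cores="$1" '{
        flags = ""
        if ($1 > cores)        flags = flags " cpu-contention"
        if ($2 > 0)            flags = flags " io-blocked"
        if ($7 > 0 || $8 > 0)  flags = flags " swapping"
        if ($16 >= 20)         flags = flags " high-iowait"
        if (flags == "")       flags = " none"
        print substr(flags, 2)
    }'
}
```

To run it on live data, skip the two header lines and the since-boot row: `vmstat 1 5 | tail -n +4 | while read -r row; do vmstat_flags "$(nproc)" "$row"; done`.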

dmesg

$ sudo dmesg | tail
[ 1588.754691] iwlwifi 0000:00:14.3: Unhandled alg: 0x703
[ 1588.754694] iwlwifi 0000:00:14.3: Unhandled alg: 0x703
[ 1588.754697] iwlwifi 0000:00:14.3: Unhandled alg: 0x703
[ 1588.754700] iwlwifi 0000:00:14.3: Unhandled alg: 0x703
[ 1588.754703] iwlwifi 0000:00:14.3: Unhandled alg: 0x703
[ 1602.189160] input: realme Buds Wireless 3 (AVRCP) as /devices/virtual/input/input32
[13068.129622] input: realme Buds Wireless 3 (AVRCP) as /devices/virtual/input/input33
[17264.803519] nvme nvme0: using unchecked data buffer
[17264.807705] block nvme0n1: No UUID available providing old NGUID
  • dmesg prints the kernel ring buffer: informational messages, warnings, errors, and sometimes debug logs.
  • You'll see things like hardware information, driver messages, filesystem events, kernel warnings, and security messages (if on SELinux).

What can we infer from this and what to look at?

Look for hardware errors (disk, GPU, USB), driver failures, and filesystem issues. Filter by severity using:

$ sudo dmesg --level=err,warn
  • Disk errors → check iostat
  • OOM killer → check memory (free, vmstat)
  • Repeated warnings → likely root cause
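
For the OOM killer case specifically, a quick count is often enough to know whether it's worth digging further. A minimal sketch, assuming the kernel words these events as "Out of memory" or "oom-killer" (the usual wording, but it can vary between kernel versions); count_oom_lines is a name I made up.

```shell
# Count kernel log lines that point at the OOM killer.
# Usage: sudo dmesg | count_oom_lines
# Assumption: the kernel words these events as "Out of memory" or "oom-killer",
# which is the usual wording but can differ between kernel versions.
count_oom_lines() {
    # grep -c exits non-zero on zero matches, so normalise the exit status.
    grep -c -i -E 'out of memory|oom-killer' || true
}
```

Anything above 0 means it's time to look at free and vmstat.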

iostat

$ iostat -xz 1
Linux 6.18.16-1-lts (vv) 	05/08/2026 	_x86_64_	(12 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.08    0.00    1.44    0.64    0.00   93.83

Device            r/s     rkB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wkB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dkB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
loop0            0.00      0.04     0.00   0.00    0.06    15.17    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1         13.03    246.12     5.06  27.96    0.22    18.90    8.33    394.98     9.28  52.72    3.44    47.44    0.00      0.00     0.00   0.00    0.00     0.00    1.13    1.93    0.03   1.19
zram0            0.00      0.07     0.00   0.00    0.00    16.46    0.00      0.00     0.00   0.00    0.00     4.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
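  • iostat reports CPU utilization and per-device I/O metrics.

  • Options:

    • x = extended stats
    • z = skip devices with no activity during the interval
    • 1 = refresh every second
  • Key fields, avg-cpu line:

    • %user: CPU time running user-space programs
    • %nice: CPU time running user processes with a modified priority (nice)
    • %system: CPU time in kernel work (syscalls, interrupts, etc.)
    • %iowait: CPU idle while waiting for disk I/O
    • %steal: time stolen by the hypervisor (VMs)
  • Key fields, per device:

    • r/s, w/s, rkB/s, and wkB/s: reads, writes, kilobytes read, and kilobytes written per second
    • r_await, w_await: average time in ms a read or write request takes, including time spent queued
    • aqu-sz: average queue size
    • %util: share of time the device had I/O in flight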

What can we infer from this and what to look at?

  • %util ~100% → disk saturated (on SSDs/NVMe that serve requests in parallel, this can overstate saturation)
  • High r_await/w_await (>10–20ms SSD, >50ms HDD) → latency issue
  • High aqu-sz → queue buildup
  • Low util but high await → possible driver/fs issue
  • High writes → check logs, journaling, apps

free

$ free -h
               total        used        free      shared  buff/cache   available
Mem:            15Gi       4.9Gi       6.2Gi       1.1Gi       5.7Gi        10Gi
Swap:          7.7Gi       522Mi       7.1Gi

  • Focus on available, not free
  • Low available memory → pressure
  • High and growing swap usage → bad sign
  • High cache is GOOD (Linux uses memory efficiently)
  • If swapping:
    • Check vmstat
    • Identify heavy processes (top, pidstat)
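
Since available is the number to watch, here's a minimal sketch that turns it into a percentage of total RAM. It reads /proc/meminfo-style input and assumes the MemTotal and MemAvailable lines are present (kernels 3.14 and later); available_pct is a hypothetical helper name.

```shell
# Percentage of RAM that is still available, from /proc/meminfo-style input on stdin.
# Usage: available_pct < /proc/meminfo
# Assumption: the input has MemTotal and MemAvailable lines (kernels 3.14 and later).
available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END { if (total > 0) printf "%.0f\n", avail * 100 / total }
    '
}
```

What counts as "low" depends on the workload, so I'm not baking a threshold into this one.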

top

$ top
top - 14:03:35 up  5:44,  1 user,  load average: 0.58, 0.98, 1.14
Tasks: 329 total, 1 running, 325 sleep, 0 d-sleep, 0 stopped, 3 zombie
%Cpu(s):  2.2 us,  1.7 sy,  0.0 ni, 95.5 id,  0.2 wa,  0.3 hi,  0.1 si,  0.0 st
MiB Mem :  15684.2 total,   6405.1 free,   4956.9 used,   5850.3 buff/cache
MiB Swap:   7842.0 total,   7319.1 free,    522.9 used.  10727.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 162550 vamsi     20   0 3313788 646444 214152 S   7.3   4.0   2:36.70 Isolated Web Co
   1107 vamsi     20   0  496004  32352  18480 S   2.3   0.2   2:15.87 wireplumber
   1960 vamsi     20   0 1555272 248560 218228 S   2.3   1.5  14:16.67 Hyprland
  28869 vamsi     20   0 1131588 299768 257972 S   2.3   1.9   0:04.43 alacritty
   3366 vamsi     20   0  838444  68296  60576 S   1.7   0.4   1:18.29 Utility Process
   2810 vamsi     20   0   12.3g 849872 542272 S   1.3   5.3  52:18.70 zen
    880 root      20   0  344216  26152  20996 S   1.0   0.2   0:45.65 NetworkManager
    911 root      20   0 3164688 122652  78932 S   1.0   0.8   4:47.72 opensnitchd
   2756 vamsi     20   0  583756  14084  10584 S   1.0   0.1   1:55.32 btop
   2003 vamsi     20   0 2033888 549632 306264 S   0.7   3.4   2:00.91 qs
 157278 vamsi     20   0  377676  81484  15300 S   0.7   0.5   1:48.58 nvim
    775 dbus      20   0    6080   4436   2408 S   0.3   0.0   0:35.84 dbus-broker
    777 avahi     20   0    6704   4304   4000 S   0.3   0.0   0:03.01 avahi-daemon
    881 polkitd   20   0  382936  11712   7628 S   0.3   0.1   0:15.28 polkitd
    927 root      20   0 2380228  57820  38896 S   0.3   0.4   0:17.39 containerd
   2470 vamsi     20   0 1142996 221492 195400 S   0.3   1.4   0:58.89 alacritty
   3832 vamsi      9 -11  201668  23648   9896 S   0.3   0.1   0:32.19 pipewire-pulse
 185532 vamsi     20   0   11144   8180   5892 R   0.3   0.1   0:00.01 top
      1 root      20   0   23460  14068   9732 S   0.0   0.1   0:05.95 systemd
    
  • Identify top CPU consumers
  • Look for:
    • Runaway processes
    • Zombies
    • High memory users
  • CPU breakdown:
    • High us → apps
    • High sy → kernel
    • High wa → disk wait
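
top is interactive, so for scripts I usually reach for ps instead. Here's a rough sketch for the zombie check; count_zombies is a made-up helper that relies on ps marking zombies with a state starting with "Z". The commented one-liner assumes a procps-ng ps, since --sort is its option.

```shell
# Count zombie processes from a list of process states on stdin (one state per line).
# Usage: ps -eo stat= | count_zombies
# Note: ps marks zombies with a state starting with "Z".
count_zombies() {
    awk '$1 ~ /^Z/ { n++ } END { print n + 0 }'
}

# Non-interactive look at the five biggest CPU consumers
# (--sort is a procps-ng option, so this assumes a typical Linux ps):
# ps -eo pid,user,%cpu,%mem,comm --sort=-%cpu | head -n 6
```

One caveat: ps reports CPU use averaged over the lifetime of each process, so its numbers won't match top's live view exactly.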

mpstat

$ mpstat -P ALL 1
Linux 6.18.16-1-lts (vv) 	05/08/2026 	_x86_64_	(12 CPU)

02:06:06 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
02:06:07 PM  all    1.84    0.00    0.67    0.17    0.33    0.17    0.00    0.00    0.00   96.82
02:06:07 PM    0    1.03    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   98.97
02:06:07 PM    1    0.98    0.00    0.98    0.00    0.00    0.98    0.00    0.00    0.00   97.06
02:06:07 PM    2    1.03    0.00    1.03    0.00    1.03    0.00    0.00    0.00    0.00   96.91
02:06:07 PM    3    1.01    0.00    0.00    0.00    1.01    0.00    0.00    0.00    0.00   97.98
02:06:07 PM    4    2.00    0.00    1.00    1.00    0.00    0.00    0.00    0.00    0.00   96.00
02:06:07 PM    5    0.00    0.00    0.99    0.00    0.00    0.00    0.00    0.00    0.00   99.01
02:06:07 PM    6    5.05    0.00    2.02    0.00    0.00    1.01    0.00    0.00    0.00   91.92
02:06:07 PM    7    1.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00   98.00
02:06:07 PM    8    3.00    0.00    1.00    0.00    1.00    0.00    0.00    0.00    0.00   95.00
02:06:07 PM    9    2.02    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.98
02:06:07 PM   10    1.98    0.00    0.00    0.99    0.99    0.00    0.00    0.00    0.00   96.04
02:06:07 PM   11    3.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   97.00
    

What can we infer from this and what to look at?

  • Per-core CPU usage
  • Single-core bottleneck (one core at 100%) or imbalanced workloads
  • High %iowait → disk issue
  • Useful for diagnosing multi-threading issues and CPU pinning problems
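
To spot a single hot core without scanning the whole table, something like the sketch below works. It assumes %idle is the last column and finds the CPU column from the header row, because its position shifts with the time format (12h vs 24h). busiest_core is a name I invented for this post.

```shell
# Find the busiest core in `mpstat -P ALL` output read from stdin.
# Usage: mpstat -P ALL 1 1 | busiest_core
# Assumptions: %idle is the last column, and the CPU column is located from the
# header row because its position depends on the time format (12h vs 24h).
busiest_core() {
    awk '
        /%idle/ { for (i = 1; i <= NF; i++) if ($i == "CPU") col = i; next }
        col && $col ~ /^[0-9]+$/ && $1 != "Average:" {
            if (min == "" || $NF + 0 < min) { min = $NF + 0; cpu = $col }
        }
        END { if (cpu != "") printf "cpu %s is %.2f%% busy\n", cpu, 100 - min }
    '
}
```

Fed the sample output above, it picks CPU 6 (91.92% idle). If one core sits near 100% busy while the "all" row looks relaxed, you're probably looking at a single-threaded bottleneck.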

pidstat

$ pidstat 1
Linux 6.18.16-1-lts (vv) 	05/08/2026 	_x86_64_	(12 CPU)

02:09:16 PM   UID       PID    %usr %system  %guest   %wait    %CPU   CPU  Command
02:09:17 PM  1000      1107    1.98    0.99    0.00    0.00    2.97     8  wireplumber
02:09:17 PM  1000      1960    0.99    0.00    0.00    0.00    0.99     4  Hyprland
02:09:17 PM  1000      2003    5.94    1.98    0.00    0.00    7.92     8  qs
02:09:17 PM  1000      2470    0.00    0.99    0.00    0.00    0.99     8  alacritty
02:09:17 PM  1000      2810    0.99    0.00    0.00    0.00    0.99     2  zen
02:09:17 PM  1000      3366    0.99    0.00    0.00    0.00    0.99     7  Utility Process
02:09:17 PM  1000    157278    0.99    0.00    0.00    0.00    0.99     2  nvim
02:09:17 PM  1000    162550    4.95    1.98    0.00    0.00    6.93     6  Isolated Web Co
02:09:17 PM     0    184335    0.00    0.99    0.00    0.00    0.99     3  kworker/u49:1-hci0
02:09:17 PM  1000    188110    0.00    0.99    0.00    0.00    0.99     7  pidstat
    
  • Per-process breakdown over time
  • Identify:
    • CPU-heavy processes
    • Processes waiting for a CPU (%wait)
  • Better than top for watching trends, because every sample stays on screen
  • Use when:
    • The issue is intermittent
    • You need a rolling per-process view over time
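
For intermittent issues, the peak value per process is often more telling than any single sample. Here's a rough sketch that keeps the highest %CPU seen per PID across pidstat samples. The helper name peak_cpu_by_pid is hypothetical, and it finds the PID and %CPU columns from the header row rather than hard-coding positions.

```shell
# Report the highest %CPU seen per PID across pidstat samples read from stdin.
# Usage: pidstat 1 10 | peak_cpu_by_pid
# Assumption: the header row contains "PID" and "%CPU"; summary rows start with "Average:".
peak_cpu_by_pid() {
    awk '
        /%CPU/ {
            for (i = 1; i <= NF; i++) { if ($i == "PID") p = i; if ($i == "%CPU") c = i }
            next
        }
        p && $1 != "Average:" && $p ~ /^[0-9]+$/ {
            if ($c + 0 > peak[$p]) peak[$p] = $c + 0
        }
        END { for (pid in peak) printf "%s %.2f\n", pid, peak[pid] }
    ' | sort -k2,2 -n -r
}
```

The output is one "PID peak" pair per line, highest first, so a process that only spikes once in ten seconds still floats to the top.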

Final Thoughts

A quick workflow:

  1. uptime → is load high?
  2. vmstat → CPU vs memory vs I/O?
  3. iostat → disk bottleneck?
  4. top / pidstat → which process?
  5. dmesg → any kernel-level issues?

More advanced tools later 🙂
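
If you want the whole workflow in one go, here's a minimal sketch that runs the steps in order and skips tools that aren't installed. The function names, section banners, and 3-sample counts are my own illustrative choices; dmesg may need root on some systems, and pidstat stands in for the interactive top.

```shell
# Run the quick workflow above in order, skipping tools that are not installed.
# Assumptions: the function names, banners and 3-sample counts are illustrative;
# dmesg may need root on some systems.
run_step() {
    # $1 is a label; the remaining arguments are the command to run.
    label="$1"; shift
    echo "== $label =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "($1 failed; it may need root)"
    else
        echo "($1 not installed)"
    fi
}

quick_triage() {
    run_step "load"    uptime
    run_step "system"  vmstat 1 3
    run_step "disk"    iostat -xz 1 3
    run_step "process" pidstat 1 3
    run_step "kernel"  dmesg --level=err,warn
}
```

Call quick_triage after sourcing the file; the kernel section can get long, so piping the whole thing through less doesn't hurt.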