1. CPU
type: Graph
Unit: short
max: "100"
min: "0"
Label: Percentage
System - percentage of CPU time spent running processes in kernel mode
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode="system",instance=~"$node:$port",job=~"$job"}[5m])) * 100
User - percentage of CPU time spent running normal processes in user mode
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='user',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Nice - percentage of CPU time spent running niced processes in user mode
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='nice',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Idle - percentage of CPU time spent idle
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='idle',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Iowait - percentage of CPU time spent waiting for I/O
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='iowait',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Irq - percentage of CPU time spent servicing interrupts
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='irq',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Softirq - percentage of CPU time spent servicing soft interrupts
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='softirq',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Steal - when running in a VM, percentage of this VM's CPU time taken by other VMs
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='steal',instance=~"$node:$port",job=~"$job"}[5m])) * 100
Guest - percentage of CPU time spent running virtual machines
metrics:
sum by (mode)(irate(node_cpu_seconds_total{mode='guest',instance=~"$node:$port",job=~"$job"}[5m])) * 100
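To make the CPU queries above concrete, here is a minimal Python sketch of what `sum by (mode)(irate(...)) * 100` computes for a single mode. The sample data and the helper names `irate` and `mode_percent` are hypothetical, and the reset handling is a simplified model of Prometheus behaviour rather than its actual implementation.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, value) samples in the window."""
    if len(samples) < 2:
        return None  # irate needs at least two samples inside the range
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    # If the counter went backwards, assume it was reset and restarted from zero
    delta = v1 if v1 < v0 else v1 - v0
    return delta / (t1 - t0)


def mode_percent(per_cpu_samples):
    """Sum the per-core irate for one mode and scale it by 100."""
    rates = [irate(s) for s in per_cpu_samples.values()]
    return 100 * sum(r for r in rates if r is not None)


# Hypothetical samples for mode="system": (unix time, node_cpu_seconds_total) per core
system_samples = {
    "cpu0": [(0, 100.0), (15, 100.6), (30, 101.5)],
    "cpu1": [(0, 200.0), (15, 200.3), (30, 200.6)],
}
result = mode_percent(system_samples)  # only the last two samples per core are used
```

Because the per-core rates are summed, the raw value can reach 100 times the number of cores. With `max: "100"` the panel therefore only reads as a true percentage if the graph normalizes the stacked series or the query is divided by the core count.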
2. Memory Stack (/proc/meminfo)
type: Graph
Unit: bytes
min: "0"
Label: Bytes
Apps - memory used by user-space applications
metrics:
node_memory_MemTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"}
- node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"}
- node_memory_Slab_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_PageTables_bytes{instance=~"$node:$port",job=~"$job"}
- node_memory_SwapCached_bytes{instance=~"$node:$port",job=~"$job"}
PageTables - memory used to map between virtual and physical memory addresses
metrics:
node_memory_PageTables_bytes{instance=~"$node:$port",job=~"$job"}
SwapCache - memory that tracks pages fetched back from swap but not yet modified
metrics:
node_memory_SwapCached_bytes{instance=~"$node:$port",job=~"$job"}
Slab - memory the kernel uses to cache data structures for its own use (e.g. inode and dentry caches)
metrics:
node_memory_Slab_bytes{instance=~"$node:$port",job=~"$job"}
Cache - cache of frequently accessed file data or content
metrics:
node_memory_Cached_bytes{instance=~"$node:$port",job=~"$job"}
Buffers - block device (e.g. hard disk) cache
metrics:
node_memory_Buffers_bytes{instance=~"$node:$port",job=~"$job"}
Unused - amount of memory not in use
metrics:
node_memory_MemFree_bytes{instance=~"$node:$port",job=~"$job"}
Swap - space used in the swap partition
metrics:
(node_memory_SwapTotal_bytes{instance=~"$node:$port",job=~"$job"} - node_memory_SwapFree_bytes{instance=~"$node:$port",job=~"$job"})
Hardware Corrupted - amount of memory the kernel has identified as corrupted or not working
metrics:
node_memory_HardwareCorrupted_bytes{instance=~"$node:$port",job=~"$job"}
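As a worked example of the Apps and Swap expressions above, the following Python sketch applies the same subtraction to `/proc/meminfo`-style text. The numbers in `SAMPLE` are invented for illustration, and `parse_meminfo` and `apps_bytes` are hypothetical helper names, not part of node_exporter.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of byte values."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if not fields:
            continue
        amount = int(fields[0])
        # Values carrying a "kB" suffix are in units of 1024 bytes
        if len(fields) > 1 and fields[1] == "kB":
            amount *= 1024
        values[name.strip()] = amount
    return values


def apps_bytes(m):
    """Mirror the Apps query: total minus every other stacked series."""
    return (m["MemTotal"] - m["MemFree"] - m["Buffers"] - m["Cached"]
            - m["Slab"] - m["PageTables"] - m["SwapCached"])


# Invented sample values, for illustration only
SAMPLE = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          200000 kB
Cached:          2500000 kB
SwapCached:        10000 kB
Slab:             300000 kB
PageTables:        40000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""
mem = parse_meminfo(SAMPLE)
apps = apps_bytes(mem)                            # bytes attributed to applications
swap_used = mem["SwapTotal"] - mem["SwapFree"]    # mirrors the Swap query
```

Apps is defined as a remainder, so Apps, PageTables, SwapCache, Slab, Cache, Buffers and Unused add up to MemTotal when stacked.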
3. Network Traffic (transfer rate of each network interface)
type: Graph
Unit: bytes/sec
Label: Bytes out(-)/in(+)
{{device}} - Receive: download rate of each network interface
metrics:
irate(node_network_receive_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
{{device}} - Transmit: upload rate of each network interface
metrics:
irate(node_network_transmit_bytes_total{instance=~"$node:$port",job=~"$job"}[5m])
4. Disk Space Used (disk space used on all mounted filesystems)
type: Graph
Unit: bytes
min: "0"
Label: Bytes
metrics:
node_filesystem_size_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'} - node_filesystem_avail_bytes{instance=~"$node:$port",job=~"$job",device!~'rootfs'}
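The query above subtracts available bytes, not free bytes, from the filesystem size. The Python sketch below shows the difference using statvfs-style fields. It assumes node_exporter derives size, free and avail from block counts in roughly this way, and the sample numbers and the `filesystem_bytes` helper are hypothetical.

```python
from types import SimpleNamespace


def filesystem_bytes(st):
    """Derive byte totals from statvfs-style block counts."""
    size = st.f_blocks * st.f_frsize
    free = st.f_bfree * st.f_frsize     # all free blocks, including reserved ones
    avail = st.f_bavail * st.f_frsize   # free blocks usable by unprivileged users
    return {
        "size": size,
        "free": free,
        "avail": avail,
        "used_panel": size - avail,     # what the panel query plots
        "used_strict": size - free,     # blocks actually occupied by data
    }


# Invented numbers: 4 KiB blocks with 50,000 blocks reserved for root
sample = SimpleNamespace(f_frsize=4096, f_blocks=1_000_000,
                         f_bfree=400_000, f_bavail=350_000)
fs = filesystem_bytes(sample)
```

On a Unix host, `filesystem_bytes(os.statvfs("/"))` would supply real values. The panel's figure counts reserved blocks as used, so it can read slightly higher than size minus free.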
5. Disk IOps (disk reads and writes)
type: Graph
Unit: I/O ops/sec (iops)
Label: IO read(-)/write(+)
{{device}} - Reads completed: disk read rate (irate over a 5-minute window)
metrics:
irate(node_disk_reads_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
{{device}} - Writes completed: disk write rate (irate over a 5-minute window)
metrics:
irate(node_disk_writes_completed_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
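The `device=~"[a-z]*[a-z]"` matcher in these queries decides which block devices are plotted. PromQL label regexes are fully anchored, so Python's `re.fullmatch` can approximate the behaviour for this simple pattern. The device names below are illustrative examples, not taken from any particular host.

```python
import re

# The pattern used by the disk panels
DEVICE_PATTERN = "[a-z]*[a-z]"


def matches_device(name):
    """Return True if the whole device name matches, as an anchored PromQL regex would."""
    return re.fullmatch(DEVICE_PATTERN, name) is not None


# Illustrative device names
devices = ["sda", "sda1", "vdb", "nvme0n1", "dm-0", "md0", "loop0"]
selected = [d for d in devices if matches_device(d)]
```

Only names made up entirely of lowercase letters pass. That keeps whole disks such as `sda` and drops partitions such as `sda1`, but it also drops NVMe, device-mapper and md devices, so the pattern would need widening if those should be graphed.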
6. I/O Usage Read / Write
type: Graph
Unit: bytes/sec
Label: Bytes read(-)/write(+)
Bytes read successfully (irate over a 5-minute window)
metrics:
irate(node_disk_read_bytes_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
Bytes written successfully (irate over a 5-minute window)
metrics:
irate(node_disk_written_bytes_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
7. I/O Usage Times (total time spent doing I/O)
type: Graph
Unit: s
Label: Seconds
metrics:
irate(node_disk_io_time_seconds_total{instance=~"$node:$port",job=~"$job",device=~"[a-z]*[a-z]"}[5m])
References: