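    Redis monitoring with redis_exporter + prometheus + grafana + alertmanager

    The metrics that redis_exporter exposes after installation are too cluttered to read directly, so it needs to be paired with prometheus and grafana.

    The operating system is CentOS Linux 7.

    Unless something unusual comes up, anything that asks for a username and password uses the default admin/admin.

    Deploying redis_exporter

    Download: https://github.com/oliver006/redis_exporter/releases/tag/v1.24.0

    Other references: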

    https://docs.gitlab.com/ee/administration/monitoring/prometheus/redis_exporter.html
    https://github.com/oliver006/redis_exporter

    Downloaded file: redis_exporter-v1.24.0.linux-amd64.tar.gz
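For a scripted install, the archive can be fetched directly instead of through the browser. A minimal sketch, assuming the asset is published under GitHub's usual releases/download/<tag>/<file> path; the variable names are illustrative only:

```shell
#!/bin/sh
# Assumption: the release asset follows GitHub's standard download path.
version="v1.24.0"
archive="redis_exporter-${version}.linux-amd64.tar.gz"
url="https://github.com/oliver006/redis_exporter/releases/download/${version}/${archive}"

# Print the command instead of running it, so it can be reviewed first.
echo "wget ${url}"
```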

    Extract and install:

    tar -zxvf redis_exporter-v1.24.0.linux-amd64.tar.gz -C /
    mv /redis_exporter-v1.24.0.linux-amd64/ /redis_exporter

    Start redis_exporter

    [root@node1 soft]# cd /redis_exporter/
    [root@node1 redis_exporter]# ./redis_exporter -redis.addr 192.168.1.214:6380 -web.listen-address 192.168.1.178:9121
    INFO[0000] Redis Metrics Exporter v1.24.0    build date: 2021-06-09-01:40:46    sha1: b95cf3b5ce7543119b303766662d1f0400caea94    Go: go1.16.5    GOOS: linux    GOARCH: amd64 
    INFO[0000] Providing metrics at 192.168.1.178:9121/metrics 
    ERRO[0015] Couldn't connect to redis instance

    The approach seen in many online posts, passing several addresses at once, does not work well, for example

    -redis.addr 192.168.1.214:6380,192.168.1.214:6379,192.168.1.214:6381

    Every refresh of the metrics page then logs ERRO[0001], as shown below

    [root@node1 redis_exporter]# ./redis_exporter -redis.addr 192.168.1.214:6380,192.168.1.214:6379,192.168.1.214:6381 -web.listen-address 192.168.1.178:9121
    INFO[0000] Redis Metrics Exporter v1.24.0    build date: 2021-06-09-01:40:46    sha1: b95cf3b5ce7543119b303766662d1f0400caea94    Go: go1.16.5    GOOS: linux    GOARCH: amd64 
    INFO[0000] Providing metrics at 192.168.1.178:9121/metrics 
    ERRO[0001] Couldn't connect to redis instance 

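    So here I pass only the address of one master node, 192.168.1.214:6380. Material online says the exporter can discover the other nodes of a cluster automatically. My setup is master-replica rather than a cluster, but so far the replicas appear to be picked up automatically as well: the master's output below reports redis_connected_slaves 2 together with per-replica lag and offset.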

    Visit 192.168.1.178:9121/metrics to see the collected data.

    # HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
    # TYPE go_gc_duration_seconds summary
    go_gc_duration_seconds{quantile="0"} 4.4411e-05
    go_gc_duration_seconds{quantile="0.25"} 9.8068e-05
    go_gc_duration_seconds{quantile="0.5"} 0.000130716
    go_gc_duration_seconds{quantile="0.75"} 0.000174814
    go_gc_duration_seconds{quantile="1"} 0.000622031
    go_gc_duration_seconds_sum 0.047733795
    go_gc_duration_seconds_count 326
    # HELP go_goroutines Number of goroutines that currently exist.
    # TYPE go_goroutines gauge
    go_goroutines 10
    # HELP go_info Information about the Go environment.
    # TYPE go_info gauge
    go_info{version="go1.16.5"} 1
    # HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
    # TYPE go_memstats_alloc_bytes gauge
    go_memstats_alloc_bytes 3.17684e+06
    # HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
    # TYPE go_memstats_alloc_bytes_total counter
    go_memstats_alloc_bytes_total 5.85939608e+08
    # HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
    # TYPE go_memstats_buck_hash_sys_bytes gauge
    go_memstats_buck_hash_sys_bytes 1.499842e+06
    # HELP go_memstats_frees_total Total number of frees.
    # TYPE go_memstats_frees_total counter
    go_memstats_frees_total 4.416845e+06
    # HELP go_memstats_gc_cpu_fraction The fraction of this program's available CPU time used by the GC since the program started.
    # TYPE go_memstats_gc_cpu_fraction gauge
    go_memstats_gc_cpu_fraction 4.7848542653098556e-05
    # HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
    # TYPE go_memstats_gc_sys_bytes gauge
    go_memstats_gc_sys_bytes 5.065448e+06
    # HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
    # TYPE go_memstats_heap_alloc_bytes gauge
    go_memstats_heap_alloc_bytes 3.17684e+06
    # HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
    # TYPE go_memstats_heap_idle_bytes gauge
    go_memstats_heap_idle_bytes 6.1833216e+07
    # HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
    # TYPE go_memstats_heap_inuse_bytes gauge
    go_memstats_heap_inuse_bytes 4.620288e+06
    # HELP go_memstats_heap_objects Number of allocated objects.
    # TYPE go_memstats_heap_objects gauge
    go_memstats_heap_objects 4394
    # HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
    # TYPE go_memstats_heap_released_bytes gauge
    go_memstats_heap_released_bytes 6.1087744e+07
    # HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
    # TYPE go_memstats_heap_sys_bytes gauge
    go_memstats_heap_sys_bytes 6.6453504e+07
    # HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
    # TYPE go_memstats_last_gc_time_seconds gauge
    go_memstats_last_gc_time_seconds 1.62787137367609e+09
    # HELP go_memstats_lookups_total Total number of pointer lookups.
    # TYPE go_memstats_lookups_total counter
    go_memstats_lookups_total 0
    # HELP go_memstats_mallocs_total Total number of mallocs.
    # TYPE go_memstats_mallocs_total counter
    go_memstats_mallocs_total 4.421239e+06
    # HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
    # TYPE go_memstats_mcache_inuse_bytes gauge
    go_memstats_mcache_inuse_bytes 4800
    # HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
    # TYPE go_memstats_mcache_sys_bytes gauge
    go_memstats_mcache_sys_bytes 16384
    # HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
    # TYPE go_memstats_mspan_inuse_bytes gauge
    go_memstats_mspan_inuse_bytes 78744
    # HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
    # TYPE go_memstats_mspan_sys_bytes gauge
    go_memstats_mspan_sys_bytes 114688
    # HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
    # TYPE go_memstats_next_gc_bytes gauge
    go_memstats_next_gc_bytes 6.200176e+06
    # HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
    # TYPE go_memstats_other_sys_bytes gauge
    go_memstats_other_sys_bytes 988766
    # HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
    # TYPE go_memstats_stack_inuse_bytes gauge
    go_memstats_stack_inuse_bytes 655360
    # HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
    # TYPE go_memstats_stack_sys_bytes gauge
    go_memstats_stack_sys_bytes 655360
    # HELP go_memstats_sys_bytes Number of bytes obtained from system.
    # TYPE go_memstats_sys_bytes gauge
    go_memstats_sys_bytes 7.4793992e+07
    # HELP go_threads Number of OS threads created.
    # TYPE go_threads gauge
    go_threads 7
    # HELP process_cpu_seconds_total Total user and system CPU time spent in seconds.
    # TYPE process_cpu_seconds_total counter
    process_cpu_seconds_total 18.19
    # HELP process_max_fds Maximum number of open file descriptors.
    # TYPE process_max_fds gauge
    process_max_fds 1024
    # HELP process_open_fds Number of open file descriptors.
    # TYPE process_open_fds gauge
    process_open_fds 13
    # HELP process_resident_memory_bytes Resident memory size in bytes.
    # TYPE process_resident_memory_bytes gauge
    process_resident_memory_bytes 1.1882496e+07
    # HELP process_start_time_seconds Start time of the process since unix epoch in seconds.
    # TYPE process_start_time_seconds gauge
    process_start_time_seconds 1.62786724615e+09
    # HELP process_virtual_memory_bytes Virtual memory size in bytes.
    # TYPE process_virtual_memory_bytes gauge
    process_virtual_memory_bytes 7.30558464e+08
    # HELP process_virtual_memory_max_bytes Maximum amount of virtual memory available in bytes.
    # TYPE process_virtual_memory_max_bytes gauge
    process_virtual_memory_max_bytes 1.8446744073709552e+19
    # HELP redis_active_defrag_running active_defrag_running metric
    # TYPE redis_active_defrag_running gauge
    redis_active_defrag_running 0
    # HELP redis_aof_current_rewrite_duration_sec aof_current_rewrite_duration_sec metric
    # TYPE redis_aof_current_rewrite_duration_sec gauge
    redis_aof_current_rewrite_duration_sec -1
    # HELP redis_aof_enabled aof_enabled metric
    # TYPE redis_aof_enabled gauge
    redis_aof_enabled 0
    # HELP redis_aof_last_bgrewrite_status aof_last_bgrewrite_status metric
    # TYPE redis_aof_last_bgrewrite_status gauge
    redis_aof_last_bgrewrite_status 1
    # HELP redis_aof_last_cow_size_bytes aof_last_cow_size_bytes metric
    # TYPE redis_aof_last_cow_size_bytes gauge
    redis_aof_last_cow_size_bytes 0
    # HELP redis_aof_last_rewrite_duration_sec aof_last_rewrite_duration_sec metric
    # TYPE redis_aof_last_rewrite_duration_sec gauge
    redis_aof_last_rewrite_duration_sec -1
    # HELP redis_aof_last_write_status aof_last_write_status metric
    # TYPE redis_aof_last_write_status gauge
    redis_aof_last_write_status 1
    # HELP redis_aof_rewrite_in_progress aof_rewrite_in_progress metric
    # TYPE redis_aof_rewrite_in_progress gauge
    redis_aof_rewrite_in_progress 0
    # HELP redis_aof_rewrite_scheduled aof_rewrite_scheduled metric
    # TYPE redis_aof_rewrite_scheduled gauge
    redis_aof_rewrite_scheduled 0
    # HELP redis_blocked_clients blocked_clients metric
    # TYPE redis_blocked_clients gauge
    redis_blocked_clients 0
    # HELP redis_client_biggest_input_buf client_biggest_input_buf metric
    # TYPE redis_client_biggest_input_buf gauge
    redis_client_biggest_input_buf 0
    # HELP redis_client_longest_output_list client_longest_output_list metric
    # TYPE redis_client_longest_output_list gauge
    redis_client_longest_output_list 0
    # HELP redis_cluster_enabled cluster_enabled metric
    # TYPE redis_cluster_enabled gauge
    redis_cluster_enabled 0
    # HELP redis_commands_duration_seconds_total Total amount of time in seconds spent per command
    # TYPE redis_commands_duration_seconds_total counter
    redis_commands_duration_seconds_total{cmd="auth"} 1.6e-05
    redis_commands_duration_seconds_total{cmd="client"} 0.119519
    redis_commands_duration_seconds_total{cmd="command"} 0.000553
    redis_commands_duration_seconds_total{cmd="config"} 0.560761
    redis_commands_duration_seconds_total{cmd="del"} 40.078852
    redis_commands_duration_seconds_total{cmd="eval"} 0.000648
    redis_commands_duration_seconds_total{cmd="evalsha"} 52.593835
    redis_commands_duration_seconds_total{cmd="exists"} 0.002163
    redis_commands_duration_seconds_total{cmd="expire"} 4.639735
    redis_commands_duration_seconds_total{cmd="get"} 39.35076
    redis_commands_duration_seconds_total{cmd="hdel"} 0.032488
    redis_commands_duration_seconds_total{cmd="hget"} 4.143723
    redis_commands_duration_seconds_total{cmd="hgetall"} 52.309559
    redis_commands_duration_seconds_total{cmd="hincrby"} 6.253747
    redis_commands_duration_seconds_total{cmd="hlen"} 0.000279
    redis_commands_duration_seconds_total{cmd="hmset"} 97.246473
    redis_commands_duration_seconds_total{cmd="host"} 0.002547
    redis_commands_duration_seconds_total{cmd="hscan"} 0.027941
    redis_commands_duration_seconds_total{cmd="hset"} 0.111718
    redis_commands_duration_seconds_total{cmd="incr"} 0.081717
    redis_commands_duration_seconds_total{cmd="incrby"} 0.790273
    redis_commands_duration_seconds_total{cmd="info"} 472.399096
    redis_commands_duration_seconds_total{cmd="keys"} 0.011277
    redis_commands_duration_seconds_total{cmd="latency"} 0.011697
    redis_commands_duration_seconds_total{cmd="lindex"} 0.003309
    redis_commands_duration_seconds_total{cmd="llen"} 0.000243
    redis_commands_duration_seconds_total{cmd="lrange"} 0.714049
    redis_commands_duration_seconds_total{cmd="lrem"} 0.002257
    redis_commands_duration_seconds_total{cmd="ltrim"} 0.081033
    redis_commands_duration_seconds_total{cmd="pexpire"} 0.053587
    redis_commands_duration_seconds_total{cmd="ping"} 33.619505
    redis_commands_duration_seconds_total{cmd="psync"} 0.010975
    redis_commands_duration_seconds_total{cmd="publish"} 47.437203
    redis_commands_duration_seconds_total{cmd="replconf"} 24.135835
    redis_commands_duration_seconds_total{cmd="rpush"} 0.724147
    redis_commands_duration_seconds_total{cmd="sadd"} 9.122367
    redis_commands_duration_seconds_total{cmd="scan"} 183.549755
    redis_commands_duration_seconds_total{cmd="scard"} 1.271612
    redis_commands_duration_seconds_total{cmd="select"} 12.112273
    redis_commands_duration_seconds_total{cmd="set"} 59.943641
    redis_commands_duration_seconds_total{cmd="setex"} 0.390939
    redis_commands_duration_seconds_total{cmd="setnx"} 5.509553
    redis_commands_duration_seconds_total{cmd="slowlog"} 0.062131
    redis_commands_duration_seconds_total{cmd="smembers"} 0.108663
    redis_commands_duration_seconds_total{cmd="spop"} 0.6798
    redis_commands_duration_seconds_total{cmd="srem"} 0.014079
    redis_commands_duration_seconds_total{cmd="sscan"} 0.002472
    redis_commands_duration_seconds_total{cmd="subscribe"} 1.2e-05
    redis_commands_duration_seconds_total{cmd="ttl"} 0.002117
    redis_commands_duration_seconds_total{cmd="type"} 0.003339
    redis_commands_duration_seconds_total{cmd="unlink"} 0.020745
    # HELP redis_commands_processed_total commands_processed_total metric
    # TYPE redis_commands_processed_total counter
    redis_commands_processed_total 1.27407536e+08
    # HELP redis_commands_total Total number of calls per command
    # TYPE redis_commands_total counter
    redis_commands_total{cmd="auth"} 9
    redis_commands_total{cmd="client"} 79475
    redis_commands_total{cmd="command"} 1
    redis_commands_total{cmd="config"} 4578
    redis_commands_total{cmd="del"} 137331
    redis_commands_total{cmd="eval"} 3
    redis_commands_total{cmd="evalsha"} 1.528261e+06
    redis_commands_total{cmd="exists"} 622
    redis_commands_total{cmd="expire"} 2.031993e+06
    redis_commands_total{cmd="get"} 1.195089e+07
    redis_commands_total{cmd="hdel"} 3209
    redis_commands_total{cmd="hget"} 998016
    redis_commands_total{cmd="hgetall"} 5.695487e+06
    redis_commands_total{cmd="hincrby"} 654030
    redis_commands_total{cmd="hlen"} 76
    redis_commands_total{cmd="hmset"} 6.570541e+06
    redis_commands_total{cmd="host"} 52
    redis_commands_total{cmd="hscan"} 76
    redis_commands_total{cmd="hset"} 6202
    redis_commands_total{cmd="incr"} 7435
    redis_commands_total{cmd="incrby"} 121021
    redis_commands_total{cmd="info"} 3.791154e+06
    redis_commands_total{cmd="keys"} 78
    redis_commands_total{cmd="latency"} 4444
    redis_commands_total{cmd="lindex"} 46
    redis_commands_total{cmd="llen"} 52
    redis_commands_total{cmd="lrange"} 170093
    redis_commands_total{cmd="lrem"} 46
    redis_commands_total{cmd="ltrim"} 3808
    redis_commands_total{cmd="pexpire"} 13934
    redis_commands_total{cmd="ping"} 2.7573152e+07
    redis_commands_total{cmd="psync"} 4
    redis_commands_total{cmd="publish"} 7.048611e+06
    redis_commands_total{cmd="replconf"} 1.4497687e+07
    redis_commands_total{cmd="rpush"} 10005
    redis_commands_total{cmd="sadd"} 559362
    redis_commands_total{cmd="scan"} 1.2812383e+07
    redis_commands_total{cmd="scard"} 258338
    redis_commands_total{cmd="select"} 1.0435721e+07
    redis_commands_total{cmd="set"} 1.8583699e+07
    redis_commands_total{cmd="setex"} 42367
    redis_commands_total{cmd="setnx"} 1.535913e+06
    redis_commands_total{cmd="slowlog"} 8888
    redis_commands_total{cmd="smembers"} 22600
    redis_commands_total{cmd="spop"} 236576
    redis_commands_total{cmd="srem"} 1752
    redis_commands_total{cmd="sscan"} 33
    redis_commands_total{cmd="subscribe"} 2
    redis_commands_total{cmd="ttl"} 670
    redis_commands_total{cmd="type"} 677
    redis_commands_total{cmd="unlink"} 6133
    # HELP redis_config_maxclients config_maxclients metric
    # TYPE redis_config_maxclients gauge
    redis_config_maxclients 10000
    # HELP redis_config_maxmemory config_maxmemory metric
    # TYPE redis_config_maxmemory gauge
    redis_config_maxmemory 0
    # HELP redis_connected_clients connected_clients metric
    # TYPE redis_connected_clients gauge
    redis_connected_clients 86
    # HELP redis_connected_slave_lag_seconds Lag of connected slave
    # TYPE redis_connected_slave_lag_seconds gauge
    redis_connected_slave_lag_seconds{slave_ip="192.168.1.214",slave_port="6379",slave_state="online"} 1
    redis_connected_slave_lag_seconds{slave_ip="192.168.1.214",slave_port="6381",slave_state="online"} 1
    # HELP redis_connected_slave_offset_bytes Offset of connected slave
    # TYPE redis_connected_slave_offset_bytes gauge
    redis_connected_slave_offset_bytes{slave_ip="192.168.1.214",slave_port="6379",slave_state="online"} 2.1943761833e+10
    redis_connected_slave_offset_bytes{slave_ip="192.168.1.214",slave_port="6381",slave_state="online"} 2.1943761833e+10
    # HELP redis_connected_slaves connected_slaves metric
    # TYPE redis_connected_slaves gauge
    redis_connected_slaves 2
    # HELP redis_connections_received_total connections_received_total metric
    # TYPE redis_connections_received_total counter
    redis_connections_received_total 4.7644e+06
    # HELP redis_cpu_sys_children_seconds_total cpu_sys_children_seconds_total metric
    # TYPE redis_cpu_sys_children_seconds_total counter
    redis_cpu_sys_children_seconds_total 1195.64
    # HELP redis_cpu_sys_seconds_total cpu_sys_seconds_total metric
    # TYPE redis_cpu_sys_seconds_total counter
    redis_cpu_sys_seconds_total 12650.77
    # HELP redis_cpu_user_children_seconds_total cpu_user_children_seconds_total metric
    # TYPE redis_cpu_user_children_seconds_total counter
    redis_cpu_user_children_seconds_total 8929.86
    # HELP redis_cpu_user_seconds_total cpu_user_seconds_total metric
    # TYPE redis_cpu_user_seconds_total counter
    redis_cpu_user_seconds_total 8919.24
    # HELP redis_db_avg_ttl_seconds Avg TTL in seconds
    # TYPE redis_db_avg_ttl_seconds gauge
    redis_db_avg_ttl_seconds{db="db11"} 1825.3
    redis_db_avg_ttl_seconds{db="db12"} 71020.336
    redis_db_avg_ttl_seconds{db="db13"} 84212.367
    redis_db_avg_ttl_seconds{db="db14"} 36.304
    redis_db_avg_ttl_seconds{db="db15"} 0
    redis_db_avg_ttl_seconds{db="db4"} 2306.138
    redis_db_avg_ttl_seconds{db="db5"} 0
    redis_db_avg_ttl_seconds{db="db6"} 0
    redis_db_avg_ttl_seconds{db="db7"} 1.422106525e+06
    redis_db_avg_ttl_seconds{db="db9"} 82129.002
    # HELP redis_db_keys Total number of keys by DB
    # TYPE redis_db_keys gauge
    redis_db_keys{db="db0"} 0
    redis_db_keys{db="db1"} 0
    redis_db_keys{db="db10"} 0
    redis_db_keys{db="db11"} 102
    redis_db_keys{db="db12"} 83
    redis_db_keys{db="db13"} 56
    redis_db_keys{db="db14"} 232
    redis_db_keys{db="db15"} 3
    redis_db_keys{db="db16"} 0
    redis_db_keys{db="db17"} 0
    redis_db_keys{db="db18"} 0
    redis_db_keys{db="db19"} 0
    redis_db_keys{db="db2"} 0
    redis_db_keys{db="db3"} 0
    redis_db_keys{db="db4"} 8
    redis_db_keys{db="db5"} 3
    redis_db_keys{db="db6"} 6
    redis_db_keys{db="db7"} 998
    redis_db_keys{db="db8"} 0
    redis_db_keys{db="db9"} 24
    # HELP redis_db_keys_expiring Total number of expiring keys by DB
    # TYPE redis_db_keys_expiring gauge
    redis_db_keys_expiring{db="db0"} 0
    redis_db_keys_expiring{db="db1"} 0
    redis_db_keys_expiring{db="db10"} 0
    redis_db_keys_expiring{db="db11"} 1
    redis_db_keys_expiring{db="db12"} 15
    redis_db_keys_expiring{db="db13"} 2
    redis_db_keys_expiring{db="db14"} 2
    redis_db_keys_expiring{db="db15"} 0
    redis_db_keys_expiring{db="db16"} 0
    redis_db_keys_expiring{db="db17"} 0
    redis_db_keys_expiring{db="db18"} 0
    redis_db_keys_expiring{db="db19"} 0
    redis_db_keys_expiring{db="db2"} 0
    redis_db_keys_expiring{db="db3"} 0
    redis_db_keys_expiring{db="db4"} 8
    redis_db_keys_expiring{db="db5"} 0
    redis_db_keys_expiring{db="db6"} 0
    redis_db_keys_expiring{db="db7"} 960
    redis_db_keys_expiring{db="db8"} 0
    redis_db_keys_expiring{db="db9"} 3
    # HELP redis_defrag_hits defrag_hits metric
    # TYPE redis_defrag_hits gauge
    redis_defrag_hits 0
    # HELP redis_defrag_key_hits defrag_key_hits metric
    # TYPE redis_defrag_key_hits gauge
    redis_defrag_key_hits 0
    # HELP redis_defrag_key_misses defrag_key_misses metric
    # TYPE redis_defrag_key_misses gauge
    redis_defrag_key_misses 0
    # HELP redis_defrag_misses defrag_misses metric
    # TYPE redis_defrag_misses gauge
    redis_defrag_misses 0
    # HELP redis_evicted_keys_total evicted_keys_total metric
    # TYPE redis_evicted_keys_total counter
    redis_evicted_keys_total 0
    # HELP redis_expired_keys_total expired_keys_total metric
    # TYPE redis_expired_keys_total counter
    redis_expired_keys_total 42862
    # HELP redis_exporter_build_info redis exporter build_info
    # TYPE redis_exporter_build_info gauge
    redis_exporter_build_info{build_date="2021-06-09-01:40:46",commit_sha="b95cf3b5ce7543119b303766662d1f0400caea94",golang_version="go1.16.5",version="v1.24.0"} 1
    # HELP redis_exporter_last_scrape_connect_time_seconds exporter_last_scrape_connect_time_seconds metric
    # TYPE redis_exporter_last_scrape_connect_time_seconds gauge
    redis_exporter_last_scrape_connect_time_seconds 0.000938134
    # HELP redis_exporter_last_scrape_duration_seconds exporter_last_scrape_duration_seconds metric
    # TYPE redis_exporter_last_scrape_duration_seconds gauge
    redis_exporter_last_scrape_duration_seconds 0.00479455
    # HELP redis_exporter_last_scrape_error The last scrape error status.
    # TYPE redis_exporter_last_scrape_error gauge
    redis_exporter_last_scrape_error{err=""} 0
    # HELP redis_exporter_scrape_duration_seconds Durations of scrapes by the exporter
    # TYPE redis_exporter_scrape_duration_seconds summary
    redis_exporter_scrape_duration_seconds_sum 1.1995302149999998
    redis_exporter_scrape_duration_seconds_count 237
    # HELP redis_exporter_scrapes_total Current total redis scrapes.
    # TYPE redis_exporter_scrapes_total counter
    redis_exporter_scrapes_total 237
    # HELP redis_instance_info Information about the Redis instance
    # TYPE redis_instance_info gauge
    redis_instance_info{maxmemory_policy="noeviction",os="Linux 3.10.0-957.el7.x86_64 x86_64",process_id="5428",redis_build_id="2d12e85652dc7ce9",redis_mode="standalone",redis_version="4.0.2",role="master",run_id="3f70dd786f2534fae677062ac371f87fd78fe914",tcp_port="6380"} 1
    # HELP redis_keyspace_hits_total keyspace_hits_total metric
    # TYPE redis_keyspace_hits_total counter
    redis_keyspace_hits_total 9.793446e+06
    # HELP redis_keyspace_misses_total keyspace_misses_total metric
    # TYPE redis_keyspace_misses_total counter
    redis_keyspace_misses_total 9.303561e+06
    # HELP redis_last_key_groups_scrape_duration_milliseconds Duration of the last key group metrics scrape in milliseconds
    # TYPE redis_last_key_groups_scrape_duration_milliseconds gauge
    redis_last_key_groups_scrape_duration_milliseconds 0
    # HELP redis_last_slow_execution_duration_seconds The amount of time needed for last slow execution, in seconds
    # TYPE redis_last_slow_execution_duration_seconds gauge
    redis_last_slow_execution_duration_seconds 0.059945
    # HELP redis_latest_fork_seconds latest_fork_seconds metric
    # TYPE redis_latest_fork_seconds gauge
    redis_latest_fork_seconds 0.006136
    # HELP redis_lazyfree_pending_objects lazyfree_pending_objects metric
    # TYPE redis_lazyfree_pending_objects gauge
    redis_lazyfree_pending_objects 0
    # HELP redis_loading_dump_file loading_dump_file metric
    # TYPE redis_loading_dump_file gauge
    redis_loading_dump_file 0
    # HELP redis_master_repl_offset master_repl_offset metric
    # TYPE redis_master_repl_offset gauge
    redis_master_repl_offset 2.1943761833e+10
    # HELP redis_mem_fragmentation_ratio mem_fragmentation_ratio metric
    # TYPE redis_mem_fragmentation_ratio gauge
    redis_mem_fragmentation_ratio 1.12
    # HELP redis_memory_max_bytes memory_max_bytes metric
    # TYPE redis_memory_max_bytes gauge
    redis_memory_max_bytes 0
    # HELP redis_memory_used_bytes memory_used_bytes metric
    # TYPE redis_memory_used_bytes gauge
    redis_memory_used_bytes 1.38829216e+08
    # HELP redis_memory_used_dataset_bytes memory_used_dataset_bytes metric
    # TYPE redis_memory_used_dataset_bytes gauge
    redis_memory_used_dataset_bytes 1.35237404e+08
    # HELP redis_memory_used_lua_bytes memory_used_lua_bytes metric
    # TYPE redis_memory_used_lua_bytes gauge
    redis_memory_used_lua_bytes 37888
    # HELP redis_memory_used_overhead_bytes memory_used_overhead_bytes metric
    # TYPE redis_memory_used_overhead_bytes gauge
    redis_memory_used_overhead_bytes 3.591812e+06
    # HELP redis_memory_used_peak_bytes memory_used_peak_bytes metric
    # TYPE redis_memory_used_peak_bytes gauge
    redis_memory_used_peak_bytes 1.3938588e+08
    # HELP redis_memory_used_rss_bytes memory_used_rss_bytes metric
    # TYPE redis_memory_used_rss_bytes gauge
    redis_memory_used_rss_bytes 1.55652096e+08
    # HELP redis_memory_used_startup_bytes memory_used_startup_bytes metric
    # TYPE redis_memory_used_startup_bytes gauge
    redis_memory_used_startup_bytes 767968
    # HELP redis_migrate_cached_sockets_total migrate_cached_sockets_total metric
    # TYPE redis_migrate_cached_sockets_total gauge
    redis_migrate_cached_sockets_total 0
    # HELP redis_net_input_bytes_total net_input_bytes_total metric
    # TYPE redis_net_input_bytes_total counter
    redis_net_input_bytes_total 2.9809647461e+10
    # HELP redis_net_output_bytes_total net_output_bytes_total metric
    # TYPE redis_net_output_bytes_total counter
    redis_net_output_bytes_total 7.3329383597e+10
    # HELP redis_process_id process_id metric
    # TYPE redis_process_id gauge
    redis_process_id 5428
    # HELP redis_pubsub_channels pubsub_channels metric
    # TYPE redis_pubsub_channels gauge
    redis_pubsub_channels 1
    # HELP redis_pubsub_patterns pubsub_patterns metric
    # TYPE redis_pubsub_patterns gauge
    redis_pubsub_patterns 0
    # HELP redis_rdb_bgsave_in_progress rdb_bgsave_in_progress metric
    # TYPE redis_rdb_bgsave_in_progress gauge
    redis_rdb_bgsave_in_progress 0
    # HELP redis_rdb_changes_since_last_save rdb_changes_since_last_save metric
    # TYPE redis_rdb_changes_since_last_save gauge
    redis_rdb_changes_since_last_save 1670
    # HELP redis_rdb_current_bgsave_duration_sec rdb_current_bgsave_duration_sec metric
    # TYPE redis_rdb_current_bgsave_duration_sec gauge
    redis_rdb_current_bgsave_duration_sec -1
    # HELP redis_rdb_last_bgsave_duration_sec rdb_last_bgsave_duration_sec metric
    # TYPE redis_rdb_last_bgsave_duration_sec gauge
    redis_rdb_last_bgsave_duration_sec 0
    # HELP redis_rdb_last_bgsave_status rdb_last_bgsave_status metric
    # TYPE redis_rdb_last_bgsave_status gauge
    redis_rdb_last_bgsave_status 1
    # HELP redis_rdb_last_cow_size_bytes rdb_last_cow_size_bytes metric
    # TYPE redis_rdb_last_cow_size_bytes gauge
    redis_rdb_last_cow_size_bytes 3.2497664e+07
    # HELP redis_rdb_last_save_timestamp_seconds rdb_last_save_timestamp_seconds metric
    # TYPE redis_rdb_last_save_timestamp_seconds gauge
    redis_rdb_last_save_timestamp_seconds 1.627871113e+09
    # HELP redis_rejected_connections_total rejected_connections_total metric
    # TYPE redis_rejected_connections_total counter
    redis_rejected_connections_total 0
    # HELP redis_repl_backlog_first_byte_offset repl_backlog_first_byte_offset metric
    # TYPE redis_repl_backlog_first_byte_offset gauge
    redis_repl_backlog_first_byte_offset 2.1942713258e+10
    # HELP redis_repl_backlog_history_bytes repl_backlog_history_bytes metric
    # TYPE redis_repl_backlog_history_bytes gauge
    redis_repl_backlog_history_bytes 1.048576e+06
    # HELP redis_repl_backlog_is_active repl_backlog_is_active metric
    # TYPE redis_repl_backlog_is_active gauge
    redis_repl_backlog_is_active 1
    # HELP redis_replica_partial_resync_accepted replica_partial_resync_accepted metric
    # TYPE redis_replica_partial_resync_accepted gauge
    redis_replica_partial_resync_accepted 2
    # HELP redis_replica_partial_resync_denied replica_partial_resync_denied metric
    # TYPE redis_replica_partial_resync_denied gauge
    redis_replica_partial_resync_denied 1
    # HELP redis_replica_resyncs_full replica_resyncs_full metric
    # TYPE redis_replica_resyncs_full gauge
    redis_replica_resyncs_full 2
    # HELP redis_replication_backlog_bytes replication_backlog_bytes metric
    # TYPE redis_replication_backlog_bytes gauge
    redis_replication_backlog_bytes 1.048576e+06
    # HELP redis_second_repl_offset second_repl_offset metric
    # TYPE redis_second_repl_offset gauge
    redis_second_repl_offset -1
    # HELP redis_slave_expires_tracked_keys slave_expires_tracked_keys metric
    # TYPE redis_slave_expires_tracked_keys gauge
    redis_slave_expires_tracked_keys 0
    # HELP redis_slowlog_last_id Last id of slowlog
    # TYPE redis_slowlog_last_id gauge
    redis_slowlog_last_id 12
    # HELP redis_slowlog_length Total slowlog
    # TYPE redis_slowlog_length gauge
    redis_slowlog_length 13
    # HELP redis_start_time_seconds Start time of the Redis instance since unix epoch in seconds.
    # TYPE redis_start_time_seconds gauge
    redis_start_time_seconds 1.620606909e+09
    # HELP redis_target_scrape_request_errors_total Errors in requests to the exporter
    # TYPE redis_target_scrape_request_errors_total counter
    redis_target_scrape_request_errors_total 0
    # HELP redis_up Information about the Redis instance
    # TYPE redis_up gauge
    redis_up 1
    # HELP redis_uptime_in_seconds uptime_in_seconds metric
    # TYPE redis_uptime_in_seconds gauge
    redis_uptime_in_seconds 7.264465e+06

    With that, redis_exporter is deployed.
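The /metrics output above is long, and for a quick health check one or two values are usually enough. A minimal sketch, where metric_value is a hypothetical helper (not part of redis_exporter) and the sample text stands in for the real response:

```shell
#!/bin/sh
# metric_value NAME: read Prometheus text format on stdin and print the
# value of the first unlabeled sample whose metric name equals NAME.
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# A few lines copied from the exporter output, used here as stand-in input.
sample='# HELP redis_up Information about the Redis instance
# TYPE redis_up gauge
redis_up 1
redis_connected_slaves 2'

up=$(printf '%s\n' "$sample" | metric_value redis_up)
replicas=$(printf '%s\n' "$sample" | metric_value redis_connected_slaves)
echo "redis_up=${up} connected_slaves=${replicas}"
```

Against the live exporter the same helper could be fed with `curl -s http://192.168.1.178:9121/metrics | metric_value redis_up`.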

    Enable redis_exporter at boot and start it.

    cat <<EOF >/etc/systemd/system/redis_exporter.service
    [Unit]
    Description=Prometheus exporter for Redis metrics.
    
    [Service]
    ExecStart=/redis_exporter/redis_exporter -redis.addr 192.168.1.214:6380 -web.listen-address 192.168.1.178:9121
    Restart=on-failure
    
    [Install]
    WantedBy=multi-user.target
    EOF

    Reload the systemd configuration (remember to stop the session that was started by hand earlier)

    systemctl daemon-reload
    systemctl enable redis_exporter.service
    systemctl restart redis_exporter.service
    systemctl status redis_exporter.service
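When this step is scripted, the exporter may need a moment after the restart before it answers. A small retry loop covers that gap. The wait_until function below is a hypothetical helper, and the curl call in the usage note assumes the listen address used above.

```shell
#!/bin/sh
# wait_until TRIES CMD...: run CMD up to TRIES times, pausing one second
# after each failed attempt, and succeed as soon as CMD exits with status 0.
wait_until() {
    tries=$1
    shift
    attempt=0
    while [ "$attempt" -lt "$tries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep 1
    done
    return 1
}
```

A possible call would be `wait_until 10 curl -sf -o /dev/null http://192.168.1.178:9121/metrics && echo "exporter is up"`.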

    Deploying prometheus

    Download: https://github.com/prometheus/prometheus/releases/

    Downloaded file: prometheus-2.28.1.linux-amd64.tar.gz

    Extracting the archive is all the installation that is needed:

    [root@node1 soft]# tar -zxvf prometheus-2.28.1.linux-amd64.tar.gz
    [root@node1 soft]# mv prometheus-2.28.1.linux-amd64 /prometheus
    [root@node1 soft]# cd /prometheus/

    Add the configuration

    [root@node1 prometheus]# vi /prometheus/prometheus.yml
    Add under scrape_configs:
      - job_name: 'redis_exporter_targets'
        static_configs:
          - targets:
            - redis://192.168.1.214:6380
            - redis://192.168.1.214:6379
            - redis://192.168.1.214:6381
        metrics_path: /scrape
        relabel_configs:
          - source_labels: [__address__]
            target_label: __param_target
          - source_labels: [__param_target]
            target_label: instance
          - target_label: __address__
            replacement: 192.168.1.178:9121

      ## config for scraping the exporter itself
      - job_name: 'redis_exporter'
        static_configs:
          - targets:
            - 192.168.1.178:9121
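The relabel_configs move each target address into the target URL parameter and then point the request at the exporter, so Prometheus calls the exporter's /scrape endpoint once per Redis instance. The sketch below only prints the equivalent URLs so they can be checked by hand. scrape_url is an illustrative helper, not something Prometheus or the exporter provides.

```shell
#!/bin/sh
exporter="192.168.1.178:9121"

# scrape_url TARGET: the request that results for one Redis target after
# relabeling. Prometheus additionally URL-encodes the parameter value;
# the plain form is printed here for readability.
scrape_url() {
    printf 'http://%s/scrape?target=%s\n' "$exporter" "$1"
}

for target in redis://192.168.1.214:6380 \
              redis://192.168.1.214:6379 \
              redis://192.168.1.214:6381; do
    scrape_url "$target"
done
```

Opening one of the printed URLs with curl or a browser should return the metrics of that single instance, which is a convenient way to test a target before Prometheus scrapes it.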

    Start prometheus

    [root@node1 prometheus]# ./prometheus
    level=info ts=2021-08-02T02:40:06.001Z caller=main.go:389 msg="No time or size retention was set so using the default time retention" duration=15d
    level=info ts=2021-08-02T02:40:06.002Z caller=main.go:443 msg="Starting Prometheus" version="(version=2.28.1, branch=HEAD, revision=b0944590a1c9a6b35dc5a696869f75f422b107a1)"
    level=info ts=2021-08-02T02:40:06.002Z caller=main.go:448 build_context="(go=go1.16.5, user=root@2915dd495090, date=20210701-15:20:10)"
    level=info ts=2021-08-02T02:40:06.002Z caller=main.go:449 host_details="(Linux 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 node1 (none))"
    level=info ts=2021-08-02T02:40:06.002Z caller=main.go:450 fd_limits="(soft=1024, hard=4096)"
    level=info ts=2021-08-02T02:40:06.003Z caller=main.go:451 vm_limits="(soft=unlimited, hard=unlimited)"
    level=info ts=2021-08-02T02:40:06.012Z caller=web.go:541 component=web msg="Start listening for connections" address=0.0.0.0:9090
    level=info ts=2021-08-02T02:40:06.013Z caller=main.go:824 msg="Starting TSDB ..."
    level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627581602588 maxt=1627588800000 ulid=01FBT12MPWQ0F1HNJTMBJRKVZ4
    level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627588802588 maxt=1627596000000 ulid=01FBT7YBYX56PYJTSCNPDGNF8S
    level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627546183899 maxt=1627581600000 ulid=01FBT7YCAJW6HK1ZWQT3GAXSHM
    level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627596000000 maxt=1627603200000 ulid=01FC279X74Q6P1KKRSK6FSE4ZE
    level=info ts=2021-08-02T02:40:06.015Z caller=repair.go:57 component=tsdb msg="Found healthy block" mint=1627603202588 maxt=1627610400000 ulid=01FC279X9GBYG0VD4FS6MP8V0E
    level=info ts=2021-08-02T02:40:06.017Z caller=tls_config.go:191 component=web msg="TLS is disabled." http2=false
    level=info ts=2021-08-02T02:40:06.032Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
    level=info ts=2021-08-02T02:40:06.035Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=2.833274ms
    level=info ts=2021-08-02T02:40:06.035Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
    level=warn ts=2021-08-02T02:40:06.098Z caller=head.go:767 component=tsdb msg="Unknown series references" samples=15293 exemplars=0
    level=info ts=2021-08-02T02:40:06.098Z caller=head.go:826 component=tsdb msg="WAL checkpoint loaded"
    level=info ts=2021-08-02T02:40:06.116Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=31 maxSegment=34
    level=info ts=2021-08-02T02:40:06.117Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=32 maxSegment=34
    level=info ts=2021-08-02T02:40:06.131Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=33 maxSegment=34
    level=info ts=2021-08-02T02:40:06.131Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=34 maxSegment=34
    level=info ts=2021-08-02T02:40:06.131Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=62.785484ms wal_replay_duration=33.30167ms total_replay_duration=98.993811ms
    level=info ts=2021-08-02T02:40:06.140Z caller=main.go:851 fs_type=XFS_SUPER_MAGIC
    level=info ts=2021-08-02T02:40:06.140Z caller=main.go:854 msg="TSDB started"
    level=info ts=2021-08-02T02:40:06.140Z caller=main.go:981 msg="Loading configuration file" filename=prometheus.yml
    level=info ts=2021-08-02T02:40:06.150Z caller=main.go:1012 msg="Completed loading of configuration file" filename=prometheus.yml totalDuration=9.905156ms remote_storage=12.884µs web_handler=860ns query_engine=7.018µs scrape=1.041656ms scrape_sd=149.496µs notify=76.325µs notify_sd=43.197µs rules=7.250541ms
    level=info ts=2021-08-02T02:40:06.150Z caller=main.go:796 msg="Server is ready to receive web requests."
    level=info ts=2021-08-02T02:40:13.942Z caller=compact.go:509 component=tsdb msg="write block resulted in empty block" mint=1627610400000 maxt=1627617600000 duration=23.036437ms
    level=info ts=2021-08-02T02:40:13.946Z caller=head.go:967 component=tsdb msg="Head GC completed" duration=3.883036ms
    level=info ts=2021-08-02T02:40:13.950Z caller=checkpoint.go:97 component=tsdb msg="Creating checkpoint" from_segment=31 to_segment=32 mint=1627617600000
    level=info ts=2021-08-02T02:40:14.059Z caller=head.go:1064 component=tsdb msg="WAL checkpoint complete" first=31 last=32 duration=109.568447ms

      

    Open 192.168.1.178:9090 to see the collected data.

    Register prometheus as a service that starts at boot

    vim /etc/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus Monitoring System
     
    [Service]
    ExecStart=/prometheus/prometheus \
      --config.file=/prometheus/prometheus.yml \
      --web.listen-address=:9090
     
    Restart=on-failure
    [Install]
    WantedBy=multi-user.target

      

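    Stop the ./prometheus instance that was started in the foreground earlier.

    Reload systemd, enable the service at boot, start it, and check its status.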

    systemctl daemon-reload
    systemctl enable prometheus
    systemctl start prometheus
    systemctl status prometheus
    
    [root@node1 prometheus]# cat /etc/systemd/system/prometheus.service
    [Unit]
    Description=Prometheus Monitoring System
     
    [Service]
    ExecStart=/prometheus/prometheus \
      --config.file=/prometheus/prometheus.yml \
      --web.listen-address=:9090
     
    Restart=on-failure
    [Install]
    [root@node1 prometheus]# systemctl status prometheus
    ● prometheus.service - Prometheus Monitoring System
       Loaded: loaded (/etc/systemd/system/prometheus.service; static; vendor preset: disabled)
       Active: active (running) since Mon 2021-08-02 11:23:31 CST; 3min 26s ago
     Main PID: 30494 (prometheus)
       CGroup: /system.slice/prometheus.service
               └─30494 /prometheus/prometheus --config.file=/prometheus/prometheus.yml --web.listen-address=:9090
    
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:780 component=tsdb msg="Replaying on-disk memory mappable chunks if any"
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:794 component=tsdb msg="On-disk memory mappable chunks replay completed" duration=18.782µs
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.664Z caller=head.go:800 component=tsdb msg="Replaying WAL, this may take a while"
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.665Z caller=head.go:854 component=tsdb msg="WAL segment loaded" segment=0 maxSegment=0
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.665Z caller=head.go:860 component=tsdb msg="WAL replay completed" checkpoint_replay_duration=67.616µs wal_replay_…ration=815.018µs
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:851 fs_type=XFS_SUPER_MAGIC
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:854 msg="TSDB started"
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.668Z caller=main.go:981 msg="Loading configuration file" filename=/prometheus/prometheus.yml
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.677Z caller=main.go:1012 msg="Completed loading of configuration file" filename=/prometheus/prometheus.yml totalDuration=8.9289…ms
    Aug 02 11:23:31 node1 prometheus[30494]: level=info ts=2021-08-02T03:23:31.677Z caller=main.go:796 msg="Server is ready to receive web requests."
    Hint: Some lines were ellipsized, use -l to show in full.

      

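    Note that the status output above reports the unit file as static: the copy of prometheus.service printed by cat has an empty [Install] section, so systemctl enable has nothing to link and the service is not started at boot. With WantedBy=multi-user.target under [Install], as in the unit file listed earlier, the unit shows as enabled after systemctl enable prometheus.

    Alerting requires alertmanager, which is deployed next.

    With that, prometheus is deployed.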

    alertmanager deployment

    Download: from the official site or from GitHub.

    Downloaded file: alertmanager-0.22.2.linux-amd64.tar.gz

    Extract and install:

    [root@node1 soft]# tar -zxvf alertmanager-0.22.2.linux-amd64.tar.gz -C /
    alertmanager-0.22.2.linux-amd64/
    alertmanager-0.22.2.linux-amd64/alertmanager.yml
    alertmanager-0.22.2.linux-amd64/LICENSE
    alertmanager-0.22.2.linux-amd64/NOTICE
    alertmanager-0.22.2.linux-amd64/alertmanager
    alertmanager-0.22.2.linux-amd64/amtool
    [root@node1 soft]# mv /alertmanager-0.22.2.linux-amd64/ /alertmanager
    [root@node1 soft]# cd /alertmanager/
    [root@node1 alertmanager]# ll
    total 47788
    -rwxr-xr-x 1 3434 3434 27074026 Jun  2 15:51 alertmanager
    -rw-r--r-- 1 3434 3434      348 Jun  2 15:56 alertmanager.yml
    -rwxr-xr-x 1 3434 3434 21839682 Jun  2 15:52 amtool
    -rw-r--r-- 1 3434 3434    11357 Jun  2 15:56 LICENSE
    -rw-r--r-- 1 3434 3434      457 Jun  2 15:56 NOTICE

      

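    Configure how notifications are sent. Other channels such as DingTalk are also possible; email is used as the example here.

    Note: the smtp_smarthost value differs from one mail provider to another.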

    vi /alertmanager/alertmanager.yml
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.exmail.qq.com:465'
      smtp_from: 'zhaokm@xxxxxxx.xx'
      smtp_auth_username: 'zhaokm@xxxxxxx.xx'
      smtp_auth_password: 'your-mailbox-password'
      smtp_require_tls: false
    
    route:
      group_by: ['alertname']
      group_wait: 5s
      group_interval: 5s
      repeat_interval: 5m
      receiver: 'email'
    receivers:
    - name: 'email'
      email_configs:
      - to: 'zhaokm@xxxxxxx.xx'
        send_resolved: true
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
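
    The inhibit_rules entry is the least obvious part of this file. As a rough model (a sketch of the documented behaviour, not Alertmanager's actual code, with the function names matches and is_inhibited invented here), a target alert is muted when some other firing alert matches source_match and both carry the same values for every label listed in equal:

```python
def matches(labels, matchers):
    """True if every matcher label is present with exactly that value."""
    return all(labels.get(name) == value for name, value in matchers.items())


def is_inhibited(target, firing_alerts, rule):
    """Decide whether `target` is muted by any alert in `firing_alerts`."""
    if not matches(target, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is target or not matches(source, rule["source_match"]):
            continue
        # A label missing on both sides counts as equal (both are "").
        if all(source.get(n, "") == target.get(n, "") for n in rule["equal"]):
            return True
    return False


rule = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}
instance = "redis://192.168.1.214:6380"
critical = {"alertname": "RedisDown", "instance": instance, "severity": "critical"}
error = {"alertname": "RedisDown", "instance": instance, "severity": "error"}
warning = {"alertname": "RedisDown", "instance": instance, "severity": "warning"}

print(is_inhibited(warning, [critical, warning], rule))  # True
print(is_inhibited(warning, [error, warning], rule))     # False
```

    The second call matters for this setup: the alert rules defined later in redis.yml use severity error and warning, so as far as I can tell this inhibit rule only takes effect if some rule is labelled critical.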

    Configure start at boot

    cat > /etc/systemd/system/alertmanager.service << "EOF"
    [Unit]
    Description=alertmanager
    After=local-fs.target network-online.target network.target
    Wants=local-fs.target network-online.target network.target
     
    [Service]
    ExecStart=/alertmanager/alertmanager --config.file=/alertmanager/alertmanager.yml
    Restart=on-failure
    [Install]
    WantedBy=multi-user.target
    EOF

    Apply the configuration

    [root@node1 alertmanager]# systemctl daemon-reload
    [root@node1 alertmanager]# systemctl enable alertmanager
    Created symlink from /etc/systemd/system/multi-user.target.wants/alertmanager.service to /etc/systemd/system/alertmanager.service.
    [root@node1 alertmanager]# systemctl start alertmanager
    [root@node1 alertmanager]# systemctl status alertmanager
    ● alertmanager.service - alertmanager
       Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
       Active: active (running) since Mon 2021-08-02 15:14:58 CST; 3s ago
     Main PID: 9825 (alertmanager)
       CGroup: /system.slice/alertmanager.service
               └─9825 /alertmanager/alertmanager --config.file=/alertmanager/alertmanager.yml
    
    Aug 02 15:14:58 node1 systemd[1]: Started alertmanager.
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.005Z caller=main.go:221 msg="Starting Alertmanager" version="(version=0.22.2, branch=HEAD, revision=44f8adc06af5...8273f2922051)"
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.005Z caller=main.go:222 build_context="(go=go1.16.4, user=root@b595c7f32520, date=20210602-07:50:37)"
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.006Z caller=cluster.go:184 component=cluster msg="setting advertise address explicitly" addr=192.168.1.178 port=9094
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.009Z caller=cluster.go:671 component=cluster msg="Waiting for gossip to settle..." interval=2s
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.110Z caller=coordinator.go:113 component=configuration msg="Loading configuration file" file=/alertmanager/alertmanager.yml
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.111Z caller=coordinator.go:126 component=configuration msg="Completed loading of configuration file" file=/alert...ertmanager.yml
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.122Z caller=main.go:514 msg=Listening address=:9093
    Aug 02 15:14:59 node1 alertmanager[9825]: level=info ts=2021-08-02T07:14:59.122Z caller=tls_config.go:191 msg="TLS is disabled." http2=false
    Aug 02 15:15:01 node1 alertmanager[9825]: level=info ts=2021-08-02T07:15:01.009Z caller=cluster.go:696 component=cluster msg="gossip not settled" polls=0 before=0 now=1 elapsed=2.000791462s
    Hint: Some lines were ellipsized, use -l to show in full.

    Open 192.168.1.178:9093 to see the alertmanager web UI.

    Edit the prometheus configuration so that prometheus scrapes alertmanager itself.

    vi /prometheus/prometheus.yml
    Append at the end:
      - job_name: 'alertmanager'
        static_configs:
          - targets: ['192.168.1.178:9093']

    Edit the prometheus configuration so that prometheus sends its alerts to alertmanager.

    vi /prometheus/prometheus.yml
    Change:
    # Alertmanager configuration
    alerting:
      alertmanagers:
      - static_configs:
        - targets:
          - 192.168.1.178:9093

      

    Enable the alert rules. This is configured on the prometheus side.

    vi /prometheus/prometheus.yml
    Change:
    # Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
    rule_files:
      - "redis.yml"

      

    Alert rules in redis.yml. Adjust the thresholds to your own needs:

    vi /prometheus/redis.yml
    groups:
    - name:  Redis
      rules: 
        - alert: RedisDown
          expr: redis_up  == 0
          for: 5m
          labels:
            severity: error
          annotations:
            summary: "Redis down (instance {{ $labels.instance }})"
            description: "Redis instance is down\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: MissingBackup
          expr: time() - redis_rdb_last_save_timestamp_seconds > 60 * 60 * 24
          for: 5m
          labels:
            severity: error
          annotations:
            summary: "Missing backup (instance {{ $labels.instance }})"
            description: "Redis has not been backed up for 24 hours\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: OutOfMemory
          expr: redis_memory_used_bytes / redis_total_system_memory_bytes * 100 > 90
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Out of memory (instance {{ $labels.instance }})"
            description: "Redis is running out of memory (> 90%)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: ReplicationBroken
          expr: delta(redis_connected_slaves[1m]) < 0
          for: 5m
          labels:
            severity: error
          annotations:
            summary: "Replication broken (instance {{ $labels.instance }})"
            description: "Redis instance lost a slave\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: TooManyConnections
          expr: redis_connected_clients > 10
          for: 1m
          labels:
            severity: warning
          annotations:
            summary: "Too many connections (instance {{ $labels.instance }})"
            description: "Redis instance has too many connections\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: NotEnoughConnections
          expr: redis_connected_clients < 5
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Not enough connections (instance {{ $labels.instance }})"
            description: "Redis instance should have more connections (>= 5)\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
        - alert: RejectedConnections
          expr: increase(redis_rejected_connections_total[1m]) > 0
          for: 5m
          labels:
            severity: error
          annotations:
            summary: "Rejected connections (instance {{ $labels.instance }})"
            description: "Some connections to Redis have been rejected\n  VALUE = {{ $value }}\n  LABELS: {{ $labels }}"
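
    Every rule above carries a for: clause, which delays the alert: the expression has to stay true across consecutive evaluations for at least that long before the alert moves from pending to firing, and a single false evaluation resets it. The Python sketch below models only that state logic. The function name alert_states is invented, and the 60-second evaluation interval is an assumption for the example, not a value taken from this setup.

```python
def alert_states(results, interval_s, for_s):
    """Return the alert state after each rule evaluation.

    results    -- one boolean per evaluation: did the expression match?
    interval_s -- seconds between evaluations (assumed value in the example)
    for_s      -- the rule's "for:" duration in seconds
    """
    states = []
    active_since = None
    for i, is_true in enumerate(results):
        now = i * interval_s
        if not is_true:
            # One false evaluation resets the pending timer.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


# RedisDown with "for: 5m", evaluated once a minute while redis_up == 0.
print(alert_states([True] * 7, interval_s=60, for_s=300))
# ['pending', 'pending', 'pending', 'pending', 'pending', 'firing', 'firing']
```

    So with for: 5m a Redis outage shorter than five minutes never reaches alertmanager, which is worth keeping in mind when choosing these durations.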

    The resulting alerts look like this (screenshot omitted):

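    grafana deployment

    Download: https://grafana.com/grafana/download?edition=oss

    Official installation guide:
    https://grafana.com/docs/grafana/latest/installation/rpm/#2-start-the-server

    Because grafana ships as an rpm package, installation is straightforward.

    Install whatever dependencies are reported as missing.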

    yum install -y fontconfig
    yum install -y urw-fonts
    rpm -ivh grafana-8.0.6-1.x86_64.rpm 

      

    Enable grafana at boot and start it

    /bin/systemctl daemon-reload
    /bin/systemctl enable grafana-server.service
    /bin/systemctl start grafana-server.service
    
    [root@node1 soft]# which grafana-server
    /usr/sbin/grafana-server
    [root@node1 soft]# which grafana-cli
    /usr/sbin/grafana-cli

      

    Check the status

    [root@node1 soft]# systemctl status grafana-server
    ● grafana-server.service - Grafana instance
       Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; vendor preset: disabled)
       Active: active (running) since Thu 2021-07-29 15:58:38 CST; 4min 37s ago
         Docs: http://docs.grafana.org
     Main PID: 6884 (grafana-server)
       CGroup: /system.slice/grafana-server.service
               └─6884 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile=/var/run/grafana/grafana-server.pid --packaging=rpm cfg:default.paths.logs=/var/log/grafana cfg:default.paths.data=/var/lib/grafana cfg:default...
    
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="migrations completed" logger=migrator performed=330 skipped=0 duration=1.710091718s
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Created default admin" logger=sqlstore user=admin
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Created default organization" logger=sqlstore
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Starting plugin search" logger=plugins
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Registering plugin" logger=plugins id=grafana-plugin-admin-app
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Registering plugin" logger=plugins id=input
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="External plugins directory created" logger=plugins directory=/var/lib/grafana/plugins
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="Live Push Gateway initialization" logger=live.push_http
    Jul 29 15:58:38 node1 systemd[1]: Started Grafana instance.
    Jul 29 15:58:38 node1 grafana-server[6884]: t=2021-07-29T15:58:38+0800 lvl=info msg="HTTP Server Listen" logger=http.server address=[::]:3000 protocol=http subUrl= socket=

    Open 192.168.1.178:3000 to reach the grafana web UI.
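
    At this point four web endpoints should be answering on 192.168.1.178. Instead of opening each one in a browser, a quick check along the following lines may help. This is only a sketch: the helper names http_status and check_all are invented, the exporter is probed through its /metrics page, and the health paths used for prometheus, alertmanager and grafana are their standard health endpoints rather than anything configured in this post.

```python
from urllib.request import urlopen

# Addresses and ports as used throughout this post.
ENDPOINTS = {
    "redis_exporter": "http://192.168.1.178:9121/metrics",
    "prometheus": "http://192.168.1.178:9090/-/healthy",
    "alertmanager": "http://192.168.1.178:9093/-/healthy",
    "grafana": "http://192.168.1.178:3000/api/health",
}


def http_status(url, timeout=3):
    """Return the HTTP status code, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status
    except OSError:
        # Covers refused connections, timeouts and HTTP error responses.
        return None


def check_all(endpoints, fetch=http_status):
    """Map each service name to True when its endpoint answers with 200."""
    return {name: fetch(url) == 200 for name, url in endpoints.items()}
```

    Running check_all(ENDPOINTS) on the monitoring host returns a dictionary with one True or False per service; the fetch parameter exists only so the logic can be exercised without a network.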

    Configure the data source.

      

    Download a dashboard:

    https://grafana.com/grafana/dashboards/763 -- this is the one used here
    https://grafana.com/grafana/dashboards/12980
    https://grafana.com/grafana/dashboards/12776

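    Import the dashboard:
    To import a dashboard, click the + icon in the side menu, click Import, select the data source, and confirm.

    The final result (screenshot omitted):

    Note: the Memory Usage panel always shows ∞%. redis_memory_max_bytes is reported as 0, which is what Redis reports when no maxmemory limit is set, so redis_memory_used_bytes / redis_memory_max_bytes is a division by zero.

    Fix: replace redis_memory_max_bytes with the server's actual physical memory size.

    Change the panel expression as follows, where 8370298880 is the physical memory size reported by free -b: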

    redis_memory_used_bytes{instance=~"$instance"}  / 8370298880
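
    The arithmetic behind the ∞% and behind the fix can be checked with a few lines of Python. This is only an illustration: the function name memory_usage_ratio is invented, and the 1 GiB of used memory is a made-up sample value, not a measurement from this server.

```python
import math

PHYSICAL_MEMORY_BYTES = 8370298880  # from free -b, as in the expression above


def memory_usage_ratio(used_bytes, limit_bytes):
    """Mimic the panel's division; a positive value divided by 0 gives +Inf."""
    if limit_bytes == 0:
        return math.inf
    return used_bytes / limit_bytes


used = 1024 ** 3  # hypothetical: 1 GiB used by Redis

# Original expression: redis_memory_max_bytes is 0, so the panel shows infinity.
print(memory_usage_ratio(used, 0))  # inf

# Modified expression: divide by the physical memory size instead.
ratio = memory_usage_ratio(used, PHYSICAL_MEMORY_BYTES)
print(f"{ratio:.1%}")  # 12.8%
```

    A possible alternative to the hard-coded number is the redis_total_system_memory_bytes metric that the OutOfMemory alert rule already uses, though I have not tried it in this dashboard.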

    References:

    The right way to monitor Redis with Prometheus (Redis cluster)

    Configuring alerts with Alertmanager on a Prometheus monitoring platform

    YAML format validator: http://www.bejson.com/validators/yaml_editor/

    https://www.cnblogs.com/biaopei/p/12096705.html

    https://www.jianshu.com/p/924cdd4e8603

  • Original article: https://www.cnblogs.com/PiscesCanon/p/15088904.html