  Prometheus + Grafana (13): System Monitoring of Cassandra

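    Preface

    This post monitors Cassandra with jmx_exporter, loaded as a Java agent into the Cassandra JVM.

    Configuring the Java agent

    Every node in the Cassandra cluster needs the following configuration.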

    • Upload

    Download jmx_prometheus_javaagent-0.12.0.jar and upload it to the $CASSANDRA_HOME/lib/ directory on every Cassandra node.

    Download page: https://github.com/prometheus/jmx_exporter/blob/master/README.md
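The download and copy can also be scripted per node. This is a sketch under stated assumptions: the jar is fetched from Maven Central (coordinates io.prometheus.jmx:jmx_prometheus_javaagent) rather than through the README link, curl is installed, and CASSANDRA_HOME is set; jmx_agent_url and install_jmx_agent are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: fetch the JMX exporter Java agent from Maven Central (assumed source)
# and copy it into Cassandra's lib/ directory. Run on every node.
set -euo pipefail

JMX_AGENT_VERSION="${JMX_AGENT_VERSION:-0.12.0}"

# Maven Central path of io.prometheus.jmx:jmx_prometheus_javaagent
jmx_agent_url() {
  local v="$1"
  echo "https://repo1.maven.org/maven2/io/prometheus/jmx/jmx_prometheus_javaagent/${v}/jmx_prometheus_javaagent-${v}.jar"
}

# Download into a temporary directory, then copy into the given lib directory
install_jmx_agent() {
  local v="$1" lib_dir="$2"
  local tmp
  tmp="$(mktemp -d)"
  curl -fSL -o "${tmp}/jmx_prometheus_javaagent-${v}.jar" "$(jmx_agent_url "$v")"
  cp "${tmp}/jmx_prometheus_javaagent-${v}.jar" "${lib_dir}/"
  rm -rf "$tmp"
}

if [[ -n "${CASSANDRA_HOME:-}" ]]; then
  install_jmx_agent "$JMX_AGENT_VERSION" "$CASSANDRA_HOME/lib"
else
  echo "CASSANDRA_HOME is not set; nothing installed" >&2
fi
```

Keeping the URL in its own function makes it easy to switch the version with JMX_AGENT_VERSION without touching the install logic.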

    • Configure

    1. Add the configuration file cassandra-jmx.yml to the conf/ directory on every Cassandra node:

    lowercaseOutputName: true
    lowercaseOutputLabelNames: true
    whitelistObjectNames: [
    "org.apache.cassandra.metrics:type=ColumnFamily,name=RangeLatency,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=LiveSSTableCount,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=SSTablesPerReadHistogram,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=SpeculativeRetries,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOnHeapSize,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableSwitchCount,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableLiveDataSize,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableColumnsCount,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=MemtableOffHeapSize,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalsePositives,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterFalseRatio,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterDiskSpaceUsed,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=BloomFilterOffHeapMemoryUsed,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=SnapshotsSize,*",
    "org.apache.cassandra.metrics:type=ColumnFamily,name=TotalDiskSpaceUsed,*",
    "org.apache.cassandra.metrics:type=CQL,name=RegularStatementsExecuted,*",
    "org.apache.cassandra.metrics:type=CQL,name=PreparedStatementsExecuted,*",
    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks,*",
    "org.apache.cassandra.metrics:type=Compaction,name=CompletedTasks,*",
    "org.apache.cassandra.metrics:type=Compaction,name=BytesCompacted,*",
    "org.apache.cassandra.metrics:type=Compaction,name=TotalCompactionsCompleted,*",
    "org.apache.cassandra.metrics:type=ClientRequest,name=Latency,*",
    "org.apache.cassandra.metrics:type=ClientRequest,name=Unavailables,*",
    "org.apache.cassandra.metrics:type=ClientRequest,name=Timeouts,*",
    "org.apache.cassandra.metrics:type=Storage,name=Exceptions,*",
    "org.apache.cassandra.metrics:type=Storage,name=TotalHints,*",
    "org.apache.cassandra.metrics:type=Storage,name=TotalHintsInProgress,*",
    "org.apache.cassandra.metrics:type=Storage,name=Load,*",
    "org.apache.cassandra.metrics:type=Connection,name=TotalTimeouts,*",
    "org.apache.cassandra.metrics:type=ThreadPools,name=CompletedTasks,*",
    "org.apache.cassandra.metrics:type=ThreadPools,name=PendingTasks,*",
    "org.apache.cassandra.metrics:type=ThreadPools,name=ActiveTasks,*",
    "org.apache.cassandra.metrics:type=ThreadPools,name=TotalBlockedTasks,*",
    "org.apache.cassandra.metrics:type=ThreadPools,name=CurrentlyBlockedTasks,*",
    "org.apache.cassandra.metrics:type=DroppedMessage,name=Dropped,*",
    "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate,*",
    "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits,*",
    "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests,*",
    "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries,*",
    "org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size,*",
    #"org.apache.cassandra.metrics:type=Streaming,name=TotalIncomingBytes,*",
    #"org.apache.cassandra.metrics:type=Streaming,name=TotalOutgoingBytes,*",
    "org.apache.cassandra.metrics:type=Client,name=connectedNativeClients,*",
    "org.apache.cassandra.metrics:type=Client,name=connectedThriftClients,*",
    "org.apache.cassandra.metrics:type=Table,name=WriteLatency,*",
    "org.apache.cassandra.metrics:type=Table,name=ReadLatency,*",
    "org.apache.cassandra.net:type=FailureDetector,*",
    ]
    #blacklistObjectNames: ["org.apache.cassandra.metrics:type=ColumnFamily,*"]
    rules:
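      # Per-peer Connection/Streaming metrics; the peer IP in scope becomes the "address" label
      - pattern: org.apache.cassandra.metrics<type=(Connection|Streaming), scope=(\S*), name=(\S*)><>(Count|Value)
        name: cassandra_$1_$3
        labels:
          address: "$2"
      # Mean latency from the ColumnFamily RangeLatency MBean (no keyspace/table keys)
      - pattern: org.apache.cassandra.metrics<type=(ColumnFamily), name=(RangeLatency)><>(Mean)
        name: cassandra_$1_$2_$3
      # Number of endpoints the failure detector currently considers down
      - pattern: org.apache.cassandra.net<type=(FailureDetector)><>(DownEndpointCount)
        name: cassandra_$1_$2
      # Keyspace-level metrics, labelled by keyspace
      - pattern: org.apache.cassandra.metrics<type=(Keyspace), keyspace=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
        name: cassandra_$1_$3_$4
        labels:
          "$1": "$2"
      # Table-level metrics such as ReadLatency/WriteLatency, labelled by keyspace and table
      - pattern: org.apache.cassandra.metrics<type=(Table), keyspace=(\S*), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
        name: cassandra_$1_$4_$5
        labels:
          "keyspace": "$2"
          "table": "$3"
      # Coordinator request metrics, labelled by request type (Read, Write, ...)
      - pattern: org.apache.cassandra.metrics<type=(ClientRequest), scope=(\S*), name=(\S*)><>(Count|Mean|95thPercentile)
        name: cassandra_$1_$3_$4
        labels:
          "type": "$2"
      # Catch-all for the remaining Count/Value attributes; optional extra key and scope become labels
      - pattern: org.apache.cassandra.metrics<type=(\S*)(?:, ((?!scope)\S*)=(\S*))?(?:, scope=(\S*))?, name=(\S*)><>(Count|Value)
        name: cassandra_$1_$5
        labels:
          "$1": "$4"
          "$2": "$3"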
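To see what a rule produces, the ClientRequest rule can be approximated with sed and awk. This is only a sketch: jmx_exporter evaluates Java regexes internally, [^[:space:]] stands in for \S, and the input string is assumed to follow the domain<key=value, ...><>attribute form the patterns above are written against; client_request_rule is a hypothetical name.

```shell
#!/usr/bin/env bash
# Approximate the ClientRequest rule from cassandra-jmx.yml to show how an
# MBean attribute string turns into a Prometheus metric name plus label.
set -euo pipefail

client_request_rule() {
  # Capture groups mirror the YAML rule: \1=type, \2=scope, \3=name, \4=attribute.
  # awk then lowercases only the metric name, as lowercaseOutputName does;
  # label values such as "Read" keep their case.
  sed -nE 's/^org\.apache\.cassandra\.metrics<type=(ClientRequest), scope=([^[:space:]]*), name=([^[:space:]]*)><>(Count|Mean|95thPercentile)$/cassandra_\1_\3_\4 \2/p' |
    awk '{ printf "%s{type=\"%s\"}\n", tolower($1), $2 }'
}

echo 'org.apache.cassandra.metrics<type=ClientRequest, scope=Read, name=Latency><>95thPercentile' |
  client_request_rule
# prints: cassandra_clientrequest_latency_95thpercentile{type="Read"}
```

Strings that do not match the rule produce no output, which mirrors how jmx_exporter falls through to the next rule.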

    2. Edit the Cassandra configuration file conf/cassandra-env.sh and add the Java agent:

    JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.3.0.jar -javaagent:$CASSANDRA_HOME/lib/jmx_prometheus_javaagent-0.12.0.jar=7070:${CASSANDRA_HOME}/conf/cassandra-jmx.yml"

    Note: port 7070 is where the agent serves the metrics that Prometheus scrapes.
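When several nodes are configured, the edit can be made idempotent so a script can be re-run safely. This sketch assumes cassandra-env.sh already loads the jamm agent on its own (stock Cassandra 3.x files typically do), so it appends only the exporter agent, using the jar name and port 7070 from the line above; add_jmx_agent is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Append the exporter Java agent to cassandra-env.sh unless it is already present.
set -euo pipefail

add_jmx_agent() {
  local env_file="$1" port="${2:-7070}"
  # Single quotes keep $CASSANDRA_HOME literal; cassandra-env.sh expands it at startup.
  local opt='-javaagent:$CASSANDRA_HOME/lib/jmx_prometheus_javaagent-0.12.0.jar='"${port}"':$CASSANDRA_HOME/conf/cassandra-jmx.yml'
  if grep -qF 'jmx_prometheus_javaagent' "$env_file"; then
    echo "exporter agent already configured in ${env_file}"
  else
    # $JVM_OPTS is written literally so the existing options are preserved at runtime
    printf 'JVM_OPTS="$JVM_OPTS %s"\n' "$opt" >> "$env_file"
    echo "exporter agent added to ${env_file}"
  fi
}

if [[ -n "${CASSANDRA_HOME:-}" ]]; then
  add_jmx_agent "$CASSANDRA_HOME/conf/cassandra-env.sh" 7070
fi
```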

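    • Start

    Restart the Cassandra service. Once it is up, open http://10.x.xx.100:7070/metrics/ (replace the IP and port with those of your environment).

    The scraped metrics look like this:

    [Screenshot: output of the /metrics endpoint]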
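The endpoint can also be checked from a shell by piping the payload into a small function. This is a sketch: check_cassandra_metrics is a hypothetical helper, and it assumes the agent serves its default jvm_* metrics next to the cassandra_* metrics produced by the rules.

```shell
#!/usr/bin/env bash
# Sanity-check a /metrics payload read from stdin: succeed only if it contains
# at least one cassandra_* sample and at least one jvm_* sample.
set -euo pipefail

check_cassandra_metrics() {
  local payload cass jvm
  payload="$(cat)"
  # Sample lines start with the metric name; "# HELP" / "# TYPE" lines are not counted
  cass="$(grep -c '^cassandra_' <<<"$payload" || true)"
  jvm="$(grep -c '^jvm_' <<<"$payload" || true)"
  echo "cassandra_* samples: ${cass}, jvm_* samples: ${jvm}"
  [[ "$cass" -gt 0 && "$jvm" -gt 0 ]]
}
```

Usage, with the host placeholder from the text: `curl -s http://10.x.xx.100:7070/metrics/ | check_cassandra_metrics`. A non-zero exit status usually means the agent is not loaded or the whitelist matches nothing.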

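    Prometheus configuration

    • Configure

    Edit the prometheus.yml of the Prometheus installation and add Cassandra monitoring:

    vi /usr/local/prometheus-2.15.1/prometheus.yml

    [Screenshot: the Cassandra job in prometheus.yml]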
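A minimal Cassandra scrape job in prometheus.yml could look like the fragment below. The job name cassandra and the host names are placeholders (assumptions); port 7070 matches the agent configured in cassandra-env.sh.

```yaml
scrape_configs:
  # ...existing jobs stay as they are...
  - job_name: "cassandra"
    static_configs:
      - targets:
          # Hypothetical host names: list every Cassandra node with the agent port
          - "cassandra-node1:7070"
          - "cassandra-node2:7070"
          - "cassandra-node3:7070"
```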

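    • Start and verify

    Kill the running Prometheus process, restart it with the commands below, and then check the Targets page:

    cd /usr/local/prometheus-2.15.1
    nohup ./prometheus --config.file=prometheus.yml &

    [Screenshot: Prometheus Targets page]

    Note: State=UP means the target is being scraped successfully.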
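As an alternative to killing and restarting, the configuration can be validated first and then reloaded in place, since Prometheus re-reads its configuration on SIGHUP. A sketch, assuming promtool sits next to the prometheus binary (as in the release tarball), the server listens on localhost:9090, and the helper names are hypothetical:

```shell
#!/usr/bin/env bash
# Validate prometheus.yml, reload a running Prometheus, and list target health.
# PROM_DIR and PROM_URL defaults are assumptions; adjust to your install.
set -euo pipefail

PROM_DIR="${PROM_DIR:-/usr/local/prometheus-2.15.1}"
PROM_URL="${PROM_URL:-http://localhost:9090}"

validate_config() {
  # promtool ships in the Prometheus release tarball next to the server binary
  "${PROM_DIR}/promtool" check config "${PROM_DIR}/prometheus.yml"
}

reload_config() {
  # SIGHUP makes a running Prometheus re-read its configuration without a restart
  pkill -HUP -x prometheus
}

show_target_health() {
  # /api/v1/targets returns JSON; this crude grep only pulls out the "health" fields
  curl -fsS "${PROM_URL}/api/v1/targets" | grep -o '"health":"[a-z]*"'
}
```

Typical use is `validate_config && reload_config`, then `show_target_health` after a scrape interval; every Cassandra target should report "up".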

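    Grafana configuration

    • Import the dashboard template

    Import dashboard https://grafana.com/dashboards/5408, then adapt it to your own workload. The final customized dashboard:

    [Screenshot: customized Cassandra dashboard]

    Note: the JSON of the Cassandra metrics dashboard on grafana.com (https://grafana.com/grafana/dashboards/5408) contains some errors that have to be corrected by hand.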

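    • Alert rules

    No.  Alert              Rule
    1    Memory alert       Fire when memory usage exceeds the threshold (> 80%)
    2    GC duration alert  Fire when GC time exceeds the threshold (> 0.3 s)
    3    GC count alert     Fire when the number of GCs per second exceeds the threshold (> 5)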
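The table does not say where these alerts are evaluated. One possible encoding is as Prometheus alerting rules, sketched below under several assumptions: "memory usage" is read as JVM heap usage, "GC time" as the average pause per collection over 5 minutes, the metric names are the JVM metrics the 0.12.0 agent exports by default (newer exporter releases rename them), the job label is cassandra, and the file name cassandra_alerts.yml and the 5-minute `for` duration are arbitrary choices.

```yaml
# cassandra_alerts.yml (hypothetical name), loaded from prometheus.yml via:
#   rule_files:
#     - "cassandra_alerts.yml"
groups:
  - name: cassandra-jvm
    rules:
      # 1. Memory: heap used / heap max above 80%
      - alert: CassandraHeapUsageHigh
        expr: jvm_memory_bytes_used{job="cassandra",area="heap"} / jvm_memory_bytes_max{job="cassandra",area="heap"} > 0.8
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Heap usage above 80% on {{ $labels.instance }}"
      # 2. GC duration: average time per collection over 5m above 0.3 s
      - alert: CassandraGcPauseHigh
        expr: rate(jvm_gc_collection_seconds_sum{job="cassandra"}[5m]) / rate(jvm_gc_collection_seconds_count{job="cassandra"}[5m]) > 0.3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Average {{ $labels.gc }} pause above 0.3s on {{ $labels.instance }}"
      # 3. GC frequency: more than 5 collections per second, per collector
      - alert: CassandraGcRateHigh
        expr: rate(jvm_gc_collection_seconds_count{job="cassandra"}[5m]) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.gc }} running more than 5 times/s on {{ $labels.instance }}"
```

The file can be checked with `promtool check rules cassandra_alerts.yml` before reloading Prometheus.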

    Original article: https://www.cnblogs.com/caoweixiong/p/12736815.html