  • Kubernetes: collecting cluster events to monitor cluster behavior

    I. Overview

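    Our production k8s cluster has made it through Double 11 (the November 11 shopping festival). Thanks to network and monitoring optimizations made beforehand, it got through the peak smoothly and performed well. First, a brief description of how we use Kubernetes: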

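        Physical host OS: Ubuntu 16.04 (kernel upgraded to 4.17)

        Kubernetes version: 1.13.2

        Network plugin: calico (BGP mode + BGP route reflector)

        kube-proxy: ipvs mode

        Monitoring: prometheus + grafana

        Logging: fluentd + ES

        Metrics: metrics-server

        HPA: cpu + memory

        Alerting: DingTalk

        CI/CD: gitlab-ci/gitlab-runner

        Application management: helm, chartmuseum (I don't recommend using helm directly: helm charts are hard to read and the learning curve is steep)

        k8s and the physical-machine environment coexist, so their networks have to be connected to allow access between them: kube-gateway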

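    Some parts involve company-internal details that I can't publish, but most of it is covered in my earlier blog posts; have a look if you are interested.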

     

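    My own reflections:

    After the k8s cluster had been running in production for a while, I found that I had no clear view of what was changing inside it: a pod being rescheduled, image GC failing on a node, an HPA being triggered, and so on. All of this can be obtained from events, which record the state changes of the cluster's resources, but events are not stored permanently (by default the API server keeps them for only one hour). By collecting and analyzing events we can therefore understand what is changing inside the whole cluster. After some exploration I found the open-source eventrouter for collecting events and modified it to fit our use case, renaming it eventrouter-kafka (https://github.com/cuishuaigit/eventrouter-kafka). With a small configuration change it sends events straight to Kafka; I found the original's configuration somewhat cumbersome and not very pleasant to use. Our logs also go through a Kafka queue, which reduces write pressure on ES. The current events collection pipeline is: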

    eventrouter ----> kafka ----> logstash (filter, parse) ----> ES ----> elastalert ----> DingTalk

    Adding the events collection above brought our k8s cluster one step closer to complete.

     

    II. Workflow in brief

    1. Deploy eventrouter

    eventrouter is written in Go, so you can extend it to suit your own needs. Deployment is simple; see https://github.com/cuishuaigit/eventrouter-kafka. I won't go into detail here.
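
    For orientation, here is a minimal Python sketch of the kind of message the rest of this pipeline expects on the Kafka topic. The envelope shape (a top-level `verb`, an `event` object, and an `old_event` on updates) is inferred from the field names used in the Logstash filter and the elastalert rule later in this post; the sample values and the helper name `summarize` are made up for illustration.

```python
import json

# Hypothetical sample message; all values are invented for illustration.
SAMPLE = json.dumps({
    "verb": "UPDATED",
    "event": {
        "type": "Warning",
        "reason": "BackOff",
        "message": "Back-off restarting failed container",
        "involvedObject": {"kind": "Pod", "namespace": "default", "name": "web-0"},
        "source": {"component": "kubelet"},
    },
    "old_event": {"type": "Warning", "reason": "BackOff"},
})


def summarize(raw: str) -> str:
    """Return a one-line summary of an eventrouter-style message."""
    doc = json.loads(raw)
    event = doc["event"]
    obj = event["involvedObject"]
    return "{verb} {type} {kind}/{ns}/{name}: {reason}".format(
        verb=doc["verb"], type=event["type"], kind=obj["kind"],
        ns=obj["namespace"], name=obj["name"], reason=event["reason"],
    )


print(summarize(SAMPLE))
```

    The same dotted paths (`event.involvedObject.name`, `event.reason`, `verb`) are what the alert rule refers to once the document is in ES.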

     

    2. Kafka cluster

    See: https://github.com/cuishuaigit/k8s-kafka

     

    3. Logstash

    Download the matching version of Logstash: https://www.elastic.co/guide/en/logstash/6.5/installing-logstash.html

    Then configure it. Here is my test configuration:

    input {
      kafka {
        # Comma-separated list of host:port pairs, given as a single string
        bootstrap_servers => "kafka-0.kafka-svc.kafka.svc.cluster.local:9092,kafka-1.kafka-svc.kafka.svc.cluster.local:9092,kafka-2.kafka-svc.kafka.svc.cluster.local:9092"
        client_id => "eventrouter-prod"
        #auto_offset_reset => "latest"
        group_id => "eventrouter"
        consumer_threads => 2
        #decorate_events => true
        id => "eventrouter"
        topics => ["eventrouter"]
      }
    }


    filter {
      # Drop noisy DNSConfigForming events before parsing
      if [message] =~ 'DNSConfigForming' {
        drop { }
      }
      # Parse the JSON payload produced by eventrouter
      json {
        source => "message"
      }
      # Remove the raw message and the previous version of the event
      mutate {
        remove_field => [ "message", "old_event" ]
      }
    }


    output {
      elasticsearch {
        hosts => "10.4.9.28:9200"
        index => "eventrouter-%{+YYYY-MM-dd}"
      }
    }
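
    To make the filter logic easier to follow, here is a rough Python sketch of what the configuration above does to each message. It is an approximation, not Logstash itself: the function names `apply_filter` and `index_name` are hypothetical, and unlike Logstash (which tags events it cannot parse) this sketch simply raises on invalid JSON. The index date is assumed to be taken from the event timestamp in UTC.

```python
import json
from datetime import datetime, timezone
from typing import Optional


def apply_filter(raw: str) -> Optional[dict]:
    """Mimic the filter block: drop, parse JSON, remove two fields."""
    # if [message] =~ 'DNSConfigForming' { drop { } }
    if "DNSConfigForming" in raw:
        return None
    doc = {"message": raw}
    # json { source => "message" } merges the parsed keys into the event
    doc.update(json.loads(raw))
    # mutate { remove_field => [ "message", "old_event" ] }
    for field in ("message", "old_event"):
        doc.pop(field, None)
    return doc


def index_name(ts: datetime) -> str:
    """Daily index name following the eventrouter-%{+YYYY-MM-dd} pattern."""
    return "eventrouter-" + ts.astimezone(timezone.utc).strftime("%Y-%m-%d")
```

    Writing one index per day keeps each index small and makes it easy to delete old events by dropping whole indices.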

     

    4. ES

    version: '2'
    services:
      elasticsearch:
        image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
        container_name: elasticsearch
        environment:
          - cluster.name=docker-cluster
          - bootstrap.memory_lock=true
          - "ES_JAVA_OPTS=-Xms4096m -Xmx4096m"
    
        ulimits:
          memlock:
            soft: -1
            hard: -1
        volumes:
          - /data/es1:/usr/share/elasticsearch/data
          - /data/backups:/usr/share/elasticsearch/backups
          - /data/longterm_backups:/usr/share/elasticsearch/longterm_backups
          - ./config/jvm.options:/usr/share/elasticsearch/config/jvm.options
        ports:
          - "9200:9200"
        networks:
          - esnet
    #  elasticsearch2:
    #    image: docker.elastic.co/elasticsearch/elasticsearch:6.5.1
    #    container_name: elasticsearch2
    #    environment:
    #      - cluster.name=docker-cluster
    #      - bootstrap.memory_lock=true
    #      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    #      - "discovery.zen.ping.unicast.hosts=elasticsearch"
    #    ulimits:
    #      memlock:
    #        soft: -1
    #        hard: -1
    #    volumes:
    #      - /data/es2:/usr/share/elasticsearch/data
    #    networks:
    #      - esnet
      kibana:
        image: docker.elastic.co/kibana/kibana:6.5.1
        container_name: kibana
        environment:
          SERVER_NAME: kibana
          SERVER_HOST: "0.0.0.0"
          ELASTICSEARCH_URL: http://elasticsearch:9200
          XPACK_MONITORING_UI_CONTAINER_ELASTICSEARCH_ENABLED: "true"
        volumes:
          - /data/plugin:/usr/share/kibana/plugin
          - /tmp/:/etc/archives
        ports:
          - "5601:5601"
        networks:
          - esnet
        depends_on:
          - elasticsearch
    networks:
     esnet:
       driver: bridge

     

    cat config/jvm.options

    ## JVM configuration
    
    ################################################################
    ## IMPORTANT: JVM heap size
    ################################################################
    ##
    ## You should always set the min and max JVM heap
    ## size to the same value. For example, to set
    ## the heap to 4 GB, set:
    ##
    ## -Xms4g
    ## -Xmx4g
    ##
    ## See https://www.elastic.co/guide/en/elasticsearch/reference/current/heap-size.html
    ## for more information
    ##
    ################################################################
    
    # Xms represents the initial size of total heap space
    # Xmx represents the maximum size of total heap space
    
    -Xms2g
    -Xmx2g
    
    ################################################################
    ## Expert settings
    ################################################################
    ##
    ## All settings below this section are considered
    ## expert settings. Don't tamper with them unless
    ## you understand what you are doing
    ##
    ################################################################
    
    ## GC configuration
    -XX:+UseConcMarkSweepGC
    -XX:CMSInitiatingOccupancyFraction=75
    -XX:+UseCMSInitiatingOccupancyOnly
    
    ## G1GC Configuration
    # NOTE: G1GC is only supported on JDK version 10 or later.
    # To use G1GC uncomment the lines below.
    # 10-:-XX:-UseConcMarkSweepGC
    # 10-:-XX:-UseCMSInitiatingOccupancyOnly
    # 10-:-XX:+UseG1GC
    # 10-:-XX:InitiatingHeapOccupancyPercent=75
    
    ## optimizations
    
    # pre-touch memory pages used by the JVM during initialization
    -XX:+AlwaysPreTouch
    
    ## basic
    
    # explicitly set the stack size
    -Xss1m
    
    # set to headless, just in case
    -Djava.awt.headless=true
    
    # ensure UTF-8 encoding by default (e.g. filenames)
    -Dfile.encoding=UTF-8
    
    # use our provided JNA always versus the system one
    -Djna.nosys=true
    
    # turn off a JDK optimization that throws away stack traces for common
    # exceptions because stack traces are important for debugging
    -XX:-OmitStackTraceInFastThrow
    
    # flags to configure Netty
    -Dio.netty.noUnsafe=true
    -Dio.netty.noKeySetOptimization=true
    -Dio.netty.recycler.maxCapacityPerThread=0
    
    # log4j 2
    -Dlog4j.shutdownHookEnabled=false
    -Dlog4j2.disable.jmx=true
    
    -Djava.io.tmpdir=${ES_TMPDIR}
    
    ## heap dumps
    
    # generate a heap dump when an allocation from the Java heap fails
    # heap dumps are created in the working directory of the JVM
    -XX:+HeapDumpOnOutOfMemoryError
    
    # specify an alternative path for heap dumps; ensure the directory exists and
    # has sufficient space
    -XX:HeapDumpPath=data
    
    # specify an alternative path for JVM fatal error logs
    -XX:ErrorFile=logs/hs_err_pid%p.log
    
    ## JDK 8 GC logging
    
    8:-XX:+PrintGCDetails
    8:-XX:+PrintGCDateStamps
    8:-XX:+PrintTenuringDistribution
    8:-XX:+PrintGCApplicationStoppedTime
    8:-Xloggc:logs/gc.log
    8:-XX:+UseGCLogFileRotation
    8:-XX:NumberOfGCLogFiles=32
    8:-XX:GCLogFileSize=64m
    
    # JDK 9+ GC logging
    9-:-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m
    # due to internationalization enhancements in JDK 9 Elasticsearch need to set the provider to COMPAT otherwise
    # time/date parsing will break in an incompatible way for some date patterns and locals
    9-:-Djava.locale.providers=COMPAT
    
    # temporary workaround for C2 bug with JDK 10 on hardware with AVX-512
    10-:-XX:UseAVX=2
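
    Note that the compose file sets ES_JAVA_OPTS to a 4096m heap while this jvm.options file says 2g. The Python sketch below shows how the version-prefixed lines in jvm.options are selected (`n:` for JDK n only, `n-:` for JDK n and later, `n-m:` for a range) and which heap size ends up in effect. It assumes that options from ES_JAVA_OPTS are appended after the options from the file and that the JVM honors the last occurrence of a repeated flag; the function names are hypothetical.

```python
import re
from typing import List

# n:opt -> only JDK n;  n-:opt -> JDK n and later;  n-m:opt -> JDK n..m
PREFIX = re.compile(r"^(\d+)(-(\d+)?)?:(-.*)$")


def select_options(text: str, jdk_major: int) -> List[str]:
    """Return the options from jvm.options text that apply to jdk_major."""
    selected = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = PREFIX.match(line)
        if match is None:
            if line.startswith("-"):
                selected.append(line)  # unprefixed option, always applies
            continue
        low = int(match.group(1))
        if match.group(2) is None:      # "n:" form
            high = low
        elif match.group(3) is None:    # "n-:" form, no upper bound
            high = None
        else:                           # "n-m:" form
            high = int(match.group(3))
        if jdk_major >= low and (high is None or jdk_major <= high):
            selected.append(match.group(4))
    return selected


def effective_heap(options: List[str], es_java_opts: str) -> dict:
    """Last -Xms/-Xmx wins; environment options come last (assumption)."""
    heap = {}
    for opt in options + es_java_opts.split():
        if opt.startswith(("-Xms", "-Xmx")):
            heap[opt[:4]] = opt[4:]
    return heap
```

    Under these assumptions, `effective_heap(select_options(text, 8), "-Xms4096m -Xmx4096m")` reports 4096m for both values, so it is worth keeping the two places in agreement to avoid confusion.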

     

    5. elastalert

    For deployment, see https://github.com/Yelp/elastalert.git

    Usage:

    mkdir  /etc/elastalert

    Copy config.yaml.example from the cloned elastalert directory into the directory created above:

    cp  elastalert/config.yaml.example     /etc/elastalert/config.yaml

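    Only the following need to be changed:

    rules_folder, es_host, and es_port. If Elasticsearch has a username and password set, es_username and es_password need to be changed as well.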
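
    If you would rather script this edit than do it by hand, the following Python sketch rewrites top-level `key: value` lines in the config text. It is only an illustration: the helper name `set_key` is hypothetical, it handles nothing beyond simple one-line keys, and it uncomments a key if the example file has it commented out.

```python
import re


def set_key(text: str, key: str, value: str) -> str:
    """Replace the value of a top-level `key: value` line, or append it."""
    # Match the key at the start of a line, optionally commented out with '#'.
    pattern = re.compile(r"^#? ?" + re.escape(key) + r":.*$", re.MULTILINE)
    line = "{}: {}".format(key, value)
    if pattern.search(text):
        # A function replacement avoids backslash escapes in `value`.
        return pattern.sub(lambda _m: line, text, count=1)
    return text.rstrip("\n") + "\n" + line + "\n"
```

    For example, `set_key(text, "rules_folder", "/etc/elastalert/rules")` points elastalert at the rules directory used in this post.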

     

    Create the rules directory:

    mkdir /etc/elastalert/rules
    

     

    6. DingTalk

    To create the robot and obtain its token, see my other blog posts. Then download the DingTalk plugin: https://github.com/xuyaoqiang/elastalert-dingtalk-plugin

    Copy elastalert_modules into place under the /etc/elastalert directory:

    cp  -r elastalert-dingtalk-plugin/elastalert_modules   /etc/elastalert/elastalert
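
    Before wiring the plugin into a rule, it can help to check the robot webhook on its own. The Python sketch below builds the kind of request a DingTalk custom robot accepts for a plain text message, as far as I understand that API: a JSON body with `msgtype` set to `text` and the content under `text.content`. This is not the plugin's code; the function name `build_request` and the placeholder token are hypothetical.

```python
import json
import urllib.request


def build_request(webhook: str, content: str) -> urllib.request.Request:
    """Build a POST request carrying a DingTalk text message."""
    body = json.dumps(
        {"msgtype": "text", "text": {"content": content}}
    ).encode("utf-8")
    return urllib.request.Request(
        webhook,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# To actually send the message (requires network access and a real token):
# urllib.request.urlopen(build_request(webhook_url, "test"), timeout=10)
```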

     

    rules example

    # Alert when the rate of events exceeds a threshold
    
    # (Optional)
    # Elasticsearch host
    es_host: 10.2.9.28
    
    # (Optional)
    # Elasticsearch port
    es_port: 9200
    
    # (Optional) Connect with SSL to Elasticsearch
    #use_ssl: True
    
    # (Optional) basic-auth username and password for Elasticsearch
    #es_username: someusername
    #es_password: somepassword
    
    # (Required)
    # Rule name, must be unique
    name: Other event frequency rule
    
    # (Required)
    # Type of alert.
    # the frequency rule type alerts when num_events events occur with timeframe time
    type: frequency
    
    # (Required)
    # Index to search, wildcard supported
    index: eventrouter-*
    
    # (Required, frequency specific)
    # Alert when this many documents matching the query occur within a timeframe
    num_events: 5
    
    # (Required, frequency specific)
    # num_events must occur within this amount of time to trigger an alert
    timeframe:
      #hours: 4
      minutes: 15
    # (Required)
    # A list of Elasticsearch filters used for find events
    # These filters are joined with AND and nested in a filtered query
    # For more info: http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html
    filter:
    #- term:
    #    some_field: "some_value"
    - query:
        query_string:
          query: "event.type: Warning NOT event.involvedObject.kind: Node"
    # (Required)
    # The alert is use when a match is found
    
    #smtp_host: smtp.exmail.qq.com
    #smtp_port: 25
    #smtp_auth_file: /etc/elastalert/smtp_auth_file.yaml
    #email_reply_to: ci@qq.com
    #from_addr: ci@qq.com
    realert:
      minutes: 5
    exponential_realert:
      hours: 1
    
    alert:
    #- "email"
    - "elastalert_modules.dingtalk_alert.DingTalkAlerter"
    dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=47194e6904c6e3133a9080980984444c8e5d7745e1f76c12cefa99c8c8ac718dd88d4c"
    dingtalk_msgtype: "text"
    
    alert_text_type: alert_text_only
    alert_text: "
       ====elastalert message====
    
    
       EventTime>>:  {0}
    
    
       Event_involvedObject_name>>:  {1}
    
    
       Event_involvedObject_kind>>:  {2}
    
    
       Event_involvedObject_namespace>>:  {3}
    
    
    
       Message>>:  {4}
    
    
       Event_reason>>: {5}
    
    
       verb>>: {6}
    "
    
    alert_text_args:
    - "@timestamp"
    - event.involvedObject.name
    - event.involvedObject.kind
    - event.involvedObject.namespace
    - event.message
    - event.reason
    - verb
    # (required, email specific)
    # a list of email addresses to send alerts to
    #email:
    #- "ci@qq.com"
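
    The rule above fires when at least 5 matching documents arrive within 15 minutes. As a way to reason about that behavior, here is a simplified Python sketch of a sliding-window frequency check together with a rough equivalent of the query_string filter. It is an approximation of the idea, not elastalert's implementation: it ignores realert and other options, uses exact string comparison where Elasticsearch would use analyzed matching, and both function names are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Iterable, List


def matches(doc: dict) -> bool:
    """Rough equivalent of: event.type: Warning NOT event.involvedObject.kind: Node"""
    event = doc.get("event", {})
    return (event.get("type") == "Warning"
            and event.get("involvedObject", {}).get("kind") != "Node")


def frequency_alerts(timestamps: Iterable[datetime], num_events: int = 5,
                     timeframe: timedelta = timedelta(minutes=15)) -> List[datetime]:
    """Return the times at which num_events hits fall inside one timeframe."""
    window: Deque[datetime] = deque()
    fired = []
    for ts in sorted(timestamps):
        window.append(ts)
        # Drop hits older than the timeframe relative to the newest hit.
        while ts - window[0] > timeframe:
            window.popleft()
        if len(window) >= num_events:
            fired.append(ts)
            window.clear()  # start counting afresh after an alert
    return fired
```

    With the defaults, five Warning events two minutes apart produce one alert at the fifth event, while five events ten minutes apart produce none.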

     

    My customized alert message format (the alert section of the rule above):

    alert:
    #- "email"
    - "elastalert_modules.dingtalk_alert.DingTalkAlerter"
    dingtalk_webhook: "https://oapi.dingtalk.com/robot/send?access_token=47194e6904c6e3133a9080980984444c8e5d7745e1f76c12cefa99c8c8ac718dd88d4c"
    dingtalk_msgtype: "text"
    
    alert_text_type: alert_text_only
    alert_text: "
       ====elastalert message====
    
    
       EventTime>>:  {0}
    
    
       Event_involvedObject_name>>:  {1}
    
    
       Event_involvedObject_kind>>:  {2}
    
    
       Event_involvedObject_namespace>>:  {3}
    
    
    
       Message>>:  {4}
    
    
       Event_reason>>: {5}
    
    
       verb>>: {6}
    "
    
    alert_text_args:
    - "@timestamp"
    - event.involvedObject.name
    - event.involvedObject.kind
    - event.involvedObject.namespace
    - event.message
    - event.reason
    - verb
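
    The numbered placeholders in alert_text are filled positionally from alert_text_args, each of which is a dotted path into the matched document. The Python sketch below illustrates that mapping with a shortened template. It is a sketch of the idea only: the names `lookup` and `render` are hypothetical, and the placeholder used for missing fields is an assumption.

```python
from typing import Any, List

# Shortened version of the template above, for illustration.
ALERT_TEXT = "EventTime>>: {0}\nEvent_involvedObject_name>>: {1}\nEvent_reason>>: {2}"
ALERT_TEXT_ARGS = ["@timestamp", "event.involvedObject.name", "event.reason"]


def lookup(doc: dict, dotted: str, missing: str = "<MISSING VALUE>") -> Any:
    """Resolve a dotted field path; `missing` is an assumed placeholder."""
    if dotted in doc:  # keys such as "@timestamp" are stored as-is
        return doc[dotted]
    current: Any = doc
    for part in dotted.split("."):
        if not isinstance(current, dict) or part not in current:
            return missing
        current = current[part]
    return current


def render(template: str, args: List[str], doc: dict) -> str:
    """Fill {0}, {1}, ... in template with the values named by args."""
    return template.format(*(lookup(doc, arg) for arg in args))
```

    Because the mapping is purely positional, the order of alert_text_args has to match the labels in alert_text; a swapped entry silently puts a value under the wrong label.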

    For details, see the official documentation: https://elastalert.readthedocs.io/en/latest/recipes/writing_filters.html#writingfilters

     

     

  • Original article: https://www.cnblogs.com/cuishuai/p/10573586.html