  • Deleting Expired Data in Elasticsearch

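    I. Overview

    When Elasticsearch is used to collect and process logs, very old data eventually becomes useless or of little value, and at that point it needs to be cleaned up. This post describes two ways to remove such expired data.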

    1. Curator

    About versions:

     

    Installation:

    https://www.elastic.co/guide/en/elasticsearch/client/curator/current/installation.html

    I am on Ubuntu, so I followed https://www.elastic.co/guide/en/elasticsearch/client/curator/current/apt-repository.html

    wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
    
    vim  /etc/apt/sources.list.d/curator.list
    deb [arch=amd64] https://packages.elastic.co/curator/5/debian stable main
    
    sudo apt-get update && sudo apt-get install elasticsearch-curator

     I am running elasticsearch-6.5.1, so I installed Curator 5.

    The installation provides two commands, curator and curator_cli. Only curator is used here.

    Two configuration files are needed: a config file and an action file.

    mkdir  {/etc/curator,/data/curator}

    config:

    # cat config_file.yml
    client:
      hosts:
        - 127.0.0.1
      port: 9200
      url_prefix:
      use_ssl: False
      certificate:
      client_cert:
      client_key:
      ssl_no_validate: False
      http_auth:
      timeout:
      master_only: true
    logging:
      loglevel: INFO
      logfile: "/data/curator/action.log"
      logformat: default

    action:

    # cat action_file.yml

    ---
    actions:
      1:
        action: delete_indices
        description: >-
          Delete indices older than 15 days (based on index name), for indices
          prefixed with apm-6.5.1-transaction- or apm-6.5.1-span-. Ignore the error
          if the filter does not result in an actionable list of indices
          (ignore_empty_list) and exit cleanly.
        options:
          ignore_empty_list: True
          timeout_override:
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: pattern
          kind: regex
          value: '^apm-6.5.1-transaction-|^apm-6.5.1-span-'
          exclude:
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 15
          exclude:
    
      2:
        action: delete_indices
        description: >-
          Delete indices older than 20 days (based on index name), for
          loadbalance-api- prefixed indices. Ignore the error if the filter does
          not result in an actionable list of indices (ignore_empty_list) and
          exit cleanly.
        options:
          ignore_empty_list: True
          timeout_override:
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: pattern
          kind: prefix
          value: loadbalance-api-
          exclude:
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y-%m-%d'
          unit: days
          unit_count: 20
          exclude:

     

    ---
    actions:
      1:
        action: delete_indices
        description: >-
          Delete indices older than 15 days (based on index name), for
          fluentd-k8s- prefixed indices, except the indices for 2019.02.11 and
          2019.02.12, which are excluded. Ignore the error if the filter does not
          result in an actionable list of indices (ignore_empty_list) and exit
          cleanly.
        options:
          ignore_empty_list: True
          timeout_override:
          continue_if_exception: False
          disable_action: False
        filters:
        - filtertype: pattern
          kind: regex
          value: 'fluentd-k8s-(2019.02.11|2019.02.12)$'
          exclude: true
        - filtertype: pattern
          kind: prefix
          value: fluentd-k8s-
          exclude:
        - filtertype: age
          source: name
          direction: older
          timestring: '%Y.%m.%d'
          unit: days
          unit_count: 15
          exclude:
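
To see roughly what the filter chain in the last action file selects, here is a minimal shell sketch that mimics the three filters on a plain list of index names. It is not how Curator is implemented: the function name `select_expired` is made up for illustration, the comparison is done on whole days only, and GNU `date` is assumed.

```shell
#!/bin/bash
# Hypothetical helper (not part of Curator): mimic the three filters above.
# Reads index names on stdin and prints the ones that would be selected.
# Usage: select_expired TODAY(YYYY-MM-DD) KEEP_DAYS < index_names
select_expired() {
  local today=$1 keep_days=$2 cutoff name day
  local protected='fluentd-k8s-(2019\.02\.11|2019\.02\.12)$'
  # Oldest day that is still kept, formatted like the index suffix (GNU date assumed).
  cutoff=$(date --utc --date "${today} ${keep_days} days ago" +%Y.%m.%d)
  while read -r name; do
    # Filter 1 (pattern/regex, exclude: true): skip the protected days.
    if [[ $name =~ $protected ]]; then continue; fi
    # Filter 2 (pattern/prefix): keep only fluentd-k8s- indices.
    if [[ $name != fluentd-k8s-* ]]; then continue; fi
    # Filter 3 (age, source: name): zero-padded YYYY.MM.DD strings sort chronologically.
    day=${name#fluentd-k8s-}
    if [[ $day < $cutoff ]]; then echo "$name"; fi
  done
  return 0
}
```

For example, piping the names `fluentd-k8s-2019.02.10`, `fluentd-k8s-2019.02.11` and `fluentd-k8s-2019.02.20` into `select_expired 2019-03-01 15` prints only `fluentd-k8s-2019.02.10`: the second name is protected by the exclude filter and the third is newer than the cutoff.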

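     Multiple actions can be defined in one file, each keyed by a different number and each with its own cleanup policy. For details see https://www.elastic.co/guide/en/elasticsearch/client/curator/5.6/actions.html

    Pay attention to the format of your own index names. For example, the indices here use two date formats: '%Y.%m.%d' and '%Y-%m-%d'.

    The timestring must match the index names. Otherwise the action ends up with an empty list and nothing is deleted.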
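
One way to check which timestring an index name follows before writing the filter is a small pattern test. The sketch below is only an illustration: `timestring_of` is a hypothetical helper, and it recognizes just the two formats used in this post.

```shell
#!/bin/bash
# Hypothetical helper: report which of the two date formats an index name ends with.
timestring_of() {
  local name=$1
  local dotted='-[0-9]{4}\.[0-9]{2}\.[0-9]{2}$'   # e.g. ...-2019.02.11
  local dashed='-[0-9]{4}-[0-9]{2}-[0-9]{2}$'     # e.g. ...-2019-02-11
  if [[ $name =~ $dotted ]]; then
    echo '%Y.%m.%d'
  elif [[ $name =~ $dashed ]]; then
    echo '%Y-%m-%d'
  else
    echo 'unknown'
    return 1
  fi
}

# Sample names built from the prefixes used in the action file above.
timestring_of apm-6.5.1-transaction-2019.02.11
timestring_of loadbalance-api-2019-02-11
```

If a name falls into the `unknown` branch, an age filter with either timestring would not match it, which is the empty-list case described above.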

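    Important historical data is first written to HDFS and only then deleted. Choose the retention period according to your servers' disk capacity and the importance of the logs. Important data that you want to keep, such as the Double 11 data, can be listed in an exclude filter or backed up with a snapshot. After that, set up a scheduled job to run the deletion.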

    crontab -e
    # Run at 00:00 on days 1 and 26 of each month (*/25 in the day-of-month field matches 1 and 26)
    0  0  */25 * *  curator --config /etc/curator/config_file.yml  /etc/curator/action_file.yml

     

    2. Deleting with a script

     

    # cat es-dele-indices.sh
    #!/bin/bash
    #delete elasticsearch indices
    searchIndex=fluentd-k8s
    elastic_url=127.0.0.1
    elastic_port=9200
    
    date2stamp(){
      date --utc --date "$1" +%s
    }
    
    dateDiff(){
      case $1 in
        -s)  sec=1;     shift;;
        -m)  sec=60;    shift;;
        -h)  sec=3600;  shift;;
        -d)  sec=86400; shift;;
         *)  sec=86400;;   # no unit flag given: default to days and keep both date arguments
      esac
      dte1=$(date2stamp $1)
      dte2=$(date2stamp $2)
      diffSec=$((dte2-dte1))
      if ((diffSec < 0)); then abs=-1; else abs=1; fi
      echo $((diffSec/sec*abs))
    }
    
    for index in $(curl -s "${elastic_url}:${elastic_port}/_cat/indices?v" | grep -E " ${searchIndex}-20[0-9][0-9]\.[0-1][0-9]\.[0-3][0-9]" | awk '{ print $3 }');do
      # Take the trailing YYYY.MM.DD and turn the dots into dashes so that date can parse it
      date=$(echo "${index: -10}" | sed 's/\./-/g')
      cond=$(date +%Y-%m-%d)
      diff=$(dateDiff -d $date $cond)
      echo -n "${index} (${diff})"
      if [ $diff -gt 1 ]; then
        #echo "/ DELETE"
        curl -XDELETE "${elastic_url}:${elastic_port}/${index}?pretty"
      else
        echo ""
      fi
    done
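
The script above hard-codes the retention in `[ $diff -gt 1 ]`, so every index older than one day is deleted. Below is a minimal sketch of the same date arithmetic with the retention passed in as an argument. The names `index_age_days` and `is_expired` are hypothetical, GNU `date` is assumed, and the functions never contact Elasticsearch.

```shell
#!/bin/bash
# Hypothetical helpers: they only do the date arithmetic and send no requests.

# Print the age in whole days of an index whose name ends in YYYY.MM.DD,
# relative to TODAY (YYYY-MM-DD).
index_age_days() {
  local index=$1 today=$2 day
  day=${index: -10}     # last 10 characters, e.g. 2019.02.11
  day=${day//./-}       # 2019.02.11 -> 2019-02-11, a form GNU date parses
  echo $(( ( $(date --utc --date "$today" +%s) - $(date --utc --date "$day" +%s) ) / 86400 ))
}

# Succeed when the index is older than KEEP_DAYS; TODAY defaults to the current date.
is_expired() {
  local index=$1 keep_days=$2 today=${3:-$(date +%Y-%m-%d)}
  [ "$(index_age_days "$index" "$today")" -gt "$keep_days" ]
}
```

In the loop, a test such as `if is_expired "$index" 15; then ...` could then replace the fixed comparison, which keeps the retention period in one visible place.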

     

  • Original article: https://www.cnblogs.com/cuishuai/p/10009091.html