  • Elasticsearch: recovering an index from red status to green

    Option 1

    Find the index whose status is red

    curl -X GET "http://172.xxx.xxx.174:9288/_cat/indices?v="
    
    red    open   index                          5   1    3058268        97588      2.6gb          1.3gb
    

    An index in red status cannot serve requests properly: red means at least one primary shard has not been allocated to any machine.
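
    When a cluster has many indices, scanning the `_cat/indices` output by eye gets tedious. The following is a minimal sketch, not part of the original article, that filters that output for red indices. The function name `red_indices` and the `other` row in the sample are hypothetical; the column order (health, status, index, ...) is assumed to match the output shown above.

```python
def red_indices(cat_indices_text):
    """Return the names of indices whose health column is 'red'.

    Assumes whitespace-separated rows in the order:
    health status index pri rep docs.count ...
    """
    names = []
    for line in cat_indices_text.splitlines():
        cols = line.split()
        # A header row (if present) starts with "health", so it is skipped.
        if len(cols) >= 3 and cols[0] == "red":
            names.append(cols[2])
    return names


# Sample text: the first data row is from the article, the second is made up.
sample = """health status index pri rep docs.count docs.deleted store.size pri.store.size
red    open   index   5   1   3058268    97588        2.6gb      1.3gb
green  open   other   5   1   10         0            1mb        512kb
"""

if __name__ == "__main__":
    print(red_indices(sample))
```

    The text passed in would be whatever the `curl` command above returns; fetching it is left out so the sketch stays self-contained.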

    Find the UNASSIGNED shards

    _cat/shards shows how the shards are allocated across machines.

    curl -X GET "http://172.xxx.xxx.174:9288/_cat/shards"
    
    index                            shard prirep state        docs   store   ip             node         
    index                      1    p     STARTED     764505 338.6mb 172.xxx.xxx.174 Calypso      
    index                      1    r     STARTED     764505 338.6mb 172.xxx.xxx.89  Savage Steel
    index                      2    p     STARTED     763750 336.6mb 172.xxx.xxx.174 Calypso      
    index                      2    r     STARTED     763750 336.6mb 172.xxx.xxx.88  Temugin      
    index                      3    p     STARTED     764537 340.2mb 172.xxx.xxx.89  Savage Steel
    index                      3    r     STARTED     764537 340.2mb 172.xxx.xxx.88  Temugin      
    index                      4    p     STARTED     765476 339.3mb 172.xxx.xxx.89  Savage Steel
    index                      4    r     STARTED     765476 339.3mb 172.xxx.xxx.88  Temugin      
    index                      0    p     UNASSIGNED                                             
    index                      0    r     UNASSIGNED    
    

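    For index, primary shard 0 and replica shard 0 are both UNASSIGNED, meaning they have not been allocated to any machine. Because a primary shard is unallocated, the index status is red.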
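
    The same check can be done programmatically. This is a rough sketch under the assumption that the `_cat/shards` columns are ordered as in the output above (index, shard, prirep, state, ...); `unassigned_shards` is a hypothetical helper name, not an Elasticsearch API.

```python
def unassigned_shards(cat_shards_text):
    """Return (index, shard_number, prirep) for every UNASSIGNED row."""
    found = []
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # UNASSIGNED rows have only four columns: index shard prirep state
        if len(cols) >= 4 and cols[3] == "UNASSIGNED":
            found.append((cols[0], int(cols[1]), cols[2]))
    return found


# A few rows copied from the article's _cat/shards output.
shards_sample = """index shard prirep state docs store ip node
index 1 p STARTED 764505 338.6mb 172.xxx.xxx.174 Calypso
index 1 r STARTED 764505 338.6mb 172.xxx.xxx.89 Savage Steel
index 0 p UNASSIGNED
index 0 r UNASSIGNED
"""

if __name__ == "__main__":
    for index_name, shard, kind in unassigned_shards(shards_sample):
        print(index_name, shard, "primary" if kind == "p" else "replica")
```

    An unassigned row with `p` in the prirep column is what makes the index red; an unassigned `r` row alone would leave it yellow.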
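    The ip column shows three machines in total, with addresses ending in 174, 89, and 88. There are 10 rows for index, which corresponds to the Elasticsearch settings index.number_of_shards: 5 and index.number_of_replicas: 1. Ten shards in total can be spread across the three machines as 3, 3, 4. Machines 88 and 89 already hold three shards each while 174 holds only two, so the unassigned primary shard can go to machine 174.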
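
    The counting in the paragraph above can be sketched as follows. This is an illustration only: `shards_per_ip` and `least_loaded` are hypothetical names, and the parser reads the ip from the seventh column because node names such as "Savage Steel" contain spaces. A machine holding no shards at all would not appear in the counts.

```python
from collections import Counter


def shards_per_ip(cat_shards_text):
    """Count STARTED shards per ip from _cat/shards text."""
    counts = Counter()
    for line in cat_shards_text.splitlines():
        cols = line.split()
        # Assigned rows: index shard prirep state docs store ip node...
        if len(cols) >= 7 and cols[3] == "STARTED":
            counts[cols[6]] += 1
    return counts


def least_loaded(counts):
    """Return the ip that currently holds the fewest shards."""
    return min(counts, key=counts.get)


# The ten rows from the article's _cat/shards output.
all_shards = """index 1 p STARTED 764505 338.6mb 172.xxx.xxx.174 Calypso
index 1 r STARTED 764505 338.6mb 172.xxx.xxx.89 Savage Steel
index 2 p STARTED 763750 336.6mb 172.xxx.xxx.174 Calypso
index 2 r STARTED 763750 336.6mb 172.xxx.xxx.88 Temugin
index 3 p STARTED 764537 340.2mb 172.xxx.xxx.89 Savage Steel
index 3 r STARTED 764537 340.2mb 172.xxx.xxx.88 Temugin
index 4 p STARTED 765476 339.3mb 172.xxx.xxx.89 Savage Steel
index 4 r STARTED 765476 339.3mb 172.xxx.xxx.88 Temugin
index 0 p UNASSIGNED
index 0 r UNASSIGNED
"""

if __name__ == "__main__":
    per_ip = shards_per_ip(all_shards)
    print(dict(per_ip))
    print(least_loaded(per_ip))
```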

    Find the node id

    Find the id that corresponds to machine 174; it is needed later when reallocating the primary shard.

    curl -X GET "http://172.xxx.xxx.174:9288/_nodes/process?v="
    {
      "cluster_name": "es2.3.2-titan-cl",
      "nodes": {
        "Leivp0laTYSqvMVm49SulQ": {
          "name": "Calypso",
          "transport_address": "172.xxx.xxx.174:9388",
          "host": "172.xxx.xxx.174",
          "ip": "172.xxx.xxx.174",
          "version": "2.3.2",
          "build": "b9e4a6a",
          "http_address": "172.xxx.xxx.174:9288",
          "process": {
            "refresh_interval_in_millis": 1000,
            "id": 32130,
            "mlockall": false
          }
        },
        "EafIS3ByRrm4g-14KmY_wg": {
          "name": "Savage Steel",
          "transport_address": "172.xxx.xxx.89:9388",
          "host": "172.xxx.xxx.89",
          "ip": "172.xxx.xxx.89",
          "version": "2.3.2",
          "build": "b9e4a6a",
          "http_address": "172.xxx.xxx.89:9288",
          "process": {
            "refresh_interval_in_millis": 1000,
            "id": 7560,
            "mlockall": false
          }
        },
        "tojQ9EiXS0m6ZP16N7Ug3A": {
          "name": "Temugin",
          "transport_address": "172.xxx.xxx.88:9388",
          "host": "172.xxx.xxx.88",
          "ip": "172.xxx.xxx.88",
          "version": "2.3.2",
          "build": "b9e4a6a",
          "http_address": "172.xxx.xxx.88:9288",
          "process": {
            "refresh_interval_in_millis": 1000,
            "id": 47701,
            "mlockall": false
          }
        }
      }
    }
    

    The id for machine 174 is Leivp0laTYSqvMVm49SulQ.
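
    Picking the id out of the JSON by hand is error-prone, so here is a small sketch of the lookup. It assumes the `_nodes/process` response has the shape shown above (a top-level `nodes` object keyed by node id, each entry with a `host` field); `node_id_for_host` is a hypothetical helper, and the sample is a trimmed copy of the article's output.

```python
import json


def node_id_for_host(nodes_json, host):
    """Return the node id whose 'host' field equals host, or None."""
    nodes = json.loads(nodes_json)["nodes"]
    for node_id, info in nodes.items():
        if info.get("host") == host:
            return node_id
    return None


# Trimmed copy of the _nodes/process response from the article.
nodes_sample = """
{
  "cluster_name": "es2.3.2-titan-cl",
  "nodes": {
    "Leivp0laTYSqvMVm49SulQ": {"name": "Calypso", "host": "172.xxx.xxx.174"},
    "EafIS3ByRrm4g-14KmY_wg": {"name": "Savage Steel", "host": "172.xxx.xxx.89"},
    "tojQ9EiXS0m6ZP16N7Ug3A": {"name": "Temugin", "host": "172.xxx.xxx.88"}
  }
}
"""

if __name__ == "__main__":
    print(node_id_for_host(nodes_sample, "172.xxx.xxx.174"))
```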

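    For simplicity you could also place the primary shard directly on the master machine. However, concentrating too many shards on one machine hurts performance and raises the risk of data loss if that machine goes down, so it is better to allocate based on how shards are currently distributed across the machines.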

    curl -X GET "http://172.xxx.xxx.174:9288/_cat/master?v="
    id                     host          ip            node         
    EafIS3ByRrm4g-14KmY_wg 172.xxx.xxx.89 172.xxx.xxx.89 Savage Steel
    

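    Allocate the UNASSIGNED shard to a machine

    Only a primary shard in the UNASSIGNED state can be allocated this way. Trying to allocate a shard that is not UNASSIGNED, for example shard 1, produces the following error.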

    curl -X POST -d '{
        "commands" : [ {
          "allocate" : {
              "index" : "index",
              "shard" : 1,
              "node" : "EafIS3ByRrm4g-14KmY_wg",
              "allow_primary" : true
          }
        }]
    }' "http://172.xxx.xxx.174:9288/_cluster/reroute"
    
    {
      "error": {
        "root_cause": [
          {
            "type": "remote_transport_exception",
            "reason": "[Savage Steel][172.xxx.xxx.89:9388][cluster:admin/reroute]"
          }
        ],
        "type": "illegal_argument_exception",
        "reason": "[allocate] failed to find [index][1] on the list of unassigned shards"
      },
      "status": 400
    }
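
    To avoid the error above, the request body can be built only after checking that the shard really is unassigned. The sketch below is one possible way to do that, not the article's method: `build_allocate_body` is a hypothetical helper, and the `allocate` command with `allow_primary` is simply the form used in this article against version 2.3.2 (other versions may expect different commands). It only builds the JSON; sending it is still done with `curl`.

```python
import json


def build_allocate_body(index, shard, node_id, unassigned):
    """Build a _cluster/reroute body for an UNASSIGNED shard.

    unassigned is a collection of (index, shard) pairs known to be
    UNASSIGNED, for example gathered from _cat/shards.
    """
    if (index, shard) not in unassigned:
        raise ValueError(
            "[{}][{}] is not on the list of unassigned shards".format(index, shard)
        )
    command = {
        "allocate": {
            "index": index,
            "shard": shard,
            "node": node_id,
            # Forces allocation of a primary; this may lose data.
            "allow_primary": True,
        }
    }
    return json.dumps({"commands": [command]})


if __name__ == "__main__":
    known_unassigned = {("index", 0)}
    print(build_allocate_body("index", 0, "Leivp0laTYSqvMVm49SulQ", known_unassigned))
```

    Calling it with shard 1 raises `ValueError` locally instead of getting a 400 back from the cluster.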
    

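    Allocate shard 0 of index to one of the machines. Be careful with the allow_primary parameter of _cluster/reroute: it forces allocation of a primary shard and may cause data loss. See the official documentation for this API for details.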

    curl -X POST -d '{
        "commands" : [ {
          "allocate" : {
              "index" : "index",
              "shard" : 0,
              "node" : "Leivp0laTYSqvMVm49SulQ",
              "allow_primary" : true
          }
        }]
    }' "http://172.xxx.xxx.174:9288/_cluster/reroute"
    
    {
      "acknowledged": true,
      .........
      "index": {
        "shards": {
          "0": [
            {
              "state": "INITIALIZING",
              "primary": true,
              "node": "Leivp0laTYSqvMVm49SulQ",
              "relocating_node": null,
              "shard": 0,
              "index": "index",
              "version": 1,
              "allocation_id": {
                "id": "wk5q0CryQpmworGFalfWQQ"
              },
              "unassigned_info": {
                "reason": "INDEX_CREATED",
                "at": "2017-03-23T12:27:33.405Z",
                "details": "force allocation from previous reason INDEX_REOPENED, null"
              }
            },
            {
              "state": "UNASSIGNED",
              "primary": false,
              "node": null,
              "relocating_node": null,
              "shard": 0,
              "index": "index",
              "version": 1,
              "unassigned_info": {
                "reason": "INDEX_REOPENED",
                "at": "2017-03-23T11:56:25.568Z"
              }
            }
          ]
          }
        }
        .............
    }
    

    Only the key part of the output is shown. The primary shard is now in the INITIALIZING state. Check the index status again:

    curl -X GET "http://172.xxx.xxx.174:9288/_cat/indices?v="
    
    green  open   index                          5   1    3058268        97588      2.6gb          1.3gb
    

    The index status is now green, and the index is back in normal use.
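
    Instead of re-running `_cat/indices` until the status changes, the cluster health API can wait for green on the server side through its `wait_for_status` and `timeout` parameters. The sketch below only builds the URL and interprets a response. The helper names are hypothetical, and the sample response is made up for illustration (real responses contain more fields).

```python
import json
from urllib.parse import urlencode


def health_url(base_url, index, timeout="30s"):
    """Build a _cluster/health URL that waits for the index to turn green."""
    query = urlencode({"wait_for_status": "green", "timeout": timeout})
    return "{}/_cluster/health/{}?{}".format(base_url.rstrip("/"), index, query)


def is_recovered(health_json):
    """True if the health response reports green and the wait did not time out."""
    health = json.loads(health_json)
    return health["status"] == "green" and not health["timed_out"]


# Made-up sample response, for illustration only.
health_sample = '{"cluster_name": "es2.3.2-titan-cl", "status": "green", "timed_out": false}'

if __name__ == "__main__":
    print(health_url("http://172.xxx.xxx.174:9288", "index"))
    print(is_recovered(health_sample))
```

    The URL can be passed to `curl -X GET` like the other commands in this article; if the wait times out, `timed_out` comes back `true` and the status is whatever the cluster reached.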

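    The steps above are based on the article "ELASTICSEARCH几个问题的解决" (solutions to several Elasticsearch problems).

    Option 2

    A cluster most likely turns red because one of its machines went down before part of the data had finished replicating. Bringing that machine back up and letting it finish syncing with the existing cluster restores the cluster.
    Alternatively, add an empty machine to the existing cluster. The shards rebalance automatically, and as long as no data has been lost, this also brings the cluster back to normal.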

    Reposting is welcome, but please include a link to this article. Thank you.
    2017.3.24 12:15

  • Original article: https://www.cnblogs.com/xiaoheike/p/6610848.html