  • Resolving a permanent RIT during HBase Region merges

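    What should you do when a Region gets stuck permanently in RIT (Region-In-Transition) during a merge? I ran into exactly this in production: while merging Regions in bulk, a Region stayed in the MERGING_NEW state indefinitely. This does not affect the cluster's ability to serve existing traffic, but if any node restarts, the Regions on that RegionServer may not be rebalanced. The reason is that HBase does not run Region load balancing while a Region is in RIT, and even running the balancer command manually has no effect.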

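    If the RIT is left unresolved and further HBase nodes restart one after another, Regions across the cluster become severely unbalanced. That is critical and will badly hurt cluster performance. Searching the HBase JIRA showed that this permanent MERGING_NEW RIT is caused by the bug tracked as HBASE-17682, and the corresponding patch has to be applied to fix it. In essence, the HBase source code does not check for the MERGING_NEW state in its branching logic, so such a Region falls straight through to the else branch. The source code is as follows: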

    for (RegionState state : regionsInTransition.values()) {
      HRegionInfo hri = state.getRegion();
      if (assignedRegions.contains(hri)) {
        // Region is open on this region server, but in transition.
        // This region must be moving away from this server, or splitting/merging.
        // SSH will handle it, either skip assigning, or re-assign.
        LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
      } else if (sn.equals(state.getServerName())) {
        // Region is in transition on this region server, and this
        // region is not open on this server. So the region must be
        // moving to this server from another one (i.e. opening or
        // pending open on this server, was open on another one.
        // Offline state is also kind of pending open if the region is in
        // transition. The region could be in failed_close state too if we have
        // tried several times to open it while this region server is not reachable)
        if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
          LOG.info("Found region in " + state +
            " to be reassigned by ServerCrashProcedure for " + sn);
          rits.add(hri);
        } else if (state.isSplittingNew()) {
          regionsToCleanIfNoMetaEntry.add(state.getRegion());
        } else {
          LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
        }
      }
    }

    The code after the fix:

    for (RegionState state : regionsInTransition.values()) {
      HRegionInfo hri = state.getRegion();
      if (assignedRegions.contains(hri)) {
        // Region is open on this region server, but in transition.
        // This region must be moving away from this server, or splitting/merging.
        // SSH will handle it, either skip assigning, or re-assign.
        LOG.info("Transitioning " + state + " will be handled by ServerCrashProcedure for " + sn);
      } else if (sn.equals(state.getServerName())) {
        // Region is in transition on this region server, and this
        // region is not open on this server. So the region must be
        // moving to this server from another one (i.e. opening or
        // pending open on this server, was open on another one.
        // Offline state is also kind of pending open if the region is in
        // transition. The region could be in failed_close state too if we have
        // tried several times to open it while this region server is not reachable)
        if (state.isPendingOpenOrOpening() || state.isFailedClose() || state.isOffline()) {
          LOG.info("Found region in " + state +
            " to be reassigned by ServerCrashProcedure for " + sn);
          rits.add(hri);
        } else if (state.isSplittingNew()) {
          regionsToCleanIfNoMetaEntry.add(state.getRegion());
        } else if (isOneOfStates(state, State.SPLITTING_NEW, State.MERGING_NEW)) {
          // New branch: a region left in MERGING_NEW is now cleaned up as well
          // instead of falling through to the warning below.
          regionsToCleanIfNoMetaEntry.add(state.getRegion());
        } else {
          LOG.warn("THIS SHOULD NOT HAPPEN: unexpected " + state);
        }
      }
    }
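
    To make the difference between the two versions easier to see in isolation, the branch can be reduced to a small self-contained sketch. This is not HBase code: the class `RitBranchSketch`, its `State` and `Action` enums, and the methods `beforePatch` and `afterPatch` are hypothetical names introduced only for illustration, and only the states relevant to this branch are modelled.

```java
import java.util.EnumSet;

// Illustrative model only; none of these types come from HBase.
class RitBranchSketch {
    // Simplified stand-in for the region states checked in the branch above.
    enum State { OFFLINE, PENDING_OPEN, OPENING, FAILED_CLOSE, SPLITTING_NEW, MERGING_NEW }

    // What the branch decides to do with a region in transition.
    enum Action { REASSIGN, CLEAN_IF_NO_META_ENTRY, UNEXPECTED }

    private static final EnumSet<State> REASSIGNABLE =
        EnumSet.of(State.PENDING_OPEN, State.OPENING, State.FAILED_CLOSE, State.OFFLINE);

    // Mirrors the original branch: only SPLITTING_NEW is cleaned up.
    static Action beforePatch(State s) {
        if (REASSIGNABLE.contains(s)) {
            return Action.REASSIGN;
        } else if (s == State.SPLITTING_NEW) {
            return Action.CLEAN_IF_NO_META_ENTRY;
        }
        return Action.UNEXPECTED; // MERGING_NEW ends up here
    }

    // Mirrors the fixed branch: MERGING_NEW is cleaned up too.
    static Action afterPatch(State s) {
        if (REASSIGNABLE.contains(s)) {
            return Action.REASSIGN;
        } else if (s == State.SPLITTING_NEW || s == State.MERGING_NEW) {
            return Action.CLEAN_IF_NO_META_ENTRY;
        }
        return Action.UNEXPECTED;
    }

    public static void main(String[] args) {
        System.out.println("before: " + beforePatch(State.MERGING_NEW));
        System.out.println("after:  " + afterPatch(State.MERGING_NEW));
    }
}
```

    In this model, `beforePatch(State.MERGING_NEW)` yields `UNEXPECTED`, which corresponds to the "THIS SHOULD NOT HAPPEN" warning, while `afterPatch` routes the same state to the clean-up path.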

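    There is a catch, though. The JIRA issue only says that the bug needs to be fixed by applying the patch. In a real production environment, you cannot stop the cluster for a long time and disrupt application reads and writes just to deal with this RIT. So is there a temporary workaround that clears the current permanent MERGING_NEW RIT first, leaving the HBase version upgrade for later?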

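    There is. Analysing the merge flow shows that when HBase merges Regions, it first creates the new Region in an initial MERGING_NEW state. The overall Region merge flow is as follows:

    [Flowchart: Region merge flow]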

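    As the flowchart shows, MERGING_NEW is an initial state that exists only in the active Master's memory; the backup Master holds no MERGING_NEW state for the new Region. Switching over from the active Master to the backup Master therefore clears this permanent RIT for the time being. Because HBase is a highly available cluster, the switchover is transparent to user applications. A Master failover is thus a workable stopgap for a permanent MERGING_NEW RIT. Afterwards, fix the bug properly by applying the patch and upgrading HBase.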
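
    The reasoning behind the failover workaround can be illustrated with a toy model. It assumes, as described above, that the MERGING_NEW entry lives only in the active Master's memory and that a newly active Master rebuilds its view from persisted Region metadata. Everything in the sketch (`MasterFailoverSketch`, `Master`, `becomeActive`, the Region names, the string states) is hypothetical and greatly simplified; it is not how HBase is implemented.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only; none of these types come from HBase.
class MasterFailoverSketch {
    static class Master {
        // Region name -> state, held purely in this master's memory.
        final Map<String, String> regionStates = new HashMap<>();

        // A newly active master starts from the persisted entries only.
        static Master becomeActive(Map<String, String> persistedMeta) {
            Master m = new Master();
            m.regionStates.putAll(persistedMeta);
            return m;
        }

        // Count regions stuck in transition (only MERGING_NEW is modelled here).
        long regionsInTransition() {
            return regionStates.values().stream()
                .filter("MERGING_NEW"::equals)
                .count();
        }
    }

    public static void main(String[] args) {
        // Persisted metadata knows only the two parent regions.
        Map<String, String> meta = new HashMap<>();
        meta.put("regionA", "OPEN");
        meta.put("regionB", "OPEN");

        Master active = Master.becomeActive(meta);
        // The merged region exists in memory only; it was never persisted.
        active.regionStates.put("regionAB", "MERGING_NEW");

        // Simulated failover: the backup master takes over from persisted state.
        Master newActive = Master.becomeActive(meta);

        System.out.println("RIT on old active master: " + active.regionsInTransition());
        System.out.println("RIT after failover:       " + newActive.regionsInTransition());
    }
}
```

    In this model the stuck entry disappears after the switchover simply because the new active Master never learns about it. The underlying bug is still present until the patch is applied.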

  • Original article: https://www.cnblogs.com/niutao/p/10627600.html