  • [Original] Big Data Basics: Hadoop (3) hdfs diskbalancer

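    When the disks inside a single HDFS node hold uneven amounts of data (for example, after a new disk is added), you have to run diskbalancer manually. The help output of the plan command is shown below: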

    # hdfs diskbalancer -help plan
    usage: hdfs diskbalancer -plan <hostname> [options]
    Creates a plan that describes how much data should be moved between disks.
     
     
        --bandwidth <arg>             Maximum disk bandwidth (MB/s) in integer
                                      to be consumed by diskBalancer. e.g. 10
                                      MB/s.
        --maxerror <arg>              Describes how many errors can be
                                      tolerated while copying between a pair
                                      of disks.
        --out <arg>                   Local path of file to write output to,
                                      if not specified defaults will be used.
        --plan <arg>                  Hostname, IP address or UUID of datanode
                                      for which a plan is created.
        --thresholdPercentage <arg>   Percentage of data skew that is
                                      tolerated before disk balancer starts
                                      working. For example, if total data on a
                                      2 disk node is 100 GB then disk balancer
                                      calculates the expected value on each
                                      disk, which is 50 GB. If the tolerance
                                      is 10% then data on a single disk needs
                                      to be more than 60 GB (50 GB + 10%
                                      tolerance value) for Disk balancer to
                                      balance the disks.
        --v                           Print out the summary of the plan on
                                      console
    

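    The description of thresholdPercentage is ambiguous: it reads as if balancing were decided by the absolute amount of data on each disk. The code shows what actually happens: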

    org.apache.hadoop.hdfs.server.diskbalancer.datamodel.DiskBalancerVolumeSet

    /**
     * Computes Volume Data Density. Adding a new volume changes
     * the volumeDataDensity for all volumes. So we throw away
     * our priority queue and recompute everything.
     *
     * we discard failed volumes from this computation.
     *
     * totalCapacity = totalCapacity of this volumeSet
     * totalUsed = totalDfsUsed for this volumeSet
     * idealUsed = totalUsed / totalCapacity
     * dfsUsedRatio = dfsUsedOnAVolume / Capacity On that Volume
     * volumeDataDensity = idealUsed - dfsUsedRatio
     */
    public void computeVolumeDataDensity() {
      long totalCapacity = 0;
      long totalUsed = 0;
      sortedQueue.clear();
     
      // when we plan to re-distribute data we need to make
      // sure that we skip failed volumes.
      for (DiskBalancerVolume volume : volumes) {
        if (!volume.isFailed() && !volume.isSkip()) {
     
          if (volume.computeEffectiveCapacity() < 0) {
            skipMisConfiguredVolume(volume);
            continue;
          }
     
          totalCapacity += volume.computeEffectiveCapacity();
          totalUsed += volume.getUsed();
        }
      }
     
      if (totalCapacity != 0) {
        this.idealUsed = truncateDecimals(totalUsed /
            (double) totalCapacity);
      }
     
      for (DiskBalancerVolume volume : volumes) {
        if (!volume.isFailed() && !volume.isSkip()) {
          double dfsUsedRatio =
              truncateDecimals(volume.getUsed() /
                  (double) volume.computeEffectiveCapacity());
     
          volume.setVolumeDataDensity(this.idealUsed - dfsUsedRatio);
          sortedQueue.add(volume);
        }
      }
    }
     
     
    /**
     * Computes whether we need to do any balancing on this volume Set at all.
     * It checks if any disks are out of threshold value
     *
     * @param thresholdPercentage - threshold - in percentage
     *
     * @return true if balancing is needed false otherwise.
     */
    public boolean isBalancingNeeded(double thresholdPercentage) {
      double threshold = thresholdPercentage / 100.0d;
     
      if(volumes == null || volumes.size() <= 1) {
        // there is nothing we can do with a single volume.
        // so no planning needed.
        return false;
      }
     
      for (DiskBalancerVolume vol : volumes) {
        boolean notSkip = !vol.isFailed() && !vol.isTransient() && !vol.isSkip();
        Double absDensity =
            truncateDecimals(Math.abs(vol.getVolumeDataDensity()));
     
        if ((absDensity > threshold) && notSkip) {
          return true;
        }
      }
      return false;
    }
    

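    There are two main functions:

    computeVolumeDataDensity: computes the data density of each disk, defined as the space usage ratio across all disks (idealUsed) minus the space usage ratio of the current disk (dfsUsedRatio), so an over-filled disk gets a negative density
    isBalancingNeeded: decides whether balancing is needed, i.e. whether the absolute value of any disk's data density exceeds the configured threshold (thresholdPercentage / 100)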
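
    The two functions can be condensed into a standalone sketch. This is not the Hadoop class: DensitySketch and its array-based signatures are made up for illustration, the failed/skipped/misconfigured volume checks and truncateDecimals are left out, and every capacity is assumed to be positive.

```java
import java.util.Arrays;

// Simplified illustration of the density logic; NOT the Hadoop implementation.
// Assumption: used[i] and capacity[i] describe volume i, and capacity[i] > 0.
class DensitySketch {

  // idealUsed = total used / total capacity over all volumes.
  static double idealUsed(long[] used, long[] capacity) {
    long totalUsed = 0;
    long totalCapacity = 0;
    for (int i = 0; i < used.length; i++) {
      totalUsed += used[i];
      totalCapacity += capacity[i];
    }
    return totalCapacity == 0 ? 0.0 : totalUsed / (double) totalCapacity;
  }

  // volumeDataDensity = idealUsed - dfsUsedRatio, one value per volume.
  static double[] densities(long[] used, long[] capacity) {
    double ideal = idealUsed(used, capacity);
    double[] result = new double[used.length];
    for (int i = 0; i < used.length; i++) {
      double dfsUsedRatio = used[i] / (double) capacity[i];
      result[i] = ideal - dfsUsedRatio;
    }
    return result;
  }

  // True if any volume's |density| exceeds thresholdPercentage / 100.
  static boolean isBalancingNeeded(long[] used, long[] capacity,
                                   double thresholdPercentage) {
    if (used.length <= 1) {
      return false; // nothing to balance with a single volume
    }
    double threshold = thresholdPercentage / 100.0d;
    for (double density : densities(used, capacity)) {
      if (Math.abs(density) > threshold) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long[] usedGb = {500, 500};
    long[] capacityGb = {1000, 4000};
    System.out.println(idealUsed(usedGb, capacityGb));
    System.out.println(Arrays.toString(densities(usedGb, capacityGb)));
    System.out.println(isBalancingNeeded(usedGb, capacityGb, 10.0));
  }
}
```

    With 500 GB on a 1000 GB disk and 500 GB on a 4000 GB disk, idealUsed is 0.2 and the densities are roughly -0.3 and 0.075, so a 10% threshold flags the node even though both disks hold the same number of bytes.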

    So balancing is actually driven by the space usage ratio of each disk, not by the absolute amount of space used.
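
    To make the difference concrete, the sketch below compares the two readings on the help text's own scenario (100 GB spread over two disks). absoluteReading is only one interpretation of the help wording (even share plus 10% of the total, which gives the 60 GB figure), and ratioReading is a simplified restatement of the code above. Both names, and the class ThresholdReadings, are hypothetical.

```java
// Contrast between the help text's wording and the ratio-based check.
// Hypothetical helper class for illustration; not part of Hadoop.
class ThresholdReadings {

  // Reading suggested by the help text: a disk is out of balance when it
  // holds more than (even share + thresholdPercentage of the total data).
  static boolean absoluteReading(long[] usedGb, double thresholdPercentage) {
    long total = 0;
    for (long u : usedGb) {
      total += u;
    }
    double expected = total / (double) usedGb.length;
    double limit = expected + total * thresholdPercentage / 100.0;
    for (long u : usedGb) {
      if (u > limit) {
        return true;
      }
    }
    return false;
  }

  // Reading implemented by the code: compare |idealUsed - dfsUsedRatio|
  // with thresholdPercentage / 100. Assumes every capacity is positive.
  static boolean ratioReading(long[] usedGb, long[] capacityGb,
                              double thresholdPercentage) {
    long totalUsed = 0;
    long totalCapacity = 0;
    for (int i = 0; i < usedGb.length; i++) {
      totalUsed += usedGb[i];
      totalCapacity += capacityGb[i];
    }
    double idealUsed = totalUsed / (double) totalCapacity;
    double threshold = thresholdPercentage / 100.0;
    for (int i = 0; i < usedGb.length; i++) {
      double dfsUsedRatio = usedGb[i] / (double) capacityGb[i];
      if (Math.abs(idealUsed - dfsUsedRatio) > threshold) {
        return true;
      }
    }
    return false;
  }

  public static void main(String[] args) {
    long[] usedGb = {70, 30}; // 100 GB in total, as in the help text
    System.out.println(absoluteReading(usedGb, 10.0));
    System.out.println(ratioReading(usedGb, new long[] {100, 100}, 10.0));
    System.out.println(ratioReading(usedGb, new long[] {1000, 1000}, 10.0));
  }
}
```

    With 70 GB and 30 GB on the two disks, the absolute reading reports an imbalance (70 GB is above 60 GB). The ratio check agrees when each disk is 100 GB, but not when each disk is 1000 GB, because the usage ratios 0.07 and 0.03 are both within 0.1 of idealUsed (0.05).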


  • Original article: https://www.cnblogs.com/barneywill/p/15226155.html