  • An Analysis of the Mechanism Behind Fsck, the HDFS Block-Check Command

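    Preface


    In HDFS, every file is stored as a set of blocks. With a very large volume of file data, some blocks will inevitably become corrupt, so how do you find them? The answer is the HDFS fsck command. It is separate from the dfsadmin commands, so some people may not know it exists. This article digs into what the command is good for and how it is implemented internally.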

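    The Fsck Command


    Anyone familiar with Linux has probably heard of or used fsck. The name stands for file system check, and on Linux it is best thought of as a check-and-repair command. This article does not cover the operating-system fsck in any depth; it covers fsck under HDFS, where bin/hdfs fsck accepts quite a few optional arguments.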

    Fsck Option Usage


    Running hdfs fsck with no arguments on my test cluster prints the help text, which gives the most complete description of the options:

    $ hdfs fsck
    Usage: hdfs fsck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]]
        <path>  start checking from this path
        -move   move corrupted files to /lost+found
        -delete delete corrupted files
        -files  print out files being checked
        -openforwrite   print out files opened for write
        -includeSnapshots   include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
        -list-corruptfileblocks print out list of missing blocks and files they belong to
        -blocks print out block report
        -locations  print out locations for every block
        -racks  print out network topology for data-node locations
        -storagepolicies    print out storage policy summary for the blocks
    
        -blockId    print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)

    To summarize briefly, first comes the command name with its required argument:

    bin/hdfs fsck <path>

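    followed by a set of optional arguments:

    • -move: move corrupted files to the /lost+found directory
    • -delete: delete corrupted files
    • -files: print the files being checked
    • -openforwrite: print files that are currently open for write
    • -includeSnapshots: include data under snapshot directories in the check
    • -list-corruptfileblocks: print the missing blocks and the files they belong to
    • -blocks: print the detailed block report
    • -locations: print the location of every block
    • -racks: print the network topology of the block locations
    • -storagepolicies: print a storage policy summary for the blocks
    • -blockId: print the file a given blockId belongs to, plus its locations and status

    How each option maps onto the code is covered in detail in the analysis below.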

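    The Fsck Call Path


    The fsck call path is the whole journey from typing the command in a terminal to fsck actually being executed inside HDFS. Not many classes are involved along the way, and I drew a simple diagram of it:
    [Figure: fsck call flow, DFSck → FsckServlet → NamenodeFsck]
    The diagram shows what is essentially a three-layer call structure, with DFSck as the outermost, exposed class. Let's walk through the steps in between.

    • Typing fsck invokes DFSck directly. DFSck builds a request URL from the arguments and sends it as an HTTP request to the next handler.
    • The next handler is FsckServlet. It acts as a go-between and immediately calls the class that does the real work, NamenodeFsck.
    • NamenodeFsck extracts the request parameters and then performs the actual fsck check inside HDFS.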

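    How Fsck Works


    This section traces the fsck call path in finer detail. Following the three layers described in the previous section, the analysis also proceeds in three stages, one layer at a time.

    DFSck: Building the Request


    You can think of this class as a counterpart to DFSAdmin. Start with the entry method that handles the command-line input: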

    public int run(final String[] args) throws IOException {
        if (args.length == 0) {
          printUsage(System.err);
          return -1;
        }
    
        try {
          return UserGroupInformation.getCurrentUser().doAs(
              new PrivilegedExceptionAction<Integer>() {
                @Override
                public Integer run() throws Exception {
                  return doWork(args);
                }
              });
        } catch (InterruptedException e) {
          throw new IOException(e);
        }
      }

    Inside doWork, the arguments are classified right away and the matching request parameters are built:

    private int doWork(final String[] args) throws IOException {
        final StringBuilder url = new StringBuilder();
    
        url.append("/fsck?ugi=").append(ugi.getShortUserName());
        String dir = null;
        boolean doListCorruptFileBlocks = false;
        for (int idx = 0; idx < args.length; idx++) {
          if (args[idx].equals("-move")) { url.append("&move=1"); }
          else if (args[idx].equals("-delete")) { url.append("&delete=1"); }
          else if (args[idx].equals("-files")) { url.append("&files=1"); }
          else if (args[idx].equals("-openforwrite")) { url.append("&openforwrite=1"); }
          else if (args[idx].equals("-blocks")) { url.append("&blocks=1"); }
          else if (args[idx].equals("-locations")) { url.append("&locations=1"); }
          else if (args[idx].equals("-racks")) { url.append("&racks=1"); }
          else if (args[idx].equals("-storagepolicies")) { url.append("&storagepolicies=1"); }
          ...

    Not every option takes the same kind of value. For example, -blockId is followed by a run of block IDs:

    ...
    } else if (args[idx].equals("-blockId")) {
            StringBuilder sb = new StringBuilder();
            idx++;
            while(idx < args.length && !args[idx].startsWith("-")){
              sb.append(args[idx]);
              sb.append(" ");
              idx++;
            }
            url.append("&blockId=").append(URLEncoder.encode(sb.toString(), "UTF-8"));
    ...        
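
    The flag-to-parameter mapping above can be tried out in isolation. The following is a minimal sketch, not the Hadoop code: the class FsckUrlSketch, the method buildQuery, and the lookup table are hypothetical names introduced for illustration. It advances the index explicitly, so an option that follows the block IDs is still processed, and it simply ignores anything it does not recognize (such as the path).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Hadoop: maps fsck flags to query parameters. */
class FsckUrlSketch {
    // Simple on/off flags and the query parameter each one switches on.
    private static final Map<String, String> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("-move", "move");
        FLAGS.put("-delete", "delete");
        FLAGS.put("-files", "files");
        FLAGS.put("-openforwrite", "openforwrite");
        FLAGS.put("-blocks", "blocks");
        FLAGS.put("-locations", "locations");
        FLAGS.put("-racks", "racks");
        FLAGS.put("-storagepolicies", "storagepolicies");
    }

    static String buildQuery(String user, String[] args) {
        StringBuilder url = new StringBuilder("/fsck?ugi=").append(user);
        int idx = 0;
        while (idx < args.length) {
            String arg = args[idx];
            if (FLAGS.containsKey(arg)) {
                url.append('&').append(FLAGS.get(arg)).append("=1");
                idx++;
            } else if (arg.equals("-blockId")) {
                // Collect every following token up to the next option.
                StringBuilder ids = new StringBuilder();
                idx++;
                while (idx < args.length && !args[idx].startsWith("-")) {
                    ids.append(args[idx]).append(' ');
                    idx++;
                }
                url.append("&blockId=").append(encode(ids.toString()));
            } else {
                idx++; // anything else (such as the path) is ignored in this sketch
            }
        }
        return url.toString();
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("hdfs",
            new String[] {"/", "-files", "-blockId", "blk_1", "blk_2"}));
    }
}
```

    Because the block IDs are joined with spaces before encoding, they travel as a single blockId parameter whose separators are URL-encoded.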

    Once the request URL has been built, the request is sent:

    URL path = new URL(url.toString());
        URLConnection connection;
        try {
          connection = connectionFactory.openConnection(path, isSpnegoEnabled);
        } catch (AuthenticationException e) {
          throw new IOException(e);
        }

    The response is then read and printed straight to the terminal:

    InputStream stream = connection.getInputStream();
        BufferedReader input = new BufferedReader(new InputStreamReader(stream, "UTF-8"));
        String line = null;
        String lastLine = null;
        int errCode = -1;
        try {
          while ((line = input.readLine()) != null) {
            out.println(line);
            lastLine = line;
          }
        } finally {
          input.close();
        }
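
    The read loop above is easy to exercise without a cluster. Here is a minimal sketch, assuming a fake response held in memory; FsckEchoSketch and echo are hypothetical names, and try-with-resources stands in for the explicit finally block. How DFSck goes on to use lastLine is not shown in the excerpt, so the sketch only returns it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for the response handling; not the Hadoop class. */
class FsckEchoSketch {
    /** Copies every response line to out and returns the last line read, or null if there was none. */
    static String echo(InputStream stream, PrintStream out) {
        String lastLine = null;
        try (BufferedReader input = new BufferedReader(
                new InputStreamReader(stream, StandardCharsets.UTF_8))) {
            String line;
            while ((line = input.readLine()) != null) {
                out.println(line);
                lastLine = line;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lastLine;
    }

    public static void main(String[] args) {
        // A made-up two-line response used only for the demo.
        byte[] fake = "Total files:\t20\nThe filesystem under path '/' is CORRUPT\n"
            .getBytes(StandardCharsets.UTF_8);
        String last = echo(new ByteArrayInputStream(fake), System.out);
        System.out.println("last line seen: " + last);
    }
}
```

    Keeping the last line around is convenient because the final line of an fsck report states the overall result for the path.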

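    That completes the outermost layer of the call, DFSck.

    FsckServlet: Handling the Request

    The URL request from the previous step is routed to FsckServlet, which plays the role of a proxy and immediately hands the work to NamenodeFsck: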

    /** Handle fsck request */
      @Override
      public void doGet(HttpServletRequest request, HttpServletResponse response
          ) throws IOException {
        @SuppressWarnings("unchecked")
        final Map<String,String[]> pmap = request.getParameterMap();
        ...
    
        final UserGroupInformation ugi = getUGI(request, conf);
        try {
          ugi.doAs(new PrivilegedExceptionAction<Object>() {
            @Override
            public Object run() throws Exception {
              NameNode nn = NameNodeHttpServer.getNameNodeFromContext(context);
    
              final FSNamesystem namesystem = nn.getNamesystem();
              final BlockManager bm = namesystem.getBlockManager();
              final int totalDatanodes = 
                  namesystem.getNumberOfDatanodes(DatanodeReportType.LIVE); 
              new NamenodeFsck(conf, nn,
                  bm.getDatanodeManager().getNetworkTopology(), pmap, out,
                  totalDatanodes, remoteAddress).fsck();
    
              return null;
            }
          });
        } catch (InterruptedException e) {
          response.sendError(400, e.getMessage());
        }
      }

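    NamenodeFsck: The fsck Method

    The last step of the previous section ends up calling the fsck method of NamenodeFsck. Before stepping into that method, look at some of the key fields of the class: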

      private String lostFound = null;
      private boolean lfInited = false;
      private boolean lfInitedOk = false;
      private boolean showFiles = false;
      private boolean showOpenFiles = false;
      private boolean showBlocks = false;
      private boolean showLocations = false;
      private boolean showRacks = false;
      private boolean showStoragePolcies = false;
      private boolean showCorruptFileBlocks = false;

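    These boolean fields correspond to the options shown in the fsck help text. The processing order inside the fsck method looks a little disorganized to me, so to keep things easy to follow I analyze it option by option rather than line by line.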

    -list-corruptfileblocks


    The first option is -list-corruptfileblocks, which lists missing or corrupt blocks.

            if (showCorruptFileBlocks) {
              listCorruptFileBlocks();
              return;
            }

    This calls the method of the same name, listCorruptFileBlocks:

      private void listCorruptFileBlocks() throws IOException {
        Collection<FSNamesystem.CorruptFileBlockInfo> corruptFiles = namenode.
          getNamesystem().listCorruptFileBlocks(path, currentCookie);
        int numCorruptFiles = corruptFiles.size();
        ...
        out.println("Cookie:	" + currentCookie[0]);
        for (FSNamesystem.CorruptFileBlockInfo c : corruptFiles) {
          out.println(c.toString());
        }
        out.println("\n\nThe filesystem under path '" + path + "' has " + filler
            + " CORRUPT files");
        out.println();
      }

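    This method ends up calling listCorruptFileBlocks on FSNamesystem. Note the special argument currentCookie that is passed in; it is used in a rather clever way. Inside the FSNamesystem method, the list of corrupt file blocks is initialized first: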

    ArrayList<CorruptFileBlockInfo> corruptFiles = new ArrayList<CorruptFileBlockInfo>();

    This is also the object the method returns.
    Next comes the key logic that decides whether there are any corrupt files:

        // Do a quick check if there are any corrupt files without taking the lock
        if (blockManager.getMissingBlocksCount() == 0) {
          if (cookieTab[0] == null) {
            cookieTab[0] = String.valueOf(getIntCookie(cookieTab[0]));
          }
          if (LOG.isDebugEnabled()) {
            LOG.debug("there are no corrupt file blocks.");
          }
          return corruptFiles;
        }

    getMissingBlocksCount on blockManager simply returns the size of the corrupt-block queue:

      public long getMissingBlocksCount() {
        // not locking
        return this.neededReplications.getCorruptBlockSize();
      }

    If the returned count is greater than 0, the method carries on:

          // Get an iterator over the corrupt blocks
          final Iterator<Block> blkIterator = blockManager.getCorruptReplicaBlockIterator();
          // Use the cookie as a marker: skip the entries before that index, since they were already visited
          int skip = getIntCookie(cookieTab[0]);
          for (int i = 0; i < skip && blkIterator.hasNext(); i++) {
            blkIterator.next();
          }
    
          while (blkIterator.hasNext()) {
            Block blk = blkIterator.next();
            final INode inode = (INode)blockManager.getBlockCollection(blk);
            // Advance the skip count
            skip++;
            if (inode != null && blockManager.countNodes(blk).liveReplicas() == 0) {
              String src = FSDirectory.getFullPathName(inode);
              if (src.startsWith(path)){
                corruptFiles.add(new CorruptFileBlockInfo(src, blk));
                count++;
                if (count >= DEFAULT_MAX_CORRUPT_FILEBLOCKS_RETURNED)
                  break;
              }
            }
          }
          // Store the updated cookie value
          cookieTab[0] = String.valueOf(skip);

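    The cookie does exactly what the comments above describe. Once this list of corrupt files has been returned, the calling method prints the result: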

    for (FSNamesystem.CorruptFileBlockInfo c : corruptFiles) 
        {
          out.println(c.toString());
        }
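
    The cookie is in effect a resume position for paging through a long list. A minimal sketch of the same idea over plain strings follows; CookiePagingSketch, nextPage, and the page size of 2 are hypothetical and chosen only to keep the demo short. It is a simplified model, not the FSNamesystem code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

/** Hypothetical model of cookie-based paging; not the FSNamesystem implementation. */
class CookiePagingSketch {
    static final int MAX_RETURNED = 2; // hypothetical page size, kept tiny for the demo

    /** Returns up to MAX_RETURNED matching paths, resuming from the position stored in cookieTab[0]. */
    static List<String> nextPage(List<String> corruptPaths, String prefix, String[] cookieTab) {
        List<String> page = new ArrayList<>();
        Iterator<String> it = corruptPaths.iterator();
        int skip = cookieTab[0] == null ? 0 : Integer.parseInt(cookieTab[0]);
        // Fast-forward past the entries a previous call already visited.
        for (int i = 0; i < skip && it.hasNext(); i++) {
            it.next();
        }
        while (it.hasNext()) {
            String src = it.next();
            skip++; // count every visited entry, whether or not it matches
            if (src.startsWith(prefix)) {
                page.add(src);
                if (page.size() >= MAX_RETURNED) {
                    break;
                }
            }
        }
        cookieTab[0] = String.valueOf(skip); // hand the new position back to the caller
        return page;
    }

    public static void main(String[] args) {
        List<String> all = Arrays.asList("/a/f1", "/b/f2", "/a/f3", "/a/f4");
        String[] cookie = {null};
        System.out.println(nextPage(all, "/a", cookie) + " cookie=" + cookie[0]);
        System.out.println(nextPage(all, "/a", cookie) + " cookie=" + cookie[0]);
    }
}
```

    The position counts visited entries rather than returned ones, which is why non-matching paths still advance it and a second call never revisits them.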

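    fsck <path>: The Default Processing Path


    The default processing path is simply fsck plus a path. I cover it next because checking a path also scans for corrupt blocks, yet its logic differs from that of -list-corruptfileblocks, which surprised me when I read the code. First, the path you supply is passed to the internal check method: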

            Result res = new Result(conf);
    
            check(path, file, res);
    
            out.println(res);
            out.println(" Number of data-nodes:		" + totalDatanodes);
            out.println(" Number of racks:		" + networktopology.getNumOfRacks());

    It then tests whether the path is a directory or a file. For a directory, it recurses:

        if (file.isDir()) {
          // If this path is a snapshottable directory, recurse into its snapshot directory
          if (snapshottableDirs != null && snapshottableDirs.contains(path)) {
            String snapshotPath = (path.endsWith(Path.SEPARATOR) ? path : path
                + Path.SEPARATOR)
                + HdfsConstants.DOT_SNAPSHOT_DIR;
            HdfsFileStatus snapshotFileInfo = namenode.getRpcServer().getFileInfo(
                snapshotPath);
            check(snapshotPath, snapshotFileInfo, res);
          }
          ...
          do {
            assert lastReturnedName != null;
            thisListing = namenode.getRpcServer().getListing(
                path, lastReturnedName, false);
            if (thisListing == null) {
              return;
            }
            HdfsFileStatus[] files = thisListing.getPartialListing();
            // Recurse into the children of this path, since it is a directory
            for (int i = 0; i < files.length; i++) {
              check(path, files[i], res);
            }
            lastReturnedName = thisListing.getLastName();
          } while (thisListing.hasMore());
          return;
        }
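
    The shape of this traversal, a recursive walk in which each directory is listed one page at a time, can be sketched over an in-memory tree. Everything here is hypothetical: RecursiveCheckSketch, listPage, the page size of 2, and the convention that a path missing from the map is a file. It is a simplified model, not the NamenodeFsck code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model of a paged, recursive directory walk. */
class RecursiveCheckSketch {
    static final int PAGE = 2; // hypothetical listing page size

    // Directory path -> child names in sorted order; a path that is not a key is treated as a file.
    final Map<String, List<String>> dirs = new TreeMap<>();
    int totalDirs = 0;
    int totalFiles = 0;

    /** Returns at most PAGE children of dir whose names sort after startAfter. */
    List<String> listPage(String dir, String startAfter) {
        List<String> page = new ArrayList<>();
        for (String name : dirs.get(dir)) {
            if (name.compareTo(startAfter) > 0 && page.size() < PAGE) {
                page.add(name);
            }
        }
        return page;
    }

    void check(String path) {
        if (!dirs.containsKey(path)) {
            totalFiles++; // a file: this is where the per-block checks would run
            return;
        }
        totalDirs++;
        String last = ""; // the empty string sorts before every name
        List<String> page;
        do {
            page = listPage(path, last);
            for (String name : page) {
                check(path.equals("/") ? "/" + name : path + "/" + name);
            }
            if (!page.isEmpty()) {
                last = page.get(page.size() - 1); // resume after the last name returned
            }
        } while (page.size() == PAGE); // a full page means there may be more
    }

    public static void main(String[] args) {
        RecursiveCheckSketch s = new RecursiveCheckSketch();
        s.dirs.put("/", Arrays.asList("a", "b", "c"));
        s.dirs.put("/a", Arrays.asList("x"));
        s.check("/");
        System.out.println("dirs=" + s.totalDirs + " files=" + s.totalFiles);
    }
}
```

    Listing in pages keeps any single request small even for a directory with a very large number of children.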

    When a file is examined next, the corresponding statistics are updated:

    isOpen = blocks.isUnderConstruction();
        if (isOpen && !showOpenFiles) {
          // We collect these stats about open files to report with default options
          res.totalOpenFilesSize += fileLen;
          res.totalOpenFilesBlocks += blocks.locatedBlockCount();
          res.totalOpenFiles++;
          return;
        }
        res.totalFiles++;
        res.totalSize += fileLen;
        res.totalBlocks += blocks.locatedBlockCount();

    Below is the key logic that identifies corrupt blocks among the blocks under the path:

    ...
    for (LocatedBlock lBlk : blocks.getLocatedBlocks()) {
          ExtendedBlock block = lBlk.getBlock();
          boolean isCorrupt = lBlk.isCorrupt();
          String blkName = block.toString();
    ...

    This relies directly on the isCorrupt method of LocatedBlock, and then increments the corrupt counters:

          // Check if block is Corrupt
          if (isCorrupt) {
            corrupt++;
            res.corruptBlocks++;
            out.print("\n" + path + ": CORRUPT blockpool " + block.getBlockPoolId() +
                " block " + block.getBlockName()+"\n");
          }

    Note that here the check for missing blocks is independent of the check for corrupt blocks:

    // Recount the replicas of this block
    NumberReplicas numberReplicas =
              namenode.getNamesystem().getBlockManager().countNodes(block.getLocalBlock());
          // Number of live replicas
          int liveReplicas = numberReplicas.liveReplicas();
          // If there are no live replicas at all, the block is missing
          if (liveReplicas == 0) {
            report.append(" MISSING!");
            res.addMissing(block.toString(), block.getNumBytes());
            missing++;
            missize += block.getNumBytes();
          } else {

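    Looking back at the two checks in the check method: the second one, for missing blocks, seems fine to me. The first one, for corrupt blocks, may be questionable. LocatedBlock does provide an isCorrupt method, but when I looked up where the corrupt flag gets set, I found that in the vast majority of cases false is passed in by default. In terms of freshness and validity of the data, I also think this approach is neither as fast nor as accurate as -list-corruptfileblocks, because that option reads directly from FSNamesystem and therefore already reflects the latest corrupt-block state. (This is my personal view; others may read the code differently.)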
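
    To make the independence of the two tests concrete, here is a minimal sketch that tallies them separately. BlockTallySketch and tally are hypothetical names, and the inputs are made-up values rather than real block state; the point is only that a block can count as missing without being flagged corrupt.

```java
/** Hypothetical tally showing that the corrupt and missing tests do not depend on each other. */
class BlockTallySketch {
    int corrupt = 0;
    int missing = 0;

    /** Applies the two independent tests to one block. */
    void tally(boolean corruptFlag, int liveReplicas) {
        if (corruptFlag) {
            corrupt++; // driven only by the flag carried with the block
        }
        if (liveReplicas == 0) {
            missing++; // driven only by the freshly counted live replicas
        }
    }

    public static void main(String[] args) {
        BlockTallySketch t = new BlockTallySketch();
        t.tally(false, 3); // healthy block
        t.tally(false, 0); // missing, but not flagged corrupt
        t.tally(true, 0);  // flagged corrupt and missing
        System.out.println("corrupt=" + t.corrupt + " missing=" + t.missing);
    }
}
```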

    fsck -delete/-move


    These two options decide what to do once corrupt blocks have been found, controlled by the following two lines:

    ...
    } else {
            if (doMove) copyBlocksToLostFound(parent, file, blocks);
            if (doDelete) deleteCorruptedFile(path);
          }
    ...

    LostFound refers to the /lost+found directory: -move moves files with corrupt blocks into that directory, while -delete calls a method that deletes them outright:

      private void deleteCorruptedFile(String path) {
        try {
          namenode.getRpcServer().delete(path, true);
          LOG.info("Fsck: deleted corrupt file " + path);
        } catch (Exception e) {
          LOG.error("Fsck: error deleting corrupted file " + path, e);
          internalError = true;
        }
      }

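    Both options are quite useful. If a cluster holds a large number of corrupt blocks and they are not cleaned up promptly, many client read and write operations will fail: the metadata still exists but the actual data is damaged, so those operations are bound to throw exceptions.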

    Auxiliary fsck Display Options


    Those are the main fsck options. Below are some secondary, auxiliary ones.

    • -locations/-racks

      if (showLocations || showRacks) {
            StringBuilder sb = new StringBuilder("[");
            for (int j = 0; j < locs.length; j++) {
              if (j > 0) { sb.append(", "); }
              if (showRacks)
                sb.append(NodeBase.getPath(locs[j]));
              else
                sb.append(locs[j]);
            }
            sb.append(']');
            report.append(" " + sb.toString());
          }
    • -storagepolicies

              if (this.showStoragePolcies) {
            storageTypeSummary = new StoragePolicySummary(
                namenode.getNamesystem().getBlockManager().getStoragePolicies());
          }
      
          ...
      
          if (this.showStoragePolcies) {
            out.print(storageTypeSummary.toString());
          }
    • -includeSnapshots
      This option collects the snapshottable directories known to the namenode

            if (snapshottableDirs != null) {
          SnapshottableDirectoryStatus[] snapshotDirs = namenode.getRpcServer()
              .getSnapshottableDirListing();
          if (snapshotDirs != null) {
            for (SnapshottableDirectoryStatus dir : snapshotDirs) {
              snapshottableDirs.add(dir.getFullPath().toString());
            }
          }
        }

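    While these options are being processed, results are printed as they are produced, so you will see path information displayed along the way. At the very end of the output comes a summary report, like this: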

     Total size:    88.13 KB
     Total dirs:    14
     Total files:   20
     Total symlinks:        0
     Total blocks (validated):  20 (avg. block size 4512 B)
      ********************************
      UNDER MIN REPL'D BLOCKS:  20 (100.0 %)
      dfs.namenode.replication.min: 1
      CORRUPT FILES:    20
      MISSING BLOCKS:   20
      MISSING SIZE:     88.13 KB
      CORRUPT BLOCKS:   20
      ********************************
     Minimally replicated blocks:   0 (0.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:   0 (0.0 %)
     Mis-replicated blocks:     0 (0.0 %)
     Default replication factor:    0
     Average block replication: 0.0
     Corrupt blocks:        20
     Missing replicas:      0
     Number of data-nodes:      0
     Number of racks:       0
    FSCK ended at Tue Mar 29 11:10:33 CST 2016 in 10 milliseconds
    
    
    The filesystem under path '/' is CORRUPT

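    OK, that is how NamenodeFsck processes a request and how the options control it. The logic is concentrated in the two methods fsck and check, which print intermediate results selectively according to the chosen options. Below is a simple diagram:
    [Figure: NamenodeFsck processing flow through the fsck and check methods]
    I hope this article helps you understand and use the HDFS fsck command.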

  • Original article: https://www.cnblogs.com/bianqi/p/12183777.html