  • Forcing a DataNode to report its blocks to the NameNode

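    Under normal conditions, the NameNode decides when a DataNode reports its blocks, and it triggers the report through its reply to the DataNode's heartbeat.
    During a data-center migration, the old site ran Hadoop 2.7.2 and the new site ran 2.8.0, and the migration was done by scaling the cluster out first and then scaling it in. The two sites used different machine models with different numbers of disks, so hdfs-site.xml could not easily be kept identical across them, and the two variants of the file got mixed up during the operation. As a result, the NameNode reported a large number of "missing block" errors.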


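    However, going by the information the NameNode reported, the blocks marked "missing" could still be found on the DataNodes. After the configuration problem was fixed, the "missing block" errors did not go away. Judging from the DataNode source code, the likely cause was that the DataNodes had not reported their blocks to the NameNode again.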


    Reading the DataNode source code turned up triggerBlockReport, a tool that ships with HDFS and forces a specified DataNode to report its blocks to the NameNode. Usage:
    hdfs dfsadmin -triggerBlockReport datanode_host:ipc_port
    For example: hdfs dfsadmin -triggerBlockReport 192.168.31.35:50020
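When many DataNodes need the same treatment, the command above can be wrapped in a small driver. The following is a minimal sketch, not part of HDFS: the class name `TriggerBlockReports` and its methods are hypothetical, it assumes the `hdfs` launcher is on the PATH, and it reuses the IPC port 50020 from the example above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class TriggerBlockReports {
    /** Builds the argument list for one DataNode (pure function, no side effects). */
    public static List<String> buildCommand(String host, int ipcPort) {
        List<String> cmd = new ArrayList<>();
        cmd.add("hdfs");
        cmd.add("dfsadmin");
        cmd.add("-triggerBlockReport");
        cmd.add(host + ":" + ipcPort);
        return cmd;
    }

    /** Runs the command and returns its exit code. Assumes `hdfs` is on the PATH. */
    public static int run(List<String> cmd) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        return p.waitFor();
    }

    public static void main(String[] args) throws Exception {
        // Each command-line argument is treated as a DataNode host name or IP.
        for (String host : args) {
            int rc = run(buildCommand(host, 50020));
            System.out.println(host + " -> exit code " + rc);
        }
    }
}
```

Building the argument list separately from running it keeps the part that can be checked without a cluster apart from the part that needs one.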


    Normally, when the NameNode starts, it asks each DataNode to report its blocks once (controlled by the fullBlockReportLeaseId value). The relevant source code is as follows:


    DataNode-side code (BPServiceActor.java):
    // Abridged excerpt: the surrounding loop and some declarations are omitted.
    private void offerService() throws Exception {
        HeartbeatResponse resp = sendHeartBeat(requestBlockReportLease); // send a heartbeat to the NameNode
        long fullBlockReportLeaseId = resp.getFullBlockReportLeaseId(); // lease ID carried in the heartbeat response
        boolean forceFullBr = scheduler.forceFullBlockReport.getAndSet(false); // a report forced by triggerBlockReport is honored only once
        if (forceFullBr) {
            LOG.info("Forcing a full block report to " + nnAddr);
        }
        // Send a full report if the NameNode granted a lease or a report was forced.
        if ((fullBlockReportLeaseId != 0) || forceFullBr) {
            cmds = blockReport(fullBlockReportLeaseId);
            fullBlockReportLeaseId = 0;
        }
    }
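The one-shot behavior of the forced report comes from `getAndSet(false)`: reading the flag also clears it. The standalone model below is a sketch of that decision only. The class `FullReportDecision` and its methods are hypothetical and exist only for illustration; the field is a stand-in for `scheduler.forceFullBlockReport` in the excerpt.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class FullReportDecision {
    // Stand-in for scheduler.forceFullBlockReport in the excerpt above.
    private final AtomicBoolean forceFullBlockReport = new AtomicBoolean(false);

    /** Models the effect of triggerBlockReport on the DataNode: raise the flag. */
    public void trigger() {
        forceFullBlockReport.set(true);
    }

    /** Mirrors the condition in offerService(): lease granted, or report forced. */
    public boolean shouldSendFullReport(long fullBlockReportLeaseId) {
        boolean forceFullBr = forceFullBlockReport.getAndSet(false); // read and clear
        return fullBlockReportLeaseId != 0 || forceFullBr;
    }

    public static void main(String[] args) {
        FullReportDecision d = new FullReportDecision();
        System.out.println(d.shouldSendFullReport(0));   // false: no lease, not forced
        d.trigger();
        System.out.println(d.shouldSendFullReport(0));   // true: forced once
        System.out.println(d.shouldSendFullReport(0));   // false: flag already consumed
        System.out.println(d.shouldSendFullReport(42L)); // true: lease granted
    }
}
```

In this model, a single trigger yields exactly one full report, after which the DataNode again depends on a lease arriving in a heartbeat response.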


    NameNode-side code (FSNamesystem.java):
    /**
    * The given node has reported in.  This method should:
    * 1) Record the heartbeat, so the datanode isn't timed out
    * 2) Adjust usage stats for future block allocation

    * If a substantial amount of time passed since the last datanode 
    * heartbeat then request an immediate block report.  

    * @return an array of datanode commands 
    * @throws IOException
    */
    HeartbeatResponse handleHeartbeat(DatanodeRegistration nodeReg,
      StorageReport[] reports, long cacheCapacity, long cacheUsed,
      int xceiverCount, int xmitsInProgress, int failedVolumes,
      VolumeFailureSummary volumeFailureSummary,
      boolean requestFullBlockReportLease) throws IOException {
        readLock();
        try {
            //get datanode commands
            final int maxTransfer = blockManager.getMaxReplicationStreams() - xmitsInProgress;
            DatanodeCommand[] cmds = blockManager.getDatanodeManager().handleHeartbeat(
                nodeReg, reports, blockPoolId, cacheCapacity, cacheUsed,
                xceiverCount, maxTransfer, failedVolumes, volumeFailureSummary);


            long fullBlockReportLeaseId = 0;
            if (requestFullBlockReportLease) {
                fullBlockReportLeaseId =  blockManager.requestBlockReportLeaseId(nodeReg);
            }
            //create ha status
            final NNHAStatusHeartbeat haState = new NNHAStatusHeartbeat(
                haContext.getState().getServiceState(),
                getFSImage().getCorrectLastAppliedOrWrittenTxId());


            return new HeartbeatResponse(cmds, haState, rollingUpgradeInfo, fullBlockReportLeaseId);
        } finally {
            readUnlock("handleHeartbeat");
        }
    }
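The lease handling in `handleHeartbeat` reduces to one rule: the response carries a non-zero `fullBlockReportLeaseId` only when the heartbeat asked for one. The toy model below illustrates that rule under stated assumptions. The class `HeartbeatLeaseSketch` is hypothetical, and its counter is a stand-in for `blockManager.requestBlockReportLeaseId(nodeReg)`, whose real implementation is not shown in this article.

```java
import java.util.concurrent.atomic.AtomicLong;

public class HeartbeatLeaseSketch {
    // Hypothetical stand-in for blockManager.requestBlockReportLeaseId(nodeReg);
    // the real lease manager may also decline a request, which this toy never does.
    private final AtomicLong nextLeaseId = new AtomicLong(1);

    /** Returns the lease ID to place in the heartbeat response; 0 means no lease. */
    public long handleHeartbeat(boolean requestFullBlockReportLease) {
        long fullBlockReportLeaseId = 0;
        if (requestFullBlockReportLease) {
            fullBlockReportLeaseId = nextLeaseId.getAndIncrement();
        }
        return fullBlockReportLeaseId;
    }

    public static void main(String[] args) {
        HeartbeatLeaseSketch nn = new HeartbeatLeaseSketch();
        System.out.println(nn.handleHeartbeat(false)); // 0
        System.out.println(nn.handleHeartbeat(true));  // 1
        System.out.println(nn.handleHeartbeat(false)); // 0
    }
}
```

Combined with the DataNode-side condition, a heartbeat that does not request a lease gets 0 back and sends no full report, which is consistent with the "missing block" errors persisting until triggerBlockReport was run.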
  • Original article: https://www.cnblogs.com/aquester/p/9891502.html