  • Formatting HDFS

    Working on Hadoop, especially on test clusters, I have managed to break my HDFS layer, sometimes with no possible redemption, or at least none that I wanted to invest time in. And sometimes, for whatever other reason, you just want to scrap your HDFS and start anew.

    Without going into too much detail, which is beyond the scope of this blog post, HDFS is mainly composed of two types of elements:

    • Namenode: at a high level, the namenode stores the HDFS namespace; think of it as your file system tree.
    • Datanode: this is where your data is actually stored.

    The Namenode: /hadoop/hdfs/namenode/current

    [Screenshot: listing of the namenode directory]

    All new edits are written to the edit log and regularly merged into an fsimage file, for more concise management. An fsimage file represents the file system state after all modifications up to a specific transaction ID. The seen_txid file holds the last seen transaction ID. The VERSION file contains the cluster and HDFS IDs.
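As a small illustration of the VERSION file mentioned above: it is a plain key=value properties file, so its IDs can be read with standard shell tools. This is only a sketch. The helper name version_field is made up for this example, and the key clusterID is assumed to be present in your Hadoop version.

```shell
#!/bin/sh
# Hypothetical helper (not part of Hadoop): print the value of one key
# from a VERSION file. Assumes the file contains key=value lines.
version_field() {
    key=$1
    file=$2
    # Print only what follows "key=" on the line that starts with it.
    sed -n "s/^${key}=//p" "$file"
}

# Example, using the namenode path from this post:
# version_field clusterID /hadoop/hdfs/namenode/current/VERSION
```

The function prints nothing when the key is absent, which makes it easy to use in a shell test.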

    For a more detailed explanation, see: HDFS metadata

    The Datanode: /hadoop/hdfs/data/current

    [Screenshot: listing of the datanode directory]

    In our example we will only focus on the VERSION file, which is very close to the namenode VERSION.

    HDFS non-HA formatting

    In non-HA mode everything is simple enough.

    1. Stop the HDFS service
    2. Run hadoop namenode -format (as user hdfs)
    3. Clear the data directory on all datanodes
    4. Restart HDFS
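The steps above can be sketched as a small shell script. Treat it as an illustration under assumptions rather than a drop-in tool: the data directory defaults to the /hadoop/hdfs/data path used in this post, the run wrapper and the DRY_RUN variable are inventions of this example, and stopping or restarting HDFS is left out because it depends on how your cluster is managed.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
# Assumption: datanode data directory, taken from the layout shown earlier.
DATA_DIR=${DATA_DIR:-/hadoop/hdfs/data}

# Hypothetical wrapper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

format_non_ha() {
    # Step 1: stop HDFS first (cluster-specific, not shown).
    # Step 2: format the namenode as user hdfs.
    run sudo -u hdfs hadoop namenode -format
    # Step 3: clear the data directory (repeat on every datanode).
    run find "$DATA_DIR" -mindepth 1 -delete
    # Step 4: restart HDFS (cluster-specific, not shown).
}
```

Calling format_non_ha with the default settings only lists what would be executed; set DRY_RUN=0 once the printed commands look right for your cluster.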

    At this point your HDFS layer is empty, and if you check the VERSION files of the namenode and the datanodes, the IDs they record (in particular the cluster ID) should coincide.
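One way to run that check from the shell is sketched below. It assumes that both VERSION files carry a clusterID key and that the two files are readable from the same host (copy the datanode file over if they are not). The function name same_cluster_id is made up for this example.

```shell
#!/bin/sh
# Hypothetical check (not part of Hadoop): succeed only if two VERSION files
# record the same, non-empty clusterID.
same_cluster_id() {
    nn_id=$(sed -n 's/^clusterID=//p' "$1")
    dn_id=$(sed -n 's/^clusterID=//p' "$2")
    [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]
}

# Example, with the paths used in this post:
# same_cluster_id /hadoop/hdfs/namenode/current/VERSION \
#                 /hadoop/hdfs/data/current/VERSION && echo "clusterID matches"
```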

    HDFS HA formatting

    In HA things get a little more complicated. The Standby and Active namenodes have a shared storage managed by the journal node service. HA relies on a failover mechanism to swap from the Standby to the Active namenode and, like other systems in Hadoop, this uses ZooKeeper. As you can see, a couple more pieces need to be made aware of a formatting action.

    The initial steps are very close to the non-HA case:

    1. Stop the HDFS service
    2. Start only the journal nodes (as they will need to be made aware of the formatting)
    3. On the first namenode (as user hdfs)
      1. hadoop namenode -format
      2. hdfs namenode -initializeSharedEdits -force (for the journal nodes)
      3. hdfs zkfc -formatZK -force (to force ZooKeeper to reinitialise)
      4. Restart that first namenode
    4. On the second namenode
      1. hdfs namenode -bootstrapStandby -force (force sync with the first namenode)
    5. On every datanode, clear the data directory
    6. Restart the HDFS service
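The HA sequence can be sketched as one function per host role. As before, this is an illustration under assumptions: the run wrapper, the DRY_RUN variable and the function names are made up for this example, the data directory defaults to the /hadoop/hdfs/data path used in this post, and starting or stopping services is not shown because it depends on how your cluster is managed.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints the commands instead of running them.
DRY_RUN=${DRY_RUN:-1}
# Assumption: datanode data directory, taken from the layout shown earlier.
DATA_DIR=${DATA_DIR:-/hadoop/hdfs/data}

# Hypothetical wrapper: print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Step 3: on the first namenode, as user hdfs, with only the journal nodes up.
format_first_namenode() {
    run hadoop namenode -format
    run hdfs namenode -initializeSharedEdits -force
    run hdfs zkfc -formatZK -force
    # Then restart this namenode (cluster-specific, not shown).
}

# Step 4: on the second namenode, once the first one is running again.
bootstrap_second_namenode() {
    run hdfs namenode -bootstrapStandby -force
}

# Step 5: on every datanode.
clear_datanode() {
    run find "$DATA_DIR" -mindepth 1 -delete
}
```

Each function is meant to be called on the matching host, in the order of the list above; with the default DRY_RUN=1 it only prints the commands so they can be reviewed first.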

    This was a very simple step-by-step guide to formatting. In a later article we will cover actually repairing common errors in HDFS.

  • Original article: https://www.cnblogs.com/felixzh/p/10797011.html