  • HDFS

    An HDFS cluster consists primarily of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data.

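    The HDFS architecture describes the basic interactions among the NameNode, the DataNodes, and clients.
    Clients contact the NameNode for file metadata or file modifications, and perform the actual file I/O directly with the DataNodes.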
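To make the split between metadata and data traffic concrete, here is a minimal in-memory sketch. It is a toy model, not the real HDFS client API: the class names (`ToyNameNode`, `ToyDataNode`, `ToyClient`), the 4-byte block size, and the round-robin placement are all assumptions made for illustration, and replication is left out.

```python
"""Toy in-memory model of the HDFS read/write path (not the real HDFS API)."""

BLOCK_SIZE = 4  # bytes; tiny on purpose (real HDFS blocks are far larger)


class ToyDataNode:
    """Stores raw block contents, keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class ToyNameNode:
    """Holds metadata only: which blocks make up a file and where they live."""

    def __init__(self, datanodes):
        self.datanodes = datanodes
        self.files = {}  # path -> list of (block_id, datanode name)
        self._next_block_id = 0

    def allocate_block(self, path):
        # Round-robin placement is an assumption; real placement is
        # rack-aware and replicated.
        block_id = self._next_block_id
        self._next_block_id += 1
        target = self.datanodes[block_id % len(self.datanodes)]
        self.files.setdefault(path, []).append((block_id, target.name))
        return block_id, target

    def get_block_locations(self, path):
        return list(self.files[path])


class ToyClient:
    def __init__(self, namenode):
        self.namenode = namenode
        self.by_name = {dn.name: dn for dn in namenode.datanodes}

    def write(self, path, data):
        for offset in range(0, len(data), BLOCK_SIZE):
            # The metadata request goes to the NameNode ...
            block_id, datanode = self.namenode.allocate_block(path)
            # ... but the bytes go straight to a DataNode.
            datanode.write_block(block_id, data[offset:offset + BLOCK_SIZE])

    def read(self, path):
        locations = self.namenode.get_block_locations(path)
        return b"".join(
            self.by_name[name].read_block(block_id) for block_id, name in locations
        )
```

In this sketch the NameNode object never sees file contents; it only hands out block ids and locations, which mirrors the division of labour described above.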


    Some salient features of Hadoop:
    1)Hadoop, including HDFS, is well suited for distributed storage and distributed processing using commodity hardware. It is fault tolerant, scalable, and extremely simple to expand. MapReduce, well known for its simplicity and its applicability to a large set of distributed applications, is an integral part of Hadoop.

    2)HDFS is highly configurable with a default configuration well suited for many installations. Most of the time, configuration needs to be tuned only for very large clusters.

    3)Hadoop is written in Java and is supported on all major platforms.

    4)Hadoop supports shell-like commands to interact with HDFS directly.

    5)The NameNode and DataNodes have built-in web servers that make it easy to check the current status of the cluster.

    6)New features and improvements are regularly implemented in HDFS. The following is a subset of useful features in HDFS:

        a)File permissions and authentication.
        b)Rack awareness: takes a node's physical location into account while scheduling tasks and allocating storage.
        c)Safemode: an administrative mode for maintenance.
        d)fsck: a utility to diagnose the health of the file system and to find missing files or blocks.
        e)fetchdt: a utility to fetch a DelegationToken and store it in a file on the local system.
        f)Balancer: a tool to balance the cluster when data is unevenly distributed among DataNodes.
        g)Upgrade and rollback: after a software upgrade, it is possible to roll back to the state HDFS was in before the upgrade in case of unexpected problems.
        h)Secondary NameNode: performs periodic checkpoints of the namespace and helps keep the size of the file containing the log of HDFS modifications within certain limits at the NameNode.
        i)Checkpoint node: performs periodic checkpoints of the namespace and helps minimize the size of the log, stored at the NameNode, that contains changes to HDFS. It replaces the role previously filled by the Secondary NameNode, though it is not yet battle-hardened. The NameNode allows multiple Checkpoint nodes simultaneously, as long as no Backup nodes are registered with the system.
        j)Backup node: an extension of the Checkpoint node. In addition to checkpointing, it receives a stream of edits from the NameNode and maintains its own in-memory copy of the namespace, which is always in sync with the active NameNode's namespace state. Only one Backup node may be registered with the NameNode at once.
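The checkpointing done by the Secondary NameNode and the Checkpoint node comes down to folding the log of modifications into the last namespace image, so the log can start over from empty. The sketch below shows only that idea. The namespace is reduced to a dict of path to size, and the edit tuple format and function names are hypothetical, not the real on-disk format.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace dict (path -> size)."""
    op, path, *args = edit
    if op == "create":
        namespace[path] = args[0]
    elif op == "delete":
        namespace.pop(path, None)
    elif op == "rename":
        namespace[args[0]] = namespace.pop(path)
    else:
        raise ValueError(f"unknown op: {op}")


def checkpoint(image, edits):
    """Fold the edit log into the image; return (new_image, empty_log)."""
    merged = dict(image)  # leave the previous image untouched
    for edit in edits:
        apply_edit(merged, edit)
    return merged, []


image = {"/a": 1}
edits = [("create", "/b", 2), ("rename", "/a", "/c"), ("delete", "/b")]
new_image, remaining_edits = checkpoint(image, edits)
```

Without periodic checkpoints the log would keep growing and a restart would have to replay all of it, which is why the log size is kept "within certain limits".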


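    Web interface
    Each NameNode and DataNode runs an internal web server.
    With the default configuration, the NameNode front page is at http://namenode-name:50070/ (50070 is the Hadoop 2.x default; Hadoop 3.x uses 9870).
    The HDFS file system can also be browsed from this page (via "Browse the file system").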

    Shell commands:
    bin/hdfs dfs -help              # list the commands supported by the Hadoop shell
    bin/hdfs dfs -help command-name # show detailed help for a single command

    dfsadmin commands:
    bin/hdfs dfsadmin -help

    hdfs dfsadmin -printTopology  # print the cluster topology (racks and the DataNodes attached to them)
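The racks shown by `-printTopology` come from the cluster's rack-awareness configuration. One common setup is to point the `net.topology.script.file.name` property at an executable that receives host names or IP addresses as arguments and prints one rack path per argument. A minimal sketch of such a script follows; the addresses and rack names in `RACKS` are made up for illustration.

```python
#!/usr/bin/env python3
"""Sketch of a rack-topology script: maps each argument to a rack path."""
import sys

# Hypothetical inventory; replace with the real host-to-rack mapping.
RACKS = {
    "10.0.1.11": "/dc1/rack1",
    "10.0.1.12": "/dc1/rack1",
    "10.0.2.21": "/dc1/rack2",
}
DEFAULT_RACK = "/default-rack"  # fallback for hosts missing from the table


def resolve(hosts):
    """Return one rack path per host, in the same order as the input."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # One rack per argument, whitespace-separated, on standard output.
    print(" ".join(resolve(sys.argv[1:])))
```

Keeping the lookup in a plain function makes the mapping easy to check locally before the script is wired into a cluster.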

    Although the Hadoop framework is implemented in Java™, MapReduce applications need not be written in Java.

    Hadoop Streaming is a utility which allows users to create and run jobs with any executables (e.g. shell utilities) as the mapper and/or the reducer.
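As an illustration of the Streaming model, the following word-count sketch puts a mapper and a reducer in one Python file. Streaming executables read lines from standard input and write tab-separated key/value lines to standard output, and the reducer sees its input sorted by key. The file name `wordcount.py` and the `map`/`reduce` argument convention are assumptions made for this example.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming: run as 'wordcount.py map' or 'wordcount.py reduce'."""
import sys
from itertools import groupby


def mapper(lines):
    """Emit 'word<TAB>1' for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    """Sum the counts per word; input must already be sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    stage = mapper if sys.argv[1] == "map" else reducer
    for record in stage(sys.stdin):
        print(record)
```

The pipeline can be tried locally without a cluster with `cat input.txt | python3 wordcount.py map | sort | python3 wordcount.py reduce`, where `sort` stands in for the shuffle phase. On a cluster, the same two commands would be passed to the streaming jar as its mapper and reducer.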

    Hadoop Pipes is a SWIG-compatible C++ API to implement MapReduce applications (non JNI™ based).

  • Original article: https://www.cnblogs.com/datapool/p/6142925.html