  • II. HDFS Architecture

    Source: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html

    HDFS has a master/slave architecture. An HDFS cluster consists of a single NameNode, a master server that manages the file system namespace and regulates access to files by clients. In addition, there are a number of DataNodes, usually one per node in the cluster, which manage storage attached to the nodes that they run on. HDFS exposes a file system namespace and allows user data to be stored in files. Internally, a file is split into one or more blocks and these blocks are stored in a set of DataNodes. The NameNode executes file system namespace operations like opening, closing, and renaming files and directories. It also determines the mapping of blocks to DataNodes. The DataNodes are responsible for serving read and write requests from the file system’s clients. The DataNodes also perform block creation, deletion, and replication upon instruction from the NameNode.

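    NameNode: stores the file system's metadata (data that describes data) in memory, such as the file namespace and the mapping of blocks to DataNodes. It is also responsible for managing the DataNodes.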

    DataNode: a node that stores data blocks. It serves clients' read and write requests for blocks and reports the blocks it holds to the NameNode.
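
    The division of labour between the two roles can be sketched with a toy in-memory model. This is only an illustration under simplifying assumptions, not how the real NameNode is implemented: the class ToyNameNode, its methods, and the block and DataNode names (blk_1, dn1, ...) are all hypothetical.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy in-memory metadata store; not the real NameNode implementation."""

    def __init__(self):
        # File namespace: file path -> ordered list of block IDs.
        self.namespace = {}
        # Block map: block ID -> set of DataNodes holding a replica.
        self.block_locations = defaultdict(set)

    def add_file(self, path, block_ids):
        """Record a file and the blocks it was split into."""
        self.namespace[path] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """A DataNode reports the blocks it currently stores."""
        for block_id in block_ids:
            self.block_locations[block_id].add(datanode)

    def locate(self, path):
        """For each block of the file, list the DataNodes a client could read from."""
        return [(b, sorted(self.block_locations[b])) for b in self.namespace[path]]


nn = ToyNameNode()
nn.add_file("/tt/123.txt", ["blk_1", "blk_2"])
nn.block_report("dn1", ["blk_1", "blk_2"])
nn.block_report("dn2", ["blk_1"])
print(nn.locate("/tt/123.txt"))
```

    The point of the sketch is that the NameNode holds only metadata; the block contents themselves never pass through it.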

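    Block: the smallest unit into which a file is split. The default block size is 128 MB and each block has a default replication factor of 3. The replication factor is configured with dfs.replication, and the block size can be set with dfs.blocksize.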
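
    As a rough worked example of these two settings, the sketch below computes how many blocks a file is split into and how much raw disk space its replicas consume. The function names and the 300 MB file are illustrative assumptions; the constants are the defaults quoted above.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # default dfs.blocksize quoted above (128 MB)
REPLICATION = 3                 # default dfs.replication quoted above


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks a file of file_size bytes is split into (ceiling division)."""
    return -(-file_size // block_size)


def raw_storage(file_size, replication=REPLICATION):
    """Bytes consumed across all DataNodes: every byte is stored once per replica."""
    return file_size * replication


size = 300 * 1024 * 1024  # a hypothetical 300 MB file
print(block_count(size))                     # 3 blocks: 128 MB + 128 MB + 44 MB
print(raw_storage(size) // (1024 * 1024))    # 900 (MB of raw disk)
```

    Note that the last block only occupies as much disk as it actually contains, which is why raw storage is computed from the file size rather than from the block count.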

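    Rack: racks describe the physical layout of the storage nodes and are used to optimize storage and computation. To view the rack topology: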

    [root@CentOS ~]# hdfs dfsadmin -printTopology
    Rack: /default-rack
       192.168.169.139:50010 (CentOS)
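
    If the topology needs to be inspected from a script, output of this shape can be parsed into a rack-to-nodes mapping. The sketch below assumes the layout shown above (a "Rack:" line followed by indented node lines); parse_topology is a hypothetical helper, not part of Hadoop.

```python
def parse_topology(text):
    """Map each rack path to the list of DataNode addresses listed under it."""
    racks = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("Rack:"):
            # Start a new rack section, e.g. "Rack: /default-rack".
            current = line.split(":", 1)[1].strip()
            racks[current] = []
        elif current is not None:
            # Node lines look like "192.168.169.139:50010 (CentOS)"; keep the address.
            racks[current].append(line.split()[0])
    return racks


sample = """Rack: /default-rack
   192.168.169.139:50010 (CentOS)
"""
topology = parse_topology(sample)
print(topology)
```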
    

    Why is HDFS poorly suited to storing small files?

    Files                            NameNode usage (memory)          DataNode usage (disk)
    1 file of 128 MB                 1 block metadata entry           128 MB * replication factor
    10,000 files totaling 128 MB     10,000 block metadata entries    128 MB * replication factor

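    Because the NameNode keeps all metadata in the memory of a single machine, a large number of small files puts NameNode memory under pressure even though the disk usage on the DataNodes is unchanged.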
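
    The comparison can be put into rough numbers. The sketch below assumes about 150 bytes of NameNode heap per file or block object, a commonly quoted rule of thumb rather than a figure from this article, and assumes each file fits in a single block. The function names are illustrative.

```python
BYTES_PER_OBJECT = 150  # assumed rough heap cost per file or block object


def namenode_objects(num_files, blocks_per_file=1):
    """One metadata object per file plus one per block."""
    return num_files * (1 + blocks_per_file)


def namenode_bytes(num_files, blocks_per_file=1):
    """Estimated NameNode heap used by the given files."""
    return namenode_objects(num_files, blocks_per_file) * BYTES_PER_OBJECT


one_big = namenode_bytes(1)          # a single 128 MB file
many_small = namenode_bytes(10_000)  # 10,000 files totaling 128 MB
print(one_big, many_small, many_small // one_big)
```

    Under these assumptions the same 128 MB of data costs 10,000 times more NameNode memory when it is stored as 10,000 small files.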

    What is the relationship between the NameNode and the Secondary NameNode?

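    The Secondary NameNode assists the NameNode by merging the Edits and Fsimage files. This speeds up NameNode startup, because fewer logged edits have to be replayed.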
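
    The merge can be pictured with a toy model in which the image is a set of paths and the edit log is a list of operations. This is a deliberately simplified sketch: the real Fsimage and Edits files are binary formats covering many more operation types, and apply_edit and checkpoint are hypothetical names.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to a namespace (a set of paths)."""
    op, path = edit
    if op == "create":
        namespace.add(path)
    elif op == "delete":
        namespace.discard(path)
    return namespace


def checkpoint(fsimage, edits):
    """Merge the edit log into a new image and return it with an empty log."""
    merged = set(fsimage)
    for edit in edits:
        apply_edit(merged, edit)
    return merged, []


fsimage = {"/tt"}
edits = [
    ("create", "/tt/123.txt"),
    ("create", "/tt/777.txt"),
    ("delete", "/tt/123.txt"),
]

new_image, new_edits = checkpoint(fsimage, edits)
print(sorted(new_image), new_edits)
```

    After the checkpoint, a restart only has to load the new image instead of replaying every edit since the previous one.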

    HDFS Shell

    [root@CentOS ~]# hdfs dfs -help     
    Usage: hadoop fs [generic options]
    	[-appendToFile <localsrc> ... <dst>]
    	[-cat [-ignoreCrc] <src> ...]
    	[-checksum <src> ...]
    	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
    	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
    	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
    	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
    	[-help [cmd ...]]
    	[-ls [-d] [-h] [-R] [<path> ...]]
    	[-mkdir [-p] <path> ...]                                              # create a directory
    	[-moveFromLocal <localsrc> ... <dst>]
    	[-moveToLocal <src> <localdst>]
    	[-mv <src> ... <dst>]
    	[-put [-f] [-p] [-l] <localsrc> ... <dst>]
    	[-rm [-f] [-r|-R] [-skipTrash] <src> ...]
    	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
    	[-tail [-f] <file>]
    	[-text [-ignoreCrc] <src> ...]
    	[-touchz <path> ...]
    	[-usage [cmd ...]]
    

    hdfs dfs -ls / lists the files and directories directly under /.

    hdfs dfs -ls -R / lists all files under /. Because of the -R option, the listing recurses into each directory and its subdirectories.
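
    When these listings are driven from a script, the two variants differ only in the -R flag. The sketch below builds the argument list and, optionally, runs it with the standard subprocess module. It assumes the hdfs executable is on PATH; ls_command and run_ls are hypothetical helper names.

```python
import subprocess


def ls_command(path, recursive=False):
    """Build the argument list for `hdfs dfs -ls [-R] <path>`."""
    cmd = ["hdfs", "dfs", "-ls"]
    if recursive:
        cmd.append("-R")
    cmd.append(path)
    return cmd


def run_ls(path, recursive=False):
    """Run the listing and return its standard output; requires `hdfs` on PATH."""
    result = subprocess.run(
        ls_command(path, recursive),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the command fails
    )
    return result.stdout


print(ls_command("/"))
print(ls_command("/", recursive=True))
```

    Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.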

    [root@CentOS sysconfig]# hdfs dfs -mkdir -p /tt/test     # create a directory, including missing parents
    [root@CentOS ~]# touch 123.txt
    [root@CentOS ~]# vi 123.txt
    [root@CentOS ~]# hdfs dfs -copyFromLocal ~/123.txt /tt   # copy a local file to HDFS
    [root@CentOS ~]# hdfs dfs -cat /tt/test/123.txt          # print the contents of a file
    雲想衣山花形容
    
    
    [root@CentOS 123123]# hdfs dfs -copyToLocal /tt/test/123.txt /usr/local/222.txt     # copy a file from HDFS to the local file system
    [root@CentOS 123123]# cd ..
    [root@CentOS local]# ls
    123123  222.txt  bin  etc  games  include  lib  lib64  libexec  sbin  share  src
    [root@CentOS local]# hdfs dfs -put 123123 /tt          # upload a local file or directory (e.g. 123123) to an HDFS path (/tt)
    
    [root@CentOS local]# hdfs dfs -ls /tt/                     # list the contents of a directory
    Found 2 items
    -rw-r--r--   1 root supergroup         22 2019-01-03 04:18 /tt/123.txt
    -rw-r--r--   1 root supergroup          0 2019-01-03 04:28 /tt/777.txt
    
    [root@CentOS local]# hdfs dfs -rm -f /tt/123.txt            # delete a file
    19/01/03 03:54:55 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /tt/123.txt
    [root@CentOS local]# hdfs dfs -rm -r /tt/test               # delete a directory recursively
    19/01/03 03:55:58 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 0 minutes, Emptier interval = 0 minutes.
    Deleted /tt/test
    
    [root@CentOS ~]# hdfs dfs -checksum /tt                      # print a file's checksum (a directory is rejected)
    checksum: `/tt': Is a directory
    [root@CentOS ~]# hdfs dfs -checksum /tt/123.txt
    /tt/123.txt     MD5-of-0MD5-of-512CRC32C        000002000000000000000000790c2cd6e313015e7896c41d37dce4d5
    
    [root@CentOS local]# hdfs dfs -cp /tt/123.txt /             # copy a file to another location within HDFS
    
    [root@CentOS local]# hdfs dfs -touchz /tt/777.txt           # create an empty file
    
    [root@CentOS local]# hdfs dfs -tail /tt/123.txt             # print the last 1 KB of the file to standard output
    雲想衣山花形容
    
    [root@CentOS local]# hdfs dfs -get /tt/777.txt /usr/local   # copy a file or directory from an HDFS path (/tt/777.txt) to a local path (/usr/local)
    [root@CentOS local]# ls
    123123  222.txt  777.txt 
    [root@CentOS local]# hdfs dfs -ls -R  /tt/                 # list the contents of subdirectories recursively
    -rw-r--r--   1 root supergroup         22 2019-01-03 04:18 /tt/123.txt
    -rw-r--r--   1 root supergroup          0 2019-01-03 04:28 /tt/777.txt
    drwxr-xr-x   - root supergroup          0 2019-01-03 04:40 /tt/test
    -rw-r--r--   1 root supergroup         22 2019-01-03 04:40 /tt/test/222.txt
    [root@CentOS local]# hdfs dfs -chmod -R 755 /tt/123.txt
    [root@CentOS local]# hdfs dfs -ls -R  /tt/
    -rwxr-xr-x   1 root supergroup         22 2019-01-03 04:18 /tt/123.txt
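
    Listings like the ones above have a fixed column order (permissions, replication, owner, group, size, date, time, path), so they are easy to post-process. The sketch below parses such lines into records. It assumes exactly this eight-column layout; Entry and parse_ls are hypothetical names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    permissions: str
    replication: str  # "-" for directories
    owner: str
    group: str
    size: int
    modified: str
    path: str


def parse_ls(output):
    """Parse `hdfs dfs -ls` style output into Entry records."""
    entries = []
    for line in output.splitlines():
        fields = line.strip().split(None, 7)
        if len(fields) != 8:
            continue  # skip "Found N items" headers and blank lines
        perms, repl, owner, group, size, date, time, path = fields
        entries.append(Entry(perms, repl, owner, group, int(size), f"{date} {time}", path))
    return entries


sample = """\
-rw-r--r--   1 root supergroup         22 2019-01-03 04:18 /tt/123.txt
-rw-r--r--   1 root supergroup          0 2019-01-03 04:28 /tt/777.txt
drwxr-xr-x   - root supergroup          0 2019-01-03 04:40 /tt/test
-rw-r--r--   1 root supergroup         22 2019-01-03 04:40 /tt/test/222.txt
"""
entries = parse_ls(sample)
print([e.path for e in entries if e.permissions.startswith("d")])
print(sum(e.size for e in entries))
```

    A leading "d" in the permissions column marks a directory, which is also why its replication column shows "-" instead of a number.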
    

    Further reference: http://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/FileSystemShell.html#appendToFile

  • Original article: https://www.cnblogs.com/adrien/p/10222602.html