  • Common and important Hadoop command-line operations and what each command does


    [root@master ~]# hadoop --help
    Usage: hadoop [--config confdir] COMMAND
           where COMMAND is one of:
      fs                   run a generic filesystem user client
      version              print the version
      jar <jar>            run a jar file
      checknative [-a|-h]  check native hadoop and compression libraries availability
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      classpath            prints the class path needed to get the
                           Hadoop jar and the required libraries
      daemonlog            get/set the log level for each daemon
      CLASSNAME            run the class named CLASSNAME
    Most commands print help when invoked w/o parameters.
    [root@master ~]# hadoop version
    Subversion git@github.com:hortonworks/hadoop.git -r b07b2906c36defd389c8b5bd22bebc1bead8115b
    Compiled by jenkins on 2014-01-09T05:18Z
    Compiled with protoc 2.5.0
    From source with checksum 704f1e463ebc4fb89353011407e965
    This command was run using /usr/lib/hadoop/hadoop-common-
    [root@master liguodong]# hadoop jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples- pi 10 100
    Number of Maps  = 10
    Samples per Map = 100
    Wrote input for Map #0
    Wrote input for Map #1
    Wrote input for Map #2...
    Job Finished in 19.715 seconds
    Estimated value of Pi is 3.14800000000000000000
    [root@master liguodong]# hadoop checknative -a
    15/06/03 10:28:07 INFO bzip2.Bzip2Factory: Successfully loaded & initialized native-bzip2 library system-native
    15/06/03 10:28:07 INFO zlib.ZlibFactory: Successfully loaded & initialized native-zlib library
    Native library checking:
    hadoop: true /usr/lib/hadoop/lib/native/libhadoop.so.1.0.0
    zlib:   true /lib64/libz.so.1
    snappy: true /usr/lib64/libsnappy.so.1
    lz4:    true revision:43
    bzip2:  true /lib64/libbz2.so.1
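    File archiving (Archive)

    Hadoop Archives (HAR files) were introduced in version 0.18.0 to ease the pressure that large numbers of small files put on NameNode memory.
    Reading a file through a HAR is no more efficient than reading it directly from HDFS, and may in fact be slightly slower, because every access to a file in a HAR requires two levels of reads: one of the index file and one of the file data itself. Also, although HAR files can be used as input to a MapReduce job, there is no special mechanism that lets maps treat the files packed in a HAR as a single HDFS file.
    Create an archive: hadoop archive -archiveName xxx.har -p /src /dest
    List its contents: hadoop fs -lsr har:///dest/xxx.har (newer releases report -lsr as deprecated and suggest ls -R instead)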

    [root@master liguodong]# hadoop archive
    archive -archiveName NAME -p <parent path> <src>* <dest>
    [root@master liguodong]# hadoop fs -lsr /liguodong
    drwxrwxrwx   - hdfs      hdfs          0 2015-05-04 19:40 /liguodong/output
    -rwxrwxrwx   3 hdfs      hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
    -rwxrwxrwx   3 hdfs      hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000
    [root@master liguodong]# hadoop archive -archiveName liguodong.har -p /liguodong output /liguodong/har
    [root@master liguodong]# hadoop fs -lsr /liguodong
    drwxr-xr-x   - root      hdfs          0 2015-06-03 11:15 /liguodong/har
    drwxr-xr-x   - root      hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har
    -rw-r--r--   3 root      hdfs          0 2015-06-03 11:15 /liguodong/har/liguodong.har/_SUCCESS
    -rw-r--r--   5 root      hdfs        254 2015-06-03 11:15 /liguodong/har/liguodong.har/_index
    -rw-r--r--   5 root      hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/_masterindex
    -rw-r--r--   3 root      hdfs         23 2015-06-03 11:15 /liguodong/har/liguodong.har/part-0
    drwxrwxrwx   - hdfs      hdfs          0 2015-05-04 19:40 /liguodong/output
    -rwxrwxrwx   3 hdfs      hdfs          0 2015-05-04 19:40 /liguodong/output/_SUCCESS
    -rwxrwxrwx   3 hdfs      hdfs         23 2015-05-04 19:40 /liguodong/output/part-r-00000
    [root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong.har
    lsr: DEPRECATED: Please use 'ls -R' instead.
    drwxr-xr-x   - root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output
    -rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/_SUCCESS
    -rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong.har/output/part-r-00000
    [root@master liguodong]# hadoop archive -archiveName liguodong2.har -p /liguodong/output /liguodong/har
    [root@master liguodong]# hadoop fs -lsr har:///liguodong/har/liguodong2.har
    -rw-r--r--   3 root hdfs          0 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/_SUCCESS
    -rw-r--r--   3 root hdfs         23 2015-05-04 19:40 har:///liguodong/har/liguodong2.har/part-r-00000
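
The archive commands above can also be assembled programmatically, for example when a script archives many directories. The following is a minimal sketch in Python: the helper names `archive_command` and `har_listing_command` are hypothetical, the functions only build argument lists (they do not call Hadoop), and the listing uses `-ls -R` because the transcript reports `-lsr` as deprecated.

```python
import shlex


def archive_command(name, parent, sources, dest):
    """Build argv for: hadoop archive -archiveName NAME -p <parent path> <src>* <dest>."""
    if not name.endswith(".har"):
        raise ValueError("archive name should end with .har")
    return ["hadoop", "archive", "-archiveName", name, "-p", parent, *sources, dest]


def har_listing_command(dest, name):
    """Build argv that lists an archive through the har:// scheme.

    Assumes dest is an absolute HDFS path such as /liguodong/har.
    """
    uri = f"har://{dest.rstrip('/')}/{name}"
    return ["hadoop", "fs", "-ls", "-R", uri]


if __name__ == "__main__":
    create = archive_command("liguodong.har", "/liguodong", ["output"], "/liguodong/har")
    listing = har_listing_command("/liguodong/har", "liguodong.har")
    # Print shell-quoted command lines; pass the lists to subprocess.run to execute them.
    print(shlex.join(create))
    print(shlex.join(listing))
```

Passing an empty `sources` list mirrors the second transcript command, where the parent path itself is archived.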
    [root@master /]# hdfs  --help
    Usage: hdfs [--config confdir] COMMAND
           where COMMAND is one of:
      dfs                  run a filesystem command on the file systems supported in Hadoop.
      namenode -format     format the DFS filesystem
      secondarynamenode    run the DFS secondary namenode
      namenode             run the DFS namenode
      journalnode          run the DFS journalnode
      zkfc                 run the ZK Failover Controller daemon
      datanode             run a DFS datanode
      dfsadmin             run a DFS admin client
      haadmin              run a DFS HA admin client
      fsck                 run a DFS filesystem checking utility
      balancer             run a cluster balancing utility
      jmxget               get JMX exported values from NameNode or DataNode.
      oiv                  apply the offline fsimage viewer to an fsimage
      oev                  apply the offline edits viewer to an edits file
      fetchdt              fetch a delegation token from the NameNode
      getconf              get config values from configuration
      groups               get the groups which users belong to
      snapshotDiff         diff two snapshots of a directory or diff the
                           current directory contents with a snapshot
      lsSnapshottableDir   list all snapshottable dirs owned by the current user
                           Use -help to see options
      portmap              run a portmap service
      nfs3                 run an NFS version 3 gateway
    [root@master liguodong]# hdfs fsck /liguodong
    Connecting to namenode via http://master:50070
    FSCK started by root (auth:SIMPLE) from / for path /liguodong at Wed Jun 03 10:43:41 CST 2015
    ...........Status: HEALTHY
     Total size:    1559 B
     Total dirs:    7
     Total files:   11
     Total symlinks:                0
     Total blocks (validated):      7 (avg. block size 222 B)
    The filesystem under path '/liguodong' is HEALTHY
    [root@master liguodong]# hdfs fsck /liguodong -files -blocks



    Command: hdfs balancer. The balancer can also be started through a script.

    [root@master liguodong]# hdfs balancer

    hdfs dfsadmin


    [root@master liguodong]# hdfs dfsadmin
    Usage: java DFSAdmin
    Note: Administrative commands can only be run as the HDFS superuser.
               [-safemode enter | leave | get | wait]
               [-allowSnapshot <snapshotDir>]
               [-disallowSnapshot <snapshotDir>]
               [-restoreFailedStorage true|false|check]
               [-metasave filename]
               [-refreshNamenodes datanodehost:port]
               [-deleteBlockPool datanode-host:port blockpoolId [force]]
               [-setQuota <quota> <dirname>...<dirname>]
               [-clrQuota <dirname>...<dirname>]
               [-setSpaceQuota <quota> <dirname>...<dirname>]
               [-clrSpaceQuota <dirname>...<dirname>]
               [-setBalancerBandwidth <bandwidth in bytes per second>]
               [-fetchImage <local directory>]
               [-help [cmd]]
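    oiv (short for offline image viewer) dumps the contents of an fsimage file to a specified file in a readable form. The tool also provides a read-only WebHDFS API for offline analysis and inspection of a Hadoop cluster's namespace. oiv is fairly fast even on very large fsimage files; if it cannot process an fsimage, it exits immediately. The tool is not backward compatible: for example, the oiv from hadoop-2.4 cannot process an fsimage from hadoop-2.3, which must be read with the hadoop-2.3 oiv. As the name (offline) suggests, oiv does not require the Hadoop cluster to be running. Run hdfs oiv with no arguments on the command line to see its full syntax.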


    [root@master current]# pwd
    [root@master current]# hdfs oiv  -i fsimage_0000000000000053234 -o fsimage.ls
    [root@master current]# cat fsimage.ls
    -rwxrwxrwx  3    oozie       hdfs     890168 2015-04-28 17:41 /user/oozie/share/lib/pig/jaxb-impl-2.2.3-1.jar
    -rwxrwxrwx  3    oozie       hdfs     201124 2015-04-28 17:41 /user/oozie/share/lib/pig/jdo-api-3.0.1.jar
    -rwxrwxrwx  3    oozie       hdfs     130458 2015-04-28 17:41 /user/oozie/share/lib/pig/jersey-client-1.9.jar
    [root@master current]# hdfs oiv -i fsimage_0000000000000053234 -p XML -o fsimage.xml
    [root@master current]# more fsimage.xml
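
Once the fsimage has been dumped in the ls-style format shown in `fsimage.ls`, it can be analyzed offline with ordinary text tools. The sketch below is one way to total file sizes per owner in Python. It assumes every line has the eight whitespace-separated columns seen in the sample (permissions, replication, owner, group, size, date, time, path); the function names are hypothetical, and the sample lines are copied from the transcript.

```python
from collections import defaultdict

# Three lines copied from the fsimage.ls output in the transcript.
SAMPLE = """\
-rwxrwxrwx  3    oozie       hdfs     890168 2015-04-28 17:41 /user/oozie/share/lib/pig/jaxb-impl-2.2.3-1.jar
-rwxrwxrwx  3    oozie       hdfs     201124 2015-04-28 17:41 /user/oozie/share/lib/pig/jdo-api-3.0.1.jar
-rwxrwxrwx  3    oozie       hdfs     130458 2015-04-28 17:41 /user/oozie/share/lib/pig/jersey-client-1.9.jar
"""


def parse_ls_line(line):
    """Split one ls-style line into a dict; return None if it has too few columns."""
    parts = line.strip().split(None, 7)
    if len(parts) < 8:
        return None
    perms, _replication, owner, group, size, _date, _time, path = parts
    return {"perms": perms, "owner": owner, "group": group,
            "size": int(size), "path": path}


def bytes_per_owner(lines):
    """Sum file sizes per owner, skipping directories and unparsable lines."""
    totals = defaultdict(int)
    for line in lines:
        entry = parse_ls_line(line)
        if entry is not None and not entry["perms"].startswith("d"):
            totals[entry["owner"]] += entry["size"]
    return dict(totals)


if __name__ == "__main__":
    print(bytes_per_owner(SAMPLE.splitlines()))
```

For a real dump, iterate over the opened file instead of `SAMPLE.splitlines()` so the whole listing is never held in memory.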
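    FileDistribution is an oiv processor that analyzes file sizes in the namespace. To run it, define an integer range [0, maxSize] by specifying a maximum file size and a step (the segment size). The range is divided into segments [0, s[1], ..., s[n-1], maxSize], and the processor counts how many files fall into each segment [s[i-1], s[i]). Files larger than maxSize always fall into the last segment, [s[n-1], maxSize]. The output file is a tab-separated table with a Size column and a NumFiles column, where Size is the start of a segment and NumFiles is the number of files whose size falls into that segment. The processor takes the parameters maxSize and step; if they are not specified, they default to 0.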

    [root@master current]# hdfs oiv -i fsimage_0000000000000053234 -o fsimage.fd -p FileDistribution 1000 step 5
    Files processed: 1  Current: /app-logs/ambari-qa/logs/application_1430219478244_0003/slave2_45454
    totalFiles = 534
    totalDirectories = 199
    totalBlocks = 537
    totalSpace = 1151394477
    maxFileSize = 119107289
    [root@master current]# more fsimage.fd
    Size    NumFiles
    0       22
    2097152 491
    4194304 13
    6291456 2
    8388608 1
    10485760        3
    12582912        0
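
The segment counting that FileDistribution performs can be illustrated with a small Python sketch. It follows the description in the text (half-open segments, Size as the segment start, oversized files counted in the last segment) and is not taken from the Hadoop source; the function name `file_distribution` and the sample sizes are made up for illustration.

```python
from collections import Counter


def file_distribution(sizes, max_size, step):
    """Return (segment_start, file_count) rows for the range [0, max_size].

    Segments are [0, step), [step, 2*step), ... and the last segment also
    absorbs every file larger than max_size.
    """
    if max_size <= 0 or step <= 0:
        raise ValueError("max_size and step must be positive")
    # Start of the final segment, so that max_size itself lands inside it.
    last_start = ((max_size - 1) // step) * step
    counts = Counter()
    for size in sizes:
        start = min((size // step) * step, last_start)
        counts[start] += 1
    return [(start, counts[start]) for start in range(0, last_start + 1, step)]


if __name__ == "__main__":
    # Print the same two tab-separated columns as the oiv output file.
    print("Size\tNumFiles")
    for start, count in file_distribution([0, 3, 4, 9, 50], max_size=10, step=4):
        print(f"{start}\t{count}")
```

With these made-up sizes the file of 50 bytes exceeds `max_size` and is counted in the segment starting at 8, alongside the 9-byte file.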
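    oev is short for offline edits viewer. The tool operates only on files, so it does not require the Hadoop cluster to be running. It provides several output processors that convert the input file into an output file of the corresponding format; select one with the -p option.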

    [root@master current]# hdfs oev -i edits_0000000000000042778-0000000000000042779 -o edits.xml
    [root@master current]# cat edits.xml
    [root@master liguodong]# yarn --help
    Usage: yarn [--config confdir] COMMAND
           where COMMAND is one of:
      resourcemanager      run the ResourceManager
      nodemanager          run a nodemanager on each slave
      rmadmin              admin tools
      version              print the version
      jar <jar>            run a jar file
      application          prints application(s) report/kill application
      node                 prints node report(s)
      logs                 dump container logs
      classpath            prints the class path needed to get the
                           Hadoop jar and the required libraries
      daemonlog            get/set the log level for each daemon
      CLASSNAME            run the class named CLASSNAME
    Most commands print help when invoked w/o parameters.
    yarn application -list  

    To kill a running job, pass its application ID to -kill, as follows:

     yarn application -kill application_1437456051228_1725  
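
Because killing the wrong application is hard to undo, a script that wraps the kill command may want to check the ID first. The sketch below is a hedged Python example: the pattern `application_<digits>_<digits>` is inferred from the ID shown above, the helper name `kill_command` is hypothetical, and the function only builds the argument list without contacting YARN.

```python
import re

# Shape inferred from the example ID application_1437456051228_1725.
APPLICATION_ID = re.compile(r"application_\d+_\d+")


def kill_command(application_id):
    """Build argv for: yarn application -kill <application_id>."""
    if not APPLICATION_ID.fullmatch(application_id):
        raise ValueError(f"not a YARN application ID: {application_id!r}")
    return ["yarn", "application", "-kill", application_id]


if __name__ == "__main__":
    # Pass the returned list to subprocess.run on a host with the yarn client.
    print(" ".join(kill_command("application_1437456051228_1725")))
```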


    [root@slave1 mapreduce]# mapred
    Usage: mapred [--config confdir] COMMAND
           where COMMAND is one of:
      pipes                run a Pipes job
      job                  manipulate MapReduce jobs
      queue                get information regarding JobQueues
      classpath            prints the class path needed for running
                           mapreduce subcommands
      historyserver        run job history servers as a standalone daemon
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
    Most commands print help when invoked w/o parameters.

