  • Introduction to the Hadoop Shell

    This section uses Hadoop 2.7.3 as the example.

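    The bin directory holds the most basic cluster management scripts. Users run these scripts to perform tasks such as HDFS administration and MapReduce job management.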

      As a starting point, here is the usage of the hadoop script in the bin directory (see also the Hadoop Commands Reference on the official site):

    Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
      CLASSNAME            run the class named CLASSNAME
      where COMMAND is one of:
      fs                   run a generic filesystem user client
      version              print the version
      jar <jar>            run a jar file
                           note: please use "yarn jar" to launch
                                 YARN applications, not this command.
      checknative [-a|-h]  check native hadoop and compression libraries availability
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      classpath            prints the class path needed to get the
      credential           interact with credential providers
                           Hadoop jar and the required libraries
      daemonlog            get/set the log level for each daemon
      trace                view and modify Hadoop tracing settings
    Most commands print help when invoked w/o parameters.

            The hadoop command is the script at hadoop-2.7.3/bin/hadoop. The relevant shell code follows (fs maps to org.apache.hadoop.fs.FsShell, and jar maps to org.apache.hadoop.util.RunJar):

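    # Excerpt from hadoop-2.7.3/bin/hadoop
        # the core commands
        if [ "$COMMAND" = "fs" ] ; then
          CLASS=org.apache.hadoop.fs.FsShell
        elif [ "$COMMAND" = "version" ] ; then
          CLASS=org.apache.hadoop.util.VersionInfo
        elif [ "$COMMAND" = "jar" ] ; then
          CLASS=org.apache.hadoop.util.RunJar
          if [[ -n "${YARN_OPTS}" ]] || [[ -n "${YARN_CLIENT_OPTS}" ]]; then
            echo "WARNING: Use \"yarn jar\" to launch YARN applications." 1>&2
          fi
        # (further elif branches omitted)
        fi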
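
    The mapping from COMMAND to a Java class can be sketched as a standalone function. This is a simplified illustration, not the actual script: resolve_class is a hypothetical helper name, the fs and jar mappings come from the text above, and the version mapping and the fallback that treats any other word as CLASSNAME are assumptions based on the usage text ("run the class named CLASSNAME"). com.example.MyTool is a made-up class name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a hadoop sub-command to the Java class it launches.
resolve_class() {
  local command="$1"
  case "$command" in
    fs)      echo "org.apache.hadoop.fs.FsShell" ;;
    version) echo "org.apache.hadoop.util.VersionInfo" ;;   # assumed mapping
    jar)     echo "org.apache.hadoop.util.RunJar" ;;
    -*)      # class and package names cannot begin with a dash
             echo "Error: no command named '$command'" 1>&2
             return 1 ;;
    *)       echo "$command" ;;   # anything else is treated as CLASSNAME
  esac
}

resolve_class fs                   # prints org.apache.hadoop.fs.FsShell
resolve_class com.example.MyTool   # prints com.example.MyTool
```

    A case statement reads more easily than the if/elif chain for a sketch like this; the behavior shown for each branch is the same kind of lookup.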

      The usage of the hdfs script in the bin directory is shown below (see also the HDFS Commands Reference on the official site):

    Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
           where COMMAND is one of:
      dfs                  run a filesystem command on the file systems supported in Hadoop.
      classpath            prints the classpath
      namenode -format     format the DFS filesystem
      secondarynamenode    run the DFS secondary namenode
      namenode             run the DFS namenode
      journalnode          run the DFS journalnode
      zkfc                 run the ZK Failover Controller daemon
      datanode             run a DFS datanode
      dfsadmin             run a DFS admin client
      haadmin              run a DFS HA admin client
      fsck                 run a DFS filesystem checking utility
      balancer             run a cluster balancing utility
      jmxget               get JMX exported values from NameNode or DataNode.
      mover                run a utility to move block replicas across
                           storage types
      oiv                  apply the offline fsimage viewer to an fsimage
      oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
      oev                  apply the offline edits viewer to an edits file
      fetchdt              fetch a delegation token from the NameNode
      getconf              get config values from configuration
      groups               get the groups which users belong to
      snapshotDiff         diff two snapshots of a directory or diff the
                           current directory contents with a snapshot
      lsSnapshottableDir   list all snapshottable dirs owned by the current user
                            Use -help to see options
      portmap              run a portmap service
      nfs3                 run an NFS version 3 gateway
      cacheadmin           configure the HDFS cache
      crypto               configure HDFS encryption zones
      storagepolicies      list/get/set block storage policies
      version              print the version
    Most commands print help when invoked w/o parameters.

      The usage of the mapred script in the bin directory is shown below (see also the MapReduce Commands Reference on the official site):

    Usage: mapred [--config confdir] [--loglevel loglevel] COMMAND
           where COMMAND is one of:
      pipes                run a Pipes job
      job                  manipulate MapReduce jobs
      queue                get information regarding JobQueues
      classpath            prints the class path needed for running
                           mapreduce subcommands
      historyserver        run job history servers as a standalone daemon
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      hsadmin              job history server admin interface
    Most commands print help when invoked w/o parameters.

      The usage of the yarn script in the bin directory is shown below (see also the YARN Commands page on the official site):

    Usage: yarn [--config confdir] [COMMAND | CLASSNAME]
      CLASSNAME                             run the class named CLASSNAME
      where COMMAND is one of:
      resourcemanager -format-state-store   deletes the RMStateStore
      resourcemanager                       run the ResourceManager
      nodemanager                           run a nodemanager on each slave
      timelineserver                        run the timeline server
      rmadmin                               admin tools
      sharedcachemanager                    run the SharedCacheManager daemon
      scmadmin                              SharedCacheManager admin tools
      version                               print the version
      jar <jar>                             run a jar file
      application                           prints application(s)
                                            report/kill application
      applicationattempt                    prints applicationattempt(s)
      container                             prints container(s) report
      node                                  prints node report(s)
      queue                                 prints queue information
      logs                                  dump container logs
      classpath                             prints the class path needed to
                                            get the Hadoop jar and the
                                            required libraries
      cluster                               prints cluster information
      daemonlog                             get/set the log level for each
                                            daemon
    Most commands print help when invoked w/o parameters.

      The usage of the rcc script in the bin directory is shown below:

    Usage: rcc --language [java|c++] ddl-files

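      In the usage strings above, --config sets the Hadoop configuration directory; the default is ${HADOOP_HOME}/etc/hadoop. COMMAND is a specific sub-command; common ones include fs, the file system management command, and jar, the job submission command. CLASSNAME tells the script to run the class named CLASSNAME.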
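
      The --config rule can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of the real scripts: resolve_conf_dir is a hypothetical helper, and /opt/hadoop-2.7.3 and /tmp/conf are made-up paths. It returns the directory given after --config, or falls back to ${HADOOP_HOME}/etc/hadoop when the option is absent.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decide which configuration directory a call would use.
resolve_conf_dir() {
  if [ "$1" = "--config" ] && [ -n "$2" ]; then
    echo "$2"                          # explicit directory from --config
  else
    echo "${HADOOP_HOME}/etc/hadoop"   # default location
  fi
}

HADOOP_HOME=/opt/hadoop-2.7.3          # assumed install path

resolve_conf_dir --config /tmp/conf fs -ls /   # prints /tmp/conf
resolve_conf_dir fs -ls /                      # prints /opt/hadoop-2.7.3/etc/hadoop
```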

