  • HDFS command notes (continuously updated)

    HDFS has many commonly used commands; this post keeps a running record of them.

    Basic commands

    The basic commands start with either hadoop fs or hdfs dfs; the two forms are equivalent. Use 'hadoop fs -help <command>' or 'hdfs dfs -help <command>' to see the documentation for a specific command.

    [hadoop@node01 ~]$ hadoop fs
    Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] [-v] [-x] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] [-x] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-usage [cmd ...]]
    
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|resourcemanager:port>    specify a ResourceManager
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
    

    The most frequently used options are -ls, -put, -get, -cat, -rm, -mkdir and so on.
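
    A quick, hedged sketch of these day-to-day commands is shown below; the directory /data and the file name localfile.txt are only placeholders.

    [hadoop@node01 ~]$ hadoop fs -mkdir -p /data                # create a directory (with parents)
    [hadoop@node01 ~]$ hadoop fs -put localfile.txt /data       # upload a local file
    [hadoop@node01 ~]$ hadoop fs -ls /data                      # list the directory
    [hadoop@node01 ~]$ hadoop fs -cat /data/localfile.txt       # print the file contents
    [hadoop@node01 ~]$ hadoop fs -get /data/localfile.txt ./    # download back to the local filesystem
    [hadoop@node01 ~]$ hadoop fs -rm /data/localfile.txt        # delete (moved to trash if trash is enabled)
    [hadoop@node01 ~]$ hadoop fs -help ls                       # detailed help for a single command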

    (1) Set the replication factor. The command below changes the replication of existing files/paths; to change the default replication for new files, configure dfs.replication in hdfs-site.xml.

    # 5 replicas
    [hadoop@node01 ~]$ hadoop fs -setrep -R 5 /readme.txt
    20/02/12 19:11:09 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Replication 5 set: /readme.txt
    # the replication column now shows 5 instead of the default 3
    [hadoop@node01 ~]$ hadoop fs -ls /readme.txt
    20/02/12 19:11:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    -rw-r--r--   5 hadoop supergroup         36 2019-10-23 22:58 /readme.txt
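
    For new files, the cluster-wide default is controlled by the dfs.replication property in hdfs-site.xml. As a hedged illustration (the file name and target path are placeholders), a replication factor can also be applied to a single upload by passing -D on the client:

    # set replication to 2 for this one upload only
    [hadoop@node01 ~]$ hadoop fs -D dfs.replication=2 -put localfile.txt /tmp/localfile.txt
    [hadoop@node01 ~]$ hadoop fs -ls /tmp/localfile.txt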

    ...TOADD

    Using hdfs getconf

    (1) Get the NameNode host name(s)

    [hadoop@node01 ~]$ hdfs getconf -namenodes
    20/02/12 18:50:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    node01

    (2) Get the minimum block size allowed by HDFS

    [hadoop@node01 ~]$ hdfs getconf -confKey dfs.namenode.fs-limits.min-block-size
    20/02/12 18:51:41 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    1048576

    (3) Get the NameNode RPC address(es)

    [hadoop@node01 ~]$ hdfs getconf -nnRpcAddresses
    20/02/12 18:52:43 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    node01:8020
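
    Any other configuration key can be queried the same way with -confKey. A hedged example follows; the values shown are the stock Hadoop 2.x defaults and will differ if you have overridden them:

    # default block size in bytes (128 MB) and default replication factor
    [hadoop@node01 ~]$ hdfs getconf -confKey dfs.blocksize
    134217728
    [hadoop@node01 ~]$ hdfs getconf -confKey dfs.replication
    3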

    ...TOADD

    Using hdfs dfsadmin

    These are administrative commands; as the usage message notes, they can only be run as the HDFS superuser.

    [hadoop@node01 ~]$ hdfs dfsadmin
    Usage: hdfs dfsadmin
    Note: Administrative commands can only be run as the HDFS superuser.
        [-report [-live] [-dead] [-decommissioning]]
        [-safemode <enter | leave | get | wait>]
        [-saveNamespace]
        [-rollEdits]
        [-restoreFailedStorage true|false|check]
        [-refreshNodes]
        [-setQuota <quota> <dirname>...<dirname>]
        [-clrQuota <dirname>...<dirname>]
        [-setSpaceQuota <quota> <dirname>...<dirname>]
        [-clrSpaceQuota <dirname>...<dirname>]
        [-finalizeUpgrade]
        [-rollingUpgrade [<query|prepare|finalize>]]
        [-refreshServiceAcl]
        [-refreshUserToGroupsMappings]
        [-refreshSuperUserGroupsConfiguration]
        [-refreshCallQueue]
        [-refresh <host:ipc_port> <key> [arg1..argn]
        [-reconfig <datanode|...> <host:ipc_port> <start|status|properties>]
        [-printTopology]
        [-refreshNamenodes datanode_host:ipc_port]
        [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
        [-setBalancerBandwidth <bandwidth in bytes per second>]
        [-fetchImage <local directory>]
        [-allowSnapshot <snapshotDir>]
        [-disallowSnapshot <snapshotDir>]
        [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
        [-getDatanodeInfo <datanode_host:ipc_port>]
        [-metasave filename]
        [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
        [-listOpenFiles]
        [-help [cmd]]
    
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|resourcemanager:port>    specify a ResourceManager
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
    
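
    Before the numbered examples, one subcommand from the list above worth calling out is -report, which prints the cluster's configured capacity, usage, and the status of each datanode (sketch only; the output is long and cluster-specific):

    [hadoop@node01 ~]$ hdfs dfsadmin -report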

    (1) Check the current safe mode status. Safe mode entered manually must also be left manually.

    # view the help text
    [hadoop@node01 ~]$ hdfs dfsadmin -help safemode
    -safemode <enter|leave|get|wait>:  Safe mode maintenance command.
            Safe mode is a Namenode state in which it
                1.  does not accept changes to the name space (read-only)
                2.  does not replicate or delete blocks.
            Safe mode is entered automatically at Namenode startup, and
            leaves safe mode automatically when the configured minimum
            percentage of blocks satisfies the minimum replication
            condition.  Safe mode can also be entered manually, but then
            it can only be turned off manually as well.
    # enter safe mode
    [hadoop@node01 ~]$ hdfs dfsadmin -safemode enter
    20/02/12 18:58:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Safe mode is ON
    # leave safe mode
    [hadoop@node01 ~]$ hdfs dfsadmin -safemode leave
    20/02/12 18:58:57 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Safe mode is OFF
    # check the status
    [hadoop@node01 ~]$ hdfs dfsadmin -safemode get
    20/02/12 18:59:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Safe mode is OFF
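
    The wait option is handy in maintenance scripts: it blocks until the NameNode has left safe mode, so later commands in the script do not fail. A minimal sketch:

    # blocks until safe mode is OFF, then returns
    [hadoop@node01 ~]$ hdfs dfsadmin -safemode wait
    Safe mode is OFF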

    (2) hdfs dfsadmin is also used when working with snapshots: it allows or disallows snapshots on a directory, while hdfs dfs -createSnapshot creates them.

    HDFS snapshots:

    ① Snapshots can be taken of a single directory or of the entire HDFS namespace.

    ② Files deleted by mistake can be recovered from a snapshot.

    ③ A snapshot does not copy block data; it only records the block list and file sizes.

    Allow snapshots on a directory

    # applied to a directory
    [root@hadoop01 ~]# hdfs dfsadmin -allowSnapshot /testSnapshot
    20/02/16 21:35:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Allowing snaphot on /testSnapshot succeeded

    Create a snapshot

    # create a snapshot named mysnapshot
    [root@hadoop01 ~]# hdfs dfs -createSnapshot /testSnapshot mysnapshot
    20/02/16 21:37:03 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Created snapshot /testSnapshot/.snapshot/mysnapshot

    View the snapshot

    # snapshots live under the hidden directory .snapshot
    [root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
    20/02/16 21:37:31 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2020-02-16 21:37 /testSnapshot/.snapshot/mysnapshot

    Rename the snapshot

    [root@hadoop01 ~]# hdfs dfs -renameSnapshot /testSnapshot mysnapshot newnameSnapshot
    20/02/16 21:38:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
    20/02/16 21:39:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2020-02-16 21:37 /testSnapshot/.snapshot/newnameSnapshot

    Simulate an accidental deletion

    [root@hadoop01 ~]# hadoop fs -rm /testSnapshot/log.txt
    20/02/16 21:39:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20/02/16 21:39:49 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
    Moved: 'hdfs://hadoop01:9000/testSnapshot/log.txt' to trash at: hdfs://hadoop01:9000/user/root/.Trash/Current

    Restore from the snapshot

    # confirm that log.txt is gone from the directory
    [root@hadoop01 ~]# hadoop fs -ls /testSnapshot
    20/02/16 21:40:05 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    # the deleted file is still present in the snapshot
    [root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot/newnameSnapshot
    20/02/16 21:40:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    -rw-r--r--   1 root supergroup         12 2020-02-16 21:31 /testSnapshot/.snapshot/newnameSnapshot/log.txt
    # restore from the snapshot
    [root@hadoop01 ~]# hadoop fs -cp /testSnapshot/.snapshot/newnameSnapshot/log.txt /testSnapshot
    20/02/16 21:43:13 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20/02/16 21:43:14 WARN hdfs.DFSClient: DFSInputStream has been closed already
    # the file is back
    [root@hadoop01 ~]# hadoop fs -ls /testSnapshot
    20/02/16 21:43:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 1 items
    -rw-r--r--   1 root supergroup         12 2020-02-16 21:43 /testSnapshot/log.txt

    Delete the snapshot

    # delete the snapshot
    [root@hadoop01 ~]# hdfs dfs -deleteSnapshot /testSnapshot newnameSnapshot
    20/02/16 21:57:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    # the snapshot is gone
    [root@hadoop01 ~]# hadoop fs -ls /testSnapshot/.snapshot
    20/02/16 21:57:51 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
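
    Once every snapshot under the directory has been deleted, snapshotting can be switched off again with the -disallowSnapshot option listed in the dfsadmin usage above (a sketch; it fails if any snapshot still exists):

    [root@hadoop01 ~]# hdfs dfsadmin -disallowSnapshot /testSnapshot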

    ...TOADD

    Using hdfs fsck

    The usage message below lists the available options; fsck reports key information about file and block health.

    [hadoop@node01 ~]$ hdfs fsck
    Usage: DFSck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]] [-maintenance]
        <path>    start checking from this path
        -move    move corrupted files to /lost+found
        -delete    delete corrupted files
        -files    print out files being checked
        -openforwrite    print out files opened for write
        -includeSnapshots    include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
        -list-corruptfileblocks    print out list of missing blocks and files they belong to
        -blocks    print out block report
        -locations    print out locations for every block
        -racks    print out network topology for data-node locations
    
        -maintenance    print out maintenance state node details
        -blockId    print out which file this blockId belongs to, locations (nodes, racks) of this block, and other diagnostics info (under replicated, corrupted or not, etc)
    
    Please Note:
        1. By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually  tagged CORRUPT or HEALTHY depending on their block allocation status
        2. Option -includeSnapshots should not be used for comparing stats, should be used only for HEALTH check, as this may contain duplicates if the same file present in both original fs tree and inside snapshots.
    
    Generic options supported are
    -conf <configuration file>     specify an application configuration file
    -D <property=value>            use value for given property
    -fs <local|namenode:port>      specify a namenode
    -jt <local|resourcemanager:port>    specify a ResourceManager
    -files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.
    
    The general command line syntax is
    bin/hadoop command [genericOptions] [commandOptions]
    

    (1) Show block information for a file

    [hadoop@node01 ~]$ hdfs fsck  /readme.txt -files -blocks -locations
    20/02/12 19:03:00 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Connecting to namenode via http://node01:50070/fsck?ugi=hadoop&files=1&blocks=1&locations=1&path=%2Freadme.txt
    FSCK started by hadoop (auth:SIMPLE) from /192.168.200.100 for path /readme.txt at Wed Feb 12 19:03:01 CST 2020
    /readme.txt 36 bytes, 1 block(s):  OK
    # block pool info and the datanodes holding each replica
    0. BP-1783492158-192.168.200.100-1571483500510:blk_1073741846_1022 len=36 Live_repl=3 [DatanodeInfoWithStorage[192.168.200.100:50010,DS-76c283ae-9025-4959-98bc-69d064c3f3ef,DISK], DatanodeInfoWithStorage[192.168.200.120:50010,DS-467b3182-8386-4beb-ad37-14392958e81a,DISK], DatanodeInfoWithStorage[192.168.200.110:50010,DS-ba33d23f-c8b4-4d4d-bd64-78a86e6dc2ac,DISK]]
    
    Status: HEALTHY
     Total size:    36 B
     Total dirs:    0
     Total files:    1
     Total symlinks:        0
     Total blocks (validated):    1 (avg. block size 36 B)
     Minimally replicated blocks:    1 (100.0 %)
     Over-replicated blocks:    0 (0.0 %)
     Under-replicated blocks:    0 (0.0 %)
     Mis-replicated blocks:        0 (0.0 %)
     Default replication factor:    3
     Average block replication:    3.0
     Corrupt blocks:        0
     Missing replicas:        0 (0.0 %)
     Number of data-nodes:        3
     Number of racks:        1
    FSCK ended at Wed Feb 12 19:03:01 CST 2020 in 4 milliseconds
    
    # healthy
    The filesystem under path '/readme.txt' is HEALTHY

    (2) Check for corrupt files

    [hadoop@node01 ~]$ hdfs fsck -list-corruptfileblocks
    20/02/12 19:13:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Connecting to namenode via http://node01:50070/fsck?ugi=hadoop&listcorruptfileblocks=1&path=%2F
    The filesystem under path '/' has 0 CORRUPT files
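
    If corrupt files are reported, the -move and -delete options from the usage above can be used to deal with them. A hedged sketch (both act on the files that own the corrupt blocks, and -delete is irreversible):

    [hadoop@node01 ~]$ hdfs fsck / -list-corruptfileblocks    # list the corrupt files first
    [hadoop@node01 ~]$ hdfs fsck / -move                      # move corrupted files to /lost+found
    [hadoop@node01 ~]$ hdfs fsck / -delete                    # or delete them outright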

    ...TOADD

    Other commands

    (1) Check which native compression libraries are available

    [hadoop@node01 ~]$ hadoop checknative
    20/02/12 19:06:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Native library checking:
    hadoop:  false
    zlib:    false
    snappy:  false
    lz4:     false
    bzip2:   false
    openssl: false

    (2) Format the NameNode. This is normally only done once, when a cluster is first set up, because it wipes the existing metadata.

    hadoop namenode -format
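
    In newer releases the same operation is invoked through the hdfs script instead of the deprecated hadoop namenode form; both wipe the NameNode metadata, so only run this on a brand-new cluster.

    hdfs namenode -format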

    (3) Run a jar

    hadoop jar <jar file> <fully qualified main class> [args]
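
    A hedged example using the bundled MapReduce examples jar (the jar file name depends on your Hadoop version, and /input and /output are placeholder HDFS paths):

    hadoop jar hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output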

    (4) Archiving and extracting small files

    Small files are usually handled either by packing them into a HAR (Hadoop Archive) or by storing them in SequenceFiles. To create or extract a HAR, use the commands below.

    # prepare some files first
    [root@hadoop01 /home]# hadoop fs -ls -R /archive
    20/02/16 10:41:55 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 /archive/th1
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:40 /archive/th1/1.txt
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th1/2.txt
    drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 /archive/th2
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th2/3.txt
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:41 /archive/th2/4.txt
    # usage message for hadoop archive
    # -p  specifies the parent directory
    # -r  specifies the replication factor
    # src: subdirectories (relative to the parent) to archive
    # dest: output path of the archive
    [root@hadoop01 /home]# hadoop archive
    archive -archiveName <NAME>.har -p <parent path> [-r <replication factor>]<src>* <dest>
    
    Invalid usage.
    # archive both th1 and th2 under /archive into test.har
    [root@hadoop01 /home]# hadoop archive -archiveName test.har -p /archive -r 1 th1 th2 /outhar
    20/02/16 10:44:04 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20/02/16 10:44:05 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
    20/02/16 10:44:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
    20/02/16 10:44:07 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
    20/02/16 10:44:07 INFO mapreduce.JobSubmitter: number of splits:1
    20/02/16 10:44:07 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1581820473891_0001
    20/02/16 10:44:08 INFO impl.YarnClientImpl: Submitted application application_1581820473891_0001
    20/02/16 10:44:08 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1581820473891_0001/
    20/02/16 10:44:08 INFO mapreduce.Job: Running job: job_1581820473891_0001
    20/02/16 10:44:16 INFO mapreduce.Job: Job job_1581820473891_0001 running in uber mode : true
    20/02/16 10:44:16 INFO mapreduce.Job:  map 0% reduce 0%
    20/02/16 10:44:19 INFO mapreduce.Job:  map 100% reduce 100%
    20/02/16 10:44:19 INFO mapreduce.Job: Job job_1581820473891_0001 completed successfully
    20/02/16 10:44:19 INFO mapreduce.Job: Counters: 52
        File System Counters
            FILE: Number of bytes read=1014
            FILE: Number of bytes written=1537
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=1196
            HDFS: Number of bytes written=257101
            HDFS: Number of read operations=67
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=15
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=820
            Total time spent by all reduces in occupied slots (ms)=422
            TOTAL_LAUNCHED_UBERTASKS=2
            NUM_UBER_SUBMAPS=1
            NUM_UBER_SUBREDUCES=1
            Total time spent by all map tasks (ms)=820
            Total time spent by all reduce tasks (ms)=422
            Total vcore-seconds taken by all map tasks=820
            Total vcore-seconds taken by all reduce tasks=422
            Total megabyte-seconds taken by all map tasks=839680
            Total megabyte-seconds taken by all reduce tasks=432128
        Map-Reduce Framework
            Map input records=7
            Map output records=7
            Map output bytes=471
            Map output materialized bytes=491
            Input split bytes=116
            Combine input records=0
            Combine output records=0
            Reduce input groups=7
            Reduce shuffle bytes=491
            Reduce input records=7
            Reduce output records=0
            Spilled Records=14
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=126
            CPU time spent (ms)=1200
            Physical memory (bytes) snapshot=516005888
            Virtual memory (bytes) snapshot=5989122048
            Total committed heap usage (bytes)=262676480
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=467
        File Output Format Counters
            Bytes Written=0
    # list the contents of the archive
    [root@hadoop01 /home]# hdfs dfs -ls -R har:///outhar/test.har
    20/02/16 10:44:48 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20/02/16 10:44:49 WARN hdfs.DFSClient: DFSInputStream has been closed already
    drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th1
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:40 har:///outhar/test.har/th1/1.txt
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th1/2.txt
    drwxr-xr-x   - root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2/3.txt
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:41 har:///outhar/test.har/th2/4.txt
    # extract sequentially with cp
    [root@hadoop01 /home]# hdfs dfs -cp har:///outhar/test.har/th1 hdfs:/unarchive1
    20/02/16 10:45:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20/02/16 10:45:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
    20/02/16 10:45:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
    20/02/16 10:45:37 WARN hdfs.DFSClient: DFSInputStream has been closed already
    # verify the extracted files
    [root@hadoop01 /home]# hadoop fs -ls /unarchive1
    20/02/16 10:45:53 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 2 items
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:45 /unarchive1/1.txt
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:45 /unarchive1/2.txt
    # extract in parallel with distcp (a MapReduce job)
    [root@hadoop01 /home]# hadoop distcp har:///outhar/test.har/th2 hdfs:/unarchive2
    20/02/16 10:46:26 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    20/02/16 10:46:26 INFO tools.DistCp: Input Options: DistCpOptions{atomicCommit=false, syncFolder=false, deleteMissing=false, ignoreFailures=false, maxMaps=20, sslConfigurationFile='null', copyStrategy='uniformsize', sourceFileListing=null, sourcePaths=[har:/outhar/test.har/th2], targetPath=hdfs:/unarchive2, targetPathExists=false, preserveRawXattrs=false}
    20/02/16 10:46:26 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
    20/02/16 10:46:27 WARN hdfs.DFSClient: DFSInputStream has been closed already
    20/02/16 10:46:27 INFO Configuration.deprecation: io.sort.mb is deprecated. Instead, use mapreduce.task.io.sort.mb
    20/02/16 10:46:27 INFO Configuration.deprecation: io.sort.factor is deprecated. Instead, use mapreduce.task.io.sort.factor
    20/02/16 10:46:27 INFO client.RMProxy: Connecting to ResourceManager at hadoop01/192.168.200.140:8032
    20/02/16 10:46:28 INFO mapreduce.JobSubmitter: number of splits:1
    20/02/16 10:46:28 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1581820473891_0002
    20/02/16 10:46:28 INFO impl.YarnClientImpl: Submitted application application_1581820473891_0002
    20/02/16 10:46:28 INFO mapreduce.Job: The url to track the job: http://hadoop01:8088/proxy/application_1581820473891_0002/
    20/02/16 10:46:28 INFO tools.DistCp: DistCp job-id: job_1581820473891_0002
    20/02/16 10:46:28 INFO mapreduce.Job: Running job: job_1581820473891_0002
    20/02/16 10:46:35 INFO mapreduce.Job: Job job_1581820473891_0002 running in uber mode : true
    20/02/16 10:46:35 INFO mapreduce.Job:  map 0% reduce 0%
    20/02/16 10:46:36 INFO mapreduce.Job:  map 100% reduce 0%
    20/02/16 10:46:37 INFO mapreduce.Job: Job job_1581820473891_0002 completed successfully
    20/02/16 10:46:37 INFO mapreduce.Job: Counters: 35
        File System Counters
            FILE: Number of bytes read=0
            FILE: Number of bytes written=0
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=1077
            HDFS: Number of bytes written=126136
            HDFS: Number of read operations=78
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=10
        Job Counters
            Launched map tasks=1
            Other local map tasks=1
            Total time spent by all maps in occupied slots (ms)=1451
            Total time spent by all reduces in occupied slots (ms)=0
            TOTAL_LAUNCHED_UBERTASKS=1
            NUM_UBER_SUBMAPS=1
            Total time spent by all map tasks (ms)=1451
            Total vcore-seconds taken by all map tasks=1451
            Total megabyte-seconds taken by all map tasks=1485824
        Map-Reduce Framework
            Map input records=3
            Map output records=0
            Input split bytes=135
            Spilled Records=0
            Failed Shuffles=0
            Merged Map outputs=0
            GC time elapsed (ms)=51
            CPU time spent (ms)=560
            Physical memory (bytes) snapshot=155303936
            Virtual memory (bytes) snapshot=2993799168
            Total committed heap usage (bytes)=25538560
        File Input Format Counters
            Bytes Read=461
        File Output Format Counters
            Bytes Written=0
        org.apache.hadoop.tools.mapred.CopyMapper$Counter
            BYTESCOPIED=0
            BYTESEXPECTED=0
            COPY=3
    # extraction complete
    [root@hadoop01 /home]# hadoop fs -ls /unarchive2
    20/02/16 10:47:02 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 2 items
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:46 /unarchive2/3.txt
    -rw-r--r--   1 root supergroup          0 2020-02-16 10:46 /unarchive2/4.txt
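
    On HDFS the archive itself is just a directory, so it can also be inspected without the har:// scheme. As a rough sketch, listing it directly should show the archive's internal index and data files (typically _index, _masterindex and part-0; sizes will vary):

    [root@hadoop01 /home]# hadoop fs -ls /outhar/test.har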

    That's all for now; more will be added over time.

  • Original article: https://www.cnblogs.com/youngchaolin/p/12300387.html