  • Running the WordCount example that ships with Hadoop

    The WordCount example that ships with Hadoop counts how many times each word occurs in a set of text files.
    The steps below show how to run it.

    1. Start Hadoop

    [root@hadoop ~]# start-all.sh # start the Hadoop daemons
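
    Before moving on, it can help to confirm that the daemons actually came up. The sketch below assumes a single-node (pseudo-distributed) Hadoop 2.x setup, where `jps` normally lists NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager. The helper name `missing_daemons` is made up for this illustration.

    ```shell
    #!/bin/sh
    # Sketch: report which expected Hadoop daemons are absent from `jps` output.
    # Assumption: single-node Hadoop 2.x, so all five daemons run on this host.

    missing_daemons() {
        # Reads `jps`-style lines ("<pid> <name>") on stdin and prints the name
        # of every expected daemon that does not appear in the second column.
        listing=$(cat)
        for daemon in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
            printf '%s\n' "$listing" |
                awk -v name="$daemon" '$2 == name { found = 1 } END { exit !found }' ||
                echo "$daemon"
        done
    }

    if command -v jps >/dev/null 2>&1; then
        jps | missing_daemons    # prints nothing when every daemon is running
    else
        echo "jps not found on PATH; is the JDK installed?"
    fi
    ```

    If the script prints a daemon name, the logs under the Hadoop installation's logs directory are usually the first place to look.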

    2. Create a local directory and two files

    [root@hadoop ~]# mkdir input
    [root@hadoop ~]# cd input/
    [root@hadoop input]# echo "hello world">test1.txt # create two test files
    [root@hadoop input]# echo "hello hadoop">test2.txt
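
    To know what the job should produce before involving Hadoop at all, the same count can be computed locally with standard Unix tools. This is only a sketch for comparison, assuming words are separated by whitespace as in the two test files. The function name `wordcount_local` and the scratch directory are illustrative, not part of Hadoop.

    ```shell
    #!/bin/sh
    # Sketch: compute word counts locally with standard Unix tools.
    # Assumption: words are separated by whitespace, as in the two test files.

    wordcount_local() {
        # $1 = directory holding the *.txt input files
        # One word per line -> sort -> count duplicates -> print "word<TAB>count".
        cat "$1"/*.txt |
            tr -s '[:space:]' '\n' |
            grep -v '^$' |
            sort |
            uniq -c |
            awk '{ print $2 "\t" $1 }'
    }

    # Recreate the two test files in a scratch directory and count their words.
    workdir=$(mktemp -d)
    echo "hello world"  > "$workdir/test1.txt"
    echo "hello hadoop" > "$workdir/test2.txt"
    wordcount_local "$workdir"
    rm -rf "$workdir"
    ```

    For these two files the pipeline prints hadoop 1, hello 2 and world 1, which is the result the MapReduce job is expected to reproduce.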

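    3. Copy the input directory from the local file system to the HDFS root directory, renaming it to in

    [root@hadoop input]# cd ~ # go back to the directory that contains input/
    [root@hadoop ~]# hdfs dfs -put input/ /in
    [root@hadoop ~]# hdfs dfs -ls / # list the HDFS root directory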
    Found 1 items
    drwxr-xr-x - root supergroup 0 2018-07-20 03:06 /in
    [root@hadoop ~]# hdfs dfs -ls /in # list the /in directory
    Found 2 items
    -rw-r--r-- 1 root supergroup 12 2018-07-20 03:06 /in/test1.txt
    -rw-r--r-- 1 root supergroup 13 2018-07-20 03:06 /in/test2.txt
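
    The rename only works because /in does not exist yet. If it already exists as a directory, `-put` copies the local directory into it (giving /in/input) instead of renaming it. A minimal sketch of a guarded upload follows, assuming `hdfs` is on the PATH. `upload_as` is a hypothetical helper name, not an HDFS command.

    ```shell
    #!/bin/sh
    # Sketch: upload a local directory to HDFS under a new name, refusing to
    # proceed when the target already exists.
    # Assumption: `hdfs` is on PATH; `upload_as` is a hypothetical helper name.

    upload_as() {
        src=$1    # local directory, e.g. input/
        dst=$2    # HDFS path that must not exist yet, e.g. /in
        if hdfs dfs -test -e "$dst"; then
            # With an existing directory, -put would nest the copy inside it
            # (/in/input) rather than rename it.
            echo "refusing to upload: $dst already exists on HDFS" >&2
            return 1
        fi
        hdfs dfs -put "$src" "$dst" && hdfs dfs -ls "$dst"
    }

    # Usage (run from the directory that contains input/):
    #   upload_as input/ /in
    ```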

    4. Run the following commands

    [root@hadoop ~]# cd /usr/local/hadoop/share/hadoop/mapreduce/ # the examples jar is stored in this directory
    [root@hadoop mapreduce]# hadoop jar hadoop-mapreduce-examples-2.7.7.jar wordcount /in /out # /out is the output directory; it must not exist before the job runs, otherwise the job fails
    18/07/30 14:02:11 INFO client.RMProxy: Connecting to ResourceManager at hadoop/192.168.42.133:8032
    18/07/30 14:02:13 INFO input.FileInputFormat: Total input paths to process : 2
    18/07/30 14:02:13 INFO mapreduce.JobSubmitter: number of splits:2
    18/07/30 14:02:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1532913019648_0002
    18/07/30 14:02:14 INFO impl.YarnClientImpl: Submitted application application_1532913019648_0002
    18/07/30 14:02:14 INFO mapreduce.Job: The url to track the job: http://hadoop:8088/proxy/application_1532913019648_0002/
    18/07/30 14:02:14 INFO mapreduce.Job: Running job: job_1532913019648_0002
    18/07/30 14:02:36 INFO mapreduce.Job: Job job_1532913019648_0002 running in uber mode : false
    18/07/30 14:02:36 INFO mapreduce.Job:  map 0% reduce 0%
    18/07/30 14:04:37 INFO mapreduce.Job:  map 67% reduce 0%
    18/07/30 14:04:42 INFO mapreduce.Job:  map 100% reduce 0%
    18/07/30 14:05:21 INFO mapreduce.Job:  map 100% reduce 100%
    18/07/30 14:05:23 INFO mapreduce.Job: Job job_1532913019648_0002 completed successfully
    18/07/30 14:05:26 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=55
            FILE: Number of bytes written=368074
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=217
            HDFS: Number of bytes written=25
            HDFS: Number of read operations=9
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters 
            Launched map tasks=2
            Launched reduce tasks=1
            Data-local map tasks=2
            Total time spent by all maps in occupied slots (ms)=259093
            Total time spent by all reduces in occupied slots (ms)=21736
            Total time spent by all map tasks (ms)=259093
            Total time spent by all reduce tasks (ms)=21736
            Total vcore-milliseconds taken by all map tasks=259093
            Total vcore-milliseconds taken by all reduce tasks=21736
            Total megabyte-milliseconds taken by all map tasks=265311232
            Total megabyte-milliseconds taken by all reduce tasks=22257664
        Map-Reduce Framework
            Map input records=2
            Map output records=4
            Map output bytes=41
            Map output materialized bytes=61
            Input split bytes=192
            Combine input records=4
            Combine output records=4
            Reduce input groups=3
            Reduce shuffle bytes=61
            Reduce input records=4
            Reduce output records=3
            Spilled Records=8
            Shuffled Maps =2
            Failed Shuffles=0
            Merged Map outputs=2
            GC time elapsed (ms)=847
            CPU time spent (ms)=4390
            Physical memory (bytes) snapshot=461631488
            Virtual memory (bytes) snapshot=6226669568
            Total committed heap usage (bytes)=277356544
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=25
        File Output Format Counters 
            Bytes Written=25
    The log above shows the progress of the MapReduce job as the command runs.
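
    Running the job a second time fails with a FileAlreadyExistsException if /out is still there, even when it is empty. One way to make reruns painless is to delete the stale output directory first. The sketch below assumes `hdfs` and `hadoop` are on the PATH and that the examples jar sits at the path used in this article. `run_wordcount` and `EXAMPLES_JAR` are hypothetical names, and the function deletes data, so use it only on output you no longer need.

    ```shell
    #!/bin/sh
    # Sketch: rerun WordCount safely by clearing a stale output directory first.
    # Assumptions: `hdfs` and `hadoop` are on PATH, the examples jar sits at the
    # path used in this article, and `run_wordcount` is a hypothetical helper.
    # WARNING: this deletes the output directory on HDFS if it exists.

    EXAMPLES_JAR=/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.7.jar

    run_wordcount() {
        in_dir=$1
        out_dir=$2
        # `-test -d` succeeds only when the path exists and is a directory.
        if hdfs dfs -test -d "$out_dir"; then
            hdfs dfs -rm -r "$out_dir" || return 1
        fi
        hadoop jar "$EXAMPLES_JAR" wordcount "$in_dir" "$out_dir"
    }

    # Usage:
    #   run_wordcount /in /out
    ```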

    5. View the output

    1) View the output files directly on HDFS

    [root@hadoop mapreduce]# hdfs dfs -ls /out
    Found 2 items
    -rw-r--r--   1 root supergroup          0 2018-07-30 14:05 /out/_SUCCESS
    -rw-r--r--   1 root supergroup         25 2018-07-30 14:05 /out/part-r-00000
    [root@hadoop mapreduce]# hdfs dfs -cat /out/part-r-00000
    hadoop    1
    hello    2
    world    1
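
    Several counters in the job log can be checked by hand against the input. The two input lines correspond to Map input records=2, the four words to Map output records=4, the three distinct words to Reduce output records=3, and the 12 + 13 input bytes to Bytes Read=25. The sketch below recomputes those numbers locally. The `count_*` helper names are invented for this illustration.

    ```shell
    #!/bin/sh
    # Sketch: recompute a few job counters from the two test files locally.
    # The count_* helpers are illustrative names, not Hadoop tools.

    count_lines()    { cat "$1"/*.txt | wc -l | tr -d ' '; }
    count_words()    { cat "$1"/*.txt | wc -w | tr -d ' '; }
    count_bytes()    { cat "$1"/*.txt | wc -c | tr -d ' '; }
    count_distinct() {
        # One word per line, then count the unique ones.
        cat "$1"/*.txt | tr -s '[:space:]' '\n' | grep -v '^$' | sort -u | wc -l | tr -d ' '
    }

    # Recreate the two test files in a scratch directory.
    workdir=$(mktemp -d)
    echo "hello world"  > "$workdir/test1.txt"
    echo "hello hadoop" > "$workdir/test2.txt"

    echo "lines (Map input records):            $(count_lines "$workdir")"
    echo "words (Map output records):           $(count_words "$workdir")"
    echo "distinct words (Reduce output recs):  $(count_distinct "$workdir")"
    echo "input bytes (Bytes Read):             $(count_bytes "$workdir")"
    rm -rf "$workdir"
    ```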

    2) Alternatively, print every file in the output directory with one command

    [root@hadoop mapreduce]# hdfs dfs -cat /out/*
    hadoop    1
    hello    2
    world    1

    3) You can also copy the output to the local file system and view it there

    [root@hadoop mapreduce]# hdfs dfs -get /out /root/output
    [root@hadoop mapreduce]# cd  /root/output/
    [root@hadoop output]# ll
    total 4
    -rw-r--r-- 1 root root 25 Jul 30 17:18 part-r-00000
    -rw-r--r-- 1 root root  0 Jul 30 17:18 _SUCCESS
    [root@hadoop output]# cat part-r-00000 
    hadoop    1
    hello    2
    world    1
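
    WordCount writes its result sorted by word, not by frequency. Once the file is local, ranking the words takes one more pipeline. This sketch assumes each line is a word and a count separated by a tab, which is the default layout of the part-r-* files. `top_words` is a hypothetical helper name.

    ```shell
    #!/bin/sh
    # Sketch: rank WordCount output by frequency.
    # Assumption: each line is "word<TAB>count", the default part-r-* layout.

    top_words() {
        # Reads "word<TAB>count" lines on stdin and prints the $1 most frequent
        # words (default 10), breaking ties alphabetically.
        tab=$(printf '\t')
        sort -t "$tab" -k2,2nr -k1,1 | head -n "${1:-10}"
    }

    # Demo with the three result lines from this job.
    printf 'hadoop\t1\nhello\t2\nworld\t1\n' | top_words 2

    # On the copied file:
    #   top_words 10 < /root/output/part-r-00000
    ```

    The demo prints hello first, followed by hadoop, because hello has the highest count and hadoop sorts before world among the ties.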
  • Original article: https://www.cnblogs.com/zhengna/p/9391775.html