  • [b0004] Running wordcount, the Hadoop MapReduce "hello world"

    Goal:

    Get a first hands-on feel for Hadoop MapReduce.

    Environment:

    Hadoop 2.6.4

    1 Prepare the input file

    paper.txt can be any English text; paste in whatever you like.
    hadoop@ssmaster:~$ hadoop fs -mkdir /input
    hadoop@ssmaster:~$ ls
    Desktop  Documents  Downloads  examples.desktop  hadoop-2.6.4.tar.gz  Music  paper.txt  Pictures  Public  Templates  Videos
    hadoop@ssmaster:~$ hadoop fs -put paper.txt  /input
    hadoop@ssmaster:~$ hadoop fs -ls /input
    Found 1 items
    -rw-r--r--   1 hadoop supergroup       1762 2016-10-23 00:45 /input/paper.txt

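    Note: do not create the output directory /output in advance; the job creates it itself. If /output already exists, the job refuses to start (it fails with a FileAlreadyExistsException), so delete it before running the job again.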
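    If /output is left over from a previous run, the job will not start because Hadoop refuses to overwrite an existing output directory. A re-run therefore has to delete that directory first. The following is a minimal Python sketch that chains the two shell commands. The jar path is copied from the command in step 2. `wordcount_commands` and `rerun_wordcount` are hypothetical helper names and are not part of Hadoop.

```python
import subprocess

# Assumption: jar path copied from the run command in step 2; adjust to your install.
EXAMPLES_JAR = "/opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar"


def wordcount_commands(input_dir: str, output_dir: str, jar: str = EXAMPLES_JAR) -> list[list[str]]:
    """Hypothetical helper: argv lists for a clean re-run (delete old output, then run)."""
    # -r deletes recursively; -f suppresses the error if the directory does not exist.
    cleanup = ["hadoop", "fs", "-rm", "-r", "-f", output_dir]
    run = ["hadoop", "jar", jar, "wordcount", input_dir, output_dir]
    return [cleanup, run]


def rerun_wordcount(input_dir: str = "/input", output_dir: str = "/output") -> None:
    """Hypothetical helper: run both commands, raising CalledProcessError on the first failure."""
    for argv in wordcount_commands(input_dir, output_dir):
        subprocess.run(argv, check=True)
```

    Calling `rerun_wordcount()` on the cluster node runs the same two commands you would otherwise type by hand. Keeping the argv lists in a separate function makes them easy to print or check before anything touches HDFS.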

    2 Run the job

    hadoop@ssmaster:~$ hadoop jar /opt/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.4.jar  wordcount /input /output
    16/10/23 00:51:09 INFO client.RMProxy: Connecting to ResourceManager at ssmaster/192.168.249.144:8032
    16/10/23 00:51:11 INFO input.FileInputFormat: Total input paths to process : 1
    16/10/23 00:51:12 INFO mapreduce.JobSubmitter: number of splits:1
    16/10/23 00:51:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1477208120905_0001
    16/10/23 00:51:14 INFO impl.YarnClientImpl: Submitted application application_1477208120905_0001
    16/10/23 00:51:14 INFO mapreduce.Job: The url to track the job: http://ssmaster:8088/proxy/application_1477208120905_0001/
    16/10/23 00:51:14 INFO mapreduce.Job: Running job: job_1477208120905_0001
    16/10/23 00:51:38 INFO mapreduce.Job: Job job_1477208120905_0001 running in uber mode : false
    16/10/23 00:51:38 INFO mapreduce.Job:  map 0% reduce 0%
    16/10/23 00:52:17 INFO mapreduce.Job: map 100% reduce 0%
    16/10/23 00:52:39 INFO mapreduce.Job: map 100% reduce 100%
    16/10/23 00:52:41 INFO mapreduce.Job: Job job_1477208120905_0001 completed successfully
    16/10/23 00:52:41 INFO mapreduce.Job: Counters: 49
        File System Counters
            FILE: Number of bytes read=2061
            FILE: Number of bytes written=217797
            FILE: Number of read operations=0
            FILE: Number of large read operations=0
            FILE: Number of write operations=0
            HDFS: Number of bytes read=1863
            HDFS: Number of bytes written=1425
            HDFS: Number of read operations=6
            HDFS: Number of large read operations=0
            HDFS: Number of write operations=2
        Job Counters
            Launched map tasks=1
            Launched reduce tasks=1
            Data-local map tasks=1
            Total time spent by all maps in occupied slots (ms)=35792
            Total time spent by all reduces in occupied slots (ms)=18540
            Total time spent by all map tasks (ms)=35792
            Total time spent by all reduce tasks (ms)=18540
            Total vcore-milliseconds taken by all map tasks=35792
            Total vcore-milliseconds taken by all reduce tasks=18540
            Total megabyte-milliseconds taken by all map tasks=36651008
            Total megabyte-milliseconds taken by all reduce tasks=18984960
        Map-Reduce Framework
            Map input records=11
            Map output records=303
            Map output bytes=2969
            Map output materialized bytes=2061
            Input split bytes=101
            Combine input records=303
            Combine output records=158
            Reduce input groups=158
            Reduce shuffle bytes=2061
            Reduce input records=158
            Reduce output records=158
            Spilled Records=316
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=1093
            CPU time spent (ms)=5550
            Physical memory (bytes) snapshot=442781696
            Virtual memory (bytes) snapshot=1448112128
            Total committed heap usage (bytes)=276299776
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters
            Bytes Read=1762
        File Output Format Counters
            Bytes Written=1425
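    The Map-Reduce Framework counters follow the phases of the job:

    - Map input records=11: paper.txt has 11 lines, with one record per line.
    - Map output records=303: the mapper emitted 303 (word, 1) pairs.
    - Combine output records=158 and Reduce output records=158: there are 158 distinct words, so the result file has 158 lines.

    The sketch below replays that pipeline in a single Python process, so the numbers can be checked against a local copy of paper.txt. It is an illustration and not Hadoop's code. It assumes that the bundled example's mapper tokenizes with Java's StringTokenizer defaults (space, tab, newline, carriage return, form feed). It also assumes a single map task, as in this run (Launched map tasks=1). `simulate_wordcount` is a hypothetical name.

```python
import re
from collections import Counter

# Assumption: mirrors java.util.StringTokenizer's default delimiters " \t\n\r\f".
_DELIMS = re.compile(r"[ \t\n\r\f]+")


def simulate_wordcount(text: str):
    """Single-process imitation of map -> combine -> reduce for one input split.

    Returns (list of (word, count) sorted by word, dict of counter-like values).
    """
    # Input: one record per line (approximated here with str.splitlines()).
    lines = text.splitlines()

    # Map: emit (word, 1) for every non-empty token.
    map_output = [(w, 1) for line in lines for w in _DELIMS.split(line) if w]

    # Combine: the map side pre-sums the 1s per word, shrinking what is shuffled.
    combined = Counter()
    for word, one in map_output:
        combined[word] += one

    # Reduce: keys arrive sorted. With one map task the partial sums are already final;
    # with several map tasks this is where their partial sums would be added together.
    reduced = sorted(combined.items())

    counters = {
        "Map input records": len(lines),
        "Map output records": len(map_output),
        "Combine output records": len(combined),
        "Reduce output records": len(reduced),
    }
    return reduced, counters
```

    For example, `simulate_wordcount("a b a\nc a\n")` returns `[("a", 3), ("b", 1), ("c", 1)]` with 2 input records, 5 map output records and 3 combine/reduce output records. Run on paper.txt, it should report numbers comparable to the counters above, as long as the assumptions hold.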

    The job status can also be followed in the ResourceManager web UI:

    http://ssmaster:8088/cluster

    Cluster Metrics

        Apps Submitted          1
        Apps Pending            0
        Apps Running            1
        Apps Completed          0
        Containers Running      2
        Memory Used             3 GB
        Memory Total            8 GB
        Memory Reserved         0 B
        VCores Used             2
        VCores Total            8
        VCores Reserved         0
        Active Nodes            1
        Decommissioned Nodes    0
        Lost Nodes              0
        Unhealthy Nodes         0
        Rebooted Nodes          0

        ID                  application_1477208120905_0001
        User                hadoop
        Name                word count
        Application Type    MAPREDUCE
        Queue               default
        StartTime           Sun, 23 Oct 2016 07:51:13 GMT
        FinishTime          N/A
        State               RUNNING
        FinalStatus         UNDEFINED
        Progress            (not captured)
        Tracking UI         ApplicationMaster
        Blacklisted Nodes   0
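    The ResourceManager REST API (`/ws/v1/cluster/apps`) exposes the same application list as JSON, which is easier to script against than the HTML page. Below is a minimal sketch. It assumes the ResourceManager address from the URL above and Python 3.9+. `fetch_apps` and `summarize_apps` are hypothetical helper names. The JSON field names used (`id`, `name`, `state`, `finalStatus`, `progress`) are those returned by the YARN REST API.

```python
import json
import urllib.request


def fetch_apps(rm_url: str = "http://ssmaster:8088") -> dict:
    """GET the ResourceManager's application list and parse the JSON body."""
    req = urllib.request.Request(
        rm_url + "/ws/v1/cluster/apps", headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def summarize_apps(payload: dict) -> list[tuple]:
    """Extract (id, name, state, finalStatus, progress) from a /ws/v1/cluster/apps response."""
    # "apps" can be null when nothing has been submitted yet, so fall back to an empty dict.
    apps = payload.get("apps") or {}
    return [
        (a["id"], a["name"], a["state"], a["finalStatus"], a["progress"])
        for a in apps.get("app", [])
    ]
```

    On the cluster, `summarize_apps(fetch_apps())` gives one tuple per application, so a script can poll until the word count job's state changes from RUNNING to FINISHED.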

    3 View the output

    hadoop@ssmaster:~$ hadoop fs -ls /output
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2016-10-23 00:52 /output/_SUCCESS
    -rw-r--r--   1 hadoop supergroup       1425 2016-10-23 00:52 /output/part-r-00000
    hadoop@ssmaster:~$ hadoop fs -cat  /output/part-r-00000
    Always    1
    Dream    1
    There    1
    a    4
    all    1
    along    1
    always    1
    ...........
    ...........
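    Each line of part-r-00000 contains a word, a tab (TextOutputFormat's default separator, shown as spaces in the listing above) and the count. Keys are compared as raw bytes, so capitalized words sort before lowercase ones. Counting is also case-sensitive: `Always` and `always` are separate entries. To see the most frequent words instead of the alphabetical list, a small Python sketch like the one below can post-process the file. The function names are hypothetical. The input text could come from `hadoop fs -cat /output/part-r-00000`.

```python
def parse_wordcount_output(text: str) -> dict[str, int]:
    """Parse 'word<TAB>count' lines (reducer output) into a dict."""
    counts = {}
    for line in text.splitlines():
        if not line:
            continue  # skip blank lines, e.g. a trailing newline
        word, _, count = line.partition("\t")
        counts[word] = int(count)
    return counts


def top_words(counts: dict[str, int], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most frequent words, ties broken by the same ordering Hadoop used."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]
```

    With the lines shown above, `top_words(parse_wordcount_output(text), 2)` puts `("a", 4)` first, followed by `("Always", 1)`.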

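    Q Summary

    Very simple; it didn't feel like much yet.

    Next steps:

    •     Write my own MapReduce wordcount program
    •     Set up a fully distributed cluster, run the same program on a large file, and compare the speed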
  • Original article: https://www.cnblogs.com/sunzebo/p/5990175.html