  • Hadoop test example: wordcount

      1. Create a test directory in HDFS

    [root@localhost hadoop-1.1.1]# bin/hadoop dfs -mkdir /hadoop/input
    

      2. Create a test file

    [root@localhost test]# vi test.txt
    
    hello hadoop
    hello World
    Hello Java
    Hey man
    i am a programmer
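
        As an alternative to editing the file in vi, the same file can be written non-interactively with a here-document. This is a minimal sketch that assumes it is run from the hadoop-1.1.1 directory, so the file ends up at ./test/test.txt, the path used in the next step.

```shell
# Create the test directory (relative to the current directory) if it is missing.
mkdir -p test

# Write the five sample lines; the quoted 'EOF' prevents any shell expansion.
cat > test/test.txt <<'EOF'
hello hadoop
hello World
Hello Java
Hey man
i am a programmer
EOF

# Show the line count as a quick sanity check (5 lines expected).
wc -l < test/test.txt
```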

      3. Upload the test file to the test directory

    [root@localhost hadoop-1.1.1]# bin/hadoop dfs -put ./test/test.txt /hadoop/input

      4. Run the wordcount program

    [root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output

        The /hadoop/output directory must not exist before the job runs; otherwise the job fails with:

    org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory /hadoop/output already exists

        Hadoop jobs are resource-intensive computations, so by default their results cannot be overwritten.
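
        When rerunning the job during testing, the old output directory has to be removed first. The sketch below wraps both steps in one function. The function name rerun_wordcount and the HADOOP variable are hypothetical names introduced here for illustration; -rmr is the recursive delete of the Hadoop 1.x shell. Note that this discards the previous results, so it is only suitable for throwaway test output.

```shell
# Path to the hadoop launcher; defaults to the relative path used in this walkthrough.
HADOOP="${HADOOP:-bin/hadoop}"

# Hypothetical helper: delete the output directory (if any), then run wordcount.
# Usage: rerun_wordcount <hdfs-input> <hdfs-output>
rerun_wordcount() {
    input="$1"
    output="$2"
    # Recursive delete; an error here (e.g. directory absent) is ignored on purpose.
    "$HADOOP" dfs -rmr "$output" 2>/dev/null || true
    # Same job invocation as in step 4.
    "$HADOOP" jar hadoop-examples-1.1.1.jar wordcount "$input" "$output"
}
```

        It would be called as: rerun_wordcount '/hadoop/input/*' /hadoop/output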

        If the job succeeds, output like the following is printed:

    [root@localhost hadoop-1.1.1]# bin/hadoop jar hadoop-examples-1.1.1.jar wordcount /hadoop/input/* /hadoop/output
    
    13/01/17 00:36:06 INFO input.FileInputFormat: Total input paths to process : 1
    13/01/17 00:36:06 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/01/17 00:36:06 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/01/17 00:36:07 INFO mapred.JobClient: Running job: job_201301162205_0006
    13/01/17 00:36:08 INFO mapred.JobClient:  map 0% reduce 0%
    13/01/17 00:36:14 INFO mapred.JobClient:  map 100% reduce 0%
    13/01/17 00:36:22 INFO mapred.JobClient:  map 100% reduce 33%
    13/01/17 00:36:24 INFO mapred.JobClient:  map 100% reduce 100%
    13/01/17 00:36:25 INFO mapred.JobClient: Job complete: job_201301162205_0006
    13/01/17 00:36:25 INFO mapred.JobClient: Counters: 29
    13/01/17 00:36:25 INFO mapred.JobClient:   Job Counters 
    13/01/17 00:36:25 INFO mapred.JobClient:     Launched reduce tasks=1
    13/01/17 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=6863
    13/01/17 00:36:25 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
    13/01/17 00:36:25 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
    13/01/17 00:36:25 INFO mapred.JobClient:     Launched map tasks=1
    13/01/17 00:36:25 INFO mapred.JobClient:     Data-local map tasks=1
    13/01/17 00:36:25 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=9207
    13/01/17 00:36:25 INFO mapred.JobClient:   File Output Format Counters 
    13/01/17 00:36:25 INFO mapred.JobClient:     Bytes Written=78
    13/01/17 00:36:25 INFO mapred.JobClient:   FileSystemCounters
    13/01/17 00:36:25 INFO mapred.JobClient:     FILE_BYTES_READ=128
    13/01/17 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_READ=170
    13/01/17 00:36:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=48059
    13/01/17 00:36:25 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=78
    13/01/17 00:36:25 INFO mapred.JobClient:   File Input Format Counters 
    13/01/17 00:36:25 INFO mapred.JobClient:     Bytes Read=62
    13/01/17 00:36:25 INFO mapred.JobClient:   Map-Reduce Framework
    13/01/17 00:36:25 INFO mapred.JobClient:     Map output materialized bytes=128
    13/01/17 00:36:25 INFO mapred.JobClient:     Map input records=5
    13/01/17 00:36:25 INFO mapred.JobClient:     Reduce shuffle bytes=128
    13/01/17 00:36:25 INFO mapred.JobClient:     Spilled Records=22
    13/01/17 00:36:25 INFO mapred.JobClient:     Map output bytes=110
    13/01/17 00:36:25 INFO mapred.JobClient:     CPU time spent (ms)=1650
    13/01/17 00:36:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=176492544
    13/01/17 00:36:25 INFO mapred.JobClient:     Combine input records=12
    13/01/17 00:36:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=108
    13/01/17 00:36:25 INFO mapred.JobClient:     Reduce input records=11
    13/01/17 00:36:25 INFO mapred.JobClient:     Reduce input groups=11
    13/01/17 00:36:25 INFO mapred.JobClient:     Combine output records=11
    13/01/17 00:36:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=180088832
    13/01/17 00:36:25 INFO mapred.JobClient:     Reduce output records=11
    13/01/17 00:36:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=756244480
    13/01/17 00:36:25 INFO mapred.JobClient:     Map output records=12
    [root@localhost hadoop-1.1.1]# 
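
        Several counters in the log above can be cross-checked against the input file with standard tools, without Hadoop. This is a rough sketch that recreates the sample file in a temporary directory; it assumes the file ends with a newline and has no trailing whitespace, which is consistent with "Bytes Read=62" in the log.

```shell
# Recreate the sample input in a scratch directory.
tmp="$(mktemp -d)"
printf '%s\n' 'hello hadoop' 'hello World' 'Hello Java' 'Hey man' 'i am a programmer' > "$tmp/test.txt"

records=$(wc -l < "$tmp/test.txt" | tr -d ' ')   # lines: compare with "Map input records"
bytes=$(wc -c < "$tmp/test.txt" | tr -d ' ')     # bytes: compare with "Bytes Read"
tokens=$(wc -w < "$tmp/test.txt" | tr -d ' ')    # words: compare with "Map output records"
# Distinct words (case-sensitive): compare with "Reduce output records".
distinct=$(tr -s ' ' '\n' < "$tmp/test.txt" | LC_ALL=C sort -u | wc -l | tr -d ' ')

echo "records=$records bytes=$bytes tokens=$tokens distinct=$distinct"
# prints: records=5 bytes=62 tokens=12 distinct=11
```

        The 12 map output records shrink to 11 after the combiner because "hello" occurs twice, which matches "Combine input records=12" and "Combine output records=11".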

      5. View the results

        The wordcount program counts how many times each word occurs in the input files and writes the result to /hadoop/output/part-r-00000.

    [root@localhost hadoop-1.1.1]# bin/hadoop dfs -ls /hadoop/output
    
    Found 3 items
    -rw-r--r--   1 root supergroup          0 2013-01-17 00:36 /hadoop/output/_SUCCESS
    drwxr-xr-x   - root supergroup          0 2013-01-17 00:36 /hadoop/output/_logs
    -rw-r--r--   1 root supergroup         78 2013-01-17 00:36 /hadoop/output/part-r-00000
    [root@localhost hadoop-1.1.1]#
    [root@localhost hadoop-1.1.1]# bin/hadoop dfs -cat /hadoop/output/part-r-00000
    
    Hello   1
    Hey     1
    Java    1
    World   1
    a       1
    am      1
    hadoop  1
    hello   2
    i       1
    man     1
    programmer      1
    [root@localhost hadoop-1.1.1]#
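
        The result above can be reproduced locally as a sanity check. The sketch below is only an approximation of the job using standard Unix tools, not the Hadoop implementation: it splits on whitespace, sorts bytewise (so uppercase sorts before lowercase, as in the output above), and counts case-sensitively, which is why "Hello" and "hello" appear as separate words. The function name wordcount_local is made up for this example.

```shell
# Recreate the sample input in a scratch directory.
tmp="$(mktemp -d)"
printf '%s\n' 'hello hadoop' 'hello World' 'Hello Java' 'Hey man' 'i am a programmer' > "$tmp/test.txt"

# Hypothetical local stand-in for the job: one token per line, bytewise sort,
# count duplicates, then print word<TAB>count like part-r-00000.
wordcount_local() {
    tr -s ' \t' '\n' < "$1" \
        | grep -v '^$' \
        | LC_ALL=C sort \
        | uniq -c \
        | awk '{ print $2 "\t" $1 }'
}

wordcount_local "$tmp/test.txt"
```

        For this input the function prints the same 11 lines as part-r-00000, 78 bytes in total, matching "Bytes Written=78" in the job log.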
  • Original article: https://www.cnblogs.com/luxh/p/2863612.html