  • Running the Hadoop wordcount example, and basic HDFS operations

    1. Check the Hadoop version

    [hadoop@ltt1 sbin]$ hadoop version
    Hadoop 2.6.0-cdh5.12.0
    Subversion http://github.com/cloudera/hadoop -r dba647c5a8bc5e09b572d76a8d29481c78d1a0dd
    Compiled by jenkins on 2017-06-29T11:33Z
    Compiled with protoc 2.5.0
    From source with checksum 7c45ae7a4592ce5af86bc4598c5b4
    This command was run using /home/hadoop/hadoop260/share/hadoop/common/hadoop-common-2.6.0-cdh5.12.0.jar

    2. Hadoop's bundled example jar can be used to quickly test some functionality.


    List the MapReduce programs supported by hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar (a quick way to try one of them is sketched after the list):

    [hadoop@ltt1 sbin]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar
    An example program must be given as the first argument.
    Valid program names are:
      aggregatewordcount: An Aggregate based map/reduce program that counts the words in the input files.
      aggregatewordhist: An Aggregate based map/reduce program that computes the histogram of the words in the input files.
      bbp: A map/reduce program that uses Bailey-Borwein-Plouffe to compute exact digits of Pi.
      dbcount: An example job that count the pageview counts from a database.
      distbbp: A map/reduce program that uses a BBP-type formula to compute exact bits of Pi.
      grep: A map/reduce program that counts the matches of a regex in the input.
      join: A job that effects a join over sorted, equally partitioned datasets
      multifilewc: A job that counts words from several files.
      pentomino: A map/reduce tile laying program to find solutions to pentomino problems.
      pi: A map/reduce program that estimates Pi using a quasi-Monte Carlo method.
      randomtextwriter: A map/reduce program that writes 10GB of random textual data per node.
      randomwriter: A map/reduce program that writes 10GB of random data per node.
      secondarysort: An example defining a secondary sort to the reduce.
      sort: A map/reduce program that sorts the data written by the random writer.
      sudoku: A sudoku solver.
      teragen: Generate data for the terasort
      terasort: Run the terasort
      teravalidate: Checking results of terasort
      wordcount: A map/reduce program that counts the words in the input files.
      wordmean: A map/reduce program that counts the average length of the words in the input files.
      wordmedian: A map/reduce program that counts the median length of the words in the input files.
      wordstandarddeviation: A map/reduce program that counts the standard deviation of the length of the words in the input files.
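
    Any of these programs runs by giving its name as the first argument, followed by program-specific arguments. As a quick smoke test that needs no input data at all, the pi estimator can be tried as sketched below; the 10 maps and 100 samples per map are arbitrary illustrative values, not recommendations:

    # same examples jar as above; 10 maps with 100 samples each (illustrative values)
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar pi 10 100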

    3. Create a directory on HDFS

    hadoop fs -mkdir /input
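
    When intermediate directories may be missing, -mkdir also takes a -p flag that creates parents as needed, in the spirit of POSIX mkdir -p; the nested path below is purely hypothetical:

    # -p creates any missing parent directories; the path is only an example
    hadoop fs -mkdir -p /input/archive/2017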

    4. List the HDFS root directory

    [hadoop@ltt1 ~]$ hadoop fs -ls /
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2017-09-17 08:11 /input
    drwx------   - hadoop supergroup          0 2017-09-17 08:07 /tmp
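
    A single -ls shows only one level; to walk the whole tree, -ls accepts an -R flag for a recursive listing:

    # recursively list everything under the HDFS root (output omitted)
    hadoop fs -ls -R /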

    5. Upload local files to HDFS

    hadoop fs -put $HADOOP_HOME/*.txt /input
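
    -put fails if a file already exists at the destination; -copyFromLocal is an equivalent upload command and takes a -f option to overwrite an existing destination. A sketch:

    # re-upload, overwriting any existing copies under /input
    hadoop fs -copyFromLocal -f $HADOOP_HOME/*.txt /input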

    6. List the files under /input on HDFS

    [hadoop@ltt1 ~]$ hadoop fs -ls /input
    Found 3 items
    -rw-r--r--   2 hadoop supergroup      85063 2017-09-17 08:15 /input/LICENSE.txt
    -rw-r--r--   2 hadoop supergroup      14978 2017-09-17 08:15 /input/NOTICE.txt
    -rw-r--r--   2 hadoop supergroup       1366 2017-09-17 08:15 /input/README.txt
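
    To confirm how much space the uploads occupy, -du with the -h flag prints per-file sizes in human-readable units:

    # human-readable sizes of the files under /input
    hadoop fs -du -h /input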

    7. A simple wordcount test.


    [hadoop@ltt1 ~]$ hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.12.0.jar wordcount /input /output
    17/09/17 08:19:12 INFO input.FileInputFormat: Total input paths to process : 3
    17/09/17 08:19:13 INFO mapreduce.JobSubmitter: number of splits:3
    17/09/17 08:19:13 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1505605169997_0002
    17/09/17 08:19:14 INFO impl.YarnClientImpl: Submitted application application_1505605169997_0002
    17/09/17 08:19:14 INFO mapreduce.Job: The url to track the job: http://ltt1.bg.cn:9180/proxy/application_1505605169997_0002/
    17/09/17 08:19:14 INFO mapreduce.Job: Running job: job_1505605169997_0002
    17/09/17 08:19:27 INFO mapreduce.Job: Job job_1505605169997_0002 running in uber mode : false
    17/09/17 08:19:27 INFO mapreduce.Job:  map 0% reduce 0%
    17/09/17 08:19:39 INFO mapreduce.Job:  map 33% reduce 0%
    17/09/17 08:19:48 INFO mapreduce.Job:  map 100% reduce 0%
    17/09/17 08:19:50 INFO mapreduce.Job:  map 100% reduce 100%
    17/09/17 08:19:50 INFO mapreduce.Job: Job job_1505605169997_0002 completed successfully
    17/09/17 08:19:50 INFO mapreduce.Job: Counters: 50
    File System Counters
        FILE: Number of bytes read=42705
        FILE: Number of bytes written=588235
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=101699
        HDFS: Number of bytes written=30167
        HDFS: Number of read operations=12
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=3
        Launched reduce tasks=1
        Data-local map tasks=2
        Rack-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=47617
        Total time spent by all reduces in occupied slots (ms)=8244
        Total time spent by all map tasks (ms)=47617
        Total time spent by all reduce tasks (ms)=8244
        Total vcore-milliseconds taken by all map tasks=47617
        Total vcore-milliseconds taken by all reduce tasks=8244
        Total megabyte-milliseconds taken by all map tasks=48759808
        Total megabyte-milliseconds taken by all reduce tasks=8441856
    Map-Reduce Framework
        Map input records=2035
        Map output records=14239
        Map output bytes=155828
        Map output materialized bytes=42717
        Input split bytes=292
        Combine input records=14239
        Combine output records=2653
        Reduce input groups=2402
        Reduce shuffle bytes=42717
        Reduce input records=2653
        Reduce output records=2402
        Spilled Records=5306
        Shuffled Maps =3
        Failed Shuffles=0
        Merged Map outputs=3
        GC time elapsed (ms)=881
        CPU time spent (ms)=22320
        Physical memory (bytes) snapshot=690192384
        Virtual memory (bytes) snapshot=10862809088
        Total committed heap usage (bytes)=380243968
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=101407
    File Output Format Counters
        Bytes Written=30167
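
    The finished job can also be inspected from the command line using the application id printed in the log above. Note too that MapReduce refuses to write into an existing output directory, so /output must be deleted before the job can be re-run:

    # query the application's final status (id taken from the log above)
    yarn application -status application_1505605169997_0002
    # a re-run fails with FileAlreadyExistsException unless the output dir is removed
    hadoop fs -rm -r /output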

    8. View the wordcount results (the output is long, so only part of it is shown here)

    [hadoop@ltt1 ~]$ hadoop fs -cat /output/*
    worldwide,    4
    would    1
    writing    2
    writing,    4
    written    19
    xmlenc    1
    year    1
    you    12
    your    5
    zlib    1
     252.227-7014(a)(1))    1
    §    1
    “AS    1
    “Contributor    1
    “Contributor”    1
    “Covered    1
    “Executable”    1
    “Initial    1
    “Larger    1
    “Licensable”    1
    “License”    1
    “Modifications”    1
    “Original    1
    “Participant”)    1
    “Patent    1
    “Source    1
    “Your”)    1
    “You”    2
    “commercial    3
    “control”    1
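
    The full result can also be pulled back to the local filesystem, or read straight from the reducer's output file (with a single reducer this is conventionally named part-r-00000):

    # copy the whole output directory into the local working directory
    hadoop fs -get /output ./wordcount-output
    # or print just the reducer's part file
    hadoop fs -cat /output/part-r-00000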


    That wraps up this little wordcount example, in which we practiced creating a directory on HDFS, uploading files, listing directories, and running the wordcount program.

