  • Hadoop Hello World

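    Getting started with Hadoop:

    1. Download the latest release and unpack it to a path of your choice.

    2. Configure it. Hadoop's configuration files are in the ~/hadoop/conf/ directory. For now I have only configured the core-site.xml file.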

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

    <!-- Put site-specific property overrides in this file. -->

    <configuration>
        <property>
            <name>fs.default.name</name>
            <value>hdfs://localhost:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/Jack/dfs</value>
        </property>
    </configuration>

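    The settings above specify the address of the default filesystem (fs.default.name) and the base directory under which Hadoop keeps its DFS data (hadoop.tmp.dir).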
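To make the structure of core-site.xml concrete, here is a minimal sketch that reads the name/value pairs out of such a file using only the JDK's XML parser. It is not Hadoop's own Configuration class; the class name SiteConfigReader and the method parse are hypothetical names chosen for this illustration, and the sketch assumes every property element has both a name and a value child.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical helper, not part of Hadoop: lists the properties in a *-site.xml document.
public class SiteConfigReader {

    // Returns the <name> -> <value> pairs in document order.
    static Map<String, String> parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element property = (Element) nodes.item(i);
                // Assumption: each <property> contains exactly one <name> and one <value>.
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = String.join("\n",
            "<?xml version=\"1.0\"?>",
            "<configuration>",
            "  <property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>",
            "  <property><name>hadoop.tmp.dir</name><value>/home/Jack/dfs</value></property>",
            "</configuration>");
        parse(xml).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Running it on the configuration shown above prints one line per property, which is a quick way to check that the file is well-formed before starting the daemons.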

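    3. Format the DFS. From Hadoop's bin directory, run: > ./hadoop namenode -format

    4. Start Hadoop by running: > ./start-all.sh

    At this point Hadoop has started normally, and the cluster status can be viewed in a browser on ports 50070 (NameNode) and 50030 (JobTracker).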
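If no browser is at hand, a small probe can confirm that something is listening on those two ports. The following is a minimal sketch using plain JDK sockets; the class name PortCheck, the method isOpen, and the one-second timeout are assumptions made for this example, and a successful connection only shows that the port is open, not that the daemon behind it is healthy.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether a TCP port accepts connections.
public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean isOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unreachable: treat all of these as "not open".
            return false;
        }
    }

    public static void main(String[] args) {
        int[] ports = {50070, 50030}; // the two status ports mentioned in the text
        for (int port : ports) {
            System.out.println("localhost:" + port + " open: " + isOpen("localhost", port, 1000));
        }
    }
}
```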

    ======================================================================

    The first program: HadoopHelloWorld

    import java.io.IOException;
    import java.util.Iterator;
    import java.util.StringTokenizer;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;

    public class HadoopHelloWorld {

        // Mapper: splits each input line into tokens and emits (word, 1) for every token.
        public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
                String line = value.toString();
                StringTokenizer tokenizer = new StringTokenizer(line);
                while (tokenizer.hasMoreTokens()) {
                    word.set(tokenizer.nextToken());
                    output.collect(word, one);
                }
            }
        }

        // Reducer: sums the counts collected for each word and emits (word, total).
        public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
            public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
                int sum = 0;
                while (values.hasNext()) {
                    sum += values.next().get();
                }
                output.collect(key, new IntWritable(sum));
            }
        }

        // Driver: args[0] is the input path, args[1] is the output path (must not exist yet).
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf(HadoopHelloWorld.class);
            conf.setJobName("wordcount");

            // Types of the keys and values written by the job.
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);

            conf.setMapperClass(Map.class);
            conf.setReducerClass(Reduce.class);

            // Read and write plain text files.
            conf.setInputFormat(TextInputFormat.class);
            conf.setOutputFormat(TextOutputFormat.class);

            FileInputFormat.setInputPaths(conf, new Path(args[0]));
            FileOutputFormat.setOutputPath(conf, new Path(args[1]));

            // Submit the job and block until it finishes.
            JobClient.runJob(conf);
        }

    }
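To see what the job computes without a cluster, the same map and reduce logic can be sketched in plain Java. This is only an illustration of the data flow, not how Hadoop executes it: the class LocalWordCount and its methods are hypothetical, the shuffle is replaced by an in-memory sorted map, and the two sample lines are the inputs used in the classic Hadoop WordCount tutorial, assumed here because the original post does not show the contents of its input files.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the MapReduce job; no Hadoop classes are used.
public class LocalWordCount {

    // "Map" phase: emit one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // "Shuffle" and "reduce" phases: group the pairs by word (sorted) and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            counts.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Assumed sample input, one line per file.
        String[] lines = {"Hello World Bye World", "Hello Hadoop Goodbye Hadoop"};
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        // Tab-separated, like TextOutputFormat's default layout.
        reduce(pairs).forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

With the assumed input, the map phase emits 8 pairs and the reduce phase produces 5 distinct words (Bye 1, Goodbye 1, Hadoop 2, Hello 2, World 2).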

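    Libraries that need to be on the build path:

    JRE System Library

    hadoop-core.jar

    commons-logging.jar

    A note: other documents do not mention that commons-logging.jar is required, but without it I kept getting this error: java.lang.NoClassDefFoundError: org/apache/commons/logging/LogFactory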
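A quick way to tell whether a jar such as commons-logging.jar is actually visible at run time is to ask the class loader for one of its classes. The sketch below does that with Class.forName; the class name ClasspathCheck and the method isOnClasspath are hypothetical names for this example.

```java
// Hypothetical helper: reports whether a class can be found on the current classpath.
public class ClasspathCheck {

    // Looks the class up without running its static initializers.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The class named in the NoClassDefFoundError above.
        String name = "org.apache.commons.logging.LogFactory";
        System.out.println(name + " on classpath: " + isOnClasspath(name));
    }
}
```

Running it with and without commons-logging.jar on the classpath (for example via the -cp option of the java command) shows whether the missing jar is the cause of the error.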

    Once that is done, compile HadoopHelloWorld.java, put the generated class files into the folder ~/source/java2013/HadoopHelloWorld/, and package them into a jar.

    [Jack@win bin]$ jar -cvf HadoopHelloWorld.jar -C ~/source/java2013/HadoopHelloWorld/ .

    Upload two input files [file01, file02] to serve as the program's input.

    [Jack@win bin]$ ./hadoop fs -mkdir input

    [Jack@win bin]$ ./hadoop dfs -put ~/source/java2012/FirstJar/input/file* input

    Run the program:

    [Jack@win bin]$ ./hadoop jar HadoopHelloWorld.jar HadoopHelloWorld input output

    13/06/20 03:16:44 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    13/06/20 03:16:45 INFO util.NativeCodeLoader: Loaded the native-hadoop library
    13/06/20 03:16:45 WARN snappy.LoadSnappy: Snappy native library not loaded
    13/06/20 03:16:45 INFO mapred.FileInputFormat: Total input paths to process : 4
    13/06/20 03:16:45 INFO mapred.JobClient: Running job: job_201306200226_0002
    13/06/20 03:16:46 INFO mapred.JobClient: map 0% reduce 0%
    13/06/20 03:16:59 INFO mapred.JobClient: map 40% reduce 0%
    13/06/20 03:17:05 INFO mapred.JobClient: map 80% reduce 0%
    13/06/20 03:17:08 INFO mapred.JobClient: map 80% reduce 26%
    13/06/20 03:17:11 INFO mapred.JobClient: map 100% reduce 26%
    13/06/20 03:17:23 INFO mapred.JobClient: map 100% reduce 100%
    13/06/20 03:17:28 INFO mapred.JobClient: Job complete: job_201306200226_0002
    13/06/20 03:17:28 INFO mapred.JobClient: Counters: 30
    13/06/20 03:17:28 INFO mapred.JobClient: Job Counters 
    13/06/20 03:17:28 INFO mapred.JobClient: Launched reduce tasks=1
    13/06/20 03:17:28 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=32074
    13/06/20 03:17:28 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
    13/06/20 03:17:28 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
    13/06/20 03:17:28 INFO mapred.JobClient: Launched map tasks=5
    13/06/20 03:17:28 INFO mapred.JobClient: Data-local map tasks=3
    13/06/20 03:17:28 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23534
    13/06/20 03:17:28 INFO mapred.JobClient: File Input Format Counters 
    13/06/20 03:17:28 INFO mapred.JobClient: Bytes Read=54
    13/06/20 03:17:28 INFO mapred.JobClient: File Output Format Counters 
    13/06/20 03:17:28 INFO mapred.JobClient: Bytes Written=41
    13/06/20 03:17:28 INFO mapred.JobClient: FileSystemCounters
    13/06/20 03:17:28 INFO mapred.JobClient: FILE_BYTES_READ=104
    13/06/20 03:17:28 INFO mapred.JobClient: HDFS_BYTES_READ=541
    13/06/20 03:17:28 INFO mapred.JobClient: FILE_BYTES_WRITTEN=128481
    13/06/20 03:17:28 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=41
    13/06/20 03:17:28 INFO mapred.JobClient: Map-Reduce Framework
    13/06/20 03:17:28 INFO mapred.JobClient: Map output materialized bytes=128
    13/06/20 03:17:28 INFO mapred.JobClient: Map input records=2
    13/06/20 03:17:28 INFO mapred.JobClient: Reduce shuffle bytes=122
    13/06/20 03:17:28 INFO mapred.JobClient: Spilled Records=16
    13/06/20 03:17:28 INFO mapred.JobClient: Map output bytes=82
    13/06/20 03:17:28 INFO mapred.JobClient: Total committed heap usage (bytes)=912719872
    13/06/20 03:17:28 INFO mapred.JobClient: CPU time spent (ms)=5190
    13/06/20 03:17:28 INFO mapred.JobClient: Map input bytes=50
    13/06/20 03:17:28 INFO mapred.JobClient: SPLIT_RAW_BYTES=487
    13/06/20 03:17:28 INFO mapred.JobClient: Combine input records=0
    13/06/20 03:17:28 INFO mapred.JobClient: Reduce input records=8
    13/06/20 03:17:28 INFO mapred.JobClient: Reduce input groups=5
    13/06/20 03:17:28 INFO mapred.JobClient: Combine output records=0
    13/06/20 03:17:28 INFO mapred.JobClient: Physical memory (bytes) snapshot=932745216
    13/06/20 03:17:28 INFO mapred.JobClient: Reduce output records=5
    13/06/20 03:17:28 INFO mapred.JobClient: Virtual memory (bytes) snapshot=2390478848
    13/06/20 03:17:28 INFO mapred.JobClient: Map output records=8
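The counter lines in the log above all end in a name=value pair, so they are easy to pull out for a sanity check, for instance that "Map output records" equals "Reduce input records" when no combiner is set. Here is a minimal sketch under that assumption; the class CounterLogParser and its regular expression are hypothetical and only tailored to the log format shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts "name=value" counters from JobClient log lines.
public class CounterLogParser {

    // Matches e.g. "... INFO mapred.JobClient: Map output records=8".
    private static final Pattern COUNTER =
        Pattern.compile("mapred\\.JobClient:\\s+(.+?)=(\\d+)\\s*$");

    // Returns counter name -> value in log order; lines without a counter are skipped.
    static Map<String, Long> parse(String[] logLines) {
        Map<String, Long> counters = new LinkedHashMap<>();
        for (String line : logLines) {
            Matcher matcher = COUNTER.matcher(line);
            if (matcher.find()) {
                counters.put(matcher.group(1).trim(), Long.parseLong(matcher.group(2)));
            }
        }
        return counters;
    }

    public static void main(String[] args) {
        String[] lines = {
            "13/06/20 03:17:28 INFO mapred.JobClient: Counters: 30",
            "13/06/20 03:17:28 INFO mapred.JobClient: Reduce input records=8",
            "13/06/20 03:17:28 INFO mapred.JobClient: Map output records=8"
        };
        Map<String, Long> counters = parse(lines);
        boolean consistent =
            counters.get("Map output records").equals(counters.get("Reduce input records"));
        System.out.println(counters);
        System.out.println("map output == reduce input: " + consistent);
    }
}
```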
  • Original article: https://www.cnblogs.com/jackhub/p/3146113.html