
Atitit Hadoop usage summary

     

Contents

1.1. Download is ~300 MB, ~800 MB after extraction
1.2. Required jar packages
2. Demo code
2.1. WCMapper
2.2. WCReduce
2.3. Implementing the driver
3. Run: set HADOOP_HOME
3.1. Input txt
3.2. Run output console
3.3. Result output .txt
4. Workflow in jar mode
5. Ref

     

     

1.1. Download is ~300 MB, ~800 MB after extraction

     

HDFS is the distributed file system of the Hadoop big data platform. It provides data storage for upper-layer applications and for other big data components such as Hive, MapReduce, Spark, and HBase.
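As a brief illustration (not part of the original notes), upper-layer code usually reaches HDFS through the org.apache.hadoop.fs.FileSystem API. The following is a minimal sketch; the NameNode URI hdfs://localhost:9000 and the /demo path are placeholders.

package hadoopDemo;

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Minimal HDFS read/write sketch; URI and paths are assumptions, not from the original post.
public class HdfsQuickDemo {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        // Connect to a (hypothetical) NameNode address.
        FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"), conf);

        // Write a small text file.
        Path file = new Path("/demo/hello.txt");
        try (FSDataOutputStream out = fs.create(file, true)) {   // true = overwrite if it exists
            out.writeUTF("hello hdfs");
        }

        // Verify the file exists and print its length in bytes.
        System.out.println(fs.exists(file) + " " + fs.getFileStatus(file).getLen());
        fs.close();
    }
}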

     

     

1.2. Required jar packages

     

     

hadoop-2.4.1\share\hadoop\common\hadoop-common-2.4.1.jar

hadoop-2.4.1\share\hadoop\common\lib\  (all jars in this directory)

hadoop-2.4.1\share\hadoop\mapreduce\lib\  (all jars in this directory)


     

     

2. Demo code

2.1. WCMapper

package hadoopDemo;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// public class Mapper<KEYIN, VALUEIN, KEYOUT, VALUEOUT>
public class WCMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    // 1. Map phase: processes one input split, line by line.
    // 1) The mapper extends Mapper, specifying the input key type and input value type,
    // 2) as well as the output key type and output value type.
    // 3) Override the map method.
    //    map receives the byte offset of the line, the text of the line, and the output context.
    @Override
    protected void map(LongWritable key, Text value_line, Context context)
            throws IOException, InterruptedException {
        String line = value_line.toString();
        String[] words = line.split(" ");
        for (String word : words) {
            Text key_Text = new Text();
            IntWritable val_IntWritable = new IntWritable(1);
            key_Text.set(word);
            context.write(key_Text, val_IntWritable);   // emit (word, 1)
        }
    }
}
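Side note, not from the original post: a common word-count refinement is to reuse a single Text and IntWritable instance across map calls instead of allocating new objects for every word, since context.write serializes the pair immediately. A minimal sketch of that variant (the class name WCMapperReuse is hypothetical):

package hadoopDemo;

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Variant of WCMapper that reuses the output objects to reduce garbage collection pressure.
public class WCMapperReuse extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Text word = new Text();                        // reused for every output key
    private static final IntWritable ONE = new IntWritable(1);   // constant count value

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split(" ")) {
            word.set(token);             // overwrite the buffer instead of allocating a new Text
            context.write(word, ONE);    // the framework serializes the pair right away
        }
    }
}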

     

2.2. WCReduce

     

package hadoopDemo;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import com.alibaba.fastjson.JSON;
import com.google.common.collect.Maps;

import java.io.IOException;
import java.util.Map;

public class WCReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;   // counts how many times this word occurred
        // Iterate over the values and accumulate the count.
        for (IntWritable num : values) {
            sum += num.get();

            // Debug output: dump the current state as JSON to the console.
            Map<String, Object> m = Maps.newConcurrentMap();
            m.put("key", key);
            m.put("num", num);
            m.put("sum_curr", sum);
            System.out.println(JSON.toJSONString(m));
        }
        context.write(key, new IntWritable(sum));
    }
}
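Side note, not in the original text: because the per-word counts are simply summed, the same reducer class can also be registered as a map-side combiner to shrink the shuffle, which the MyWordCount driver below actually does. In WCDriver it would be one extra configuration line (sketch, assuming the job object from WCDriver):

// Optional: pre-aggregate counts on the map side. This is valid because summation is
// associative and commutative and WCReduce's output types match its input types.
job.setCombinerClass(WCReduce.class);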

     

2.3. Implementing the driver

The purpose of the driver is to specify the user's Map and Reduce classes in the program and to configure the parameters used when the job is submitted to Hadoop. For example, a word-count driver class MyWordCount.java; its core code is shown below.

     

     

     

package hadoopDemo;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import java.io.IOException;

public class WCDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {

        // Load the Hadoop native library for Windows (adjust the path to the local install).
        System.load("D:\\haddop\\hadoop-3.1.1\\bin\\hadoop.dll");

        // Create the job
        Job job = Job.getInstance(new Configuration());
        // Set the driver class
        job.setJarByClass(WCDriver.class);
        // Set the mapper and reducer classes
        job.setMapperClass(WCMapper.class);
        job.setReducerClass(WCReduce.class);
        // Set the key/value types of the map output
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        // Set the key/value types of the reduce output
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Set the input and output paths
        String path_ipt = "D:\\workspace\\hadoopDemo\\ipt.txt";
        FileInputFormat.setInputPaths(job, new Path(path_ipt));
        String path_out = "D:\\workspace\\hadoopDemo\\out.txt";
        FileOutputFormat.setOutputPath(job, new Path(path_out));
        // Submit the job and wait for completion
        boolean result = job.waitForCompletion(true);
        System.out.println(result);
        // Keep the process alive (blocks forever; remove this loop for a normal exit)
        while (true) {
            Thread.sleep(5000);
            System.out.println("..");
        }
        // System.exit(result ? 0 : 1);
    }
}

     

     

     

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyWordCount {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(MyWordCount.class);
        job.setMapperClass(WordcountMapper.class);
        job.setCombinerClass(WordcountReducer.class);
        job.setReducerClass(WordcountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

As the core code above shows, the input and output paths are passed as arguments to main. To submit the job, a Job object is needed, on which the job name, the Map class, the Reduce class, the key/value types, and other parameters are set. Source: CUUG official website.
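One practical caveat not mentioned above: FileOutputFormat refuses to run if the output path already exists, so rerunning WCDriver against the same out.txt directory fails with a FileAlreadyExistsException. A minimal sketch of a helper that removes a stale output directory first (the class name OutputCleaner is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.io.IOException;

// Deletes a leftover MapReduce output directory before the job is resubmitted.
public class OutputCleaner {
    public static void deleteIfExists(Configuration conf, String pathOut) throws IOException {
        FileSystem fs = FileSystem.get(conf);   // file system of the default scheme (local or HDFS)
        Path out = new Path(pathOut);
        if (fs.exists(out)) {
            fs.delete(out, true);               // true = delete the directory recursively
        }
    }
}

It could be called as OutputCleaner.deleteIfExists(job.getConfiguration(), path_out); before job.waitForCompletion(true).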

     

3. Run: set HADOOP_HOME

The Hadoop environment variable can be set by appending the following line to the ~/.bashrc file.

    export HADOOP_HOME=/usr/local/hadoop

In Eclipse, the environment variable can only be configured in the Run Configuration.
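For local runs on Windows (as in WCDriver above), another commonly used workaround, not taken from the original notes, is to set the hadoop.home.dir system property in code before the job is created, so no external environment variable is needed. A minimal sketch, reusing the install path that WCDriver already assumes:

// Points Hadoop at its home directory programmatically (Windows local mode).
// Call this at the very top of WCDriver.main(), before Job.getInstance().
public class HadoopHomeSetup {
    public static void configure() {
        // The path is an assumption; adjust it to the local Hadoop installation.
        System.setProperty("hadoop.home.dir", "D:\\haddop\\hadoop-3.1.1");
    }
}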

     

3.1. Input txt

     

    aaa bbb ccc aaa

     

3.2. Run output console

(These debug lines come from the JSON dump in WCReduce: fastjson serializes the Text key via its getters, so the word appears as Base64-encoded bytes, e.g. YWFh is "aaa"; IntWritable exposes no bean properties, so num prints as {}.)

    {"num":{},"sum_curr":1,"key":{"bytes":"YWFh","length":3}}

    {"num":{},"sum_curr":2,"key":{"bytes":"YWFh","length":3}}

    {"num":{},"sum_curr":1,"key":{"bytes":"YmJi","length":3}}

    {"num":{},"sum_curr":1,"key":{"bytes":"Y2Nj","length":3}}

     

3.3. Result output .txt

Output file: D:\workspace\hadoopDemo\out.txt\part-r-00000

    aaa 2

    bbb 1

    ccc 1

     

4. Workflow in jar mode

     

1. Package the project into a jar and upload it to the virtual machine (if using jar mode).

2. Run the jar file (e.g. with the hadoop jar command).

     

     

5. Ref

MapReduce example: counting words (wordcount) - Tyshawn's blog - CSDN blog.html

Getting started with MapReduce: a Wordcount example - Xiao Liu's blog - CSDN blog.html
