  • [Hadoop] Running the WordCount example with IDEA + Maven on Win7

    A simple Hadoop example, self-tested and working.

    The following environment is assumed:

    1. Win7 running three Ubuntu VMs, each with Hadoop 2.8.1 successfully installed;

    2. IDEA 2017 installed on Win7;

    3. Hadoop 2.8.1 installed on Win7, with the relevant environment variables configured;

    4. The precompiled Windows hadoop.dll and winutils.exe copied in; they must be the 2.8.1 builds, see https://github.com/steveloughran/winutils (a setup sketch follows this list).
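
    A sketch of the Windows-side setup above, assuming Hadoop is unpacked to C:\LearnTool\hadoop (the path the code below uses); winutils.exe belongs in %HADOOP_HOME%\bin, and hadoop.dll is commonly also copied to C:\Windows\System32:

    rem Illustrative only; run in an elevated cmd and adjust paths to your machine.
    setx HADOOP_HOME "C:\LearnTool\hadoop"
    setx PATH "%PATH%;%HADOOP_HOME%\bin"
    copy winutils.exe "%HADOOP_HOME%\bin"
    copy hadoop.dll "C:\Windows\System32"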

    [Steps]

    1. Follow http://blog.csdn.net/u011654631/article/details/70037219 (referred to below as the reference page);

    2. Create a Maven Java project in IDEA;

    3. Add the Hadoop jars per the reference page's pom.xml: hadoop-mapreduce-client-core, hadoop-hdfs, hadoop-mapreduce-client-jobclient (be sure to remove its provided scope), hadoop-mapreduce-client-common, and hadoop-common (a dependency sketch follows this list);

    4. Run the WordCount code below, then view the word counts with $ hdfs dfs -cat /test/out/part-r-00000 (upload commands are also sketched after this list).
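
    A minimal sketch of the step-3 dependency block, assuming version 2.8.1 to match the cluster (this lists only the artifacts named above, not the reference page's full pom.xml):

    <properties>
        <hadoop.version>2.8.1</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-hdfs</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-core</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <!-- no provided scope here, see note 7 below -->
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-mapreduce-client-common</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
    </dependencies>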
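
    For step 4, the input file must already exist in HDFS; a sketch of uploading it and reading the result back (paths taken from the code below):

    $ hdfs dfs -mkdir -p /test
    $ hdfs dfs -put testvim.txt /test/testvim.txt
    $ hdfs dfs -cat /test/out/part-r-00000    # after the job completes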

    WordCount code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;
    
    import java.io.IOException;
    
    public class WordCount extends Configured implements Tool {
        public int run(String[] strings) throws Exception {
            try {
                System.setProperty("hadoop.home.dir", "C:\LearnTool\hadoop");
                System.setProperty("HADOOP_USER_NAME", "chendajian");
    
                Configuration conf = getConf();
                conf.set("mapreduce.job.jar", "C:\Workspace\javaweb\hadoop\out\artifacts\hadoop_jar\hadoop.jar");
    //            conf.set("yarn.resourcemanager.hostname", "10.0.10.231");
                conf.set("mapreduce.app-submission.cross-platform", "true");
    
                Job job = Job.getInstance(conf);
                job.setJarByClass(WordCount.class);
    
                job.setOutputKeyClass(Text.class);
                job.setOutputValueClass(LongWritable.class);
    
                job.setMapperClass(WcMapper.class);
                job.setReducerClass(WcReducer.class);
    
                job.setInputFormatClass(TextInputFormat.class);
                job.setOutputFormatClass(TextOutputFormat.class);
    
                // delete the output directory if it already exists
                FileSystem fs = FileSystem.get(conf);
                String out = "hdfs://10.0.10.231:9000/test/out";
                Path outPath = new Path(out);
                if (fs.exists(outPath)) {
                    fs.delete(outPath, true);
                }
    
                FileInputFormat.setInputPaths(job, "hdfs://master:9000/test/testvim.txt"); // if "master" is refused, use the IP (see note 2)
                FileOutputFormat.setOutputPath(job, new Path(out));
    
                return job.waitForCompletion(true) ? 0 : 1;
            } catch (Exception e) {
                e.printStackTrace();
                return 1; // don't report success after a failure
            }
        }
    
        public static class WcMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
            @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            // split each line on whitespace; writing the whole line (as the original did) counts lines rather than words
            for (String word : value.toString().split("\\s+")) {
                if (!word.isEmpty()) {
                    context.write(new Text(word), new LongWritable(1));
                }
            }
        }
        }
    
        public static class WcReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
            @Override
            protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
                long sum = 0;
                for (LongWritable lVal : values) {
                    sum += lVal.get();
                }
                context.write(key, new LongWritable(sum));
            }
        }
    
        public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new WordCount(), args)); // propagate the job's exit code
        }
    }
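    Note that mapreduce.job.jar must point to a jar that has already been built (e.g. via Build > Build Artifacts in IDEA) and is rebuilt after every code change; otherwise the cluster runs stale classes. On success, part-r-00000 contains tab-separated word/count pairs, along these (illustrative) lines:

    hello   2
    world   1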

    Additional notes:

    1. Copy core-site.xml, mapred-site.xml, yarn-site.xml, etc. into the project's resources directory (a minimal example follows this list);

    2. If access to hdfs://master:9000 is refused, try replacing master with the IP address;

    3. The input file lives inside HDFS; on Linux it can only be accessed via hdfs dfs commands;

    4. The 2.8.1 builds of hadoop.dll and winutils.exe must be downloaded separately; see https://github.com/steveloughran/winutils;

    5. For user-permission problems, add a HADOOP_USER_NAME environment variable on Win7 set to the Hadoop user name;

    6. Add a log4j.xml logging configuration file to the project's resources directory; for its contents see http://www.cnblogs.com/ftrako/p/7570094.html;

    7. If the provided scope is not removed from the hadoop-mapreduce-client-jobclient dependency in pom.xml, the job will not use YARN mode and will run in local mode instead.
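
    For notes 1 and 2, a minimal core-site.xml sketch for the resources directory, assuming the NameNode address used in the code above (swap master for 10.0.10.231 if hostname resolution fails):

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://master:9000</value>
        </property>
    </configuration>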
