  • Hadoop WordCount on Windows with IDEA

    JDK 1.8 is fine.
    Note: the JDK must live at a path without spaces; copy it somewhere space-free and update the JAVA_HOME system environment variable accordingly.
    etc/hadoop/hadoop-env.cmd already picks up %JAVA_HOME%, so it needs no edits, but it cannot handle paths containing spaces; the Hadoop install path must not contain spaces either.
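    For reference, the relevant line in etc/hadoop/hadoop-env.cmd looks like this (the explicit JDK path in the comment is only an example; use your own space-free path):

    ```bat
    @rem etc/hadoop/hadoop-env.cmd (excerpt)
    set JAVA_HOME=%JAVA_HOME%
    @rem or pin it to an explicit path without spaces, e.g.:
    @rem set JAVA_HOME=C:\Java\jdk1.8.0_152
    ```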
    First, set up the input and output folders:

    1. Add an input folder to the project, at the same level as the src directory

    Place one or more input files in the input folder.
    Create a file named test.segmented with the following contents:

    dfdfadgdgag
    aadads
    fudflcl
    cckcer
    fadf
    dfdfadgdgag
    fudflcl
    fuck
    fuck
    fuckfuck
    haha
    aaa
    
    2. Configure the run parameters
      In the IntelliJ menu bar, choose Run -> Edit Configurations, click + in the dialog that appears, and create a new Application configuration. Set Main class to WordCount (you can click the ... button on the right to browse for it),
      and Program arguments to input/ output/, i.e. the input path is the input folder created above and the output goes to output.
      I also recommend changing IDEA's Maven mirror, otherwise dependency downloads will be very slow.
      How to change it: in the settings.xml file under the ~/.m2 directory (if the file does not exist, copy one from the maven/conf directory), find the <mirrors> tag and add the following child element.
      Then check that the Maven directory configured in IDEA points at the same settings.xml.
    <mirror>
        <id>alimaven</id>
        <name>aliyun maven</name>
        <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
        <mirrorOf>central</mirrorOf>        
    </mirror>
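    For context, the <mirror> element above belongs inside the <mirrors> section of settings.xml; a minimal file would look like this:

    ```xml
    <?xml version="1.0" encoding="UTF-8"?>
    <settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
      <mirrors>
        <mirror>
          <id>alimaven</id>
          <name>aliyun maven</name>
          <url>http://maven.aliyun.com/nexus/content/groups/public/</url>
          <mirrorOf>central</mirrorOf>
        </mirror>
      </mirrors>
    </settings>
    ```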
    

    pom.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>dsf</groupId>
        <artifactId>dsff</artifactId>
        <version>1.0-SNAPSHOT</version>
        <repositories>
            <repository>
                <id>apache</id>
                <url>http://maven.apache.org</url>
            </repository>
        </repositories>
    
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-core</artifactId>
                <version>2.7.3</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.7.3</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-yarn-common</artifactId>
                <version>2.7.3</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-common</artifactId>
                <version>2.7.3</version>
            </dependency>
        </dependencies>
        <build>
            <plugins>
                <plugin>
                    <artifactId>maven-dependency-plugin</artifactId>
                    <configuration>
                        <excludeTransitive>false</excludeTransitive>
                        <stripVersion>true</stripVersion>
                        <outputDirectory>./lib</outputDirectory>
                    </configuration>
                </plugin>
            </plugins>
        </build>
    </project>
    
    3. Create a WordCount.java file
      with the following contents:
    import java.io.IOException;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCount {
    
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
    
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();
    
            public void map(Object key, Text value, Context context
            ) throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
    
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();
    
            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context
            ) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
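    The map/reduce pipeline above boils down to whitespace tokenization plus per-word summing; as a sanity check, the same logic can be sketched in plain Java with no Hadoop runtime (the class name WordCountSketch is just for illustration):

    ```java
    import java.util.Map;
    import java.util.StringTokenizer;
    import java.util.TreeMap;

    public class WordCountSketch {

        // Same whitespace tokenization and summing as TokenizerMapper +
        // IntSumReducer, but in plain Java without the Hadoop runtime.
        public static Map<String, Integer> count(String text) {
            Map<String, Integer> counts = new TreeMap<>(); // sorted by key, like the job output
            StringTokenizer itr = new StringTokenizer(text);
            while (itr.hasMoreTokens()) {
                counts.merge(itr.nextToken(), 1, Integer::sum);
            }
            return counts;
        }

        public static void main(String[] args) {
            String input = "dfdfadgdgag\naadads\nfudflcl\ncckcer\nfadf\n"
                    + "dfdfadgdgag\nfudflcl\nfuck\nfuck\nfuckfuck\nhaha\naaa";
            // Prints the same counts as the job output at the end of the post.
            count(input).forEach((w, c) -> System.out.println(w + " " + c));
        }
    }
    ```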
    
    4. To run this on Windows, follow this tutorial (it definitely works and is the most convenient route):
      https://www.cs.helsinki.fi/u/jilu/paper/hadoop_on_win.pdf
      First download the Hadoop 2.7.3 binary distribution:
      http://hadoop.apache.org/releases.html
      Replace the bin folder inside hadoop-2.7.3 with the winutils binaries from:
      https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip
      Set up the environment:
      Note that it is best to set the HADOOP_HOME environment variable inside IDEA; if you set it as a system environment variable, you will also need to modify Shell.java (I set both...):
      hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\util\Shell.java
      The 2.7.3 source is also available at:
      http://hadoop.apache.org/releases.html
    private static String checkHadoopHome() {
    
        // first check the Dflag hadoop.home.dir with JVM scope
        // System.setProperty("hadoop.home.dir", "...");
        String home = System.getProperty("hadoop.home.dir");
    
        // fall back to the system/user-global env variable
        if (home == null) {
          home = System.getenv("HADOOP_HOME");
        }
    
        try {
           // couldn't find either setting for hadoop's home directory
           if (home == null) {
             throw new IOException("HADOOP_HOME or hadoop.home.dir are not set.");
           }
    
       if (home.startsWith("\"") && home.endsWith("\"")) {
             home = home.substring(1, home.length()-1);
           }
    
           // check that the home setting is actually a directory that exists
           File homedir = new File(home);
           if (!homedir.isAbsolute() || !homedir.exists() || !homedir.isDirectory()) {
             throw new IOException("Hadoop home directory " + homedir
               + " does not exist, is not a directory, or is not an absolute path.");
           }
    
           home = homedir.getCanonicalPath();
    
        } catch (IOException ioe) {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Failed to detect a valid hadoop home directory", ioe);
          }
          home = null;
        }
        // pin the local Hadoop path (the author's workaround)
        home = "D:\\hadoop-2.7.3";
        return home;
      }
    
    home = System.getenv("HADOOP_HOME");
    

    If HADOOP_HOME is set as a system environment variable, the home string read here may begin with an invisible '\u202A' character (the Unicode LEFT-TO-RIGHT EMBEDDING control), which makes the directory check fail.
    Copy the modified file into your project; IDEA compiles project sources ahead of the library's, so your copy takes precedence (you can copy it over first and then edit it).
    If you set the variable inside IDEA's run configuration instead, you do not need to modify Shell.java at all.
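    Instead of hardcoding the path inside Shell.java, another option is to set the hadoop.home.dir JVM property, which checkHadoopHome() consults before the environment variable; a small sketch (the HadoopHome class name and the D:\hadoop-2.7.3 path are assumptions, adjust them to your setup):

    ```java
    public class HadoopHome {

        // Mirrors checkHadoopHome()'s lookup order: the JVM property wins,
        // then HADOOP_HOME, then the supplied fallback path.
        public static String ensure(String fallback) {
            String home = System.getProperty("hadoop.home.dir");
            if (home == null) {
                home = System.getenv("HADOOP_HOME");
            }
            if (home == null) {
                // Must run before any Hadoop class loads, or Shell.java
                // will already have failed to find a home directory.
                System.setProperty("hadoop.home.dir", fallback);
                home = fallback;
            }
            return home;
        }
    }
    ```

    Call HadoopHome.ensure("D:\\hadoop-2.7.3") as the first line of main(), before Job.getInstance().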
    Next, modify the NativeIO.java file as described in the tutorial:
    hadoop-2.7.3-src\hadoop-common-project\hadoop-common\src\main\java\org\apache\hadoop\io\nativeio\NativeIO.java
    Around line 609, change

        return access0(path, desiredAccess.accessRight());
    

    to

        return true; 
    

    Copy this file into your project as well.

    Also, I recommend running IDEA as administrator.

    By design, Hadoop refuses to write into an existing output directory, so be sure to delete the output folder before the next run!
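    Deleting the folder can also be automated before each run; a sketch using java.nio for the local-filesystem case (the OutputCleaner name is just for illustration; on a real cluster you would use Hadoop's FileSystem.delete instead):

    ```java
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Comparator;
    import java.util.stream.Stream;

    public class OutputCleaner {

        // Recursively delete the output directory so FileOutputFormat
        // does not abort with "Output directory ... already exists".
        public static void deleteRecursively(Path dir) throws IOException {
            if (!Files.exists(dir)) {
                return;
            }
            try (Stream<Path> walk = Files.walk(dir)) {
                // Reverse depth-first order: delete children before parents.
                walk.sorted(Comparator.reverseOrder())
                    .forEach(p -> p.toFile().delete());
            }
        }
    }
    ```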

    Now run the program; the result is:

    aaa 1
    aadads 1
    cckcer 1
    dfdfadgdgag 2
    fadf 1
    fuck 2
    fuckfuck 1
    fudflcl 2
    haha 1

  • Original post: https://www.cnblogs.com/HaibaraAi/p/6478477.html