  • First Hadoop program (Hadoop 2.4.0 cluster + Eclipse environment)

    I. Configuring the Eclipse Hadoop environment

    1. Right-click My Computer -> Properties -> Advanced system settings -> Environment Variables, and set the following variables:

        JAVA_HOME=D:\ProgramFiles\Java\jdk1.7.0_67

        HADOOP_HOME=D:\TEDP_Software\hadoop-2.4.0

        PATH=.;%JAVA_HOME%\bin;%HADOOP_HOME%\bin;
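
    A running program only sees environment variables as they were when it started, so Eclipse normally has to be restarted after changing them. As a quick sanity check, the sketch below prints what the JVM sees and whether winutils.exe exists under the bin folder of HADOOP_HOME, which is where the winutils error in step 4 looks for it. The class name HadoopEnvCheck and its helper methods are illustrative assumptions, not part of Hadoop.

```java
import java.io.File;

class HadoopEnvCheck {
    // Builds the location <hadoopHome>/bin/winutils.exe (helper name is hypothetical).
    static File winutilsPath(String hadoopHome) {
        return new File(new File(hadoopHome, "bin"), "winutils.exe");
    }

    // Returns a short human-readable status for the given HADOOP_HOME value.
    static String describe(String hadoopHome) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return "HADOOP_HOME is not set";
        }
        File exe = winutilsPath(hadoopHome);
        return exe.isFile() ? "found " + exe.getPath() : "missing " + exe.getPath();
    }

    public static void main(String[] args) {
        // System.getenv returns null when a variable is not defined.
        System.out.println("JAVA_HOME   = " + System.getenv("JAVA_HOME"));
        System.out.println("HADOOP_HOME = " + describe(System.getenv("HADOOP_HOME")));
    }
}
```

    Run it as a plain Java application from Eclipse; a "missing" or "not set" line suggests the variables above have not taken effect yet.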

    2. Install the hadoop-eclipse-kepler-plugin-2.2.0.jar plugin in Eclipse and configure the Hadoop Server connection

    II. The WordCount program

    1. Prepare the test files
    [hadoop@master hadoop]# mkdir file 

    [hadoop@master hadoop]# cd file

    [hadoop@master file]# ls
    [hadoop@master file]# echo "Hello world">file1.txt
    [hadoop@master file]# echo "Hello hadoop">file2.txt

    2. Create the input folder
    Create a folder in HDFS: hadoop fs -mkdir /user
    Set permissions: hadoop fs -chmod -R 777 /user
    Create the input folder: hadoop fs -mkdir /user/input
    List the root folder: hadoop fs -ls /
    Upload the files to HDFS: hadoop fs -put ~/file/file*.txt /user/input
    Error 1:
    java.net.NoRouteToHostException: No route to host
    (or, in Hive: could only be replicated to 0 nodes instead of minReplication (=1).  There are 2 datanode(s) running and 2 node(s) are excluded in this operation.)
    Cause: the firewall is still running. On each host, switch to root and run: service iptables stop
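
    To tell a firewall problem apart from other failures, it can help to test plain TCP reachability from the client machine. The sketch below is one way to do that with the standard library only. The class name PortCheck is hypothetical, and the default host and port are simply the NameNode address used in the hdfs:// URLs later in this post. DataNode ports would need the same check.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

class PortCheck {
    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    static boolean canConnect(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Covers NoRouteToHostException, ConnectException and timeouts alike.
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions taken from this post; pass your own host and port.
        String host = args.length > 0 ? args[0] : "192.168.1.200";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9000;
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

    If this reports false while the Hadoop daemons are running, a firewall on the target host is a likely suspect.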
     
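    3. Create a new MapReduce project and copy the attached WordCount.java into it
    Right-click the WordCount class -> Run As -> Run Configurations, and enter the following program arguments (input path first, then output path):
    hdfs://192.168.1.200:9000/user/input hdfs://192.168.1.200:9000/user/output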
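
    WordCount.main reads args[0] and args[1] directly, so a missing argument fails with an ArrayIndexOutOfBoundsException. The following sketch shows one way the two arguments could be validated and inspected before the job is configured. WordCountArgs is a hypothetical helper that uses only java.net.URI; it is not part of the attached program.

```java
import java.net.URI;

class WordCountArgs {
    final URI input;
    final URI output;

    WordCountArgs(URI input, URI output) {
        this.input = input;
        this.output = output;
    }

    // Parses the two program arguments: <input path> <output path>.
    static WordCountArgs parse(String[] args) {
        if (args.length != 2) {
            throw new IllegalArgumentException("Usage: WordCount <input path> <output path>");
        }
        URI in = URI.create(args[0]);
        URI out = URI.create(args[1]);
        if (in.equals(out)) {
            throw new IllegalArgumentException("Input and output paths must differ");
        }
        return new WordCountArgs(in, out);
    }

    public static void main(String[] args) {
        // The same two arguments as entered in Run Configurations.
        WordCountArgs parsed = parse(new String[] {
            "hdfs://192.168.1.200:9000/user/input",
            "hdfs://192.168.1.200:9000/user/output"
        });
        System.out.println("scheme=" + parsed.input.getScheme()
            + " host=" + parsed.input.getHost()
            + " port=" + parsed.input.getPort()
            + " input=" + parsed.input.getPath()
            + " output=" + parsed.output.getPath());
    }
}
```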
     
    4. Run on Hadoop
    (1) Exception 1: Exception in thread "main" java.lang.NullPointerException
    Fix: according to search results on Baidu, this is a bug in Hadoop on Windows; it does not occur on Linux.
    Download hadoop-common-2.2.0-bin-master.zip, unzip it, and copy the files from its bin folder over those in hadoop-2.4.0\bin.

    Then copy hadoop.dll from that bin folder to C:\Windows\System32 and restart the computer.

    (2) Exception 2: 14/12/02 21:01:01 ERROR util.Shell: Failed to locate the winutils binary in the hadoop binary path

    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

    Fix: set the local environment variable HADOOP_HOME=D:\Soft\Linux\hadoop-2.4.0 (this requires a restart).

    To avoid restarting, add this line to the code instead:  System.setProperty("hadoop.home.dir", "D:\\Soft\\Linux\\hadoop-2.4.0");
    (3) Exception 3: Exception in thread "main" org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory hdfs://192.168.1.200:9000/user/output already exists

    Fix: the output folder already exists. Either use a different output folder or delete the existing output folder.

    (4) Exception 4: the run then hangs with no output (this happened later, when creating a second Hadoop program)

    Fix: in Run Configurations -> Main, the Main class turned out to be jline.ANSIBuffer. Change it to WordCount, then click "Run".

    Note: when running via the "Run As" -> "Run On Hadoop" menu, enter or select WordCount in the Select Type dialog that pops up.
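
    Exceptions 2 and 3 can both be avoided with small checks before the job is submitted. The sketch below is one possible approach, not the author's code: chooseHadoopHome keeps an existing hadoop.home.dir property or HADOOP_HOME value and only falls back to a hard-coded path, and freshOutputPath appends a timestamp so every run writes to a new folder. RunPreflight and both method names are hypothetical, and the fallback path is the one from exception 2.

```java
import java.text.SimpleDateFormat;
import java.util.Date;

class RunPreflight {
    // Prefers the system property, then the environment variable, then the fallback.
    static String chooseHadoopHome(String property, String env, String fallback) {
        if (property != null && !property.isEmpty()) {
            return property;
        }
        if (env != null && !env.isEmpty()) {
            return env;
        }
        return fallback;
    }

    // Appends a timestamp so the output directory does not exist yet.
    static String freshOutputPath(String base, Date now) {
        String stamp = new SimpleDateFormat("yyyyMMdd-HHmmss").format(now);
        String trimmed = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
        return trimmed + "-" + stamp;
    }

    public static void main(String[] args) {
        String home = chooseHadoopHome(System.getProperty("hadoop.home.dir"),
                System.getenv("HADOOP_HOME"), "D:\\Soft\\Linux\\hadoop-2.4.0");
        // Must happen before any Hadoop class is loaded to have an effect.
        System.setProperty("hadoop.home.dir", home);
        System.out.println("hadoop.home.dir = " + home);
        System.out.println("output = "
                + freshOutputPath("hdfs://192.168.1.200:9000/user/output", new Date()));
    }
}
```

    Alternatively, the old output folder can be removed from the command line with hadoop fs -rm -r /user/output before each run.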

    5. OK. Output of the run:

    Hello 2

    hadoop 1

    world 1
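
    The same counts can be reproduced without a cluster, which makes the map and reduce logic easier to follow. The sketch below is a plain-Java stand-in under stated assumptions: it tokenizes with StringTokenizer as the Map class does and sums per word as the Reduce class does, using a TreeMap so the keys come out sorted. LocalWordCount is a hypothetical name and uses no Hadoop classes.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

class LocalWordCount {
    // Splits each line on whitespace and sums the occurrences of every word.
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                Integer old = counts.get(word);
                counts.put(word, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The contents of file1.txt and file2.txt from step 1.
        List<String> lines = Arrays.asList("Hello world", "Hello hadoop");
        for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
            System.out.println(entry.getKey() + " " + entry.getValue());
        }
    }
}
```

    Uppercase letters sort before lowercase ones in this ordering, which is why "Hello" precedes "hadoop".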

    6. Attachment: WordCount.java
     
    import java.io.IOException;
    import java.util.*;
     
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.conf.*;
    import org.apache.hadoop.io.*;
    import org.apache.hadoop.mapred.*;
    import org.apache.hadoop.util.*;
     
    public class WordCount {
     
     public static class Map extends MapReduceBase implements
       Mapper<LongWritable, Text, Text, IntWritable> {
      private final static IntWritable one = new IntWritable(1);
      private Text word = new Text();
     
      public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
       String line = value.toString();
       StringTokenizer tokenizer = new StringTokenizer(line);
       while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
       }
      }
     }
     
     public static class Reduce extends MapReduceBase implements
       Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
       int sum = 0;
       while (values.hasNext()) {
        sum += values.next().get();
       }
       output.collect(key, new IntWritable(sum));
      }
     }
     
     public static void main(String[] args) throws Exception {
     
     // Uncomment on Windows if HADOOP_HOME is not set (backslashes must be escaped):
     // System.setProperty("hadoop.home.dir", "D:\\Soft\\Linux\\hadoop-2.4.0");
     
      JobConf conf = new JobConf(WordCount.class);
      conf.setJobName("wordcount");
     
      conf.setOutputKeyClass(Text.class);
      conf.setOutputValueClass(IntWritable.class);
     
      conf.setMapperClass(Map.class);
      conf.setCombinerClass(Reduce.class);
      conf.setReducerClass(Reduce.class);
     
      conf.setInputFormat(TextInputFormat.class);
      conf.setOutputFormat(TextOutputFormat.class);
     
      FileInputFormat.setInputPaths(conf, new Path(args[0]));
      FileOutputFormat.setOutputPath(conf, new Path(args[1]));
     
      JobClient.runJob(conf);
     }
    }

    Reference: http://www.cnblogs.com/xia520pi/archive/2012/05/16/2504205.html

    (End)
     

  • Original post: https://www.cnblogs.com/zhaohz/p/4397953.html