Setting Up a Hadoop Development Environment with Eclipse

I. Preparing the Basic Environment

1. Eclipse download address: http://pan.baidu.com/s/1slArxAP

2. JDK 1.8 download address: http://pan.baidu.com/s/1i5iNyTZ

II. Setting Up the Hadoop Development Environment on Windows 10

1. Download the Hadoop plugin hadoop-eclipse-plugin-2.7.3.jar and place it in the eclipse\dropins directory.

hadoop-eclipse-plugin-2.7.3.jar Baidu cloud download address: http://pan.baidu.com/s/1i585KTv
     
hadoop-eclipse-plugin-2.7.3.jar CSDN download address: http://download.csdn.net/detail/chongxin1/9859371
     
     
Close and restart Eclipse.

2. Extract hadoop-2.7.3.tar.gz on Windows

hadoop-2.7.3.tar.gz Baidu cloud download address: http://pan.baidu.com/s/1o8c77PS
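For example, if a tar tool is available on the PATH (a sketch; any archiver such as 7-Zip works just as well), extracting to the root of D: produces the D:\hadoop-2.7.3 directory used in the later steps:

    tar -xzf hadoop-2.7.3.tar.gz -C D:\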

3. Configure Hadoop Map/Reduce: in Eclipse, open Window -> Preferences -> Hadoop Map/Reduce and point it at the directory where hadoop-2.7.3 was extracted.


4. Click Window -> Show View -> Other..., and under MapReduce Tools select Map/Reduce Locations.


In the lower-right area of Eclipse, click the blue elephant icon:
     
     
     
Add a new Hadoop Location and configure it:
     
location name: any name you like

Map/Reduce Master:
Host: 192.168.168.200 (the IP address of the Linux machine where Hadoop is installed)
Port: 9001 (from mapred-site.xml)

DFS Master:
Use M/R Master host: (check this box for a single-node setup)
User name: the default Windows user name
Port: 9000 (from core-site.xml)

The Host and Port values here were already set when the Hadoop environment was built on Ubuntu; look them up in core-site.xml and mapred-site.xml.
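For reference, the matching server-side entries typically look like the sketch below. The property names are the classic ones these Eclipse-plugin tutorials assume (fs.defaultFS in core-site.xml, mapred.job.tracker in mapred-site.xml); verify them against your own files, since a YARN-based 2.x cluster may be configured differently:

    <!-- core-site.xml: the HDFS endpoint behind DFS Master (port 9000) -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://192.168.168.200:9000</value>
    </property>

    <!-- mapred-site.xml: the endpoint behind Map/Reduce Master (port 9001) -->
    <property>
        <name>mapred.job.tracker</name>
        <value>192.168.168.200:9001</value>
    </property>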

5. Check whether the connection succeeded
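If the location is configured correctly, the DFS Locations tree in Eclipse expands to show the HDFS directory structure. As a quick cross-check from the Hadoop machine's command line, listing the root of the configured namenode should succeed:

    bin/hadoop fs -ls hdfs://192.168.168.200:9000/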

At this point the Hadoop development environment on Windows 10 is ready.

III. Create and Run a WordCount Project

1. Right-click in the Project Explorer and choose New -> Map/Reduce Project.

2. Create the text to be counted in the HDFS input directory

1) If the input and output directories do not exist yet, create them on HDFS first:

    bin/hadoop fs -mkdir -p hdfs://192.168.168.200:9000/input
    bin/hadoop fs -mkdir -p hdfs://192.168.168.200:9000/output

2) Upload the text to be counted to the HDFS input directory:

    bin/hadoop fs -put words.txt /input

words.txt contains:

    Hello Hadoop
    Hello BigData
    Hello Spark
    Hello Flume
    Hello Kafka
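You can confirm the upload by printing the file straight from HDFS:

    bin/hadoop fs -cat /input/words.txt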

3. Create WordCount.java

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.input.NLineInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    /**
     * A first MapReduce program.
     *
     * @author sunchen
     */
    public class WordCount {

        public static class TokenizerMapper extends
                Mapper<Object, Text, Text, IntWritable> {

            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();

            // Split each input line into tokens and emit (word, 1) per token.
            public void map(Object key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }

        public static class IntSumReducer extends
                Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();

            // Sum the counts for each word and emit (word, total).
            public void reduce(Text key, Iterable<IntWritable> values,
                    Context context) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            // NLineInputFormat feeds each mapper a fixed number of lines (default 1)
            job.setInputFormatClass(NLineInputFormat.class);
            // Input file path
            FileInputFormat.addInputPath(job, new Path(
                    "hdfs://192.168.168.200:9000/input/words.txt"));
            // Output file path (must not already exist)
            FileOutputFormat.setOutputPath(job, new Path(
                    "hdfs://192.168.168.200:9000/output/wordcount"));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
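Note that FileOutputFormat refuses to start if the output path already exists, so when rerunning the job, delete the previous result first:

    bin/hadoop fs -rm -r /output/wordcount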

4. Configure JDK 1.8

     

Because hadoop-eclipse-plugin-2.7.3.jar was compiled with JDK 1.8, running against any older JDK produces the following error:

java.lang.UnsupportedClassVersionError: WordCount : Unsupported major.minor version 52.0

Cause: the JDK version is too old; it must be switched to JDK 1.8.
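Class-file major version 52 is the one emitted by Java 8 (51 is Java 7, 50 is Java 6), so a quick way to see which JDK the shell and Eclipse are running:

    java -version
    javac -version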

5. Create a file named log4j.properties under the project's src folder

The content of the file is:

    ### Log level and appenders ###
    #log4j.rootLogger=DEBUG, Console
    log4j.rootLogger=info,consolePrint,errorFile,logFile
    #log4j.rootLogger=DEBUG,consolePrint,errorFile,logFile,Console

    ### Console output ###
    log4j.appender.consolePrint.Encoding = UTF-8
    log4j.appender.consolePrint = org.apache.log4j.ConsoleAppender
    log4j.appender.consolePrint.Target = System.out
    log4j.appender.consolePrint.layout = org.apache.log4j.PatternLayout
    log4j.appender.consolePrint.layout.ConversionPattern=%d [%c] - %m%n

    ### Output to the log file ###
    log4j.appender.logFile.Encoding = UTF-8
    log4j.appender.logFile = org.apache.log4j.DailyRollingFileAppender
    log4j.appender.logFile.File = D:/RUN_Data/log/dajiangtai_ok.log
    log4j.appender.logFile.Append = true
    log4j.appender.logFile.Threshold = info
    log4j.appender.logFile.layout = org.apache.log4j.PatternLayout
    log4j.appender.logFile.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n

    ### Save errors to a separate file ###
    log4j.appender.errorFile.Encoding = UTF-8
    log4j.appender.errorFile = org.apache.log4j.DailyRollingFileAppender
    log4j.appender.errorFile.File = D:/RUN_Data/log/dajiangtai_error.log
    log4j.appender.errorFile.Append = true
    log4j.appender.errorFile.Threshold = ERROR
    log4j.appender.errorFile.layout = org.apache.log4j.PatternLayout
    log4j.appender.errorFile.layout.ConversionPattern = %-d{yyyy-MM-dd HH:mm:ss} [ %t:%r ] - [ %p ] %m%n

    #Console
    log4j.appender.Console=org.apache.log4j.ConsoleAppender
    log4j.appender.Console.layout=org.apache.log4j.PatternLayout
    log4j.appender.Console.layout.ConversionPattern=%d [%t] %-5p [%c] - %m%n

    log4j.logger.java.sql.ResultSet=INFO
    log4j.logger.org.apache=INFO
    log4j.logger.java.sql.Connection=DEBUG
    log4j.logger.java.sql.Statement=DEBUG
    log4j.logger.java.sql.PreparedStatement=DEBUG

    #log4j.logger.com.dajiangtai.dao=DEBUG,TRACE
    log4j.logger.com.dajiangtai.dao.IFollowDao=DEBUG


Without log4j.properties no log output can be printed, and the following warnings appear:

    log4j:WARN No appenders could be found for logger (org.apache.hadoop.metrics2.lib.MutableMetricsFactory).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

6. Configure the Hadoop environment variables

Add the environment variable HADOOP_HOME=D:\hadoop-2.7.3
Append %HADOOP_HOME%\bin to the Path environment variable.
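For example, from a Windows command prompt (a sketch: setx writes the user-level environment for future sessions, and the bin path is spelled out because HADOOP_HOME is not yet visible in the current session; the System Properties GUI works just as well):

    rem Set HADOOP_HOME for the current user
    setx HADOOP_HOME "D:\hadoop-2.7.3"
    rem Append the Hadoop bin directory to the user Path
    setx Path "%Path%;D:\hadoop-2.7.3\bin"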

If the change does not take effect, restart Eclipse; if it still does not take effect, reboot the machine.

If the Hadoop environment variables are not configured, the following error appears:

Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

    2017-07-08 15:53:03,783 ERROR [org.apache.hadoop.util.Shell] - Failed to locate the winutils binary in the hadoop binary path
    java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
     at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
     at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
     at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
     at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
     at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:610)
     at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
     at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
     at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
     at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
     at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
     at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
     at org.apache.hadoop.mapreduce.Job.<init>(Job.java:142)
     at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:185)
     at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:204)
     at WordCount.main(WordCount.java:56)

Stepping through the code reveals that this is a HADOOP_HOME problem: when HADOOP_HOME is unset, fullExeName inevitably becomes null\bin\winutils.exe. The fix is simple: set the environment variable.
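The mechanism is easy to reproduce in miniature. The following is a sketch of what Shell.getQualifiedBinPath effectively does, not the literal Hadoop source:

    // WinutilsPathDemo.java: when the home directory is unset, Java string
    // concatenation turns the null reference into the literal text "null".
    public class WinutilsPathDemo {
        public static void main(String[] args) {
            String home = System.getenv("HADOOP_HOME"); // null if unset
            String fullExeName = home + java.io.File.separator + "bin"
                    + java.io.File.separator + "winutils.exe";
            System.out.println(fullExeName); // null\bin\winutils.exe on Windows
        }
    }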

7. Download winutils.exe and hadoop.dll and copy them into the %HADOOP_HOME%\bin directory

winutils.exe and hadoop.dll GitHub download address: https://github.com/SweetInk/hadoop-common-2.7.1-bin
winutils.exe and hadoop.dll Baidu cloud download address: https://pan.baidu.com/s/1jI3KdX8#list/path=%2F
Copy winutils.exe and hadoop.dll into the %HADOOP_HOME%\bin directory.
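For example, from a command prompt opened in the download folder (assuming HADOOP_HOME is already set in that session):

    copy winutils.exe "%HADOOP_HOME%\bin\"
    copy hadoop.dll "%HADOOP_HOME%\bin\"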
Without winutils.exe, the following error appears:

java.io.IOException: Could not locate executable D:\hadoop-2.7.3\bin\winutils.exe in the Hadoop binaries.

    2017-07-08 16:17:13,272 ERROR [org.apache.hadoop.util.Shell] - Failed to locate the winutils binary in the hadoop binary path
    java.io.IOException: Could not locate executable D:\hadoop-2.7.3\bin\winutils.exe in the Hadoop binaries.
     at org.apache.hadoop.util.Shell.getQualifiedBinPath(Shell.java:379)
     at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:394)
     at org.apache.hadoop.util.Shell.<clinit>(Shell.java:387)
     at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:80)
     at org.apache.hadoop.security.SecurityUtil.getAuthenticationMethod(SecurityUtil.java:610)
     at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:273)
     at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:261)
     at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:791)
     at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:761)
     at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:634)
     at org.apache.hadoop.mapreduce.task.JobContextImpl.<init>(JobContextImpl.java:72)
     at org.apache.hadoop.mapreduce.Job.<init>(Job.java:142)
     at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:185)
     at org.apache.hadoop.mapreduce.Job.getInstance(Job.java:204)
     at WordCount.main(WordCount.java:56)

Without hadoop.dll, the following warning appears:

    2017-07-08 16:34:27,170 WARN [org.apache.hadoop.util.NativeCodeLoader] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

8. Right-click WordCount.java -> Run As -> Run on Hadoop

The result of the run:

     

The word-count results are as follows:
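Given the words.txt above, the output file hdfs://192.168.168.200:9000/output/wordcount/part-r-00000 would be expected to contain the following (word and count separated by a tab, keys in sorted order):

    BigData	1
    Flume	1
    Hadoop	1
    Hello	5
    Kafka	1
    Spark	1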

And with that, the setup is complete. Nicely done!
