This post documents setting up a Hadoop development environment with Maven.
Software environment: Eclipse Kepler x64, Hadoop 1.2.1, Maven 3
Hardware environment: CentOS 6.5 x64
Prerequisite: Maven is already installed; see http://www.cnblogs.com/guarder/p/3734309.html
1. Create the project with Maven
Create the project manually from the command line (CMD) in the workspace:
E:\ws> mvn archetype:generate -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.myhadoop.mr -DartifactId=myHadoop -DpackageName=org.conan.myhadoop.mr -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
The command downloads the archetype and the project's dependencies, which takes quite a while.
[INFO] Generating project in Batch mode
The build can get stuck at this step and never continue; the cause is network speed or a permissions problem. Add the parameter -DarchetypeCatalog=internal so Maven does not fetch the catalog from the remote server:
E:\ws> mvn archetype:generate -DarchetypeCatalog=internal -DarchetypeGroupId=org.apache.maven.archetypes -DgroupId=org.conan.myhadoop.mr -DartifactId=myHadoop -DpackageName=org.conan.myhadoop.mr -Dversion=1.0-SNAPSHOT -DinteractiveMode=false
[INFO] Parameter: groupId, Value: org.conan.myhadoop.mr
[INFO] Parameter: packageName, Value: org.conan.myhadoop.mr
[INFO] Parameter: package, Value: org.conan.myhadoop.mr
[INFO] Parameter: artifactId, Value: myHadoop
[INFO] Parameter: basedir, Value: E:\ws
[INFO] Parameter: version, Value: 1.0-SNAPSHOT
[INFO] project created from Old (1.x) Archetype in dir: E:\ws\myHadoop
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 1.303 s
[INFO] Finished at: 2014-05-17T22:21:09+08:00
[INFO] Final Memory: 8M/71M
[INFO] ------------------------------------------------------------------------
2. Import the project into Eclipse
Import it as a Maven project, not as a Java project.
3. Add the Hadoop dependency
Edit pom.xml and add the Hadoop dependency:
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-core</artifactId>
    <version>1.2.1</version>
</dependency>
4. Download the dependencies
E:\ws\myHadoop> mvn clean install
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 19:39 min
[INFO] Finished at: 2014-05-17T23:30:24+08:00
[INFO] Final Memory: 13M/99M
[INFO] ------------------------------------------------------------------------
Once the download finishes, refresh the project in Eclipse and the dependency libraries are picked up automatically.
5. Fetch the cluster configuration files
Copy core-site.xml, hdfs-site.xml and mapred-site.xml from the cluster into the src/main/resources/hadoop directory of the Eclipse project.
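For reference, a minimal core-site.xml sketch consistent with this post: fs.default.name matches the hdfs:// URL used in the test program below, while the hadoop.tmp.dir value is only a placeholder assumption about the cluster.

```xml
<?xml version="1.0"?>
<configuration>
    <property>
        <name>fs.default.name</name>
        <!-- NameNode address; matches the master used in this post -->
        <value>hdfs://192.168.1.115:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <!-- hypothetical path; use whatever your cluster actually configures -->
        <value>/home/huser/tmp</value>
    </property>
</configuration>
```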
6. Configure the local hosts file
Edit C:\Windows\System32\drivers\etc\hosts and add:
192.168.1.115 master
192.168.1.111 slave1
192.168.1.112 slave2
7. Write a test program
package org.conan.myhadoop.mr;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount {

    public static class WordCountMapper extends MapReduceBase
            implements Mapper<Object, Text, Text, IntWritable> {

        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class WordCountReducer extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterator<IntWritable> values,
                OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            result.set(sum);
            output.collect(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        String input = "hdfs://192.168.1.115:9000/user/huser/in";
        String output = "hdfs://192.168.1.115:9000/user/huser/output/result";

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("WordCount");
        conf.addResource("classpath:/hadoop/core-site.xml");
        conf.addResource("classpath:/hadoop/hdfs-site.xml");
        conf.addResource("classpath:/hadoop/mapred-site.xml");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(WordCountMapper.class);
        conf.setCombinerClass(WordCountReducer.class);
        conf.setReducerClass(WordCountReducer.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(input));
        FileOutputFormat.setOutputPath(conf, new Path(output));

        JobClient.runJob(conf);
        System.exit(0);
    }
}
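The mapper, combiner and reducer above boil down to "tokenize on whitespace, then sum per word". A dependency-free sketch of that same logic in plain Java (no Hadoop; the class name WordCountLocal is made up for illustration) is handy for sanity-checking the counts you expect in the job output:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringTokenizer;

public class WordCountLocal {

    // Same tokenization as the mapper, same summing as the reducer.
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new HashMap<>();
        StringTokenizer itr = new StringTokenizer(text);
        while (itr.hasMoreTokens()) {
            counts.merge(itr.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> c = count("hello hadoop hello");
        System.out.println(c.get("hello"));  // 2
        System.out.println(c.get("hadoop")); // 1
    }
}
```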
8. Run the program
Run the Java program from Eclipse.
2014-5-18 9:46:36 org.apache.hadoop.util.NativeCodeLoader <clinit> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-5-18 9:46:36 org.apache.hadoop.security.UserGroupInformation doAs SEVERE: PriviledgedActionException as:Administrator cause:java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1092236978\.staging to 0700
Exception in thread "main" java.io.IOException: Failed to set permissions of path: \tmp\hadoop-Administrator\mapred\staging\Administrator1092236978\.staging to 0700
    at org.apache.hadoop.fs.FileUtil.checkReturnValue(FileUtil.java:691)
    at org.apache.hadoop.fs.FileUtil.setPermission(FileUtil.java:664)
    at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:514)
    at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:349)
    at org.apache.hadoop.fs.FilterFileSystem.mkdirs(FilterFileSystem.java:193)
    at org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(JobSubmissionFiles.java:126)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:942)
    at org.apache.hadoop.mapred.JobClient$2.run(JobClient.java:936)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:936)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:910)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1353)
    at org.conan.myhadoop.mr.WordCount.main(WordCount.java:79)
The run fails with a local file-permission error. Modify the Hadoop source file src/core/org/apache/hadoop/fs/FileUtil.java:
private static void checkReturnValue(boolean rv, File p, FsPermission permission) throws IOException {
//    if (!rv) {
//        throw new IOException("Failed to set permissions of path: " + p +
//                " to " +
//                String.format("%04o", permission.toShort()));
//    }
}
Comment out the lines above, rebuild Hadoop into a new JAR, replace the corresponding Hadoop JAR in the Maven repository with it, and run the program again.
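One way to do that replacement is Maven's install:install-file goal, which overwrites the artifact in the local repository; the path to the patched JAR below is hypothetical.

```shell
mvn install:install-file \
    -DgroupId=org.apache.hadoop \
    -DartifactId=hadoop-core \
    -Dversion=1.2.1 \
    -Dpackaging=jar \
    -Dfile=/path/to/patched/hadoop-core-1.2.1.jar
```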
2014-5-18 10:03:00 org.apache.hadoop.util.NativeCodeLoader <clinit> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-5-18 10:03:00 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-5-18 10:03:00 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-5-18 10:03:00 org.apache.hadoop.io.compress.snappy.LoadSnappy <clinit> WARNING: Snappy native library not loaded
2014-5-18 10:03:00 org.apache.hadoop.mapred.FileInputFormat listStatus INFO: Total input paths to process : 3
2014-5-18 10:03:01 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local1959811789_0001
2014-5-18 10:03:01 org.apache.hadoop.mapred.LocalJobRunner$Job run WARNING: job_local1959811789_0001
org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=Administrator, access=WRITE, inode="output":huser:supergroup:rwxr-xr-x
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
    at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:95)
    at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:57)
    at org.apache.hadoop.hdfs.DFSClient.mkdirs(DFSClient.java:1459)
    at org.apache.hadoop.hdfs.DistributedFileSystem.mkdirs(DistributedFileSystem.java:362)
    at org.apache.hadoop.fs.FileSystem.mkdirs(FileSystem.java:1161)
    at org.apache.hadoop.mapred.FileOutputCommitter.setupJob(FileOutputCommitter.java:52)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:319)
It fails again, this time with an HDFS permission error. Change the permissions on the HDFS directory and update hdfs-site.xml:
[huser@master hadoop-1.2.1]$ bin/hadoop fs -chmod 777 /user/huser
Warning: $HADOOP_HOME is deprecated.
<property>
    <name>dfs.permissions</name>
    <value>false</value>
    <description>
        If "true", enable permission checking in HDFS. If "false", permission
        checking is turned off, but all other behavior is unchanged. Switching
        from one parameter value to the other does not change the mode, owner
        or group of files or directories.
    </description>
</property>
Restart the cluster, then run the Java program again.
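The restart can be done with the stock Hadoop 1.x control scripts on the master (run from $HADOOP_HOME; assumes a start/stop-all setup like the one used elsewhere in this post):

```shell
bin/stop-all.sh
bin/start-all.sh
```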
2014-5-18 10:40:22 org.apache.hadoop.util.NativeCodeLoader <clinit> WARNING: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-5-18 10:40:22 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2014-5-18 10:40:22 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles WARNING: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
2014-5-18 10:40:23 org.apache.hadoop.io.compress.snappy.LoadSnappy <clinit> WARNING: Snappy native library not loaded
2014-5-18 10:40:23 org.apache.hadoop.mapred.FileInputFormat listStatus INFO: Total input paths to process : 3
2014-5-18 10:40:23 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Running job: job_local985253743_0001
2014-5-18 10:40:23 org.apache.hadoop.mapred.LocalJobRunner$Job run INFO: Waiting for map tasks
2014-5-18 10:40:23 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Starting task: attempt_local985253743_0001_m_000000_0
2014-5-18 10:40:23 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask updateJobWithSplit INFO: Processing split: hdfs://192.168.1.115:9000/user/huser/in/test.txt:0+172
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: io.sort.mb = 100
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: data buffer = 79691776/99614720
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: record buffer = 262144/327680
2014-5-18 10:40:23 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_m_000000_0 is done. And is in the process of commiting
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.115:9000/user/huser/in/test.txt:0+172
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_m_000000_0' done.
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Finishing task: attempt_local985253743_0001_m_000000_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Starting task: attempt_local985253743_0001_m_000001_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask updateJobWithSplit INFO: Processing split: hdfs://192.168.1.115:9000/user/huser/in/test3.txt:0+20
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: io.sort.mb = 100
2014-5-18 10:40:24 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 33% reduce 0%
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: data buffer = 79691776/99614720
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: record buffer = 262144/327680
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_m_000001_0 is done. And is in the process of commiting
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.115:9000/user/huser/in/test3.txt:0+20
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_m_000001_0' done.
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Finishing task: attempt_local985253743_0001_m_000001_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Starting task: attempt_local985253743_0001_m_000002_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask updateJobWithSplit INFO: Processing split: hdfs://192.168.1.115:9000/user/huser/in/test2.txt:0+13
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask runOldMapper INFO: numReduceTasks: 1
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: io.sort.mb = 100
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: data buffer = 79691776/99614720
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init> INFO: record buffer = 262144/327680
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush INFO: Starting flush of map output
2014-5-18 10:40:24 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill INFO: Finished spill 0
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_m_000002_0 is done. And is in the process of commiting
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: hdfs://192.168.1.115:9000/user/huser/in/test2.txt:0+13
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_m_000002_0' done.
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable run INFO: Finishing task: attempt_local985253743_0001_m_000002_0
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job run INFO: Map task executor complete.
2014-5-18 10:40:24 org.apache.hadoop.mapred.Task initialize INFO: Using ResourceCalculatorPlugin : null
2014-5-18 10:40:24 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
2014-5-18 10:40:24 org.apache.hadoop.mapred.Merger$MergeQueue merge INFO: Merging 3 sorted segments
2014-5-18 10:40:25 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 0%
2014-5-18 10:40:25 org.apache.hadoop.mapred.Merger$MergeQueue merge INFO: Down to the last merge-pass, with 3 segments left of total size: 266 bytes
2014-5-18 10:40:25 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
2014-5-18 10:40:27 org.apache.hadoop.mapred.Task done INFO: Task:attempt_local985253743_0001_r_000000_0 is done. And is in the process of commiting
2014-5-18 10:40:27 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO:
2014-5-18 10:40:27 org.apache.hadoop.mapred.Task commit INFO: Task attempt_local985253743_0001_r_000000_0 is allowed to commit now
2014-5-18 10:40:27 org.apache.hadoop.mapred.FileOutputCommitter commitTask INFO: Saved output of task 'attempt_local985253743_0001_r_000000_0' to hdfs://192.168.1.115:9000/user/huser/output/result
2014-5-18 10:40:27 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate INFO: reduce > reduce
2014-5-18 10:40:27 org.apache.hadoop.mapred.Task sendDone INFO: Task 'attempt_local985253743_0001_r_000000_0' done.
2014-5-18 10:40:28 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: map 100% reduce 100%
2014-5-18 10:40:29 org.apache.hadoop.mapred.JobClient monitorAndPrintJob INFO: Job complete: job_local985253743_0001
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Counters: 20
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: File Input Format Counters
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Bytes Read=205
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: File Output Format Counters
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Bytes Written=224
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: FileSystemCounters
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: FILE_BYTES_READ=3410
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: HDFS_BYTES_READ=774
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: FILE_BYTES_WRITTEN=276151
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: HDFS_BYTES_WRITTEN=224
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map-Reduce Framework
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map output materialized bytes=278
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map input records=8
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce shuffle bytes=0
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Spilled Records=18
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map output bytes=242
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Total committed heap usage (bytes)=1128595456
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map input bytes=205
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Combine input records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: SPLIT_RAW_BYTES=305
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce input records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce input groups=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Combine output records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Reduce output records=9
2014-5-18 10:40:29 org.apache.hadoop.mapred.Counters log INFO: Map output records=9
The run succeeds; inspect the output:
[huser@master hadoop-1.2.1]$ bin/hadoop fs -ls /user/huser/output/result
Warning: $HADOOP_HOME is deprecated.
Found 2 items
-rw-r--r--   3 Administrator supergroup          0 2014-04-18 10:11 /user/huser/output/result/_SUCCESS
-rw-r--r--   3 Administrator supergroup        224 2014-04-18 10:11 /user/huser/output/result/part-00000
[huser@master hadoop-1.2.1]$ bin/hadoop fs -cat /user/huser/output/result/part-00000
Warning: $HADOOP_HOME is deprecated.
111111111ccc11111222222222eeeeeeee222222        1
11111111tttttttttttttttttttffffffffffffffffffff 1
222222222222ccc2222222222f      1
2ccc2222222222f 1
33333333333ttttttttttttttttt    1
4fff    1
4fffffffffffffffffffffffff      1
hadoop  1
hello   1