  • Hadoop with the Tool interface

    Hadoop jobs are often executed from the command line. Therefore, each Hadoop job has to
    support reading, parsing, and processing command-line arguments. To save each developer
    from rewriting this code, Hadoop provides the org.apache.hadoop.util.Tool interface.

    Sample code:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class WordcountWithTools extends Configured implements Tool {

    	public int run(String[] args) throws Exception {
    		if (args.length < 2) {
    			System.out.println("Usage: chapter3.WordcountWithTools <inDir> <outDir>");
    			ToolRunner.printGenericCommandUsage(System.out);
    			System.out.println("");
    			return -1;
    		}

    		// ToolRunner has already stripped the generic options (-D, -conf, ...)
    		System.out.println(Arrays.toString(args));
    		// just for test: read a property passed in with -Dtest=...
    		System.out.println(getConf().get("test"));

    		Job job = new Job(getConf(), "word count");
    		job.setJarByClass(WordCount.class);
    		job.setMapperClass(TokenizerMapper.class);
    		// Uncomment this to enable a combiner:
    		// job.setCombinerClass(IntSumReducer.class);
    		job.setReducerClass(IntSumReducer.class);
    		job.setOutputKeyClass(Text.class);
    		job.setOutputValueClass(IntWritable.class);
    		FileInputFormat.addInputPath(job, new Path(args[0]));
    		// delete the output directory if it already exists
    		FileSystem.get(getConf()).delete(new Path(args[1]), true);
    		FileOutputFormat.setOutputPath(job, new Path(args[1]));
    		return job.waitForCompletion(true) ? 0 : 1;
    	}

    	public static void main(String[] args) throws Exception {
    		int res = ToolRunner.run(new Configuration(), new WordcountWithTools(), args);
    		System.exit(res);
    	}

    }

    Generic options supported are:
    -conf <configuration file>                      specify an application configuration file
    -D <property=value>                             use value for given property
    -fs <local|namenode:port>                       specify a namenode
    -jt <local|jobtracker:port>                     specify a job tracker
    -files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath
    -archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines

    The general command line syntax is:
    bin/hadoop command [genericOptions] [commandOptions]
    Pay close attention to the order here. I once got it wrong: I put the input and output arguments first and the -D and -libjars options after them, and those options had no effect.
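    The ordering requirement can be illustrated with a small sketch. This is NOT Hadoop's real GenericOptionsParser, just a simplified stand-in showing the key behavior: generic-option parsing stops at the first argument that is not a generic option, so any -D placed after the command options is never seen.

    ```java
    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;

    // Simplified sketch of generic-option handling (hypothetical class, not part
    // of Hadoop): leading -D key=value / -Dkey=value pairs go into `conf`, and
    // everything from the first non-generic argument onward is left for the tool.
    public class GenericOptionsSketch {

        final Map<String, String> conf = new HashMap<>();

        String[] parse(String[] args) {
            int i = 0;
            while (i < args.length) {
                if (args[i].equals("-D") && i + 1 < args.length) {
                    put(args[i + 1]);          // "-D key=value" form
                    i += 2;
                } else if (args[i].startsWith("-D")) {
                    put(args[i].substring(2)); // "-Dkey=value" form
                    i += 1;
                } else {
                    break; // first non-generic argument ends generic-option parsing
                }
            }
            return Arrays.copyOfRange(args, i, args.length);
        }

        private void put(String kv) {
            String[] parts = kv.split("=", 2);
            conf.put(parts[0], parts.length > 1 ? parts[1] : "");
        }

        public static void main(String[] args) {
            GenericOptionsSketch ok = new GenericOptionsSketch();
            String[] rest = ok.parse(new String[] { "-Dtest=lovejava", "/data/input/", "/data/output/" });
            System.out.println(ok.conf.get("test"));   // prints lovejava
            System.out.println(Arrays.toString(rest)); // prints [/data/input/, /data/output/]

            GenericOptionsSketch bad = new GenericOptionsSketch();
            bad.parse(new String[] { "/data/input/", "/data/output/", "-Dtest=lovejava" });
            System.out.println(bad.conf.get("test"));  // prints null: -D came too late
        }
    }
    ```

    With the -D first, the property lands in the configuration and the tool sees only its own two arguments; with the -D last, it is treated as just another command argument, which matches the failure described above.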

    Usage example:

    JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
    MAIN_CLASS=chapter3.WordcountWithTools
    INPUT_DIR=/data/input/
    OUTPUT_DIR=/data/output/
    hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava $INPUT_DIR $OUTPUT_DIR 

    This tests, inside the code, the value of the test property passed on the command line.

    JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
    MAIN_CLASS=chapter3.WordcountWithTools
    INPUT_DIR=/home/hadoop/data/test1.txt
    OUTPUT_DIR=/home/hadoop/data/output/
    hadoop jar $JAR_NAME $MAIN_CLASS -Dtest=lovejava -fs=file:/// -files=/home/hadoop/data/test2.txt $INPUT_DIR $OUTPUT_DIR

    This tests processing a file on the local file system.

    JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
    MAIN_CLASS=chapter3.WordcountWithTools
    INPUT_DIR=/home/hadoop/data/test1.txt
    OUTPUT_DIR=/home/hadoop/data/output/
    hadoop jar $JAR_NAME $MAIN_CLASS -conf=/home/hadoop/data/democonf.xml -fs=file:/// $INPUT_DIR $OUTPUT_DIR

    This specifies a configuration file.

    With -libjars you can put the third-party jars referenced by your MapReduce code onto HDFS; each node then copies them to a local temporary directory when running the job, which avoids class-not-found errors for the referenced classes.
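    Following the pattern of the examples above, a -libjars invocation might look like the sketch below. The jar paths and the LIB_JARS variable are hypothetical, shown only to illustrate the comma-separated list format documented for the option.

    ```shell
    JAR_NAME=/home/hadoop/workspace/myhadoop/target/myhadoop-0.0.1-SNAPSHOT.jar
    MAIN_CLASS=chapter3.WordcountWithTools
    # hypothetical third-party jars referenced by the mappers/reducers
    LIB_JARS=/home/hadoop/lib/first.jar,/home/hadoop/lib/second.jar
    hadoop jar $JAR_NAME $MAIN_CLASS -libjars=$LIB_JARS -Dtest=lovejava /data/input/ /data/output/
    ```

    As with -D, the -libjars option must appear before the input and output arguments, or it will be ignored.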

  • Original article: https://www.cnblogs.com/huaxiaoyao/p/4413440.html