Submitting MapReduce Jobs from IDEA to YARN on a Hadoop Cluster

Table of Contents
1. Setting up the environment
2. Creating WordCount V1.0
3. Pitfalls

1. Setting up the environment
Set up the Hadoop cluster environment first; see "Hadoop 3.1.2 standalone mode: single-node and multi-node pseudo-distributed installation and use".

Create a new environment variable, HADOOP_USER_NAME, set to the username used on the cluster, so that jobs are submitted as that user.
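
Alternatively, the user can be set programmatically before any HDFS or job object is created. A minimal sketch, assuming the cluster user is named hadoop (replace with your own):

public class HadoopUserSetup {
    public static void main(String[] args) {
        // Equivalent to setting the HADOOP_USER_NAME environment variable on Windows;
        // "hadoop" is an assumed cluster username, adjust to your cluster.
        System.setProperty("HADOOP_USER_NAME", "hadoop");
    }
}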


2. Creating WordCount V1.0
Add the Maven dependencies. Although hadoop-client already pulls in hadoop-mapreduce-client-jobclient, declare the latter explicitly as well; otherwise the job logs will not be printed to the IDEA console.

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.1.2</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>3.1.2</version>
</dependency>
     
Add a log4j.properties to the resources folder:

    log4j.rootLogger=INFO, console
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.Target=System.out
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=[%p] %d{yyyy-MM-dd HH:mm:ss,SSS} method:%l%m%n
     
Copy core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml from the Hadoop cluster into the resources folder, so the job picks up the cluster's addresses from the classpath.
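
To verify that these files are actually picked up from the classpath, a quick check helps. A minimal sketch; the expected output is whatever fs.defaultFS is set to in your core-site.xml, not the local default:

import org.apache.hadoop.conf.Configuration;

public class ConfCheck {
    public static void main(String[] args) {
        // new Configuration() loads core-site.xml from the classpath,
        // so this should print the cluster address rather than file:///.
        Configuration conf = new Configuration();
        System.out.println(conf.get("fs.defaultFS"));
    }
}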

The Mapper:

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class WordCountMapper1 extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Read one line of input
        String line = value.toString();
        // Split the line on whitespace
        StringTokenizer stringTokenizer = new StringTokenizer(line);
        // Emit (word, 1) for every token
        while (stringTokenizer.hasMoreTokens()) {
            word.set(stringTokenizer.nextToken());
            context.write(word, one);
        }
    }
}
     
The Reducer:

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCountReducer1 extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the counts emitted for this key
        int sum = 0;
        for (IntWritable intWritable : values) {
            sum += intWritable.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}

The WordCount V1.0 driver. It must enable cross-platform submission and set the path of the jar to execute, i.e. the path of the jar produced by Maven's Package command.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount1 {

    public static void main(String[] args) {
        // Reads hdfs-site.xml, core-site.xml, etc. from the classpath
        Configuration conf = new Configuration();
        // Allow cross-platform submission; without this the job submits but fails to run
        conf.set("mapreduce.app-submission.cross-platform", "true");
        try {
            Job job = Job.getInstance(conf, "WordCount V1.0");

            job.setJarByClass(WordCount1.class);
            // Path of the jar to execute, i.e. the jar produced by Maven's Package command
            // (note the escaped backslashes in the Windows path)
            job.setJar("E:\\IDEA_workspace\\mapreduce-test\\target\\mapreduce-test-1.0-SNAPSHOT.jar");

            job.setMapperClass(WordCountMapper1.class);
            // The reducer can double as the combiner, since summing is associative and commutative
            job.setCombinerClass(WordCountReducer1.class);
            job.setReducerClass(WordCountReducer1.class);

            // Job output key/value types; usable here because mapper and reducer emit the same types
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            // HDFS paths
            FileInputFormat.addInputPath(job, new Path("/hdfsTest/input"));
            FileOutputFormat.setOutputPath(job, new Path("/hdfsTest/output"));

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
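
One caveat when re-running: waitForCompletion fails if the output path already exists. A minimal sketch of a pre-submission cleanup, using the same /hdfsTest/output path as above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OutputCleanup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Delete the previous run's output so the next submission does not fail
        FileSystem fs = FileSystem.get(conf);
        Path output = new Path("/hdfsTest/output");
        if (fs.exists(output)) {
            fs.delete(output, true);  // true = recursive
        }
    }
}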

Run Maven's Clean and Package, then Rebuild Project, and run the main function; the logs print successfully in the console.

The YARN web UI also shows the job ran successfully.
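
As a quick sanity check of the result: assuming the input directory contains a file with the single line "hello world hello", the output written to /hdfsTest/output/part-r-00000 (one reducer, tab-separated by default) would read:

hello	2
world	1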


3. Pitfalls
IDEA can misresolve HDFS vs. Windows paths. Running Maven's Clean and Package and then Rebuild Project resolves this.

IDEA runs on Windows, so Hadoop picks up the Windows username; if it differs from the cluster's user, the job fails with a permission error. Either add the environment variable on Windows (as in section 1), or disable permission checking in hdfs-site.xml:

<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
Cross-platform submission must be enabled. Either set it directly in code (as in the driver above) or in mapred-site.xml:

<property>
    <name>mapreduce.app-submission.cross-platform</name>
    <value>true</value>
</property>
References:
Developing MapReduce programs in local IDEA and submitting them to a remote Hadoop cluster for execution
Exception message: /bin/bash: line 0: fg: no job control
Copyright notice: this is an original article by CSDN blogger "shpunishment", licensed under CC 4.0 BY-SA; include the original source link and this notice when reposting.
Original link: https://blog.csdn.net/qq_36160730/article/details/101292584
