Submitting a WordCount job from IDEA on Windows to a Hadoop HA cluster

    I. Environment Configuration

    1. Edit the Windows hosts file: add the IP addresses and hostnames of the cluster machines to C:\Windows\System32\drivers\etc\hosts.
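
    For example, entries like these (the IP addresses are placeholders; use your cluster's real ones):

    192.168.56.101  cent1
    192.168.56.102  cent2
    192.168.56.103  cent3
    192.168.56.104  cent4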

    2. Unpack Hadoop on Windows, set the HADOOP_HOME environment variable to point at it, and place winutils.exe in $HADOOP_HOME/bin.
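
    This can also be done from a command prompt; a sketch, assuming Hadoop was unpacked to the path used later in this post (the PATH entry is optional but convenient, and both can equally be set through the system environment-variables dialog):

    setx HADOOP_HOME "E:\softs\majorSoft\hadoop-2.7.5"
    setx PATH "%PATH%;E:\softs\majorSoft\hadoop-2.7.5\bin"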

    3. Create a new Maven project in IDEA with the following pom.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <project xmlns="http://maven.apache.org/POM/4.0.0"
             xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
             xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
        <modelVersion>4.0.0</modelVersion>
    
        <groupId>big</groupId>
        <artifactId>data</artifactId>
        <version>1.0-SNAPSHOT</version>
        <dependencies>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
                <version>2.7.5</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-client</artifactId>
                <version>2.7.5</version>
            </dependency>
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs</artifactId>
                <version>2.7.5</version>
            </dependency>
           <!-- <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-hdfs-client</artifactId>
                <version>2.7.5</version>
            </dependency>-->
            <dependency>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
                <version>2.7.5</version>
            </dependency>
        </dependencies>
    
    </project>

    4. Copy the Hadoop configuration files from the HA cluster into the project's resources directory in IDEA. The cluster's configuration files are listed below (a short connectivity check follows the listings):

    core-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://mycluster</value>
        </property>
        <property>
            <name>ha.zookeeper.quorum</name>
            <value>cent1:2181,cent2:2181,cent3:2181</value>
        </property>
        <!--<property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/hadoop2</value>
            <description>A base for other temporary directories.</description>
        </property>-->
    </configuration>

    hdfs-site.xml:

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>dfs.nameservices</name>
            <value>mycluster</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.mycluster</name>
            <value>nn1,nn2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn1</name>
            <value>cent1:9000</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.mycluster.nn2</name>
            <value>cent2:9000</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn1</name>
            <value>cent1:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.mycluster.nn2</name>
            <value>cent2:50070</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://cent2:8485;cent3:8485;cent4:8485/mycluster</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.mycluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence</value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/root/.ssh/id_rsa</value>
        </property>
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/opt/jn/data</value>
        </property>
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <property>
            <name>dfs.permissions.enabled</name>
            <value>false</value>
        </property>
    </configuration>

    mapred-site.xml:

    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>

    yarn-site.xml:

    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>cent1</value>
        </property>
    </configuration>

    log4j.properties:

    # Licensed to the Apache Software Foundation (ASF) under one
    # or more contributor license agreements.  See the NOTICE file
    # distributed with this work for additional information
    # regarding copyright ownership.  The ASF licenses this file
    # to you under the Apache License, Version 2.0 (the
    # "License"); you may not use this file except in compliance
    # with the License.  You may obtain a copy of the License at
    #
    #     http://www.apache.org/licenses/LICENSE-2.0
    #
    # Unless required by applicable law or agreed to in writing, software
    # distributed under the License is distributed on an "AS IS" BASIS,
    # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    # See the License for the specific language governing permissions and
    # limitations under the License.
    
    # Define some default values that can be overridden by system properties
    hadoop.root.logger=INFO,console
    hadoop.log.dir=.
    hadoop.log.file=hadoop.log
    
    # Define the root logger to the system property "hadoop.root.logger".
    log4j.rootLogger=${hadoop.root.logger}, EventCounter
    
    # Logging Threshold
    log4j.threshold=ALL
    
    # Null Appender
    log4j.appender.NullAppender=org.apache.log4j.varia.NullAppender
    
    #
    # Rolling File Appender - cap space usage at 5gb.
    #
    hadoop.log.maxfilesize=256MB
    hadoop.log.maxbackupindex=20
    log4j.appender.RFA=org.apache.log4j.RollingFileAppender
    log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
    
    log4j.appender.RFA.MaxFileSize=${hadoop.log.maxfilesize}
    log4j.appender.RFA.MaxBackupIndex=${hadoop.log.maxbackupindex}
    
    log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
    
    # Pattern format: Date LogLevel LoggerName LogMessage
    log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    # Debugging Pattern format
    #log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
    
    #
    # Daily Rolling File Appender
    #
    
    log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
    log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
    
    # Rollover at midnight
    log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
    
    log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
    
    # Pattern format: Date LogLevel LoggerName LogMessage
    log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    # Debugging Pattern format
    #log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} (%F:%M(%L)) - %m%n
    
    
    #
    # console
    # Add "console" to rootlogger above if you want to use this 
    #
    
    log4j.appender.console=org.apache.log4j.ConsoleAppender
    log4j.appender.console.target=System.err
    log4j.appender.console.layout=org.apache.log4j.PatternLayout
    log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    
    #
    # TaskLog Appender
    #
    
    #Default values
    hadoop.tasklog.taskid=null
    hadoop.tasklog.iscleanup=false
    hadoop.tasklog.noKeepSplits=4
    hadoop.tasklog.totalLogFileSize=100
    hadoop.tasklog.purgeLogSplits=true
    hadoop.tasklog.logsRetainHours=12
    
    log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
    log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
    log4j.appender.TLA.isCleanup=${hadoop.tasklog.iscleanup}
    log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
    
    log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
    log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    
    #
    # HDFS block state change log from block manager
    #
    # Uncomment the following to suppress normal block state change
    # messages from BlockManager in NameNode.
    #log4j.logger.BlockStateChange=WARN
    
    #
    #Security appender
    #
    hadoop.security.logger=INFO,NullAppender
    hadoop.security.log.maxfilesize=256MB
    hadoop.security.log.maxbackupindex=20
    log4j.category.SecurityLogger=${hadoop.security.logger}
    hadoop.security.log.file=SecurityAuth-${user.name}.audit
    log4j.appender.RFAS=org.apache.log4j.RollingFileAppender 
    log4j.appender.RFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    log4j.appender.RFAS.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    log4j.appender.RFAS.MaxFileSize=${hadoop.security.log.maxfilesize}
    log4j.appender.RFAS.MaxBackupIndex=${hadoop.security.log.maxbackupindex}
    
    #
    # Daily Rolling Security appender
    #
    log4j.appender.DRFAS=org.apache.log4j.DailyRollingFileAppender 
    log4j.appender.DRFAS.File=${hadoop.log.dir}/${hadoop.security.log.file}
    log4j.appender.DRFAS.layout=org.apache.log4j.PatternLayout
    log4j.appender.DRFAS.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
    log4j.appender.DRFAS.DatePattern=.yyyy-MM-dd
    
    #
    # hadoop configuration logging
    #
    
    # Uncomment the following line to turn off configuration deprecation warnings.
    # log4j.logger.org.apache.hadoop.conf.Configuration.deprecation=WARN
    
    #
    # hdfs audit logging
    #
    hdfs.audit.logger=INFO,NullAppender
    hdfs.audit.log.maxfilesize=256MB
    hdfs.audit.log.maxbackupindex=20
    log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=${hdfs.audit.logger}
    log4j.additivity.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=false
    log4j.appender.RFAAUDIT=org.apache.log4j.RollingFileAppender
    log4j.appender.RFAAUDIT.File=${hadoop.log.dir}/hdfs-audit.log
    log4j.appender.RFAAUDIT.layout=org.apache.log4j.PatternLayout
    log4j.appender.RFAAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    log4j.appender.RFAAUDIT.MaxFileSize=${hdfs.audit.log.maxfilesize}
    log4j.appender.RFAAUDIT.MaxBackupIndex=${hdfs.audit.log.maxbackupindex}
    
    #
    # mapred audit logging
    #
    mapred.audit.logger=INFO,NullAppender
    mapred.audit.log.maxfilesize=256MB
    mapred.audit.log.maxbackupindex=20
    log4j.logger.org.apache.hadoop.mapred.AuditLogger=${mapred.audit.logger}
    log4j.additivity.org.apache.hadoop.mapred.AuditLogger=false
    log4j.appender.MRAUDIT=org.apache.log4j.RollingFileAppender
    log4j.appender.MRAUDIT.File=${hadoop.log.dir}/mapred-audit.log
    log4j.appender.MRAUDIT.layout=org.apache.log4j.PatternLayout
    log4j.appender.MRAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    log4j.appender.MRAUDIT.MaxFileSize=${mapred.audit.log.maxfilesize}
    log4j.appender.MRAUDIT.MaxBackupIndex=${mapred.audit.log.maxbackupindex}
    
    # Custom Logging levels
    
    #log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
    #log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
    #log4j.logger.org.apache.hadoop.hdfs.server.namenode.FSNamesystem.audit=DEBUG
    
    # Jets3t library
    log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
    
    # AWS SDK & S3A FileSystem
    log4j.logger.com.amazonaws=ERROR
    log4j.logger.com.amazonaws.http.AmazonHttpClient=ERROR
    log4j.logger.org.apache.hadoop.fs.s3a.S3AFileSystem=WARN
    
    #
    # Event Counter Appender
    # Sends counts of logging messages at different severity levels to Hadoop Metrics.
    #
    log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
    
    #
    # Job Summary Appender 
    #
    # Use following logger to send summary to separate file defined by 
    # hadoop.mapreduce.jobsummary.log.file :
    # hadoop.mapreduce.jobsummary.logger=INFO,JSA
    # 
    hadoop.mapreduce.jobsummary.logger=${hadoop.root.logger}
    hadoop.mapreduce.jobsummary.log.file=hadoop-mapreduce.jobsummary.log
    hadoop.mapreduce.jobsummary.log.maxfilesize=256MB
    hadoop.mapreduce.jobsummary.log.maxbackupindex=20
    log4j.appender.JSA=org.apache.log4j.RollingFileAppender
    log4j.appender.JSA.File=${hadoop.log.dir}/${hadoop.mapreduce.jobsummary.log.file}
    log4j.appender.JSA.MaxFileSize=${hadoop.mapreduce.jobsummary.log.maxfilesize}
    log4j.appender.JSA.MaxBackupIndex=${hadoop.mapreduce.jobsummary.log.maxbackupindex}
    log4j.appender.JSA.layout=org.apache.log4j.PatternLayout
    log4j.appender.JSA.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{2}: %m%n
    log4j.logger.org.apache.hadoop.mapred.JobInProgress$JobSummary=${hadoop.mapreduce.jobsummary.logger}
    log4j.additivity.org.apache.hadoop.mapred.JobInProgress$JobSummary=false
    
    #
    # Yarn ResourceManager Application Summary Log 
    #
    # Set the ResourceManager summary log filename
    yarn.server.resourcemanager.appsummary.log.file=rm-appsummary.log
    # Set the ResourceManager summary log level and appender
    yarn.server.resourcemanager.appsummary.logger=${hadoop.root.logger}
    #yarn.server.resourcemanager.appsummary.logger=INFO,RMSUMMARY
    
    # To enable AppSummaryLogging for the RM, 
    # set yarn.server.resourcemanager.appsummary.logger to 
    # <LEVEL>,RMSUMMARY in hadoop-env.sh
    
    # Appender for ResourceManager Application Summary Log
    # Requires the following properties to be set
    #    - hadoop.log.dir (Hadoop Log directory)
    #    - yarn.server.resourcemanager.appsummary.log.file (resource manager app summary log filename)
    #    - yarn.server.resourcemanager.appsummary.logger (resource manager app summary log level and appender)
    
    log4j.logger.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=${yarn.server.resourcemanager.appsummary.logger}
    log4j.additivity.org.apache.hadoop.yarn.server.resourcemanager.RMAppManager$ApplicationSummary=false
    log4j.appender.RMSUMMARY=org.apache.log4j.RollingFileAppender
    log4j.appender.RMSUMMARY.File=${hadoop.log.dir}/${yarn.server.resourcemanager.appsummary.log.file}
    log4j.appender.RMSUMMARY.MaxFileSize=256MB
    log4j.appender.RMSUMMARY.MaxBackupIndex=20
    log4j.appender.RMSUMMARY.layout=org.apache.log4j.PatternLayout
    log4j.appender.RMSUMMARY.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    
    # HS audit log configs
    #mapreduce.hs.audit.logger=INFO,HSAUDIT
    #log4j.logger.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=${mapreduce.hs.audit.logger}
    #log4j.additivity.org.apache.hadoop.mapreduce.v2.hs.HSAuditLogger=false
    #log4j.appender.HSAUDIT=org.apache.log4j.DailyRollingFileAppender
    #log4j.appender.HSAUDIT.File=${hadoop.log.dir}/hs-audit.log
    #log4j.appender.HSAUDIT.layout=org.apache.log4j.PatternLayout
    #log4j.appender.HSAUDIT.layout.ConversionPattern=%d{ISO8601} %p %c{2}: %m%n
    #log4j.appender.HSAUDIT.DatePattern=.yyyy-MM-dd
    
    # Http Server Request Logs
    #log4j.logger.http.requests.namenode=INFO,namenoderequestlog
    #log4j.appender.namenoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    #log4j.appender.namenoderequestlog.Filename=${hadoop.log.dir}/jetty-namenode-yyyy_mm_dd.log
    #log4j.appender.namenoderequestlog.RetainDays=3
    
    #log4j.logger.http.requests.datanode=INFO,datanoderequestlog
    #log4j.appender.datanoderequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    #log4j.appender.datanoderequestlog.Filename=${hadoop.log.dir}/jetty-datanode-yyyy_mm_dd.log
    #log4j.appender.datanoderequestlog.RetainDays=3
    
    #log4j.logger.http.requests.resourcemanager=INFO,resourcemanagerrequestlog
    #log4j.appender.resourcemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    #log4j.appender.resourcemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-resourcemanager-yyyy_mm_dd.log
    #log4j.appender.resourcemanagerrequestlog.RetainDays=3
    
    #log4j.logger.http.requests.jobhistory=INFO,jobhistoryrequestlog
    #log4j.appender.jobhistoryrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    #log4j.appender.jobhistoryrequestlog.Filename=${hadoop.log.dir}/jetty-jobhistory-yyyy_mm_dd.log
    #log4j.appender.jobhistoryrequestlog.RetainDays=3
    
    #log4j.logger.http.requests.nodemanager=INFO,nodemanagerrequestlog
    #log4j.appender.nodemanagerrequestlog=org.apache.hadoop.http.HttpRequestLogAppender
    #log4j.appender.nodemanagerrequestlog.Filename=${hadoop.log.dir}/jetty-nodemanager-yyyy_mm_dd.log
    #log4j.appender.nodemanagerrequestlog.RetainDays=3
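
    With these files on the resources classpath, the client resolves the logical nameservice mycluster and fails over between nn1 and nn2 on its own. The following sanity check is a minimal sketch (not part of the original post, assuming only the configuration shown above):

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ClusterCheck {
        public static void main(String[] args) throws Exception {
            // core-site.xml and hdfs-site.xml are picked up automatically from the classpath.
            Configuration conf = new Configuration();
            System.out.println(conf.get("fs.defaultFS")); // expect hdfs://mycluster

            // "mycluster" is the logical nameservice; ConfiguredFailoverProxyProvider
            // routes requests to whichever of nn1 (cent1:9000) / nn2 (cent2:9000) is active.
            try (FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf)) {
                for (FileStatus status : fs.listStatus(new Path("/"))) {
                    System.out.println(status.getPath());
                }
            }
        }
    }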

    II. Writing the WordCount Program

    import java.io.IOException;
    import java.net.URI;
    import java.util.StringTokenizer;
    
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordCount {
    
        public static class TokenizerMapper
                extends Mapper<Object, Text, Text, IntWritable> {
    
            private final static IntWritable one = new IntWritable(1);
            private Text word = new Text();
    
            public void map(Object key, Text value, Context context
            ) throws IOException, InterruptedException {
                StringTokenizer itr = new StringTokenizer(value.toString());
                while (itr.hasMoreTokens()) {
                    word.set(itr.nextToken());
                    context.write(word, one);
                }
            }
        }
    
        public static class IntSumReducer
                extends Reducer<Text, IntWritable, Text, IntWritable> {
            private IntWritable result = new IntWritable();
    
            public void reduce(Text key, Iterable<IntWritable> values,
                               Context context
            ) throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable val : values) {
                    sum += val.get();
                }
                result.set(sum);
                context.write(key, result);
            }
        }
    
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            System.setProperty("hadoop.home.dir", "E:\\softs\\majorSoft\\hadoop-2.7.5"); // resolves the winutils exception at startup
            conf.set("mapreduce.app-submission.cross-platform", "true"); // allow cross-platform (remote) submission
            Path input = new Path(URI.create("hdfs://mycluster/testFile/wordCount"));
            Path output = new Path(URI.create("hdfs://mycluster/output"));
            Job job = Job.getInstance(conf, "word count");
            job.setJar("E:\\bigData\\hadoopDemo\\out\\artifacts\\wordCount_jar\\hadoopDemo.jar"); // the jar must be built before submitting
            job.setJarByClass(WordCount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
    
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, input);
            FileOutputFormat.setOutputPath(job, output);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }  
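
    Before running, build the jar that job.setJar points at (in IDEA: Build → Build Artifacts), make sure the input directory exists on HDFS, and make sure the output directory does not, since MapReduce refuses to write to an existing output path. A typical preparation on a cluster node might look like this (words.txt is an illustrative sample file, not from the original post):

    hdfs dfs -mkdir -p /testFile/wordCount
    hdfs dfs -put words.txt /testFile/wordCount
    hdfs dfs -rm -r /output        # clear any previous run (ignore the error if it does not exist)
    # ... run WordCount from IDEA, then inspect the result:
    hdfs dfs -cat /output/part-r-00000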

    III. Exceptions Encountered

    1. RuntimeException / ClassNotFoundException: Class WordCount$Map not found (the Mapper class cannot be found on the cluster). Fix: point the job at the built jar explicitly:
    job.setJar("WordCount.jar");
    
    2. Exception message: /bin/bash: line 0: fg: no job control. This means the job was submitted in a format the remote Linux shell cannot handle; enable cross-platform submission:
    conf.set("mapreduce.app-submission.cross-platform", "true");
    and set in hdfs-site.xml:
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    
    3. java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries. Fix: set hadoop.home.dir in code (or set the HADOOP_HOME environment variable):
    System.setProperty("hadoop.home.dir", "E:\\softs\\majorSoft\\hadoop-2.7.5");
    
    4. HDFS permission errors, or the cluster hostnames cannot be resolved:
    edit C:\Windows\System32\drivers\etc\hosts as described in step 1 of the environment configuration.