zoukankan      html  css  js  c++  java
  • hadoop debug script

    A Hadoop job may consist of many map tasks and reduce tasks. Therefore, debugging a
    Hadoop job is often a complicated process. It is a good practice to first test a Hadoop job
    using unit tests by running it with a subset of the data.
    However, sometimes it is necessary to debug a Hadoop job in a distributed mode. To support
    such cases, Hadoop provides a mechanism called debug scripts. This recipe explains how to
    use debug scripts.

    A debug script is a shell script, and Hadoop executes the script whenever a task encounters
    an error. The script will have access to the $script, $stdout, $stderr, $syslog, and
    $jobconfproperties, as environment variables populated by Hadoop. You can find a
    sample script from resources/chapter3/debugscript. We can use the debug scripts
    to copy all the logfiles to a single location, e-mail them to a single e-mail account, or perform
    some analysis.
    LOG_FILE=HADOOP_HOME/error.log
    echo "Run the script" >> $LOG_FILE
    echo $script >> $LOG_FILE
    echo $stdout>> $LOG_FILE
    echo $stderr>> $LOG_FILE
    echo $syslog >> $LOG_FILE
    echo $jobconf>> $LOG_FILE

    when you execute this, you should pay attention to the execute path, or else it will not found debug script.

    package chapter3;
    
    import java.net.URI;
    
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    
    public class WordcountWithDebugScript {
        private static final String scriptFileLocation = "resources/chapter3/debugscript";
        private static final String HDFS_ROOT = "/debug";
    
        public static void setupFailedTaskScript(JobConf conf) throws Exception {
    
            // create a directory on HDFS where we'll upload the fail scripts
            FileSystem fs = FileSystem.get(conf);
            // Path debugDir = new Path("/debug");
            Path debugDir = new Path(HDFS_ROOT);
    
            // who knows what's already in this directory; let's just clear it.
            if (fs.exists(debugDir)) {
                fs.delete(debugDir, true);
            }
    
            // ...and then make sure it exists again
            fs.mkdirs(debugDir);
    
            // upload the local scripts into HDFS
            fs.copyFromLocalFile(new Path(scriptFileLocation), new Path(HDFS_ROOT
                    + "/fail-script"));
    
            FileStatus[] list = fs.listStatus(new Path(HDFS_ROOT));
            if (list == null || list.length == 0) {
                System.out.println("No File found");
            } else {
                for (FileStatus f : list) {
                    System.out.println("File found " + f.getPath());
                }
            }
    
            conf.setMapDebugScript("./fail-script");
            conf.setReduceDebugScript("./fail-script");
            // this create a simlink from the job directory to cache directory of
            // the mapper node
            DistributedCache.createSymlink(conf);
    
            URI fsUri = fs.getUri();
    
            String mapUriStr = fsUri.toString() + HDFS_ROOT
                    + "/fail-script#fail-script";
            System.out.println("added " + mapUriStr + "to distributed cache 1");
            URI mapUri = new URI(mapUriStr);
            // Following copy the map uri to the cache directory of the job node
            DistributedCache.addCacheFile(mapUri, conf);
        }
    
        public static void main(String[] args) throws Exception {
            JobConf conf = new JobConf();
            setupFailedTaskScript(conf);
            Job job = new Job(conf, "word count");
    
            job.setJarByClass(FaultyWordCount.class);
            job.setMapperClass(FaultyWordCount.TokenizerMapper.class);
            job.setReducerClass(FaultyWordCount.IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileSystem.get(conf).delete(new Path(args[1]), true);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            job.waitForCompletion(true);
        }
    
    }

    digest from mapreduce cookbook

    Looking for a job working at Home about MSBI
  • 相关阅读:
    【LeetCode】41. First Missing Positive (3 solutions)
    【LeetCode】42. Trapping Rain Water
    【LeetCode】164. Maximum Gap (2 solutions)
    【原创】SQLServer将数据导出为SQL脚本的方法
    Jconsole远程监控tomcat 的JVM内存(linux、windows)
    selenium + python自动化测试环境搭建
    LR--Controller的Pacing设置(不容忽视的设置)
    loadrunner录制回放常见问题及解决办法
    修改windows系统文件权限
    TestNG官方文档中文版(4)-运行TestNG
  • 原文地址:https://www.cnblogs.com/huaxiaoyao/p/4413488.html
Copyright © 2011-2022 走看看