
MapReduce: how new Text() adds an extra column to the output file written to HDFS

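A while ago, the data for one module of our business system went missing. While investigating, I found that an intermediate processing step had failed, and the error log reported an incorrect file format. After exporting the data, I saw that every line of that step's input file had one extra column, and that the column was empty (the column separator is a tab). My first pass through the code found nothing that wrote an extra column. On the second pass, while going through the Reduce code, I found that the value written to the file was an empty Text():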

    // Uses the old org.apache.hadoop.mapred API (OutputCollector, Reporter).
    public void reduce(Text key, Iterator<Text> values,
            OutputCollector<Text, Text> output, Reporter reporter)
            throws IOException
    {
        String keyString = key.toString();

        // Sum all values for this key.
        double totalSize = 0D;
        while (values.hasNext())
        {
            String value = values.next().toString();
            totalSize += Double.parseDouble(value);
        }

        // Append the total to the key as a second, tab-separated column.
        keyString += "\t" + totalSize;

        // Originally written like this:
        // output.collect(new Text(keyString), new Text());
        // It should be written like this. (new Text(keyString) is not recommended here either;
        // the better practice is to declare one Text field and reuse it via text.set().)
        output.collect(new Text(keyString), null);
    }

See the code snippet above.

If the reduce result is emitted like this:

output.collect(new Text(keyString), new Text());

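then every line of the result file ends up with three columns: the key, the total, and a trailing empty column. The empty Text is still a non-null value, so the text output writes the separator after the key and then an empty string.

Changing new Text() to null solves the problem.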
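To make the difference concrete, here is a minimal sketch in plain Java that models how a line-oriented text writer decides whether to emit the separator. It is a simplified stand-in, not Hadoop's actual code: the class SeparatorSketch, the methods formatRecord and countColumns, and the sample key "user1" are all hypothetical names chosen for illustration. It assumes, as the behaviour described above suggests, that the separator is written only when both key and value are non-null.

```java
public class SeparatorSketch {
    // Simplified model (an assumption, not Hadoop source): the tab separator
    // is appended only when both the key and the value are non-null.
    static String formatRecord(String key, String value) {
        if (key == null && value == null) {
            return ""; // nothing to write
        }
        StringBuilder line = new StringBuilder();
        if (key != null) {
            line.append(key);
        }
        if (key != null && value != null) {
            line.append('\t');
        }
        if (value != null) {
            line.append(value);
        }
        return line.append('\n').toString();
    }

    // Counts tab-separated columns. The limit of -1 keeps trailing empty
    // columns, which String.split would otherwise silently drop.
    static int countColumns(String line) {
        String body = line.endsWith("\n")
                ? line.substring(0, line.length() - 1)
                : line;
        return body.split("\t", -1).length;
    }

    public static void main(String[] args) {
        String key = "user1\t1024.0"; // hypothetical key that already holds two columns
        System.out.println(countColumns(formatRecord(key, "")));   // empty value: prints 3
        System.out.println(countColumns(formatRecord(key, null))); // null value: prints 2
    }
}
```

One thing to watch when checking output files this way: splitting with the default limit discards trailing empty strings, so a trailing empty column can go unnoticed unless the limit is set to -1.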


Original article: https://www.cnblogs.com/sixiweb/p/3835785.html