  • Hadoop HDFS copyMergeFromLocal

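    On the subject of HDFS optimization: HDFS is good at handling large files, while the usual optimization strategies for small files are compression and merging. A small-file merging utility is listed here for reference.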

    
    
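    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    /**
     * Gets all the files in the local directories that match the source file
     * pattern and merges them, in sorted order, into a single file on HDFS.
     * The source files are kept.
     *
     * Also adds a string between the files (useful for adding "\n"
     * to a text file).
     *
     * Note: relies on FileUtil.copyMerge, which exists in Hadoop 2.x and was
     * removed in Hadoop 3.0.
     *
     * @param srcf a file pattern specifying the local source directories
     * @param dst the destination file/directory on HDFS
     * @param endline whether an end-of-line character is added between the merged files
     * @throws IOException if globbing, reading, or writing fails
     */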
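    public static void copyMergeFromLocal(String srcf, Path dst, boolean endline)
            throws IOException {
        Configuration conf = new Configuration();
        Path srcPath = new Path(srcf);
        // The destination file system is derived from dst, so an hdfs:// URI is honored.
        FileSystem dstFs = dst.getFileSystem(conf);
        // The sources are read from the local file system.
        FileSystem srcFs = FileSystem.getLocal(conf);
        // Expand the glob pattern into the matching source directories.
        Path[] srcs = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);
        for (Path src : srcs) {
            // deleteSource = false: the local files are kept after the merge.
            FileUtil.copyMerge(srcFs, src, dstFs, dst, false, conf,
                    endline ? "\n" : null);
        }
    }

    // Convenience overload: merge without adding end-of-line characters.
    public static void copyMergeFromLocal(String srcf, Path dst) throws IOException {
        copyMergeFromLocal(srcf, dst, false);
    }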

    When uploading files to HDFS, a filter condition can be set so that small files are merged automatically.
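
One way to picture such a filter condition is a size threshold applied on the local side before anything is uploaded. The sketch below is not taken from the original utility and does not use Hadoop at all: it relies only on `java.nio`, and the class `SmallFileMerger`, the method `mergeSmallFiles`, and the threshold parameter `maxBytes` are hypothetical names chosen for illustration. It assumes the destination file lives outside the source directory.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical helper (not part of Hadoop): merges small local files into one file.
class SmallFileMerger {

    /**
     * Concatenates every regular file in srcDir that matches the glob and is at
     * most maxBytes long into dst, in path order. Returns the number of files merged.
     * Assumption: dst is located outside srcDir.
     */
    static int mergeSmallFiles(Path srcDir, String glob, long maxBytes,
                               Path dst, boolean endline) throws IOException {
        List<Path> small = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir, glob)) {
            for (Path p : stream) {
                // The filter condition: only regular files up to the size threshold.
                if (Files.isRegularFile(p) && Files.size(p) <= maxBytes) {
                    small.add(p);
                }
            }
        }
        Collections.sort(small); // deterministic merge order
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path p : small) {
                Files.copy(p, out);          // append the file's bytes to dst
                if (endline) {
                    out.write('\n');         // optional separator after each file
                }
            }
        }
        return small.size();
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("small-files");
        Files.write(dir.resolve("a.txt"), "alpha".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("b.txt"), "beta".getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("big.txt"), new byte[2048]); // above the threshold

        Path merged = Files.createTempFile("merged", ".out");
        int count = mergeSmallFiles(dir, "*.txt", 1024, merged, true);
        System.out.println("merged files: " + count);
        System.out.print(new String(Files.readAllBytes(merged), StandardCharsets.UTF_8));
    }
}
```

The merged file could then be uploaded as a single object, for example with Hadoop's `FileSystem.copyFromLocalFile`, while files above the threshold would be uploaded individually. Whether a size threshold is the right filter depends on the workload; a name pattern or a modification-time window would fit the same structure.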

  • Original article: https://www.cnblogs.com/cunchen/p/9464195.html