  • Hadoop HDFS copyMergeFromLocal

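    HDFS is good at handling large files, so discussions of HDFS tuning usually treat small files as a separate problem. The common optimizations for small files are compression and merging. The utility method below, which merges small local files while uploading them, is listed for reference.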

    
    
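    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    /**
     * Get all the local files in the directories that match the source file
     * pattern, then merge and sort them so that only one file is kept on HDFS.
     *
     * Also adds a string between the files (useful for adding \n
     * to a text file).
     * @param srcf a file pattern specifying the local source directories
     * @param dst the destination file on HDFS
     * @param endline whether an end-of-line character is added after each file
     * @exception IOException if reading the sources or writing the destination fails
     */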
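    public static void copyMergeFromLocal(String srcf, Path dst, boolean endline)
            throws IOException {
        Configuration conf = new Configuration();
        Path srcPath = new Path(srcf);
        // Sources are read from the local file system.
        FileSystem srcFs = FileSystem.getLocal(conf);
        // Resolve the destination file system from dst rather than from the
        // source pattern, so the merged file lands where dst points.
        FileSystem dstFs = dst.getFileSystem(conf);
        // Expand the glob; copyMerge expects each match to be a directory.
        Path[] srcs = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);
        for (Path src : srcs) {
            // Note: FileUtil.copyMerge exists in Hadoop 2.x and was removed in Hadoop 3.0.
            FileUtil.copyMerge(srcFs, src, dstFs, dst, false, conf,
                    endline ? "\n" : null);
        }
    }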

    /** Convenience overload: merge without adding end-of-line characters. */
    public static void copyMergeFromLocal(String srcf, Path dst) throws IOException {
        copyMergeFromLocal(srcf, dst, false);
    }

    When uploading files to HDFS, you can apply a filter condition so that small files are merged automatically.
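
    The filter-then-merge idea can be sketched without any Hadoop dependency. The example below is only an illustration under stated assumptions: the class name SmallFileMerger, the method mergeSmallFiles, and the size threshold maxBytes are hypothetical and do not come from the original article or from the Hadoop API. It uses plain java.nio to select files below a size threshold and concatenate them locally in file-name order, optionally adding "\n" after each file, much as the endline flag does above.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.stream.Stream;

/** Hypothetical helper: merges small local files into one file before upload. */
class SmallFileMerger {

    /**
     * Concatenates every regular file directly under srcDir whose size is
     * below maxBytes into dst, in file-name order, and returns the list of
     * files that were merged. If endline is true, "\n" is written after each
     * file's content. dst should lie outside srcDir.
     */
    static List<Path> mergeSmallFiles(Path srcDir, Path dst, long maxBytes, boolean endline)
            throws IOException {
        List<Path> small = new ArrayList<>();
        try (Stream<Path> entries = Files.list(srcDir)) {
            for (Path p : (Iterable<Path>) entries::iterator) {
                // The filter condition: regular files under the size threshold.
                if (Files.isRegularFile(p) && Files.size(p) < maxBytes) {
                    small.add(p);
                }
            }
        }
        Collections.sort(small); // deterministic merge order
        try (OutputStream out = Files.newOutputStream(dst)) {
            for (Path p : small) {
                Files.copy(p, out); // append this file's bytes to the merged output
                if (endline) {
                    out.write('\n');
                }
            }
        }
        return small;
    }
}
```

    One possible use is to call mergeSmallFiles on a staging directory and then upload the single merged file, for example with FileSystem.copyFromLocalFile. Files at or above the threshold are left untouched and can be uploaded individually.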

  • Original article: https://www.cnblogs.com/cunchen/p/9464193.html