  • Hadoop HDFS copyMergeFromLocal

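    HDFS is good at handling large files. Small files are a weak spot, because the NameNode keeps every file and block in memory, so the usual optimization strategies for small files are compression and merging. Below is a utility class for merging small files, for reference.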

    
    
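    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    // Class name is illustrative. FileUtil.copyMerge exists in Hadoop 2.x but was removed in Hadoop 3.0.
    public class HdfsMergeUtil {

        /**
         * Get all the files in the local directories that match the source file
         * pattern and merge them, sorted by path, into a single file on HDFS.
         * Sources that are not directories are skipped.
         *
         * Also adds a string between the files (useful for adding "\n" to a
         * text file).
         *
         * @param srcf    a file pattern specifying the local source directories
         * @param dst     the destination file on HDFS, or an existing HDFS directory,
         *                in which case each match is merged into dst/(source dir name)
         * @param endline whether to add an end-of-line character after each file
         * @throws IOException if reading a source or writing the destination fails
         */
        public static void copyMergeFromLocal(String srcf, Path dst, boolean endline)
                throws IOException {
            Configuration conf = new Configuration();
            Path srcPath = new Path(srcf);
            // Sources are read from the local file system.
            FileSystem srcFs = FileSystem.getLocal(conf);
            // Resolve the destination FS from dst (HDFS when fs.defaultFS points at it).
            FileSystem dstFs = dst.getFileSystem(conf);
            Path[] srcs = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);
            for (Path src : srcs) {
                // Concatenate the files in src into dst; false keeps the source files.
                FileUtil.copyMerge(srcFs, src, dstFs, dst, false, conf,
                        endline ? "\n" : null);
            }
        }

        public static void copyMergeFromLocal(String srcf, Path dst) throws IOException {
            copyMergeFromLocal(srcf, dst, false);
        }
    }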

    When uploading files to HDFS, you can also set filter conditions so that small files are merged automatically during the upload.
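
    Running the utility above requires Hadoop on the classpath, so here is a minimal plain-JDK sketch of the same merge step for experimenting locally. It assumes the "filter condition" is a file-name glob such as *.log. The class LocalMergeSketch and method mergeMatching are hypothetical names, not part of Hadoop. Like FileUtil.copyMerge, it sorts the matched files by path and optionally writes a separator string after each one.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical local sketch (NOT Hadoop's API): merges the regular files in srcDir
// whose names match the glob into dstFile, in sorted path order.
public class LocalMergeSketch {

    // Returns the number of files merged. dstFile should lie outside srcDir,
    // otherwise it could match the glob and be read while it is being written.
    public static int mergeMatching(Path srcDir, String glob, Path dstFile, String addString)
            throws IOException {
        List<Path> files = new ArrayList<>();
        // The glob (e.g. "*.log") acts as the filter condition.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(srcDir, glob)) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        // Directory iteration order is unspecified, so sort by path first.
        Collections.sort(files);
        // newOutputStream creates dstFile, or truncates it if it already exists.
        try (OutputStream out = Files.newOutputStream(dstFile)) {
            for (Path p : files) {
                Files.copy(p, out);
                if (addString != null) {
                    out.write(addString.getBytes(StandardCharsets.UTF_8));
                }
            }
        }
        return files.size();
    }
}
```

    The sort matters: without it, the order of the merged content could vary between runs and platforms.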

  • Original post: https://www.cnblogs.com/cunchen/p/9464195.html