  • Hadoop HDFS copyMergeFromLocal

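    HDFS is designed for large files, so a common optimization for small files is to compress or merge them. The utility methods below merge small local files while copying them to HDFS and are provided for reference.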

    
    
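    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.FileUtil;
    import org.apache.hadoop.fs.Path;

    /**
     * Get all the files in the local directories that match the source file
     * pattern and merge them, in sorted order, into a single file on HDFS.
     *
     * Also adds a string between the files (useful for adding \n
     * to a text file).
     *
     * @param srcf    a file pattern specifying the local source directories
     * @param dst     the destination file/directory on HDFS
     * @param endline whether an end-of-line character is added to a text file
     * @throws IOException if listing, reading, or writing fails
     */
    public static void copyMergeFromLocal(String srcf, Path dst, boolean endline)
            throws IOException {
        Configuration conf = new Configuration();
        Path srcPath = new Path(srcf);
        // Sources are read from the local file system.
        FileSystem srcFs = FileSystem.getLocal(conf);
        // Resolve the target file system from dst rather than from the source pattern.
        FileSystem dstFs = dst.getFileSystem(conf);
        Path[] srcs = FileUtil.stat2Paths(srcFs.globStatus(srcPath), srcPath);
        for (Path src : srcs) {
            // FileUtil.copyMerge is a Hadoop 2.x API; it was removed in Hadoop 3.0.
            FileUtil.copyMerge(srcFs, src,
                    dstFs, dst, false, conf,
                    endline ? "\n" : null);
        }
    }

    /** Same as above, without adding an end-of-line character. */
    public static void copyMergeFromLocal(String srcf, Path dst) throws IOException {
        copyMergeFromLocal(srcf, dst, false);
    }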
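
    `FileUtil.copyMerge` is a Hadoop 2.x API and is no longer present in Hadoop 3, so it can help to see the merge step on its own. The sketch below only approximates that step in plain Java and is not Hadoop's implementation: the class `LocalMergeSketch` and the method `mergeDirectory` are hypothetical names, the code reads from the local file system through `java.nio.file`, and it writes to any `OutputStream` (on HDFS this could be the stream returned by `FileSystem.create`).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: concatenates the regular files of one directory. */
final class LocalMergeSketch {

    /**
     * Appends every regular file directly under srcDir to out, in sorted
     * path order, writing separator after each file when it is non-null.
     * Returns the number of files merged.
     */
    static int mergeDirectory(Path srcDir, OutputStream out, String separator)
            throws IOException {
        List<Path> files;
        // Close the directory listing as soon as the paths are collected.
        try (Stream<Path> listing = Files.list(srcDir)) {
            files = listing.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.copy(file, out); // stream the file's bytes to the target
            if (separator != null) {
                out.write(separator.getBytes(StandardCharsets.UTF_8));
            }
        }
        return files.size();
    }
}
```

    Passing `"\n"` as `separator` plays the role of the `endline` flag above; passing `null` concatenates the files without any separator.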

    When uploading files to HDFS, you can set filter conditions so that small files are merged automatically.
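
    One way to read the filter idea is as a size threshold applied before upload: files below the threshold are concatenated into one stream, and larger files are left to be uploaded individually. The following is a minimal sketch under that assumption. `SmallFileFilterSketch`, `mergeSmallFiles`, and the `maxBytes` threshold are hypothetical names that do not come from the original post, and the code runs against the local file system with `java.nio.file` rather than the Hadoop API.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/** Hypothetical helper: merges only the files that pass a size filter. */
final class SmallFileFilterSketch {

    /**
     * Writes every regular file in srcDir whose size is below maxBytes to
     * mergedOut (in sorted path order) and returns the remaining, larger
     * files so the caller can upload them one by one.
     */
    static List<Path> mergeSmallFiles(Path srcDir, long maxBytes, OutputStream mergedOut)
            throws IOException {
        List<Path> files;
        try (Stream<Path> listing = Files.list(srcDir)) {
            files = listing.filter(Files::isRegularFile)
                           .sorted()
                           .collect(Collectors.toList());
        }
        List<Path> large = new ArrayList<>();
        for (Path file : files) {
            if (Files.size(file) < maxBytes) {
                Files.copy(file, mergedOut); // small file: append to the merged stream
            } else {
                large.add(file);             // large file: leave for a separate upload
            }
        }
        return large;
    }
}
```

    With Hadoop on the classpath, `mergedOut` could be the stream returned by `FileSystem.create(dst)`, and each path in the returned list could be uploaded separately.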

  • Original article: https://www.cnblogs.com/cunchen/p/9464195.html