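  • Decompressing HDFS files with the Hadoop API

    Follow-up to the previous post: Compressing HDFS files with the Hadoop API

      Once a file has been compressed, it naturally needs to be decompressed as well.

      Here is the code: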

      

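    import java.io.IOException;
    import java.io.InputStream;
    import java.net.URI;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    // HDFSConf and logger are defined elsewhere in the project.
    private static void getFile(String filePath) throws IOException, ClassNotFoundException {

        FileSystem fs = FileSystem.get(URI.create(filePath), HDFSConf.getConf());
        Path path = new Path(filePath);
        if (fs.exists(path)) {

            // Pick the codec from the file extension (.gz, .bz2, ...); null if unrecognized
            CompressionCodecFactory factory = new CompressionCodecFactory(HDFSConf.getConf());
            CompressionCodec codec = factory.getCodec(path);
            if (codec == null) {
                logger.error("No codec found for : " + filePath);
                return;
            }

            // Output name: the file name up to its first ".", plus ".new".
            // Working on the name only keeps dots in the host or directories out of the way.
            String name = path.getName();
            Path outPath = new Path(path.getParent(), name.substring(0, name.indexOf(".")) + ".new");
            logger.info("out put path is : " + outPath.toString());

            if (fs.createNewFile(outPath)) {
                logger.info("create file  : " + outPath.toString());

                // try-with-resources closes all three streams even if the copy fails
                try (FSDataInputStream in = fs.open(path);
                     InputStream cin = codec.createInputStream(in);
                     FSDataOutputStream out = fs.append(outPath)) {
                    // Copy with a 5 MB buffer; false = leave closing to try-with-resources
                    IOUtils.copyBytes(cin, out, 1024 * 1024 * 5, false);
                    out.flush();
                }

                logger.info("Decompress file successful");
            } else {
                logger.error("File exists");
            }

        } else {
            logger.info("There is no file :" + filePath);
        }

    }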
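
    The core of the method is a two-step pattern: wrap the raw input stream in a decompressing stream, then copy bytes to the output. That pattern can be tried without a cluster. The following is only a minimal local sketch that uses the JDK's java.util.zip as a stand-in for Hadoop's CompressionCodec; the class name LocalGzipDemo and its methods are hypothetical and not part of the original project.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical demo class: a local stand-in for the HDFS decompression flow.
class LocalGzipDemo {

    // Compress bytes with gzip (plays the role of the existing .gz file on HDFS).
    static byte[] gzip(byte[] raw) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(sink)) {
            gz.write(raw);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The gzip stream is closed (and therefore finished) at this point.
        return sink.toByteArray();
    }

    // Decompress: wrap the raw stream in a decompressing stream, then copy in chunks.
    static byte[] gunzip(byte[] compressed) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (InputStream in = new ByteArrayInputStream(compressed);
             InputStream cin = new GZIPInputStream(in)) { // role of codec.createInputStream(in)
            byte[] buffer = new byte[8192];
            int n;
            while ((n = cin.read(buffer)) != -1) {        // role of IOUtils.copyBytes
                out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("viewlog line\n");
        }
        byte[] original = sb.toString().getBytes(StandardCharsets.UTF_8);
        byte[] compressed = gzip(original);
        byte[] restored = gunzip(compressed);
        System.out.println("sizes: " + original.length + " raw, " + compressed.length + " gzipped");
        System.out.println("round trip ok: " + Arrays.equals(original, restored));
    }
}
```

    On HDFS the difference is mainly where the streams come from: fs.open supplies the raw input, and the codec chosen by CompressionCodecFactory supplies the decompressing wrapper.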

    Package and run:

    [hadoop@venn05 venn]$ java -cp compressHdfsFile-1.0-SNAPSHOT.jar com.utstarcom.hdfs.DeCompressFile /aaa/test/viewlog_20180402.log.gz
    2018-06-10 04:21:44.562 [Common.java] [main] 
    INFO : start init : 
    2018-06-10 04:21:44.566 [Common.java] [main] 
    INFO : properties path : /opt/hadoop/tmp/venn/
    /opt/hadoop/tmp/venn/hdfs.properties
    default.compress.format
    hdfs.uri
    2018-06-10 04:21:44.568 [Common.java] [main] 
    INFO : get System enviroment : 46
    2018-06-10 04:21:44.569 [Common.java] [main] 
    INFO : properties path : {hdfs.uri=hdfs://venn06:8020, default.compress.format=bz2}
    hdfs://venn06:8020/aaa/test/viewlog_20180402.log.gz
    log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
    log4j:WARN Please initialize the log4j system properly.
    log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
    2018-06-10 04:21:46.409 [DeCompressFile.java] [main] 
    INFO : out put path is : hdfs://venn06:8020/aaa/test/viewlog_20180402.new
    2018-06-10 04:21:46.623 [DeCompressFile.java] [main] 
    INFO : create file : hdfs://venn06:8020/aaa/test/viewlog_20180402.new
    2018-06-10 04:22:24.566 [DeCompressFile.java] [main] 
    INFO : Decompress file successful
    cost : 
    39 s
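
    Note that the output in the run above is named viewlog_20180402.new: the name is cut at its first dot, so the inner ".log" extension is dropped together with ".gz". The sketch below contrasts that rule with an alternative that strips only the compression suffix. Both helpers and the class name OutputNames are hypothetical illustrations in plain Java, not code from the project.

```java
// Hypothetical helper class illustrating two ways to name the decompressed file.
class OutputNames {

    // Same rule as the run above: cut at the first "." of the file name, append ".new".
    static String cutAtFirstDot(String fileName) {
        int dot = fileName.indexOf('.');
        String base = dot >= 0 ? fileName.substring(0, dot) : fileName;
        return base + ".new";
    }

    // Alternative: drop only the compression suffix, keeping inner extensions such as ".log".
    static String stripSuffix(String fileName, String suffix) {
        return fileName.endsWith(suffix)
                ? fileName.substring(0, fileName.length() - suffix.length())
                : fileName;
    }

    public static void main(String[] args) {
        String name = "viewlog_20180402.log.gz";
        System.out.println(cutAtFirstDot(name));      // viewlog_20180402.new
        System.out.println(stripSuffix(name, ".gz")); // viewlog_20180402.log
    }
}
```

    In this test directory the second form would collide with the original viewlog_20180402.log, which may be why a separate ".new" name is convenient here. With Hadoop, the suffix to strip could be taken from codec.getDefaultExtension() rather than hard-coded.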

     Compressed size: 249.4 M; size after decompression: 1.4 G; run time: 39 s. Not bad at all.

    File sizes:
    [hadoop@ut01 venn]$ hadoop fs -ls /aaa/test/
    18/06/10 04:26:29 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 3 items
    -rw-r--r-- 3 hadoop supergroup 1515343101 2018-06-03 17:07 /aaa/test/viewlog_20180402.log
    -rw-r--r-- 3 hadoop supergroup 261506977 2018-06-09 15:46 /aaa/test/viewlog_20180402.log.gz
    -rw-r--r-- 3 hadoop supergroup 1515343101 2018-06-09 15:43 /aaa/test/viewlog_20180402.new
    [hadoop@ut01 venn]$ hadoop fs -ls -h /aaa/test/
    18/06/10 04:26:38 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 3 items
    -rw-r--r-- 3 hadoop supergroup 1.4 G 2018-06-03 17:07 /aaa/test/viewlog_20180402.log
    -rw-r--r-- 3 hadoop supergroup 249.4 M 2018-06-09 15:46 /aaa/test/viewlog_20180402.log.gz
    -rw-r--r-- 3 hadoop supergroup 1.4 G 2018-06-09 15:43 /aaa/test/viewlog_20180402.new
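
    The listing shows that viewlog_20180402.new has exactly the same byte count as the original viewlog_20180402.log (1515343101), which is a quick sanity check on the result, though not a content comparison. From the byte counts and the reported 39 s, the compression ratio and the output rate can also be worked out. The sketch below is just that arithmetic; the class name DecompressStats is hypothetical, and the 39 s figure includes program start-up, so the rate is a rough lower bound.

```java
import java.util.Locale;

// Hypothetical helper class: arithmetic on the sizes and run time reported above.
class DecompressStats {

    // Fraction of the raw size that the compressed file occupies.
    static double ratio(long compressedBytes, long rawBytes) {
        return (double) compressedBytes / rawBytes;
    }

    // Decompressed output rate in MiB per second.
    static double mibPerSecond(long bytes, double seconds) {
        return bytes / (1024.0 * 1024.0) / seconds;
    }

    public static void main(String[] args) {
        long rawBytes = 1_515_343_101L; // viewlog_20180402.log and .new
        long gzBytes = 261_506_977L;    // viewlog_20180402.log.gz
        double seconds = 39.0;          // "cost : 39 s" from the run output

        System.out.println(String.format(Locale.ROOT,
                "compressed/raw = %.1f%%", 100.0 * ratio(gzBytes, rawBytes)));
        System.out.println(String.format(Locale.ROOT,
                "output rate    = %.1f MiB/s", mibPerSecond(rawBytes, seconds)));
    }
}
```

    With these inputs the compressed file is a little over 17% of the raw size, and the decompressed data is written at roughly 37 MiB/s.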

     Project repository: 码云 (Gitee)

  • Original post: https://www.cnblogs.com/Springmoon-venn/p/9194654.html