  • How to Read Compressed Files in Hadoop

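    I have recently been working on importing offline data into HBase, which involves reading gz-compressed files from HDFS, so I am writing the approach down here for future reference. The idea is to let CompressionCodecFactory pick a codec from the file extension (for example .gz maps to GzipCodec) and, if one is found, wrap the raw HDFS input stream in that codec's decompressing stream; otherwise the file is read as plain text. The code is below: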

    package org.dba.util;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class ReadHdfs {
        // How many characters of the first line to print (the original printed 100)
        private static final int PREVIEW_CHARS = 100;

        public static void readFile(String fileName) throws IOException {
            // Picks up core-site.xml / hdfs-site.xml from the classpath
            Configuration conf = new Configuration();
            Path file = new Path(fileName);
            // Resolve the FileSystem from the path itself, so hdfs://... and file://... both work
            FileSystem fs = file.getFileSystem(conf);
            // Choose a codec by file extension; null means "treat as uncompressed"
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            CompressionCodec codec = factory.getCodec(file);

            // try-with-resources closes the reader, the decompressing stream and the HDFS stream
            try (InputStream raw = fs.open(file);
                 BufferedReader reader = new BufferedReader(new InputStreamReader(
                         codec == null ? raw : codec.createInputStream(raw),
                         StandardCharsets.UTF_8))) {
                // Print a preview for both compressed and uncompressed files
                String line = reader.readLine();
                if (line == null) {
                    System.out.println("(empty file)");
                } else {
                    // Guard against lines shorter than PREVIEW_CHARS
                    System.out.println(line.substring(0, Math.min(PREVIEW_CHARS, line.length())));
                }
            }
        }

        public static void main(String[] args) throws IOException {
            if (args.length < 1) {
                System.err.println("Usage: ReadHdfs <path>");
                System.exit(1);
            }
            readFile(args[0]);
        }
    }
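    To exercise the same read path without a cluster, here is a minimal JDK-only sketch. It is a stand-in under stated assumptions, not Hadoop's API: java.util.zip.GZIPInputStream takes the place of GzipCodec, a plain ".gz" suffix check takes the place of CompressionCodecFactory.getCodec(), and the names LocalGzReader, openMaybeCompressed and readAllLines are hypothetical. Unlike the code above it reads every line rather than only the first, which is closer to what an HBase import would need.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical local stand-in for ReadHdfs; uses only the JDK, no Hadoop.
public class LocalGzReader {
    // Assumption: decide by suffix only, like CompressionCodecFactory.getCodec();
    // unknown suffixes are read as uncompressed bytes.
    static InputStream openMaybeCompressed(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        if (file.getFileName().toString().endsWith(".gz")) {
            try {
                // Reads and validates the gzip header immediately
                return new GZIPInputStream(raw);
            } catch (IOException e) {
                raw.close(); // not valid gzip: do not leak the file handle
                throw e;
            }
        }
        return raw;
    }

    // Reads every line of a plain or gzip-compressed UTF-8 text file.
    static List<String> readAllLines(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(openMaybeCompressed(file), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readAllLines(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

    The suffix check keeps the sketch as close as possible to the extension-based lookup Hadoop performs; sniffing the gzip magic bytes instead would be more robust but would no longer mirror getCodec().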
  • Original article: https://www.cnblogs.com/ballwql/p/6616580.html