zoukankan      html  css  js  c++  java
  • 如何读取Hadoop中压缩的文件

    最近在处理离线数据导入HBase的问题,涉及从Hdfs中读取gz压缩文件,把思路记录下来,以作备用。具体代码如下:

    package org.dba.util;
    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;
    import java.io.PrintStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;
    import org.apache.hadoop.io.compress.CompressionInputStream;
    
    public class ReadHdfs {
        public static void ReadFile(String fileName) throws IOException{
            Configuration conf = new Configuration();
            Path file = new Path(fileName);
            FileSystem fs = FileSystem.get(conf);
            FSDataInputStream hdfsInstream = fs.open(file);
            CompressionCodecFactory factory = new CompressionCodecFactory(conf);
            CompressionCodec codec = factory.getCodec(file);
            BufferedReader reader = null;
            try{
                if(codec == null){
                    reader = new BufferedReader(new InputStreamReader(hdfsInstream));
                }else{
                    CompressionInputStream comInStream = codec.createInputStream(hdfsInstream);
                    reader = new BufferedReader(new InputStreamReader(comInStream));
                    System.out.println(reader.readLine().substring(0, 100));
                }
            }catch(Exception e){
                e.printStackTrace();
            }
        }
        public static void main(String[] args) throws IOException{
            ReadFile(args[0]);
        }
    
    }
  • 相关阅读:
    10月9日学习日志
    10月2日学习日志
    11月3日学习日志
    10月5日学习日志
    10月6日学习日志
    10月7日学习日志
    11月4日学习日志
    AccessDatabaseEngine.exe 32位64安装失败问题
    国产各安卓系统接收消息在杀进程后
    SAP容差组
  • 原文地址:https://www.cnblogs.com/ballwql/p/6616580.html
Copyright © 2011-2022 走看看