  • How to Read Compressed Files in Hadoop

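    I recently worked on importing offline data into HBase. Part of that job is reading gzip-compressed files from HDFS, so I am writing down the approach here for future reference. The code is as follows: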

    package org.dba.util;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.nio.charset.StandardCharsets;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.CompressionCodecFactory;

    public class ReadHdfs {
        // Reads a text file from HDFS, transparently decompressing it when its
        // extension maps to a known codec (e.g. .gz), and prints every line.
        public static void readFile(String fileName) throws IOException {
            Configuration conf = new Configuration();
            Path file = new Path(fileName);
            // Resolve the FileSystem from the path, so full URIs (hdfs://...) also work
            FileSystem fs = file.getFileSystem(conf);
            // getCodec() inspects the file name suffix and returns null when no
            // registered codec matches, i.e. the file is treated as plain text
            CompressionCodec codec = new CompressionCodecFactory(conf).getCodec(file);

            InputStream in = fs.open(file);
            if (codec != null) {
                // Wrap the raw HDFS stream so that reads return decompressed bytes
                in = codec.createInputStream(in);
            }
            // Assumes UTF-8 data; closing the reader also closes the codec
            // stream and the underlying HDFS stream
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(in, StandardCharsets.UTF_8))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    System.out.println(line);
                }
            }
        }

        public static void main(String[] args) throws IOException {
            if (args.length != 1) {
                System.err.println("Usage: ReadHdfs <file path>");
                System.exit(1);
            }
            readFile(args[0]);
        }
    }
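The key decision in the code above is "decompress if the file name says so, otherwise read plain text". That pattern can be tried out without a Hadoop cluster using only the JDK. The sketch below swaps Hadoop's `CompressionCodecFactory` for a simple `.gz` suffix check plus `java.util.zip.GZIPInputStream`, so it only covers gzip, not the other codecs Hadoop can resolve. The class name `GzipLineReader`, its helper methods, and the file names are hypothetical and exist only for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical JDK-only analogue of ReadHdfs (gzip only, not Hadoop's codec lookup).
public class GzipLineReader {
    // Choose a decompressor from the file name; unknown suffixes are read as-is.
    public static InputStream maybeDecompress(String fileName, InputStream raw) throws IOException {
        if (fileName.endsWith(".gz")) {
            return new GZIPInputStream(raw);
        }
        return raw;
    }

    // Read every line from the (possibly gzip-compressed) stream, assuming UTF-8.
    public static List<String> readLines(String fileName, InputStream raw) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(maybeDecompress(fileName, raw), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    // Demo helper: gzip-compress a string in memory.
    public static byte[] gzip(String text) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(bytes)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "line one\nline two\n";
        List<String> fromGz = readLines("part-00000.gz",
                new ByteArrayInputStream(gzip(text)));
        List<String> fromTxt = readLines("part-00000.txt",
                new ByteArrayInputStream(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(fromGz);
        System.out.println(fromGz.equals(fromTxt));
    }
}
```

Running `main` prints `[line one, line two]` followed by `true`: the compressed and uncompressed inputs decode to the same lines, which is exactly the property the `codec == null` branch in `ReadHdfs` relies on.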
  • Original article: https://www.cnblogs.com/ballwql/p/6616580.html