zoukankan      html  css  js  c++  java
  • splittability A SequenceFile can be split by Hadoop and distributed across map jobs whereas a GZIP file cannot be.

    splittability

     
    Skip to end of metadata
     
    Go to start of metadata
     

    Compressed Data Storage

    Keeping data compressed in Hive tables has, in some cases, been known to give better performance than uncompressed storage; both in terms of disk usage and query performance.

    You can import text files compressed with Gzip or Bzip2 directly into a table stored as TextFile. The compression will be detected automatically and the file will be decompressed on-the-fly during query execution. For example:

    CREATE TABLE raw (line STRING)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY ' ';
     
    LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;

    The table 'raw' is stored as a TextFile, which is the default storage. However, in this case Hadoop will not be able to split your file into chunks/blocks and run multiple maps in parallel. This can cause underutilization of your cluster's 'mapping' power.

    【 A SequenceFile can be split by Hadoop and distributed across map jobs whereas a GZIP file cannot be.】

    The recommended practice is to insert data into another table, which is stored as a SequenceFile. A SequenceFile can be split by Hadoop and distributed across map jobs whereas a GZIP file cannot be. For example:

    CREATE TABLE raw (line STRING)
       ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' LINES TERMINATED BY ' ';
     
    CREATE TABLE raw_sequence (line STRING)
       STORED AS SEQUENCEFILE;
     
    LOAD DATA LOCAL INPATH '/tmp/weblogs/20090603-access.log.gz' INTO TABLE raw;
     
    SET hive.exec.compress.output=true;
    SET io.seqfile.compression.type=BLOCK; -- NONE/RECORD/BLOCK (see below)
    INSERT OVERWRITE TABLE raw_sequence SELECT * FROM raw;

    The value for io.seqfile.compression.type determines how the compression is performed. Record compresses each value individually while BLOCK buffers up 1MB (default) before doing compression.

    LZO Compression

    See LZO Compression for information about using LZO with Hive.

     
  • 相关阅读:
    程序员 你中毒了吗?
    Win8 下安装 Live Writer 发布博客
    Rational Rose 2003 下载及破解方法(转载)
    如何在dos 下使用csc.exe命令?
    as 与 is
    【转载】关于工资的三个秘密
    C#反射(1)<转>
    C#常用字符串格式
    微软企业库EntLib5.0使用过程中常见的异常
    关于window7 AERO 声音 IIS 无线网络失效的解决办法
  • 原文地址:https://www.cnblogs.com/rsapaper/p/7764426.html
Copyright © 2011-2022 走看看