  • Hive SerDe: Serializing and Deserializing Tables

    What is SerDe

    SerDe is a contraction of two words: Serializer and Deserializer. So what are serialization and deserialization?

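    When processes communicate remotely, they can send each other data of any type, but whatever the type, it travels over the network as a binary sequence. The sender must convert an object into a byte sequence before it can be transmitted, which is called object serialization; the receiver must restore the byte sequence to an object, which is called object deserialization.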
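
    The round trip described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not Hive code: the `Point` class and the byte layout (two 32-bit integers, a 16-bit length prefix, then UTF-8 text) are assumptions made up for the example.

```python
import struct
from dataclasses import dataclass

HEADER = ">iiH"  # assumed layout: two big-endian int32 values, then a uint16 length


@dataclass
class Point:
    """Hypothetical object that a sender wants to transmit."""
    x: int
    y: int
    label: str


def serialize(p: Point) -> bytes:
    """Sender side: convert the object into a byte sequence."""
    raw_label = p.label.encode("utf-8")
    return struct.pack(HEADER, p.x, p.y, len(raw_label)) + raw_label


def deserialize(data: bytes) -> Point:
    """Receiver side: restore the object from the byte sequence."""
    x, y, label_len = struct.unpack_from(HEADER, data, 0)
    start = struct.calcsize(HEADER)
    label = data[start:start + label_len].decode("utf-8")
    return Point(x, y, label)


original = Point(3, -7, "hive")
wire_bytes = serialize(original)    # what would travel over the network
restored = deserialize(wire_bytes)  # what the receiver rebuilds
```

    The receiver can rebuild the object only because both sides agree on the byte layout, which is the role a SerDe plays between Hive and the files it reads and writes.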

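    In Hive, deserialization turns each key/value record into the column values of a Hive table row. This lets Hive load data into a table without converting it first, which saves a great deal of time when processing massive data sets.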

    Hive SerDe

    What is a SerDe?

    • SerDe is a short name for "Serializer and Deserializer."
    • Hive uses SerDe (and FileFormat) to read and write table rows.
    • HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object     (read path)
    • Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files      (write path)
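
    The two paths in the list above can be imitated with plain Python functions. This is a rough sketch under stated assumptions, not Hive's Java API: the function names, the three-column schema, and the sample data are hypothetical, and the field delimiter is Ctrl-A (`\x01`), Hive's default for delimited text. The sketch ignores the key and uses only the value, one line of text per record.

```python
from typing import Dict, Iterator, List, Tuple

COLUMNS = ["host", "status", "size"]  # hypothetical table schema
FIELD_DELIM = "\x01"                  # Ctrl-A field delimiter


def input_format(file_text: str) -> Iterator[Tuple[int, str]]:
    """Stand-in for an input format: yield <byte offset, line> pairs."""
    offset = 0
    for line in file_text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line.encode("utf-8"))


def deserialize(value: str) -> Dict[str, str]:
    """Stand-in for a Deserializer: one record value -> one row object."""
    return dict(zip(COLUMNS, value.split(FIELD_DELIM)))


def serialize(row: Dict[str, str]) -> str:
    """Stand-in for a Serializer: one row object -> one record value."""
    return FIELD_DELIM.join(row[col] for col in COLUMNS)


def output_format(values: List[str]) -> str:
    """Stand-in for an output format: write one record per line."""
    return "".join(value + "\n" for value in values)


file_text = "10.0.0.1\x01200\x01512\n10.0.0.2\x01404\x010\n"

# Read path: file -> <key, value> -> row objects
rows = [deserialize(value) for _key, value in input_format(file_text)]

# Write path: row objects -> <key, value> -> file
written = output_format([serialize(row) for row in rows])
```

    Reading and then writing reproduces the original file text, which shows that the serializer and deserializer are inverses of each other for this layout.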

    ---- Reference (official wiki): https://cwiki.apache.org/confluence/display/Hive/DeveloperGuide#DeveloperGuide-HiveSerDe

    2 Built-in SerDes

    ---- Reference (official wiki): https://cwiki.apache.org/confluence/display/Hive/SerDe

    3 Using SerDes

    3.1 Specifying the SerDe when creating a table

    · RegexSerDe

    CREATE TABLE apachelog (
      host STRING,
      identity STRING,
      user STRING,
      time STRING,
      request STRING,
      status STRING,
      size STRING,
      referer STRING,
      agent STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
    )
    STORED AS TEXTFILE;
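
    Before creating the table, the pattern can be tried against a sample log line outside Hive. The sketch below uses Python's `re` module as a stand-in for the Java regex engine that Hive uses, with the pattern written in unescaped form and its character classes spelled out (`[^ ]*` for a space-free token, `\[[^\]]*\]` for the bracketed timestamp). It assumes the whole line must match; the `parse` helper and the sample lines are hypothetical.

```python
import re
from typing import Dict, Optional

# Column names taken from the apachelog table definition, in capture-group order.
COLUMNS = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]

APACHE_LOG = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|".*") ([^ "]*|".*"))?'
)


def parse(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Map one log line to a column dict, or None if the line does not match."""
    match = APACHE_LOG.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


combined = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
            '"http://www.example.com/start.html" '
            '"Mozilla/4.08 [en] (Win98; I ;Nav)"')
common = '127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 404 -'

row_combined = parse(combined)
row_common = parse(common)
```

    Each capture group corresponds to one column, in order, so the number of groups has to equal the number of columns in the table. The last two groups are optional, so a line without referer and agent still matches and leaves those two columns empty.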

    · JsonSerDe

    ADD JAR /usr/lib/hive-hcatalog/lib/hive-hcatalog-core.jar;
    
    CREATE TABLE my_table(a string, b bigint, ...)
    ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
    STORED AS TEXTFILE;

    · CSVSerDe

    CREATE TABLE my_table(a string, b string)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
      "separatorChar" = "\t",
      "quoteChar"     = "'",
      "escapeChar"    = "\\"
    )
    STORED AS TEXTFILE;
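
    The effect of the three properties can be previewed with Python's standard `csv` module. This is only an approximation: it assumes the properties are meant to be a tab separator, a single-quote quote character, and a backslash escape character, and Python's parser is not OpenCSV, so edge cases may behave differently. The sample data is made up.

```python
import csv
import io

# Two hypothetical records: an escaped quote inside a quoted field,
# and a separator (tab) inside a quoted field.
data = "1\t'O\\'Brien'\n2\t'tab\there'\n"

reader = csv.reader(
    io.StringIO(data),
    delimiter="\t",   # plays the role of separatorChar
    quotechar="'",    # plays the role of quoteChar
    escapechar="\\",  # plays the role of escapeChar
)
rows = list(reader)
```

    Every parsed value is a string, which matches the table definition above, where both columns are declared as string.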

    · ORCSerDe

    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
      STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

    From the official wiki:

    Registration of Native SerDes

    As of Hive 0.14 a registration mechanism has been introduced for native Hive SerDes.  This allows dynamic binding between a "STORED AS" keyword in place of a triplet of {SerDe, InputFormat, and OutputFormat} specification, in CreateTable statements.

    The following mappings have been added through this registration mechanism:

    Syntax: STORED AS AVRO / STORED AS AVROFILE
    Equivalent:

    ROW FORMAT SERDE
      'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
      STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'

    Syntax: STORED AS ORC / STORED AS ORCFILE
    Equivalent:

    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.orc.OrcSerde'
      STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat'

    Syntax: STORED AS PARQUET / STORED AS PARQUETFILE
    Equivalent:

    ROW FORMAT SERDE
      'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
      STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'

    Syntax: STORED AS RCFILE
    Equivalent:

    STORED AS INPUTFORMAT
      'org.apache.hadoop.hive.ql.io.RCFileInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.RCFileOutputFormat'

    Syntax: STORED AS SEQUENCEFILE
    Equivalent:

    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.SequenceFileInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.mapred.SequenceFileOutputFormat'

    Syntax: STORED AS TEXTFILE
    Equivalent:

    STORED AS INPUTFORMAT
      'org.apache.hadoop.mapred.TextInputFormat'
      OUTPUTFORMAT
      'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'

    Corrections and suggestions are welcome ^_^
  • Original article: https://www.cnblogs.com/cailingsunny/p/14691326.html