  • Hive serialization and deserialization (SerDe)

    I. Introduction

    SerDe is short for Serializer/Deserializer.
    A SerDe lets Hive read data from a table and write it back to HDFS in any custom format. Anyone can write a SerDe for their own data format.

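    What serialization and deserialization do

    1. Serialization is the process of converting an object into a byte sequence.

    2. Deserialization is the process of restoring an object from a byte sequence.

    Serialization serves two main purposes:

    (1) Object persistence: the object is converted into a byte sequence and saved to a file.

    (2) Transfer of object data.

    The main purpose of deserialization in Hive:

    Each <key,value> record is deserialized into the values of the columns of a Hive table. Hive can therefore load data into a table without converting it first, which saves a great deal of time when processing massive data sets.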
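    The round trip described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Hive's implementation (real SerDes are Java classes). The three-column schema and the function names are made up for the example; the Ctrl-A field delimiter matches the default of Hive text tables.

```python
from typing import List, Tuple

# Hypothetical table schema: (column name, Python type used to parse the field).
SCHEMA: List[Tuple[str, type]] = [("id", int), ("name", str), ("score", float)]
FIELD_DELIM = "\x01"  # Ctrl-A, the default field delimiter of Hive text tables


def serialize(row: dict) -> bytes:
    """Object -> byte sequence: join the column values and encode them as UTF-8."""
    return FIELD_DELIM.join(str(row[name]) for name, _ in SCHEMA).encode("utf-8")


def deserialize(record: bytes) -> dict:
    """Byte sequence -> object: split the record and cast each field to its column type."""
    fields = record.decode("utf-8").split(FIELD_DELIM)
    return {name: cast(value) for (name, cast), value in zip(SCHEMA, fields)}


row = {"id": 1, "name": "alice", "score": 9.5}
raw = serialize(row)        # bytes as they might be stored in a file
restored = deserialize(raw)  # column values rebuilt from the stored bytes
print(raw)
print(restored)
```

    The stored file keeps only the raw bytes; the column types are applied when the record is read, which is why loading data needs no conversion step.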


    II. Built-in SerDes
    Avro
    ORC
    Regex
    Thrift
    Parquet
    CSV
    JsonSerDe

    III. Using SerDes

    1.RegexSerde

    CREATE TABLE apachelog (
      host STRING,
      identity STRING,
      `user` STRING,
      `time` STRING,
      request STRING,
      status STRING,
      size STRING,
      referer STRING,
      agent STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\"[^\"]*\") ([^ \"]*|\"[^\"]*\"))?"
    )
    STORED AS TEXTFILE;
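
    To see how a pattern of this kind maps one log line onto the nine columns, here is a minimal Python sketch. It uses Python's re module in place of the Java regex engine that Hive uses, with the character classes written out in full ([^ ]* and \[[^\]]*\]). The sample log lines and the parse_line helper are made up for illustration, and the sketch assumes the whole line has to match the pattern.

```python
import re

# One capture group per table column, in the same order as the column list.
LOG_PATTERN = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|"[^"]*") ([^ "]*|"[^"]*"))?'
)
COLUMNS = ["host", "identity", "user", "time", "request",
           "status", "size", "referer", "agent"]


def parse_line(line: str):
    """Return a column -> value dict, or None when the line does not match."""
    match = LOG_PATTERN.fullmatch(line)
    if match is None:
        return None
    return dict(zip(COLUMNS, match.groups()))


combined = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
            '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
            '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')
common = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')

row_combined = parse_line(combined)
row_common = parse_line(common)  # referer and agent are optional in the pattern
print(row_combined)
print(row_common)
```

    Every captured value is a string, including status and size, which is consistent with the table declaring all nine columns as STRING.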

    2.CsvSerde

    CREATE TABLE my_table(a string, b string, ...)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
    WITH SERDEPROPERTIES (
       "separatorChar" = "\t",
       "quoteChar"     = "'",
       "escapeChar"    = "\\"
    )  
    STORED AS TEXTFILE;

    If a property is not specified, the SerDe falls back to these defaults:
    DEFAULT_ESCAPE_CHARACTER \
    DEFAULT_QUOTE_CHARACTER  "
    DEFAULT_SEPARATOR        ,
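
    The effect of the three properties can be tried out with Python's csv module, used here as a stand-in for the OpenCSV parser behind OpenCSVSerde; the two libraries may differ on edge cases. The sample record and the parse_record helper are made up for illustration.

```python
import csv


def parse_record(line: str) -> list:
    """Split one record using tab as separator, ' as quote and \\ as escape."""
    reader = csv.reader([line], delimiter="\t", quotechar="'", escapechar="\\")
    return next(reader)


# Three fields: a plain value, a quoted value containing the separator,
# and a quoted value containing an escaped quote character.
line = "1\t'a\tb'\t'it\\'s'"
fields = parse_record(line)
print(fields)
```

    All values come back as strings. OpenCSVSerde behaves the same way: it treats every column as STRING regardless of the declared type, so numeric columns need a cast in the query or in a view.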

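    3.JsonSerde

    Note: org.openx.data.jsonserde.JsonSerDe is the third-party OpenX JSON SerDe and does not ship with Hive, so its JAR has to be added to the session (for example with ADD JAR) before the table is used. The JSON SerDe bundled with Hive is org.apache.hive.hcatalog.data.JsonSerDe.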

    CREATE TABLE json_nested_test (
    country string,
    languages array<string>,
    religions map<string,array<int>>
    )
    ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
    STORED AS TEXTFILE;
    
    -- Query
    select country,languages,languages[0],religions,religions['catholic'][0] from json_nested_test;

    Record format:

    {"country":"Switzerland","languages":["German","French","Italian"],"religions":{"catholic":[10,20],"protestant":[40,50]}}
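
    As a rough check of what the query above should return for this record, the same five expressions can be evaluated in plain Python with the json module. This only mirrors the array and map indexing; it does not involve Hive or the SerDe itself, and the variable names are made up for the example.

```python
import json

record = ('{"country":"Switzerland","languages":["German","French","Italian"],'
          '"religions":{"catholic":[10,20],"protestant":[40,50]}}')

doc = json.loads(record)

# Same projections as the SELECT: country, languages, languages[0],
# religions, religions['catholic'][0]
result = (
    doc["country"],
    doc["languages"],
    doc["languages"][0],
    doc["religions"],
    doc["religions"]["catholic"][0],
)
print(result)
```

    The JSON array maps to the array<string> column and the nested object maps to map<string,array<int>>, so both can be indexed directly in the query.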
  • Original article: https://www.cnblogs.com/wangbin2188/p/11739090.html