zoukankan      html  css  js  c++  java
  • Hive Serde(四)

    Hive Serde

    目的:

    ​ Hive Serde用来做序列化和反序列化,构建在数据存储和执行引擎之间,对两者实现解耦。

    应用场景:

    ​ 1、hive主要用来存储结构化数据,如果结构化数据存储的格式嵌套比较复杂的时候,可以使用serde的方式,利用正则表达式匹配的方法来读取数据,例如,表字段如下:id,name,map<string,array<map<string,string>>>

    ​ 2、当读取数据的时候,数据的某些特殊格式不希望显示在数据中,如:

    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -

    不希望数据显示的时候包含[]或者"",此时可以考虑使用serde的方式

    语法规则:

    		row_format
    		: DELIMITED 
              [FIELDS TERMINATED BY char [ESCAPED BY char]] 
              [COLLECTION ITEMS TERMINATED BY char] 
              [MAP KEYS TERMINATED BY char] 
              [LINES TERMINATED BY char] 
    		: SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, 										            property_name=property_value, ...)]
    

    应用案例:

    ​ 1、数据文件

    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
    

    ​ 2、基本操作:

    --创建表
    	CREATE TABLE logtbl (
    	    host STRING,
    	    identity STRING,
    	    t_user STRING,
    	    time STRING,
    	    request STRING,
    	    referer STRING,
    	    agent STRING)
    	  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    	  WITH SERDEPROPERTIES (
    	    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \[(.*)\] "(.*)" (-|[0-9]*) (-|[0-		9]*)"
    	  )
      STORED AS TEXTFILE;
    --加载数据
    	load data local inpath '/root/data/log' into table logtbl;
    --查询操作
    	select * from logtbl;
    --数据显示如下(不包含[]和")
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-upper.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-nav.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /asf-logo.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-button.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-middle.png HTTP/1.1	304	-
    	
    
  • 相关阅读:
    EM算法
    最大熵模型中的对数似然函数的解释
    PySpark 自定义函数 UDF
    PySpark 自定义聚合函数 UDAF
    Numpy总结
    Examples
    Examples
    Examples
    Examples
    Examples
  • 原文地址:https://www.cnblogs.com/littlepage/p/11380766.html
Copyright © 2011-2022 走看看