zoukankan      html  css  js  c++  java
  • Hive Serde(四)

    Hive Serde

    目的:

    ​ Hive Serde用来做序列化和反序列化,构建在数据存储和执行引擎之间,对两者实现解耦。

    应用场景:

    ​ 1、hive主要用来存储结构化数据,如果结构化数据存储的格式嵌套比较复杂的时候,可以使用serde的方式,利用正则表达式匹配的方法来读取数据,例如,表字段如下:id,name,map<string,array<map<string,string>>>

    ​ 2、当读取数据的时候,数据的某些特殊格式不希望显示在数据中,如:

    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -

    不希望数据显示的时候包含[]或者"",此时可以考虑使用serde的方式

    语法规则:

    		row_format
    		: DELIMITED 
              [FIELDS TERMINATED BY char [ESCAPED BY char]] 
              [COLLECTION ITEMS TERMINATED BY char] 
              [MAP KEYS TERMINATED BY char] 
              [LINES TERMINATED BY char] 
    		: SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, 										            property_name=property_value, ...)]
    

    应用案例:

    ​ 1、数据文件

    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-button.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:35 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /asf-logo.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-middle.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-nav.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET / HTTP/1.1" 200 11217
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.css HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /tomcat.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-button.png HTTP/1.1" 304 -
    192.168.57.4 - - [29/Feb/2019:18:14:36 +0800] "GET /bg-upper.png HTTP/1.1" 304 -
    

    ​ 2、基本操作:

    --创建表
    	CREATE TABLE logtbl (
    	    host STRING,
    	    identity STRING,
    	    t_user STRING,
    	    time STRING,
    	    request STRING,
    	    referer STRING,
    	    agent STRING)
    	  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    	  WITH SERDEPROPERTIES (
    	    "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) \[(.*)\] "(.*)" (-|[0-9]*) (-|[0-		9]*)"
    	  )
      STORED AS TEXTFILE;
    --加载数据
    	load data local inpath '/root/data/log' into table logtbl;
    --查询操作
    	select * from logtbl;
    --数据显示如下(不包含[]和")
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-upper.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-nav.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /asf-logo.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-button.png HTTP/1.1	304	-
    192.168.57.4	-	-	29/Feb/2019:18:14:35 +0800	GET /bg-middle.png HTTP/1.1	304	-
    	
    
  • 相关阅读:
    209. Minimum Size Subarray Sum
    208. Implement Trie (Prefix Tree)
    207. Course Schedule
    206. Reverse Linked List
    205. Isomorphic Strings
    204. Count Primes
    203. Remove Linked List Elements
    201. Bitwise AND of Numbers Range
    199. Binary Tree Right Side View
    ArcGIS API for JavaScript 4.2学习笔记[8] 2D与3D视图同步
  • 原文地址:https://www.cnblogs.com/littlepage/p/11380766.html
Copyright © 2011-2022 走看看