zoukankan      html  css  js  c++  java
  • HIVE分区

    HIVE分区

    简单分区

    hive> CREATE TABLE psn_2(
        > id int,
        > name string,
        > likes array<string>,
        > address map<string,string>
        > )
        > PARTITIONED BY (age int)
        > ROW FORMAT DELIMITED
        > FIELDS TERMINATED BY ','
        > COLLECTION ITEMS TERMINATED BY '-'
        > MAP KEYS TERMINATED BY ':';
    OK
    Time taken: 20.107 seconds
    hive> desc formtted psn_2;
    FAILED: SemanticException [Error 10001]: Table not found formtted
    hive> desc formatted psn_2;
    OK
    # col_name            	data_type           	comment             
    	 	 
    id                  	int                 	                    
    name                	string              	                    
    likes               	array<string>       	                    
    address             	map<string,string>  	                    
    	 	 
    # Partition Information	 	 
    # col_name            	data_type           	comment             
    	 	 
    age                 	int                 	                    
    	 	 
    # Detailed Table Information	 	 
    Database:           	default             	 
    Owner:              	root                	 
    CreateTime:         	Sat Apr 25 23:45:22 CST 2020	 
    LastAccessTime:     	UNKNOWN             	 
    Protect Mode:       	None                	 
    Retention:          	0                   	 
    Location:           	hdfs://mycluster/user/hive/warehouse/psn_2	 
    Table Type:         	MANAGED_TABLE       	 
    Table Parameters:	 	 
    	transient_lastDdlTime	1587829522          
    	 	 
    # Storage Information	 	 
    SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe	 
    InputFormat:        	org.apache.hadoop.mapred.TextInputFormat	 
    OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat	 
    Compressed:         	No                  	 
    Num Buckets:        	-1                  	 
    Bucket Columns:     	[]                  	 
    Sort Columns:       	[]                  	 
    Storage Desc Params:	 	 
    	colelction.delim    	-                   
    	field.delim         	,                   
    	mapkey.delim        	:                   
    	serialization.format	,                   
    Time taken: 0.682 seconds, Fetched: 37 row(s)
    hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=10);
    Loading data to table default.psn_2 partition (age=10)
    Partition default.psn_2{age=10} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
    OK
    Time taken: 16.219 seconds
    hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=20);
    Loading data to table default.psn_2 partition (age=20)
    Partition default.psn_2{age=20} stats: [numFiles=1, numRows=0, totalSize=419, rawDataSize=0]
    OK
    Time taken: 1.096 seconds
    hive> select * from psn_2;
    OK
    1	小明1	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
    2	小明2	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
    3	小明3	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
    4	小明4	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
    5	小明5	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
    6	小明6	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
    7	小明7	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	10
    1	小明1	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
    2	小明2	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
    3	小明3	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
    4	小明4	["lol","book","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
    5	小明5	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
    6	小明6	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
    7	小明7	["lol","movie"]	{"beijing":"shangxuetang","shanghai":"pudong"}	20
    Time taken: 0.656 seconds, Fetched: 14 row(s)
    

    注意分区的字段不需要在table中定义出来,

    这样的分区是在表的文件下面创建分区文件夹,文件夹的名称为分区键的值,文件夹下面则是数据内容。

    多分区

    hive> CREATE TABLE psn_3(
        > id int,
        > name string,
        > likes array<string>,
        > address map<string,string>
        > )
        > PARTITIONED BY (age int,sex string)
        > ROW FORMAT DELIMITED
        > FIELDS TERMINATED BY ','
        > COLLECTION ITEMS TERMINATED BY '-'
        > MAP KEYS TERMINATED BY ':';
    OK
    Time taken: 20.107 seconds
    hive> LOAD DATA LOCAL INPATH '/root/data/data' INTO TABLE psn_2 PARTITION(age=10,sex='man');
    

    相同在hdfs的文件夹 psn_3/age=?/sex='?'/这样的组合文件夹。

    查看data的数据时候:

    [root@hadoopNode02 mgs]# hdfs dfs -cat /user/hive/warehouse/psn_2/age=10/data
    1,小明1,lol-book-movie,beijing:shangxuetang-shanghai:pudong
    2,小明2,lol-book-movie,beijing:shangxuetang-shanghai:pudong
    3,小明3,lol-book-movie,beijing:shangxuetang-shanghai:pudong
    4,小明4,lol-book-movie,beijing:shangxuetang-shanghai:pudong
    5,小明5,lol-movie,beijing:shangxuetang-shanghai:pudong
    6,小明6,lol-movie,beijing:shangxuetang-shanghai:pudong
    7,小明7,lol-movie,beijing:shangxuetang-shanghai:pudong
    

    age =10 的字段是文件夹带人给读入的

    添加分区

    添加分区是不按照顺序的

    ALTER TABLE table_name ADD [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
     
    partition_spec:
      : (partition_column = partition_col_value, partition_column = partition_col_value, ...)
    

    例如:

    ALTEL TABLE table_name add PARTITION([这里添加进去的分区名称,定义好的分区所有字段])
    
    ALTER TABLE page_view ADD PARTITION (dt='2008-08-08', country='us') location '/path/to/us/part080808'
                              PARTITION (dt='2008-08-09', country='us') location '/path/to/us/part080809';
    

    删除分区

    ·添加的时候必须一次添加必须是指定分区的所有字段,但是删除可以分开删除字段,

    ALTER TABLE table_name drop [IF NOT EXISTS] PARTITION partition_spec [LOCATION 'location'][, PARTITION partition_spec [LOCATION 'location'], ...];
    
    记得加油学习哦^_^
  • 相关阅读:
    例解 Linux 下 Make 命令
    linux使用bin文件安装jdk
    Linux查看及设置系统字符集
    FTP的两种主动模式和被动模式
    Mongodb之主从复制
    Nginx配置认证登录
    AWK
    Redis+Keepalived实现高可用
    Redis哨兵配置
    Keepalived指定文件接收日志
  • 原文地址:https://www.cnblogs.com/shaoyayu/p/13434233.html
Copyright © 2011-2022 走看看