Hive SQL syntax in detail: http://blog.csdn.net/hguisu/article/details/7256833
Hive SQL study notes (common usage): http://blog.sina.com.cn/s/blog_66474b16010182yu.html
Partitions in Hive: http://blog.csdn.net/jiedushi/article/details/6660185
Hive basics: http://www.csdn.net/article/2014-01-07/2818052-about-hive
Hive Java API: http://787141854-qq-com.iteye.com/blog/2068303
GROUP BY in Hive is slow because it has to run as a Hadoop MapReduce job. The same aggregation can be done in Spark instead.
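As a rough illustration of the point above, the aggregation below is ordinary HiveQL. Assuming Spark is built with Hive support and pointed at the same metastore (an assumption, not something these notes set up), the identical statement could be submitted through the spark-sql shell instead of the hive CLI. The pokes table is the one created under the common statements further down.

```sql
-- Count rows per distinct value of bar.
-- In Hive on MapReduce this compiles to a MapReduce job;
-- Spark SQL accepts the same statement.
SELECT bar, COUNT(*) AS cnt
FROM pokes
GROUP BY bar;
```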
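Start HiveServer2: hive --service hiveserver2
Common statements:
Create a table: CREATE TABLE pokes (foo INT, bar STRING);
Create a partitioned table: the partition columns are date and pos; the description of the ip column, 'IP Address of the User', is defined with COMMENT
Fields are separated by ' ' and rows by a line break
If the data files are plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE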
CREATE TABLE par_table(viewTime INT, userid BIGINT,
page_url STRING, referrer_url STRING,
ip STRING COMMENT 'IP Address of the User')
COMMENT 'This is the page view table'
PARTITIONED BY(date STRING, pos STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LINES TERMINATED BY '\n'
STORED AS SEQUENCEFILE;
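For comparison with the SEQUENCEFILE table above, here is a minimal sketch of the plain-text case described in the notes. The table name par_table_text, the partition column name dt, and the tab delimiter are all assumptions made for illustration. dt is used instead of date because some newer Hive releases treat date as a reserved word, in which case it would need backtick quoting.

```sql
-- Hypothetical plain-text variant of par_table.
-- par_table_text, dt and the '\t' delimiter are illustrative choices.
CREATE TABLE par_table_text (
  viewTime INT,
  userid BIGINT,
  page_url STRING,
  referrer_url STRING,
  ip STRING COMMENT 'IP Address of the User')
COMMENT 'Plain-text copy of the page view table'
PARTITIONED BY (dt STRING, pos STRING)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```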
Partition operations
(1) How to define and create partitions
Create a partitioned table:
hive> create table test(name string,sex int) partitioned by (birth string, age string);
Add three partitions:
hive> alter table test add partition (birth='1980', age ='30');
hive> alter table test add partition (birth='1981', age ='29');
hive> alter table test add partition (birth='1982', age ='28');
hive> show partitions test;
birth=1980/age=30
birth=1981/age=29
birth=1982/age=28
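Two small variations on the statements above may be useful, shown here as a sketch against the same test table: guarding ADD PARTITION so that re-running it is harmless, and listing only part of the partition tree.

```sql
-- Adding a partition that already exists normally raises an error;
-- IF NOT EXISTS makes the statement a no-op in that case.
ALTER TABLE test ADD IF NOT EXISTS PARTITION (birth='1980', age='30');

-- List only the partitions under one birth value (partial partition spec).
SHOW PARTITIONS test PARTITION (birth='1980');
```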
(2) How to drop a partition
hive> alter table test drop partition (birth='1980',age='30');
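A slightly more defensive form of the drop, as a sketch: IF EXISTS avoids an error when the partition is already gone. Note that for a managed (non-external) table, dropping a partition also removes its data files.

```sql
-- Drop the partition only if it is present; no error otherwise.
ALTER TABLE test DROP IF EXISTS PARTITION (birth='1980', age='30');
```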
(3) Load data into a specific partition
load data local inpath '/home/hadoop/data.log' overwrite into table test partition(birth='1980-01-01',age='30');
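The statement above reads from the local file system and replaces whatever the partition already holds. A sketch of the other common form follows; the HDFS path is a made-up example. Without LOCAL the source is an HDFS path and the file is moved into the partition directory, and without OVERWRITE the new file is added alongside the existing data.

```sql
-- Hypothetical HDFS path; the file is moved (not copied) into the partition.
LOAD DATA INPATH '/user/hadoop/data.log'
INTO TABLE test PARTITION (birth='1980-01-01', age='30');
```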
Principle for creating partitions: minimum granularity
(4) Insert data into a partition of partition_test:
hive> insert overwrite table partition_test
partition(stat_date='20110728',province='henan') select member_id,name
from partition_test_input where stat_date='20110728' and
province='henan';
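The insert above names one partition explicitly (static partitioning). When many partitions have to be filled at once, Hive can derive the partition values from the query result instead. The following is a sketch that assumes partition_test_input carries stat_date and province as ordinary columns, which the WHERE clause above suggests but does not confirm.

```sql
-- Enable dynamic partitioning for this session.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The last SELECT columns map, in order, to the partition columns.
INSERT OVERWRITE TABLE partition_test
PARTITION (stat_date, province)
SELECT member_id, name, stat_date, province
FROM partition_test_input;
```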
(5) Select all data from one partition
hive> select * from test where birth = '1982';
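Because test is partitioned on both birth and age, filtering on birth alone reads every age sub-partition under that birth value. A sketch of narrowing the scan to a single leaf partition, and of checking how rows are spread across partitions:

```sql
-- Restrict the scan to exactly one (birth, age) partition.
SELECT name, sex
FROM test
WHERE birth = '1982' AND age = '28';

-- Row count per partition; partition columns behave like normal columns.
SELECT birth, age, COUNT(*) AS cnt
FROM test
GROUP BY birth, age;
```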