  • Hive Applications, Continued

    Hive applications:
    Preparation:
    Drop the bnyw database.
    1. Create the employee table:
    create table if not exists emp(eno int,ename string,eage int,birthday date,sal double,com double,gender string,dno int) row format delimited fields terminated by ' ';
    2. Create the department table:
    create table dept(dno int,dname string,loc string) row format delimited fields terminated by ' ';
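    Before running the queries below, load data files into both tables (a minimal sketch; the file paths are hypothetical, and any space-delimited files with matching columns will do):
    hive> load data local inpath '/home/hadoop/test/emp.txt' into table emp;
    hive> load data local inpath '/home/hadoop/test/dept.txt' into table dept;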
    a. Count employees by department id and gender:
    hive> select dno,gender,count(1) from emp group by dno,gender;
    b. The same grouping, sorted by head count in descending order:
    hive> select dno,gender,count(1) renshu from emp group by dno,gender order by renshu desc;
    c. Sort by multiple columns:
    hive> select eno,ename,sal,com from emp order by sal desc,com desc;
    d. Multi-table joins and subqueries:
    hive> select e.*,d.* from emp e ,dept d where e.dno=d.dno; (SQL-92 syntax)
    hive> select e.*,d.* from emp e inner join dept d on e.dno=d.dno; (SQL-99 syntax)
    hive> select d.dno,avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where eage>20 group by d.dno order by avgsal;
    Subquery:
    hive> select d.dname,avgsal from (select d.dno,avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where eage>20 group by d.dno order by avgsal) mid,dept d where mid.dno=d.dno;
    Paginated query:
    hive> select row_number() over(),e.* from emp e;
    hive> select row_number() over(order by sal desc),e.* from emp e;
    hive> select * from (select row_number() over(order by sal desc) seq,e.* from emp e) mid where mid.seq>5 and mid.seq<11;

    Hive Partitions
    A. Static partition table operations:
    ---------
    ./hive -S -e "create table if not exists bnyw.pemp(eno int,ename string,eage int,gender string,sal int,com int) partitioned by (dno int) row format delimited fields terminated by ' '"
    Create a partitioned table:
    hive> create table if not exists t_user(uid int,uname string,uage int) partitioned by (dno int) row format delimited fields terminated by ' ';
    1. Create partitions under the table (dno=1, dno=2, dno=3, dno=4):
    hive> alter table t_user add partition(dno=1); ----> add a partition
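    To verify, list the table's partitions (show partitions is standard HiveQL):
    hive> show partitions t_user;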

    2. Upload to the dno=1 partition with put:
    [hadoop@CloudDeskTop usergroup]$ hdfs dfs -put bak01 /user/hive/warehouse/bunfly.db/t_user/dno=1/
    3. Upload to the dno=2 partition with load:
    hive> load data local inpath '/home/hadoop/test/usergroup/bak02' into table t_user partition(dno=2);
    4. Upload to the dno=3 partition with insert:
    hive> insert into t_user partition(dno=3) values(3,'zhaoyun',34);
    Pull data from another table into the dno=3 partition:
    hive> insert into t_user partition(dno=3) select eno,ename,eage from emp where eno<4;
    hive> insert overwrite table t_user partition(dno=4) select eno,ename,eage from emp where eno>4 and eno<9;

    B. Dynamic partition table operations (rows are routed to partitions automatically based on the given field):
    Dynamic partition configuration:
    ----------
    1. Session-level setting:
    hive> set hive.exec.dynamic.partition.mode=nonstrict;
    2. Permanent setting:
    Edit hive-site.xml
    and change the value of hive.exec.dynamic.partition.mode to nonstrict, as in the snippet below.
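    A minimal sketch of the property block (assumption: if dynamic partitioning itself is disabled in your installation, a second property setting hive.exec.dynamic.partition to true is also needed):
    <property>
    <name>hive.exec.dynamic.partition.mode</name>
    <value>nonstrict</value>
    <description>nonstrict allows all partition columns to be determined dynamically</description>
    </property>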

    Creating the partitioned table is the same as for static partitioning!
    truncate table t_user; ---> empty the table contents
    alter table t_user drop partition(dno=1); ---> drop a partition
    Dynamic upload does not support put or load; only the three insert forms below are supported:
    1. hive> insert into t_user partition(dno) values(1,'zhaoyung',35,1); ---> automatically creates any missing partition
    2. hive> insert into t_user partition(dno) select eno,ename,eage,dno from emp; ----> copies rows from a table that carries the partition column
    3. hive> insert overwrite table t_user partition(dno) select eno,ename,eage,dno from emp;

    C. Multi-level partitions:
    -------------
    1. Create a multi-level partitioned table:
    hive> create table if not exists t_user(uid int,uname string,uage int) partitioned by (dno int,gender string) row format delimited fields terminated by ' ';

    2. Import data into the multi-level partitions:
    hive> insert into t_user partition(dno,gender) select eno,ename,eage,dno,gender from emp;
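    Queries can then filter on both partition levels at once (the value 'male' is only an assumption about the emp data):
    hive> select * from t_user where dno=1 and gender='male';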
    ------------

    1. Create an external table: -----> maps data files kept at another path into a Hive table
    hive> create external table t_user(uid int,uname string,uage int) row format delimited fields terminated by ' ' location '/hive/user/';
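    Any delimited files placed under that location become visible to the table (a sketch; the file name bak03 is hypothetical):
    [hadoop@CloudDeskTop usergroup]$ hdfs dfs -mkdir -p /hive/user
    [hadoop@CloudDeskTop usergroup]$ hdfs dfs -put bak03 /hive/user/
    hive> select * from t_user;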

    2. Create a table containing an array type (collection items terminated by ',' defines the separator between array elements):
    create table if not exists tuser(id bigint,name string,loves array<string>) row format delimited fields terminated by ' ' collection items terminated by ',';
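    A matching line in the data file, and an element lookup by index (the sample values are invented; size() returns the array length):
    1 zhangsan reading,running,swimming
    hive> select name,loves[0],size(loves) from tuser;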

    3. Create a table containing a map type (map keys terminated by ':' defines the connector between a key and its value, needed when parsing the text; drop the earlier tuser first, since create ... if not exists would otherwise be a no-op):
    create table if not exists tuser(id bigint,name string,info map<string,string>) row format delimited fields terminated by ' ' collection items terminated by ',' map keys terminated by ':';

    4. Create a table with composite types:
    create table zuser(id bigint,name string,loves array<string>,info map<string,double>) row format delimited fields terminated by ' ' collection items terminated by ',' map keys terminated by ':';
    hive> select * from zuser where info['age']=23;
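    With those delimiters, one line of the data file looks like this (values invented for illustration): fields split by ' ', collection items by ',', map keys by ':':
    1 zhangsan reading,running age:23,height:175.5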

    Using MySQL to store Hive metadata:
    Upload the MySQL installation package, then unpack and install it --- sh install.sh
    --------
    [root@CloudDeskTop sbin]# ./mysqld start
    [root@CloudDeskTop sbin]# lsof -i:3306

    Note: switch to the hadoop user for the operations below.

    Place the MySQL driver jar into the Hive environment:
    [hadoop@CloudDeskTop ~]$ cd /software/hive-1.2.2/lib/
    【mysql-connector-java-3.0.17-ga-bin.jar】

    Configure the environment:
    [hadoop@CloudDeskTop lib]$ vi /software/hive-1.2.2/conf/hive-site.xml
    <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://192.168.29.134:3306/hive?createDatabaseIfNotExist=true&amp;characterEncoding=utf8
    </value>
    <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
    <description>Username to use against metastore database</description>
    </property>

    <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>123456</value>
    <description>password to use against metastore database</description>
    </property>

    [root@CloudDeskTop lib]# vi /etc/my.cnf
    binlog_format=row
    [root@CloudDeskTop sbin]# ./mysqld restart
    Create the database as a test:
    [root@CloudDeskTop bin]# ./mysql -uroot -p123456 -e "drop database if exists hive;create database hive character set latin1"
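    Once Hive connects with this configuration, its metastore schema appears inside the hive database (a quick check; tables such as DBS and TBLS are created by the metastore):
    [root@CloudDeskTop bin]# ./mysql -uroot -p123456 -e "use hive; show tables;"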

    Data migration summary:
    A. From local to a Hive table:
    Use HDFS's put command to upload the data files
    Use Hive's load data local inpath syntax to import the data files
    B. From Hive table to Hive table:
    Use HDFS's cp command to copy the data files
    Use the plain insert into syntax to insert single records
    Use the insert....select...from... syntax for bulk, conditional data copies
    Use the insert overwrite table....select.... syntax to copy data with overwrite
    C. From a Hive table to local:
    Use HDFS's get command to download the data files
    Use the insert overwrite local directory syntax for bulk, conditional export of a Hive table
    Use output redirection (> or >>) to write select query results to a local file via the standard output stream (both export routes are sketched below)
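    A minimal sketch of the two export routes (the output paths are hypothetical):
    hive> insert overwrite local directory '/home/hadoop/test/empout' row format delimited fields terminated by ' ' select * from emp where sal>3000;
    [hadoop@CloudDeskTop bin]$ ./hive -S -e "select * from emp" > /home/hadoop/test/emp.txt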

     
