zoukankan      html  css  js  c++  java
  • Hive配置及随笔

    1、Hive简介:
    --------
    解决繁琐的Map和reduce分析,设计,拆解,以及编码,编译过程,

    2、Hive架构原理:
    ---------------

    3、Hive服务器搭建:
    -----------------
    A、在客户端安装Hive1.1.2
    B、配置Hive环境
    ---[hadoop@CloudDeskTop bin]$ vi hive-config.sh
    export JAVA_HOME=/softwarek/jdk1.7.0_79
    export HADOOP_HOME=/software/hadoop-2.7.3
    export HIVE_HOME=/software/hive-1.2.2

    [hadoop@CloudDeskTop conf]$ cp hive-default.xml.template hive-site.xml
    [hadoop@CloudDeskTop conf]$ vi hive-site.xml
    2911 <name>hive.server2.logging.operation.log.location</name>
    2912 <value>/tmp/hive/operation_logs</value>
    51 <name>hive.exec.local.scratchdir</name>
    52 <value>/tmp/hive</value>
    56 <name>hive.downloaded.resources.dir</name>
    57 <value>/tmp/hive/resources</value>

    [hadoop@CloudDeskTop conf]$ cp -a hive-log4j.properties.template hive-log4j.properties
    [hadoop@CloudDeskTop conf]$ vi hive-log4j.properties +72
    #log4j.appender.EventCounter=org.apache.hadoop.hive.shims.HiveEventCounter
    log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter

    4、启动HDFS和YARN集群,在客户端启动Hive
    A、初识:
    ----交互式执行sql语句
    [hadoop@CloudDeskTop bin]$ ./hive --hiveconf hive.root.logger=ERROR,console
    hive>show databases;
    hive> create database bunfly;
    hive> use bunfly;
    hive> create table t_user(uid int,uname string,password string);
    hive>show tables;

    ----非交互式运行sql语句
    [hadoop@CloudDeskTop bin]$ ./hive -S -e"show databases;"
    [hadoop@CloudDeskTop src]$ vi test.sql
    [hadoop@CloudDeskTop bin]$ ./hive -S -f /home/hadoop/test/hive/src/test.sql
    [hadoop@CloudDeskTop bin]$ echo -e "use bnyw; select * from myuser;">>/install/myuser.sql
    [hadoop@CloudDeskTop bin]$ ./hive -S -e "select * from bunfly.t_user">>/install/result.data
    [hadoop@CloudDeskTop bin]$ ./hive -S<<EOF

    B、进阶:

    换行符:row format delimited(默认 )
    字段符:fields terminated by ' '
    hive> create table bunfly.t_user(uid int,uname string,uage int,uhight double) row format delimited fields
    terminated by ' ';
    ----数据导入操作:
    [hadoop@CloudDeskTop src]$ hdfs dfs -put myuser02 /user/hive/warehouse/bunfly.db/t_user
    将HDFS中数据导入Hive表;
    ------
    注意:
    使用HDFS上传和使用load data导入本地文件从本质意义上讲都是文件的转移过程,
    如果转移的文件是来自于本地则发生数据拷贝,如果转移的文件是来自于HDFS文件系统
    则发生数据移动
    overwrite关键字在load data句法中将导致hive表中的数据先被清空,然后再转移数据,
    即发生hive表的覆盖写入操作;如果没有overwrite关键字则发生数据文件的追加操作

    Hive不支持delete和update两种DML操作


    将所需数据导出到指定本地目录下:
    hive> insert overwrite local directory '/home/hadoop/test/hive/dst/out.data' select * from t_user
    将所需数据导出到指定集群目录下:
    hive> insert overwrite directory '/data/out.data' select * from t_user

    表操作:
    多表的数据迁移:
    ----------A、
    insert into bunfly.myuser select * from bunfly.t_user where uid=1;
    ----------B、注意拷贝的表和存在的表的格式是否一至(Tab--> Ctrl+v+a)
    hdfs dfs -cp /user/hive/warehouse/bunfly.db/t_user/myuser03 /user/hive/warehouse/bunfly.db/myuser

    表之间的数据复制:
    ---------------
    insert into bunfly.myuser select * from bunfly.t_user where userid=1;

    创建分隔符为Tab键的表:
    --------------------
    create table if not exists bunfly.t_user(uid int,uname string,uage int,uhight double) row format delimited
    fields terminated by ' ';

    Hive高级运维部分:
    准备工作:
    删除bnyw库
    1、创建员工表:
    create table if not exists emp(eno int,ename string,eage int,bithday date,sal double,com double,gender
    string,dno int) row format delimited fields terminated by ' ';
    2、创建部门表:
    create table dept(dno int,dname string,loc string) row format delimited fields terminated by ' ';
    a、根据部门id和性别
    hive> select dno,gender,count(1) from emp group by dno,gender;
    b、根据部门id和性别,然后根据人数降序排列
    hive> select dno,gender,count(1) renshu from emp group by dno,gender order by renshu desc;
    c、多列排序
    hive> select eno,ename,sal,com from emp order by sal desc,com desc;
    d、多表连接与子查询
    hive> select e.*,d.* from emp e ,dept d where e.dno=d.dno;(sql92语法)
    hive> select e.*,d.* from emp e inner join dept d on e.dno=d.dno;(sql99语法)
    hive> select d.dno avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where eage>20 group by dno
    order by avgsal;
    子查询:
    select d.dname,avgsal from (select d.dno,avg(sal) avgsal from emp e inner join dept d on e.dno=d.dno where
    eage>20 group by d.dno order by avgsal) mid,dept d where mid.dno=d.dno;
    分页查询:
    hive> select row_number() over(),e.* from emp e;
    hive> select row_number() over(order by sal desc),e.* from emp e;
    select * from (select row_number() over(order by sal desc) seq,e.* from emp e) mid where mid.seq>5 and
    mid.seq<11;

  • 相关阅读:
    python基础-------模块与包(一)
    python基础-------函数(三)
    python基础-------函数(二)
    python基础-------函数(一)
    python基础(三)----字符编码以及文件处理
    python基础(二)-------数据类型
    python基础(一)------Python基础语法与介绍
    linux进程、软件包的安装和删除,及安装python3.6(源代码方式)
    Linux磁盘分区、打包压缩、软硬链接练习
    linux关于目录或文件权限的练习
  • 原文地址:https://www.cnblogs.com/pandazhao/p/8087102.html
Copyright © 2011-2022 走看看