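Hadoop Ecosystem: Hive Quick Start, Basic HQL Syntax
Author: 尹正杰 (Yin Zhengjie)
Copyright notice: original work, reproduction is not permitted! Violators will be held legally liable.
This post covers the common data types in Hive, DDL (data definition), DML (data manipulation), and frequently used queries. If you do not have a Hive environment installed, see my earlier notes on setting up Hive: https://www.cnblogs.com/yinzhengjie/p/9154324.html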
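I. Common Hive configuration properties
1>. Configuring the Hive warehouse location
1>. The default warehouse location is "/user/hive/warehouse/" on HDFS.
2>. No folder is created under the warehouse directory for the default database. A table that belongs to the default database gets its own folder directly under the warehouse directory.
3>. To change the warehouse location, copy the following property from the default template "hive-default.xml.template" into hive-site.xml and edit its value:
<property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
</property>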
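The layout rule above can be sketched in a few lines of Python. This is only an illustration of the directory convention described in this post, not Hive code: the function name `table_location` is hypothetical, and it assumes managed tables that were created without an explicit LOCATION clause.

```python
import posixpath


def table_location(database: str, table: str,
                   warehouse_dir: str = "/user/hive/warehouse") -> str:
    """Return the default HDFS directory of a table (hypothetical helper).

    Tables in the default database sit directly under the warehouse
    directory; tables in any other database sit under "<database>.db".
    """
    table = table.lower()  # Hive stores table names in lower case
    if database.lower() == "default":
        return posixpath.join(warehouse_dir, table)
    return posixpath.join(warehouse_dir, database.lower() + ".db", table)


print(table_location("default", "test"))
print(table_location("yinzhengjie", "Teacher"))
```

The second call yields the same directory that is read with "dfs -cat" later in this post.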
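2>. Showing the current database and the column headers of query results
Add the following properties to hive-site.xml to show the current database in the prompt and to print column headers in query output. The Hive client must be restarted afterwards.
<property>
    <name>hive.cli.print.header</name>
    <value>true</value>
    <description>Whether to print the names of the columns in query output.</description>
</property>
<property>
    <name>hive.cli.print.current.db</name>
    <value>true</value>
    <description>Whether to include the current database in the Hive prompt.</description>
</property>
After restarting the Hive client you will notice two new features: query results include the column headers, and the prompt shows the current database.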
3>. Configuring the Hive runtime log
1>. By default Hive writes its log to "/tmp/<username>/hive.log", where <username> is the current user.
2>. To store the log under "/home/yinzhengjie/hive/logs" instead, edit hive-log4j2.properties as follows:
[yinzhengjie@s101 ~]$ cd /soft/hive/conf/
[yinzhengjie@s101 conf]$
[yinzhengjie@s101 conf]$ cp hive-log4j2.properties.template hive-log4j2.properties     #copy the template to create the configuration file
[yinzhengjie@s101 conf]$ grep property.hive.log.dir hive-log4j2.properties | grep -v ^#
property.hive.log.dir = /home/yinzhengjie/hive/logs     #the directory where the log is stored
[yinzhengjie@s101 conf]$
[yinzhengjie@s101 conf]$ ll /home/yinzhengjie/hive/logs/hive.log
-rw-rw-r-- 1 yinzhengjie yinzhengjie 4265 Aug 5 21:20 /home/yinzhengjie/hive/logs/hive.log     #restart hive, then check the contents of the log file
[yinzhengjie@s101 conf]$
4>. Viewing and setting parameters
1>. View all current settings (hive (yinzhengjie)> set;)
Configuration files:
    Default configuration file: hive-default.xml
    User-defined configuration file: hive-site.xml
    Note: user-defined settings override the defaults. Hive also reads the Hadoop configuration, because Hive starts as a Hadoop client, and Hive settings override Hadoop settings. Settings made in configuration files apply to every Hive process started on the machine.
2>. The three ways to set a parameter and their precedence
Besides the configuration files above, a parameter can be set when the command line is launched or after it has been launched.
Setting a parameter at launch:
[yinzhengjie@s101 ~]$ hive -hiveconf mapred.reduce.tasks=10
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Logging initialized using configuration in file:/soft/apache-hive-2.1.1-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> set mapred.reduce.tasks;
mapred.reduce.tasks=10
hive (default)> quit;
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Logging initialized using configuration in file:/soft/apache-hive-2.1.1-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> set mapred.reduce.tasks;
mapred.reduce.tasks=-1
hive (default)> exit;
[yinzhengjie@s101 ~]$
Setting a parameter after launch:
[yinzhengjie@s101 ~]$ hive
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Logging initialized using configuration in file:/soft/apache-hive-2.1.1-bin/conf/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive (default)> set mapred.reduce.tasks;
mapred.reduce.tasks=-1
hive (default)> set mapred.reduce.tasks=100;
hive (default)> set mapred.reduce.tasks;
mapred.reduce.tasks=100
hive (default)> quit;
[yinzhengjie@s101 ~]$
A note on precedence:
    The three methods take precedence in increasing order: "configuration file" < "at launch" < "after launch". A value given at launch or set inside a session lasts only for that session, which is why the second launch above shows -1 again. Some system-level parameters, such as the log4j settings, must be set with one of the first two methods, because they are read before the session is established.
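The precedence order can be modelled with Python's collections.ChainMap, which looks a key up in several dictionaries from left to right. This is a toy model of the lookup order only, not how Hive is implemented; the function name `effective_config` and the three dictionaries are assumptions made for the illustration, with values taken from the sessions above.

```python
from collections import ChainMap


def effective_config(config_file: dict, at_launch: dict, in_session: dict) -> ChainMap:
    """Combine the three sources; the highest-priority source is searched first."""
    return ChainMap(in_session, at_launch, config_file)


config_file = {"mapred.reduce.tasks": "-1"}   # value seen when nothing else is set
at_launch = {"mapred.reduce.tasks": "10"}     # like: hive -hiveconf mapred.reduce.tasks=10
in_session = {}                               # filled by "set key=value;" inside the CLI

conf = effective_config(config_file, at_launch, in_session)
print(conf["mapred.reduce.tasks"])            # the launch-time value wins over the file

in_session["mapred.reduce.tasks"] = "100"     # like: set mapred.reduce.tasks=100;
print(conf["mapred.reduce.tasks"])            # the in-session value wins over both
```

ChainMap keeps references to the underlying dictionaries, so a later in-session change is visible immediately, much like a "set" statement takes effect for the rest of the session.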
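II. Hive data types
1>. Primitive data types
Hive's STRING type corresponds to VARCHAR in a relational database: it is a variable-length string. Unlike VARCHAR, you cannot declare the maximum number of characters it may hold; in theory it can store up to 2 GB of characters.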
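Hive data type | Java data type | Length | Example |
TINYINT | byte | 1-byte signed integer | 20 |
SMALLINT | short | 2-byte signed integer | 20 |
INT | int | 4-byte signed integer | 20 |
BIGINT | long | 8-byte signed integer | 20 |
BOOLEAN | boolean | Boolean, true or false | TRUE FALSE |
FLOAT | float | Single-precision floating point number | 3.14159 |
DOUBLE | double | Double-precision floating point number | 3.14159 |
STRING | String | Sequence of characters. A character set can be specified. Single or double quotes can be used. | 'now is the time' "for all good men" |
TIMESTAMP | | Time type | |
BINARY | | Byte array | |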
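2>. Collection data types
Hive has three complex data types: ARRAY, MAP and STRUCT. ARRAY and MAP are similar to Array and Map in Java, while STRUCT is similar to a struct in C: it wraps a set of named fields. Complex data types can be nested to any depth.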
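Data type | Description | Syntax example |
STRUCT | Similar to a struct in C: fields are accessed with "dot" notation. For example, if a column has the type STRUCT<first:STRING, last:STRING>, the first field is referenced as column.first. | struct() |
MAP | A collection of key-value pairs, accessed with array notation. For example, if a column has a MAP type with the key->value pairs 'first'->'John' and 'last'->'Doe', the value 'Doe' is obtained with column['last']. | map() |
ARRAY | An ordered collection of elements of the same type under one name. Each element has an index, starting from zero. For example, if the array value is ['John', 'Doe'], the second element is referenced as array_name[1]. | array() |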
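3>. Type conversion
Hive's primitive types can be converted implicitly, much like type conversion in Java. For example, if an expression expects INT, a TINYINT is converted to INT automatically. Hive does not convert in the opposite direction: if an expression expects TINYINT, an INT is not converted automatically and an error is returned unless CAST is used. The implicit conversion rules are as follows.
First: any integer type can be implicitly converted to a wider integer type, e.g. TINYINT to INT, or INT to BIGINT.
Second: all integer types, FLOAT and STRING can be implicitly converted to DOUBLE.
Third: TINYINT, SMALLINT and INT can be converted to FLOAT.
Fourth: BOOLEAN cannot be converted to any other type.
Tip: use CAST to convert types explicitly. For example, CAST('1' AS INT) converts the string '1' to the integer 1. If the cast fails, as in CAST('X' AS INT), the expression returns NULL.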
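To make the four rules concrete, here is a small Python model of them. It encodes only the rules listed in this post and is not Hive's actual conversion table; the names `can_convert_implicitly` and `cast_to_int` are hypothetical. `cast_to_int` only mimics the "failed cast returns NULL" behaviour, with Python's None standing in for NULL, and Python's int() may accept inputs that Hive's CAST treats differently.

```python
from typing import Optional

# Integer types ordered by their width in bytes (rule 1).
INTEGER_WIDTH = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8}


def can_convert_implicitly(src: str, dst: str) -> bool:
    """Apply the four rules from the text to a (source, target) type pair."""
    src, dst = src.upper(), dst.upper()
    if src == dst:
        return True
    if src == "BOOLEAN":                      # rule 4: BOOLEAN converts to nothing else
        return False
    if src in INTEGER_WIDTH and dst in INTEGER_WIDTH:
        return INTEGER_WIDTH[src] < INTEGER_WIDTH[dst]    # rule 1: widening only
    if dst == "DOUBLE":                       # rule 2
        return src in INTEGER_WIDTH or src in ("FLOAT", "STRING")
    if dst == "FLOAT":                        # rule 3
        return src in ("TINYINT", "SMALLINT", "INT")
    return False


def cast_to_int(value: str) -> Optional[int]:
    """Mimic CAST(value AS INT): return None (NULL) when the cast fails."""
    try:
        return int(value)
    except ValueError:
        return None


print(can_convert_implicitly("TINYINT", "INT"))   # widening is allowed
print(can_convert_implicitly("INT", "TINYINT"))   # narrowing needs an explicit CAST
print(cast_to_int("1"), cast_to_int("X"))
```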
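4>. A quick exercise
Suppose a table contains the following row, whose structure we describe in JSON. In Hive it is accessed in this format:
Based on that structure, we create a matching table in Hive and load data into it. Create a local test file test.txt with the content below. (Note that the separator between elements of a MAP, STRUCT or ARRAY can be the same character; here it is "_".)
[yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/test.txt
漩涡鸣人,我爱罗_佐助,漩涡博人:18_漩涡向日葵:16,一乐拉面附近_木业忍者村
宇智波富岳,宇智波美琴_志村团藏,宇智波鼬:28_宇智波佐助:19,木叶警务部_木业忍者村
[yinzhengjie@s101 download]$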
Create the test table "test" in Hive as follows:
create table test(
    name string,
    friends array<string>,
    children map<string, int>,
    address struct<street:string, city:string>
)
row format delimited
fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';
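How the three delimiters split one line of test.txt can be sketched in Python. This only imitates the effect of the row format declared above and is not Hive's SerDe; the function name `parse_row` is hypothetical, and the sample line is the first line of the test file.

```python
def parse_row(line: str, field_sep: str = ",", item_sep: str = "_",
              key_sep: str = ":") -> dict:
    """Split one text line into name, friends, children and address."""
    name, friends, children, address = line.rstrip("\n").split(field_sep)
    street, city = address.split(item_sep)    # struct fields are filled in order
    return {
        "name": name,
        "friends": friends.split(item_sep),                       # array<string>
        "children": {key: int(value)                              # map<string, int>
                     for key, value in (pair.split(key_sep)
                                        for pair in children.split(item_sep))},
        "address": {"street": street, "city": city},              # struct
    }


row = parse_row("漩涡鸣人,我爱罗_佐助,漩涡博人:18_漩涡向日葵:16,一乐拉面附近_木业忍者村")
# Same access pattern as friends[0], children['漩涡博人'], address.city in HQL.
print(row["friends"][0], row["children"]["漩涡博人"], row["address"]["city"])
```

The fields are separated first, then the items inside a collection, and finally the key from the value inside each map entry, which is why one item separator can serve all three collection types.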
Load the text data into the test table:
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/test.txt' into table test;
Loading data to table yinzhengjie.test
OK
Time taken: 0.335 seconds
hive (yinzhengjie)> select * from test;
OK
test.name    test.friends    test.children    test.address
漩涡鸣人    ["我爱罗","佐助"]    {"漩涡博人":18,"漩涡向日葵":16}    {"street":"一乐拉面附近","city":"木业忍者村"}
宇智波富岳    ["宇智波美琴","志村团藏"]    {"宇智波鼬":28,"宇智波佐助":19}    {"street":"木叶警务部","city":"木业忍者村"}
Time taken: 0.099 seconds, Fetched: 2 row(s)
hive (yinzhengjie)>
Access the data in the three collection columns. The queries below show how to access an ARRAY, a MAP and a STRUCT:
hive (yinzhengjie)> select * from test;
OK
test.name    test.friends    test.children    test.address
漩涡鸣人    ["我爱罗","佐助"]    {"漩涡博人":18,"漩涡向日葵":16}    {"street":"一乐拉面附近","city":"木业忍者村"}
宇智波富岳    ["宇智波美琴","志村团藏"]    {"宇智波鼬":28,"宇智波佐助":19}    {"street":"木叶警务部","city":"木业忍者村"}
Time taken: 0.085 seconds, Fetched: 2 row(s)
hive (yinzhengjie)> select friends[0],children['漩涡博人'],address.city from test where name="漩涡鸣人";
OK
_c0    _c1    city
我爱罗    18    木业忍者村
Time taken: 0.096 seconds, Fetched: 1 row(s)
hive (yinzhengjie)> select friends[1],children['漩涡向日葵'],address.city from test where name="漩涡鸣人";
OK
_c0    _c1    city
佐助    16    木业忍者村
Time taken: 0.1 seconds, Fetched: 1 row(s)
hive (yinzhengjie)>
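III. Common Hive commands (HQL) in action
Tip: both the interactive Hive shell and HQL statements start Hive, and Hive relies on Hadoop: HDFS for storage and MapReduce for computation. The Hadoop cluster must therefore be running before you start Hive.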
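[yinzhengjie@s101 ~]$ more `which xcall.sh`
#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com
#Check whether the user passed any arguments
if [ $# -lt 1 ];then
    echo "请输入参数"        #"please enter arguments"
    exit
fi
#Get the command entered by the user
cmd=$@
for (( i=101;i<=105;i++ ))
do
    #Turn the terminal text green
    tput setaf 2
    echo ============= s$i $cmd ============
    #Restore the terminal's original color (light gray)
    tput setaf 7
    #Run the command on the remote host
    ssh s$i $cmd
    #Check whether the command succeeded
    if [ $? == 0 ];then
        echo "命令执行成功"        #"command succeeded"
    fi
done
[yinzhengjie@s101 ~]$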
[yinzhengjie@s101 ~]$ more `which start-dfs.sh` | grep -v ^# | grep -v ^$
usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/hdfs-config.sh
if [[ $# -ge 1 ]]; then
  startOpt="$1"
  shift
  case "$startOpt" in
    -upgrade)
      nameStartOpt="$startOpt"
    ;;
    -rollback)
      dataStartOpt="$startOpt"
    ;;
    *)
      echo $usage
      exit 1
    ;;
  esac
fi
nameStartOpt="$nameStartOpt $@"
NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
echo "Starting namenodes on [$NAMENODES]"
"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" --hostnames "$NAMENODES" --script "$bin/hdfs" start namenode $nameStartOpt
if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  echo "Attempting to start secure cluster, skipping datanodes. " "Run start-secure-dns.sh as root to complete startup."
else
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" --script "$bin/hdfs" start datanode $dataStartOpt
fi
SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)
if [ -n "$SECONDARY_NAMENODES" ]; then
  echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" --hostnames "$SECONDARY_NAMENODES" --script "$bin/hdfs" start secondarynamenode
fi
SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)
case "$SHARED_EDITS_DIR" in
qjournal://*)
  JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')
  echo "Starting journal nodes [$JOURNAL_NODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" --hostnames "$JOURNAL_NODES" --script "$bin/hdfs" start journalnode ;;
esac
AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)
if [ "$(echo "$AUTOHA_ENABLED" | tr A-Z a-z)" = "true" ]; then
  echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" --config "$HADOOP_CONF_DIR" --hostnames "$NAMENODES" --script "$bin/hdfs" start zkfc
fi
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ cat /soft/hadoop/sbin/start-yarn.sh | grep -v ^# | grep -v ^$
echo "starting yarn daemons"
bin=`dirname "${BASH_SOURCE-$0}"`
bin=`cd "$bin"; pwd`
DEFAULT_LIBEXEC_DIR="$bin"/../libexec
HADOOP_LIBEXEC_DIR=${HADOOP_LIBEXEC_DIR:-$DEFAULT_LIBEXEC_DIR}
. $HADOOP_LIBEXEC_DIR/yarn-config.sh
"$bin"/yarn-daemon.sh --config $YARN_CONF_DIR start resourcemanager
"$bin"/yarn-daemons.sh --config $YARN_CONF_DIR start nodemanager
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ more `which xzk.sh`
#!/bin/bash
#@author :yinzhengjie
#blog:http://www.cnblogs.com/yinzhengjie
#EMAIL:y1053419035@qq.com
#Check whether the user passed exactly one argument
if [ $# -ne 1 ];then
    echo "无效参数,用法为: $0 {start|stop|restart|status}"        #"invalid argument, usage: ..."
    exit
fi
#Get the command entered by the user
cmd=$1
#Dispatch on the requested action
function zookeeperManger(){
    case $cmd in
    start)
        echo "启动服务"        #"starting service"
        remoteExecution start
        ;;
    stop)
        echo "停止服务"        #"stopping service"
        remoteExecution stop
        ;;
    restart)
        echo "重启服务"        #"restarting service"
        remoteExecution restart
        ;;
    status)
        echo "查看状态"        #"checking status"
        remoteExecution status
        ;;
    *)
        echo "无效参数,用法为: $0 {start|stop|restart|status}"        #"invalid argument, usage: ..."
        ;;
    esac
}
#Run zkServer.sh with the given action on every ZooKeeper host
function remoteExecution(){
    for (( i=102 ; i<=104 ; i++ )) ; do
        tput setaf 2
        echo ========== s$i zkServer.sh $1 ================
        tput setaf 9
        ssh s$i "source /etc/profile ; zkServer.sh $1"
    done
}
#Call the function
zookeeperManger
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ xzk.sh start 启动服务 ========== s102 zkServer.sh start ================ ZooKeeper JMX enabled by default Using config: /soft/zk/bin/../conf/zoo.cfg Starting zookeeper ... STARTED ========== s103 zkServer.sh start ================ ZooKeeper JMX enabled by default Starting zookeeper ... Using config: /soft/zk/bin/../conf/zoo.cfg STARTED ========== s104 zkServer.sh start ================ ZooKeeper JMX enabled by default Using config: /soft/zk/bin/../conf/zoo.cfg Starting zookeeper ... STARTED [yinzhengjie@s101 ~]$ [yinzhengjie@s101 ~]$ xcall.sh jps ============= s101 jps ============ 6232 Jps 命令执行成功 ============= s102 jps ============ 4081 QuorumPeerMain 4110 Jps 命令执行成功 ============= s103 jps ============ 4044 QuorumPeerMain 4079 Jps 命令执行成功 ============= s104 jps ============ 4076 Jps 4047 QuorumPeerMain 命令执行成功 ============= s105 jps ============ 3383 Jps 命令执行成功 [yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ start-dfs.sh SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Starting namenodes on [s101 s105] s101: starting namenode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-namenode-s101.out s105: starting namenode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-namenode-s105.out s103: starting datanode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-datanode-s103.out s102: starting datanode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-datanode-s102.out s104: starting datanode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-datanode-s104.out Starting journal nodes [s102 s103 s104] s102: starting journalnode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-journalnode-s102.out s103: starting journalnode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-journalnode-s103.out s104: starting journalnode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-journalnode-s104.out SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory] Starting ZK Failover Controllers on NN hosts [s101 s105] s101: starting zkfc, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-zkfc-s101.out s105: starting zkfc, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-zkfc-s105.out [yinzhengjie@s101 ~]$ [yinzhengjie@s101 ~]$ [yinzhengjie@s101 ~]$ xcall.sh jps ============= s101 jps ============ 6755 Jps 6380 NameNode 6685 DFSZKFailoverController 命令执行成功 ============= s102 jps ============ 4240 JournalNode 4081 QuorumPeerMain 4159 DataNode 4335 Jps 命令执行成功 ============= s103 jps ============ 4304 Jps 4130 DataNode 4211 JournalNode 4044 QuorumPeerMain 命令执行成功 ============= s104 jps ============ 4300 Jps 4125 DataNode 4047 QuorumPeerMain 4207 JournalNode 命令执行成功 ============= s105 jps ============ 3538 DFSZKFailoverController 3436 NameNode 3597 Jps 命令执行成功 [yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ start-yarn.sh starting yarn daemons s101: starting resourcemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-resourcemanager-s101.out s105: starting resourcemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-resourcemanager-s105.out s103: starting nodemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-nodemanager-s103.out s102: starting nodemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-nodemanager-s102.out s104: starting nodemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-nodemanager-s104.out [yinzhengjie@s101 ~]$ [yinzhengjie@s101 ~]$ [yinzhengjie@s101 ~]$ xcall.sh jps ============= s101 jps ============ 6883 ResourceManager 6982 Jps 6380 NameNode 6685 DFSZKFailoverController 命令执行成功 ============= s102 jps ============ 4240 JournalNode 4081 QuorumPeerMain 4387 NodeManager 4424 Jps 4159 DataNode 命令执行成功 ============= s103 jps ============ 4130 DataNode 4211 JournalNode 4356 NodeManager 4436 Jps 4044 QuorumPeerMain 命令执行成功 ============= s104 jps ============ 4352 NodeManager 4390 Jps 4125 DataNode 4047 QuorumPeerMain 4207 JournalNode 命令执行成功 ============= s105 jps ============ 3538 DFSZKFailoverController 3436 NameNode 3710 Jps 命令执行成功 [yinzhengjie@s101 ~]$
1>. Interactive Hive commands
[yinzhengjie@s101 download]$ cat teachers.txt
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
[yinzhengjie@s101 download]$
[yinzhengjie@s101 ~]$ hive -help SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. usage: hive -d,--define <key=value> Variable subsitution to apply to hive commands. e.g. -d A=B or --define A=B --database <databasename> Specify the database to use -e <quoted-query-string> SQL from command line -f <filename> SQL from files -H,--help Print help information --hiveconf <property=value> Use value for given property --hivevar <key=value> Variable subsitution to apply to hive commands. e.g. --hivevar A=B -i <filename> Initialization SQL file -S,--silent Silent mode in interactive shell -v,--verbose Verbose mode (echo executed SQL to the console) [yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true default yinzhengjie Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive>
[yinzhengjie@s101 ~]$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true default yinzhengjie Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> show databases; OK default yinzhengjie Time taken: 0.01 seconds, Fetched: 2 row(s) hive>
[yinzhengjie@s101 ~]$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true default yinzhengjie Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> show databases; OK default yinzhengjie Time taken: 0.008 seconds, Fetched: 2 row(s) hive> use yinzhengjie; OK Time taken: 0.018 seconds hive>
hive> show databases; OK default yinzhengjie Time taken: 0.008 seconds, Fetched: 2 row(s) hive> use yinzhengjie; OK Time taken: 0.018 seconds hive> show tables; OK az_top3 az_wc test1 test2 test3 test4 yzj Time taken: 0.025 seconds, Fetched: 7 row(s) hive>
hive> show databases;
OK
default
yinzhengjie
Time taken: 0.008 seconds, Fetched: 2 row(s)
hive> use yinzhengjie;
OK
Time taken: 0.018 seconds
hive> show tables;
OK
az_top3
az_wc
test1
test2
test3
test4
yzj
Time taken: 0.025 seconds, Fetched: 7 row(s)
hive> create table Teacher(id int,name string)row format delimited fields terminated by '\t';
OK
Time taken: 0.626 seconds
hive> show tables;
OK
az_top3
az_wc
teacher
test1
test2
test3
test4
yzj
Time taken: 0.028 seconds, Fetched: 8 row(s)
hive>
hive> show tables; OK teacher yzj Time taken: 0.022 seconds, Fetched: 2 row(s) hive> select * from teacher; OK Time taken: 0.105 seconds hive> load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher; Loading data to table yinzhengjie.teacher OK Time taken: 0.256 seconds hive> select * from teacher; OK 70 Dennis MacAlistair Ritchie 49 Linus Benedict Torvalds 68 Bjarne Stroustrup 62 Guido van Rossum 63 James Gosling 60 Martin Odersky 62 Rob Pike 50 Rasmus Lerdorf 50 Brendan Eich Time taken: 0.104 seconds, Fetched: 9 row(s) hive>
hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/logs/umeng/raw-log/201808/06/2346' into table raw_logs partition(ym=201808 , day=06 ,hm=2346); Loading data to table yinzhengjie.raw_logs partition (ym=201808, day=6, hm=2346) OK Time taken: 1.846 seconds hive (yinzhengjie)>
[yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/umeng_create_logs_ddl.sql
use yinzhengjie ;

--startuplogs
create table if not exists startuplogs
(
  appChannel string , appId string , appPlatform string , appVersion string ,
  brand string , carrier string , country string , createdAtMs bigint ,
  deviceId string , deviceStyle string , ipAddress string , network string ,
  osType string , province string , screenSize string , tenantId string
)
partitioned by (ym int ,day int , hm int)
stored as parquet ;

--eventlogs
create table if not exists eventlogs
(
  appChannel string , appId string , appPlatform string , appVersion string ,
  createdAtMs bigint , deviceId string , deviceStyle string , eventDurationSecs bigint ,
  eventId string , osType string , tenantId string
)
partitioned by (ym int ,day int , hm int)
stored as parquet ;

--errorlogs
create table if not exists errorlogs
(
  appChannel string , appId string , appPlatform string , appVersion string ,
  createdAtMs bigint , deviceId string , deviceStyle string , errorBrief string ,
  errorDetail string , osType string , tenantId string
)
partitioned by (ym int ,day int , hm int)
stored as parquet ;

--usagelogs
create table if not exists usagelogs
(
  appChannel string , appId string , appPlatform string , appVersion string ,
  createdAtMs bigint , deviceId string , deviceStyle string , osType string ,
  singleDownloadTraffic bigint , singleUploadTraffic bigint , singleUseDurationSecs bigint ,
  tenantId string
)
partitioned by (ym int ,day int , hm int)
stored as parquet ;

--pagelogs
create table if not exists pagelogs
(
  appChannel string , appId string , appPlatform string , appVersion string ,
  createdAtMs bigint , deviceId string , deviceStyle string , nextPage string ,
  osType string , pageId string , pageViewCntInSession int , stayDurationSecs bigint ,
  tenantId string , visitIndex int
)
partitioned by (ym int ,day int , hm int)
stored as parquet ;
[yinzhengjie@s101 download]$
hive (yinzhengjie)> show tables; OK tab_name myusers raw_logs student teacher teacherbak teachercopy Time taken: 0.044 seconds, Fetched: 6 row(s) hive (yinzhengjie)> hive (yinzhengjie)> source /home/yinzhengjie/download/umeng_create_logs_ddl.sql; OK Time taken: 0.008 seconds OK Time taken: 0.257 seconds OK Time taken: 0.058 seconds OK Time taken: 0.073 seconds OK Time taken: 0.065 seconds OK Time taken: 0.053 seconds hive (yinzhengjie)> show tables; OK tab_name errorlogs eventlogs myusers pagelogs raw_logs startuplogs student teacher teacherbak teachercopy usagelogs Time taken: 0.014 seconds, Fetched: 11 row(s) hive (yinzhengjie)>
[yinzhengjie@s101 ~]$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt; 70 Dennis MacAlistair Ritchie 49 Linus Benedict Torvalds 68 Bjarne Stroustrup 62 Guido van Rossum 63 James Gosling 60 Martin Odersky 62 Rob Pike 50 Rasmus Lerdorf 50 Brendan Eich hive>
[yinzhengjie@s101 ~]$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> ! ls /home/yinzhengjie/download; 1 derby.log hivef.sql metastore_db MySpark.jar spark-2.1.0-bin-hadoop2.7.tgz teachers.txt temp hive>
[yinzhengjie@s101 download]$ cat ~/.hivehistory show databases; quit; show databases; quit ; create table(id int,name string) row format delimited fields terminated by ' ' lines terminated by ' ' stored as textfile; create table users(id int , name string) row format delimited fields terminated by ' ' lines terminated by ' ' stored as textfile; load data local inpath 'user.txt' into table users; !pwd ; !cd /home/yinzhengjie ; !pwd ; quit; load data local inpath 'user.txt' into table users; load data inpath 'user.txt' into table users; hdfs dfs -put 'user.txt'; hdfs dfs put 'user.txt'; dfs put 'user.txt'; dfs -put 'user.txt'; dfs -put 'user.txt' /; dfs -put user.txt ; dfs -put user.txt /; load data inpath 'user.txt' into table users; load data inpath '/user.txt' into table users; ;; ; ;; ipconfig ; quit quit; exit exit; show databases; use yinzhengjie ; show tables; SET hive.support.concurrency = true; show tables; use yinzhengjie; show tables; select * from yzj; SET hive.support.concurrency = true; SET hive.enforce.bucketing = true; SET hive.exec.dynamic.partition.mode = nonstrict; SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; SET hive.compactor.initiator.on = true; SET hive.compactor.worker.threads = 1; select * from yzj; use yinzhengjie; SET hive.support.concurrency = true; SET hive.enforce.bucketing = true; SET hive.exec.dynamic.partition.mode = nonstrict; SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager; SET hive.compactor.initiator.on = true; SET hive.compactor.worker.threads = 1; show tables; select * from yzj; show databases; use yinzhengjie; show tables; hive show databases; use yinzhengjie; show tables; select * from az_top3; quit; show databases; use yinzhengjie ; show tables; use yinzhengjie; show databases; use yinzhengjie; show tables; create table Teacher(id int,name string)row format delimited fields terminated by ' '; show tables; load data local inpath '/home/yinzhengjie/download/teachers.txt' ; show 
tables; drop table taacher; show databases; use yinzhengjie; show tables; drop table teacher; show tables; ; show tables; create table Teacher(id int,name string)row format delimited fields terminated by ' '; show tables; drop table test1,test2,test3; drop table test1; drop table test2; drop table test3; drop table test4; show tables; drop table az_top3; drop table az_wc; show tbales; show databasers; show databases; drop database yinzhengjie; ; use yinzhengjie; show tables; drop table teacher; show tables; create table Teacher(id int,name string)row format delimited fields terminated by ' '; show tables; load data local inpath '/home/yinzhengjie/download/teachers.txt'; load data local inpath `/home/yinzhengjie/download/teachers.txt`; use yinzhengjie ; show tables; load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher; select * from teacher; drop table teacher; ; create table Teacher(id int,name string)row format delimited fields terminated by ' '; show tables; select * from teacher; load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher; select * from teacher; quit; exit; exit ; dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt; dfs -lsr /; ; ! ls /home/yinzhengjie; ! ls /home/yinzhengjie/download; [yinzhengjie@s101 download]$
[yinzhengjie@s101 download]$ hive -e "select * from yinzhengjie.teacher;" SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true default yinzhengjie OK 70 Dennis MacAlistair Ritchie 49 Linus Benedict Torvalds 68 Bjarne Stroustrup 62 Guido van Rossum 63 James Gosling 60 Martin Odersky 62 Rob Pike 50 Rasmus Lerdorf 50 Brendan Eich Time taken: 3.414 seconds, Fetched: 9 row(s) [yinzhengjie@s101 download]$
[yinzhengjie@s101 download]$ hive -f /home/yinzhengjie/download/hivef.sql SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true default yinzhengjie OK Time taken: 0.023 seconds OK teacher yzj Time taken: 0.085 seconds, Fetched: 2 row(s) OK 70 Dennis MacAlistair Ritchie 49 Linus Benedict Torvalds 68 Bjarne Stroustrup 62 Guido van Rossum 63 James Gosling 60 Martin Odersky 62 Rob Pike 50 Rasmus Lerdorf 50 Brendan Eich Time taken: 2.044 seconds, Fetched: 9 row(s) [yinzhengjie@s101 download]$
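The two non-interactive invocations above (hive -e for an inline statement, hive -f for a script file) can be wrapped in a small helper when Hive is driven from another program. The sketch below only builds the argument list and does not start Hive; hive_argv is a hypothetical name that is not part of Hive, and the flags used are only the ones that appear in this post (-e, -f, -hiveconf).

```python
from typing import Dict, List, Optional


def hive_argv(query: Optional[str] = None,
              script: Optional[str] = None,
              hiveconf: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the argument list for a non-interactive hive invocation.

    Exactly one of `query` (passed with -e) or `script` (passed with -f)
    must be given. This is a hypothetical helper, not part of Hive.
    """
    if (query is None) == (script is None):
        raise ValueError("pass exactly one of query or script")
    argv = ["hive"]
    # Each key=value pair becomes its own -hiveconf option.
    for key, value in (hiveconf or {}).items():
        argv += ["-hiveconf", f"{key}={value}"]
    argv += ["-e", query] if query is not None else ["-f", script]
    return argv


if __name__ == "__main__":
    print(hive_argv(query="select * from yinzhengjie.teacher;"))
    print(hive_argv(script="/home/yinzhengjie/download/hivef.sql"))
```

The returned list could be handed to subprocess.run on a machine where the hive command is on the PATH.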
[yinzhengjie@s101 ~]$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> quit; [yinzhengjie@s101 ~]$ [yinzhengjie@s101 ~]$ hive SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. hive> exit; [yinzhengjie@s101 ~]$
2>.DDL (Data Definition)
hive (yinzhengjie)> show databases; OK database_name default yinzhengjie Time taken: 0.007 seconds, Fetched: 2 row(s) hive (yinzhengjie)> create database if not exists db_hive; OK Time taken: 0.034 seconds hive (yinzhengjie)> show databases; OK database_name db_hive default yinzhengjie Time taken: 0.009 seconds, Fetched: 3 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> show databases; OK database_name db_hive default yinzhengjie Time taken: 0.008 seconds, Fetched: 3 row(s) hive (yinzhengjie)> create database if not exists db_hive2 location "/db_hive2"; OK Time taken: 0.04 seconds hive (yinzhengjie)> show databases; OK database_name db_hive db_hive2 default yinzhengjie Time taken: 0.006 seconds, Fetched: 4 row(s) hive (yinzhengjie)>
You can use the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, which describe that database. The database name and the directory where the database is stored cannot be changed with this command.
hive (yinzhengjie)> show databases;
OK
database_name
db_hive
db_hive2
default
yinzhengjie
Time taken: 0.007 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> ALTER DATABASE db_hive set dbproperties('Owner'='yinzhengjie'); #Attach an extra property to the database; its name and location are left untouched.
OK
Time taken: 0.03 seconds
hive (yinzhengjie)> desc database db_hive; #A plain "desc database" does not show the property we just defined.
OK
db_name comment location owner_name owner_type parameters
db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER
Time taken: 0.017 seconds, Fetched: 1 row(s)
hive (yinzhengjie)> desc database extended db_hive; #Adding the extended keyword shows the database properties we defined.
OK
db_name comment location owner_name owner_type parameters
db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER {Owner=yinzhengjie}
Time taken: 0.011 seconds, Fetched: 1 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show databases; #List all databases
OK
database_name
db_hive
db_hive2
default
yinzhengjie
Time taken: 0.008 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show databases like 'yin*'; #Filter the listed databases by pattern
OK
database_name
yinzhengjie
Time taken: 0.009 seconds, Fetched: 1 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> desc database db_hive; #Show database information
OK
db_name comment location owner_name owner_type parameters
db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER
Time taken: 0.012 seconds, Fetched: 1 row(s)
hive (yinzhengjie)> desc database extended db_hive; #Show detailed database information with the extended keyword
OK
db_name comment location owner_name owner_type parameters
db_hive hdfs://mycluster/user/hive/warehouse/db_hive.db yinzhengjie USER {Owner=yinzhengjie}
Time taken: 0.013 seconds, Fetched: 1 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show databases;
OK
database_name
db_hive
db_hive2
default
yinzhengjie
Time taken: 0.006 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> use default; #Switch to a database
OK
Time taken: 0.012 seconds
hive (default)>
hive (yinzhengjie)> show databases;
OK
database_name
db_hive
db_hive2
default
yinzhengjie
Time taken: 0.006 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> use db_hive2; #Switch to the db_hive2 database
OK
Time taken: 0.014 seconds
hive (db_hive2)> show tables; #db_hive2 contains no tables
OK
tab_name
Time taken: 0.015 seconds
hive (db_hive2)> drop database if exists db_hive2; #Drop the empty database db_hive2
OK
Time taken: 0.05 seconds
hive (db_hive2)> show databases;
OK
database_name
db_hive
default
yinzhengjie
Time taken: 0.006 seconds, Fetched: 3 row(s)
hive (db_hive2)> use db_hive; #Switch to the db_hive database
OK
Time taken: 0.012 seconds
hive (db_hive)> show tables; #db_hive does contain tables
OK
tab_name
classlist
student
teacher
Time taken: 0.016 seconds, Fetched: 3 row(s)
hive (db_hive)> drop database db_hive cascade; #The cascade keyword forces the drop of db_hive even though it still contains tables
OK
Time taken: 0.304 seconds
hive (db_hive)> use yinzhengjie;
OK
Time taken: 0.016 seconds
hive (yinzhengjie)> show databases;
OK
database_name
default
yinzhengjie
Time taken: 0.007 seconds, Fetched: 2 row(s)
hive (yinzhengjie)>
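The difference between a plain drop database and drop database ... cascade in the session above can be pictured with a toy model. This is only an illustration of the rule (a database that still holds tables is not dropped unless cascade is given), not Hive's implementation; ToyMetastore and its methods are hypothetical names.

```python
class ToyMetastore:
    """A toy stand-in for the metastore: database name -> set of table names."""

    def __init__(self):
        self.databases = {}

    def create_database(self, name, if_not_exists=False):
        if name in self.databases:
            if if_not_exists:
                return  # IF NOT EXISTS: silently keep the existing database
            raise ValueError(f"database {name} already exists")
        self.databases[name] = set()

    def create_table(self, db, table):
        self.databases[db].add(table)

    def drop_database(self, name, if_exists=False, cascade=False):
        if name not in self.databases:
            if if_exists:
                return  # IF EXISTS: nothing to do
            raise ValueError(f"database {name} does not exist")
        if self.databases[name] and not cascade:
            # Without CASCADE, a database that still has tables is not dropped.
            raise ValueError(f"database {name} is not empty")
        del self.databases[name]


if __name__ == "__main__":
    ms = ToyMetastore()
    ms.create_database("db_hive")
    ms.create_table("db_hive", "teacher")
    ms.drop_database("db_hive", cascade=True)
    print(sorted(ms.databases))
```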
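I. CREATE TABLE syntax and clause descriptions
1>.The CREATE TABLE statement:
CREATE [EXTERNAL] TABLE [IF NOT EXISTS] table_name
[(col_name data_type [COMMENT col_comment], ...)]
[COMMENT table_comment]
[PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)]
[CLUSTERED BY (col_name, col_name, ...)
[SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS]
[ROW FORMAT row_format]
[STORED AS file_format]
[LOCATION hdfs_path]
2>.Clause descriptions:
a>.CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses that exception.
b>.The EXTERNAL keyword creates an external table, and a path to the actual data can be given at the same time with LOCATION. For an internal table, Hive moves the data into the path the data warehouse points to; for an external table, Hive only records the path where the data lives and does not change the data's location. When a table is dropped, an internal table's metadata and data are deleted together, whereas for an external table only the metadata is deleted and the data is kept.
c>.COMMENT adds a comment to a table or a column.
d>.PARTITIONED BY creates a partitioned table.
e>.CLUSTERED BY creates a bucketed table.
f>.SORTED BY is rarely used.
g>.ROW FORMAT
DELIMITED [FIELDS TERMINATED BY char] [COLLECTION ITEMS TERMINATED BY char] [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char]
| SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
When creating a table you can supply a custom SerDe or use the built-in one. If ROW FORMAT is omitted, or ROW FORMAT DELIMITED is used, the built-in SerDe is applied. The table's columns are declared in the same statement, and Hive uses the SerDe to map the stored data onto those columns.
h>.STORED AS specifies the storage file format. Common formats are SEQUENCEFILE (binary sequence file), TEXTFILE (plain text) and RCFILE (columnar storage format). Use STORED AS TEXTFILE when the data is plain text; use STORED AS SEQUENCEFILE when the data needs to be compressed.
i>.LOCATION specifies where the table is stored on HDFS.
j>.LIKE copies the structure of an existing table without copying its data.
II. Managed (internal) tables: theory
Tables are created as managed tables by default, sometimes also called internal tables, because Hive (more or less) controls the life cycle of their data. By default Hive stores the data of these tables in a subdirectory of the directory defined by the configuration item hive.metastore.warehouse.dir (for example, /user/hive/warehouse). When a managed table is dropped, Hive also deletes the data in that table. Managed tables are not well suited to sharing data with other tools.
III. External tables
1>.Theory
Because the table is external, Hive does not consider itself the full owner of the data. Dropping the table does not delete the data, but the metadata describing the table is deleted.
2>.When to use managed and external tables:
Website logs collected each day are written periodically to text files on HDFS. Heavy statistical analysis is done on top of an external table (the raw log table); the intermediate and result tables it produces are stored as internal tables, and data enters them through SELECT + INSERT.
IV. Partitioned tables
A partition corresponds to a separate directory on HDFS, and that directory holds all the data files of the partition. In other words, partitioning in Hive means splitting into directories: a large data set is divided into smaller ones according to business needs. A query can then select only the partitions it needs through an expression in the WHERE clause, which makes it much more efficient.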
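The clause order in the CREATE TABLE syntax above is easy to get wrong when statements are assembled by hand. As a rough sketch, the helper below renders a statement with the clauses in that documented order. create_table_sql and its parameters are hypothetical names invented for this illustration, only a subset of the clauses is covered, and the tab delimiter in the usage example is an assumption.

```python
def create_table_sql(name, columns, external=False, if_not_exists=False,
                     partitioned_by=None, field_delim=None, stored_as=None,
                     location=None):
    """Render a CREATE TABLE statement with clauses in the documented order."""

    def cols(pairs):
        # [("id", "int"), ("name", "string")] -> "id int, name string"
        return ", ".join(f"{col} {typ}" for col, typ in pairs)

    parts = ["CREATE"]
    if external:
        parts.append("EXTERNAL")
    parts.append("TABLE")
    if if_not_exists:
        parts.append("IF NOT EXISTS")
    parts.append(f"{name} ({cols(columns)})")
    if partitioned_by:
        parts.append(f"PARTITIONED BY ({cols(partitioned_by)})")
    if field_delim is not None:
        parts.append(f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{field_delim}'")
    if stored_as:
        parts.append(f"STORED AS {stored_as}")
    if location:
        parts.append(f"LOCATION '{location}'")
    return " ".join(parts) + ";"


if __name__ == "__main__":
    # r"\t" keeps the backslash so the statement contains the two characters \t.
    print(create_table_sql(
        "dept_partition",
        [("deptno", "int"), ("dname", "string"), ("loc", "string")],
        partitioned_by=[("month", "string")],
        field_delim=r"\t",
    ))
```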
hive (yinzhengjie)> show tables;
OK
tab_name
student
teacher
Time taken: 0.015 seconds, Fetched: 2 row(s)
hive (yinzhengjie)> create table if not exists teacherbak as select id, name from teacher; #Create a table from a query result: the rows returned by the query are written into the new table, and a job is launched automatically
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180806000505_71d796a2-3129-4497-9741-b5d39abd58c9
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533518652134_0001, Tracking URL = http://s101:8088/proxy/application_1533518652134_0001/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533518652134_0001
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-06 00:05:26,132 Stage-1 map = 0%, reduce = 0%
2018-08-06 00:05:37,668 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.02 sec
MapReduce Total cumulative CPU time: 2 seconds 20 msec
Ended Job = job_1533518652134_0001
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/.hive-staging_hive_2018-08-06_00-05-05_947_8165112419833752968-1/-ext-10002
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/teacherbak
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.02 sec HDFS Read: 3610 HDFS Write: 258 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 20 msec
OK
id name
Time taken: 33.117 seconds
hive (yinzhengjie)> show tables;
OK
tab_name
student
teacher
teacherbak
Time taken: 0.014 seconds, Fetched: 3 row(s)
hive (yinzhengjie)> select id, name from teacher; #Show the rows in the teacher table
OK
id name
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.093 seconds, Fetched: 9 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select id, name from teacherbak; #Show the rows in teacherbak; they are identical to those in teacher
OK
id name
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.083 seconds, Fetched: 9 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show tables;
OK
tab_name
student
teacher
teacherbak
Time taken: 0.013 seconds, Fetched: 3 row(s)
hive (yinzhengjie)> desc teacher;
OK
col_name data_type comment
id int
name string
Time taken: 0.029 seconds, Fetched: 2 row(s)
hive (yinzhengjie)> select * from teacher;
OK
teacher.id teacher.name
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.1 seconds, Fetched: 9 row(s)
hive (yinzhengjie)> create table if not exists teacherCopy like teacher; #Create a table from an existing table's structure: only the structure is copied, not the data
OK
Time taken: 0.181 seconds
hive (yinzhengjie)> show tables;
OK
tab_name
student
teacher
teacherbak
teachercopy
Time taken: 0.014 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> select * from teachercopy;
OK
teachercopy.id teachercopy.name
Time taken: 0.103 seconds
hive (yinzhengjie)> desc teachercopy;
OK
col_name data_type comment
id int
name string
Time taken: 0.03 seconds, Fetched: 2 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show tables;
OK
tab_name
student
teacher
teacherbak
teachercopy
Time taken: 0.012 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> desc formatted teacher; #Check the table type
OK
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: yinzhengjie
Owner: yinzhengjie
CreateTime: Sun Aug 05 19:55:34 PDT 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/teacher
Table Type: MANAGED_TABLE #Look here: this field shows the table type. This one is a managed table, also called an internal table.
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 179
transient_lastDdlTime 1533524151
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
Time taken: 0.036 seconds, Fetched: 31 row(s)
hive (yinzhengjie)>
I. View the raw data
[yinzhengjie@s101 download]$ pwd
/home/yinzhengjie/download
[yinzhengjie@s101 download]$
[yinzhengjie@s101 download]$ cat dept.dat
10 ACCOUNTING 2700
20 RESEARCH 3800
30 SALES 5900
40 OPERATIONS 4700
[yinzhengjie@s101 download]$
[yinzhengjie@s101 download]$ more emp.dat
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
[yinzhengjie@s101 download]$
II. Create external tables with the external keyword
1>.Create the department table
hive (yinzhengjie)> create external table if not exists yinzhengjie.dept(
> deptno int,
> dname string,
> loc int
> )
> row format delimited fields terminated by '\t';
OK
Time taken: 0.096 seconds
hive (yinzhengjie)>
2>.Create the employee table
hive (yinzhengjie)> create external table if not exists yinzhengjie.emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double,
> deptno int
> )
> row format delimited fields terminated by '\t';
OK
Time taken: 0.064 seconds
hive (yinzhengjie)>
3>.Load data into the external tables
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.dat' into table yinzhengjie.dept;
Loading data to table yinzhengjie.dept
OK
Time taken: 0.222 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> select * from dept; #After loading, check that the dept table contains the data
OK
dept.deptno dept.dname dept.loc
10 ACCOUNTING 2700
20 RESEARCH 3800
30 SALES 5900
40 OPERATIONS 4700
Time taken: 0.088 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/emp.dat' into table yinzhengjie.emp;
Loading data to table yinzhengjie.emp
OK
Time taken: 0.21 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> select * from emp; #After loading, check that the emp table contains the data
OK
emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno
7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30
7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30
7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20
7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30
7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20
7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30
7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20
7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10
Time taken: 0.079 seconds, Fetched: 14 row(s)
hive (yinzhengjie)>
4>.Check the table type
hive (yinzhengjie)> desc formatted dept; #Show the formatted description of the dept table
OK
col_name data_type comment
# col_name data_type comment
deptno int
dname string
loc int
# Detailed Table Information
Database: yinzhengjie
Owner: yinzhengjie
CreateTime: Mon Aug 06 00:52:48 PDT 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/dept
Table Type: EXTERNAL_TABLE #Look here: the dept table is an external table.
Table Parameters:
EXTERNAL TRUE
numFiles 1
numRows 0
rawDataSize 0
totalSize 69
transient_lastDdlTime 1533542290
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
Time taken: 0.036 seconds, Fetched: 33 row(s)
hive (yinzhengjie)> desc formatted emp; #Show the formatted description of the emp table
OK
col_name data_type comment
# col_name data_type comment
empno int
ename string
job string
mgr int
hiredate string
sal double
comm double
deptno int
# Detailed Table Information
Database: yinzhengjie
Owner: yinzhengjie
CreateTime: Mon Aug 06 00:55:41 PDT 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/emp
Table Type: EXTERNAL_TABLE #Look here: the emp table is an external table.
Table Parameters:
EXTERNAL TRUE
numFiles 1
numRows 0
rawDataSize 0
totalSize 657
transient_lastDdlTime 1533542299
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
Time taken: 0.036 seconds, Fetched: 38 row(s)
hive (yinzhengjie)>
5>.Dropping an external table in Hive does not delete the actual data on HDFS
hive (yinzhengjie)> show tables;
OK
tab_name
dept
emp
student
teacher
teacherbak
teachercopy
Time taken: 0.014 seconds, Fetched: 6 row(s)
hive (yinzhengjie)> drop table dept;
OK
Time taken: 0.122 seconds
hive (yinzhengjie)> drop table emp;
OK
Time taken: 0.079 seconds
hive (yinzhengjie)> show tables; #The table metadata is gone, but the actual data is not; it can still be read from within Hive with the dfs command
OK
tab_name
student
teacher
teacherbak
teachercopy
Time taken: 0.013 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> dfs -cat /user/hive/warehouse/yinzhengjie.db/dept/dept.dat; #The file content is still on HDFS; Hive deleted only the metadata.
10 ACCOUNTING 2700
20 RESEARCH 3800
30 SALES 5900
40 OPERATIONS 4700
hive (yinzhengjie)>
> dfs -cat /user/hive/warehouse/yinzhengjie.db/emp/emp.dat; #The file content is still on HDFS; Hive deleted only the metadata.
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
hive (yinzhengjie)>
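What the session above demonstrates (dropping an external table removes only its metadata, while dropping a managed table removes the data as well) can be summarized in a few lines. The following is a toy model written for illustration only: ToyWarehouse is a hypothetical class, one dictionary stands in for the metastore and another for the files on HDFS.

```python
class ToyWarehouse:
    """Toy model: `metadata` plays the metastore, `files` plays HDFS."""

    def __init__(self):
        self.metadata = {}  # table name -> "MANAGED_TABLE" or "EXTERNAL_TABLE"
        self.files = {}     # table name -> list of data file names

    def create_table(self, name, external=False):
        self.metadata[name] = "EXTERNAL_TABLE" if external else "MANAGED_TABLE"
        self.files.setdefault(name, [])

    def load(self, name, filename):
        self.files[name].append(filename)

    def drop_table(self, name):
        table_type = self.metadata.pop(name)  # the metadata is always removed
        if table_type == "MANAGED_TABLE":
            del self.files[name]              # data is removed only for managed tables


if __name__ == "__main__":
    w = ToyWarehouse()
    w.create_table("dept", external=True)
    w.load("dept", "dept.dat")
    w.drop_table("dept")
    print(w.metadata, w.files)
```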
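The characteristics of partitioned tables can be summarized as follows:
1>.Each partition corresponds to a separate directory on HDFS, and that directory holds all the data files of the partition.
2>.Partitioning in Hive therefore means splitting into subdirectories: a large data set is divided into smaller ones according to business needs.
3>.A query can select only the partitions it needs through an expression in the where clause, which makes it much more efficient.
[yinzhengjie@s101 download]$ cat users.txt
1 yinzhengjie 26
2 Guido van Rossum 62
3 Martin Odersky 60
4 Rasmus Lerdorf 50
[yinzhengjie@s101 download]$
[yinzhengjie@s101 download]$ cat dept.txt
10 开发部门 20000
20 运维部门 13000
30 测试部门 8000
40 产品部门 6000
50 销售部门 15000
60 财务部门 17000
70 人事部门 16000
[yinzhengjie@s101 download]$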
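Since a partition is just a directory, both its path and the effect of a where clause on partition columns can be sketched in a few lines. The helpers below are hypothetical and assume the warehouse root /user/hive/warehouse used in this post and a non-default database (tables of the default database sit directly under the warehouse root, which this sketch does not handle).

```python
from typing import Dict, List

# Assumed value of hive.metastore.warehouse.dir, as used in this post.
WAREHOUSE = "/user/hive/warehouse"


def partition_dir(db: str, table: str, spec: Dict[str, str]) -> str:
    """Directory that holds one partition; `spec` keeps the partition-column order."""
    levels = "/".join(f"{col}={val}" for col, val in spec.items())
    return f"{WAREHOUSE}/{db}.db/{table}/{levels}"


def prune(partitions: List[Dict[str, str]], **wanted: str) -> List[Dict[str, str]]:
    """Keep only the partitions whose columns match every key=value in `wanted`."""
    return [p for p in partitions
            if all(p.get(col) == val for col, val in wanted.items())]


if __name__ == "__main__":
    months = [{"month": m} for m in ("201803", "201804", "201805")]
    # Only the directories of the selected partitions would need to be read.
    for spec in prune(months, month="201805"):
        print(partition_dir("yinzhengjie", "dept_partition", spec))
```

A filter on a partition column narrows the set of directories before any file is opened, which is why such queries tend to be cheaper than a scan of the whole table.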
hive (yinzhengjie)> show tables;
OK
tab_name
Time taken: 0.038 seconds
hive (yinzhengjie)> create table dept_partition(
> deptno int,
> dname string,
> loc string
> )
> partitioned by (month string)
> row format delimited fields terminated by '\t';
OK
Time taken: 0.262 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> show tables;
OK
tab_name
dept_partition
Time taken: 0.035 seconds, Fetched: 1 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show tables;
OK
tab_name
dept_partition
raw_logs
student
teacher
teacherbak
teachercopy
Time taken: 0.016 seconds, Fetched: 6 row(s)
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201803'); #Load data into a specific partition
Loading data to table yinzhengjie.dept_partition partition (month=201803)
OK
Time taken: 0.609 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201804');
Loading data to table yinzhengjie.dept_partition partition (month=201804)
OK
Time taken: 0.868 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201805');
Loading data to table yinzhengjie.dept_partition partition (month=201805)
OK
Time taken: 0.462 seconds
hive (yinzhengjie)> select * from dept_partition;
OK
dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
10 开发部门 20000 201803
20 运维部门 13000 201803
30 测试部门 8000 201803
40 产品部门 6000 201803
50 销售部门 15000 201803
60 财务部门 17000 201803
70 人事部门 16000 201803
10 开发部门 20000 201804
20 运维部门 13000 201804
30 测试部门 8000 201804
40 产品部门 6000 201804
50 销售部门 15000 201804
60 财务部门 17000 201804
70 人事部门 16000 201804
10 开发部门 20000 201805
20 运维部门 13000 201805
30 测试部门 8000 201805
40 产品部门 6000 201805
50 销售部门 15000 201805
60 财务部门 17000 201805
70 人事部门 16000 201805
Time taken: 0.129 seconds, Fetched: 21 row(s)
hive (yinzhengjie)> select * from dept_partition where month='201805';
OK
dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
10 开发部门 20000 201805
20 运维部门 13000 201805
30 测试部门 8000 201805
40 产品部门 6000 201805
50 销售部门 15000 201805
60 财务部门 17000 201805
70 人事部门 16000 201805
Time taken: 1.017 seconds, Fetched: 7 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show partitions dept_partition; OK partition month=201803 month=201804 month=201805 Time taken: 0.563 seconds, Fetched: 3 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select * from dept_partition where month='201805'; #Query a single partition
OK
dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
10 开发部门 20000 201805
20 运维部门 13000 201805
30 测试部门 8000 201805
40 产品部门 6000 201805
50 销售部门 15000 201805
60 财务部门 17000 201805
70 人事部门 16000 201805
Time taken: 1.017 seconds, Fetched: 7 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select * from dept_partition where month='201803'
> union
> select * from dept_partition where month='201804'
> union
> select * from dept_partition where month='201805'; #Query several partitions with union; it launches two MapReduce jobs here and is far slower than a plain select * from dept_partition;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180808214447_1a70bd61-3355-4f99-ba74-de7503593798
Total jobs = 2
Launching Job 1 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1533789743141_0001, Tracking URL = http://s101:8088/proxy/application_1533789743141_0001/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0001
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2018-08-08 21:45:46,855 Stage-1 map = 0%, reduce = 0%
2018-08-08 21:46:32,103 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 6.11 sec
2018-08-08 21:47:09,769 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.95 sec
MapReduce Total cumulative CPU time: 8 seconds 950 msec
Ended Job = job_1533789743141_0001
Launching Job 2 out of 2
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1533789743141_0002, Tracking URL = http://s101:8088/proxy/application_1533789743141_0002/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0002
Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
2018-08-08 21:47:41,300 Stage-2 map = 0%, reduce = 0%
2018-08-08 21:48:41,349 Stage-2 map = 0%, reduce = 0%, Cumulative CPU 5.88 sec
2018-08-08 21:48:42,776 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 7.33 sec
2018-08-08 21:49:23,133 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 10.41 sec
MapReduce Total cumulative CPU time: 10 seconds 410 msec
Ended Job = job_1533789743141_0002
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 8.95 sec HDFS Read: 17348 HDFS Write: 708 SUCCESS
Stage-Stage-2: Map: 2 Reduce: 1 Cumulative CPU: 10.41 sec HDFS Read: 17496 HDFS Write: 1194 SUCCESS
Total MapReduce CPU Time Spent: 19 seconds 360 msec
OK
u3.deptno u3.dname u3.loc u3.month
10 开发部门 20000 201803
10 开发部门 20000 201804
10 开发部门 20000 201805
20 运维部门 13000 201803
20 运维部门 13000 201804
20 运维部门 13000 201805
30 测试部门 8000 201803
30 测试部门 8000 201804
30 测试部门 8000 201805
40 产品部门 6000 201803
40 产品部门 6000 201804
40 产品部门 6000 201805
50 销售部门 15000 201803
50 销售部门 15000 201804
50 销售部门 15000 201805
60 财务部门 17000 201803
60 财务部门 17000 201804
60 财务部门 17000 201805
70 人事部门 16000 201803
70 人事部门 16000 201804
70 人事部门 16000 201805
Time taken: 278.849 seconds, Fetched: 21 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show partitions dept_partition; #List the partitions the table already has
OK
partition
month=201803
month=201804
month=201805
Time taken: 0.563 seconds, Fetched: 3 row(s)
hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201806'); #Add a single partition
OK
Time taken: 0.562 seconds
hive (yinzhengjie)> show partitions dept_partition;
OK
partition
month=201803
month=201804
month=201805
month=201806
Time taken: 0.096 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201807') PARTITION(month='201808') PARTITION(month='201809'); #Add several partitions at once (separated by spaces)
OK
Time taken: 0.22 seconds
hive (yinzhengjie)> show partitions dept_partition;
OK
partition
month=201803
month=201804
month=201805
month=201806
month=201807
month=201808
month=201809
Time taken: 0.097 seconds, Fetched: 7 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)>
hive (yinzhengjie)> show partitions dept_partition; #List the partitions the table currently has
OK
partition
month=201803
month=201804
month=201805
month=201806
month=201807
month=201808
month=201809
Time taken: 0.114 seconds, Fetched: 7 row(s)
hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201807'); #Drop a single partition
Dropped the partition month=201807
OK
Time taken: 0.893 seconds
hive (yinzhengjie)> show partitions dept_partition;
OK
partition
month=201803
month=201804
month=201805
month=201806
month=201808
month=201809
Time taken: 0.083 seconds, Fetched: 6 row(s)
hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201808'),PARTITION(month='201809'); #Drop several partitions at once (separated by commas)
Dropped the partition month=201808
Dropped the partition month=201809
OK
Time taken: 0.364 seconds
hive (yinzhengjie)> show partitions dept_partition;
OK
partition
month=201803
month=201804
month=201805
month=201806
Time taken: 0.104 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
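One detail in the two sessions above is easy to miss: when several partitions are added in one statement the PARTITION clauses are separated by spaces, while in a DROP they are separated by commas. The sketch below generates both forms so the difference is visible side by side; add_partitions_sql and drop_partitions_sql are hypothetical helper names, and the output format simply mirrors the statements typed above.

```python
def _partition_clause(spec):
    # {"month": "201807"} -> PARTITION(month='201807')
    pairs = ",".join(f"{col}='{val}'" for col, val in spec.items())
    return f"PARTITION({pairs})"


def add_partitions_sql(table, specs):
    # ADD: PARTITION clauses are separated by spaces.
    clauses = " ".join(_partition_clause(s) for s in specs)
    return f"ALTER TABLE {table} ADD {clauses};"


def drop_partitions_sql(table, specs):
    # DROP: PARTITION clauses are separated by commas.
    clauses = ",".join(_partition_clause(s) for s in specs)
    return f"ALTER TABLE {table} DROP {clauses};"


if __name__ == "__main__":
    new = [{"month": m} for m in ("201807", "201808", "201809")]
    print(add_partitions_sql("dept_partition", new))
    print(drop_partitions_sql("dept_partition", new[1:]))
```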
hive (yinzhengjie)> DESC FORMATTED dept_partition;
OK
col_name data_type comment
# col_name data_type comment
deptno int
dname string
loc string
# Partition Information #The partition columns are described here
# col_name data_type comment
month string
# Detailed Table Information
Database: yinzhengjie
Owner: yinzhengjie
CreateTime: Wed Aug 08 21:08:14 PDT 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/dept_partition
Table Type: MANAGED_TABLE
Table Parameters:
transient_lastDdlTime 1533787694
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
Time taken: 1.813 seconds, Fetched: 33 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> create table users (
> id int,
> name string,
> age int
> )
> partitioned by (province string, city string)
> row format delimited fields terminated by '\t';
OK
Time taken: 1.046 seconds
hive (yinzhengjie)> show tables;
OK
tab_name
dept_partition
raw_logs
student
teacher
teacherbak
teachercopy
users
Time taken: 0.26 seconds, Fetched: 7 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> create table users (id int,name string, age int) partitioned by (province string, city string) row format delimited fields terminated by '\t'; #Create a table with two-level partitions
OK
Time taken: 0.071 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='hebei',city='shijiazhuang'); #Load data into a two-level partition of the table just created
Loading data to table yinzhengjie.users partition (province=hebei, city=shijiazhuang)
OK
Time taken: 0.482 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='xian');
Loading data to table yinzhengjie.users partition (province=shanxi, city=xian)
OK
Time taken: 0.414 seconds
hive (yinzhengjie)> select * from users;
OK
users.id users.name users.age users.province users.city
1 yinzhengjie 26 hebei shijiazhuang
2 Guido van Rossum 62 hebei shijiazhuang
3 Martin Odersky 60 hebei shijiazhuang
4 Rasmus Lerdorf 50 hebei shijiazhuang
1 yinzhengjie 26 shanxi xian
2 Guido van Rossum 62 shanxi xian
3 Martin Odersky 60 shanxi xian
4 Rasmus Lerdorf 50 shanxi xian
Time taken: 0.101 seconds, Fetched: 8 row(s)
hive (yinzhengjie)> select * from users where province='hebei'; #Query only the rows whose partition is province='hebei'
OK
users.id users.name users.age users.province users.city
1 yinzhengjie 26 hebei shijiazhuang
2 Guido van Rossum 62 hebei shijiazhuang
3 Martin Odersky 60 hebei shijiazhuang
4 Rasmus Lerdorf 50 hebei shijiazhuang
Time taken: 1.775 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=hebei/city=handan; #Create the partition directory on HDFS
hive (yinzhengjie)>
hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/users.txt /user/hive/warehouse/yinzhengjie.db/users/province=hebei/city=handan; #Upload the local file to HDFS
hive (yinzhengjie)>
hive (yinzhengjie)> select * from users where province='hebei' and city='handan'; #No rows are returned, because the metastore does not know about this partition yet
OK
users.id users.name users.age users.province users.city
Time taken: 0.304 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> msck repair table users; #Run the repair command manually
OK
Partitions not in metastore: users:province=hebei/city=handan
Repair: Added partition to metastore users:province=hebei/city=handan
Time taken: 0.487 seconds, Fetched: 2 row(s)
hive (yinzhengjie)> select * from users where province='hebei' and city='handan'; #Query again: the rows are now returned
OK
users.id users.name users.age users.province users.city
1 yinzhengjie 26 hebei handan
2 Guido van Rossum 62 hebei handan
3 Martin Odersky 60 hebei handan
4 Rasmus Lerdorf 50 hebei handan
Time taken: 0.156 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
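The msck repair output above ("Partitions not in metastore", then "Added partition to metastore") suggests what the command does conceptually: it compares the partition directories found on HDFS with the partitions registered in the metastore and registers the missing ones. The following toy sketch mimics that comparison with plain lists; it is an illustration under that assumption, not Hive's actual code, and both function names are hypothetical.

```python
def missing_partitions(dirs_on_hdfs, partitions_in_metastore):
    """Partition directories that exist on HDFS but are unknown to the metastore."""
    known = set(partitions_in_metastore)
    return [d for d in dirs_on_hdfs if d not in known]


def repair(dirs_on_hdfs, partitions_in_metastore):
    """Register every missing partition in place; return the ones that were added."""
    added = missing_partitions(dirs_on_hdfs, partitions_in_metastore)
    partitions_in_metastore.extend(added)
    return added


if __name__ == "__main__":
    on_hdfs = [
        "province=hebei/city=shijiazhuang",
        "province=shanxi/city=xian",
        "province=hebei/city=handan",  # created with dfs -mkdir / dfs -put
    ]
    metastore = ["province=hebei/city=shijiazhuang", "province=shanxi/city=xian"]
    print(repair(on_hdfs, metastore))
```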
hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=ankang;
hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/users.txt /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=ankang;
hive (yinzhengjie)> select * from users where province='shanxi' and city='ankang'; #Query the data: nothing is found at this point
OK
users.id users.name users.age users.province users.city
Time taken: 0.112 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> ALTER TABLE users add partition(province='shanxi',city='ankang'); #Add the partition after uploading the data
OK
Time taken: 0.14 seconds
hive (yinzhengjie)> select * from users where province='shanxi' and city='ankang'; #Query again: the rows are now visible
OK
users.id users.name users.age users.province users.city
1 yinzhengjie 26 shanxi ankang
2 Guido van Rossum 62 shanxi ankang
3 Martin Odersky 60 shanxi ankang
4 Rasmus Lerdorf 50 shanxi ankang
Time taken: 0.156 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=hanzhong; #Create the directory on HDFS
hive (yinzhengjie)> select * from users where province='shanxi' and city='hanzhong'; #No rows are returned yet
OK
users.id users.name users.age users.province users.city
Time taken: 0.148 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='hanzhong'); #After creating the directory, load the data into the partition
Loading data to table yinzhengjie.users partition (province=shanxi, city=hanzhong)
OK
Time taken: 0.593 seconds
hive (yinzhengjie)> select * from users where province='shanxi' and city='hanzhong'; #Query again: the rows are now visible
OK
users.id users.name users.age users.province users.city
1 yinzhengjie 26 shanxi hanzhong
2 Guido van Rossum 62 shanxi hanzhong
3 Martin Odersky 60 shanxi hanzhong
4 Rasmus Lerdorf 50 shanxi hanzhong
Time taken: 0.104 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
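The examples above all end with the data sitting in a directory whose name encodes the partition values, for example province=hebei/city=handan under the table directory. Here is a minimal sketch of that layout. Python is used only for illustration, `partition_path` is a hypothetical helper of mine and not part of Hive, and it ignores the escaping that Hive applies to special characters in partition values.

```python
def partition_path(table_dir, spec):
    """Return the directory that holds one partition of a table.

    table_dir: the table's directory on HDFS.
    spec: (column, value) pairs in the same order as the PARTITIONED BY clause.
    """
    # Each partition column becomes one "column=value" path segment.
    segments = [f"{column}={value}" for column, value in spec]
    return "/".join([table_dir.rstrip("/")] + segments)


users_dir = "/user/hive/warehouse/yinzhengjie.db/users"
handan = partition_path(users_dir, [("province", "hebei"), ("city", "handan")])
print(handan)
```

Files copied to such a path by hand only become queryable once the partition is registered in the metastore, which is what msck repair table and alter table ... add partition do.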
Bucketed tables: creating a bucketed table (hive (yinzhengjie)> create table stu_buck(id int,name string) clustered by(id) into 4 buckets row format delimited fields terminated by ' ';)
1>.Partitioning applies to the storage path of the data; bucketing applies to the data files.
2>.Partitioning offers a convenient way to isolate data and optimize queries. However, not every data set can be partitioned sensibly, especially given the earlier concern about choosing a suitable partition size. Bucketing is another technique for breaking a data set into more manageable parts.
hive (yinzhengjie)> create table stu_buck(
> id int,
> name string
> )
> clustered by(id)
> into 4 buckets
> row format delimited fields terminated by ' '; #Create the bucketed table
OK
Time taken: 0.246 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> desc formatted stu_buck; #View the table structure
OK
col_name data_type comment
# col_name data_type comment
id int
name string
# Detailed Table Information
Database: yinzhengjie
Owner: yinzhengjie
CreateTime: Fri Aug 10 00:52:10 PDT 2018
LastAccessTime: UNKNOWN
Retention: 0
Location: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/stu_buck
Table Type: MANAGED_TABLE
Table Parameters:
COLUMN_STATS_ACCURATE {"BASIC_STATS":"true"}
numFiles 0
numRows 0
rawDataSize 0
totalSize 0
transient_lastDdlTime 1533887530
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: 4 #Look here: the table has 4 buckets
Bucket Columns: [id]
Sort Columns: []
Storage Desc Params:
field.delim
serialization.format
Time taken: 0.128 seconds, Fetched: 32 row(s)
hive (yinzhengjie)>
Bucketed tables: loading data into a bucketed table (hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu_buck;)
hive (yinzhengjie)> ! cat /home/yinzhengjie/download/stu_buck.txt; #View the contents of the local file
1001 ss1
1002 ss2
1003 ss3
1004 ss4
1005 ss5
1006 ss6
1007 ss7
1008 ss8
1009 ss9
1010 ss10
1011 ss11
1012 ss12
1013 ss13
1014 ss14
1015 ss15
1016 ss16
hive (yinzhengjie)>
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu_buck; #Load the local file into the Hive table
Loading data to table yinzhengjie.stu_buck
OK
Time taken: 0.306 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> select * from stu_buck; #Query the contents of the bucketed table
OK
stu_buck.id stu_buck.name
1001 ss1
1002 ss2
1003 ss3
1004 ss4
1005 ss5
1006 ss6
1007 ss7
1008 ss8
1009 ss9
1010 ss10
1011 ss11
1012 ss12
1013 ss13
1014 ss14
1015 ss15
1016 ss16
Time taken: 0.088 seconds, Fetched: 16 row(s)
hive (yinzhengjie)>
Bucketed tables: populating a bucketed table through a subquery (hive (yinzhengjie)> insert into table stu_buck select id, name from stu;)
hive (yinzhengjie)> create table stu(
> id int,
> name string
> )
> row format delimited fields terminated by ' '; #First create an ordinary stu table
OK
Time taken: 0.148 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu; #Load data into the ordinary stu table
Loading data to table yinzhengjie.stu
OK
Time taken: 0.186 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> truncate table stu_buck; #Empty the stu_buck table
OK
Time taken: 0.098 seconds
hive (yinzhengjie)> select * from stu_buck; #Confirm that the table is now empty
OK
stu_buck.id stu_buck.name
Time taken: 0.103 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> insert into table stu_buck select id, name from stu; #Populate the bucketed table through a subquery
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180810010832_901bd21c-690c-48b5-9282-c3900c960245
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 2
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1533789743141_0049, Tracking URL = http://s101:8088/proxy/application_1533789743141_0049/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0049
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
2018-08-10 01:08:54,781 Stage-1 map = 0%, reduce = 0%
2018-08-10 01:09:34,871 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.52 sec
2018-08-10 01:10:01,903 Stage-1 map = 100%, reduce = 50%, Cumulative CPU 5.3 sec
2018-08-10 01:10:03,970 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.01 sec
MapReduce Total cumulative CPU time: 8 seconds 10 msec
Ended Job = job_1533789743141_0049
Loading data to table yinzhengjie.stu_buck
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 2 Cumulative CPU: 8.01 sec HDFS Read: 11021 HDFS Write: 303 SUCCESS
Total MapReduce CPU Time Spent: 8 seconds 10 msec
OK
id name
Time taken: 95.111 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> select * from stu_buck; #Query the bucketed data
OK
stu_buck.id stu_buck.name
1016 ss16
1012 ss12
1008 ss8
1004 ss4
1001 ss1
1013 ss13
1005 ss5
1009 ss9
1014 ss14
1010 ss10
1006 ss6
1002 ss2
1015 ss15
1007 ss7
1003 ss3
1011 ss11
Time taken: 0.073 seconds, Fetched: 16 row(s)
hive (yinzhengjie)>
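The row order in the last query is not random: the ids come back in four groups, and every id within a group has the same remainder when divided by 4. That fits the usual description of Hive's default bucketing, where a row goes to bucket (hash & Integer.MAX_VALUE) % number_of_buckets and an int column hashes to its own value. The sketch below reproduces the grouping under that assumption. It is written in Python for illustration only, and `bucket_for` and `group_by_bucket` are hypothetical names of mine.

```python
from collections import defaultdict

NUM_BUCKETS = 4          # matches "into 4 buckets" in the stu_buck DDL
INT_MAX = 0x7FFFFFFF     # Java's Integer.MAX_VALUE


def bucket_for(id_value, num_buckets=NUM_BUCKETS):
    # Assumption: an INT column hashes to its own value, so the bucket
    # index is the masked value modulo the number of buckets.
    return (id_value & INT_MAX) % num_buckets


def group_by_bucket(ids, num_buckets=NUM_BUCKETS):
    buckets = defaultdict(list)
    for id_value in ids:
        buckets[bucket_for(id_value, num_buckets)].append(id_value)
    return dict(buckets)


buckets = group_by_bucket(range(1001, 1017))
for index in sorted(buckets):
    print(index, buckets[index])
```

Running it on the ids 1001 to 1016 puts 1004, 1008, 1012 and 1016 in bucket 0, which is the same set of ids that appears first in the query result above.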
hive (yinzhengjie)> show tables; #List the tables that exist in the current database
OK
tab_name
dept_partition
raw_logs
student
teacher
teacherbak
teachercopy
users
Time taken: 0.071 seconds, Fetched: 7 row(s)
hive (yinzhengjie)> ALTER TABLE users RENAME TO myusers; #Rename the table from users to myusers
OK
Time taken: 0.341 seconds
hive (yinzhengjie)> show tables; #List the tables again: the name has changed
OK
tab_name
dept_partition
myusers
raw_logs
student
teacher
teacherbak
teachercopy
Time taken: 0.011 seconds, Fetched: 7 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> desc dept_partition; #View the table structure
OK
col_name data_type comment
deptno int
dname string
loc string
month string
# Partition Information
# col_name data_type comment
month string
Time taken: 0.054 seconds, Fetched: 9 row(s)
hive (yinzhengjie)> ALTER TABLE dept_partition ADD COLUMNS(desc string); #Add a new column. Note: ADD appends the column after all existing columns (but before the partition columns), whereas REPLACE replaces all of the table's columns.
OK
Time taken: 0.176 seconds
hive (yinzhengjie)> desc dept_partition; #View the table structure again
OK
col_name data_type comment
deptno int
dname string
loc string
desc string
month string
# Partition Information
# col_name data_type comment
month string
Time taken: 0.059 seconds, Fetched: 10 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> desc dept_partition; #View the table structure
OK
col_name data_type comment
deptno int
dname string
loc string
month string
# Partition Information
# col_name data_type comment
month string
Time taken: 0.054 seconds, Fetched: 9 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> alter table dept_partition change column desc deptdesc string; #Rename a column: desc becomes deptdesc
OK
Time taken: 0.153 seconds
hive (yinzhengjie)> desc dept_partition;
OK
col_name data_type comment
deptno int
dname string
loc string
deptdesc string
month string
# Partition Information
# col_name data_type comment
month string
Time taken: 0.027 seconds, Fetched: 10 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> desc dept_partition;
OK
col_name data_type comment
deptno int
dname string
loc string
deptdesc string
month string
# Partition Information
# col_name data_type comment
month string
Time taken: 0.031 seconds, Fetched: 10 row(s)
hive (yinzhengjie)> alter table dept_partition replace columns(deptno string, dname string, loc string); #Replace the columns. Note: ADD appends a column after all existing columns (but before the partition columns), whereas REPLACE replaces all of the table's columns.
OK
Time taken: 0.152 seconds
hive (yinzhengjie)> desc dept_partition;
OK
col_name data_type comment
deptno string
dname string
loc string
month string
# Partition Information
# col_name data_type comment
month string
Time taken: 0.027 seconds, Fetched: 9 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> show tables;
OK
tab_name
dept_partition
myusers
raw_logs
student
teacher
teacherbak
teachercopy
Time taken: 0.015 seconds, Fetched: 7 row(s)
hive (yinzhengjie)> DROP TABLE dept_partition; #Drop the specified table
OK
Time taken: 0.214 seconds
hive (yinzhengjie)> show tables;
OK
tab_name
myusers
raw_logs
student
teacher
teacherbak
teachercopy
Time taken: 0.015 seconds, Fetched: 6 row(s)
hive (yinzhengjie)>
3>.DML data operations
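Importing data: syntax for loading data into a table (Load)
hive>load data [local] inpath '/home/yinzhengjie/download/user.txt' [overwrite] into table student [partition (partcol1=val1,…)];
The parts of the statement are:
1>.load data: loads data
2>.local: load from the local file system into the Hive table; without it, the data is loaded from HDFS
3>.inpath: the path of the data to load
4>.overwrite: overwrite the data already in the table; without it, the new data is appended
5>.into table: names the table to load into
6>.student: the specific table
7>.partition: load into the specified partition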
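To make the optional parts of the Load syntax concrete, here is a small sketch that assembles the statement from its pieces. It is Python used purely for illustration, `build_load` is a hypothetical helper of mine rather than any Hive API, and it does no quoting or escaping, so it is not meant for untrusted input.

```python
def build_load(path, table, local=False, overwrite=False, partition=None):
    """Assemble a LOAD DATA statement from its optional clauses.

    partition: a dict of partition column -> value, in declaration order.
    """
    parts = ["load data"]
    if local:
        parts.append("local")          # read from the local file system
    parts.append(f"inpath '{path}'")
    if overwrite:
        parts.append("overwrite")      # replace existing data instead of appending
    parts.append(f"into table {table}")
    if partition:
        spec = ",".join(f"{column}='{value}'" for column, value in partition.items())
        parts.append(f"partition({spec})")
    return " ".join(parts) + ";"


statement = build_load(
    "/home/yinzhengjie/download/users.txt",
    "users",
    local=True,
    partition={"province": "hebei", "city": "shijiazhuang"},
)
print(statement)
```

The printed statement has the same shape as the partitioned load used earlier for the users table.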
[yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/students.txt
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
[yinzhengjie@s101 download]$
Log in to Hive, create the table, and load the data into it:
hive (yinzhengjie)> create table xiyouji(
> id string,
> name string
> )
> row format delimited fields terminated by ' ';
OK
Time taken: 0.635 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/students.txt' into table yinzhengjie.xiyouji;
Loading data to table yinzhengjie.xiyouji
OK
Time taken: 10.337 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> select * from xiyouji;
OK
xiyouji.id xiyouji.name
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
Time taken: 0.131 seconds, Fetched: 5 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select * from xiyouji;
OK
xiyouji.id xiyouji.name
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
Time taken: 0.207 seconds, Fetched: 5 row(s)
hive (yinzhengjie)> truncate table xiyouji; #Note: TRUNCATE can only remove data from managed tables, not from external tables
OK
Time taken: 0.169 seconds
hive (yinzhengjie)> select * from xiyouji;
OK
xiyouji.id xiyouji.name
Time taken: 0.086 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> select * from xiyouji; #The table is empty
OK
xiyouji.id xiyouji.name
Time taken: 0.077 seconds
hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data; #Upload the file to HDFS
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt;
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' into table yinzhengjie.xiyouji; #Load data from HDFS; note that the source file is moved into the table, not copied
Loading data to table yinzhengjie.xiyouji
OK
Time taken: 0.228 seconds
hive (yinzhengjie)> select * from xiyouji; #View the table data again
OK
xiyouji.id xiyouji.name
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
Time taken: 0.073 seconds, Fetched: 5 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select * from xiyouji; #View the table data before the load
OK
xiyouji.id xiyouji.name
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
Time taken: 0.077 seconds, Fetched: 15 row(s)
hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data; #Upload the file to HDFS
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt; #View the contents of the file uploaded to HDFS
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' overwrite into table yinzhengjie.xiyouji; #Load data from HDFS and overwrite the existing table data; note that the source file is moved, not copied
Loading data to table yinzhengjie.xiyouji
OK
Time taken: 0.346 seconds
hive (yinzhengjie)> select * from xiyouji; #View the table data again: the earlier rows have been overwritten
OK
xiyouji.id xiyouji.name
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
Time taken: 0.086 seconds, Fetched: 5 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> drop table xiyouji; #Drop the earlier test table
OK
Time taken: 1.645 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> create table xiyouji(
> id int,
> name string
> )
> partitioned by (position string)
> row format delimited fields terminated by ' '; #Create a partitioned table
OK
Time taken: 0.137 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> insert into table xiyouji partition(position='wuzhishan') values(1,'孙悟空'); #Basic insert
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809181325_1275bf7f-0089-4d56-afaf-ecd310467701
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0004, Tracking URL = http://s101:8088/proxy/application_1533789743141_0004/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0004
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-09 18:14:06,514 Stage-1 map = 0%, reduce = 0%
2018-08-09 18:15:07,295 Stage-1 map = 0%, reduce = 0%
2018-08-09 18:15:31,461 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.12 sec
MapReduce Total cumulative CPU time: 2 seconds 620 msec
Ended Job = job_1533789743141_0004
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan/.hive-staging_hive_2018-08-09_18-13-25_269_2859222729747025112-1/-ext-10000
Loading data to table yinzhengjie.xiyouji partition (position=wuzhishan)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.62 sec HDFS Read: 4190 HDFS Write: 106 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 620 msec
OK
_col0 _col1
Time taken: 136.695 seconds
hive (yinzhengjie)> select * from xiyouji;
OK
xiyouji.id xiyouji.name xiyouji.position
1 孙悟空 wuzhishan
Time taken: 0.169 seconds, Fetched: 1 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select * from xiyouji; #View the data in the table
OK
xiyouji.id xiyouji.name xiyouji.position
1 孙悟空 wuzhishan
Time taken: 0.117 seconds, Fetched: 1 row(s)
hive (yinzhengjie)> insert overwrite table xiyouji partition(position='sandabaigujing') select id, name from xiyouji where position='wuzhishan'; #Insert into the table from the result of a query on a single table
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809182335_4f9c3b89-bc30-4afb-95f7-bd294520afe9
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0005, Tracking URL = http://s101:8088/proxy/application_1533789743141_0005/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0005
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-09 18:23:58,547 Stage-1 map = 0%, reduce = 0%
2018-08-09 18:24:23,779 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.61 sec
MapReduce Total cumulative CPU time: 2 seconds 610 msec
Ended Job = job_1533789743141_0005
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing/.hive-staging_hive_2018-08-09_18-23-35_915_1607485649232911242-1/-ext-10000
Loading data to table yinzhengjie.xiyouji partition (position=sandabaigujing)
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.61 sec HDFS Read: 4068 HDFS Write: 111 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 610 msec
OK
id name
Time taken: 50.478 seconds
hive (yinzhengjie)> select * from xiyouji; #View the table again: there is one more row, identical except for its position value
OK
xiyouji.id xiyouji.name xiyouji.position
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
Time taken: 0.105 seconds, Fetched: 2 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select * from xiyouji; #View the current data in the table
OK
xiyouji.id xiyouji.name xiyouji.position
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
Time taken: 0.14 seconds, Fetched: 2 row(s)
hive (yinzhengjie)> from xiyouji
> insert overwrite table xiyouji partition(position='nverguo')
> select id, name where position='wuzhishan'
> insert overwrite table xiyouji partition(position='zhenjiameihouwang')
> select id, name where position='wuzhishan'; #Multi-insert mode: a single from clause feeds several inserts; this test inserted only 2 rows
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809183740_ef71ba4e-acec-4ef7-8510-0f01c57bd49d
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0009, Tracking URL = http://s101:8088/proxy/application_1533789743141_0009/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0009
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 0
2018-08-09 18:38:07,195 Stage-2 map = 0%, reduce = 0%
2018-08-09 18:38:39,132 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.08 sec
MapReduce Total cumulative CPU time: 2 seconds 80 msec
Ended Job = job_1533789743141_0009
Stage-5 is selected by condition resolver.
Stage-4 is filtered out by condition resolver.
Stage-6 is filtered out by condition resolver.
Stage-11 is selected by condition resolver.
Stage-10 is filtered out by condition resolver.
Stage-12 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo/.hive-staging_hive_2018-08-09_18-37-40_573_1576742180177937358-1/-ext-10000
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang/.hive-staging_hive_2018-08-09_18-37-40_573_1576742180177937358-1/-ext-10002
Loading data to table yinzhengjie.xiyouji partition (position=nverguo)
Loading data to table yinzhengjie.xiyouji partition (position=zhenjiameihouwang)
MapReduce Jobs Launched:
Stage-Stage-2: Map: 1 Cumulative CPU: 2.08 sec HDFS Read: 5239 HDFS Write: 218 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 80 msec
OK
id name
Time taken: 63.367 seconds
hive (yinzhengjie)> select * from xiyouji; #View the current data again: 2 more rows have been added
OK
xiyouji.id xiyouji.name xiyouji.position
1 孙悟空 nverguo
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
1 孙悟空 zhenjiameihouwang
Time taken: 0.141 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select * from xiyouji; #View the data in the table
OK
xiyouji.id xiyouji.name xiyouji.position
1 孙悟空 nverguo
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
1 孙悟空 zhenjiameihouwang
Time taken: 0.087 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> create table if not exists xiyouji2 as select id, name from xiyouji; #Create a table from a query result (the result rows are written into the new table)
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809184435_d18b1d0b-3454-4fbe-bffa-ec501fa5fd09
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0010, Tracking URL = http://s101:8088/proxy/application_1533789743141_0010/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0010
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-09 18:44:53,798 Stage-1 map = 0%, reduce = 0%
2018-08-09 18:45:26,674 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.39 sec
MapReduce Total cumulative CPU time: 2 seconds 390 msec
Ended Job = job_1533789743141_0010
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/.hive-staging_hive_2018-08-09_18-44-35_127_6564594081639052485-1/-ext-10002
Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji2
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.39 sec HDFS Read: 5463 HDFS Write: 124 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 390 msec
OK
id name
Time taken: 54.907 seconds
hive (yinzhengjie)> select * from xiyouji2; #View the data in the newly created table
OK
xiyouji2.id xiyouji2.name
1 孙悟空
1 孙悟空
1 孙悟空
1 孙悟空
Time taken: 0.065 seconds, Fetched: 4 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> create table if not exists Student(
> id int,
> name string
> )
> row format delimited fields terminated by ' '
> location '/home/yinzhengjie/data/students.txt'; #Create the table and specify its data location on HDFS
OK
Time taken: 0.017 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data/students.txt; #Upload the data to HDFS
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt; #View the data uploaded to HDFS; the Student table will pick it up automatically
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
hive (yinzhengjie)>
hive (yinzhengjie)> select * from Student; #The Student table reads the data without an explicit load
OK
student.id student.name
1 sunwukong
2 zhubajie
3 shaheshang
4 bailongma
5 tangsanzang
Time taken: 0.054 seconds, Fetched: 5 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='zhenjiameihouwang') from '/home/yinzhengjie/data/xiyouji2'; #Import the specified partition from HDFS into the specified table
Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang
Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang/000000_0
Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=zhenjiameihouwang)
OK
Time taken: 3.966 seconds
hive (yinzhengjie)> select * from xiyoujihouzhuan; #Check that the import succeeded
OK
xiyoujihouzhuan.id xiyoujihouzhuan.name xiyoujihouzhuan.position
1 孙悟空 zhenjiameihouwang
Time taken: 0.293 seconds, Fetched: 1 row(s)
hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='nverguo') from '/home/yinzhengjie/data/xiyouji2';
Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=nverguo
Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=nverguo/000000_0
Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=nverguo)
OK
Time taken: 0.751 seconds
hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='wuzhishan') from '/home/yinzhengjie/data/xiyouji2';
Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=wuzhishan
Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=wuzhishan/000000_0
Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=wuzhishan)
OK
Time taken: 1.363 seconds
hive (yinzhengjie)> select * from xiyoujihouzhuan;
OK
xiyoujihouzhuan.id xiyoujihouzhuan.name xiyoujihouzhuan.position
1 孙悟空 nverguo
1 孙悟空 wuzhishan
1 孙悟空 zhenjiameihouwang
Time taken: 0.488 seconds, Fetched: 3 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji' select * from xiyouji; #Export the query result to a local path; note that the target is a directory
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809190854_cc079ee4-1d8b-43a0-b360-89ff65fb39fb
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0011, Tracking URL = http://s101:8088/proxy/application_1533789743141_0011/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-09 19:09:25,742 Stage-1 map = 0%, reduce = 0%
2018-08-09 19:10:05,644 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.96 sec
2018-08-09 19:10:07,764 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.36 sec
MapReduce Total cumulative CPU time: 2 seconds 360 msec
Ended Job = job_1533789743141_0011
Moving data to local directory /home/yinzhengjie/download/xiyouji
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.36 sec HDFS Read: 5554 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 360 msec
OK
xiyouji.id xiyouji.name xiyouji.position
Time taken: 77.687 seconds
hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji/000000_0; #View the text exported to the local file system
1孙悟空nverguo
1孙悟空sandabaigujing
1孙悟空wuzhishan
1孙悟空zhenjiameihouwang
hive (yinzhengjie)>
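The fields in the exported file look as if they run together (1孙悟空nverguo) because no row format was given, so Hive wrote its default field delimiter, the non-printing Ctrl-A character (\001), which cat does not display. A minimal sketch of reading such a line back follows. It is Python for illustration only, `split_fields` is a hypothetical helper of mine, and the sample line is typed in by hand rather than read from the exported file.

```python
CTRL_A = "\x01"  # Hive's default field delimiter, invisible in most terminals


def split_fields(line, delimiter=CTRL_A):
    # Drop the trailing newline, then split on the delimiter.
    return line.rstrip("\n").split(delimiter)


# A hand-written line in the same shape as the exported 000000_0 file.
raw_line = "1\x01孙悟空\x01nverguo\n"
fields = split_fields(raw_line)
print(fields)
```

Specifying ROW FORMAT DELIMITED FIELDS TERMINATED BY in the export statement avoids the problem, since the file then uses a delimiter you chose.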
hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji2'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
> select * from xiyouji; #Here we specify ' ' as the field delimiter
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809191439_7461de80-7522-4e07-82ac-fd54b85a0891
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0012, Tracking URL = http://s101:8088/proxy/application_1533789743141_0012/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-09 19:15:50,162 Stage-1 map = 0%, reduce = 0%
2018-08-09 19:16:14,236 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.31 sec
MapReduce Total cumulative CPU time: 2 seconds 310 msec
Ended Job = job_1533789743141_0012
Moving data to local directory /home/yinzhengjie/download/xiyouji2
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.31 sec HDFS Read: 5575 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 310 msec
OK
xiyouji.id xiyouji.name xiyouji.position
Time taken: 100.57 seconds
hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji2/000000_0; #View the exported data
1 孙悟空 nverguo
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
1 孙悟空 zhenjiameihouwang
hive (yinzhengjie)>
hive (yinzhengjie)> insert overwrite directory '/home/yinzhengjie/data/xiyouji'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' '
> select * from xiyouji; #Export the query result to HDFS
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809192105_183285e8-bf4e-4044-93c5-4312a8a31716
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0013, Tracking URL = http://s101:8088/proxy/application_1533789743141_0013/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2018-08-09 19:21:39,136 Stage-1 map = 0%, reduce = 0%
2018-08-09 19:22:30,081 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.38 sec
MapReduce Total cumulative CPU time: 2 seconds 380 msec
Ended Job = job_1533789743141_0013
Stage-3 is selected by condition resolver.
Stage-2 is filtered out by condition resolver.
Stage-4 is filtered out by condition resolver.
Moving data to directory hdfs://mycluster/home/yinzhengjie/data/xiyouji/.hive-staging_hive_2018-08-09_19-21-05_012_3955068750863516339-1/-ext-10000
Moving data to directory /home/yinzhengjie/data/xiyouji
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Cumulative CPU: 2.38 sec HDFS Read: 5455 HDFS Write: 99 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 380 msec
OK
xiyouji.id xiyouji.name xiyouji.position
Time taken: 88.306 seconds
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji/000000_0; #View the data exported to HDFS
1 孙悟空 nverguo
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
1 孙悟空 zhenjiameihouwang
hive (yinzhengjie)>
hive (yinzhengjie)> dfs -get /home/yinzhengjie/data/xiyouji/000000_0 /home/yinzhengjie/download/xiyouji3; #Export the data to the local file system with a Hadoop command
hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji3; #View the text exported to Linux
1 孙悟空 nverguo
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
1 孙悟空 zhenjiameihouwang
hive (yinzhengjie)>
hive (yinzhengjie)> export table yinzhengjie.xiyouji to '/home/yinzhengjie/data/xiyouji2'; #Export the data to HDFS with Export
Copying data from file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_19-30-58_906_1594217512913959561-1/-local-10000/_metadata
Copying file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_19-30-58_906_1594217512913959561-1/-local-10000/_metadata
Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=???
Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=五指山/000000_0
Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo
Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo/000000_0
Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing
Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing/000000_0
Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan
Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan/000000_0
Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang
Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang/000000_0
OK
Time taken: 0.978 seconds
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=wuzhishan/000000_0;
1 孙悟空
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=nverguo/000000_0;
1 孙悟空
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=sandabaigujing/000000_0;
1 孙悟空
hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang/000000_0;
1 孙悟空
hive (yinzhengjie)>
[yinzhengjie@s101 ~]$ hive -e 'select * from yinzhengjie.xiyouji;' > /home/yinzhengjie/download/xiyouji6 #Run the query from the command line and redirect the output to a local file
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
Logging initialized using configuration in file:/soft/apache-hive-2.1.1-bin/conf/hive-log4j2.properties Async: true
OK
Time taken: 20.367 seconds, Fetched: 4 row(s)
[yinzhengjie@s101 ~]$
[yinzhengjie@s101 ~]$ cat /home/yinzhengjie/download/xiyouji6 #View the query result
xiyouji.id xiyouji.name xiyouji.position
1 孙悟空 nverguo
1 孙悟空 sandabaigujing
1 孙悟空 wuzhishan
1 孙悟空 zhenjiameihouwang
[yinzhengjie@s101 ~]$
4>.Queries
The official documentation already covers the HQL query (SELECT) syntax in detail, so I will not repeat it here. For the full reference, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select
hive (yinzhengjie)> select * from teacher;    #Full-table query
OK
teacher.id teacher.name
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.108 seconds, Fetched: 9 row(s)
hive (yinzhengjie)> select name from teacher;    #Query a specific column
OK
name
Dennis MacAlistair Ritchie
Linus Benedict Torvalds
Bjarne Stroustrup
Guido van Rossum
James Gosling
Martin Odersky
Rob Pike
Rasmus Lerdorf
Brendan Eich
Time taken: 0.1 seconds, Fetched: 9 row(s)
hive (yinzhengjie)>
Tips:
1>.Keywords and identifiers are not case-sensitive.
2>.A statement can be written on one line or spread over several lines.
3>.Keywords cannot be abbreviated or split across lines.
4>.Each clause is usually written on its own line.
5>.Use indentation to make statements easier to read.
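hive (yinzhengjie)> select id AS tid, name AS Tname from teacher;
OK
tid tname
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.088 seconds, Fetched: 9 row(s)
hive (yinzhengjie)>
Tips on column aliases:
1>.An alias renames a column in the result.
2>.An alias makes calculated columns easier to refer to.
3>.The alias follows the column name directly; the keyword 'AS' may optionally be placed between the column name and the alias.
Note that the alias Tname is printed as tname in the header above, because identifiers are not case-sensitive.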
hive (yinzhengjie)> select id AS age, name AS Tname from teacher;
OK
age tname
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.157 seconds, Fetched: 9 row(s)
hive (yinzhengjie)> select id+20 AS age, name AS Tname from teacher;
OK
age tname
90 Dennis MacAlistair Ritchie
69 Linus Benedict Torvalds
88 Bjarne Stroustrup
82 Guido van Rossum
83 James Gosling
80 Martin Odersky
82 Rob Pike
70 Rasmus Lerdorf
70 Brendan Eich
Time taken: 0.091 seconds, Fetched: 9 row(s)
hive (yinzhengjie)>
Arithmetic operators:
Operator    Description
A+B         A plus B
A-B         A minus B
A*B         A multiplied by B
A/B         A divided by B
A%B         Remainder of A divided by B
A&B         Bitwise AND of A and B
A|B         Bitwise OR of A and B
A^B         Bitwise XOR of A and B
~A          Bitwise NOT of A
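As a rough illustration of the arithmetic operators, the Python sketch below recomputes the id+20 column from the teacher table and evaluates each operator on two small integers. Python is used here only to model the arithmetic, not to show how Hive evaluates expressions, and the operands 6 and 3 are arbitrary values of my own choosing.

```python
# ids copied from the teacher table in the transcript
teacher_ids = [70, 49, 68, 62, 63, 60, 62, 50, 50]

# Counterpart of: select id+20 AS age from teacher;
ages = [i + 20 for i in teacher_ids]

a, b = 6, 3  # arbitrary sample operands (assumption, not from the source)
results = {
    "A+B": a + b,
    "A-B": a - b,
    "A*B": a * b,
    "A/B": a / b,  # true division: the result is a floating-point value
    "A%B": a % b,
    "A&B": a & b,  # 110 AND 011 -> 010
    "A|B": a | b,  # 110 OR  011 -> 111
    "A^B": a ^ b,  # 110 XOR 011 -> 101
    "~A": ~a,      # two's-complement NOT
}

print(ages)
for name, value in results.items():
    print(name, value)
```

Only non-negative operands are used, since the sign of a remainder with negative operands can differ between languages.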
hive (yinzhengjie)> select count(*)cnt from teacher; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809202019_6a4b05d8-8807-410b-af4e-3c1839e0bdc6 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0014, Tracking URL = http://s101:8088/proxy/application_1533789743141_0014/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0014 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 20:21:06,776 Stage-1 map = 0%, reduce = 0% 2018-08-09 20:21:35,994 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.61 sec 2018-08-09 20:22:19,562 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.51 sec MapReduce Total cumulative CPU time: 5 seconds 510 msec Ended Job = job_1533789743141_0014 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.51 sec HDFS Read: 7766 HDFS Write: 101 SUCCESS Total MapReduce CPU Time Spent: 5 seconds 510 msec OK cnt 9 Time taken: 123.864 seconds, Fetched: 1 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select max(id) max_age from teacher; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809202410_0146f895-4c54-440f-aa1b-bee4fb566b91 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0015, Tracking URL = http://s101:8088/proxy/application_1533789743141_0015/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0015 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 20:24:47,751 Stage-1 map = 0%, reduce = 0% 2018-08-09 20:25:09,196 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.46 sec 2018-08-09 20:25:22,584 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec MapReduce Total cumulative CPU time: 5 seconds 80 msec Ended Job = job_1533789743141_0015 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.08 sec HDFS Read: 7950 HDFS Write: 102 SUCCESS Total MapReduce CPU Time Spent: 5 seconds 80 msec OK max_age 70 Time taken: 74.014 seconds, Fetched: 1 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select min(id) min_age from teacher; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809202623_b1b99783-b7d3-4994-901e-4e901795a128 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0016, Tracking URL = http://s101:8088/proxy/application_1533789743141_0016/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0016 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 20:26:41,646 Stage-1 map = 0%, reduce = 0% 2018-08-09 20:27:10,432 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.34 sec 2018-08-09 20:27:38,200 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 3.77 sec 2018-08-09 20:27:40,261 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.42 sec MapReduce Total cumulative CPU time: 4 seconds 420 msec Ended Job = job_1533789743141_0016 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.42 sec HDFS Read: 7956 HDFS Write: 102 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 420 msec OK min_age 49 Time taken: 79.135 seconds, Fetched: 1 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select sum(id) sum_age from teacher; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809202800_14580ea4-3e65-461e-a1c6-6607e960c3d7 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0017, Tracking URL = http://s101:8088/proxy/application_1533789743141_0017/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0017 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 20:28:16,698 Stage-1 map = 0%, reduce = 0% 2018-08-09 20:28:29,168 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.27 sec 2018-08-09 20:28:42,627 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.58 sec MapReduce Total cumulative CPU time: 4 seconds 580 msec Ended Job = job_1533789743141_0017 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.58 sec HDFS Read: 7948 HDFS Write: 103 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 580 msec OK sum_age 534 Time taken: 43.081 seconds, Fetched: 1 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select avg(id) avg_age from teacher; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809202900_618a9c9f-535a-45ac-94de-16723f47d9b9 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0018, Tracking URL = http://s101:8088/proxy/application_1533789743141_0018/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0018 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 20:29:18,939 Stage-1 map = 0%, reduce = 0% 2018-08-09 20:29:38,527 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.19 sec 2018-08-09 20:29:58,143 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.25 sec MapReduce Total cumulative CPU time: 5 seconds 250 msec Ended Job = job_1533789743141_0018 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.25 sec HDFS Read: 8551 HDFS Write: 118 SUCCESS Total MapReduce CPU Time Spent: 5 seconds 250 msec OK avg_age 59.333333333333336 Time taken: 59.897 seconds, Fetched: 1 row(s) hive (yinzhengjie)>
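The five aggregate queries above (count, max, min, sum, avg) can be cross-checked by hand. The following Python sketch recomputes them from the nine id values in the teacher table; it only mirrors the arithmetic and says nothing about how the MapReduce jobs are executed.

```python
# ids copied from the teacher table in the transcript
teacher_ids = [70, 49, 68, 62, 63, 60, 62, 50, 50]

aggregates = {
    "cnt": len(teacher_ids),                          # select count(*)
    "max_age": max(teacher_ids),                      # select max(id)
    "min_age": min(teacher_ids),                      # select min(id)
    "sum_age": sum(teacher_ids),                      # select sum(id)
    "avg_age": sum(teacher_ids) / len(teacher_ids),   # select avg(id)
}

for name, value in aggregates.items():
    print(name, value)
```

Each of these takes a fraction of a second locally, whereas the transcripts show 43 to 124 seconds per query: on a table this small, almost all of the time goes to launching the MapReduce job.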
hive (yinzhengjie)> select id AS age , name from teacher;
OK
age name
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
60 Martin Odersky
62 Rob Pike
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.068 seconds, Fetched: 9 row(s)
hive (yinzhengjie)> select id AS age , name from teacher limit 3;    #A typical query returns many rows; the LIMIT clause restricts how many rows are returned
OK
age name
70 Dennis MacAlistair Ritchie
49 Linus Benedict Torvalds
68 Bjarne Stroustrup
Time taken: 0.1 seconds, Fetched: 3 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select id, name from teacher where id> 60;    #The WHERE clause filters out rows that do not satisfy the condition; it comes right after the FROM clause
OK
id name
70 Dennis MacAlistair Ritchie
68 Bjarne Stroustrup
62 Guido van Rossum
63 James Gosling
62 Rob Pike
Time taken: 0.056 seconds, Fetched: 5 row(s)
hive (yinzhengjie)>
hive (yinzhengjie)> select * from teacher where id = 60;    #Teachers whose id equals 60
OK
teacher.id teacher.name
60 Martin Odersky
Time taken: 0.075 seconds, Fetched: 1 row(s)
hive (yinzhengjie)> select * from teacher where id between 40 and 60;    #Teachers whose id is between 40 and 60 (inclusive)
OK
teacher.id teacher.name
49 Linus Benedict Torvalds
60 Martin Odersky
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.05 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> select * from teacher where name is null;    #Teachers whose name is NULL; my data clearly has no such rows
OK
teacher.id teacher.name
Time taken: 0.104 seconds
hive (yinzhengjie)> select * from teacher where id IN(50,60);    #Teachers whose id is 50 or 60
OK
teacher.id teacher.name
60 Martin Odersky
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.07 seconds, Fetched: 3 row(s)
hive (yinzhengjie)>
The table below describes the predicate operators. They can also be used in JOIN ... ON and HAVING clauses.
Operator                  Supported types    Description
A=B                       Primitive types    NULL if A or B is NULL; TRUE if A equals B, otherwise FALSE.
A<=>B                     Primitive types    NULL-safe equality: TRUE if A and B are both NULL, FALSE if exactly one of them is NULL; otherwise the same result as the = operator.
A<>B, A!=B                Primitive types    NULL if A or B is NULL; TRUE if A is not equal to B, otherwise FALSE.
A<B                       Primitive types    NULL if A or B is NULL; TRUE if A is less than B, otherwise FALSE.
A<=B                      Primitive types    NULL if A or B is NULL; TRUE if A is less than or equal to B, otherwise FALSE.
A>B                       Primitive types    NULL if A or B is NULL; TRUE if A is greater than B, otherwise FALSE.
A>=B                      Primitive types    NULL if A or B is NULL; TRUE if A is greater than or equal to B, otherwise FALSE.
A [NOT] BETWEEN B AND C   Primitive types    NULL if A, B or C is NULL; TRUE if A is greater than or equal to B and less than or equal to C, otherwise FALSE. The NOT keyword inverts the result.
A IS NULL                 All types          TRUE if A is NULL, otherwise FALSE.
A IS NOT NULL             All types          TRUE if A is not NULL, otherwise FALSE.
A IN(value1, value2)      All types          TRUE if A equals one of the values in the list.
A [NOT] LIKE B            STRING             B is a simple SQL pattern, not a full regular expression. TRUE if A matches it, otherwise FALSE. 'x%' means A must start with 'x', '%x' means A must end with 'x', and '%x%' means A contains 'x' at the start, the end or anywhere in between. The NOT keyword inverts the result.
A RLIKE B, A REGEXP B     STRING             B is a Java regular expression. TRUE if any substring of A matches it, otherwise FALSE. Matching is done by the JDK regular expression implementation and therefore follows its rules. To require that the whole string matches, anchor the expression with ^ and $.
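The NULL handling of these operators is easy to get wrong, so here is a small Python model of three of them, with None standing in for NULL. It follows the Hive language manual's description, in which A<=>B returns FALSE (not NULL) when exactly one operand is NULL. The function names eq, null_safe_eq and between are hypothetical helpers of mine, not Hive APIs.

```python
def eq(a, b):
    """Model of A = B: NULL (None) if either operand is NULL."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b):
    """Model of A <=> B: never returns NULL."""
    if a is None and b is None:
        return True
    if a is None or b is None:
        return False
    return a == b


def between(a, low, high):
    """Model of A BETWEEN low AND high: inclusive on both ends."""
    if a is None or low is None or high is None:
        return None
    return low <= a <= high


# ids copied from the teacher table in the transcript
teacher_ids = [70, 49, 68, 62, 63, 60, 62, 50, 50]

# A WHERE clause keeps a row only when the predicate is TRUE,
# so rows for which it evaluates to NULL are dropped as well.
in_range = [i for i in teacher_ids if between(i, 40, 60) is True]

print(eq(None, 1), null_safe_eq(None, 1), null_safe_eq(None, None))
print(in_range)
```

The filtered list reproduces the four ids returned by the BETWEEN 40 AND 60 query in the transcript, including the boundary value 60.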
1>.Use the LIKE operator to select values that match a pattern.
2>.The pattern may contain literal characters or digits together with two wildcards:
% : matches zero or more characters (any number of characters).
_ : matches exactly one character.
3>.The RLIKE clause is a Hive extension of this feature; it lets you specify the match condition with Java regular expressions, which are considerably more powerful.
hive (yinzhengjie)> select * from teacher where id LIKE '5%';    #Teachers whose id starts with 5
OK
teacher.id teacher.name
50 Rasmus Lerdorf
50 Brendan Eich
Time taken: 0.126 seconds, Fetched: 2 row(s)
hive (yinzhengjie)> select * from teacher where id LIKE '_2%';    #Teachers whose id has 2 as its second digit
OK
teacher.id teacher.name
62 Guido van Rossum
62 Rob Pike
Time taken: 0.065 seconds, Fetched: 2 row(s)
hive (yinzhengjie)> select * from teacher where name RLIKE '[P]';    #Teachers whose name contains the letter "P"
OK
teacher.id teacher.name
62 Rob Pike
Time taken: 0.049 seconds, Fetched: 1 row(s)
hive (yinzhengjie)>
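To make the difference between the two operators concrete, the Python sketch below models LIKE as a whole-string wildcard match and RLIKE as a regular-expression search over any substring. It is a simplified model under stated assumptions: LIKE escape characters are ignored, Python's re module stands in for Java's regex engine (the two differ in some advanced syntax), and like_to_regex, like and rlike are hypothetical helper names.

```python
import re


def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into an equivalent regular expression."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")   # zero or more characters
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
    return "".join(parts)


def like(value, pattern):
    """LIKE must match the whole value (non-string values are cast to text)."""
    regex = like_to_regex(pattern)
    return re.fullmatch(regex, str(value), flags=re.DOTALL) is not None


def rlike(value, regex):
    """RLIKE is true if any substring of the value matches the regex."""
    return re.search(regex, value) is not None


# rows copied from the teacher table in the transcript
teachers = [
    (70, "Dennis MacAlistair Ritchie"), (49, "Linus Benedict Torvalds"),
    (68, "Bjarne Stroustrup"), (62, "Guido van Rossum"),
    (63, "James Gosling"), (60, "Martin Odersky"), (62, "Rob Pike"),
    (50, "Rasmus Lerdorf"), (50, "Brendan Eich"),
]

starts_with_5 = [t for t in teachers if like(t[0], "5%")]
second_is_2 = [t for t in teachers if like(t[0], "_2%")]
contains_p = [t for t in teachers if rlike(t[1], "[P]")]

print(starts_with_5)
print(second_is_2)
print(contains_p)
```

The three lists correspond to the three queries in the transcript. Note that '[P]' matches "Rob Pike" only because RLIKE searches for a matching substring; a whole-string match would return nothing.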
hive (yinzhengjie)> select * from teacher where id NOT IN(50,70,49,68,62); OK teacher.id teacher.name 63 James Gosling 60 Martin Odersky Time taken: 0.076 seconds, Fetched: 2 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select * from dept_partition;
OK
dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
10 开发部门 20000 201805
20 运维部门 13000 201805
30 测试部门 8000 201805
40 产品部门 6000 201805
50 销售部门 15000 201805
60 财务部门 17000 201805
70 人事部门 16000 201805
10 开发部门 25000 201805
10 开发部门 10000 201805
20 运维部门 13000 201805
30 测试部门 7000 201805
40 产品部门 9000 201805
50 销售部门 26000 201805
60 财务部门 11000 201805
70 人事部门 16000 201805
20 运维部门 21000 201805
30 测试部门 8000 201805
40 产品部门 9800 201805
50 销售部门 15000 201805
60 财务部门 17000 201805
70 人事部门 8700 201805
Time taken: 0.059 seconds, Fetched: 21 row(s)
hive (yinzhengjie)> select t.deptno, avg(t.loc) avg_sal from dept_partition t group by t.deptno;    #Average salary (stored in the loc column) of each department in dept_partition
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809212224_fcbdaa54-b167-4a43-8a08-c0a984c25a0d
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified.
Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0021, Tracking URL = http://s101:8088/proxy/application_1533789743141_0021/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0021 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 21:22:51,029 Stage-1 map = 0%, reduce = 0% 2018-08-09 21:23:15,924 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.62 sec 2018-08-09 21:23:31,362 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.14 sec MapReduce Total cumulative CPU time: 5 seconds 140 msec Ended Job = job_1533789743141_0021 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.14 sec HDFS Read: 9719 HDFS Write: 312 SUCCESS Total MapReduce CPU Time Spent: 5 seconds 140 msec OK t.deptno avg_sal 10 18333.333333333332 20 15666.666666666666 30 7666.666666666667 40 8266.666666666666 50 18666.666666666668 60 15000.0 70 13566.666666666666 Time taken: 68.573 seconds, Fetched: 7 row(s) hive (yinzhengjie)>
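The GROUP BY result above can be verified by hand. The Python sketch below groups the 21 (deptno, loc) pairs from dept_partition by deptno and averages each group, which is what the query computes; it illustrates the semantics only, not the MapReduce plan.

```python
from collections import defaultdict

# (deptno, loc) pairs copied from the dept_partition listing in the transcript
rows = [
    (10, 20000), (20, 13000), (30, 8000), (40, 6000), (50, 15000),
    (60, 17000), (70, 16000), (10, 25000), (10, 10000), (20, 13000),
    (30, 7000), (40, 9000), (50, 26000), (60, 11000), (70, 16000),
    (20, 21000), (30, 8000), (40, 9800), (50, 15000), (60, 17000),
    (70, 8700),
]

# group by deptno
groups = defaultdict(list)
for deptno, loc in rows:
    groups[deptno].append(loc)

# avg(loc) per group
avg_sal = {deptno: sum(vals) / len(vals) for deptno, vals in groups.items()}

for deptno in sorted(avg_sal):
    print(deptno, avg_sal[deptno])
```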
hive (yinzhengjie)> select * from dept_partition;
OK
dept_partition.deptno dept_partition.dname dept_partition.loc dept_partition.month
10 开发部门 20000 201805
20 运维部门 13000 201805
30 测试部门 8000 201805
40 产品部门 6000 201805
50 销售部门 15000 201805
60 财务部门 17000 201805
70 人事部门 16000 201805
10 开发部门 25000 201805
10 开发部门 10000 201805
20 运维部门 13000 201805
30 测试部门 7000 201805
40 产品部门 9000 201805
50 销售部门 26000 201805
60 财务部门 11000 201805
70 人事部门 16000 201805
20 运维部门 21000 201805
30 测试部门 8000 201805
40 产品部门 9800 201805
50 销售部门 15000 201805
60 财务部门 17000 201805
70 人事部门 8700 201805
Time taken: 0.072 seconds, Fetched: 21 row(s)
hive (yinzhengjie)> select t.deptno, t.dname,max(t.loc) max_sal from dept_partition t group by t.deptno,t.dname;    #Highest salary for each (deptno, dname) group in dept_partition
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809213154_e1ea82c8-897d-40b5-b167-5fe42d0e6476
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified.
Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0023, Tracking URL = http://s101:8088/proxy/application_1533789743141_0023/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0023 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 21:32:11,358 Stage-1 map = 0%, reduce = 0% 2018-08-09 21:32:21,651 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.85 sec 2018-08-09 21:32:29,958 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.61 sec MapReduce Total cumulative CPU time: 3 seconds 610 msec Ended Job = job_1533789743141_0023 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.61 sec HDFS Read: 9537 HDFS Write: 406 SUCCESS Total MapReduce CPU Time Spent: 3 seconds 610 msec OK t.deptno t.dname max_sal 10 开发部门 25000 20 运维部门 21000 30 测试部门 8000 40 产品部门 9800 50 销售部门 26000 60 财务部门 17000 70 人事部门 8700 Time taken: 37.781 seconds, Fetched: 7 row(s) hive (yinzhengjie)>
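One row in the max(loc) output looks odd: department 70 reports 8700 although two of its rows contain 16000. This would be consistent with loc being declared as a STRING column, so that max() compares values as text. That is only my assumption, since the table definition is not shown in this part of the post. The Python sketch below contrasts the two comparisons on the same data.

```python
from collections import defaultdict

# (deptno, loc) pairs from dept_partition, with loc kept as text (assumed STRING)
rows = [
    (10, "20000"), (20, "13000"), (30, "8000"), (40, "6000"), (50, "15000"),
    (60, "17000"), (70, "16000"), (10, "25000"), (10, "10000"), (20, "13000"),
    (30, "7000"), (40, "9000"), (50, "26000"), (60, "11000"), (70, "16000"),
    (20, "21000"), (30, "8000"), (40, "9800"), (50, "15000"), (60, "17000"),
    (70, "8700"),
]

groups = defaultdict(list)
for deptno, loc in rows:
    groups[deptno].append(loc)

# Text comparison: "8700" > "16000" because '8' sorts after '1'
string_max = {d: max(vals) for d, vals in groups.items()}

# Numeric comparison after casting each value to an integer
numeric_max = {d: max(int(v) for v in vals) for d, vals in groups.items()}

for deptno in sorted(groups):
    print(deptno, string_max[deptno], numeric_max[deptno])
```

The text-based maxima reproduce every row of the transcript, including 8700 for department 70. If the assumption holds, casting the column in the query, for example max(cast(t.loc as int)), would give the numeric maximum instead.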
hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal from dept_partition group by dname,deptno;    #Average salary of each department
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809213945_f7a1a9c2-8c19-4096-9c1a-37faa29fee44
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1533789743141_0024, Tracking URL = http://s101:8088/proxy/application_1533789743141_0024/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0024
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2018-08-09 21:40:17,366 Stage-1 map = 0%, reduce = 0%
2018-08-09 21:40:33,044 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.2 sec
2018-08-09 21:40:46,435 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.69 sec
MapReduce Total cumulative CPU time: 4 seconds 690 msec
Ended Job = job_1533789743141_0024
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.69 sec HDFS Read: 10452 HDFS Write: 487 SUCCESS
Total MapReduce CPU Time Spent: 4 seconds 690 msec
OK
deptno dname avg_sal
10 开发部门 18333.333333333332
20 运维部门 15666.666666666666
30 测试部门 7666.666666666667
40 产品部门 8266.666666666666
50 销售部门 18666.666666666668
60 财务部门 15000.0
70 人事部门 13566.666666666666
Time taken: 63.433 seconds, Fetched: 7 row(s)
hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal from dept_partition group by dname, deptno having avg_sal > 10000;    #Departments whose average salary is greater than 10000
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available
in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809214521_d980d9db-3473-4fd4-a062-ec9de0cafca2 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Estimated from input data size: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0026, Tracking URL = http://s101:8088/proxy/application_1533789743141_0026/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0026 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-09 21:45:37,001 Stage-1 map = 0%, reduce = 0% 2018-08-09 21:45:50,841 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.45 sec 2018-08-09 21:46:03,332 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.45 sec MapReduce Total cumulative CPU time: 4 seconds 450 msec Ended Job = job_1533789743141_0026 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.45 sec HDFS Read: 10711 HDFS Write: 371 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 450 msec OK deptno dname avg_sal 70 人事部门 13566.666666666666 10 开发部门 18333.333333333332 60 财务部门 15000.0 20 运维部门 15666.666666666666 50 销售部门 18666.666666666668 Time taken: 43.701 seconds, Fetched: 5 row(s) hive (yinzhengjie)>
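HAVING differs from WHERE in that it filters whole groups after aggregation rather than individual rows before it. The Python sketch below reproduces the query above in two explicit steps, first computing avg(loc) per department and then keeping only the groups whose average exceeds 10000. It models the semantics only.

```python
from collections import defaultdict

# (deptno, loc) pairs copied from the dept_partition listing in the transcript
rows = [
    (10, 20000), (20, 13000), (30, 8000), (40, 6000), (50, 15000),
    (60, 17000), (70, 16000), (10, 25000), (10, 10000), (20, 13000),
    (30, 7000), (40, 9000), (50, 26000), (60, 11000), (70, 16000),
    (20, 21000), (30, 8000), (40, 9800), (50, 15000), (60, 17000),
    (70, 8700),
]

# Step 1: GROUP BY deptno and compute avg(loc) for every group
groups = defaultdict(list)
for deptno, loc in rows:
    groups[deptno].append(loc)
avg_sal = {d: sum(v) / len(v) for d, v in groups.items()}

# Step 2: HAVING avg_sal > 10000 is applied to the aggregated groups
having = {d: a for d, a in avg_sal.items() if a > 10000}

print(sorted(having))
```

Departments 30 and 40 are dropped, which leaves the same five departments as the transcript. The transcript lists them in a different order because a GROUP BY without ORDER BY guarantees no particular ordering.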
Join statement - equi-join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno;)
Hive supports the usual SQL JOIN statements. In the Hive 2.1.1 release used here only equi-joins are supported; non-equality conditions in the ON clause are accepted from Hive 2.2.0 onwards.
The test data is as follows:
[yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/dept.txt
10 ACCOUNTING 1700
20 RESEARCH 1800
30 SALES 1900
40 OPERATIONS 1700
[yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/emp.txt
7369 SMITH CLERK 7902 1980-12-17 800.00 20
7499 ALLEN SALESMAN 7698 1981-2-20 1600.00 300.00 30
7521 WARD SALESMAN 7698 1981-2-22 1250.00 500.00 30
7566 JONES MANAGER 7839 1981-4-2 2975.00 20
7654 MARTIN SALESMAN 7698 1981-9-28 1250.00 1400.00 30
7698 BLAKE MANAGER 7839 1981-5-1 2850.00 30
7782 CLARK MANAGER 7839 1981-6-9 2450.00 10
7788 SCOTT ANALYST 7566 1987-4-19 3000.00 20
7839 KING PRESIDENT 1981-11-17 5000.00 10
7844 TURNER SALESMAN 7698 1981-9-8 1500.00 0.00 30
7876 ADAMS CLERK 7788 1987-5-23 1100.00 20
7900 JAMES CLERK 7698 1981-12-3 950.00 30
7902 FORD ANALYST 7566 1981-12-3 3000.00 20
7934 MILLER CLERK 7782 1982-1-23 1300.00 10
[yinzhengjie@s101 download]$
The Hive operations are as follows:
hive (yinzhengjie)> create table if not exists yinzhengjie.dept(
> deptno int,
> dname string,
> loc int
> )
> row format delimited fields terminated by ' ';    #Create the department table dept
OK
Time taken: 0.204 seconds
hive (yinzhengjie)> create table if not exists yinzhengjie.emp(
> empno int,
> ename string,
> job string,
> mgr int,
> hiredate string,
> sal double,
> comm double,
> deptno int
>)
> row format delimited fields terminated by ' ';    #Create the employee table emp
OK
Time taken: 0.088 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept;    #Load data into dept
Loading data to table yinzhengjie.dept
OK
Time taken: 0.222 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/emp.txt' into table yinzhengjie.emp;    #Load data into emp
Loading data to table yinzhengjie.emp
OK
Time taken: 0.175 seconds
hive (yinzhengjie)>
select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno; #根据员工表和部门表中的部门编号相等,查询员工编号、员工名称和部门编号; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809233409_a9437af4-b312-4dfb-86af-f29bcf679577 Total jobs = 1 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 2018-08-09 23:34:25 Starting to launch local task to process map join; maximum memory = 477626368 2018-08-09 23:34:34 Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-34-09_040_8075868526571286750-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable 2018-08-09 23:34:34 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-34-09_040_8075868526571286750-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (430 bytes) 2018-08-09 23:34:34 End of local task; Time Taken: 9.163 sec. 
Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1533789743141_0028, Tracking URL = http://s101:8088/proxy/application_1533789743141_0028/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0028 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2018-08-09 23:35:21,748 Stage-3 map = 0%, reduce = 0% 2018-08-09 23:35:45,815 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.71 sec MapReduce Total cumulative CPU time: 2 seconds 710 msec Ended Job = job_1533789743141_0028 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.71 sec HDFS Read: 8390 HDFS Write: 1999 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 710 msec OK e.empno e.ename d.deptno d.dname 7369 SMITH 20 RESEARCH 7369 SMITH 20 RESEARCH 7499 ALLEN 30 SALES 7499 ALLEN 30 SALES 7521 WARD 30 SALES 7521 WARD 30 SALES 7566 JONES 20 RESEARCH 7566 JONES 20 RESEARCH 7654 MARTIN 30 SALES 7654 MARTIN 30 SALES 7698 BLAKE 30 SALES 7698 BLAKE 30 SALES 7782 CLARK 10 ACCOUNTING 7782 CLARK 10 ACCOUNTING 7788 SCOTT 20 RESEARCH 7788 SCOTT 20 RESEARCH 7839 KING 10 ACCOUNTING 7839 KING 10 ACCOUNTING 7844 TURNER 30 SALES 7844 TURNER 30 SALES 7876 ADAMS 20 RESEARCH 7876 ADAMS 20 RESEARCH 7900 JAMES 30 SALES 7900 JAMES 30 SALES 7902 FORD 20 RESEARCH 7902 FORD 20 RESEARCH 7934 MILLER 10 ACCOUNTING 7934 MILLER 10 ACCOUNTING 7369 SMITH 20 RESEARCH 7369 SMITH 20 RESEARCH 7499 ALLEN 30 SALES 7499 ALLEN 30 SALES 7521 WARD 30 SALES 7521 WARD 30 SALES 7566 JONES 20 RESEARCH 7566 JONES 20 RESEARCH 7654 MARTIN 30 SALES 7654 MARTIN 30 SALES 7698 BLAKE 30 SALES 7698 BLAKE 30 SALES 7782 CLARK 10 ACCOUNTING 7782 CLARK 10 ACCOUNTING 7788 SCOTT 20 RESEARCH 7788 SCOTT 20 RESEARCH 7839 KING 10 ACCOUNTING 7839 KING 10 ACCOUNTING 7844 TURNER 30 SALES 7844 TURNER 30 SALES 7876 ADAMS 20 RESEARCH 7876 ADAMS 20 RESEARCH 7900 JAMES 30 SALES 7900 JAMES 30 SALES 7902 FORD 
20 RESEARCH 7902 FORD 20 RESEARCH 7934 MILLER 10 ACCOUNTING 7934 MILLER 10 ACCOUNTING Time taken: 98.923 seconds, Fetched: 56 row(s) hive (yinzhengjie)>
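The equi-join above returns 56 rows even though emp.txt has only 14 lines. One explanation that fits the output is that both tables already held a copy of the data before the load shown here (14 x 2 emp rows, each matching 2 dept rows); the transcript does not confirm this, so treat it as my assumption. The Python sketch below models the equi-join as a hash join, similar in spirit to the map join reported in the log, which builds a hashtable from the smaller table. The helper name inner_join is hypothetical.

```python
# (empno, ename, deptno) taken from emp.txt; (deptno, dname) taken from dept.txt
emp = [
    (7369, "SMITH", 20), (7499, "ALLEN", 30), (7521, "WARD", 30),
    (7566, "JONES", 20), (7654, "MARTIN", 30), (7698, "BLAKE", 30),
    (7782, "CLARK", 10), (7788, "SCOTT", 20), (7839, "KING", 10),
    (7844, "TURNER", 30), (7876, "ADAMS", 20), (7900, "JAMES", 30),
    (7902, "FORD", 20), (7934, "MILLER", 10),
]
dept = [(10, "ACCOUNTING"), (20, "RESEARCH"), (30, "SALES"), (40, "OPERATIONS")]


def inner_join(emp_rows, dept_rows):
    """Equi-join on deptno: hash the small table, then probe it per emp row."""
    by_deptno = {}
    for deptno, dname in dept_rows:
        by_deptno.setdefault(deptno, []).append(dname)
    result = []
    for empno, ename, deptno in emp_rows:
        for dname in by_deptno.get(deptno, []):  # no match -> row is dropped
            result.append((empno, ename, deptno, dname))
    return result


once = inner_join(emp, dept)            # each file loaded once
twice = inner_join(emp * 2, dept * 2)   # assumed: each file loaded twice

print(len(once), len(twice))
print(once[0])
```

Department 40 (OPERATIONS) has no employees, so it never appears in an inner join result.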
Join statement - table aliases (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;)
Table aliases have two benefits:
1>.Aliases make queries shorter.
2>.Prefixing each column with its table name or alias makes clear which table the column comes from, and is commonly said to help execution efficiency.
hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;    #Join the employee table with the department table
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809233120_cdd0ba5f-33b4-41f6-8f49-4a51e3c104ec
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2018-08-09 23:31:39 Starting to launch local task to process map join; maximum memory = 477626368
2018-08-09 23:31:55 Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-31-20_931_5011927912909131499-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
2018-08-09 23:31:55 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-31-20_931_5011927912909131499-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (348 bytes)
2018-08-09 23:31:55 End of local task; Time Taken: 16.147 sec.
Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1533789743141_0027, Tracking URL = http://s101:8088/proxy/application_1533789743141_0027/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0027 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2018-08-09 23:32:55,103 Stage-3 map = 0%, reduce = 0% 2018-08-09 23:33:11,944 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.82 sec MapReduce Total cumulative CPU time: 1 seconds 820 msec Ended Job = job_1533789743141_0027 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 1.82 sec HDFS Read: 8221 HDFS Write: 1543 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 820 msec OK e.empno e.ename d.deptno 7369 SMITH 20 7369 SMITH 20 7499 ALLEN 30 7499 ALLEN 30 7521 WARD 30 7521 WARD 30 7566 JONES 20 7566 JONES 20 7654 MARTIN 30 7654 MARTIN 30 7698 BLAKE 30 7698 BLAKE 30 7782 CLARK 10 7782 CLARK 10 7788 SCOTT 20 7788 SCOTT 20 7839 KING 10 7839 KING 10 7844 TURNER 30 7844 TURNER 30 7876 ADAMS 20 7876 ADAMS 20 7900 JAMES 30 7900 JAMES 30 7902 FORD 20 7902 FORD 20 7934 MILLER 10 7934 MILLER 10 7369 SMITH 20 7369 SMITH 20 7499 ALLEN 30 7499 ALLEN 30 7521 WARD 30 7521 WARD 30 7566 JONES 20 7566 JONES 20 7654 MARTIN 30 7654 MARTIN 30 7698 BLAKE 30 7698 BLAKE 30 7782 CLARK 10 7782 CLARK 10 7788 SCOTT 20 7788 SCOTT 20 7839 KING 10 7839 KING 10 7844 TURNER 30 7844 TURNER 30 7876 ADAMS 20 7876 ADAMS 20 7900 JAMES 30 7900 JAMES 30 7902 FORD 20 7902 FORD 20 7934 MILLER 10 7934 MILLER 10 Time taken: 113.095 seconds, Fetched: 56 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809234054_a83fd2f0-136f-4769-880a-0a928ecb86f0 Total jobs = 1 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 2018-08-09 23:41:10 Starting to launch local task to process map join; maximum memory = 477626368 2018-08-09 23:41:15 Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-40-54_618_7309603760212569588-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable 2018-08-09 23:41:16 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-40-54_618_7309603760212569588-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (348 bytes) 2018-08-09 23:41:16 End of local task; Time Taken: 5.741 sec. 
Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1533789743141_0029, Tracking URL = http://s101:8088/proxy/application_1533789743141_0029/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0029 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2018-08-09 23:41:32,299 Stage-3 map = 0%, reduce = 0% 2018-08-09 23:41:46,692 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.69 sec MapReduce Total cumulative CPU time: 2 seconds 690 msec Ended Job = job_1533789743141_0029 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.69 sec HDFS Read: 8208 HDFS Write: 1543 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 690 msec OK e.empno e.ename d.deptno 7369 SMITH 20 7369 SMITH 20 7499 ALLEN 30 7499 ALLEN 30 7521 WARD 30 7521 WARD 30 7566 JONES 20 7566 JONES 20 7654 MARTIN 30 7654 MARTIN 30 7698 BLAKE 30 7698 BLAKE 30 7782 CLARK 10 7782 CLARK 10 7788 SCOTT 20 7788 SCOTT 20 7839 KING 10 7839 KING 10 7844 TURNER 30 7844 TURNER 30 7876 ADAMS 20 7876 ADAMS 20 7900 JAMES 30 7900 JAMES 30 7902 FORD 20 7902 FORD 20 7934 MILLER 10 7934 MILLER 10 7369 SMITH 20 7369 SMITH 20 7499 ALLEN 30 7499 ALLEN 30 7521 WARD 30 7521 WARD 30 7566 JONES 20 7566 JONES 20 7654 MARTIN 30 7654 MARTIN 30 7698 BLAKE 30 7698 BLAKE 30 7782 CLARK 10 7782 CLARK 10 7788 SCOTT 20 7788 SCOTT 20 7839 KING 10 7839 KING 10 7844 TURNER 30 7844 TURNER 30 7876 ADAMS 20 7876 ADAMS 20 7900 JAMES 30 7900 JAMES 30 7902 FORD 20 7902 FORD 20 7934 MILLER 10 7934 MILLER 10 Time taken: 53.142 seconds, Fetched: 56 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e left join dept d on e.deptno = d.deptno; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809234222_5966f5f0-b54a-4644-ae82-fd47e8655582 Total jobs = 1 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 2018-08-09 23:42:39 Starting to launch local task to process map join; maximum memory = 477626368 2018-08-09 23:42:43 Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-42-22_712_6649379300342030940-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable 2018-08-09 23:42:44 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-42-22_712_6649379300342030940-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable (348 bytes) 2018-08-09 23:42:44 End of local task; Time Taken: 4.518 sec. 
Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1533789743141_0030, Tracking URL = http://s101:8088/proxy/application_1533789743141_0030/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0030 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2018-08-09 23:43:07,580 Stage-3 map = 0%, reduce = 0% 2018-08-09 23:43:18,075 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.03 sec MapReduce Total cumulative CPU time: 2 seconds 30 msec Ended Job = job_1533789743141_0030 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.03 sec HDFS Read: 7874 HDFS Write: 1543 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 30 msec OK e.empno e.ename d.deptno 7369 SMITH 20 7369 SMITH 20 7499 ALLEN 30 7499 ALLEN 30 7521 WARD 30 7521 WARD 30 7566 JONES 20 7566 JONES 20 7654 MARTIN 30 7654 MARTIN 30 7698 BLAKE 30 7698 BLAKE 30 7782 CLARK 10 7782 CLARK 10 7788 SCOTT 20 7788 SCOTT 20 7839 KING 10 7839 KING 10 7844 TURNER 30 7844 TURNER 30 7876 ADAMS 20 7876 ADAMS 20 7900 JAMES 30 7900 JAMES 30 7902 FORD 20 7902 FORD 20 7934 MILLER 10 7934 MILLER 10 7369 SMITH 20 7369 SMITH 20 7499 ALLEN 30 7499 ALLEN 30 7521 WARD 30 7521 WARD 30 7566 JONES 20 7566 JONES 20 7654 MARTIN 30 7654 MARTIN 30 7698 BLAKE 30 7698 BLAKE 30 7782 CLARK 10 7782 CLARK 10 7788 SCOTT 20 7788 SCOTT 20 7839 KING 10 7839 KING 10 7844 TURNER 30 7844 TURNER 30 7876 ADAMS 20 7876 ADAMS 20 7900 JAMES 30 7900 JAMES 30 7902 FORD 20 7902 FORD 20 7934 MILLER 10 7934 MILLER 10 Time taken: 57.477 seconds, Fetched: 56 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e right join dept d on e.deptno = d.deptno; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180809234332_c83104d3-5265-4e3d-a2bf-342b5c397b9d Total jobs = 1 SLF4J: Class path contains multiple SLF4J bindings. SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation. 2018-08-09 23:43:50 Starting to launch local task to process map join; maximum memory = 477626368 2018-08-09 23:43:54 Dump the side-table for tag: 0 with group count: 3 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-43-32_208_373121853797344697-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile40--.hashtable 2018-08-09 23:43:54 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-43-32_208_373121853797344697-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile40--.hashtable (697 bytes) 2018-08-09 23:43:54 End of local task; Time Taken: 4.69 sec. 
Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1533789743141_0031, Tracking URL = http://s101:8088/proxy/application_1533789743141_0031/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0031 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2018-08-09 23:44:21,028 Stage-3 map = 0%, reduce = 0% 2018-08-09 23:44:54,359 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 2.38 sec MapReduce Total cumulative CPU time: 2 seconds 380 msec Ended Job = job_1533789743141_0031 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 2.38 sec HDFS Read: 6395 HDFS Write: 1585 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 380 msec OK e.empno e.ename d.deptno 7782 CLARK 10 7839 KING 10 7934 MILLER 10 7782 CLARK 10 7839 KING 10 7934 MILLER 10 7369 SMITH 20 7566 JONES 20 7788 SCOTT 20 7876 ADAMS 20 7902 FORD 20 7369 SMITH 20 7566 JONES 20 7788 SCOTT 20 7876 ADAMS 20 7902 FORD 20 7499 ALLEN 30 7521 WARD 30 7654 MARTIN 30 7698 BLAKE 30 7844 TURNER 30 7900 JAMES 30 7499 ALLEN 30 7521 WARD 30 7654 MARTIN 30 7698 BLAKE 30 7844 TURNER 30 7900 JAMES 30 NULL NULL 40 7782 CLARK 10 7839 KING 10 7934 MILLER 10 7782 CLARK 10 7839 KING 10 7934 MILLER 10 7369 SMITH 20 7566 JONES 20 7788 SCOTT 20 7876 ADAMS 20 7902 FORD 20 7369 SMITH 20 7566 JONES 20 7788 SCOTT 20 7876 ADAMS 20 7902 FORD 20 7499 ALLEN 30 7521 WARD 30 7654 MARTIN 30 7698 BLAKE 30 7844 TURNER 30 7900 JAMES 30 7499 ALLEN 30 7521 WARD 30 7654 MARTIN 30 7698 BLAKE 30 7844 TURNER 30 7900 JAMES 30 NULL NULL 40 Time taken: 87.954 seconds, Fetched: 58 row(s) hive (yinzhengjie)>
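The row counts of the two runs above (56 rows for the left join, 58 rows for the right join) can be reproduced with a small Python model. This is only a sketch of outer-join semantics, not of how Hive executes the query. It assumes that both emp and dept were loaded twice, which is what the duplicated result rows suggest; the helper names left_outer_join and right_outer_join are hypothetical.

```python
# Sketch of LEFT/RIGHT OUTER JOIN semantics (not Hive's implementation).
# Assumption: emp and dept each hold two copies of their rows, which is
# what the duplicated result rows in the transcript suggest.
EMP_ONCE = [
    (7369, "SMITH", 20), (7499, "ALLEN", 30), (7521, "WARD", 30),
    (7566, "JONES", 20), (7654, "MARTIN", 30), (7698, "BLAKE", 30),
    (7782, "CLARK", 10), (7788, "SCOTT", 20), (7839, "KING", 10),
    (7844, "TURNER", 30), (7876, "ADAMS", 20), (7900, "JAMES", 30),
    (7902, "FORD", 20), (7934, "MILLER", 10),
]
emp = EMP_ONCE * 2            # rows are (empno, ename, deptno)
dept = [10, 20, 30, 40] * 2   # only the deptno column matters for the join


def left_outer_join(emp_rows, dept_keys):
    """Keep every emp row; an emp row without a match gets deptno = None."""
    result = []
    for empno, ename, deptno in emp_rows:
        matches = [key for key in dept_keys if key == deptno]
        if matches:
            result.extend((empno, ename, key) for key in matches)
        else:
            result.append((empno, ename, None))
    return result


def right_outer_join(emp_rows, dept_keys):
    """Keep every dept row; a dept row without a match gets empno/ename = None."""
    result = []
    for key in dept_keys:
        matches = [(no, name) for no, name, deptno in emp_rows if deptno == key]
        if matches:
            result.extend((no, name, key) for no, name in matches)
        else:
            result.append((None, None, key))
    return result


left_rows = left_outer_join(emp, dept)
right_rows = right_outer_join(emp, dept)
print(len(left_rows), len(right_rows))  # 56 58
```

Every employee belongs to department 10, 20 or 30, so the left join produces no NULL rows. Department 40 has no employees, so each of its two copies appears once in the right join as NULL NULL 40.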
Join statement - full outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e full join dept d on e.deptno = d.deptno;)
Full outer join: returns all records from all tables that satisfy the WHERE clause. If the specified column in either table has no matching value, NULL is used in its place.
hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e full join dept d on e.deptno = d.deptno;
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809235025_e7e97788-2d65-45e0-b567-004f2d7057e0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1533789743141_0035, Tracking URL = http://s101:8088/proxy/application_1533789743141_0035/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0035
Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
2018-08-09 23:50:45,807 Stage-1 map = 0%, reduce = 0%
2018-08-09 23:51:08,516 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 2.58 sec
2018-08-09 23:51:14,735 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 4.88 sec
2018-08-09 23:51:27,238 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.56 sec
MapReduce Total cumulative CPU time: 7 seconds 560 msec
Ended Job = job_1533789743141_0035
MapReduce Jobs Launched:
Stage-Stage-1: Map: 2 Reduce: 1 Cumulative CPU: 7.56 sec HDFS Read: 17097 HDFS Write: 1585 SUCCESS
Total MapReduce CPU Time Spent: 7 seconds 560 msec
OK
e.empno e.ename d.deptno
7934 MILLER 10
7934 MILLER 10
7839 KING 10
7839 KING 10
7782 CLARK 10
7782 CLARK 10
7934 MILLER 10
7934 MILLER 10
7839 KING 10
7839 KING 10
7782 CLARK 10
7782 CLARK 10
7788 SCOTT 20
7788 SCOTT 20
7566 JONES 20
7566 JONES 20
7566 JONES 20
7566 JONES 20
7369 SMITH 20
7369 SMITH 20
7902 FORD 20
7902 FORD 20
7876 ADAMS 20
7876 ADAMS 20
7788 SCOTT 20
7788 SCOTT 20
7369 SMITH 20
7369 SMITH 20
7902 FORD 20
7902 FORD 20
7876 ADAMS 20
7876 ADAMS 20
7900 JAMES 30
7900 JAMES 30
7844 TURNER 30
7844 TURNER 30
7844 TURNER 30
7844 TURNER 30
7499 ALLEN 30
7499 ALLEN 30
7698 BLAKE 30
7698 BLAKE 30
7654 MARTIN 30
7654 MARTIN 30
7900 JAMES 30
7900 JAMES 30
7521 WARD 30
7521 WARD 30
7499 ALLEN 30
7499 ALLEN 30
7654 MARTIN 30
7654 MARTIN 30
7521 WARD 30
7521 WARD 30
7698 BLAKE 30
7698 BLAKE 30
NULL NULL 40
NULL NULL 40
Time taken: 63.838 seconds, Fetched: 58 row(s)
hive (yinzhengjie)>
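The full outer join described above fills NULL on whichever side has no match. The following Python sketch models that rule; it is not Hive's implementation. The employee 9999 NOBODY in department 50 is a hypothetical row added so that an unmatched left-side row also appears, and the function name full_outer_join is made up for illustration.

```python
# Sketch of FULL OUTER JOIN semantics (not Hive's implementation).
emp = [
    (7369, "SMITH", 20),
    (7782, "CLARK", 10),
    (9999, "NOBODY", 50),   # hypothetical row: department 50 is not in dept
]
dept = [10, 20, 30, 40]     # deptno values; 30 and 40 have no employee here


def full_outer_join(emp_rows, dept_keys):
    """Return matched pairs plus the unmatched rows of BOTH sides, padded with None."""
    result = []
    matched = [False] * len(dept_keys)
    for empno, ename, deptno in emp_rows:
        found = False
        for i, key in enumerate(dept_keys):
            if key == deptno:
                result.append((empno, ename, key))
                matched[i] = True
                found = True
        if not found:
            # No dept row for this employee: d.deptno becomes NULL (None).
            result.append((empno, ename, None))
    for i, key in enumerate(dept_keys):
        if not matched[i]:
            # No employee for this dept row: e.empno and e.ename become NULL.
            result.append((None, None, key))
    return result


joined = full_outer_join(emp, dept)
for row in joined:
    print(row)
```

In the transcript every employee has a matching department, so only the NULL NULL 40 rows show up; the hypothetical department 50 row demonstrates the other direction.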
Join statement - multi-table join query (hive (yinzhengjie)> SELECT e.ename, d.deptno, l.loc_name FROM emp e JOIN dept d ON d.deptno = e.deptno JOIN location l ON d.loc = l.loc;)
Contents of the test file:
[yinzhengjie@s101 ~]$ cat /home/yinzhengjie/download/location.txt
1700 Beijing
1800 London
1900 Tokyo
[yinzhengjie@s101 ~]$
In most cases Hive launches one MapReduce job for each pair of joined tables. For the example below that means a first MapReduce job joins table e with table d, and a second MapReduce job joins the output of the first job with table l. In the run recorded here, however, the log reports "Total jobs = 1": both small tables are loaded as hash tables by a local map-join task, and the two joins finish in a single map-only job (Stage-5).
Tip: why are tables d and l not joined first? Because Hive always executes joins from left to right.
hive (yinzhengjie)> create table if not exists yinzhengjie.location(
                  > loc int,
                  > loc_name string
                  > )
                  > row format delimited fields terminated by ' '; # create the location table
OK
Time taken: 0.614 seconds
hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/location.txt' into table yinzhengjie.location; # load data into the table
Loading data to table yinzhengjie.location
OK
Time taken: 0.478 seconds
hive (yinzhengjie)>
hive (yinzhengjie)> SELECT e.ename, d.deptno, l. loc_name
                  > FROM emp e
                  > JOIN dept d
                  > ON d.deptno = e.deptno
                  > JOIN location l
                  > ON d.loc = l.loc; # multi-table join query
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180809235602_7fbd82df-9541-4b76-b5c4-9482d4aa2ccc
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2018-08-09 23:56:12 Starting to launch local task to process map join; maximum memory = 477626368
2018-08-09 23:56:16 Dump the side-table for tag: 1 with group count: 3 into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile01--.hashtable
2018-08-09 23:56:16 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile01--.hashtable (344 bytes)
2018-08-09 23:56:16 Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile11--.hashtable
2018-08-09 23:56:16 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile11--.hashtable (380 bytes)
2018-08-09 23:56:16 End of local task; Time Taken: 3.928 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1533789743141_0036, Tracking URL = http://s101:8088/proxy/application_1533789743141_0036/
Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0036
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 0
2018-08-09 23:56:37,193 Stage-5 map = 0%, reduce = 0%
2018-08-09 23:56:54,925 Stage-5 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
MapReduce Total cumulative CPU time: 2 seconds 640 msec
Ended Job = job_1533789743141_0036
MapReduce Jobs Launched:
Stage-Stage-5: Map: 1 Cumulative CPU: 2.64 sec HDFS Read: 9513 HDFS Write: 865 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 640 msec
OK
e.ename d.deptno l.loc_name
SMITH 20 London
ALLEN 30 Tokyo
WARD 30 Tokyo
JONES 20 London
MARTIN 30 Tokyo
BLAKE 30 Tokyo
CLARK 10 Beijing
SCOTT 20 London
KING 10 Beijing
TURNER 30 Tokyo
ADAMS 20 London
JAMES 30 Tokyo
FORD 20 London
MILLER 10 Beijing
SMITH 20 London
ALLEN 30 Tokyo
WARD 30 Tokyo
JONES 20 London
MARTIN 30 Tokyo
BLAKE 30 Tokyo
CLARK 10 Beijing
SCOTT 20 London
KING 10 Beijing
TURNER 30 Tokyo
ADAMS 20 London
JAMES 30 Tokyo
FORD 20 London
MILLER 10 Beijing
Time taken: 56.659 seconds, Fetched: 28 row(s)
hive (yinzhengjie)>
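The left-to-right evaluation of the three-table join can be pictured as two consecutive pairwise joins. The Python sketch below models only that logical order, not Hive's physical plan. It uses a small subset of the rows shown in this post, and the variable names step1 and step2 are illustrative.

```python
# Sketch of a left-to-right multi-table join (logical order only).
# Data is a small subset of the emp, dept and location rows in this post.
emp = [("SMITH", 20), ("ALLEN", 30), ("CLARK", 10)]          # (ename, deptno)
dept = [(10, 1700), (20, 1800), (30, 1900), (40, 1700)]      # (deptno, loc)
location = [(1700, "Beijing"), (1800, "London"), (1900, "Tokyo")]  # (loc, loc_name)

# Step 1: emp e JOIN dept d ON d.deptno = e.deptno
step1 = [
    (ename, d_deptno, d_loc)
    for ename, e_deptno in emp
    for d_deptno, d_loc in dept
    if d_deptno == e_deptno
]

# Step 2: (result of step 1) JOIN location l ON d.loc = l.loc
step2 = [
    (ename, deptno, loc_name)
    for ename, deptno, d_loc in step1
    for l_loc, loc_name in location
    if l_loc == d_loc
]

for row in step2:
    print(row)
```

Department 40 never reaches step 2 because no employee matches it in step 1, which is why it is absent from an inner join chain like this one.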
Join statement - Cartesian product (hive (yinzhengjie)> select * from emp, dept;)
A Cartesian product is produced under the following conditions:
1>.The join condition is omitted
2>.The join condition is invalid
3>.Every row of every table is joined with every row of the other tables
hive (yinzhengjie)> set hive.mapred.mode=strict;
hive (yinzhengjie)> set hive.mapred.mode;
hive.mapred.mode=strict
hive (yinzhengjie)> select * from emp, dept; # a Cartesian product fails in strict mode
FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.cartesian.product is set to false and that hive.mapred.mode is not set to 'strict' to enable them.
hive (yinzhengjie)>
hive (yinzhengjie)> set hive.mapred.mode=nonstrict;
hive (yinzhengjie)> set hive.mapred.mode;
hive.mapred.mode=nonstrict
hive (yinzhengjie)> select empno, deptno from emp, dept;
FAILED: SemanticException Column deptno Found in more than One Tables/Subqueries
hive (yinzhengjie)> select * from emp, dept; # a Cartesian product is allowed in nonstrict mode, but such queries are not recommended because they are rarely meaningful
Warning: Map Join MAPJOIN[9][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = yinzhengjie_20180810000249_98e28c13-db4d-4e2b-81c6-28e44bf51f1d
Total jobs = 1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/soft/apache-hive-2.1.1-bin/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hbase-1.2.6/lib/phoenix-4.10.0-HBase-1.2-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/soft/hadoop-2.7.3/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2018-08-10 00:03:00 Starting to launch local task to process map join; maximum memory = 477626368 2018-08-10 00:03:04 Dump the side-table for tag: 1 with group count: 1 into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-10_00-02-49_246_882868568149391185-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable 2018-08-10 00:03:04 Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-10_00-02-49_246_882868568149391185-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (418 bytes) 2018-08-10 00:03:04 End of local task; Time Taken: 3.916 sec. Execution completed successfully MapredLocal task succeeded Launching Job 1 out of 1 Number of reduce tasks is set to 0 since there's no reduce operator Starting Job = job_1533789743141_0037, Tracking URL = http://s101:8088/proxy/application_1533789743141_0037/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0037 Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0 2018-08-10 00:03:27,349 Stage-3 map = 0%, reduce = 0% 2018-08-10 00:03:40,822 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 1.8 sec MapReduce Total cumulative CPU time: 1 seconds 800 msec Ended Job = job_1533789743141_0037 MapReduce Jobs Launched: Stage-Stage-3: Map: 1 Cumulative CPU: 1.8 sec HDFS Read: 8853 HDFS Write: 17375 SUCCESS Total MapReduce CPU Time Spent: 1 seconds 800 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno dept.deptno dept.dname dept.loc 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 10 ACCOUNTING 2700 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 20 RESEARCH 3800 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 30 SALES 5900 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 40 OPERATIONS 4700 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 10 ACCOUNTING 1700 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 20 RESEARCH 1800 7369 SMITH CLERK 7902 
1980-12-17 800.0 NULL 20 30 SALES 1900 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 40 OPERATIONS 1700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 10 ACCOUNTING 2700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 20 RESEARCH 3800 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 30 SALES 5900 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 40 OPERATIONS 4700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 10 ACCOUNTING 1700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 20 RESEARCH 1800 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 30 SALES 1900 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 40 OPERATIONS 1700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 10 ACCOUNTING 2700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 20 RESEARCH 3800 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 30 SALES 5900 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 40 OPERATIONS 4700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 10 ACCOUNTING 1700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 20 RESEARCH 1800 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 30 SALES 1900 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 40 OPERATIONS 1700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 10 ACCOUNTING 2700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 20 RESEARCH 3800 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 30 SALES 5900 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 40 OPERATIONS 4700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 10 ACCOUNTING 1700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 20 RESEARCH 1800 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 30 SALES 1900 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 40 OPERATIONS 1700 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 10 ACCOUNTING 2700 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 20 RESEARCH 3800 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 30 SALES 5900 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 40 OPERATIONS 4700 7654 
MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 10 ACCOUNTING 1700 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 20 RESEARCH 1800 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 30 SALES 1900 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 40 OPERATIONS 1700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 10 ACCOUNTING 2700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 20 RESEARCH 3800 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 30 SALES 5900 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 40 OPERATIONS 4700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 10 ACCOUNTING 1700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 20 RESEARCH 1800 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 30 SALES 1900 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 40 OPERATIONS 1700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 10 ACCOUNTING 2700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 20 RESEARCH 3800 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 30 SALES 5900 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 40 OPERATIONS 4700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 10 ACCOUNTING 1700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 20 RESEARCH 1800 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 30 SALES 1900 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 40 OPERATIONS 1700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 10 ACCOUNTING 2700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 20 RESEARCH 3800 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 30 SALES 5900 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 40 OPERATIONS 4700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 10 ACCOUNTING 1700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 20 RESEARCH 1800 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 30 SALES 1900 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 40 OPERATIONS 1700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 10 ACCOUNTING 2700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 20 RESEARCH 3800 7839 KING 
PRESIDENT NULL 1981-11-17 5000.0 NULL 10 30 SALES 5900 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 40 OPERATIONS 4700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 10 ACCOUNTING 1700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 20 RESEARCH 1800 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 30 SALES 1900 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 40 OPERATIONS 1700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 10 ACCOUNTING 2700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 20 RESEARCH 3800 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 30 SALES 5900 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 40 OPERATIONS 4700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 10 ACCOUNTING 1700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 20 RESEARCH 1800 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 30 SALES 1900 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 40 OPERATIONS 1700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 10 ACCOUNTING 2700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 20 RESEARCH 3800 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 30 SALES 5900 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 40 OPERATIONS 4700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 10 ACCOUNTING 1700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 20 RESEARCH 1800 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 30 SALES 1900 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 40 OPERATIONS 1700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 10 ACCOUNTING 2700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 20 RESEARCH 3800 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 30 SALES 5900 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 40 OPERATIONS 4700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 10 ACCOUNTING 1700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 20 RESEARCH 1800 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 30 SALES 1900 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 40 OPERATIONS 1700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 10 
ACCOUNTING 2700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 20 RESEARCH 3800 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 30 SALES 5900 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 40 OPERATIONS 4700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 10 ACCOUNTING 1700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 20 RESEARCH 1800 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 30 SALES 1900 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 40 OPERATIONS 1700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 10 ACCOUNTING 2700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 20 RESEARCH 3800 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 30 SALES 5900 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 40 OPERATIONS 4700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 10 ACCOUNTING 1700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 20 RESEARCH 1800 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 30 SALES 1900 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 40 OPERATIONS 1700 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 10 ACCOUNTING 2700 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 20 RESEARCH 3800 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 30 SALES 5900 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 40 OPERATIONS 4700 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 10 ACCOUNTING 1700 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 20 RESEARCH 1800 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 30 SALES 1900 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 40 OPERATIONS 1700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 10 ACCOUNTING 2700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 20 RESEARCH 3800 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 30 SALES 5900 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 40 OPERATIONS 4700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 10 ACCOUNTING 1700 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 20 RESEARCH 1800 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 30 SALES 1900 7499 ALLEN 
SALESMAN 7698 1981-2-20 1600.0 300.0 30 40 OPERATIONS 1700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 10 ACCOUNTING 2700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 20 RESEARCH 3800 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 30 SALES 5900 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 40 OPERATIONS 4700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 10 ACCOUNTING 1700 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 20 RESEARCH 1800 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 30 SALES 1900 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 40 OPERATIONS 1700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 10 ACCOUNTING 2700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 20 RESEARCH 3800 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 30 SALES 5900 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 40 OPERATIONS 4700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 10 ACCOUNTING 1700 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 20 RESEARCH 1800 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 30 SALES 1900 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 40 OPERATIONS 1700 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 10 ACCOUNTING 2700 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 20 RESEARCH 3800 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 30 SALES 5900 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 40 OPERATIONS 4700 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 10 ACCOUNTING 1700 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 20 RESEARCH 1800 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 30 SALES 1900 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 40 OPERATIONS 1700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 10 ACCOUNTING 2700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 20 RESEARCH 3800 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 30 SALES 5900 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 40 OPERATIONS 4700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 10 
ACCOUNTING 1700 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 20 RESEARCH 1800 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 30 SALES 1900 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 40 OPERATIONS 1700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 10 ACCOUNTING 2700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 20 RESEARCH 3800 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 30 SALES 5900 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 40 OPERATIONS 4700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 10 ACCOUNTING 1700 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 20 RESEARCH 1800 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 30 SALES 1900 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 40 OPERATIONS 1700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 10 ACCOUNTING 2700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 20 RESEARCH 3800 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 30 SALES 5900 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 40 OPERATIONS 4700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 10 ACCOUNTING 1700 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 20 RESEARCH 1800 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 30 SALES 1900 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 40 OPERATIONS 1700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 10 ACCOUNTING 2700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 20 RESEARCH 3800 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 30 SALES 5900 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 40 OPERATIONS 4700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 10 ACCOUNTING 1700 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 20 RESEARCH 1800 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 30 SALES 1900 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 40 OPERATIONS 1700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 10 ACCOUNTING 2700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 20 RESEARCH 3800 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 30 SALES 
5900 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 40 OPERATIONS 4700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 10 ACCOUNTING 1700 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 20 RESEARCH 1800 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 30 SALES 1900 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 40 OPERATIONS 1700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 10 ACCOUNTING 2700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 20 RESEARCH 3800 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 30 SALES 5900 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 40 OPERATIONS 4700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 10 ACCOUNTING 1700 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 20 RESEARCH 1800 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 30 SALES 1900 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 40 OPERATIONS 1700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 10 ACCOUNTING 2700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 20 RESEARCH 3800 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 30 SALES 5900 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 40 OPERATIONS 4700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 10 ACCOUNTING 1700 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 20 RESEARCH 1800 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 30 SALES 1900 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 40 OPERATIONS 1700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 10 ACCOUNTING 2700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 20 RESEARCH 3800 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 30 SALES 5900 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 40 OPERATIONS 4700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 10 ACCOUNTING 1700 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 20 RESEARCH 1800 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 30 SALES 1900 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 40 OPERATIONS 1700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 10 ACCOUNTING 2700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 20 
RESEARCH 3800 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 30 SALES 5900 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 40 OPERATIONS 4700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 10 ACCOUNTING 1700 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 20 RESEARCH 1800 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 30 SALES 1900 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 40 OPERATIONS 1700 Time taken: 52.698 seconds, Fetched: 224 row(s) hive (yinzhengjie)>
Sorting - global sort (hive (yinzhengjie)> select * from emp order by sal desc;)
Order By: global sort, carried out by a single reducer in one MapReduce job.
1>.Sort with the ORDER BY clause. ASC (ascending): ascending order (the default). DESC (descending): descending order.
2>.The ORDER BY clause goes at the end of the SELECT statement.
hive (yinzhengjie)> select * from emp order by sal; #List employees by salary in ascending order (ascending is the default) WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810001838_6c529433-c84b-447d-89e0-16af47dc89eb Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0039, Tracking URL = http://s101:8088/proxy/application_1533789743141_0039/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0039 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-10 00:18:56,082 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:19:37,122 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.66 sec 2018-08-10 00:19:59,288 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.41 sec MapReduce Total cumulative CPU time: 4 seconds 410 msec Ended Job = job_1533789743141_0039 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.41 sec HDFS Read: 10952 HDFS Write: 1745 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 410 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 
7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 Time taken: 82.564 seconds, Fetched: 28 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> select * from emp order by sal desc; #List employees by salary in descending order WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. 
Query ID = yinzhengjie_20180810002012_ebf1251c-c92b-4010-bea7-bb8a2c34ebdb Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0040, Tracking URL = http://s101:8088/proxy/application_1533789743141_0040/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0040 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-10 00:20:30,216 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:20:44,683 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.47 sec 2018-08-10 00:21:00,184 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.31 sec MapReduce Total cumulative CPU time: 5 seconds 310 msec Ended Job = job_1533789743141_0040 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 5.31 sec HDFS Read: 10952 HDFS Write: 1745 SUCCESS Total MapReduce CPU Time Spent: 5 seconds 310 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7844 TURNER SALESMAN 7698 1981-9-8 
1500.0 0.0 30 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 Time taken: 51.103 seconds, Fetched: 28 row(s) hive (yinzhengjie)>
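As a rough mental model of the two ORDER BY queries in this section (plain Python, not Hive code): every row reaches one reducer, and the complete list is sorted there. The `order_by` helper and the choice of four sample rows are illustrative assumptions; the names and salaries are taken from the emp output.

```python
# Illustrative model of ORDER BY: a single reducer sees every row and sorts them all.
# order_by() is a hypothetical helper for this sketch, not part of Hive.
rows = [
    ("KING", 5000.0),
    ("SMITH", 800.0),
    ("ALLEN", 1600.0),
    ("JAMES", 950.0),
]


def order_by(rows, key, descending=False):
    """Return all rows as one globally sorted list (ascending by default)."""
    return sorted(rows, key=key, reverse=descending)


by_sal_asc = order_by(rows, key=lambda r: r[1])                     # order by sal
by_sal_desc = order_by(rows, key=lambda r: r[1], descending=True)   # order by sal desc

print([name for name, _ in by_sal_asc])
print([name for name, _ in by_sal_desc])
```

Because one reducer has to handle the whole result, this kind of global sort tends to be the slowest of the sorting options on large tables.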
Sorting - sort by column alias (hive (yinzhengjie)> select ename, sal*2 twosal from emp order by twosal;)
hive (yinzhengjie)> select ename, sal*2 twosal from emp order by twosal; #Sort by twice each employee's salary WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810002258_b9f73ab7-2a29-459a-9b27-119eb56f1dde Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0041, Tracking URL = http://s101:8088/proxy/application_1533789743141_0041/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0041 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-10 00:23:17,109 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:23:29,497 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.6 sec 2018-08-10 00:23:41,890 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.99 sec MapReduce Total cumulative CPU time: 4 seconds 990 msec Ended Job = job_1533789743141_0041 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 4.99 sec HDFS Read: 10079 HDFS Write: 789 SUCCESS Total MapReduce CPU Time Spent: 4 seconds 990 msec OK ename twosal SMITH 1600.0 SMITH 1600.0 JAMES 1900.0 JAMES 1900.0 ADAMS 2200.0 ADAMS 2200.0 WARD 2500.0 WARD 2500.0 MARTIN 2500.0 MARTIN 2500.0 MILLER 2600.0 MILLER 2600.0 TURNER 3000.0 TURNER 3000.0 ALLEN 3200.0 ALLEN 3200.0 CLARK 4900.0 CLARK 4900.0 BLAKE 5700.0 BLAKE 5700.0 JONES 5950.0 JONES 5950.0 SCOTT 6000.0 SCOTT 6000.0 FORD 6000.0 FORD 6000.0 KING 10000.0 KING 10000.0 Time taken: 44.517 seconds, Fetched: 28 row(s) hive (yinzhengjie)>
Sorting - sort by multiple columns (hive (yinzhengjie)> select ename, deptno, sal from emp order by deptno, sal ;)
hive (yinzhengjie)> select ename, deptno, sal from emp order by deptno, sal ; #Sort by department number first, then by salary, both ascending WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810002405_c29a1508-8152-4d7c-9b50-e2fc04c8bdbc Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks determined at compile time: 1 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0042, Tracking URL = http://s101:8088/proxy/application_1533789743141_0042/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0042 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1 2018-08-10 00:24:21,693 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:24:35,159 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.77 sec 2018-08-10 00:24:44,565 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 3.85 sec MapReduce Total cumulative CPU time: 3 seconds 850 msec Ended Job = job_1533789743141_0042 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 3.85 sec HDFS Read: 9332 HDFS Write: 867 SUCCESS Total MapReduce CPU Time Spent: 3 seconds 850 msec OK ename deptno sal MILLER 10 1300.0 MILLER 10 1300.0 CLARK 10 2450.0 CLARK 10 2450.0 KING 10 5000.0 KING 10 5000.0 SMITH 20 800.0 SMITH 20 800.0 ADAMS 20 1100.0 ADAMS 20 1100.0 JONES 20 2975.0 JONES 20 2975.0 FORD 20 3000.0 SCOTT 20 3000.0 FORD 20 3000.0 SCOTT 20 3000.0 JAMES 30 950.0 JAMES 30 950.0 WARD 30 1250.0 MARTIN 30 1250.0 MARTIN 30 1250.0 WARD 30 1250.0 TURNER 30 1500.0 TURNER 30 1500.0 ALLEN 30 1600.0 ALLEN 30 1600.0 BLAKE 30 
2850.0 BLAKE 30 2850.0 Time taken: 39.975 seconds, Fetched: 28 row(s) hive (yinzhengjie)>
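A multi-column ORDER BY compares rows by the first column and uses the next column only to break ties. The sketch below models that in plain Python with a tuple sort key; `order_by_columns` is a hypothetical helper, and the five rows are a subset of the emp data shown in this section.

```python
# Illustrative model of "order by deptno, sal": compare by deptno first,
# then by sal among rows that share the same deptno.
rows = [
    ("KING", 10, 5000.0),
    ("JAMES", 30, 950.0),
    ("MILLER", 10, 1300.0),
    ("SMITH", 20, 800.0),
    ("CLARK", 10, 2450.0),
]


def order_by_columns(rows, *column_indexes):
    """Sort ascending by the given column positions, in priority order."""
    return sorted(rows, key=lambda r: tuple(r[i] for i in column_indexes))


result = order_by_columns(rows, 1, 2)  # column 1 = deptno, column 2 = sal
for ename, deptno, sal in result:
    print(ename, deptno, sal)
```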
Sorting - sort within each reducer (Sort By) (hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' select * from emp sort by deptno desc;)
hive (yinzhengjie)> set mapreduce.job.reduces=3; #Set the number of reducers
hive (yinzhengjie)> set mapreduce.job.reduces; #Check the configured number of reducers
mapreduce.job.reduces=3 hive (yinzhengjie)>
hive (yinzhengjie)> select * from emp sort by empno desc; #List employees sorted by employee number in descending order within each reducer WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810002752_cd4d7e0d-be26-4053-8730-9379c1632a3a Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Defaulting to jobconf value of: 3 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0043, Tracking URL = http://s101:8088/proxy/application_1533789743141_0043/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0043 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3 2018-08-10 00:28:08,954 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:28:20,313 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.02 sec 2018-08-10 00:28:33,921 Stage-1 map = 100%, reduce = 11%, Cumulative CPU 2.45 sec 2018-08-10 00:28:36,045 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4.76 sec 2018-08-10 00:28:37,074 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 7.48 sec 2018-08-10 00:28:54,683 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 10.02 sec 2018-08-10 00:28:57,007 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.69 sec MapReduce Total cumulative CPU time: 10 seconds 690 msec Ended Job = job_1533789743141_0043 
MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 10.69 sec HDFS Read: 20664 HDFS Write: 1919 SUCCESS Total MapReduce CPU Time Spent: 10 seconds 690 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 Time taken: 67.599 seconds, Fetched: 28 row(s) hive (yinzhengjie)>
hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' select * from emp sort by deptno desc; #Write the query result to local files (sorted by department number in descending order within each reducer) WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. 
Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810003404_42a220b7-02c7-42ae-bf8a-566c6300f4c3 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Defaulting to jobconf value of: 3 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0045, Tracking URL = http://s101:8088/proxy/application_1533789743141_0045/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0045 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3 2018-08-10 00:34:28,526 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:34:37,987 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.22 sec 2018-08-10 00:34:46,345 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.35 sec 2018-08-10 00:34:49,548 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 5.71 sec 2018-08-10 00:35:05,098 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 7.57 sec MapReduce Total cumulative CPU time: 7 seconds 570 msec Ended Job = job_1533789743141_0045 Moving data to local directory /home/yinzhengjie/download/sortby-result MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 7.57 sec HDFS Read: 19815 HDFS Write: 1322 SUCCESS Total MapReduce CPU Time Spent: 7 seconds 570 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno Time taken: 62.425 seconds hive (yinzhengjie)>
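The 28-row result of `sort by empno desc` looks unsorted at first glance because it is three sorted runs printed one after another, one run per reducer. The plain-Python sketch below models that behaviour. It is not Hive code: `sort_by` is a hypothetical helper, and dealing rows to reducers round-robin is only a stand-in assumption for however Hive spreads rows when no DISTRIBUTE BY is given.

```python
# Illustrative model of SORT BY with several reducers:
# each reducer sorts only the rows it received, so there is no global order.
rows = [
    (7369, "SMITH"), (7499, "ALLEN"), (7521, "WARD"), (7566, "JONES"),
    (7654, "MARTIN"), (7698, "BLAKE"), (7782, "CLARK"),
]


def sort_by(rows, key, num_reducers, descending=False):
    """Spread rows over reducers, then sort inside each reducer only."""
    reducers = [[] for _ in range(num_reducers)]
    for i, row in enumerate(rows):
        reducers[i % num_reducers].append(row)  # stand-in distribution (assumption)
    return [sorted(part, key=key, reverse=descending) for part in reducers]


parts = sort_by(rows, key=lambda r: r[0], num_reducers=3, descending=True)
flat = [row for part in parts for row in part]  # what the console prints

for number, part in enumerate(parts):
    print("reducer", number, [empno for empno, _ in part])
print("combined", [empno for empno, _ in flat])
```

Each reducer's slice is in descending order, but the combined listing is not, which is the pattern visible in the query output.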
Sorting - partitioned sort (Distribute By) (hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' select * from emp distribute by deptno sort by empno desc;)
Distribute By: similar to the partitioner in MapReduce. It decides which reducer each row is sent to and is used together with sort by.
Tip: Hive requires the DISTRIBUTE BY clause to be written before the SORT BY clause. When testing distribute by, be sure to configure more than one reducer; otherwise its effect cannot be seen.
hive (yinzhengjie)> set mapreduce.job.reduces; mapreduce.job.reduces=3 hive (yinzhengjie)>
hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' select * from emp distribute by deptno sort by empno desc; #Partition by department number first, then sort by employee number in descending order within each partition. WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810003826_af885657-4f0a-4e2a-83f3-62cbdabda4f3 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. 
Defaulting to jobconf value of: 3 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0046, Tracking URL = http://s101:8088/proxy/application_1533789743141_0046/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0046 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3 2018-08-10 00:38:46,632 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:39:27,774 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.07 sec 2018-08-10 00:39:45,945 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4.54 sec 2018-08-10 00:39:50,095 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 6.44 sec 2018-08-10 00:39:51,122 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.78 sec MapReduce Total cumulative CPU time: 8 seconds 780 msec Ended Job = job_1533789743141_0046 Moving data to local directory /home/yinzhengjie/download/sortby-result MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 8.78 sec HDFS Read: 19858 HDFS Write: 1322 SUCCESS Total MapReduce CPU Time Spent: 8 seconds 780 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno Time taken: 86.59 seconds hive (yinzhengjie)>
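The query result goes to files rather than the console here, so the effect of DISTRIBUTE BY is not visible in the transcript. The plain-Python sketch below models what each of the three reducers would receive. It assumes that the hash of an integer key is the integer itself and that the reducer is chosen as hash modulo the reducer count; `distribute_sort` is a hypothetical helper, and the rows are a subset of emp reduced to (empno, deptno).

```python
# Illustrative model of "distribute by deptno sort by empno desc" with 3 reducers.
# Assumption: reducer index = deptno % num_reducers (integer key hashes to itself).
rows = [
    (7369, 20), (7499, 30), (7521, 30), (7566, 20),
    (7782, 10), (7839, 10), (7900, 30),
]


def distribute_sort(rows, num_reducers):
    """Route rows by deptno, then sort each reducer's rows by empno descending."""
    parts = [[] for _ in range(num_reducers)]
    for empno, deptno in rows:
        parts[deptno % num_reducers].append((empno, deptno))
    return [sorted(part, key=lambda r: r[0], reverse=True) for part in parts]


parts = distribute_sort(rows, num_reducers=3)
for number, part in enumerate(parts):
    print("reducer", number, part)
```

Under these assumptions all rows of one department end up in the same reducer, and each reducer writes its own output file into the target directory.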
Sorting - Cluster By (hive (yinzhengjie)> select * from emp cluster by deptno;)
When the distribute by and sort by columns are the same, cluster by can be used instead.
cluster by combines the behaviour of distribute by with that of sort by. However, the sort is always ascending; you cannot specify ASC or DESC.
The two queries below are equivalent ways of writing the same thing:
hive (yinzhengjie)> select * from emp cluster by deptno; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810004115_0faf59ba-950a-4f86-885a-00865338c95c Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. Defaulting to jobconf value of: 3 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0047, Tracking URL = http://s101:8088/proxy/application_1533789743141_0047/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0047 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3 2018-08-10 00:41:31,323 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:41:40,638 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec 2018-08-10 00:41:49,985 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.64 sec 2018-08-10 00:41:58,261 Stage-1 map = 100%, reduce = 67%, Cumulative CPU 5.93 sec 2018-08-10 00:42:13,824 Stage-1 map = 100%, reduce = 89%, Cumulative CPU 8.2 sec 2018-08-10 00:42:16,943 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 8.97 sec MapReduce Total cumulative CPU time: 8 seconds 970 msec Ended Job = job_1533789743141_0047 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 8.97 sec HDFS Read: 20707 HDFS Write: 1919 SUCCESS Total MapReduce CPU Time Spent: 8 seconds 970 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm 
emp.deptno 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 Time taken: 64.632 seconds, Fetched: 28 row(s) hive (yinzhengjie)> select * from emp distribute by deptno sort by deptno; WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases. Query ID = yinzhengjie_20180810004343_d5ce078f-80a7-4762-8a00-a75b6a97f7b2 Total jobs = 1 Launching Job 1 out of 1 Number of reduce tasks not specified. 
Defaulting to jobconf value of: 3 In order to change the average load for a reducer (in bytes): set hive.exec.reducers.bytes.per.reducer=<number> In order to limit the maximum number of reducers: set hive.exec.reducers.max=<number> In order to set a constant number of reducers: set mapreduce.job.reduces=<number> Starting Job = job_1533789743141_0048, Tracking URL = http://s101:8088/proxy/application_1533789743141_0048/ Kill Command = /soft/hadoop-2.7.3/bin/hadoop job -kill job_1533789743141_0048 Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3 2018-08-10 00:43:58,038 Stage-1 map = 0%, reduce = 0% 2018-08-10 00:44:10,447 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.51 sec 2018-08-10 00:44:23,055 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 4.62 sec 2018-08-10 00:44:29,343 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 10.22 sec MapReduce Total cumulative CPU time: 10 seconds 220 msec Ended Job = job_1533789743141_0048 MapReduce Jobs Launched: Stage-Stage-1: Map: 1 Reduce: 3 Cumulative CPU: 10.22 sec HDFS Read: 20707 HDFS Write: 1919 SUCCESS Total MapReduce CPU Time Spent: 10 seconds 220 msec OK emp.empno emp.ename emp.job emp.mgr emp.hiredate emp.sal emp.comm emp.deptno 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7844 TURNER SALESMAN 7698 1981-9-8 1500.0 0.0 30 7499 ALLEN SALESMAN 7698 1981-2-20 1600.0 300.0 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7900 JAMES CLERK 7698 1981-12-3 950.0 NULL 30 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7654 MARTIN SALESMAN 7698 1981-9-28 1250.0 1400.0 30 7698 BLAKE MANAGER 7839 1981-5-1 2850.0 NULL 30 7521 WARD SALESMAN 7698 1981-2-22 1250.0 500.0 30 7934 MILLER CLERK 7782 1982-1-23 1300.0 NULL 10 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7934 MILLER CLERK 7782 
1982-1-23 1300.0 NULL 10 7839 KING PRESIDENT NULL 1981-11-17 5000.0 NULL 10 7782 CLARK MANAGER 7839 1981-6-9 2450.0 NULL 10 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7369 SMITH CLERK 7902 1980-12-17 800.0 NULL 20 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7788 SCOTT ANALYST 7566 1987-4-19 3000.0 NULL 20 7566 JONES MANAGER 7839 1981-4-2 2975.0 NULL 20 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 7902 FORD ANALYST 7566 1981-12-3 3000.0 NULL 20 7876 ADAMS CLERK 7788 1987-5-23 1100.0 NULL 20 Time taken: 48.312 seconds, Fetched: 28 row(s) hive (yinzhengjie)>
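The equivalence of the two queries can be sketched in plain Python as "route by the key, then sort each reducer's rows ascending by that same key". This is a model, not Hive code: the three helpers are hypothetical, the reducer is assumed to be `deptno % num_reducers`, and the rows are made-up (empno, deptno) pairs that include a department 40 so that one reducer receives two different departments.

```python
# Illustrative model: cluster by deptno == distribute by deptno sort by deptno (ascending).
rows = [(1, 40), (2, 10), (3, 30), (4, 20), (5, 10), (6, 40)]  # hypothetical rows


def distribute_by(rows, key, num_reducers):
    """Send each row to reducer key(row) % num_reducers (assumed hashing)."""
    parts = [[] for _ in range(num_reducers)]
    for row in rows:
        parts[key(row) % num_reducers].append(row)
    return parts


def sort_by(parts, key):
    """Sort every reducer's rows ascending by key."""
    return [sorted(part, key=key) for part in parts]


def cluster_by(rows, key, num_reducers):
    """Distribute and sort on the same key; ascending is the only option."""
    return sort_by(distribute_by(rows, key, num_reducers), key)


def deptno(row):
    return row[1]


clustered = cluster_by(rows, deptno, 3)
manual = sort_by(distribute_by(rows, deptno, 3), deptno)
for number, part in enumerate(clustered):
    print("reducer", number, part)
```

With three reducers, the output in this section lists department 30 first, then 10, then 20, which is consistent with this routing (30 % 3 = 0, 10 % 3 = 1, 20 % 3 = 2).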
Bucketed tables - bucket sampling query (hive (yinzhengjie)> select * from stu_buck tablesample(bucket 1 out of 4 on id);)
For very large data sets, users sometimes need a representative sample of the result rather than the full result. Hive supports this by sampling the table.
hive (yinzhengjie)> select * from stu_buck; OK stu_buck.id stu_buck.name 1016 ss16 1012 ss12 1008 ss8 1004 ss4 1001 ss1 1013 ss13 1005 ss5 1009 ss9 1014 ss14 1010 ss10 1006 ss6 1002 ss2 1015 ss15 1007 ss7 1003 ss3 1011 ss11 Time taken: 0.073 seconds, Fetched: 16 row(s)
hive (yinzhengjie)> select * from stu_buck tablesample(bucket 1 out of 4 on id); #Sample bucket 1 out of 4 from table stu_buck OK stu_buck.id stu_buck.name 1016 ss16 1012 ss12 1008 ss8 1004 ss4 Time taken: 0.088 seconds, Fetched: 4 row(s) hive (yinzhengjie)>
Tip: tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y [ON colname]).
y should be a multiple or a factor of the table's total number of buckets. Hive uses y to decide the sampling ratio. For example, if the table has 4 buckets, then with y=2 Hive samples (4/2=) 2 buckets of data, and with y=8 it samples (4/8=) 1/2 of a bucket.
x indicates which bucket to start sampling from. For example, if the table has 4 buckets, tablesample(bucket 4 out of 4) samples (4/4=) 1 bucket of data in total, namely the 4th bucket.
Note: x must be less than or equal to y; otherwise an exception is thrown: FAILED: SemanticException [Error 10061]: Numerator should not be bigger than denominator in sample clause for table stu_buck
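The bucket arithmetic can be checked with a small plain-Python model. It assumes that a row is part of the sample when hash(id) % y equals x - 1, and that the hash of an integer id is the id itself; `in_sample` is a hypothetical helper, not a Hive API. The ids 1001 to 1016 are the ones stored in stu_buck.

```python
# Illustrative model of TABLESAMPLE(BUCKET x OUT OF y ON id).
# Assumption: a row is sampled when id % y == x - 1 (integer id hashes to itself).
def in_sample(row_id, x, y):
    """Return True if the row would fall into sample bucket x of y."""
    if not 1 <= x <= y:
        raise ValueError("Numerator should not be bigger than denominator")
    return row_id % y == x - 1


ids = range(1001, 1017)  # the 16 ids stored in stu_buck

one_of_four = [i for i in ids if in_sample(i, 1, 4)]   # one whole bucket
one_of_two = [i for i in ids if in_sample(i, 1, 2)]    # two of the four buckets
one_of_eight = [i for i in ids if in_sample(i, 1, 8)]  # half of one bucket

print(one_of_four)
print(len(one_of_two), "rows for y=2")
print(one_of_eight)
```

Under these assumptions, `bucket 1 out of 4` selects ids 1004, 1008, 1012 and 1016, the same four rows the query in this section returns.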
Bucketed tables - block sampling (hive (yinzhengjie)> select * from stu_buck tablesample(0.1 percent);)
Hive offers another way to sample, by percentage. The percentage refers to data size rather than row count: Hive samples that share of the data blocks under the input path.
Tip: this sampling method does not necessarily work with every file format. In addition, its smallest sampling unit is one HDFS block, so if the table's data is smaller than the usual 128M block size, all rows are returned.
hive (yinzhengjie)> select * from stu_buck; OK stu_buck.id stu_buck.name 1016 ss16 1012 ss12 1008 ss8 1004 ss4 1001 ss1 1013 ss13 1005 ss5 1009 ss9 1014 ss14 1010 ss10 1006 ss6 1002 ss2 1015 ss15 1007 ss7 1003 ss3 1011 ss11 Time taken: 0.078 seconds, Fetched: 16 row(s)
hive (yinzhengjie)> select * from stu_buck tablesample(0.1 percent) ; #Note: stu_buck is a bucketed table with 4 buckets, so not all of its rows come back; the sample returned here is the data of a single one of its four buckets OK stu_buck.id stu_buck.name 1016 ss16 1012 ss12 1008 ss8 1004 ss4 Time taken: 0.04 seconds, Fetched: 4 row(s)
hive (yinzhengjie)> select * from stu tablesample(0.1 percent) ; OK stu.id stu.name 1001 ss1 1002 ss2 1003 ss3 1004 ss4 1005 ss5 1006 ss6 1007 ss7 1008 ss8 1009 ss9 1010 ss10 1011 ss11 1012 ss12 1013 ss13 1014 ss14 1015 ss15 1016 ss16 Time taken: 0.059 seconds, Fetched: 16 row(s) hive (yinzhengjie)>
5>.Functions
hive (yinzhengjie)> show functions; #List the built-in functions
hive (yinzhengjie)> desc function xpath; #Show the usage of a built-in function
hive (yinzhengjie)> desc function extended xpath; #Show the detailed usage of a built-in function
For user-defined functions, see: https://www.cnblogs.com/yinzhengjie/p/9154359.html