  • Hadoop Ecosystem: Hive Quick Start, Basic HQL Syntax







    1>. The default data warehouse is originally located under the path "hdfs:/user/hive/warehouse/"
            <description>location of default database for the warehouse</description>


        <description>Whether to print the names of the columns in query output.</description>
        <description>Whether to include the current database in the Hive prompt.</description>



    [yinzhengjie@s101 ~]$ cd /soft/hive/conf/
    [yinzhengjie@s101 conf]$ 
    [yinzhengjie@s101 conf]$ cp hive-log4j2.properties.template hive-log4j2.properties                        #Copy the template to create the configuration file
    [yinzhengjie@s101 conf]$ grep property.hive.log.dir  hive-log4j2.properties  | grep -v ^#                
    property.hive.log.dir = /home/yinzhengjie/hive/logs                                                        #Directory where the logs are stored
    [yinzhengjie@s101 conf]$ 
    [yinzhengjie@s101 conf]$ ll /home/yinzhengjie/hive/logs/hive.log 
    -rw-rw-r-- 1 yinzhengjie yinzhengjie 4265 Aug  5 21:20 /home/yinzhengjie/hive/logs/hive.log                #Restart Hive, then check the contents of the log file
    [yinzhengjie@s101 conf]$ 


    1>. View all current configuration settings (hive (yinzhengjie)> set;)
            Default configuration file:         hive-default.xml
            User-defined configuration file:    hive-site.xml
            [yinzhengjie@s101 ~]$ hive -hiveconf mapred.reduce.tasks=10
            Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
            hive (default)> set mapred.reduce.tasks;
            hive (default)> quit;
            [yinzhengjie@s101 ~]$ 
            [yinzhengjie@s101 ~]$ hive
            hive (default)> set mapred.reduce.tasks;
            hive (default)> exit;
            [yinzhengjie@s101 ~]$ 
            [yinzhengjie@s101 ~]$ hive
            hive (default)> set mapred.reduce.tasks;
            hive (default)> set mapred.reduce.tasks=100;
            hive (default)> set mapred.reduce.tasks;
            hive (default)> quit;
            [yinzhengjie@s101 ~]$ 
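The three sessions above set mapred.reduce.tasks in different ways: on the command line with -hiveconf, not at all, and inside the session with set. Hive resolves a parameter from the most specific source first: a set statement wins over -hiveconf, which wins over hive-site.xml, which wins over hive-default.xml. Below is a minimal Python sketch of that lookup order. It is only a model, not Hive code, and the default value -1 is an assumption used for illustration.

```python
from collections import ChainMap

# Illustrative values only; the lookup order is the point of the sketch.
hive_default = {"mapred.reduce.tasks": "-1"}   # hive-default.xml (assumed default)
hive_site = {}                                  # hive-site.xml (nothing overridden here)
command_line = {"mapred.reduce.tasks": "10"}    # hive -hiveconf mapred.reduce.tasks=10
session = {}                                    # filled by `set key=value;`

# ChainMap searches its maps left to right, so the most specific source goes first.
effective = ChainMap(session, command_line, hive_site, hive_default)

before_set = effective["mapred.reduce.tasks"]   # value supplied by -hiveconf
session["mapred.reduce.tasks"] = "100"          # set mapred.reduce.tasks=100;
after_set = effective["mapred.reduce.tasks"]    # the session value now wins

print(before_set, after_set)
```

A value given with -hiveconf or set lasts only for that Hive session, which is why the second session in the transcript starts again from the configuration files.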







































    ‘now is the time’ “for all good men”






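      Hive has three complex data types: ARRAY, MAP, and STRUCT. ARRAY and MAP are similar to Array and Map in Java, while STRUCT is similar to a struct in C: it encapsulates a set of named fields. Complex data types can be nested to any depth.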





    Like a struct in C, its elements are accessed with dot notation. For example, if a column's data type is STRUCT{first STRING, last STRING}, the first element can be referenced as column.first.






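    An array is a collection of variables that share the same type and name. These variables are called the elements of the array, and each element has an index that starts from zero. For example, if the array value is ['John', 'Doe'], the second element can be referenced as array_name[1].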








      Tip: use CAST to convert between data types explicitly. For example, CAST('1' AS INT) converts the string '1' to the integer 1. If the cast fails, as in CAST('X' AS INT), the expression returns NULL.
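The key point of the tip is that a failed CAST does not raise an error; it yields NULL. Here is a small Python sketch of that behaviour, with None standing in for NULL. The function name cast_to_int is hypothetical, and Python's int() does not accept exactly the same strings as Hive, so treat this as a model of the two cases in the tip only.

```python
from typing import Optional


def cast_to_int(value: str) -> Optional[int]:
    """Model of CAST(value AS INT): the integer on success, None (NULL) on failure."""
    try:
        return int(value)
    except ValueError:
        # Hive returns NULL instead of failing the query.
        return None


print(cast_to_int("1"))
print(cast_to_int("X"))
```

Because the failure is silent, it is worth checking for NULL after casting columns that may contain dirty data.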




    [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/test.txt 
    [yinzhengjie@s101 download]$ 


    create table test(
        name string,
        friends array<string>,
        children map<string, int>,
        address struct<street:string, city:string>
    )
    row format delimited fields terminated by ','
    collection items terminated by '_'
    map keys terminated by ':'
    lines terminated by '\n';


    hive (yinzhengjie)>  load data local inpath '/home/yinzhengjie/download/test.txt' into table test;
    Loading data to table yinzhengjie.test
    Time taken: 0.335 seconds
    hive (yinzhengjie)> select * from test;
    test.name    test.friends    test.children    test.address
    漩涡鸣人    ["我爱罗","佐助"]    {"漩涡博人":18,"漩涡向日葵":16}    {"street":"一乐拉面附近","city":"木业忍者村"}
    宇智波富岳    ["宇智波美琴","志村团藏"]    {"宇智波鼬":28,"宇智波佐助":19}    {"street":"木叶警务部","city":"木业忍者村"}
    Time taken: 0.099 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> 


    hive (yinzhengjie)> select * from test;
    test.name    test.friends    test.children    test.address
    漩涡鸣人    ["我爱罗","佐助"]    {"漩涡博人":18,"漩涡向日葵":16}    {"street":"一乐拉面附近","city":"木业忍者村"}
    宇智波富岳    ["宇智波美琴","志村团藏"]    {"宇智波鼬":28,"宇智波佐助":19}    {"street":"木叶警务部","city":"木业忍者村"}
    Time taken: 0.085 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> select friends[0],children['漩涡博人'],address.city from test where name="漩涡鸣人";
    _c0    _c1    city
    我爱罗    18    木业忍者村
    Time taken: 0.096 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> select friends[1],children['漩涡向日葵'],address.city from test where name="漩涡鸣人";
    _c0    _c1    city
    佐助    16    木业忍者村
    Time taken: 0.1 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
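To see how the three delimiters in the table definition cooperate, here is a Python sketch that splits one text line the way the delimited row format does: ',' separates fields, '_' separates the items of an array, map, or struct, and ':' separates a map key from its value. The function parse_row is hypothetical, and the input line is an assumption reconstructed from the query output above, since the contents of test.txt are not shown.

```python
def parse_row(line: str) -> dict:
    """Split one line into the four columns of the test table."""
    # fields terminated by ','
    name, friends, children, address = line.split(",")
    # collection items terminated by '_' (struct fields are split the same way)
    street, city = address.split("_")
    return {
        "name": name,
        "friends": friends.split("_"),
        # map keys terminated by ':'
        "children": {k: int(v) for k, v in (kv.split(":") for kv in children.split("_"))},
        "address": {"street": street, "city": city},
    }


# Assumed input line, reconstructed from the select output.
row = parse_row("漩涡鸣人,我爱罗_佐助,漩涡博人:18_漩涡向日葵:16,一乐拉面附近_木业忍者村")

# Same access pattern as: select friends[0], children['漩涡博人'], address.city
print(row["friends"][0], row["children"]["漩涡博人"], row["address"]["city"])
```

The three lookups correspond to array indexing, map key lookup, and struct dot access in the HQL query.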



    [yinzhengjie@s101 ~]$ more `which xcall.sh`
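    #!/bin/bash
    #@author :yinzhengjie
    #Run the given command on every node, s101 through s105
    if [ $# -lt 1 ];then
            echo "Please supply a command"
            exit
    fi
    cmd=$@
    for (( i=101;i<=105;i++ ))
    do
            tput setaf 2
            echo ============= s$i $cmd ============
            tput setaf 7
            ssh s$i $cmd
            if [ $? == 0 ];then
                    echo "Command succeeded"
            fi
    done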
    [yinzhengjie@s101 ~]$ 
    Script for running a command on every cluster node ([yinzhengjie@s101 ~]$ more `which xcall.sh`)
    [yinzhengjie@s101 ~]$ more `which start-dfs.sh` | grep -v ^# | grep -v ^$
    usage="Usage: start-dfs.sh [-upgrade|-rollback] [other options such as -clusterId]"
    bin=`dirname "${BASH_SOURCE-$0}"`
    bin=`cd "$bin"; pwd`
    . $HADOOP_LIBEXEC_DIR/hdfs-config.sh
    if [[ $# -ge 1 ]]; then
      case "$startOpt" in
          echo $usage
          exit 1
    nameStartOpt="$nameStartOpt $@"
    NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)
    echo "Starting namenodes on [$NAMENODES]"
      --config "$HADOOP_CONF_DIR" 
      --hostnames "$NAMENODES" 
      --script "$bin/hdfs" start namenode $nameStartOpt
    if [ -n "$HADOOP_SECURE_DN_USER" ]; then
        "Attempting to start secure cluster, skipping datanodes. " 
        "Run start-secure-dns.sh as root to complete startup."
        --config "$HADOOP_CONF_DIR" 
        --script "$bin/hdfs" start datanode $dataStartOpt
    SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)
    if [ -n "$SECONDARY_NAMENODES" ]; then
      echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"
          --config "$HADOOP_CONF_DIR" 
          --hostnames "$SECONDARY_NAMENODES" 
          --script "$bin/hdfs" start secondarynamenode
    SHARED_EDITS_DIR=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.namenode.shared.edits.dir 2>&-)
    case "$SHARED_EDITS_DIR" in
      JOURNAL_NODES=$(echo "$SHARED_EDITS_DIR" | sed 's,qjournal://\([^/]*\)/.*,\1,g; s/;/ /g; s/:[0-9]*//g')
      echo "Starting journal nodes [$JOURNAL_NODES]"
          --config "$HADOOP_CONF_DIR" 
          --hostnames "$JOURNAL_NODES" 
          --script "$bin/hdfs" start journalnode ;;
    AUTOHA_ENABLED=$($HADOOP_PREFIX/bin/hdfs getconf -confKey dfs.ha.automatic-failover.enabled)
    if [ "$(echo "$AUTOHA_ENABLED" | tr A-Z a-z)" = "true" ]; then
      echo "Starting ZK Failover Controllers on NN hosts [$NAMENODES]"
        --config "$HADOOP_CONF_DIR" 
        --hostnames "$NAMENODES" 
        --script "$bin/hdfs" start zkfc
    [yinzhengjie@s101 ~]$
    HDFS distributed file system startup script ([yinzhengjie@s101 ~]$ more `which start-dfs.sh` | grep -v ^# | grep -v ^$)
    [yinzhengjie@s101 ~]$ cat /soft/hadoop/sbin/start-yarn.sh | grep -v ^# | grep -v ^$
    echo "starting yarn daemons"
    bin=`dirname "${BASH_SOURCE-$0}"`
    bin=`cd "$bin"; pwd`
    . $HADOOP_LIBEXEC_DIR/yarn-config.sh
    "$bin"/yarn-daemon.sh --config $YARN_CONF_DIR  start resourcemanager
    "$bin"/yarn-daemons.sh --config $YARN_CONF_DIR  start nodemanager
    [yinzhengjie@s101 ~]$
    YARN startup script ([yinzhengjie@s101 ~]$ cat /soft/hadoop/sbin/start-yarn.sh | grep -v ^# | grep -v ^$)
    [yinzhengjie@s101 ~]$ more `which xzk.sh`
    #!/bin/bash
    #@author :yinzhengjie
    if [ $# -ne 1 ];then
        echo "Invalid argument; usage: $0  {start|stop|restart|status}"
        exit
    fi
    cmd=$1
    function zookeeperManger(){
        case $cmd in
        start)
            echo "Starting service"
            remoteExecution start
            ;;
        stop)
            echo "Stopping service"
            remoteExecution stop
            ;;
        restart)
            echo "Restarting service"
            remoteExecution restart
            ;;
        status)
            echo "Checking status"
            remoteExecution status
            ;;
        *)
            echo "Invalid argument; usage: $0  {start|stop|restart|status}"
            ;;
        esac
    }
    function remoteExecution(){
        for (( i=102 ; i<=104 ; i++ )) ; do
                tput setaf 2
                echo ========== s$i zkServer.sh  $1 ================
                tput setaf 9
                ssh s$i  "source /etc/profile ; zkServer.sh $1"
        done
    }
    zookeeperManger
    [yinzhengjie@s101 ~]$ 
    ZooKeeper startup script ([yinzhengjie@s101 ~]$ more `which xzk.sh`)
    [yinzhengjie@s101 ~]$ xzk.sh start
    ========== s102 zkServer.sh start ================
    ZooKeeper JMX enabled by default
    Using config: /soft/zk/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    ========== s103 zkServer.sh start ================
    ZooKeeper JMX enabled by default
    Starting zookeeper ... Using config: /soft/zk/bin/../conf/zoo.cfg
    ========== s104 zkServer.sh start ================
    ZooKeeper JMX enabled by default
    Using config: /soft/zk/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ xcall.sh jps
    ============= s101 jps ============
    6232 Jps
    ============= s102 jps ============
    4081 QuorumPeerMain
    4110 Jps
    ============= s103 jps ============
    4044 QuorumPeerMain
    4079 Jps
    ============= s104 jps ============
    4076 Jps
    4047 QuorumPeerMain
    ============= s105 jps ============
    3383 Jps
    [yinzhengjie@s101 ~]$ 
    Start ZooKeeper ([yinzhengjie@s101 ~]$ xzk.sh start)
    [yinzhengjie@s101 ~]$ start-dfs.sh 
    Starting namenodes on [s101 s105]
    s101: starting namenode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-namenode-s101.out
    s105: starting namenode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-namenode-s105.out
    s103: starting datanode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-datanode-s103.out
    s102: starting datanode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-datanode-s102.out
    s104: starting datanode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-datanode-s104.out
    Starting journal nodes [s102 s103 s104]
    s102: starting journalnode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-journalnode-s102.out
    s103: starting journalnode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-journalnode-s103.out
    s104: starting journalnode, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-journalnode-s104.out
    Starting ZK Failover Controllers on NN hosts [s101 s105]
    s101: starting zkfc, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-zkfc-s101.out
    s105: starting zkfc, logging to /soft/hadoop-2.7.3/logs/hadoop-yinzhengjie-zkfc-s105.out
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ xcall.sh jps
    ============= s101 jps ============
    6755 Jps
    6380 NameNode
    6685 DFSZKFailoverController
    ============= s102 jps ============
    4240 JournalNode
    4081 QuorumPeerMain
    4159 DataNode
    4335 Jps
    ============= s103 jps ============
    4304 Jps
    4130 DataNode
    4211 JournalNode
    4044 QuorumPeerMain
    ============= s104 jps ============
    4300 Jps
    4125 DataNode
    4047 QuorumPeerMain
    4207 JournalNode
    ============= s105 jps ============
    3538 DFSZKFailoverController
    3436 NameNode
    3597 Jps
    [yinzhengjie@s101 ~]$ 
    Start the HDFS distributed file system ([yinzhengjie@s101 ~]$ start-dfs.sh )
    [yinzhengjie@s101 ~]$ start-yarn.sh 
    starting yarn daemons
    s101: starting resourcemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-resourcemanager-s101.out
    s105: starting resourcemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-resourcemanager-s105.out
    s103: starting nodemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-nodemanager-s103.out
    s102: starting nodemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-nodemanager-s102.out
    s104: starting nodemanager, logging to /soft/hadoop-2.7.3/logs/yarn-yinzhengjie-nodemanager-s104.out
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ xcall.sh jps
    ============= s101 jps ============
    6883 ResourceManager
    6982 Jps
    6380 NameNode
    6685 DFSZKFailoverController
    ============= s102 jps ============
    4240 JournalNode
    4081 QuorumPeerMain
    4387 NodeManager
    4424 Jps
    4159 DataNode
    ============= s103 jps ============
    4130 DataNode
    4211 JournalNode
    4356 NodeManager
    4436 Jps
    4044 QuorumPeerMain
    ============= s104 jps ============
    4352 NodeManager
    4390 Jps
    4125 DataNode
    4047 QuorumPeerMain
    4207 JournalNode
    ============= s105 jps ============
    3538 DFSZKFailoverController
    3436 NameNode
    3710 Jps
    [yinzhengjie@s101 ~]$ 
    Start YARN resource scheduling ([yinzhengjie@s101 ~]$ start-yarn.sh )


    [yinzhengjie@s101 download]$ cat teachers.txt 
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    [yinzhengjie@s101 download]$ 
    [yinzhengjie@s101 download]$ cat teachers.txt
    [yinzhengjie@s101 ~]$ hive -help
    usage: hive
     -d,--define <key=value>          Variable subsitution to apply to hive
                                      commands. e.g. -d A=B or --define A=B
        --database <databasename>     Specify the database to use
     -e <quoted-query-string>         SQL from command line
     -f <filename>                    SQL from files
     -H,--help                        Print help information
        --hiveconf <property=value>   Use value for given property
        --hivevar <key=value>         Variable subsitution to apply to hive
                                      commands. e.g. --hivevar A=B
     -i <filename>                    Initialization SQL file
     -S,--silent                      Silent mode in interactive shell
     -v,--verbose                     Verbose mode (echo executed SQL to the
    [yinzhengjie@s101 ~]$ 
    View the help information ([yinzhengjie@s101 ~]$ hive -help)
    [yinzhengjie@s101 ~]$ hive
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Log in to the Hive interactive shell ([yinzhengjie@s101 ~]$ hive)
    [yinzhengjie@s101 ~]$ hive
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> show databases;
    Time taken: 0.01 seconds, Fetched: 2 row(s)
    List the existing databases (hive> show databases;)
    [yinzhengjie@s101 ~]$ hive
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> show databases;
    Time taken: 0.008 seconds, Fetched: 2 row(s)
    hive> use yinzhengjie;
    Time taken: 0.018 seconds
    Switch to an existing database (hive> use yinzhengjie;)
    hive> show databases;
    Time taken: 0.008 seconds, Fetched: 2 row(s)
    hive> use yinzhengjie;
    Time taken: 0.018 seconds
    hive> show tables;
    Time taken: 0.025 seconds, Fetched: 7 row(s)
    List the tables that exist in the current database (hive> show tables;)
    hive> show databases;
    Time taken: 0.008 seconds, Fetched: 2 row(s)
    hive> use yinzhengjie;
    Time taken: 0.018 seconds
    hive> show tables;
    Time taken: 0.025 seconds, Fetched: 7 row(s)
    hive> create table Teacher(id int,name string)row format delimited fields terminated by '\t';
    Time taken: 0.626 seconds
    hive> show tables;
    Time taken: 0.028 seconds, Fetched: 8 row(s)
    Create a teacher table (hive> create table Teacher(id int,name string)row format delimited fields terminated by '\t';)
    hive> show tables;
    Time taken: 0.022 seconds, Fetched: 2 row(s)
    hive> select * from teacher;
    Time taken: 0.105 seconds
    hive> load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;
    Loading data to table yinzhengjie.teacher
    Time taken: 0.256 seconds
    hive> select * from teacher;
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.104 seconds, Fetched: 9 row(s)
    Load data from the local file system into an existing Hive table (hive> load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;)
    hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/logs/umeng/raw-log/201808/06/2346' into table raw_logs partition(ym=201808 , day=06 ,hm=2346);
    Loading data to table yinzhengjie.raw_logs partition (ym=201808, day=6, hm=2346)
    Time taken: 1.846 seconds
    hive (yinzhengjie)>
    Load data from HDFS into an existing Hive table (hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/logs/umeng/raw-log/201808/06/2346' into table raw_logs partition(ym=201808 , day=06 ,hm=2346);)
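Note that the statement was written with day=06 while Hive reported day=6. That is consistent with integer partition columns, where the value is normalized to a number. Each partition is then stored in its own subdirectory named column=value. The Python sketch below models both points. The function partition_dir and the raw_logs table directory are assumptions, because the post does not show the DDL or the location of raw_logs.

```python
def partition_dir(table_dir: str, spec: dict) -> str:
    """Build the directory of one partition, assuming every partition column is an int."""
    # int() drops the leading zero, so "06" becomes 6, matching the message Hive printed.
    parts = [f"{column}={int(value)}" for column, value in spec.items()]
    return "/".join([table_dir.rstrip("/")] + parts)


# Hypothetical table directory; the real location of raw_logs is not shown in the post.
path = partition_dir(
    "/user/hive/warehouse/yinzhengjie.db/raw_logs",
    {"ym": "201808", "day": "06", "hm": "2346"},
)
print(path)
```

One directory level per partition column is what lets a query that filters on ym, day, or hm read only the matching directories.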
    [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/umeng_create_logs_ddl.sql
    use yinzhengjie ;
    create table if not exists startuplogs
      appChannel             string , 
      appId             string , 
      appPlatform             string , 
      appVersion             string , 
      brand             string , 
      carrier             string , 
      country             string , 
      createdAtMs             bigint , 
      deviceId             string , 
      deviceStyle             string , 
      ipAddress             string , 
      network             string , 
      osType             string , 
      province             string , 
      screenSize             string , 
      tenantId             string 
    partitioned by (ym int ,day int , hm int) 
    stored as parquet ;
    create table if not exists eventlogs
      appChannel             string , 
      appId             string , 
      appPlatform             string , 
      appVersion             string , 
      createdAtMs             bigint , 
      deviceId             string , 
      deviceStyle             string , 
      eventDurationSecs             bigint , 
      eventId             string , 
      osType             string , 
      tenantId             string 
    partitioned by (ym int ,day int , hm int) 
    stored as parquet ;
    create table if not exists errorlogs
      appChannel             string , 
      appId             string , 
      appPlatform             string , 
      appVersion             string , 
      createdAtMs             bigint , 
      deviceId             string , 
      deviceStyle             string , 
      errorBrief             string , 
      errorDetail             string , 
      osType             string , 
      tenantId             string 
    partitioned by (ym int ,day int , hm int) 
    stored as parquet ;
    create table if not exists usagelogs
      appChannel             string , 
      appId             string , 
      appPlatform             string , 
      appVersion             string , 
      createdAtMs             bigint , 
      deviceId             string , 
      deviceStyle             string , 
      osType             string , 
      singleDownloadTraffic             bigint , 
      singleUploadTraffic             bigint , 
      singleUseDurationSecs             bigint , 
      tenantId             string 
    partitioned by (ym int ,day int , hm int) 
    stored as parquet ;
    create table if not exists pagelogs
      appChannel             string , 
      appId             string , 
      appPlatform             string , 
      appVersion             string , 
      createdAtMs             bigint , 
      deviceId             string , 
      deviceStyle             string , 
      nextPage             string , 
      osType             string , 
      pageId             string , 
      pageViewCntInSession             int , 
      stayDurationSecs             bigint , 
      tenantId             string , 
      visitIndex             int 
    partitioned by (ym int ,day int , hm int) 
    stored as parquet ;
    [yinzhengjie@s101 download]$ 
    HQL test statements ([yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/umeng_create_logs_ddl.sql)
    hive (yinzhengjie)> show tables;
    Time taken: 0.044 seconds, Fetched: 6 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> source /home/yinzhengjie/download/umeng_create_logs_ddl.sql;
    Time taken: 0.008 seconds
    Time taken: 0.257 seconds
    Time taken: 0.058 seconds
    Time taken: 0.073 seconds
    Time taken: 0.065 seconds
    Time taken: 0.053 seconds
    hive (yinzhengjie)> show tables;
    Time taken: 0.014 seconds, Fetched: 11 row(s)
    hive (yinzhengjie)> 
    Run a text file of HQL statements from inside Hive (hive (yinzhengjie)> source /home/yinzhengjie/download/umeng_create_logs_ddl.sql;)
    [yinzhengjie@s101 ~]$ hive
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt;
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    View the contents of a file in HDFS from the Hive command line (hive> dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt;)
    [yinzhengjie@s101 ~]$ hive
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> ! ls /home/yinzhengjie/download;
    List files on the local Linux file system from the Hive command line (hive> ! ls /home/yinzhengjie/download;)
    [yinzhengjie@s101 download]$ cat ~/.hivehistory 
    show databases;
    show databases;
    create table(id int,name string) row format delimited
    fields terminated by '\t'
    lines terminated by '\n'
    stored as textfile;
    create table users(id int , name string) row format delimited
    fields terminated by '\t'
    lines terminated by '\n'
    stored as textfile;
    load data local inpath 'user.txt' into table users;
    !cd /home/yinzhengjie
    load data local inpath 'user.txt' into table users;
    load data inpath  'user.txt' into table users;
    hdfs dfs -put 'user.txt';
    hdfs dfs put 'user.txt';
    dfs put 'user.txt';
    dfs -put 'user.txt';
    dfs -put 'user.txt' /;
    dfs -put user.txt ;
    dfs -put user.txt /;
    load data inpath  'user.txt' into table users;
    load data inpath  '/user.txt' into table users;
    show databases;
    use yinzhengjie
    show tables;
    SET hive.support.concurrency = true;
    show tables;
    use yinzhengjie;
    show tables;
    select * from yzj;
    SET hive.support.concurrency = true;
    SET hive.enforce.bucketing = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    SET hive.compactor.initiator.on = true;
    SET hive.compactor.worker.threads = 1;
    select * from yzj;
    use yinzhengjie;
    SET hive.support.concurrency = true;
    SET hive.enforce.bucketing = true;
    SET hive.exec.dynamic.partition.mode = nonstrict;
    SET hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager;
    SET hive.compactor.initiator.on = true;
    SET hive.compactor.worker.threads = 1;
    show tables;
    select * from yzj;
    show databases;
    use yinzhengjie;
    show tables;
    show databases;
    use yinzhengjie;
    show tables;
    select * from az_top3;
    show databases;
    use yinzhengjie
    show tables;
    use yinzhengjie;
    show databases;
    use yinzhengjie;
    show tables;
    create table Teacher(id int,name string)row format delimited fields terminated by '\t';
    show tables;
    load data local inpath '/home/yinzhengjie/download/teachers.txt'
    show tables;
    drop table taacher;
    show databases;
    use yinzhengjie;
    show tables;
    drop table teacher;
    show tables;
    show tables;
    create table Teacher(id int,name string)row format delimited fields terminated by '\t';
    show tables;
    drop table test1,test2,test3;
    drop table test1;
    drop table test2;
    drop table test3;
    drop table test4;
    show tables;
    drop table az_top3;
    drop table az_wc;
    show tbales;
    show databasers;
    show databases;
    drop database yinzhengjie;
    use yinzhengjie;
    show tables;
    drop table teacher;
    show tables;
    create table Teacher(id int,name string)row format delimited fields terminated by '\t';
    show tables;
    load data local inpath '/home/yinzhengjie/download/teachers.txt';
    load data local inpath `/home/yinzhengjie/download/teachers.txt`;
    use yinzhengjie
    show tables;
    load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;
    select * from teacher;
    drop table teacher;
    create table Teacher(id int,name string)row format delimited fields terminated by '\t';
    show tables;
    select * from teacher;
    load data local inpath '/home/yinzhengjie/download/teachers.txt' into table yinzhengjie.teacher;
    select * from teacher;
    dfs -cat /user/hive/warehouse/yinzhengjie.db/teacher/teachers.txt;
    dfs -lsr /;
    ! ls /home/yinzhengjie;
    ! ls /home/yinzhengjie/download;
    [yinzhengjie@s101 download]$ 
    View the history of all commands entered in Hive ([yinzhengjie@s101 download]$ cat ~/.hivehistory )
    [yinzhengjie@s101 download]$ hive -e "select * from yinzhengjie.teacher;"
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 3.414 seconds, Fetched: 9 row(s)
    [yinzhengjie@s101 download]$
    Run an HQL statement from the shell command line ([yinzhengjie@s101 download]$ hive -e "select * from yinzhengjie.teacher;")
    [yinzhengjie@s101 download]$ hive -f /home/yinzhengjie/download/hivef.sql 
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Time taken: 0.023 seconds
    Time taken: 0.085 seconds, Fetched: 2 row(s)
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 2.044 seconds, Fetched: 9 row(s)
    [yinzhengjie@s101 download]$ 
    Run a script file of HQL statements ([yinzhengjie@s101 download]$ hive -f /home/yinzhengjie/download/hivef.sql )
    [yinzhengjie@s101 ~]$ hive
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> quit;
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ hive
    Logging initialized using configuration in jar:file:/soft/apache-hive-2.1.1-bin/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
    Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    hive> exit;
    [yinzhengjie@s101 ~]$ 
    Exit the Hive shell (hive> exit; or hive> quit;)


    hive (yinzhengjie)> show databases;
    Time taken: 0.007 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> create database if not exists db_hive;
    Time taken: 0.034 seconds
    hive (yinzhengjie)> show databases;
    Time taken: 0.009 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> 
    The standard way to create a database (hive (yinzhengjie)> create database if not exists db_hive;). By default, the new database is stored under "/user/hive/warehouse" in HDFS
    hive (yinzhengjie)> show databases;
    Time taken: 0.008 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> create database if not exists db_hive2 location "/db_hive2";
    Time taken: 0.04 seconds
    hive (yinzhengjie)> show databases;
    Time taken: 0.006 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
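    Create a database, using the location keyword to choose where it is stored on HDFS and to give its directory a custom name (hive (yinzhengjie)> create database if not exists db_hive2 location "/db_hive2";). I do not recommend this approach, because it closely resembles the way the default database is stored
        Users can run the ALTER DATABASE command to set key-value pairs in a database's DBPROPERTIES, which describe the properties of that database.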
        hive (yinzhengjie)> show databases;
        Time taken: 0.007 seconds, Fetched: 4 row(s)
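        hive (yinzhengjie)> ALTER DATABASE db_hive set dbproperties('Owner'='yinzhengjie');        #Add an extra property to the database. Note: this only adds a key-value pair; the database's other metadata, such as its name and location, is not changed!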
        Time taken: 0.03 seconds
        hive (yinzhengjie)> desc database db_hive;                                                #This command does not show the property we just defined
        db_name    comment    location    owner_name    owner_type    parameters
        db_hive        hdfs://mycluster/user/hive/warehouse/db_hive.db    yinzhengjie    USER    
        Time taken: 0.017 seconds, Fetched: 1 row(s)
        hive (yinzhengjie)> desc database extended db_hive;                                        #Add the extended keyword before the database name to see the properties we defined
        db_name    comment    location    owner_name    owner_type    parameters
        db_hive        hdfs://mycluster/user/hive/warehouse/db_hive.db    yinzhengjie    USER    {Owner=yinzhengjie}
        Time taken: 0.011 seconds, Fetched: 1 row(s)
        hive (yinzhengjie)> 
    Modify database properties ( hive (yinzhengjie)> ALTER DATABASE db_hive set dbproperties('Owner'='yinzhengjie'); )
    hive (yinzhengjie)> show databases;                                #Show all databases
    Time taken: 0.008 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> show databases like 'yin*';                    #Filter the databases shown by a pattern
    Time taken: 0.009 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> desc database db_hive;                          #Show database information
    db_name    comment    location    owner_name    owner_type    parameters
    db_hive        hdfs://mycluster/user/hive/warehouse/db_hive.db    yinzhengjie    USER    
    Time taken: 0.012 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> desc database extended db_hive;                  #Show detailed database information with the extended keyword
    db_name    comment    location    owner_name    owner_type    parameters
    db_hive        hdfs://mycluster/user/hive/warehouse/db_hive.db    yinzhengjie    USER    {Owner=yinzhengjie}
    Time taken: 0.013 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> show databases;
    Time taken: 0.006 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> use default;                                     #Switch to a database
    Time taken: 0.012 seconds
    hive (default)> 
    Common ways to query databases (hive (yinzhengjie)> show databases like 'yin*';)
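The pattern in show databases like is not a full regular expression: '*' matches any run of characters and '|' separates alternatives. The Python sketch below models that matching. The function show_databases_like is hypothetical, and the list of database names is assumed from the databases that appear in this post.

```python
import re


def show_databases_like(names, pattern):
    """Return the names matching a pattern where '*' is a wildcard and '|' means 'or'."""
    # Escape everything, then turn the escaped '*' back into "any characters".
    alternatives = [re.escape(part).replace(r"\*", ".*") for part in pattern.split("|")]
    regex = re.compile("|".join(f"(?:{alt})" for alt in alternatives), re.IGNORECASE)
    return [name for name in names if regex.fullmatch(name)]


# Database names assumed from the transcripts in this post.
databases = ["default", "yinzhengjie", "db_hive", "db_hive2"]

print(show_databases_like(databases, "yin*"))
print(show_databases_like(databases, "db_*|default"))
```

With four databases present, the pattern 'yin*' leaves a single match, which agrees with the "Fetched: 1 row(s)" line in the transcript.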
    hive (yinzhengjie)> show databases;
    Time taken: 0.006 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> use db_hive2;                                #Switch to the db_hive2 database
    Time taken: 0.014 seconds
    hive (db_hive2)> show tables;                                    #The db_hive2 database contains no tables
    Time taken: 0.015 seconds
    hive (db_hive2)> drop database if exists db_hive2;                # drop the empty database db_hive2
    Time taken: 0.05 seconds
    hive (db_hive2)> show databases;
    Time taken: 0.006 seconds, Fetched: 3 row(s)
    hive (db_hive2)> use db_hive;                                    # switch to the db_hive database
    Time taken: 0.012 seconds
    hive (db_hive)> show tables;                                    # db_hive does contain tables
    Time taken: 0.016 seconds, Fetched: 3 row(s)
    hive (db_hive)> drop database db_hive cascade;                    # the cascade keyword force-drops db_hive even though it contains tables
    Time taken: 0.304 seconds
    hive (db_hive)> use yinzhengjie;
    Time taken: 0.016 seconds
    hive (yinzhengjie)> show databases;
    Time taken: 0.007 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> 
    Common ways to drop a database (hive (db_hive)> drop database db_hive cascade;)
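The two drop forms used in the transcript can be contrasted in a short sketch. The database names `demo_empty` and `demo_full` and the table `t` are hypothetical and exist only for this illustration; RESTRICT is Hive's default behaviour and is written out here only to make the contrast explicit.

```sql
-- Hypothetical databases used only for illustration.
CREATE DATABASE IF NOT EXISTS demo_empty;
CREATE DATABASE IF NOT EXISTS demo_full;
CREATE TABLE IF NOT EXISTS demo_full.t (id INT);

-- RESTRICT (the default) only succeeds when the database holds no tables.
DROP DATABASE IF EXISTS demo_empty RESTRICT;

-- CASCADE drops the contained tables first, then the database itself.
DROP DATABASE IF EXISTS demo_full CASCADE;
```

Adding IF EXISTS to both statements keeps a script from failing when the database has already been removed.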
        [(col_name data_type [COMMENT col_comment], ...)] 
        [COMMENT table_comment] 
        [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] 
        [CLUSTERED BY (col_name, col_name, ...) 
        [SORTED BY (col_name [ASC|DESC], ...)] INTO num_buckets BUCKETS] 
        [ROW FORMAT row_format] 
        [STORED AS file_format] 
        [LOCATION hdfs_path]
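        a>.CREATE TABLE creates a table with the given name. If a table with the same name already exists, an exception is thrown; the IF NOT EXISTS option suppresses that exception.
        d>.PARTITIONED BY creates a partitioned table.
        e>.CLUSTERED BY creates a bucketed table.
        f>.SORTED BY is rarely used.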
        g>.ROW FORMAT 
                            [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] 
                       | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
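                Users can supply a custom SerDe or use the built-in one when creating a table. If ROW FORMAT is omitted, or ROW FORMAT DELIMITED is specified, the built-in SerDe is used. The table's columns are declared in the same statement, and Hive relies on the SerDe to map each record of the underlying data onto those columns.
        h>.STORED AS specifies the storage file format.
                If the data is plain text, use STORED AS TEXTFILE. If the data needs to be compressed, use STORED AS SEQUENCEFILE.
        i>.LOCATION: specifies where the table is stored on HDFS.
        Tables are created as managed tables by default, sometimes also called internal tables, because Hive (more or less) controls the lifecycle of their data. By default Hive stores the data of these tables in a subdirectory of the directory defined by hive.metastore.warehouse.dir (for example, /user/hive/warehouse). When a managed table is dropped, Hive also deletes the table's data. Managed tables are not well suited to sharing data with other tools.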
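    Managed table - the standard way to create an ordinary table, specifying the storage format and the database directory in which the table is created (hive (yinzhengjie)> create table if not exists Student(id int,name string)row format delimited fields terminated by ' ' stored as textfile location '/user/hive/warehouse/yinzhengjie.db';)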
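To see how the optional clauses described above fit together in one statement, here is a minimal sketch. The table name `demo_orders`, its columns, the comments, the bucket count and the `/tmp/demo_orders` location are all hypothetical placeholders, not something taken from the original walkthrough.

```sql
-- Hypothetical table; names, comments and the location are placeholders.
CREATE TABLE IF NOT EXISTS demo_orders (
    order_id  INT     COMMENT 'order identifier',
    customer  STRING  COMMENT 'customer name',
    amount    DOUBLE  COMMENT 'order amount'
)
COMMENT 'sketch of the optional CREATE TABLE clauses'
-- The partition column is declared here, not in the column list above.
PARTITIONED BY (dt STRING COMMENT 'load date')
-- Bucketing by order_id, with rows sorted inside each bucket.
CLUSTERED BY (order_id) SORTED BY (order_id ASC) INTO 4 BUCKETS
-- DELIMITED selects the built-in text SerDe with these separators.
ROW FORMAT DELIMITED
    FIELDS TERMINATED BY '\t'
    LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/tmp/demo_orders';
```

The clauses have to appear in this order; leaving any of them out falls back to Hive's defaults (no partitions, no buckets, text storage under the warehouse directory).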
    hive (yinzhengjie)> show tables;
    Time taken: 0.015 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> create table if not exists teacherbak as select id, name from teacher;                # create a table from a query result: the rows returned by the query are written into the new table, and a job is launched automatically
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = yinzhengjie_20180806000505_71d796a2-3129-4497-9741-b5d39abd58c9
    Total jobs = 3
    Launching Job 1 out of 3
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533518652134_0001, Tracking URL = http://s101:8088/proxy/application_1533518652134_0001/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533518652134_0001
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
    2018-08-06 00:05:26,132 Stage-1 map = 0%,  reduce = 0%
    2018-08-06 00:05:37,668 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.02 sec
    MapReduce Total cumulative CPU time: 2 seconds 20 msec
    Ended Job = job_1533518652134_0001
    Stage-4 is selected by condition resolver.
    Stage-3 is filtered out by condition resolver.
    Stage-5 is filtered out by condition resolver.
    Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/.hive-staging_hive_2018-08-06_00-05-05_947_8165112419833752968-1/-ext-10002
    Moving data to directory hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/teacherbak
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1   Cumulative CPU: 2.02 sec   HDFS Read: 3610 HDFS Write: 258 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 20 msec
    id    name
    Time taken: 33.117 seconds
    hive (yinzhengjie)> show tables;
    Time taken: 0.014 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> select id, name from teacher;                # view the data in the teacher table
    id    name
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.093 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select id, name from teacherbak;            # view the data in teacherbak; its contents match teacher
    id    name
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.083 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
    Managed (internal) table - create a table from a query result, so the query's rows are written into the new table (hive (yinzhengjie)> create table if not exists teacherbak as select id, name from teacher;)
    hive (yinzhengjie)> show tables;
    Time taken: 0.013 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> desc teacher;
    col_name    data_type    comment
    id                      int                                         
    name                    string                                      
    Time taken: 0.029 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> select * from teacher;
    teacher.id    teacher.name
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.1 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> create table if not exists teacherCopy like teacher;        # create a table from an existing table's definition: only the schema is copied, not the data
    Time taken: 0.181 seconds
    hive (yinzhengjie)> show tables;
    Time taken: 0.014 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> select * from teachercopy;
    teachercopy.id    teachercopy.name
    Time taken: 0.103 seconds
    hive (yinzhengjie)> desc teachercopy;
    col_name    data_type    comment
    id                      int                                         
    name                    string                                      
    Time taken: 0.03 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> 
    Managed (internal) table - create a table from an existing table's definition, copying only the schema and not the data (hive (yinzhengjie)> create table if not exists teacherCopy like teacher;)
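The two copy styles shown in the last two transcripts can be put side by side. This is only a sketch: the target names `teacher_data_copy` and `teacher_schema_copy` are hypothetical, while `teacher` is the table used throughout this section.

```sql
-- CREATE TABLE ... AS SELECT copies both the selected columns and the rows;
-- Hive runs a job to materialise the query result.
CREATE TABLE IF NOT EXISTS teacher_data_copy AS
SELECT id, name FROM teacher;

-- CREATE TABLE ... LIKE copies only the table definition; no job runs
-- and the new table starts out empty.
CREATE TABLE IF NOT EXISTS teacher_schema_copy LIKE teacher;

-- The empty copy can be filled later with an ordinary INSERT ... SELECT.
INSERT INTO TABLE teacher_schema_copy
SELECT id, name FROM teacher;
```

LIKE is the cheaper choice when only the structure is needed, for example to stage new data in a table shaped like an existing one.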
    hive (yinzhengjie)> show tables;
    Time taken: 0.012 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> desc formatted teacher;                        # check the table type
    col_name    data_type    comment
    # col_name                data_type               comment             
    id                      int                                         
    name                    string                                      
    # Detailed Table Information          
    Database:               yinzhengjie              
    Owner:                  yinzhengjie              
    CreateTime:             Sun Aug 05 19:55:34 PDT 2018     
    LastAccessTime:         UNKNOWN                  
    Retention:              0                        
    Location:               hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/teacher     
    Table Type:             MANAGED_TABLE                            # the table type is shown here: this is a managed table, also called an internal table
    Table Parameters:          
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               179                 
        transient_lastDdlTime    1533524151          
    # Storage Information          
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:          
    Time taken: 0.036 seconds, Fetched: 31 row(s)
    hive (yinzhengjie)> 
    Managed (internal) table - check the table type (hive (yinzhengjie)> desc formatted teacher;)
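The Table Type field reported by desc formatted can also be switched after the fact. The sketch below assumes the `teachercopy` table created earlier and the Hive 2.x release used in this walkthrough; newer releases may restrict this conversion. The property key and its values are assumed to be case sensitive, so they are written in upper case.

```sql
-- Turn the managed table into an external one: a later DROP TABLE
-- would then remove only the metadata and leave the files on HDFS.
ALTER TABLE teachercopy SET TBLPROPERTIES ('EXTERNAL'='TRUE');
DESC FORMATTED teachercopy;

-- Switch it back to a managed table.
ALTER TABLE teachercopy SET TBLPROPERTIES ('EXTERNAL'='FALSE');
DESC FORMATTED teachercopy;
```

After each ALTER, the Table Type line in the desc formatted output is the place to confirm that the change took effect.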
    [yinzhengjie@s101 download]$ pwd
    [yinzhengjie@s101 download]$ 
    [yinzhengjie@s101 download]$ cat dept.dat 
    10    ACCOUNTING    2700
    20    RESEARCH    3800
    30    SALES    5900
    40    OPERATIONS    4700
    [yinzhengjie@s101 download]$ 
    [yinzhengjie@s101 download]$ more emp.dat 
    7369    SMITH    CLERK    7902    1980-12-17    800.00        20
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.00    300.00    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.00    500.00    30
    7566    JONES    MANAGER    7839    1981-4-2    2975.00        20
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.00    1400.00    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.00        30
    7782    CLARK    MANAGER    7839    1981-6-9    2450.00        10
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.00        20
    7839    KING    PRESIDENT        1981-11-17    5000.00        10
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.00    0.00    30
    7876    ADAMS    CLERK    7788    1987-5-23    1100.00        20
    7900    JAMES    CLERK    7698    1981-12-3    950.00        30
    7902    FORD    ANALYST    7566    1981-12-3    3000.00        20
    7934    MILLER    CLERK    7782    1982-1-23    1300.00        10
    [yinzhengjie@s101 download]$ 
    hive (yinzhengjie)> create external table if not exists yinzhengjie.dept(
                      >     deptno int,
                      >     dname string,
                      >     loc int
                      > )
                      > row format delimited fields terminated by '	';
    Time taken: 0.096 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> create external table if not exists yinzhengjie.emp(
                      >     empno int,
                      >     ename string,
                      >     job string,
                      >     mgr int,
                      >     hiredate string,
                      >     sal double, 
                      >     comm double,
                      >     deptno int
                      > )
                      > row format delimited fields terminated by '	';
    Time taken: 0.064 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.dat' into table yinzhengjie.dept;
    Loading data to table yinzhengjie.dept
    Time taken: 0.222 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from dept;                            # after loading, check that the dept table contains data
    dept.deptno    dept.dname    dept.loc
    10    ACCOUNTING    2700
    20    RESEARCH    3800
    30    SALES    5900
    40    OPERATIONS    4700
    Time taken: 0.088 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/emp.dat' into table yinzhengjie.emp;
    Loading data to table yinzhengjie.emp
    Time taken: 0.21 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from emp;                            # after loading, check that the emp table contains data
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    Time taken: 0.079 seconds, Fetched: 14 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> desc formatted dept;                # view the formatted metadata of the dept table
    col_name    data_type    comment
    # col_name                data_type               comment             
    deptno                  int                                         
    dname                   string                                      
    loc                     int                                         
    # Detailed Table Information          
    Database:               yinzhengjie              
    Owner:                  yinzhengjie              
    CreateTime:             Mon Aug 06 00:52:48 PDT 2018     
    LastAccessTime:         UNKNOWN                  
    Retention:              0                        
    Location:               hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/dept     
    Table Type:             EXTERNAL_TABLE                   # the table type shows that dept is an external table
    Table Parameters:          
        EXTERNAL                TRUE                
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               69                  
        transient_lastDdlTime    1533542290          
    # Storage Information          
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:          
    Time taken: 0.036 seconds, Fetched: 33 row(s)
    hive (yinzhengjie)> desc formatted emp;                # view the formatted metadata of the emp table
    col_name    data_type    comment
    # col_name                data_type               comment             
    empno                   int                                         
    ename                   string                                      
    job                     string                                      
    mgr                     int                                         
    hiredate                string                                      
    sal                     double                                      
    comm                    double                                      
    deptno                  int                                         
    # Detailed Table Information          
    Database:               yinzhengjie              
    Owner:                  yinzhengjie              
    CreateTime:             Mon Aug 06 00:55:41 PDT 2018     
    LastAccessTime:         UNKNOWN                  
    Retention:              0                        
    Location:               hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/emp     
    Table Type:             EXTERNAL_TABLE                   # the table type shows that emp is an external table
    Table Parameters:          
        EXTERNAL                TRUE                
        numFiles                1                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               657                 
        transient_lastDdlTime    1533542299          
    # Storage Information          
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:          
    Time taken: 0.036 seconds, Fetched: 38 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> show tables;
    Time taken: 0.014 seconds, Fetched: 6 row(s)
    hive (yinzhengjie)> drop table dept;
    Time taken: 0.122 seconds
    hive (yinzhengjie)> drop table emp;
    Time taken: 0.079 seconds
    hive (yinzhengjie)> show tables;                                                        # only the table metadata was dropped, not the underlying data, which can still be read from within hive using the dfs command
    Time taken: 0.013 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> dfs -cat /user/hive/warehouse/yinzhengjie.db/dept/dept.dat;            # the file on HDFS is still there; hive only removed the metadata
    10    ACCOUNTING    2700
    20    RESEARCH    3800
    30    SALES    5900
    40    OPERATIONS    4700
    hive (yinzhengjie)> 
                      > dfs -cat /user/hive/warehouse/yinzhengjie.db/emp/emp.dat;            # the file on HDFS is still there; hive only removed the metadata
    7369    SMITH    CLERK    7902    1980-12-17    800.00        20
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.00    300.00    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.00    500.00    30
    7566    JONES    MANAGER    7839    1981-4-2    2975.00        20
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.00    1400.00    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.00        30
    7782    CLARK    MANAGER    7839    1981-6-9    2450.00        10
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.00        20
    7839    KING    PRESIDENT        1981-11-17    5000.00        10
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.00    0.00    30
    7876    ADAMS    CLERK    7788    1987-5-23    1100.00        20
    7900    JAMES    CLERK    7698    1981-12-3    950.00        30
    7902    FORD    ANALYST    7566    1981-12-3    3000.00        20
    7934    MILLER    CLERK    7782    1982-1-23    1300.00        10
    hive (yinzhengjie)> 
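Because dropping an external table leaves its files in place, the table can be brought back by declaring it again over the same directory. The sketch below reuses the dept schema and the HDFS path shown in the transcript above; it is an illustration of the idea rather than a step from the original walkthrough.

```sql
-- Re-declare the external table on top of the directory that survived the drop.
CREATE EXTERNAL TABLE IF NOT EXISTS yinzhengjie.dept (
    deptno INT,
    dname  STRING,
    loc    INT
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/user/hive/warehouse/yinzhengjie.db/dept';

-- No LOAD DATA is needed: the rows of dept.dat are readable again right away.
SELECT * FROM yinzhengjie.dept;
```

This is the main reason external tables are preferred for data that other tools also read or that must outlive the Hive table definition.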
    [yinzhengjie@s101 download]$ cat users.txt 
    1    yinzhengjie    26
    2    Guido van Rossum    62        
    3    Martin Odersky    60
    4    Rasmus Lerdorf    50
    [yinzhengjie@s101 download]$ 
    [yinzhengjie@s101 download]$ cat dept.txt 
    10    开发部门    20000
    20    运维部门    13000
    30    测试部门    8000
    40    产品部门    6000
    50    销售部门    15000
    60    财务部门    17000
    70    人事部门    16000
    [yinzhengjie@s101 download]$ 
    hive (yinzhengjie)> show tables;
    Time taken: 0.038 seconds
    hive (yinzhengjie)> create table dept_partition(
                      >     deptno int,
                      >     dname string,
                      >     loc string
                      > )
                      > partitioned by (month string)
                      > row format delimited fields terminated by '	';
    Time taken: 0.262 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> show tables;
    Time taken: 0.035 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    Partitioned table - syntax for creating a partitioned table (hive (yinzhengjie)> create table dept_partition(deptno int,dname string,loc string) partitioned by (month string) row format delimited fields terminated by '\t';)
    hive (yinzhengjie)> show tables;
    Time taken: 0.016 seconds, Fetched: 6 row(s)
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201803');            # load data into a specific partition
    Loading data to table yinzhengjie.dept_partition partition (month=201803)
    Time taken: 0.609 seconds
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201804');
    Loading data to table yinzhengjie.dept_partition partition (month=201804)
    Time taken: 0.868 seconds
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201805');
    Loading data to table yinzhengjie.dept_partition partition (month=201805)
    Time taken: 0.462 seconds
    hive (yinzhengjie)> select * from dept_partition;
    dept_partition.deptno    dept_partition.dname    dept_partition.loc    dept_partition.month
    10    开发部门    20000    201803
    20    运维部门    13000    201803
    30    测试部门    8000    201803
    40    产品部门    6000    201803
    50    销售部门    15000    201803
    60    财务部门    17000    201803
    70    人事部门    16000    201803
    10    开发部门    20000    201804
    20    运维部门    13000    201804
    30    测试部门    8000    201804
    40    产品部门    6000    201804
    50    销售部门    15000    201804
    60    财务部门    17000    201804
    70    人事部门    16000    201804
    10    开发部门    20000    201805
    20    运维部门    13000    201805
    30    测试部门    8000    201805
    40    产品部门    6000    201805
    50    销售部门    15000    201805
    60    财务部门    17000    201805
    70    人事部门    16000    201805
    Time taken: 0.129 seconds, Fetched: 21 row(s)
    hive (yinzhengjie)> select * from dept_partition where month='201805';
    dept_partition.deptno    dept_partition.dname    dept_partition.loc    dept_partition.month
    10    开发部门    20000    201805
    20    运维部门    13000    201805
    30    测试部门    8000    201805
    40    产品部门    6000    201805
    50    销售部门    15000    201805
    60    财务部门    17000    201805
    70    人事部门    16000    201805
    Time taken: 1.017 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> 
    Partitioned table - load data into a specified partition (hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept_partition partition(month='201805');)
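The loads above all use INTO TABLE, which appends to the partition. A short sketch of the common variants follows; the local path and table are the ones from the transcript, while the HDFS path `/tmp/dept.txt` in the last statement is a hypothetical placeholder.

```sql
-- INTO TABLE appends: loading the same file twice leaves two copies of its rows.
LOAD DATA LOCAL INPATH '/home/yinzhengjie/download/dept.txt'
    INTO TABLE yinzhengjie.dept_partition PARTITION (month='201805');

-- OVERWRITE INTO TABLE replaces whatever the target partition already contains.
LOAD DATA LOCAL INPATH '/home/yinzhengjie/download/dept.txt'
    OVERWRITE INTO TABLE yinzhengjie.dept_partition PARTITION (month='201805');

-- Without LOCAL the path is resolved on HDFS and the file is moved, not copied.
-- The HDFS path below is a hypothetical placeholder.
LOAD DATA INPATH '/tmp/dept.txt'
    INTO TABLE yinzhengjie.dept_partition PARTITION (month='201806');
```

OVERWRITE is the safer choice for a load that may be re-run, since it cannot duplicate rows in the partition.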
    hive (yinzhengjie)> show partitions dept_partition;
    Time taken: 0.563 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> 
    Partitioned table - list the existing partitions of a partitioned table (hive (yinzhengjie)> show partitions dept_partition;)
    hive (yinzhengjie)> select * from dept_partition where month='201805';                    # single-partition query
    dept_partition.deptno    dept_partition.dname    dept_partition.loc    dept_partition.month
    10    开发部门    20000    201805
    20    运维部门    13000    201805
    30    测试部门    8000    201805                       
    40    产品部门    6000    201805
    50    销售部门    15000    201805
    60    财务部门    17000    201805
    70    人事部门    16000    201805
    Time taken: 1.017 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)>                    
    hive (yinzhengjie)> select * from dept_partition where month='201803'
                      > union
                      > select * from dept_partition where month='201804'
                      > union
                      > select * from dept_partition where month='201805';                        # multi-partition union query; note that it is far slower than select * from dept_partition;
    u3.deptno    u3.dname    u3.loc    u3.month
    10    开发部门    20000    201803
    10    开发部门    20000    201804
    10    开发部门    20000    201805
    20    运维部门    13000    201803
    20    运维部门    13000    201804
    20    运维部门    13000    201805
    30    测试部门    8000    201803
    30    测试部门    8000    201804
    30    测试部门    8000    201805
    40    产品部门    6000    201803
    40    产品部门    6000    201804
    40    产品部门    6000    201805
    50    销售部门    15000    201803
    50    销售部门    15000    201804
    50    销售部门    15000    201805
    60    财务部门    17000    201803
    60    财务部门    17000    201804
    60    财务部门    17000    201805
    70    人事部门    16000    201803
    70    人事部门    16000    201804
    70    人事部门    16000    201805
    Time taken: 278.849 seconds, Fetched: 21 row(s)
    hive (yinzhengjie)> 
    Partitioned table - querying partitioned data with a single-partition query and a multi-partition union query (hive (yinzhengjie)> select * from dept_partition where month='201803' union select * from dept_partition where month='201804' union select * from dept_partition where month='201805'; )
    hive (yinzhengjie)> show partitions dept_partition;                                    # list the partitions that already exist in the table
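The union query above is slow mainly because UNION removes duplicate rows, which forces a full job with a reduce stage. When the goal is simply to read several partitions, a filter on the partition column is usually enough. The sketch below is an alternative formulation, not the author's original query, and the actual timings depend on the cluster and its configuration.

```sql
-- One scan restricted to three partitions; Hive prunes the others.
SELECT *
FROM dept_partition
WHERE month IN ('201803', '201804', '201805');

-- If separate branches are really needed, UNION ALL keeps duplicates
-- and therefore avoids the de-duplication step that plain UNION adds.
SELECT * FROM dept_partition WHERE month = '201803'
UNION ALL
SELECT * FROM dept_partition WHERE month = '201804'
UNION ALL
SELECT * FROM dept_partition WHERE month = '201805';
```

Note that the row order of these results is not guaranteed to match the sorted output that the UNION query happened to produce.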
    Time taken: 0.563 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201806');        # add a single partition
    Time taken: 0.562 seconds
    hive (yinzhengjie)> show partitions dept_partition;
    Time taken: 0.096 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201807') PARTITION(month='201808') PARTITION(month='201809');    # add multiple partitions
    Time taken: 0.22 seconds
    hive (yinzhengjie)> show partitions dept_partition;
    Time taken: 0.097 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> 
    Partitioned table - adding partitions: creating a single partition and creating several at once (hive (yinzhengjie)> ALTER TABLE dept_partition ADD PARTITION(month='201807') PARTITION(month='201808') PARTITION(month='201809');)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> show partitions dept_partition;                # list the current partitions
    Time taken: 0.114 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201807');        # drop a single partition
    Dropped the partition month=201807
    Time taken: 0.893 seconds
    hive (yinzhengjie)> show partitions dept_partition;
    Time taken: 0.083 seconds, Fetched: 6 row(s)
    hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201808'),PARTITION(month='201809');    # drop several partitions at once
    Dropped the partition month=201808
    Dropped the partition month=201809
    Time taken: 0.364 seconds
    hive (yinzhengjie)> show partitions dept_partition;
    Time taken: 0.104 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    Partitioned table - dropping partitions: dropping a single partition and dropping several at once (hive (yinzhengjie)> ALTER TABLE dept_partition DROP PARTITION(month='201808'),PARTITION(month='201809');)
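One detail in the two transcripts above is easy to miss: ADD separates partition specs with spaces, while DROP separates them with commas. The sketch below shows both forms together with the IF NOT EXISTS and IF EXISTS guards; the month values `201810` and `201811` are hypothetical.

```sql
-- ADD: partition specs are separated by whitespace.
ALTER TABLE dept_partition ADD IF NOT EXISTS
    PARTITION (month='201810')
    PARTITION (month='201811');

-- DROP: partition specs are separated by commas.
ALTER TABLE dept_partition DROP IF EXISTS
    PARTITION (month='201810'),
    PARTITION (month='201811');

-- Confirm the resulting partition list.
SHOW PARTITIONS dept_partition;
```

With the guards in place the statements can be re-run without failing on a partition that already exists or has already been dropped. Since dept_partition is a managed table, dropping a partition also deletes that partition's data.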
    hive (yinzhengjie)> DESC FORMATTED dept_partition;
    col_name    data_type    comment
    # col_name                data_type               comment             
    deptno                  int                                         
    dname                   string                                      
    loc                     string                                      
    # Partition Information                                                  # partition details are listed here
    # col_name                data_type               comment                
    month                   string                                                  
    # Detailed Table Information          
    Database:               yinzhengjie              
    Owner:                  yinzhengjie              
    CreateTime:             Wed Aug 08 21:08:14 PDT 2018     
    LastAccessTime:         UNKNOWN                  
    Retention:              0                        
    Location:               hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/dept_partition     
    Table Type:             MANAGED_TABLE            
    Table Parameters:          
        transient_lastDdlTime    1533787694          
    # Storage Information          
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
    Compressed:             No                       
    Num Buckets:            -1                       
    Bucket Columns:         []                       
    Sort Columns:           []                       
    Storage Desc Params:          
    Time taken: 1.813 seconds, Fetched: 33 row(s)
    hive (yinzhengjie)> 
    Partitioned table - view the structure of a partitioned table (hive (yinzhengjie)> DESC FORMATTED dept_partition;)
    hive (yinzhengjie)> create table users (
                      >     id int,
                      >     name string, 
                      >     age int
                      > )
                      > partitioned by (province string, city string)
                      > row format delimited fields terminated by '	';
    Time taken: 1.046 seconds
    hive (yinzhengjie)> show tables;
    Time taken: 0.26 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> 
    Partitioned table - syntax for creating a two-level partitioned table (hive (yinzhengjie)> create table users (id int,name string, age int) partitioned by (province string, city string) row format delimited fields terminated by '\t';)
    hive (yinzhengjie)> create table users (id int,name string, age int) partitioned by (province string, city string) row format delimited fields terminated by '\t';        # create a two-level partitioned table
    Time taken: 0.071 seconds
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='hebei',city='shijiazhuang');                # load data into the newly created two-level partition
    Loading data to table yinzhengjie.users partition (province=hebei, city=shijiazhuang)
    Time taken: 0.482 seconds
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='xian');
    Loading data to table yinzhengjie.users partition (province=shanxi, city=xian)
    Time taken: 0.414 seconds
    hive (yinzhengjie)> select * from users;
    users.id    users.name    users.age    users.province    users.city
    1    yinzhengjie    26    hebei    shijiazhuang
    2    Guido van Rossum    62    hebei    shijiazhuang
    3    Martin Odersky    60    hebei    shijiazhuang
    4    Rasmus Lerdorf    50    hebei    shijiazhuang
    1    yinzhengjie    26    shanxi    xian
    2    Guido van Rossum    62    shanxi    xian
    3    Martin Odersky    60    shanxi    xian
    4    Rasmus Lerdorf    50    shanxi    xian
    Time taken: 0.101 seconds, Fetched: 8 row(s)
    hive (yinzhengjie)> select * from users where province='hebei';                # query only the rows in the province='hebei' partitions
    users.id    users.name    users.age    users.province    users.city
    1    yinzhengjie    26    hebei    shijiazhuang
    2    Guido van Rossum    62    hebei    shijiazhuang
    3    Martin Odersky    60    hebei    shijiazhuang
    4    Rasmus Lerdorf    50    hebei    shijiazhuang
    Time taken: 1.775 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    Partitioned table - load data into a two-level partitioned table (hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='hebei',city='shijiazhuang');)
    hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=hebei/city=handan;            # create the directory on HDFS
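With two partition columns, each (province, city) pair maps to a nested directory of the form province=.../city=... under the table location. The sketch below shows a few ways to inspect and query such a table, using only the users table and partition values from the transcript above.

```sql
-- List every partition of the table.
SHOW PARTITIONS users;

-- List only the partitions under one first-level value.
SHOW PARTITIONS users PARTITION (province='hebei');

-- Filtering on both partition columns reads a single leaf directory.
SELECT id, name, age
FROM users
WHERE province = 'hebei' AND city = 'shijiazhuang';

-- The partition directories can be inspected from the hive prompt as well.
dfs -ls /user/hive/warehouse/yinzhengjie.db/users/province=hebei;
```

Filtering on the first-level column alone, as in the province='hebei' query above, still prunes every directory belonging to other provinces.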
    hive (yinzhengjie)> 
    hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/users.txt /user/hive/warehouse/yinzhengjie.db/users/province=hebei/city=handan;        # upload the local file to HDFS
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from users where province='hebei' and city='handan';                # no rows are returned, because the new directory is not yet registered as a partition in the metastore
    users.id    users.name    users.age    users.province    users.city
    Time taken: 0.304 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> msck repair table users;                                                    # run the repair command manually
    Partitions not in metastore:    users:province=hebei/city=handan
    Repair: Added partition to metastore users:province=hebei/city=handan
    Time taken: 0.487 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> select * from users where province='hebei' and city='handan';                # query again; the rows are now visible
    users.id    users.name    users.age    users.province    users.city
    1    yinzhengjie    26    hebei    handan
    2    Guido van Rossum    62    hebei    handan
    3    Martin Odersky    60    hebei    handan
    4    Rasmus Lerdorf    50    hebei    handan
    Time taken: 0.156 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    Partitioned table - associate data uploaded directly into a partition directory with the table, method 1: repair after uploading (hive (yinzhengjie)> msck repair table users;)
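To confirm which partitions the metastore knows about before and after a repair, the partition list can be inspected directly. A minimal sketch, assuming the users table from the session above:

```sql
-- List every partition the metastore has registered for users.
show partitions users;

-- Restrict the listing to the sub-partitions of one province.
show partitions users partition(province='hebei');
```

A directory that exists on HDFS but is missing from this listing is exactly the case that msck repair table fixes.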
    hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=ankang;
    hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/users.txt /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=ankang;
    hive (yinzhengjie)> select * from users where province='shanxi' and city='ankang';            # Query the data; nothing is returned yet
    users.id    users.name    users.age    users.province    users.city
    Time taken: 0.112 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> ALTER TABLE users add partition(province='shanxi',city='ankang');       # Add the partition after uploading the data
    Time taken: 0.14 seconds
    hive (yinzhengjie)> select * from users where province='shanxi' and city='ankang';            # Query again; the rows are now visible
    users.id    users.name    users.age    users.province    users.city
    1    yinzhengjie    26    shanxi    ankang
    2    Guido van Rossum    62    shanxi    ankang
    3    Martin Odersky    60    shanxi    ankang
    4    Rasmus Lerdorf    50    shanxi    ankang
    Time taken: 0.156 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    Partitioned table - associate data uploaded directly into a partition directory with the table, method 2: add the partition after uploading (hive (yinzhengjie)> ALTER TABLE users add partition(province='shanxi',city='ankang'); )
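ADD PARTITION reports an error if the partition is already registered, which makes scripts awkward to re-run. A minimal sketch of two common variants, assuming the same users table and partition values as above:

```sql
-- IF NOT EXISTS lets the statement be re-run without an "already exists" error.
alter table users add if not exists partition(province='shanxi',city='ankang');

-- Several partitions can be registered in one statement (no commas between them).
alter table users add if not exists
    partition(province='shanxi',city='ankang')
    partition(province='shanxi',city='hanzhong');
```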
    hive (yinzhengjie)> dfs -mkdir -p /user/hive/warehouse/yinzhengjie.db/users/province=shanxi/city=hanzhong;                # Create the partition directory on HDFS
    hive (yinzhengjie)> select * from users where province='shanxi' and city='hanzhong';                                    # No rows are returned yet
    users.id    users.name    users.age    users.province    users.city
    Time taken: 0.148 seconds
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='hanzhong');        # Load the data into the partition after creating the directory
    Loading data to table yinzhengjie.users partition (province=shanxi, city=hanzhong)
    Time taken: 0.593 seconds
    hive (yinzhengjie)> select * from users where province='shanxi' and city='hanzhong';                                    # Query again; the rows are now visible
    users.id    users.name    users.age    users.province    users.city
    1    yinzhengjie    26    shanxi    hanzhong
    2    Guido van Rossum    62    shanxi    hanzhong
    3    Martin Odersky    60    shanxi    hanzhong
    4    Rasmus Lerdorf    50    shanxi    hanzhong
    Time taken: 0.104 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
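    Partitioned table - associate data in a partition directory with the table, method 3: create the partition directory, then load the data into the partition (hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/users.txt' into table users partition(province='shanxi',city='hanzhong');)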
    hive (yinzhengjie)> create table stu_buck(
                      >     id int,
                      >     name string
                      > )
                      > clustered by(id) 
                      > into 4 buckets
                      > row format delimited fields terminated by '\t';            # Create the bucketed table
    Time taken: 0.246 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> desc formatted stu_buck;                                # Show the table structure
    col_name    data_type    comment
    # col_name                data_type               comment             
    id                      int                                         
    name                    string                                      
    # Detailed Table Information          
    Database:               yinzhengjie              
    Owner:                  yinzhengjie              
    CreateTime:             Fri Aug 10 00:52:10 PDT 2018     
    LastAccessTime:         UNKNOWN                  
    Retention:              0                        
    Location:               hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/stu_buck     
    Table Type:             MANAGED_TABLE            
    Table Parameters:          
        numFiles                0                   
        numRows                 0                   
        rawDataSize             0                   
        totalSize               0                   
        transient_lastDdlTime    1533887530          
    # Storage Information          
    SerDe Library:          org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe     
    InputFormat:            org.apache.hadoop.mapred.TextInputFormat     
    OutputFormat:           org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat     
    Compressed:             No                       
    Num Buckets:            4                       # Note: the table has 4 buckets
    Bucket Columns:         [id]                     
    Sort Columns:           []                       
    Storage Desc Params:          
    Time taken: 0.128 seconds, Fetched: 32 row(s)
    hive (yinzhengjie)> 
    Bucketed table - create a bucketed table (hive (yinzhengjie)> create table stu_buck(id int,name string) clustered by(id) into 4 buckets row format delimited fields terminated by '\t';)
    hive (yinzhengjie)> ! cat /home/yinzhengjie/download/stu_buck.txt;                                            # Show the contents of the local file
    1001    ss1
    1002    ss2
    1003    ss3
    1004    ss4
    1005    ss5
    1006    ss6
    1007    ss7
    1008    ss8
    1009    ss9
    1010    ss10
    1011    ss11
    1012    ss12
    1013    ss13
    1014    ss14
    1015    ss15
    1016    ss16
    hive (yinzhengjie)> 
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu_buck;    # Load the local file into the Hive table
    Loading data to table yinzhengjie.stu_buck
    Time taken: 0.306 seconds
    hive (yinzhengjie)>
    hive (yinzhengjie)> select * from stu_buck;                    # Query the bucketed table
    stu_buck.id    stu_buck.name
    1001    ss1
    1002    ss2
    1003    ss3
    1004    ss4
    1005    ss5
    1006    ss6
    1007    ss7
    1008    ss8
    1009    ss9
    1010    ss10
    1011    ss11
    1012    ss12
    1013    ss13
    1014    ss14
    1015    ss15
    1016    ss16
    Time taken: 0.088 seconds, Fetched: 16 row(s)
    hive (yinzhengjie)> 
    Bucketed table - load data into the bucketed table (hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu_buck;)
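In the Hive 2.x release used in this session, LOAD DATA only copies the file into the table directory; it does not redistribute rows into buckets, which is consistent with the query above returning the rows in file order. One way to check, assuming the table location reported by desc formatted earlier:

```sql
-- List the files backing stu_buck (path taken from the Location field shown earlier).
dfs -ls /user/hive/warehouse/yinzhengjie.db/stu_buck;
```

If the listing shows a single stu_buck.txt rather than four bucket files, the data has not been bucketed, and an insert from a query is needed instead.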
    hive (yinzhengjie)> create table stu(
                      >     id int,
                      >     name string
                      > )
                      > row format delimited fields terminated by '\t';                                            # First create an ordinary (non-bucketed) table stu
    Time taken: 0.148 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/stu_buck.txt' into table stu;        # Load the data into the ordinary stu table
    Loading data to table yinzhengjie.stu
    Time taken: 0.186 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> truncate table stu_buck;                                                                # Empty the stu_buck table
    Time taken: 0.098 seconds
    hive (yinzhengjie)> select * from stu_buck;                                                                    # Confirm that stu_buck is now empty
    stu_buck.id    stu_buck.name
    Time taken: 0.103 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> insert into table stu_buck select id, name from stu;                                    # Populate the bucketed table from a query on stu
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = yinzhengjie_20180810010832_901bd21c-690c-48b5-9282-c3900c960245
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 2
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0049, Tracking URL = http://s101:8088/proxy/application_1533789743141_0049/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0049
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 2
    2018-08-10 01:08:54,781 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 01:09:34,871 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.52 sec
    2018-08-10 01:10:01,903 Stage-1 map = 100%,  reduce = 50%, Cumulative CPU 5.3 sec
    2018-08-10 01:10:03,970 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.01 sec
    MapReduce Total cumulative CPU time: 8 seconds 10 msec
    Ended Job = job_1533789743141_0049
    Loading data to table yinzhengjie.stu_buck
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 2   Cumulative CPU: 8.01 sec   HDFS Read: 11021 HDFS Write: 303 SUCCESS
    Total MapReduce CPU Time Spent: 8 seconds 10 msec
    id    name
    Time taken: 95.111 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from stu_buck;                                # Query the bucketed data
    stu_buck.id    stu_buck.name
    1016    ss16
    1012    ss12
    1008    ss8
    1004    ss4
    1001    ss1
    1013    ss13
    1005    ss5
    1009    ss9
    1014    ss14
    1010    ss10
    1006    ss6
    1002    ss2
    1015    ss15
    1007    ss7
    1003    ss3
    1011    ss11
    Time taken: 0.073 seconds, Fetched: 16 row(s)
    hive (yinzhengjie)> 
    Bucketed table - populate the bucketed table from a query (hive (yinzhengjie)> insert into table stu_buck select id, name from stu;)
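The row order in the query above follows the bucket files: ids 1016, 1012, 1008 and 1004 come first, followed by the ids whose remainder is 1, 2 and 3 when divided by 4. A minimal sketch for checking this, assuming (as this output suggests for Hive 2.x) that the bucket of an int column is its hash modulo the number of buckets, and that hash() of a single int returns the int itself:

```sql
-- Show the bucket each id is expected to land in under the assumption above.
select id, name, pmod(hash(id), 4) as expected_bucket
from stu
order by expected_bucket, id;

-- Read a single bucket: bucket 1 of 4, sampled on the bucketing column.
select * from stu_buck tablesample(bucket 1 out of 4 on id);
```

Sampling on the bucketing column lets Hive read just one bucket file instead of scanning the whole table, which is the main practical benefit of bucketing.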
    hive (yinzhengjie)> show tables;                                    # List the tables that already exist in the current database
    Time taken: 0.071 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> ALTER TABLE users RENAME TO myusers;            # Rename the table users to myusers
    Time taken: 0.341 seconds
    hive (yinzhengjie)> show tables;                                    # List the tables again; the table name has changed
    Time taken: 0.011 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> 
    Alter table - rename a table, worked example (hive (yinzhengjie)> ALTER TABLE users RENAME TO myusers;)
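The show tables output above records only the row count, so the rename is easier to confirm by name. A minimal sketch, assuming the myusers name from the example:

```sql
-- List only the tables whose names start with "my".
show tables like 'my*';

-- The Location field shows where the renamed table's data now lives.
desc formatted myusers;
```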
    hive (yinzhengjie)> desc dept_partition;                                        # Show the table structure
    col_name    data_type    comment
    deptno                  int                                         
    dname                   string                                      
    loc                     string                                      
    month                   string                                      
    # Partition Information          
    # col_name                data_type               comment             
    month                   string                                      
    Time taken: 0.054 seconds, Fetched: 9 row(s)
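    hive (yinzhengjie)> ALTER TABLE dept_partition ADD COLUMNS(desc string);        # Add a new column. Note: ADD COLUMNS appends the new column after all existing columns (but before the partition columns), whereas REPLACE COLUMNS replaces all of the table's columns.
    Time taken: 0.176 seconds
    hive (yinzhengjie)> desc dept_partition;                                        # Show the table structure again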
    col_name    data_type    comment
    deptno                  int                                         
    dname                   string                                      
    loc                     string                                      
    desc                    string                                      
    month                   string                                      
    # Partition Information          
    # col_name                data_type               comment             
    month                   string                                      
    Time taken: 0.059 seconds, Fetched: 10 row(s)
    hive (yinzhengjie)> 
    Alter table - add a column, worked example (hive (yinzhengjie)> ALTER TABLE dept_partition ADD COLUMNS(desc string);)
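ADD COLUMNS accepts several columns at once, each with an optional comment. A minimal sketch, in which the column names manager, headcount and budget are hypothetical and not part of the original table:

```sql
-- Add two columns in one statement (manager and headcount are hypothetical).
alter table dept_partition add columns(
    manager string comment 'hypothetical example column',
    headcount int comment 'hypothetical example column'
);

-- On a partitioned table, CASCADE also updates the metadata of existing partitions (Hive 1.1.0+).
alter table dept_partition add columns(budget double comment 'hypothetical example column') cascade;
```

Without CASCADE the change applies to the table metadata only, so partitions created earlier keep their old column list.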
    hive (yinzhengjie)> desc dept_partition;                                                # Show the table structure
    col_name    data_type    comment
    deptno                  int                                         
    dname                   string                                      
    loc                     string                                      
    month                   string                                      
    # Partition Information          
    # col_name                data_type               comment             
    month                   string                                      
    Time taken: 0.054 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> alter table dept_partition change column desc deptdesc string;        # Rename the column desc to deptdesc
    Time taken: 0.153 seconds
    hive (yinzhengjie)> desc dept_partition;
    col_name    data_type    comment
    deptno                  int                                         
    dname                   string                                      
    loc                     string                                      
    deptdesc                string                                      
    month                   string                                      
    # Partition Information          
    # col_name                data_type               comment             
    month                   string                                      
    Time taken: 0.027 seconds, Fetched: 10 row(s)
    hive (yinzhengjie)> 
    Alter table - rename a column, worked example (hive (yinzhengjie)> alter table dept_partition change column desc deptdesc string;)
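CHANGE COLUMN always takes the old name, the new name and the type, and can also set a comment or move the column. A minimal sketch, assuming the deptdesc column from the example; the new name description is hypothetical:

```sql
-- Keep the name and type but attach a comment.
alter table dept_partition change column deptdesc deptdesc string comment 'department description';

-- Rename and reposition in one step. This changes metadata only, not the data files,
-- so for delimited text data the values are still read by position.
alter table dept_partition change column deptdesc description string after dname;
```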
    hive (yinzhengjie)> desc dept_partition;
    col_name    data_type    comment
    deptno                  int                                         
    dname                   string                                      
    loc                     string                                      
    deptdesc                string                                      
    month                   string                                      
    # Partition Information          
    # col_name                data_type               comment             
    month                   string                                      
    Time taken: 0.031 seconds, Fetched: 10 row(s)
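    hive (yinzhengjie)> alter table dept_partition replace columns(deptno string, dname string, loc string);         # Replace the columns. Note: REPLACE COLUMNS replaces the table's entire column list (the partition column month is unaffected), so deptdesc disappears and deptno becomes a string.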
    Time taken: 0.152 seconds
    hive (yinzhengjie)> desc dept_partition;
    col_name    data_type    comment
    deptno                  string                                      
    dname                   string                                      
    loc                     string                                      
    month                   string                                      
    # Partition Information          
    # col_name                data_type               comment             
    month                   string                                      
    Time taken: 0.027 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
    Alter table - replace columns, worked example (hive (yinzhengjie)> alter table dept_partition replace columns(deptno string, dname string, loc string);)
    hive (yinzhengjie)> show tables;
    Time taken: 0.015 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> DROP TABLE dept_partition;         # Drop the specified table
    Time taken: 0.214 seconds
    hive (yinzhengjie)> show tables;
    Time taken: 0.015 seconds, Fetched: 6 row(s)
    hive (yinzhengjie)> 
    Alter table - drop a table (hive (yinzhengjie)> DROP TABLE dept_partition; )
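A plain DROP TABLE reports an error when the table does not exist. A minimal sketch of the two optional clauses, shown as alternatives on the same table:

```sql
-- IF EXISTS suppresses the error when the table is already gone.
drop table if exists dept_partition;

-- PURGE (Hive 0.14+) deletes a managed table's data immediately instead of moving it to the trash.
drop table if exists dept_partition purge;
```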


        hive>load data [local] inpath '/home/yinzhengjie/download/user.txt' [overwrite] into table student [partition (partcol1=val1,…)];
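            1>.load data: load data into a table
            2>.local: load from the local filesystem; without it, the path refers to HDFS
            3>.inpath: the path of the data to load
            4>.overwrite: overwrite the data already in the table; without it, the data is appended
            5>.into table: the table to load the data into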
    Data import - LOAD syntax for loading data into a table (hive>load data [local] inpath '/home/yinzhengjie/download/user.txt' [overwrite] into table student [partition (partcol1=val1,…)];)
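The optional keywords in the syntax above combine independently. A minimal sketch of the four common forms, reusing the student table and file path from the syntax line; the HDFS path, the table student_part and its partition column month are hypothetical:

```sql
-- LOCAL: copy a file from the client's filesystem into the table (appends).
load data local inpath '/home/yinzhengjie/download/user.txt' into table student;

-- No LOCAL: the path is on HDFS and the file is moved into the table's directory.
load data inpath '/home/yinzhengjie/data/user.txt' into table student;

-- OVERWRITE: replace whatever the table currently holds.
load data local inpath '/home/yinzhengjie/download/user.txt' overwrite into table student;

-- PARTITION: target one partition of a partitioned table (hypothetical table and column).
load data local inpath '/home/yinzhengjie/download/user.txt' into table student_part partition(month='201808');
```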
    [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/students.txt 
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    [yinzhengjie@s101 download]$ 
    hive (yinzhengjie)> create table xiyouji(
                      >     id string, 
                      >     name string
                      > )
                      > row format delimited fields terminated by '\t';
    Time taken: 0.635 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/students.txt' into table yinzhengjie.xiyouji;
    Loading data to table yinzhengjie.xiyouji
    Time taken: 10.337 seconds
    hive (yinzhengjie)>
    hive (yinzhengjie)> select * from xiyouji;
    xiyouji.id    xiyouji.name
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    Time taken: 0.131 seconds, Fetched: 5 row(s)
    hive (yinzhengjie)> 
    Data import - LOAD worked example: loading from the local filesystem (hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/students.txt' into table yinzhengjie.xiyouji;)
    hive (yinzhengjie)> select * from xiyouji;
    xiyouji.id    xiyouji.name
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    Time taken: 0.207 seconds, Fetched: 5 row(s)
    hive (yinzhengjie)> truncate table xiyouji;                        # Note: TRUNCATE only works on managed tables; it cannot delete the data of an external table
    Time taken: 0.169 seconds
    hive (yinzhengjie)> select * from xiyouji;
    xiyouji.id    xiyouji.name
    Time taken: 0.086 seconds
    hive (yinzhengjie)> 
    Clear the data in a table (hive (yinzhengjie)> truncate table xiyouji;)
    hive (yinzhengjie)> select * from xiyouji;                                                                        # The table is empty
    xiyouji.id    xiyouji.name
    Time taken: 0.077 seconds
    hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data;                    # Upload the file to HDFS
    hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt;
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' into table yinzhengjie.xiyouji;        # Load the data from HDFS; note that the file is moved (not copied) into the table's directory
    Loading data to table yinzhengjie.xiyouji
    Time taken: 0.228 seconds
    hive (yinzhengjie)> select * from xiyouji;                                                                        # Query the table again
    xiyouji.id    xiyouji.name
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    Time taken: 0.073 seconds, Fetched: 5 row(s)
    hive (yinzhengjie)> 
    Data import - LOAD worked example: loading from HDFS (hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' into table yinzhengjie.xiyouji;)
    hive (yinzhengjie)> select * from xiyouji;                                                                                    # Show the table's data before the load
    xiyouji.id    xiyouji.name
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    Time taken: 0.077 seconds, Fetched: 15 row(s)
    hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data;                                # Upload the file to HDFS
    hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/students.txt;                                                            # Show the contents of the uploaded file
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' overwrite into table yinzhengjie.xiyouji;        # Load the data from HDFS, overwriting the table's existing data; note that the file is moved, not copied
    Loading data to table yinzhengjie.xiyouji
    Time taken: 0.346 seconds
    hive (yinzhengjie)> select * from xiyouji;                                                                                    # Query again; the previous data has been overwritten
    xiyouji.id    xiyouji.name
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    Time taken: 0.086 seconds, Fetched: 5 row(s)
    hive (yinzhengjie)> 
    Data import - LOAD worked example: overwriting the data already in the table (hive (yinzhengjie)> load data inpath '/home/yinzhengjie/data/students.txt' overwrite into table yinzhengjie.xiyouji;)
    hive (yinzhengjie)> drop table xiyouji;                                                # Drop the previous test table
    Time taken: 1.645 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> create table xiyouji(
                      >     id int, 
                      >     name string
                      > ) 
                      > partitioned by (position string)
                      > row format delimited fields terminated by '\t';                                                # Create a partitioned table
    Time taken: 0.137 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> insert into table  xiyouji partition(position='wuzhishan') values(1,'孙悟空');                # Basic insert
    Time taken: 136.695 seconds
    hive (yinzhengjie)> select * from xiyouji;
    xiyouji.id    xiyouji.name    xiyouji.position
    1    孙悟空    wuzhishan
    Time taken: 0.169 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    Data import - basic insert (hive (yinzhengjie)> insert into table xiyouji partition(position='wuzhishan') values(1,'孙悟空');) Note: avoid Chinese characters in the position value, since it becomes part of the partition directory name.
    hive (yinzhengjie)> select * from xiyouji;                                        # Show the data in the table
    xiyouji.id    xiyouji.name    xiyouji.position
    1    孙悟空    wuzhishan
    Time taken: 0.117 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> insert overwrite table xiyouji partition(position='sandabaigujing') select id, name from xiyouji where position='wuzhishan';        # Insert the result of a single-table query into the table
    hive (yinzhengjie)> select * from xiyouji;                                        # Query again: there is one more row, identical except for its position value
    xiyouji.id    xiyouji.name    xiyouji.position
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    Time taken: 0.105 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> 
    Data import - insert the result of a single-table query into a table (hive (yinzhengjie)> insert overwrite table xiyouji partition(position='sandabaigujing') select id, name from xiyouji where position='wuzhishan';)
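The difference between INSERT OVERWRITE and INSERT INTO only becomes visible when a statement is run more than once. A minimal sketch, assuming the xiyouji table and partitions from the example above:

```sql
-- INSERT OVERWRITE replaces the target partition, so re-running it leaves one row there.
insert overwrite table xiyouji partition(position='sandabaigujing')
select id, name from xiyouji where position='wuzhishan';

-- INSERT INTO appends, so each run adds another copy of the selected rows.
insert into table xiyouji partition(position='sandabaigujing')
select id, name from xiyouji where position='wuzhishan';
```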
    hive (yinzhengjie)> select * from xiyouji;                                                            # Show the table's current data
    xiyouji.id    xiyouji.name    xiyouji.position
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    Time taken: 0.14 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> from xiyouji
                      > insert overwrite table xiyouji partition(position='nverguo')
                      > select id, name where position='wuzhishan'
                      > insert overwrite table xiyouji partition(position='zhenjiameihouwang')
                      > select id, name where position='wuzhishan';                                        # Multi-insert mode: one scan of the source table feeds several inserts; this test inserts only 2 rows
    Time taken: 63.367 seconds
    hive (yinzhengjie)> select * from xiyouji;                                                            # Show the table's data again; two more rows have been added
    xiyouji.id    xiyouji.name    xiyouji.position
    1    孙悟空    nverguo
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    1    孙悟空    zhenjiameihouwang
    Time taken: 0.141 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from xiyouji;                                                # Show the data in the table
    xiyouji.id    xiyouji.name    xiyouji.position
    1    孙悟空    nverguo
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    1    孙悟空    zhenjiameihouwang
    Time taken: 0.087 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> create table if not exists xiyouji2  as select id, name from xiyouji;        # Create a table from a query result (the result rows are written into the new table)
    Time taken: 54.907 seconds
    hive (yinzhengjie)> select * from xiyouji2;                                                # Show the data of the newly created table
    xiyouji2.id    xiyouji2.name
    1    孙悟空
    1    孙悟空
    1    孙悟空
    1    孙悟空
    Time taken: 0.065 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    Data import - create a table and load data from a query (hive (yinzhengjie)> create table if not exists xiyouji2 as select id, name from xiyouji;)
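CREATE TABLE ... AS SELECT copies the selected columns and rows, which is why xiyouji2 above has no position column. When only the definition is wanted, CREATE TABLE ... LIKE is the counterpart. A minimal sketch, in which the table name xiyouji3 is hypothetical:

```sql
-- LIKE copies the table definition (including the partition column) but no rows.
create table if not exists xiyouji3 like xiyouji;

-- Confirm that the new table has the same columns and partitioning as xiyouji.
desc xiyouji3;
```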
    hive (yinzhengjie)> create table if not exists Student(                                            
                      >     id int, 
                      >     name string
                      > )
                      > row format delimited fields terminated by '\t'
                      > location '/home/yinzhengjie/data/students.txt';                                                    # Create the table and specify the HDFS location it reads its data from
    Time taken: 0.017 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> dfs -put /home/yinzhengjie/download/students.txt  /home/yinzhengjie/data/students.txt;            # Upload the data to HDFS
    hive (yinzhengjie)>  dfs -cat /home/yinzhengjie/data/students.txt;                                                    # Show the uploaded data; the Student table reads it without a separate load step
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from Student;                                                                            # The Student table picks up the data automatically
    student.id    student.name
    1    sunwukong
    2    zhubajie
    3    shaheshang
    4    bailongma
    5    tangsanzang
    Time taken: 0.054 seconds, Fetched: 5 row(s)
    hive (yinzhengjie)>
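The LOCATION clause names the directory that holds the table's files, so any file placed in that directory becomes visible to queries. The same idea is commonly paired with an external table, so that dropping the table leaves the files in place. A minimal sketch, in which the table name student_ext and its directory are hypothetical:

```sql
-- An external table over a dedicated HDFS directory (hypothetical name and path).
create external table if not exists student_ext(
    id int,
    name string
)
row format delimited fields terminated by '\t'
location '/home/yinzhengjie/data/student_ext';

-- Put a file into the table's directory, then query it without any load step.
dfs -put /home/yinzhengjie/download/students.txt /home/yinzhengjie/data/student_ext;
select * from student_ext;
```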
    hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='zhenjiameihouwang') from '/home/yinzhengjie/data/xiyouji2';            # Import the specified partition from HDFS into the specified table
    Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang
    Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang/000000_0
    Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=zhenjiameihouwang)
    Time taken: 3.966 seconds
    hive (yinzhengjie)> select * from xiyoujihouzhuan;                    # Check that the import succeeded
    xiyoujihouzhuan.id    xiyoujihouzhuan.name    xiyoujihouzhuan.position
    1    孙悟空    zhenjiameihouwang
    Time taken: 0.293 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='nverguo') from '/home/yinzhengjie/data/xiyouji2';
    Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=nverguo
    Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=nverguo/000000_0
    Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=nverguo)
    Time taken: 0.751 seconds
    hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='wuzhishan') from '/home/yinzhengjie/data/xiyouji2';
    Copying data from hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=wuzhishan
    Copying file: hdfs://mycluster/home/yinzhengjie/data/xiyouji2/position=wuzhishan/000000_0
    Loading data to table yinzhengjie.xiyoujihouzhuan partition (position=wuzhishan)
    Time taken: 1.363 seconds
    hive (yinzhengjie)> select * from xiyoujihouzhuan;
    xiyoujihouzhuan.id    xiyoujihouzhuan.name    xiyoujihouzhuan.position
    1    孙悟空    nverguo
    1    孙悟空    wuzhishan
    1    孙悟空    zhenjiameihouwang
    Time taken: 0.488 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)>
    Data import - import data into a Hive table. Note: the data must first be exported with export before it can be imported. (hive (yinzhengjie)> import table xiyoujihouzhuan partition(position='wuzhishan') from '/home/yinzhengjie/data/xiyouji2';)
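IMPORT reads a directory previously written by EXPORT, which contains a _metadata file alongside the data files. A minimal sketch of the round trip, in which the export path xiyouji_export and the table name xiyouji_copy are hypothetical:

```sql
-- Export the table definition and data to an HDFS directory (hypothetical path).
export table yinzhengjie.xiyouji to '/home/yinzhengjie/data/xiyouji_export';

-- Import the whole export into a new table; omitting PARTITION brings in every partition.
import table xiyouji_copy from '/home/yinzhengjie/data/xiyouji_export';
```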
    hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji' select * from xiyouji;                      # Export the query result to a local path; note that the target is a directory
    Time taken: 77.687 seconds
    hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji/000000_0;                                                                    # Show the file exported to the local filesystem
    hive (yinzhengjie)> 
    Data export - export a query result to the local filesystem (hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji' select * from xiyouji;)
    hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji2'
                      > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
                      > select * from xiyouji;                                                                # Use "\t" as the field separator
    Time taken: 100.57 seconds
    hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji2/000000_0;                                    # Show the exported data
    1    孙悟空    nverguo
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    1    孙悟空    zhenjiameihouwang
    hive (yinzhengjie)> 
    Data export - export a formatted query result to the local filesystem (hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/xiyouji2' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from xiyouji;)
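The ROW FORMAT clause accepts any single-character field separator, so the same statement can produce comma-separated output. A minimal sketch, in which the target directory xiyouji_csv is hypothetical:

```sql
-- Export as comma-separated text. OVERWRITE replaces whatever the directory already contains,
-- so the target should be a directory dedicated to this export.
insert overwrite local directory '/home/yinzhengjie/download/xiyouji_csv'
row format delimited fields terminated by ','
select id, name, position from xiyouji;
```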
    hive (yinzhengjie)> insert overwrite directory '/home/yinzhengjie/data/xiyouji'
                      > ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
                      > select * from xiyouji;                                                                # Export the query result to HDFS
    MapReduce Total cumulative CPU time: 2 seconds 380 msec
    Ended Job = job_1533789743141_0013
    Stage-3 is selected by condition resolver.
    Stage-2 is filtered out by condition resolver.
    Stage-4 is filtered out by condition resolver.
    Moving data to directory hdfs://mycluster/home/yinzhengjie/data/xiyouji/.hive-staging_hive_2018-08-09_19-21-05_012_3955068750863516339-1/-ext-10000
    Moving data to directory /home/yinzhengjie/data/xiyouji
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1   Cumulative CPU: 2.38 sec   HDFS Read: 5455 HDFS Write: 99 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 380 msec
    xiyouji.id    xiyouji.name    xiyouji.position
    Time taken: 88.306 seconds
    hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji/000000_0;                                    # Show the data exported to HDFS
    1    孙悟空    nverguo
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    1    孙悟空    zhenjiameihouwang
    hive (yinzhengjie)> 
    Data export - export a query result to HDFS (hive (yinzhengjie)> insert overwrite directory '/home/yinzhengjie/data/xiyouji' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from xiyouji;)
    hive (yinzhengjie)> dfs -get  /home/yinzhengjie/data/xiyouji/000000_0  /home/yinzhengjie/download/xiyouji3;                # Copy the data to the local filesystem with a Hadoop command
    hive (yinzhengjie)> ! cat /home/yinzhengjie/download/xiyouji3;                                                            # Show the file copied to the local Linux filesystem
    1    孙悟空    nverguo
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    1    孙悟空    zhenjiameihouwang
    hive (yinzhengjie)> 
    Data export - export to the local filesystem with a Hadoop command (hive (yinzhengjie)> dfs -get /home/yinzhengjie/data/xiyouji/000000_0 /home/yinzhengjie/download/xiyouji3;)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> export table yinzhengjie.xiyouji to '/home/yinzhengjie/data/xiyouji2';                            # Export the table to HDFS with EXPORT
    Copying data from file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_19-30-58_906_1594217512913959561-1/-local-10000/_metadata
    Copying file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_19-30-58_906_1594217512913959561-1/-local-10000/_metadata
    Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=???
    Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=五指山/000000_0
    Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo
    Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=nverguo/000000_0
    Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing
    Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=sandabaigujing/000000_0
    Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan
    Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=wuzhishan/000000_0
    Copying data from hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang
    Copying file: hdfs://mycluster/user/hive/warehouse/yinzhengjie.db/xiyouji/position=zhenjiameihouwang/000000_0
    Time taken: 0.978 seconds
    hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=wuzhishan/000000_0;
    1    孙悟空
    hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=nverguo/000000_0;
    1    孙悟空
    hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=sandabaigujing/000000_0;
    1    孙悟空
    hive (yinzhengjie)> dfs -cat /home/yinzhengjie/data/xiyouji2/position=zhenjiameihouwang/000000_0;
    1    孙悟空
    hive (yinzhengjie)> 
    Data export - export to HDFS with EXPORT (hive (yinzhengjie)> export table yinzhengjie.xiyouji to '/home/yinzhengjie/data/xiyouji2';)
    [yinzhengjie@s101 ~]$ hive -e 'select * from yinzhengjie.xiyouji;' > /home/yinzhengjie/download/xiyouji6                            # Run the query from the command line with hive -e and redirect the output to a local file
    Time taken: 20.367 seconds, Fetched: 4 row(s)
    [yinzhengjie@s101 ~]$ 
    [yinzhengjie@s101 ~]$ cat /home/yinzhengjie/download/xiyouji6                # View the query result
    xiyouji.id    xiyouji.name    xiyouji.position
    1    孙悟空    nverguo
    1    孙悟空    sandabaigujing
    1    孙悟空    wuzhishan
    1    孙悟空    zhenjiameihouwang
    [yinzhengjie@s101 ~]$ 
    Data export - export with a Hive shell command ([yinzhengjie@s101 ~]$ hive -e 'select * from yinzhengjie.xiyouji;' > /home/yinzhengjie/download/xiyouji6)



    hive (yinzhengjie)> select * from teacher;                # Query the whole table
    teacher.id    teacher.name
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.108 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select name from teacher;            # Query a specific column
    Dennis MacAlistair Ritchie
    Linus Benedict Torvalds
    Bjarne Stroustrup
    Guido van Rossum
    James Gosling
    Martin Odersky
    Rob Pike
    Rasmus Lerdorf
    Brendan Eich
    Time taken: 0.1 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
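        1>. SQL keywords and identifiers are case-insensitive.
        2>. A SQL statement can be written on one line or spread over several lines.
    Basic queries - whole-table and specific-column queries (hive (yinzhengjie)> select name from teacher;)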
    hive (yinzhengjie)> select id AS tid, name AS Tname from teacher;
    tid    tname
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.088 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
    Basic queries - column alias example (hive (yinzhengjie)> select id AS tid, name AS Tname from teacher;)
    hive (yinzhengjie)> select id AS age, name AS Tname from teacher;
    age    tname
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.157 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> select id+20 AS age, name AS Tname from teacher;
    age    tname
    90    Dennis MacAlistair Ritchie
    69    Linus Benedict Torvalds
    88    Bjarne Stroustrup
    82    Guido van Rossum
    83    James Gosling
    80    Martin Odersky
    82    Rob Pike
    70    Rasmus Lerdorf
    70    Brendan Eich
    Time taken: 0.091 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> 
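    Arithmetic operator          Description
        A+B                      Adds A and B
        A-B                      Subtracts B from A
        A*B                      Multiplies A by B
        A/B                      Divides A by B
        A%B                      Remainder of A divided by B
        A&B                      Bitwise AND of A and B
        A|B                      Bitwise OR of A and B
        A^B                      Bitwise XOR of A and B
        ~A                       Bitwise NOT of A
    Basic queries - adding 20 to each value with an arithmetic operator before displaying it (hive (yinzhengjie)> select id+20 AS age, name AS Tname from teacher;)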
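    The operators in the table can be tried out without a cluster. The following is a minimal sketch that uses Python's integer operators as a stand-in for the Hive ones; it is not Hive output. The operand values (70 and 20) are picked for illustration, only non-negative operands are covered, and results for negative operands of % may differ between the two languages.

```python
# Illustrative stand-in: Python operators mirroring the Hive operator table above.
# Assumption: both operands are non-negative integers.
a, b = 70, 20  # 70 is one of the ids in the teacher table; 20 is the offset used in the query

results = {
    "A+B": a + b,   # addition
    "A-B": a - b,   # subtraction
    "A*B": a * b,   # multiplication
    "A/B": a / b,   # division (a floating-point result here)
    "A%B": a % b,   # remainder
    "A&B": a & b,   # bitwise AND
    "A|B": a | b,   # bitwise OR
    "A^B": a ^ b,   # bitwise XOR
    "~A": ~a,       # bitwise NOT (two's complement)
}

for op, value in results.items():
    print(f"{op:4} = {value}")
```

    The first entry, 70 + 20 = 90, is the same value the id+20 query shows for Dennis MacAlistair Ritchie.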
    hive (yinzhengjie)> select count(*)cnt from teacher;
    Query ID = yinzhengjie_20180809202019_6a4b05d8-8807-410b-af4e-3c1839e0bdc6
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0014, Tracking URL = http://s101:8088/proxy/application_1533789743141_0014/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0014
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 20:21:06,776 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 20:21:35,994 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.61 sec
    2018-08-09 20:22:19,562 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.51 sec
    MapReduce Total cumulative CPU time: 5 seconds 510 msec
    Ended Job = job_1533789743141_0014
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.51 sec   HDFS Read: 7766 HDFS Write: 101 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 510 msec
    Time taken: 123.864 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    Basic queries - common functions: counting rows (hive (yinzhengjie)> select count(*)cnt from teacher;)
    hive (yinzhengjie)> select max(id) max_age from teacher;
    Query ID = yinzhengjie_20180809202410_0146f895-4c54-440f-aa1b-bee4fb566b91
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0015, Tracking URL = http://s101:8088/proxy/application_1533789743141_0015/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0015
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 20:24:47,751 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 20:25:09,196 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.46 sec
    2018-08-09 20:25:22,584 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.08 sec
    MapReduce Total cumulative CPU time: 5 seconds 80 msec
    Ended Job = job_1533789743141_0015
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.08 sec   HDFS Read: 7950 HDFS Write: 102 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 80 msec
    Time taken: 74.014 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    Basic queries - common functions: maximum age (hive (yinzhengjie)> select max(id) max_age from teacher;)
    hive (yinzhengjie)> select min(id) min_age from teacher;
    Query ID = yinzhengjie_20180809202623_b1b99783-b7d3-4994-901e-4e901795a128
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0016, Tracking URL = http://s101:8088/proxy/application_1533789743141_0016/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0016
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 20:26:41,646 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 20:27:10,432 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.34 sec
    2018-08-09 20:27:38,200 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 3.77 sec
    2018-08-09 20:27:40,261 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.42 sec
    MapReduce Total cumulative CPU time: 4 seconds 420 msec
    Ended Job = job_1533789743141_0016
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.42 sec   HDFS Read: 7956 HDFS Write: 102 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 420 msec
    Time taken: 79.135 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)>
    Basic queries - common functions: minimum age (hive (yinzhengjie)> select min(id) min_age from teacher;)
    hive (yinzhengjie)> select sum(id) sum_age from teacher;
    Query ID = yinzhengjie_20180809202800_14580ea4-3e65-461e-a1c6-6607e960c3d7
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0017, Tracking URL = http://s101:8088/proxy/application_1533789743141_0017/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0017
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 20:28:16,698 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 20:28:29,168 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.27 sec
    2018-08-09 20:28:42,627 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.58 sec
    MapReduce Total cumulative CPU time: 4 seconds 580 msec
    Ended Job = job_1533789743141_0017
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.58 sec   HDFS Read: 7948 HDFS Write: 103 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 580 msec
    Time taken: 43.081 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)>
    Basic queries - common functions: sum of ages (hive (yinzhengjie)> select sum(id) sum_age from teacher;)
    hive (yinzhengjie)> select avg(id) avg_age from teacher;
    Query ID = yinzhengjie_20180809202900_618a9c9f-535a-45ac-94de-16723f47d9b9
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0018, Tracking URL = http://s101:8088/proxy/application_1533789743141_0018/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0018
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 20:29:18,939 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 20:29:38,527 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 3.19 sec
    2018-08-09 20:29:58,143 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.25 sec
    MapReduce Total cumulative CPU time: 5 seconds 250 msec
    Ended Job = job_1533789743141_0018
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.25 sec   HDFS Read: 8551 HDFS Write: 118 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 250 msec
    Time taken: 59.897 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    Basic queries - common functions: average age (hive (yinzhengjie)> select avg(id) avg_age from teacher;)
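    The captured sessions for count, max, min, sum and avg show the MapReduce log but not the result rows. As a cross-check, here is a minimal sketch that computes the same five aggregates in plain Python from the nine ids listed in the select * from teacher output. The values come from that listing, not from Hive, and the variable names simply reuse the column aliases from the queries.

```python
# ids copied from the `select * from teacher` output shown earlier in this post
teacher_ids = [70, 49, 68, 62, 63, 60, 62, 50, 50]

cnt = len(teacher_ids)        # count(*)
max_age = max(teacher_ids)    # max(id)
min_age = min(teacher_ids)    # min(id)
sum_age = sum(teacher_ids)    # sum(id)
avg_age = sum_age / cnt       # avg(id), a floating-point value

print(cnt, max_age, min_age, sum_age, round(avg_age, 4))
```

    Each of these queries launches a full MapReduce job in the sessions above, which is why a nine-row table takes between 43 and 124 seconds per aggregate.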
    hive (yinzhengjie)> select id AS age , name  from teacher;
    age    name
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    60    Martin Odersky
    62    Rob Pike
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.068 seconds, Fetched: 9 row(s)
    hive (yinzhengjie)> select id AS age , name  from teacher limit 3;             # A typical query returns many rows; the LIMIT clause caps the number of rows returned.
    age    name
    70    Dennis MacAlistair Ritchie
    49    Linus Benedict Torvalds
    68    Bjarne Stroustrup
    Time taken: 0.1 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> 
    Basic queries - LIMIT clause (hive (yinzhengjie)> select id AS age , name from teacher limit 3;)
    hive (yinzhengjie)> select id, name  from teacher where id> 60;                # The WHERE clause filters out rows that do not satisfy the condition; it comes right after the FROM clause.
    id    name
    70    Dennis MacAlistair Ritchie
    68    Bjarne Stroustrup
    62    Guido van Rossum
    63    James Gosling
    62    Rob Pike
    Time taken: 0.056 seconds, Fetched: 5 row(s)
    hive (yinzhengjie)> 
    WHERE clause (hive (yinzhengjie)> select id, name from teacher where id> 60;)
    hive (yinzhengjie)> select * from teacher where id = 60;                    # Find the teacher whose id equals 60
    teacher.id    teacher.name
    60    Martin Odersky
    Time taken: 0.075 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from teacher where id between 40 and 60;        # Find teachers whose id is between 40 and 60 (inclusive)
    teacher.id    teacher.name
    49    Linus Benedict Torvalds
    60    Martin Odersky
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.05 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from teacher where name is null;                # Find teachers whose name is NULL; this table has no such rows
    teacher.id    teacher.name
    Time taken: 0.104 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from teacher where id IN(50,60);                # Find teachers whose id is 50 or 60
    teacher.id    teacher.name
    60    Martin Odersky
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.07 seconds, Fetched: 3 row(s)
    hive (yinzhengjie)> 
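    Operator                    Supported types     Description
    A=B                         Primitive types     TRUE if A equals B, otherwise FALSE; NULL if A or B is NULL
    A<=>B                       Primitive types     TRUE if both A and B are NULL, FALSE if only one of them is NULL; otherwise the same result as the = operator
    A<>B, A!=B                  Primitive types     NULL if A or B is NULL; TRUE if A is not equal to B, otherwise FALSE
    A<B                         Primitive types     NULL if A or B is NULL; TRUE if A is less than B, otherwise FALSE
    A<=B                        Primitive types     NULL if A or B is NULL; TRUE if A is less than or equal to B, otherwise FALSE
    A>B                         Primitive types     NULL if A or B is NULL; TRUE if A is greater than B, otherwise FALSE
    A>=B                        Primitive types     NULL if A or B is NULL; TRUE if A is greater than or equal to B, otherwise FALSE
    A [NOT] BETWEEN B AND C     Primitive types     NULL if any of A, B or C is NULL; TRUE if A is greater than or equal to B and less than or equal to C, otherwise FALSE. The NOT keyword inverts the result.
    A IS NULL                   All types           TRUE if A is NULL, otherwise FALSE
    A IS NOT NULL               All types           TRUE if A is not NULL, otherwise FALSE
    IN(value1, value2)          All types           TRUE if the value equals any value in the list
    A [NOT] LIKE B              STRING              B is a simple SQL wildcard pattern. TRUE if A matches it, otherwise FALSE. 'x%' means A must start with 'x', '%x' means A must end with 'x', and '%x%' means A contains 'x' at the start, at the end, or in the middle. The NOT keyword inverts the result.
    A RLIKE B, A REGEXP B       STRING              B is a regular expression evaluated by the JDK regex implementation, so Java regex rules apply. TRUE if any substring of A matches B, otherwise FALSE; anchor the pattern with ^ and $ to require a match on the whole string.
    WHERE clause - comparison operators in detail (hive (yinzhengjie)> select * from teacher where id IN(50,60);)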
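    The NULL handling of these operators is easy to get wrong, so here is a minimal sketch of it in plain Python, with None standing in for NULL. The function names (eq, null_safe_eq, between, where) are hypothetical helpers written for this illustration; they are not part of Hive or of any library.

```python
from typing import Callable, List, Optional


def eq(a, b) -> Optional[bool]:
    """A = B: NULL (None) if either side is NULL, otherwise a plain comparison."""
    if a is None or b is None:
        return None
    return a == b


def null_safe_eq(a, b) -> bool:
    """A <=> B: never NULL. TRUE if both are NULL, FALSE if only one is."""
    if a is None or b is None:
        return a is None and b is None
    return a == b


def between(a, low, high) -> Optional[bool]:
    """A BETWEEN low AND high: inclusive on both ends, NULL if any operand is NULL."""
    if a is None or low is None or high is None:
        return None
    return low <= a <= high


def where(rows: List, predicate: Callable) -> List:
    """WHERE keeps a row only when the predicate is TRUE; FALSE and NULL are both dropped."""
    return [row for row in rows if predicate(row) is True]


# The ids of the teacher table plus one made-up NULL to show the filtering.
ids = [70, 49, 68, 62, 63, 60, 62, 50, 50, None]
print(where(ids, lambda i: between(i, 40, 60)))
print(eq(None, None), null_safe_eq(None, None), null_safe_eq(60, None))
```

    The practical consequence is that a row whose compared column is NULL never passes a plain = or BETWEEN filter; use IS NULL or <=> when NULL values need to be matched.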
            % : matches zero or more characters (any number of characters).
            _ : matches exactly one character.
    hive (yinzhengjie)> select * from teacher where id LIKE '5%';                # Find teachers whose id starts with 5
    teacher.id    teacher.name
    50    Rasmus Lerdorf
    50    Brendan Eich
    Time taken: 0.126 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from teacher where id LIKE '_2%';                # Find teachers whose id has 2 as its second digit
    teacher.id    teacher.name
    62    Guido van Rossum
    62    Rob Pike
    Time taken: 0.065 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select * from teacher where name RLIKE '[P]';            # Find teachers whose name contains the letter "P"
    teacher.id    teacher.name
    62    Rob Pike
    Time taken: 0.049 seconds, Fetched: 1 row(s)
    hive (yinzhengjie)> 
    WHERE clause - LIKE and RLIKE (hive (yinzhengjie)> select * from teacher where name RLIKE '[P]';)
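    The difference between the two operators is that LIKE must match the whole string with % and _ wildcards, while RLIKE succeeds as soon as any substring matches the regular expression. The sketch below reproduces the three queries above in plain Python. The like and rlike functions are hypothetical helpers for this illustration; Python's re module stands in for the Java regex engine that Hive uses, and escape characters in LIKE patterns are not handled.

```python
import re

# Rows copied from the `select * from teacher` output shown earlier in this post.
teachers = [
    (70, "Dennis MacAlistair Ritchie"),
    (49, "Linus Benedict Torvalds"),
    (68, "Bjarne Stroustrup"),
    (62, "Guido van Rossum"),
    (63, "James Gosling"),
    (60, "Martin Odersky"),
    (62, "Rob Pike"),
    (50, "Rasmus Lerdorf"),
    (50, "Brendan Eich"),
]


def like(value: str, pattern: str) -> bool:
    """SQL LIKE: '%' is any run of characters, '_' is exactly one; the whole value must match."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


def rlike(value: str, pattern: str) -> bool:
    """RLIKE: TRUE if any substring of the value matches the regular expression."""
    return re.search(pattern, value) is not None


# The id column is compared as text, as in the queries above.
starts_with_5 = [name for tid, name in teachers if like(str(tid), "5%")]
second_digit_2 = [name for tid, name in teachers if like(str(tid), "_2%")]
contains_p = [name for tid, name in teachers if rlike(name, "[P]")]

print(starts_with_5)
print(second_digit_2)
print(contains_p)
```

    Both operators are case-sensitive here, which is why '[P]' matches only Rob Pike and not Bjarne Stroustrup.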
    hive (yinzhengjie)> select * from teacher where id NOT IN(50,70,49,68,62);
    teacher.id    teacher.name
    63    James Gosling
    60    Martin Odersky
    Time taken: 0.076 seconds, Fetched: 2 row(s)
    hive (yinzhengjie)> 
    WHERE clause - logical operators (hive (yinzhengjie)> select * from teacher where id NOT IN(50,70,49,68,62);)
    hive (yinzhengjie)> select * from dept_partition;
    dept_partition.deptno    dept_partition.dname    dept_partition.loc    dept_partition.month
    10    开发部门    20000    201805
    20    运维部门    13000    201805
    30    测试部门    8000    201805
    40    产品部门    6000    201805
    50    销售部门    15000    201805
    60    财务部门    17000    201805
    70    人事部门    16000    201805
    10    开发部门    25000    201805
    10    开发部门    10000    201805
    20    运维部门    13000    201805
    30    测试部门    7000    201805
    40    产品部门    9000    201805
    50    销售部门    26000    201805
    60    财务部门    11000    201805
    70    人事部门    16000    201805
    20    运维部门    21000    201805
    30    测试部门    8000    201805
    40    产品部门    9800    201805
    50    销售部门    15000    201805
    60    财务部门    17000    201805
    70    人事部门    8700    201805
    Time taken: 0.059 seconds, Fetched: 21 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select t.deptno, avg(t.loc) avg_sal from dept_partition t group by t.deptno;            # Compute the average salary of each department in the dept_partition table
    Query ID = yinzhengjie_20180809212224_fcbdaa54-b167-4a43-8a08-c0a984c25a0d
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0021, Tracking URL = http://s101:8088/proxy/application_1533789743141_0021/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0021
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 21:22:51,029 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 21:23:15,924 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.62 sec
    2018-08-09 21:23:31,362 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.14 sec
    MapReduce Total cumulative CPU time: 5 seconds 140 msec
    Ended Job = job_1533789743141_0021
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.14 sec   HDFS Read: 9719 HDFS Write: 312 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 140 msec
    t.deptno    avg_sal
    10    18333.333333333332
    20    15666.666666666666
    30    7666.666666666667
    40    8266.666666666666
    50    18666.666666666668
    60    15000.0
    70    13566.666666666666
    Time taken: 68.573 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)>
    Grouping - GROUP BY example 1 (hive (yinzhengjie)> select t.deptno, avg(t.loc) avg_sal from dept_partition t group by t.deptno;)
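    GROUP BY collects the rows that share a key and then applies the aggregate once per group. A minimal sketch of that step in plain Python follows, using the (deptno, loc) pairs from the dept_partition listing above; the variable names are made up for this illustration.

```python
from collections import defaultdict

# (deptno, loc) pairs copied from the `select * from dept_partition` output above;
# the loc column holds the value that the query aliases as a salary.
rows = [
    (10, 20000), (20, 13000), (30, 8000), (40, 6000), (50, 15000), (60, 17000), (70, 16000),
    (10, 25000), (10, 10000), (20, 13000), (30, 7000), (40, 9000), (50, 26000), (60, 11000),
    (70, 16000), (20, 21000), (30, 8000), (40, 9800), (50, 15000), (60, 17000), (70, 8700),
]

# Step 1: group the rows by key (what GROUP BY t.deptno does).
groups = defaultdict(list)
for deptno, loc in rows:
    groups[deptno].append(loc)

# Step 2: apply the aggregate to each group (what avg(t.loc) does).
avg_sal = {deptno: sum(values) / len(values) for deptno, values in groups.items()}

for deptno in sorted(avg_sal):
    print(deptno, avg_sal[deptno])
```

    Up to floating-point rounding in the last digits, the averages agree with the seven rows Hive returned, for example 55000 / 3 for department 10.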
    hive (yinzhengjie)> select * from dept_partition;
    dept_partition.deptno    dept_partition.dname    dept_partition.loc    dept_partition.month
    10    开发部门    20000    201805
    20    运维部门    13000    201805
    30    测试部门    8000    201805
    40    产品部门    6000    201805
    50    销售部门    15000    201805
    60    财务部门    17000    201805
    70    人事部门    16000    201805
    10    开发部门    25000    201805
    10    开发部门    10000    201805
    20    运维部门    13000    201805
    30    测试部门    7000    201805
    40    产品部门    9000    201805
    50    销售部门    26000    201805
    60    财务部门    11000    201805
    70    人事部门    16000    201805
    20    运维部门    21000    201805
    30    测试部门    8000    201805
    40    产品部门    9800    201805
    50    销售部门    15000    201805
    60    财务部门    17000    201805
    70    人事部门    8700    201805
    Time taken: 0.072 seconds, Fetched: 21 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select t.deptno, t.dname,max(t.loc) max_sal from dept_partition t group by t.deptno,t.dname;        # Compute the highest salary in dept_partition for each (deptno, dname) group
    Query ID = yinzhengjie_20180809213154_e1ea82c8-897d-40b5-b167-5fe42d0e6476
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0023, Tracking URL = http://s101:8088/proxy/application_1533789743141_0023/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0023
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 21:32:11,358 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 21:32:21,651 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.85 sec
    2018-08-09 21:32:29,958 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.61 sec
    MapReduce Total cumulative CPU time: 3 seconds 610 msec
    Ended Job = job_1533789743141_0023
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.61 sec   HDFS Read: 9537 HDFS Write: 406 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 610 msec
    t.deptno    t.dname    max_sal
    10    开发部门    25000
    20    运维部门    21000
    30    测试部门    8000
    40    产品部门    9800
    50    销售部门    26000
    60    财务部门    17000
    70    人事部门    8700
    Time taken: 37.781 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> 
    Grouping - GROUP BY example 2 (hive (yinzhengjie)> select t.deptno, t.dname,max(t.loc) max_sal from dept_partition t group by t.deptno,t.dname;)
    hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal  from dept_partition  group by dname,deptno;                            # Compute the average salary of each department
    Query ID = yinzhengjie_20180809213945_f7a1a9c2-8c19-4096-9c1a-37faa29fee44
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0024, Tracking URL = http://s101:8088/proxy/application_1533789743141_0024/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0024
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 21:40:17,366 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 21:40:33,044 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.2 sec
    2018-08-09 21:40:46,435 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.69 sec
    MapReduce Total cumulative CPU time: 4 seconds 690 msec
    Ended Job = job_1533789743141_0024
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.69 sec   HDFS Read: 10452 HDFS Write: 487 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 690 msec
    deptno    dname    avg_sal
    10    开发部门    18333.333333333332
    20    运维部门    15666.666666666666
    30    测试部门    7666.666666666667
    40    产品部门    8266.666666666666
    50    销售部门    18666.666666666668
    60    财务部门    15000.0
    70    人事部门    13566.666666666666
    Time taken: 63.433 seconds, Fetched: 7 row(s)
    hive (yinzhengjie)> 
    hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal from dept_partition group by dname, deptno having avg_sal > 10000;            # Find the departments whose average salary is greater than 10000
    Query ID = yinzhengjie_20180809214521_d980d9db-3473-4fd4-a062-ec9de0cafca2
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0026, Tracking URL = http://s101:8088/proxy/application_1533789743141_0026/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0026
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-09 21:45:37,001 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 21:45:50,841 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.45 sec
    2018-08-09 21:46:03,332 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.45 sec
    MapReduce Total cumulative CPU time: 4 seconds 450 msec
    Ended Job = job_1533789743141_0026
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.45 sec   HDFS Read: 10711 HDFS Write: 371 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 450 msec
    deptno    dname    avg_sal
    70    人事部门    13566.666666666666
    10    开发部门    18333.333333333332
    60    财务部门    15000.0
    20    运维部门    15666.666666666666
    50    销售部门    18666.666666666668
    Time taken: 43.701 seconds, Fetched: 5 row(s)
    hive (yinzhengjie)> 
    Grouping - HAVING clause (hive (yinzhengjie)> select deptno,dname,avg(loc) AS avg_sal from dept_partition group by dname, deptno having avg_sal > 10000;)
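    HAVING differs from WHERE in when it runs: WHERE filters individual rows before grouping, while HAVING filters whole groups after the aggregate has been computed. The sketch below shows the HAVING step in plain Python. The dictionary is the dept_partition data regrouped by deptno by hand, and the names are hypothetical.

```python
# loc values from the dept_partition listing above, already grouped by deptno.
salaries_by_dept = {
    10: [20000, 25000, 10000],
    20: [13000, 13000, 21000],
    30: [8000, 7000, 8000],
    40: [6000, 9000, 9800],
    50: [15000, 26000, 15000],
    60: [17000, 11000, 17000],
    70: [16000, 16000, 8700],
}

# Aggregate first: one average per group.
avg_sal = {deptno: sum(v) / len(v) for deptno, v in salaries_by_dept.items()}

# HAVING avg_sal > 10000: drop the groups whose aggregate fails the condition.
having = {deptno: avg for deptno, avg in avg_sal.items() if avg > 10000}

print(sorted(having))
```

    Departments 30 and 40 are removed because their averages are below 10000, which leaves the same five departments as the Hive result. Hive returned them in a different order, since no ORDER BY was given.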
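    Join statements - equi-join (hive (yinzhengjie)>  select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno;)
        Hive supports the usual SQL JOIN statements. Older releases accept only equality conditions in the ON clause (equi-joins); non-equi join conditions are supported starting with Hive 2.2.0.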
    [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/dept.txt 
    10    ACCOUNTING    1700
    20    RESEARCH    1800
    30    SALES    1900
    40    OPERATIONS    1700
    [yinzhengjie@s101 download]$ 
    [yinzhengjie@s101 download]$ cat /home/yinzhengjie/download/emp.txt 
    7369    SMITH    CLERK    7902    1980-12-17    800.00        20
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.00    300.00    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.00    500.00    30
    7566    JONES    MANAGER    7839    1981-4-2    2975.00        20
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.00    1400.00    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.00        30
    7782    CLARK    MANAGER    7839    1981-6-9    2450.00        10
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.00        20
    7839    KING    PRESIDENT        1981-11-17    5000.00        10
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.00    0.00    30
    7876    ADAMS    CLERK    7788    1987-5-23    1100.00        20
    7900    JAMES    CLERK    7698    1981-12-3    950.00        30
    7902    FORD    ANALYST    7566    1981-12-3    3000.00        20
    7934    MILLER    CLERK    7782    1982-1-23    1300.00        10
    [yinzhengjie@s101 download]$ 
    hive (yinzhengjie)> create  table if not exists yinzhengjie.dept(
                      >     deptno int,
                      >     dname string,
                      >     loc int
                      > )
                      > row format delimited fields terminated by '\t';                            # Create the department table dept
    Time taken: 0.204 seconds
    hive (yinzhengjie)> create  table if not exists yinzhengjie.emp(
                      >     empno int,
                      >     ename string,
                      >     job string,
                      >     mgr int,
                      >     hiredate string,
                      >     sal double,
                      >     comm double,
                      >     deptno int
                      > )
                      > row format delimited fields terminated by '\t';                            # Create the employee table emp
    Time taken: 0.088 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/dept.txt' into table yinzhengjie.dept;            # Load data into dept
    Loading data to table yinzhengjie.dept
    Time taken: 0.222 seconds
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/emp.txt' into table yinzhengjie.emp;                # Load data into emp
    Loading data to table yinzhengjie.emp
    Time taken: 0.175 seconds
    hive (yinzhengjie)> 
    hive (yinzhengjie)>  select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno;            # Join emp and dept on equal department numbers and return the employee number, employee name, department number and department name
    2018-08-09 23:34:25    Starting to launch local task to process map join;    maximum memory = 477626368
    2018-08-09 23:34:34    Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-34-09_040_8075868526571286750-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable
    2018-08-09 23:34:34    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-34-09_040_8075868526571286750-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile11--.hashtable (430 bytes)
    2018-08-09 23:34:34    End of local task; Time Taken: 9.163 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533789743141_0028, Tracking URL = http://s101:8088/proxy/application_1533789743141_0028/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0028
    Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
    2018-08-09 23:35:21,748 Stage-3 map = 0%,  reduce = 0%
    2018-08-09 23:35:45,815 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.71 sec
    MapReduce Total cumulative CPU time: 2 seconds 710 msec
    Ended Job = job_1533789743141_0028
    MapReduce Jobs Launched: 
    Stage-Stage-3: Map: 1   Cumulative CPU: 2.71 sec   HDFS Read: 8390 HDFS Write: 1999 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 710 msec
    e.empno    e.ename    d.deptno    d.dname
    7369    SMITH    20    RESEARCH
    7369    SMITH    20    RESEARCH
    7499    ALLEN    30    SALES
    7499    ALLEN    30    SALES
    7521    WARD    30    SALES
    7521    WARD    30    SALES
    7566    JONES    20    RESEARCH
    7566    JONES    20    RESEARCH
    7654    MARTIN    30    SALES
    7654    MARTIN    30    SALES
    7698    BLAKE    30    SALES
    7698    BLAKE    30    SALES
    7782    CLARK    10    ACCOUNTING
    7782    CLARK    10    ACCOUNTING
    7788    SCOTT    20    RESEARCH
    7788    SCOTT    20    RESEARCH
    7839    KING    10    ACCOUNTING
    7839    KING    10    ACCOUNTING
    7844    TURNER    30    SALES
    7844    TURNER    30    SALES
    7876    ADAMS    20    RESEARCH
    7876    ADAMS    20    RESEARCH
    7900    JAMES    30    SALES
    7900    JAMES    30    SALES
    7902    FORD    20    RESEARCH
    7902    FORD    20    RESEARCH
    7934    MILLER    10    ACCOUNTING
    7934    MILLER    10    ACCOUNTING
    7369    SMITH    20    RESEARCH
    7369    SMITH    20    RESEARCH
    7499    ALLEN    30    SALES
    7499    ALLEN    30    SALES
    7521    WARD    30    SALES
    7521    WARD    30    SALES
    7566    JONES    20    RESEARCH
    7566    JONES    20    RESEARCH
    7654    MARTIN    30    SALES
    7654    MARTIN    30    SALES
    7698    BLAKE    30    SALES
    7698    BLAKE    30    SALES
    7782    CLARK    10    ACCOUNTING
    7782    CLARK    10    ACCOUNTING
    7788    SCOTT    20    RESEARCH
    7788    SCOTT    20    RESEARCH
    7839    KING    10    ACCOUNTING
    7839    KING    10    ACCOUNTING
    7844    TURNER    30    SALES
    7844    TURNER    30    SALES
    7876    ADAMS    20    RESEARCH
    7876    ADAMS    20    RESEARCH
    7900    JAMES    30    SALES
    7900    JAMES    30    SALES
    7902    FORD    20    RESEARCH
    7902    FORD    20    RESEARCH
    7934    MILLER    10    ACCOUNTING
    7934    MILLER    10    ACCOUNTING
    Time taken: 98.923 seconds, Fetched: 56 row(s)
    hive (yinzhengjie)> 
    Join statement - equi-join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno, d.dname from emp e join dept d on e.deptno = d.deptno;)
    Join statement - table aliases (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;)
    hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;            # join the employee table with the department table
    2018-08-09 23:31:39    Starting to launch local task to process map join;    maximum memory = 477626368
    2018-08-09 23:31:55    Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-31-20_931_5011927912909131499-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable
    2018-08-09 23:31:55    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-31-20_931_5011927912909131499-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile01--.hashtable (348 bytes)
    2018-08-09 23:31:55    End of local task; Time Taken: 16.147 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533789743141_0027, Tracking URL = http://s101:8088/proxy/application_1533789743141_0027/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0027
    Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
    2018-08-09 23:32:55,103 Stage-3 map = 0%,  reduce = 0%
    2018-08-09 23:33:11,944 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.82 sec
    MapReduce Total cumulative CPU time: 1 seconds 820 msec
    Ended Job = job_1533789743141_0027
    MapReduce Jobs Launched: 
    Stage-Stage-3: Map: 1   Cumulative CPU: 1.82 sec   HDFS Read: 8221 HDFS Write: 1543 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 820 msec
    e.empno    e.ename    d.deptno
    7369    SMITH    20
    7369    SMITH    20
    7499    ALLEN    30
    7499    ALLEN    30
    7521    WARD    30
    7521    WARD    30
    7566    JONES    20
    7566    JONES    20
    7654    MARTIN    30
    7654    MARTIN    30
    7698    BLAKE    30
    7698    BLAKE    30
    7782    CLARK    10
    7782    CLARK    10
    7788    SCOTT    20
    7788    SCOTT    20
    7839    KING    10
    7839    KING    10
    7844    TURNER    30
    7844    TURNER    30
    7876    ADAMS    20
    7876    ADAMS    20
    7900    JAMES    30
    7900    JAMES    30
    7902    FORD    20
    7902    FORD    20
    7934    MILLER    10
    7934    MILLER    10
    7369    SMITH    20
    7369    SMITH    20
    7499    ALLEN    30
    7499    ALLEN    30
    7521    WARD    30
    7521    WARD    30
    7566    JONES    20
    7566    JONES    20
    7654    MARTIN    30
    7654    MARTIN    30
    7698    BLAKE    30
    7698    BLAKE    30
    7782    CLARK    10
    7782    CLARK    10
    7788    SCOTT    20
    7788    SCOTT    20
    7839    KING    10
    7839    KING    10
    7844    TURNER    30
    7844    TURNER    30
    7876    ADAMS    20
    7876    ADAMS    20
    7900    JAMES    30
    7900    JAMES    30
    7902    FORD    20
    7902    FORD    20
    7934    MILLER    10
    7934    MILLER    10
    Time taken: 113.095 seconds, Fetched: 56 row(s)
    hive (yinzhengjie)>
    hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;
    2018-08-09 23:41:10    Starting to launch local task to process map join;    maximum memory = 477626368
    2018-08-09 23:41:15    Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-40-54_618_7309603760212569588-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable
    2018-08-09 23:41:16    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-40-54_618_7309603760212569588-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (348 bytes)
    2018-08-09 23:41:16    End of local task; Time Taken: 5.741 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533789743141_0029, Tracking URL = http://s101:8088/proxy/application_1533789743141_0029/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0029
    Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
    2018-08-09 23:41:32,299 Stage-3 map = 0%,  reduce = 0%
    2018-08-09 23:41:46,692 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.69 sec
    MapReduce Total cumulative CPU time: 2 seconds 690 msec
    Ended Job = job_1533789743141_0029
    MapReduce Jobs Launched: 
    Stage-Stage-3: Map: 1   Cumulative CPU: 2.69 sec   HDFS Read: 8208 HDFS Write: 1543 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 690 msec
    e.empno    e.ename    d.deptno
    7369    SMITH    20
    7369    SMITH    20
    7499    ALLEN    30
    7499    ALLEN    30
    7521    WARD    30
    7521    WARD    30
    7566    JONES    20
    7566    JONES    20
    7654    MARTIN    30
    7654    MARTIN    30
    7698    BLAKE    30
    7698    BLAKE    30
    7782    CLARK    10
    7782    CLARK    10
    7788    SCOTT    20
    7788    SCOTT    20
    7839    KING    10
    7839    KING    10
    7844    TURNER    30
    7844    TURNER    30
    7876    ADAMS    20
    7876    ADAMS    20
    7900    JAMES    30
    7900    JAMES    30
    7902    FORD    20
    7902    FORD    20
    7934    MILLER    10
    7934    MILLER    10
    7369    SMITH    20
    7369    SMITH    20
    7499    ALLEN    30
    7499    ALLEN    30
    7521    WARD    30
    7521    WARD    30
    7566    JONES    20
    7566    JONES    20
    7654    MARTIN    30
    7654    MARTIN    30
    7698    BLAKE    30
    7698    BLAKE    30
    7782    CLARK    10
    7782    CLARK    10
    7788    SCOTT    20
    7788    SCOTT    20
    7839    KING    10
    7839    KING    10
    7844    TURNER    30
    7844    TURNER    30
    7876    ADAMS    20
    7876    ADAMS    20
    7900    JAMES    30
    7900    JAMES    30
    7902    FORD    20
    7902    FORD    20
    7934    MILLER    10
    7934    MILLER    10
    Time taken: 53.142 seconds, Fetched: 56 row(s)
    hive (yinzhengjie)>
    Join statement - inner join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e join dept d on e.deptno = d.deptno;)
    hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e left join dept d on e.deptno = d.deptno;
    2018-08-09 23:42:39    Starting to launch local task to process map join;    maximum memory = 477626368
    2018-08-09 23:42:43    Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-42-22_712_6649379300342030940-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable
    2018-08-09 23:42:44    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-42-22_712_6649379300342030940-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile31--.hashtable (348 bytes)
    2018-08-09 23:42:44    End of local task; Time Taken: 4.518 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533789743141_0030, Tracking URL = http://s101:8088/proxy/application_1533789743141_0030/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0030
    Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
    2018-08-09 23:43:07,580 Stage-3 map = 0%,  reduce = 0%
    2018-08-09 23:43:18,075 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.03 sec
    MapReduce Total cumulative CPU time: 2 seconds 30 msec
    Ended Job = job_1533789743141_0030
    MapReduce Jobs Launched: 
    Stage-Stage-3: Map: 1   Cumulative CPU: 2.03 sec   HDFS Read: 7874 HDFS Write: 1543 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 30 msec
    e.empno    e.ename    d.deptno
    7369    SMITH    20
    7369    SMITH    20
    7499    ALLEN    30
    7499    ALLEN    30
    7521    WARD    30
    7521    WARD    30
    7566    JONES    20
    7566    JONES    20
    7654    MARTIN    30
    7654    MARTIN    30
    7698    BLAKE    30
    7698    BLAKE    30
    7782    CLARK    10
    7782    CLARK    10
    7788    SCOTT    20
    7788    SCOTT    20
    7839    KING    10
    7839    KING    10
    7844    TURNER    30
    7844    TURNER    30
    7876    ADAMS    20
    7876    ADAMS    20
    7900    JAMES    30
    7900    JAMES    30
    7902    FORD    20
    7902    FORD    20
    7934    MILLER    10
    7934    MILLER    10
    7369    SMITH    20
    7369    SMITH    20
    7499    ALLEN    30
    7499    ALLEN    30
    7521    WARD    30
    7521    WARD    30
    7566    JONES    20
    7566    JONES    20
    7654    MARTIN    30
    7654    MARTIN    30
    7698    BLAKE    30
    7698    BLAKE    30
    7782    CLARK    10
    7782    CLARK    10
    7788    SCOTT    20
    7788    SCOTT    20
    7839    KING    10
    7839    KING    10
    7844    TURNER    30
    7844    TURNER    30
    7876    ADAMS    20
    7876    ADAMS    20
    7900    JAMES    30
    7900    JAMES    30
    7902    FORD    20
    7902    FORD    20
    7934    MILLER    10
    7934    MILLER    10
    Time taken: 57.477 seconds, Fetched: 56 row(s)
    hive (yinzhengjie)>  
    Join statement - left outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e left join dept d on e.deptno = d.deptno;)
    hive (yinzhengjie)>  select e.empno, e.ename, d.deptno from emp e right join dept d on e.deptno = d.deptno;
    2018-08-09 23:43:50    Starting to launch local task to process map join;    maximum memory = 477626368
    2018-08-09 23:43:54    Dump the side-table for tag: 0 with group count: 3 into file: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-43-32_208_373121853797344697-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile40--.hashtable
    2018-08-09 23:43:54    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/46c2c137-93f5-4f30-9855-6b0d3d62c227/hive_2018-08-09_23-43-32_208_373121853797344697-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile40--.hashtable (697 bytes)
    2018-08-09 23:43:54    End of local task; Time Taken: 4.69 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533789743141_0031, Tracking URL = http://s101:8088/proxy/application_1533789743141_0031/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0031
    Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
    2018-08-09 23:44:21,028 Stage-3 map = 0%,  reduce = 0%
    2018-08-09 23:44:54,359 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.38 sec
    MapReduce Total cumulative CPU time: 2 seconds 380 msec
    Ended Job = job_1533789743141_0031
    MapReduce Jobs Launched: 
    Stage-Stage-3: Map: 1   Cumulative CPU: 2.38 sec   HDFS Read: 6395 HDFS Write: 1585 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 380 msec
    e.empno    e.ename    d.deptno
    7782    CLARK    10
    7839    KING    10
    7934    MILLER    10
    7782    CLARK    10
    7839    KING    10
    7934    MILLER    10
    7369    SMITH    20
    7566    JONES    20
    7788    SCOTT    20
    7876    ADAMS    20
    7902    FORD    20
    7369    SMITH    20
    7566    JONES    20
    7788    SCOTT    20
    7876    ADAMS    20
    7902    FORD    20
    7499    ALLEN    30
    7521    WARD    30
    7654    MARTIN    30
    7698    BLAKE    30
    7844    TURNER    30
    7900    JAMES    30
    7499    ALLEN    30
    7521    WARD    30
    7654    MARTIN    30
    7698    BLAKE    30
    7844    TURNER    30
    7900    JAMES    30
    NULL    NULL    40
    7782    CLARK    10
    7839    KING    10
    7934    MILLER    10
    7782    CLARK    10
    7839    KING    10
    7934    MILLER    10
    7369    SMITH    20
    7566    JONES    20
    7788    SCOTT    20
    7876    ADAMS    20
    7902    FORD    20
    7369    SMITH    20
    7566    JONES    20
    7788    SCOTT    20
    7876    ADAMS    20
    7902    FORD    20
    7499    ALLEN    30
    7521    WARD    30
    7654    MARTIN    30
    7698    BLAKE    30
    7844    TURNER    30
    7900    JAMES    30
    7499    ALLEN    30
    7521    WARD    30
    7654    MARTIN    30
    7698    BLAKE    30
    7844    TURNER    30
    7900    JAMES    30
    NULL    NULL    40
    Time taken: 87.954 seconds, Fetched: 58 row(s)
    hive (yinzhengjie)> 
    Join statement - right outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e right join dept d on e.deptno = d.deptno;)
    Join statement - full outer join (hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e full join dept d on e.deptno = d.deptno;)
    hive (yinzhengjie)> select e.empno, e.ename, d.deptno from emp e full join dept d on e.deptno = d.deptno;
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = yinzhengjie_20180809235025_e7e97788-2d65-45e0-b567-004f2d7057e0
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0035, Tracking URL = http://s101:8088/proxy/application_1533789743141_0035/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0035
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
    2018-08-09 23:50:45,807 Stage-1 map = 0%,  reduce = 0%
    2018-08-09 23:51:08,516 Stage-1 map = 50%,  reduce = 0%, Cumulative CPU 2.58 sec
    2018-08-09 23:51:14,735 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 4.88 sec
    2018-08-09 23:51:27,238 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.56 sec
    MapReduce Total cumulative CPU time: 7 seconds 560 msec
    Ended Job = job_1533789743141_0035
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 2  Reduce: 1   Cumulative CPU: 7.56 sec   HDFS Read: 17097 HDFS Write: 1585 SUCCESS
    Total MapReduce CPU Time Spent: 7 seconds 560 msec
    e.empno    e.ename    d.deptno
    7934    MILLER    10
    7934    MILLER    10
    7839    KING    10
    7839    KING    10
    7782    CLARK    10
    7782    CLARK    10
    7934    MILLER    10
    7934    MILLER    10
    7839    KING    10
    7839    KING    10
    7782    CLARK    10
    7782    CLARK    10
    7788    SCOTT    20
    7788    SCOTT    20
    7566    JONES    20
    7566    JONES    20
    7566    JONES    20
    7566    JONES    20
    7369    SMITH    20
    7369    SMITH    20
    7902    FORD    20
    7902    FORD    20
    7876    ADAMS    20
    7876    ADAMS    20
    7788    SCOTT    20
    7788    SCOTT    20
    7369    SMITH    20
    7369    SMITH    20
    7902    FORD    20
    7902    FORD    20
    7876    ADAMS    20
    7876    ADAMS    20
    7900    JAMES    30
    7900    JAMES    30
    7844    TURNER    30
    7844    TURNER    30
    7844    TURNER    30
    7844    TURNER    30
    7499    ALLEN    30
    7499    ALLEN    30
    7698    BLAKE    30
    7698    BLAKE    30
    7654    MARTIN    30
    7654    MARTIN    30
    7900    JAMES    30
    7900    JAMES    30
    7521    WARD    30
    7521    WARD    30
    7499    ALLEN    30
    7499    ALLEN    30
    7654    MARTIN    30
    7654    MARTIN    30
    7521    WARD    30
    7521    WARD    30
    7698    BLAKE    30
    7698    BLAKE    30
    NULL    NULL    40
    NULL    NULL    40
    Time taken: 63.838 seconds, Fetched: 58 row(s)
    hive (yinzhengjie)> 
    Join statement - multi-table join (hive (yinzhengjie)> SELECT e.ename, d.deptno, l.loc_name FROM emp e JOIN dept d ON d.deptno = e.deptno JOIN location l ON d.loc = l.loc;)
    [yinzhengjie@s101 ~]$ cat /home/yinzhengjie/download/location.txt 
    1700    Beijing
    1800    London
    1900    Tokyo
    [yinzhengjie@s101 ~]$ 
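        In most cases, Hive launches one MapReduce job for each pair of tables being joined. For a query like the one below, that means a first MapReduce job joins tables e and d,
    and a second MapReduce job then joins the output of the first job with table l. In the run shown here, however, the tables are small enough that Hive converts the joins to map joins, so the log reports a single map-only job (Total jobs = 1, number of reducers: 0).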
    hive (yinzhengjie)> create table if not exists yinzhengjie.location(
                      >     loc int,
                      >     loc_name string
                      > )
                      > row format delimited fields terminated by '\t';                        # create the location table
    Time taken: 0.614 seconds
    hive (yinzhengjie)> load data local inpath '/home/yinzhengjie/download/location.txt' into table yinzhengjie.location;        # load the data into the table
    Loading data to table yinzhengjie.location
    Time taken: 0.478 seconds
    hive (yinzhengjie)>
    hive (yinzhengjie)> SELECT e.ename, d.deptno, l.loc_name
                      > FROM   emp e
                      > JOIN   dept d
                      > ON     d.deptno = e.deptno 
                      > JOIN   location l
                      > ON     d.loc = l.loc;                    # multi-table join query
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = yinzhengjie_20180809235602_7fbd82df-9541-4b76-b5c4-9482d4aa2ccc
    Total jobs = 1
    2018-08-09 23:56:12    Starting to launch local task to process map join;    maximum memory = 477626368
    2018-08-09 23:56:16    Dump the side-table for tag: 1 with group count: 3 into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile01--.hashtable
    2018-08-09 23:56:16    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile01--.hashtable (344 bytes)
    2018-08-09 23:56:16    Dump the side-table for tag: 1 with group count: 4 into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile11--.hashtable
    2018-08-09 23:56:16    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-09_23-56-02_428_1537442849954313200-1/-local-10005/HashTable-Stage-5/MapJoin-mapfile11--.hashtable (380 bytes)
    2018-08-09 23:56:16    End of local task; Time Taken: 3.928 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533789743141_0036, Tracking URL = http://s101:8088/proxy/application_1533789743141_0036/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0036
    Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 0
    2018-08-09 23:56:37,193 Stage-5 map = 0%,  reduce = 0%
    2018-08-09 23:56:54,925 Stage-5 map = 100%,  reduce = 0%, Cumulative CPU 2.64 sec
    MapReduce Total cumulative CPU time: 2 seconds 640 msec
    Ended Job = job_1533789743141_0036
    MapReduce Jobs Launched: 
    Stage-Stage-5: Map: 1   Cumulative CPU: 2.64 sec   HDFS Read: 9513 HDFS Write: 865 SUCCESS
    Total MapReduce CPU Time Spent: 2 seconds 640 msec
    e.ename    d.deptno    l.loc_name
    SMITH    20    London
    ALLEN    30    Tokyo
    WARD    30    Tokyo
    JONES    20    London
    MARTIN    30    Tokyo
    BLAKE    30    Tokyo
    CLARK    10    Beijing
    SCOTT    20    London
    KING    10    Beijing
    TURNER    30    Tokyo
    ADAMS    20    London
    JAMES    30    Tokyo
    FORD    20    London
    MILLER    10    Beijing
    SMITH    20    London
    ALLEN    30    Tokyo
    WARD    30    Tokyo
    JONES    20    London
    MARTIN    30    Tokyo
    BLAKE    30    Tokyo
    CLARK    10    Beijing
    SCOTT    20    London
    KING    10    Beijing
    TURNER    30    Tokyo
    ADAMS    20    London
    JAMES    30    Tokyo
    FORD    20    London
    MILLER    10    Beijing
    Time taken: 56.659 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)> 
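The one-job-per-join behaviour described for the multi-table query can be inspected without running it. The following is a minimal sketch, assuming the emp, dept, and location tables from this section. `hive.auto.convert.join` is the standard Hive setting that controls automatic map-join conversion. With it turned off, the plan would be expected to show two MapReduce join stages, because the two joins use different keys (deptno and loc), but the exact stage layout depends on the Hive version and is not reproduced here.

```sql
-- Turn off automatic conversion of common joins to map joins (this session only).
set hive.auto.convert.join=false;

-- EXPLAIN prints the stage plan without executing the query.
EXPLAIN
SELECT e.ename, d.deptno, l.loc_name
FROM   emp e
JOIN   dept d     ON d.deptno = e.deptno
JOIN   location l ON d.loc = l.loc;

-- Turn automatic map-join conversion back on afterwards.
set hive.auto.convert.join=true;
```

Comparing this plan with the one produced while the setting is true should make the difference between the reduce-side joins and the single map-only job seen in the log above easier to see.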
    Join statement - Cartesian product (hive (yinzhengjie)> select * from emp, dept;)
    hive (yinzhengjie)> set hive.mapred.mode=strict;
    hive (yinzhengjie)> set hive.mapred.mode;
    hive (yinzhengjie)> select * from emp, dept;                    # a Cartesian product fails in strict mode
    FAILED: SemanticException Cartesian products are disabled for safety reasons. If you know what you are doing, please make sure that hive.strict.checks.cartesian.product is set to false and that hive.mapred.mode is not set to 'strict' to enable them.
    hive (yinzhengjie)> 
    hive (yinzhengjie)> set hive.mapred.mode=nonstrict;
    hive (yinzhengjie)> set hive.mapred.mode;
    hive (yinzhengjie)> select empno, deptno from emp, dept;
    FAILED: SemanticException Column deptno Found in more than One Tables/Subqueries
    hive (yinzhengjie)> select * from emp, dept;                    # in nonstrict mode the Cartesian product runs, but queries like this are not recommended because the result is rarely meaningful
    Warning: Map Join MAPJOIN[9][bigTable=?] in task 'Stage-3:MAPRED' is a cross product
    WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
    Query ID = yinzhengjie_20180810000249_98e28c13-db4d-4e2b-81c6-28e44bf51f1d
    Total jobs = 1
    2018-08-10 00:03:00    Starting to launch local task to process map join;    maximum memory = 477626368
    2018-08-10 00:03:04    Dump the side-table for tag: 1 with group count: 1 into file: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-10_00-02-49_246_882868568149391185-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable
    2018-08-10 00:03:04    Uploaded 1 File to: file:/home/yinzhengjie/yinzhengjie/85f0ef7d-ce74-41a8-942e-d1798288e72b/hive_2018-08-10_00-02-49_246_882868568149391185-1/-local-10004/HashTable-Stage-3/MapJoin-mapfile21--.hashtable (418 bytes)
    2018-08-10 00:03:04    End of local task; Time Taken: 3.916 sec.
    Execution completed successfully
    MapredLocal task succeeded
    Launching Job 1 out of 1
    Number of reduce tasks is set to 0 since there's no reduce operator
    Starting Job = job_1533789743141_0037, Tracking URL = http://s101:8088/proxy/application_1533789743141_0037/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0037
    Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 0
    2018-08-10 00:03:27,349 Stage-3 map = 0%,  reduce = 0%
    2018-08-10 00:03:40,822 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 1.8 sec
    MapReduce Total cumulative CPU time: 1 seconds 800 msec
    Ended Job = job_1533789743141_0037
    MapReduce Jobs Launched: 
    Stage-Stage-3: Map: 1   Cumulative CPU: 1.8 sec   HDFS Read: 8853 HDFS Write: 17375 SUCCESS
    Total MapReduce CPU Time Spent: 1 seconds 800 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno    dept.deptno    dept.dname    dept.loc
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    10    ACCOUNTING    2700
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    20    RESEARCH    3800
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    30    SALES    5900
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    40    OPERATIONS    4700
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    10    ACCOUNTING    1700
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    20    RESEARCH    1800
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    30    SALES    1900
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    40    OPERATIONS    1700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    10    ACCOUNTING    2700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    20    RESEARCH    3800
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    30    SALES    5900
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    40    OPERATIONS    4700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    10    ACCOUNTING    1700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    20    RESEARCH    1800
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    30    SALES    1900
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    40    OPERATIONS    1700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    10    ACCOUNTING    2700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    20    RESEARCH    3800
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    30    SALES    5900
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    40    OPERATIONS    4700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    10    ACCOUNTING    1700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    20    RESEARCH    1800
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    30    SALES    1900
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    40    OPERATIONS    1700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    10    ACCOUNTING    2700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    20    RESEARCH    3800
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    30    SALES    5900
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    40    OPERATIONS    4700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    10    ACCOUNTING    1700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    20    RESEARCH    1800
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    30    SALES    1900
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    40    OPERATIONS    1700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    10    ACCOUNTING    2700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    20    RESEARCH    3800
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    30    SALES    5900
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    40    OPERATIONS    4700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    10    ACCOUNTING    1700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    20    RESEARCH    1800
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    30    SALES    1900
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    40    OPERATIONS    1700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    10    ACCOUNTING    2700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    20    RESEARCH    3800
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    30    SALES    5900
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    40    OPERATIONS    4700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    10    ACCOUNTING    1700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    20    RESEARCH    1800
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    30    SALES    1900
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    40    OPERATIONS    1700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    10    ACCOUNTING    2700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    20    RESEARCH    3800
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    30    SALES    5900
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    40    OPERATIONS    4700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    10    ACCOUNTING    1700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    20    RESEARCH    1800
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    30    SALES    1900
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    40    OPERATIONS    1700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    10    ACCOUNTING    2700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    20    RESEARCH    3800
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    30    SALES    5900
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    40    OPERATIONS    4700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    10    ACCOUNTING    1700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    20    RESEARCH    1800
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    30    SALES    1900
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    40    OPERATIONS    1700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    10    ACCOUNTING    2700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    20    RESEARCH    3800
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    30    SALES    5900
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    40    OPERATIONS    4700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    10    ACCOUNTING    1700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    20    RESEARCH    1800
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    30    SALES    1900
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    40    OPERATIONS    1700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    10    ACCOUNTING    2700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    20    RESEARCH    3800
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    30    SALES    5900
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    40    OPERATIONS    4700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    10    ACCOUNTING    1700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    20    RESEARCH    1800
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    30    SALES    1900
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    40    OPERATIONS    1700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    10    ACCOUNTING    2700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    20    RESEARCH    3800
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    30    SALES    5900
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    40    OPERATIONS    4700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    10    ACCOUNTING    1700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    20    RESEARCH    1800
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    30    SALES    1900
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    40    OPERATIONS    1700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    10    ACCOUNTING    2700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    20    RESEARCH    3800
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    30    SALES    5900
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    40    OPERATIONS    4700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    10    ACCOUNTING    1700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    20    RESEARCH    1800
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    30    SALES    1900
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    40    OPERATIONS    1700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    10    ACCOUNTING    2700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    20    RESEARCH    3800
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    30    SALES    5900
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    40    OPERATIONS    4700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    10    ACCOUNTING    1700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    20    RESEARCH    1800
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    30    SALES    1900
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    40    OPERATIONS    1700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    10    ACCOUNTING    2700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    20    RESEARCH    3800
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    30    SALES    5900
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    40    OPERATIONS    4700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    10    ACCOUNTING    1700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    20    RESEARCH    1800
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    30    SALES    1900
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    40    OPERATIONS    1700
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    10    ACCOUNTING    2700
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    20    RESEARCH    3800
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    30    SALES    5900
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    40    OPERATIONS    4700
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    10    ACCOUNTING    1700
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    20    RESEARCH    1800
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    30    SALES    1900
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20    40    OPERATIONS    1700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    10    ACCOUNTING    2700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    20    RESEARCH    3800
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    30    SALES    5900
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    40    OPERATIONS    4700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    10    ACCOUNTING    1700
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    20    RESEARCH    1800
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    30    SALES    1900
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30    40    OPERATIONS    1700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    10    ACCOUNTING    2700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    20    RESEARCH    3800
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    30    SALES    5900
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    40    OPERATIONS    4700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    10    ACCOUNTING    1700
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    20    RESEARCH    1800
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    30    SALES    1900
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30    40    OPERATIONS    1700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    10    ACCOUNTING    2700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    20    RESEARCH    3800
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    30    SALES    5900
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    40    OPERATIONS    4700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    10    ACCOUNTING    1700
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    20    RESEARCH    1800
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    30    SALES    1900
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20    40    OPERATIONS    1700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    10    ACCOUNTING    2700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    20    RESEARCH    3800
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    30    SALES    5900
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    40    OPERATIONS    4700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    10    ACCOUNTING    1700
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    20    RESEARCH    1800
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    30    SALES    1900
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30    40    OPERATIONS    1700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    10    ACCOUNTING    2700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    20    RESEARCH    3800
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    30    SALES    5900
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    40    OPERATIONS    4700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    10    ACCOUNTING    1700
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    20    RESEARCH    1800
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    30    SALES    1900
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30    40    OPERATIONS    1700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    10    ACCOUNTING    2700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    20    RESEARCH    3800
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    30    SALES    5900
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    40    OPERATIONS    4700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    10    ACCOUNTING    1700
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    20    RESEARCH    1800
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    30    SALES    1900
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10    40    OPERATIONS    1700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    10    ACCOUNTING    2700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    20    RESEARCH    3800
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    30    SALES    5900
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    40    OPERATIONS    4700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    10    ACCOUNTING    1700
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    20    RESEARCH    1800
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    30    SALES    1900
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20    40    OPERATIONS    1700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    10    ACCOUNTING    2700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    20    RESEARCH    3800
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    30    SALES    5900
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    40    OPERATIONS    4700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    10    ACCOUNTING    1700
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    20    RESEARCH    1800
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    30    SALES    1900
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10    40    OPERATIONS    1700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    10    ACCOUNTING    2700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    20    RESEARCH    3800
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    30    SALES    5900
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    40    OPERATIONS    4700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    10    ACCOUNTING    1700
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    20    RESEARCH    1800
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    30    SALES    1900
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30    40    OPERATIONS    1700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    10    ACCOUNTING    2700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    20    RESEARCH    3800
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    30    SALES    5900
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    40    OPERATIONS    4700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    10    ACCOUNTING    1700
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    20    RESEARCH    1800
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    30    SALES    1900
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20    40    OPERATIONS    1700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    10    ACCOUNTING    2700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    20    RESEARCH    3800
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    30    SALES    5900
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    40    OPERATIONS    4700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    10    ACCOUNTING    1700
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    20    RESEARCH    1800
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    30    SALES    1900
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30    40    OPERATIONS    1700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    10    ACCOUNTING    2700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    20    RESEARCH    3800
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    30    SALES    5900
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    40    OPERATIONS    4700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    10    ACCOUNTING    1700
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    20    RESEARCH    1800
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    30    SALES    1900
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20    40    OPERATIONS    1700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    10    ACCOUNTING    2700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    20    RESEARCH    3800
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    30    SALES    5900
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    40    OPERATIONS    4700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    10    ACCOUNTING    1700
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    20    RESEARCH    1800
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    30    SALES    1900
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10    40    OPERATIONS    1700
    Time taken: 52.698 seconds, Fetched: 224 row(s)
    hive (yinzhengjie)> 
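    Join statement - Cartesian product (not recommended): avoid Cartesian-product queries. In a production environment they put heavy load on the Hadoop cluster, and on a poorly provisioned cluster they can bring the whole cluster down. (hive (yinzhengjie)> select * from emp, dept;)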
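The row count reported above follows from how a Cartesian product works: every row of emp is paired with every row of dept. Below is a minimal Python sketch of that arithmetic, assuming the 28 emp rows and 8 dept rows seen in this session. The placeholder row values are hypothetical stand-ins, not the real table contents.

```python
from itertools import product

# Hypothetical stand-ins for the two tables: only the row counts matter here.
# The session shows 28 rows in emp and 8 dept rows paired with each of them.
emp_rows = [f"emp_{i}" for i in range(28)]
dept_rows = [f"dept_{j}" for j in range(8)]

# A join with no join condition pairs every emp row with every dept row.
cartesian = list(product(emp_rows, dept_rows))

print(len(cartesian))  # 28 * 8 = 224, matching "Fetched: 224 row(s)"
```

Because the output size is the product of the two input sizes, it grows very quickly as the tables grow, which is why a join with an explicit join condition is preferred.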
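    Sorting - global sort (hive (yinzhengjie)> select * from emp order by sal desc;)
        Order By: global sort, run as one MapReduce job with a single reducer
            1>. Use the ORDER BY clause to sort
                ASC (ascending): ascending order (default)
                DESC (descending): descending order
            2>. The ORDER BY clause goes at the end of the SELECT statement.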
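As a rough illustration of the two sort directions, here is a Python sketch using a few (ename, sal) pairs from the emp table used in this article. Python's `sorted` stands in for the single reducer that produces the global order; this is an analogy, not a description of how Hive implements the sort.

```python
# A few (ename, sal) pairs from the emp table used in this article.
emp = [("KING", 5000.0), ("SMITH", 800.0), ("ALLEN", 1600.0), ("JAMES", 950.0)]

# ORDER BY sal      -> ascending, the default direction
ascending = sorted(emp, key=lambda row: row[1])

# ORDER BY sal DESC -> descending
descending = sorted(emp, key=lambda row: row[1], reverse=True)

print([name for name, _ in ascending])   # ['SMITH', 'JAMES', 'ALLEN', 'KING']
print([name for name, _ in descending])  # ['KING', 'ALLEN', 'JAMES', 'SMITH']
```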
    hive (yinzhengjie)> select * from emp order by sal;                                # List employees by salary in ascending order (ascending is the default)
    Query ID = yinzhengjie_20180810001838_6c529433-c84b-447d-89e0-16af47dc89eb
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0039, Tracking URL = http://s101:8088/proxy/application_1533789743141_0039/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0039
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-10 00:18:56,082 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:19:37,122 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.66 sec
    2018-08-10 00:19:59,288 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.41 sec
    MapReduce Total cumulative CPU time: 4 seconds 410 msec
    Ended Job = job_1533789743141_0039
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.41 sec   HDFS Read: 10952 HDFS Write: 1745 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 410 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    Time taken: 82.564 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)>
    hive (yinzhengjie)> select * from emp order by sal desc;                        # List employees by salary in descending order
    Query ID = yinzhengjie_20180810002012_ebf1251c-c92b-4010-bea7-bb8a2c34ebdb
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0040, Tracking URL = http://s101:8088/proxy/application_1533789743141_0040/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0040
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-10 00:20:30,216 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:20:44,683 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.47 sec
    2018-08-10 00:21:00,184 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.31 sec
    MapReduce Total cumulative CPU time: 5 seconds 310 msec
    Ended Job = job_1533789743141_0040
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.31 sec   HDFS Read: 10952 HDFS Write: 1745 SUCCESS
    Total MapReduce CPU Time Spent: 5 seconds 310 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    Time taken: 51.103 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)>
    Sorting - sort by a column alias (hive (yinzhengjie)> select ename, sal*2 twosal from emp order by twosal;)
    hive (yinzhengjie)> select ename, sal*2 twosal from emp order by twosal;            # Sort by twice each employee's salary
    Query ID = yinzhengjie_20180810002258_b9f73ab7-2a29-459a-9b27-119eb56f1dde
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0041, Tracking URL = http://s101:8088/proxy/application_1533789743141_0041/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0041
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-10 00:23:17,109 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:23:29,497 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.6 sec
    2018-08-10 00:23:41,890 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 4.99 sec
    MapReduce Total cumulative CPU time: 4 seconds 990 msec
    Ended Job = job_1533789743141_0041
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 4.99 sec   HDFS Read: 10079 HDFS Write: 789 SUCCESS
    Total MapReduce CPU Time Spent: 4 seconds 990 msec
    ename    twosal
    SMITH    1600.0
    SMITH    1600.0
    JAMES    1900.0
    JAMES    1900.0
    ADAMS    2200.0
    ADAMS    2200.0
    WARD    2500.0
    WARD    2500.0
    MARTIN    2500.0
    MARTIN    2500.0
    MILLER    2600.0
    MILLER    2600.0
    TURNER    3000.0
    TURNER    3000.0
    ALLEN    3200.0
    ALLEN    3200.0
    CLARK    4900.0
    CLARK    4900.0
    BLAKE    5700.0
    BLAKE    5700.0
    JONES    5950.0
    JONES    5950.0
    SCOTT    6000.0
    SCOTT    6000.0
    FORD    6000.0
    FORD    6000.0
    KING    10000.0
    KING    10000.0
    Time taken: 44.517 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)> 
    Sorting - sort by multiple columns (hive (yinzhengjie)> select ename, deptno, sal from emp order by deptno, sal ;)
    hive (yinzhengjie)> select ename, deptno, sal from emp order by deptno, sal ;                # Sort by department number, then by salary, both ascending
    Query ID = yinzhengjie_20180810002405_c29a1508-8152-4d7c-9b50-e2fc04c8bdbc
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0042, Tracking URL = http://s101:8088/proxy/application_1533789743141_0042/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0042
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-08-10 00:24:21,693 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:24:35,159 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.77 sec
    2018-08-10 00:24:44,565 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.85 sec
    MapReduce Total cumulative CPU time: 3 seconds 850 msec
    Ended Job = job_1533789743141_0042
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 3.85 sec   HDFS Read: 9332 HDFS Write: 867 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 850 msec
    ename    deptno    sal
    MILLER    10    1300.0
    MILLER    10    1300.0
    CLARK    10    2450.0
    CLARK    10    2450.0
    KING    10    5000.0
    KING    10    5000.0
    SMITH    20    800.0
    SMITH    20    800.0
    ADAMS    20    1100.0
    ADAMS    20    1100.0
    JONES    20    2975.0
    JONES    20    2975.0
    FORD    20    3000.0
    SCOTT    20    3000.0
    FORD    20    3000.0
    SCOTT    20    3000.0
    JAMES    30    950.0
    JAMES    30    950.0
    WARD    30    1250.0
    MARTIN    30    1250.0
    MARTIN    30    1250.0
    WARD    30    1250.0
    TURNER    30    1500.0
    TURNER    30    1500.0
    ALLEN    30    1600.0
    ALLEN    30    1600.0
    BLAKE    30    2850.0
    BLAKE    30    2850.0
    Time taken: 39.975 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)> 
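Sorting by several columns compares rows on the first column and uses the next column only to break ties. A small Python sketch of that rule follows, using a tuple as the sort key. The rows are a subset of the result above, and `sorted` is only an analogy for what the reducer does.

```python
# (ename, deptno, sal) rows, a subset of the emp table in this article.
emp = [
    ("KING", 10, 5000.0),
    ("SMITH", 20, 800.0),
    ("MILLER", 10, 1300.0),
    ("BLAKE", 30, 2850.0),
    ("JONES", 20, 2975.0),
    ("JAMES", 30, 950.0),
]

# ORDER BY deptno, sal: compare deptno first, then sal within equal deptno.
ordered = sorted(emp, key=lambda row: (row[1], row[2]))

for ename, deptno, sal in ordered:
    print(ename, deptno, sal)
```

The printed order is MILLER, KING, SMITH, JONES, JAMES, BLAKE, which matches the relative order of these employees in the Hive result.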
    Sorting - sort within each reducer with SORT BY (hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp sort by deptno desc;)
    hive (yinzhengjie)> set mapreduce.job.reduces=3;                    # Set the number of reducers
    hive (yinzhengjie)> set mapreduce.job.reduces;                        # Show the configured number of reducers
    hive (yinzhengjie)>  
    hive (yinzhengjie)>  select * from emp sort by empno desc;            # List employees sorted by employee number, descending, within each reducer
    Query ID = yinzhengjie_20180810002752_cd4d7e0d-be26-4053-8730-9379c1632a3a
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Defaulting to jobconf value of: 3
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0043, Tracking URL = http://s101:8088/proxy/application_1533789743141_0043/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0043
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
    2018-08-10 00:28:08,954 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:28:20,313 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.02 sec
    2018-08-10 00:28:33,921 Stage-1 map = 100%,  reduce = 11%, Cumulative CPU 2.45 sec
    2018-08-10 00:28:36,045 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.76 sec
    2018-08-10 00:28:37,074 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 7.48 sec
    2018-08-10 00:28:54,683 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 10.02 sec
    2018-08-10 00:28:57,007 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.69 sec
    MapReduce Total cumulative CPU time: 10 seconds 690 msec
    Ended Job = job_1533789743141_0043
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 10.69 sec   HDFS Read: 20664 HDFS Write: 1919 SUCCESS
    Total MapReduce CPU Time Spent: 10 seconds 690 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    Time taken: 67.599 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)>    
    hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp sort by deptno desc;          #Write the query result to local files (sorted by department number in descending order within each reducer)
    Query ID = yinzhengjie_20180810003404_42a220b7-02c7-42ae-bf8a-566c6300f4c3
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Defaulting to jobconf value of: 3
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0045, Tracking URL = http://s101:8088/proxy/application_1533789743141_0045/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0045
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
    2018-08-10 00:34:28,526 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:34:37,987 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.22 sec
    2018-08-10 00:34:46,345 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 3.35 sec
    2018-08-10 00:34:49,548 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 5.71 sec
    2018-08-10 00:35:05,098 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 7.57 sec
    MapReduce Total cumulative CPU time: 7 seconds 570 msec
    Ended Job = job_1533789743141_0045
    Moving data to local directory /home/yinzhengjie/download/sortby-result
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 7.57 sec   HDFS Read: 19815 HDFS Write: 1322 SUCCESS
    Total MapReduce CPU Time Spent: 7 seconds 570 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    Time taken: 62.425 seconds
    hive (yinzhengjie)> 
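    Sorting - Distribute By (partitioned sort) (hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result' ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' select * from emp distribute by deptno sort by empno desc;)
        Distribute By: similar to the partitioner in MapReduce. It decides which reducer each row is sent to, and is used together with sort by.
        Tip: Hive requires the DISTRIBUTE BY clause to be written before the SORT BY clause. When testing distribute by, make sure the job runs with more than one reducer; otherwise the effect of distribute by is not visible.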
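    The behaviour described above can be modelled in a few lines of Python. This is only a sketch, not Hive's implementation: the function name distribute_and_sort and the sample rows are hypothetical, and routing an integer key with "key % num_reducers" is an assumption that mirrors how a hash partitioner treats small non-negative integers.

```python
from collections import defaultdict

# Hypothetical sample of (empno, deptno) pairs in the style of the emp table.
rows = [(7369, 20), (7499, 30), (7521, 30), (7566, 20), (7782, 10), (7839, 10)]


def distribute_and_sort(rows, num_reducers):
    """Model `DISTRIBUTE BY deptno SORT BY empno DESC`.

    Assumption: an integer key goes to reducer (key % num_reducers).
    """
    reducers = defaultdict(list)
    for empno, deptno in rows:
        # DISTRIBUTE BY: every row with the same deptno lands in the same reducer.
        reducers[deptno % num_reducers].append((empno, deptno))
    # SORT BY: order rows inside each reducer only, never across reducers.
    return {
        reducer: sorted(part, key=lambda row: row[0], reverse=True)
        for reducer, part in sorted(reducers.items())
    }


result = distribute_and_sort(rows, 3)
for reducer, part in result.items():
    print(reducer, part)
```

    With a single reducer every row ends up in the same partition, which is why the tip above asks for more than one reducer when testing distribute by.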
    hive (yinzhengjie)> set mapreduce.job.reduces;
    hive (yinzhengjie)> 
    hive (yinzhengjie)> insert overwrite local directory '/home/yinzhengjie/download/sortby-result'  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  select * from emp distribute by deptno sort by empno desc;            #Distribute rows by department number first, then sort by employee number in descending order within each reducer.
    Query ID = yinzhengjie_20180810003826_af885657-4f0a-4e2a-83f3-62cbdabda4f3
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Defaulting to jobconf value of: 3
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0046, Tracking URL = http://s101:8088/proxy/application_1533789743141_0046/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0046
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
    2018-08-10 00:38:46,632 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:39:27,774 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.07 sec
    2018-08-10 00:39:45,945 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.54 sec
    2018-08-10 00:39:50,095 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 6.44 sec
    2018-08-10 00:39:51,122 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.78 sec
    MapReduce Total cumulative CPU time: 8 seconds 780 msec
    Ended Job = job_1533789743141_0046
    Moving data to local directory /home/yinzhengjie/download/sortby-result
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 8.78 sec   HDFS Read: 19858 HDFS Write: 1322 SUCCESS
    Total MapReduce CPU Time Spent: 8 seconds 780 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    Time taken: 86.59 seconds
    hive (yinzhengjie)>
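    Sorting - Cluster By (hive (yinzhengjie)> select * from emp cluster by deptno;)
        When the distribute by and sort by columns are the same, cluster by can be used instead.
            Cluster by combines the functions of distribute by and sort by. However, it always sorts in ascending order; you cannot specify ASC or DESC.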
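    The equivalence between CLUSTER BY col and DISTRIBUTE BY col SORT BY col (default ascending order) can be sketched in Python as follows. This is an illustrative model, not Hive code: the helper names, the sample rows (including the hypothetical department 40), and the "key % num_reducers" routing are all assumptions.

```python
def distribute_sort(rows, dist_key, num_reducers, sort_key, descending=False):
    """Model `DISTRIBUTE BY dist_key SORT BY sort_key [DESC]`."""
    parts = [[] for _ in range(num_reducers)]
    for row in rows:
        # Assumption: integer keys are routed with a simple modulo.
        parts[dist_key(row) % num_reducers].append(row)
    return [sorted(part, key=sort_key, reverse=descending) for part in parts]


def cluster_by(rows, key, num_reducers):
    """Model `CLUSTER BY key`: same column for both steps, ascending only."""
    return distribute_sort(rows, key, num_reducers, sort_key=key)


def deptno(row):
    return row[1]


# Hypothetical (ename, deptno) rows; department 40 shares a reducer with 10.
rows = [("KING", 10), ("SMITH", 20), ("ALLEN", 30), ("DOE", 40), ("CLARK", 10)]

clustered = cluster_by(rows, deptno, 3)
explicit = distribute_sort(rows, deptno, 3, sort_key=deptno)
for reducer, part in enumerate(clustered):
    print(reducer, part)
```

    Under this routing, department 30 goes to reducer 0, 10 to reducer 1, and 20 to reducer 2, which appears consistent with the 30, 10, 20 order of the query output that follows.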
    hive (yinzhengjie)> select * from emp cluster by deptno;
    Query ID = yinzhengjie_20180810004115_0faf59ba-950a-4f86-885a-00865338c95c
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Defaulting to jobconf value of: 3
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0047, Tracking URL = http://s101:8088/proxy/application_1533789743141_0047/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0047
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
    2018-08-10 00:41:31,323 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:41:40,638 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.21 sec
    2018-08-10 00:41:49,985 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 3.64 sec
    2018-08-10 00:41:58,261 Stage-1 map = 100%,  reduce = 67%, Cumulative CPU 5.93 sec
    2018-08-10 00:42:13,824 Stage-1 map = 100%,  reduce = 89%, Cumulative CPU 8.2 sec
    2018-08-10 00:42:16,943 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 8.97 sec
    MapReduce Total cumulative CPU time: 8 seconds 970 msec
    Ended Job = job_1533789743141_0047
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 8.97 sec   HDFS Read: 20707 HDFS Write: 1919 SUCCESS
    Total MapReduce CPU Time Spent: 8 seconds 970 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    Time taken: 64.632 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)> select * from emp distribute by deptno sort by deptno;
    Query ID = yinzhengjie_20180810004343_d5ce078f-80a7-4762-8a00-a75b6a97f7b2
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks not specified. Defaulting to jobconf value of: 3
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1533789743141_0048, Tracking URL = http://s101:8088/proxy/application_1533789743141_0048/
    Kill Command = /soft/hadoop-2.7.3/bin/hadoop job  -kill job_1533789743141_0048
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 3
    2018-08-10 00:43:58,038 Stage-1 map = 0%,  reduce = 0%
    2018-08-10 00:44:10,447 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.51 sec
    2018-08-10 00:44:23,055 Stage-1 map = 100%,  reduce = 33%, Cumulative CPU 4.62 sec
    2018-08-10 00:44:29,343 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 10.22 sec
    MapReduce Total cumulative CPU time: 10 seconds 220 msec
    Ended Job = job_1533789743141_0048
    MapReduce Jobs Launched: 
    Stage-Stage-1: Map: 1  Reduce: 3   Cumulative CPU: 10.22 sec   HDFS Read: 20707 HDFS Write: 1919 SUCCESS
    Total MapReduce CPU Time Spent: 10 seconds 220 msec
    emp.empno    emp.ename    emp.job    emp.mgr    emp.hiredate    emp.sal    emp.comm    emp.deptno
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7844    TURNER    SALESMAN    7698    1981-9-8    1500.0    0.0    30
    7499    ALLEN    SALESMAN    7698    1981-2-20    1600.0    300.0    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7900    JAMES    CLERK    7698    1981-12-3    950.0    NULL    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7654    MARTIN    SALESMAN    7698    1981-9-28    1250.0    1400.0    30
    7698    BLAKE    MANAGER    7839    1981-5-1    2850.0    NULL    30
    7521    WARD    SALESMAN    7698    1981-2-22    1250.0    500.0    30
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7934    MILLER    CLERK    7782    1982-1-23    1300.0    NULL    10
    7839    KING    PRESIDENT    NULL    1981-11-17    5000.0    NULL    10
    7782    CLARK    MANAGER    7839    1981-6-9    2450.0    NULL    10
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7369    SMITH    CLERK    7902    1980-12-17    800.0    NULL    20
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7788    SCOTT    ANALYST    7566    1987-4-19    3000.0    NULL    20
    7566    JONES    MANAGER    7839    1981-4-2    2975.0    NULL    20
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    7902    FORD    ANALYST    7566    1981-12-3    3000.0    NULL    20
    7876    ADAMS    CLERK    7788    1987-5-23    1100.0    NULL    20
    Time taken: 48.312 seconds, Fetched: 28 row(s)
    hive (yinzhengjie)> 
    Bucketed tables - bucket sampling query (hive (yinzhengjie)>  select * from stu_buck tablesample(bucket 1 out of 4 on id);)
    hive (yinzhengjie)> select * from stu_buck;
    stu_buck.id    stu_buck.name
    1016    ss16
    1012    ss12
    1008    ss8
    1004    ss4
    1001    ss1
    1013    ss13
    1005    ss5
    1009    ss9
    1014    ss14
    1010    ss10
    1006    ss6
    1002    ss2
    1015    ss15
    1007    ss7
    1003    ss3
    1011    ss11
    Time taken: 0.073 seconds, Fetched: 16 row(s)
    hive (yinzhengjie)>  select * from stu_buck tablesample(bucket 1 out of 4 on id);        #Sample bucket 1 out of 4 from table stu_buck, bucketing on id.
    stu_buck.id    stu_buck.name
    1016    ss16
    1012    ss12
    1008    ss8
    1004    ss4
    Time taken: 0.088 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> 
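    Tip: tablesample is the sampling clause. Syntax: TABLESAMPLE(BUCKET x OUT OF y [ON colname]).
        x is the bucket to start sampling from, and y sets the sampling ratio (table bucket count / y buckets are sampled). For example, if the table has 4 buckets in total, tablesample(bucket 4 out of 4) samples (4/4=) 1 bucket of data, namely the 4th bucket.
        Note: x must be less than or equal to y; otherwise Hive throws an exception: FAILED: SemanticException [Error 10061]: Numerator should not be bigger than denominator in sample clause for table stu_buck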
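    As a rough illustration of the rule above, the following Python sketch models TABLESAMPLE(BUCKET x OUT OF y ON id) for an integer column. It is not Hive's code: the function names are hypothetical, and it assumes the hash of an integer is the integer itself and that the bucket index is (hash & Integer.MAX_VALUE) % y, which may not hold for every Hive version or bucketing scheme.

```python
JAVA_INT_MAX = 0x7FFFFFFF  # Integer.MAX_VALUE, used to clear the sign bit


def bucket_of(id_value, num_buckets):
    # Assumption: the hash of an int column value is the value itself.
    return (id_value & JAVA_INT_MAX) % num_buckets


def tablesample(rows, x, y):
    """Model `TABLESAMPLE(BUCKET x OUT OF y ON id)`; x is 1-based."""
    if x > y:
        raise ValueError("Numerator should not be bigger than denominator")
    return [row for row in rows if bucket_of(row[0], y) == x - 1]


# The 16 (id, name) rows of stu_buck shown above: 1001/ss1 ... 1016/ss16.
rows = [(1000 + i, f"ss{i}") for i in range(1, 17)]

sample = tablesample(rows, 1, 4)
print(sample)
```

    Under these assumptions bucket 1 of 4 holds the ids divisible by 4 (1004, 1008, 1012, 1016), which matches the four rows returned by the query above, and x > y is rejected in the same spirit as Hive's error 10061.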
    Bucketed tables - block sampling (hive (yinzhengjie)> select * from stu_buck tablesample(0.1 percent);)
    hive (yinzhengjie)> select * from stu_buck;
    stu_buck.id    stu_buck.name
    1016    ss16
    1012    ss12
    1008    ss8
    1004    ss4
    1001    ss1
    1013    ss13
    1005    ss5
    1009    ss9
    1014    ss14
    1010    ss10
    1006    ss6
    1002    ss2
    1015    ss15
    1007    ss7
    1003    ss3
    1011    ss11
    Time taken: 0.078 seconds, Fetched: 16 row(s)
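    hive (yinzhengjie)> select * from stu_buck tablesample(0.1 percent) ;            #Note: stu_buck is a bucketed table with 4 buckets. Block sampling works on whole data blocks rather than individual rows, so it does not return all of the table's data; here it returns the rows of one of the four buckets.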
    stu_buck.id    stu_buck.name
    1016    ss16
    1012    ss12
    1008    ss8
    1004    ss4
    Time taken: 0.04 seconds, Fetched: 4 row(s)
    hive (yinzhengjie)> select * from stu tablesample(0.1 percent) ;
    stu.id    stu.name
    1001    ss1
    1002    ss2
    1003    ss3
    1004    ss4
    1005    ss5
    1006    ss6
    1007    ss7
    1008    ss8
    1009    ss9
    1010    ss10
    1011    ss11
    1012    ss12
    1013    ss13
    1014    ss14
    1015    ss15
    1016    ss16
    Time taken: 0.059 seconds, Fetched: 16 row(s)
    hive (yinzhengjie)> 


    hive (yinzhengjie)> show functions;                    #List the built-in functions
    hive (yinzhengjie)> desc function xpath;            #Show the usage of a built-in function
    hive (yinzhengjie)> desc function extended xpath;    #Show the detailed usage of a built-in function
  • Original article: https://www.cnblogs.com/yinzhengjie/p/9154339.html