zoukankan      html  css  js  c++  java
  • hive 连接查询sql对比效率

    准备4个表

    从mysql 导出excel 转换为txt

    创建hive 表的导入文件

    create table bdqn_student(
    sno int,
    sname string,
    sbirthdate string,
    sgender string) ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ’ STORED AS TEXTFILE;

    create table bdqn_teacher(
    tno int,
    tname string)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ’ STORED AS TEXTFILE;

    create table bdqn_course(
    cno int,
    cname string,
    tno int)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ’ STORED AS TEXTFILE;

    create table bdqn_score(
    sno int,
    cno int,
    score string)

    ROW FORMAT DELIMITED FIELDS TERMINATED BY ‘ ’ STORED AS TEXTFILE;

    Time taken: 4.246 seconds, Fetched: 1 row(s)
    hive> create table bdqn_student(

    sno int,
    sname string,
    sbirthdate string,
    sgender string);
    OK
    Time taken: 0.583 seconds
    hive> create table bdqn_teacher(
    tno int,
    tname string);
    OK
    Time taken: 0.106 seconds
    hive> create table bdqn_course(
    cno int,
    cname string,
    tno int);
    OK
    Time taken: 0.105 seconds
    hive>
    create table bdqn_score(
    sno int,
    cno int,
    score string);
    OK
    Time taken: 0.094 seconds

    Time taken: 0.094 seconds
    hive> show tables;
    OK
    bdqn_course
    bdqn_score
    bdqn_student
    bdqn_teacher
    ncdc
    Time taken: 0.021 seconds, Fetched: 5 row(s)

    一共四个表

    load data local inpath ‘/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/course.txt’ into table bdqn_course
    load data local inpath ‘/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/student.txt’ into table bdqn_student
    load data local inpath ‘/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/teacher.txt’ into table bdqn_teacher
    load data local inpath ‘/opt/hadoop/hadoopDATA/sql_Query_do_not_delete/score.txt’ into table bdqn_score

    中文乱码问题解决:

    解决方法:
    1、修改远程linux机器的配置
    [root@rhel ~]#vi /etc/sysconfig/i18n
    把LANG改成支持UTF-8的字符集
    如: LANG=”zh_CN.UTF-8″ 或者是 LANG=”en_US.UTF-8″ 本文修改为后者
    2、修改Secure CRT的Session Options
    Options->Session Options->Appearance->Font->新宋体 字符集:中文GB2312 ->Character encoding 为UTF-8
    3、OK.

    查询:
    查询平均成绩大于等于60分的同学的学生编号和学生姓名和平均成绩(提示:子查询,分组)
    select st.sname, ascore from bdqn_student st join
    (select sno,avg(score) ascore from bdqn_score group by sno having avg(score)>=60) sc on sc.sno=st.sno

    hive> select st.sname, ascore from bdqn_student st join

    (select sno,avg(score) ascore from bdqn_score group by sno having avg(score)>=60) sc on sc.sno=st.sno;
    Total MapReduce jobs = 2
    Launching Job 1 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
    In order to set a constant number of reducers:
    set mapred.reduce.tasks=
    Starting Job = job_201507050950_0007, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0007
    Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0007
    Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
    2015-07-06 15:46:11,004 Stage-2 map = 0%, reduce = 0%
    2015-07-06 15:46:15,029 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:16,034 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:17,040 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:18,046 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:19,051 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:20,057 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:21,063 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:22,068 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:23,074 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:24,079 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.86 sec
    2015-07-06 15:46:25,090 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
    2015-07-06 15:46:26,096 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
    2015-07-06 15:46:27,102 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
    2015-07-06 15:46:28,108 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 5.08 sec
    MapReduce Total cumulative CPU time: 5 seconds 80 msec
    Ended Job = job_201507050950_0007
    Launching Job 2 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
    In order to set a constant number of reducers:
    set mapred.reduce.tasks=
    Starting Job = job_201507050950_0008, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0008
    Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0008
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
    2015-07-06 15:46:35,818 Stage-1 map = 0%, reduce = 0%
    2015-07-06 15:46:39,836 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.85 sec
    2015-07-06 15:46:40,841 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.85 sec
    2015-07-06 15:46:41,848 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
    2015-07-06 15:46:42,853 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
    2015-07-06 15:46:43,859 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
    2015-07-06 15:46:44,864 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
    2015-07-06 15:46:45,869 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
    2015-07-06 15:46:46,875 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 3.69 sec
    2015-07-06 15:46:47,880 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 3.69 sec
    2015-07-06 15:46:48,888 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
    2015-07-06 15:46:49,894 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
    2015-07-06 15:46:50,900 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
    2015-07-06 15:46:51,906 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 6.73 sec
    MapReduce Total cumulative CPU time: 6 seconds 730 msec
    Ended Job = job_201507050950_0008
    MapReduce Jobs Launched:
    Job 0: Map: 1 Reduce: 1 Cumulative CPU: 5.08 sec HDFS Read: 377 HDFS Write: 226 SUCCESS
    Job 1: Map: 2 Reduce: 1 Cumulative CPU: 6.73 sec HDFS Read: 1109 HDFS Write: 73 SUCCESS
    Total MapReduce CPU Time Spent: 11 seconds 810 msec
    OK
    赵雷 89.66666666666667
    钱电 70.0
    孙风 80.0
    周梅 81.5
    郑竹 93.5
    Time taken: 51.375 seconds, Fetched: 5 row(s)

    Hive只支持在FROM子句中使用子查询,子查询必须有名字,并且列必须唯一:SELECT … FROM(subquery) name …

    这个如果要写成mapred的话,将会非常复杂,但是一个简单的子查询就搞定啦。也可以看到,其实这个查询是有两个job的。

    3. 查询所有同学的学生编号、学生姓名、选课总数、所有课程的总成绩
    

    select st.sname, ascore ,sum from bdqn_student st join
    (select sno,sum(score) ascore,count(*) sum from bdqn_score group by sno) sc on sc.sno=st.sno

    hive> select st.sname, ascore ,sum from bdqn_student st join

    (select sno,sum(score) ascore,count(*) sum from bdqn_score group by sno) sc on sc.sno=st.sno
    ;
    Total MapReduce jobs = 2
    Launching Job 1 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
    In order to set a constant number of reducers:
    set mapred.reduce.tasks=
    Starting Job = job_201507050950_0009, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0009
    Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0009
    Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
    2015-07-06 16:00:40,162 Stage-2 map = 0%, reduce = 0%
    2015-07-06 16:00:43,179 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:44,184 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:45,189 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:46,194 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:47,199 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:48,205 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:49,210 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:50,215 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:51,220 Stage-2 map = 100%, reduce = 33%, Cumulative CPU 1.65 sec
    2015-07-06 16:00:52,225 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
    2015-07-06 16:00:53,231 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
    2015-07-06 16:00:54,236 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
    2015-07-06 16:00:55,242 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.57 sec
    MapReduce Total cumulative CPU time: 4 seconds 570 msec
    Ended Job = job_201507050950_0009
    Launching Job 2 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
    In order to set a constant number of reducers:
    set mapred.reduce.tasks=
    Starting Job = job_201507050950_0010, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0010
    Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0010
    Hadoop job information for Stage-1: number of mappers: 2; number of reducers: 1
    2015-07-06 16:01:01,938 Stage-1 map = 0%, reduce = 0%
    2015-07-06 16:01:04,952 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.27 sec
    2015-07-06 16:01:05,957 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.27 sec
    2015-07-06 16:01:06,962 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 1.27 sec
    2015-07-06 16:01:07,967 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
    2015-07-06 16:01:08,972 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
    2015-07-06 16:01:09,978 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
    2015-07-06 16:01:10,983 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
    2015-07-06 16:01:11,988 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
    2015-07-06 16:01:12,993 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
    2015-07-06 16:01:13,999 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.64 sec
    2015-07-06 16:01:15,005 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.52 sec
    2015-07-06 16:01:16,011 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.52 sec
    2015-07-06 16:01:17,016 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 5.52 sec
    MapReduce Total cumulative CPU time: 5 seconds 520 msec
    Ended Job = job_201507050950_0010
    MapReduce Jobs Launched:
    Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.57 sec HDFS Read: 377 HDFS Write: 285 SUCCESS
    Job 1: Map: 2 Reduce: 1 Cumulative CPU: 5.52 sec HDFS Read: 1170 HDFS Write: 104 SUCCESS
    Total MapReduce CPU Time Spent: 10 seconds 90 msec
    OK
    赵雷 269.0 3
    钱电 210.0 3
    孙风 240.0 3
    李云 100.0 3
    周梅 163.0 2
    吴兰 65.0 2
    郑竹 187.0 2
    Time taken: 44.616 seconds, Fetched: 7 row(s)

    8. 查询没有学全所有课程的同学的信息
    

    select * from bdqn_student st join (
    select sno, count() from bdqn_score group by sno having count()<>3) temp on temp.sno=st.sno

    Time taken: 44.616 seconds, Fetched: 7 row(s)
    hive>

    select * from bdqn_student st join (
    select sno, count() from bdqn_score group by sno having count()<>3) temp on temp.sno=st.sno;
    Total MapReduce jobs = 2
    Launching Job 1 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
    In order to set a constant number of reducers:
    set mapred.reduce.tasks=
    Starting Job = job_201507050950_0011, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0011
    Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0011
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2015-07-06 16:05:29,038 Stage-1 map = 0%, reduce = 0%
    2015-07-06 16:05:32,051 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:33,057 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:34,062 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:35,067 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:36,072 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:37,077 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:38,082 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:39,088 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:40,093 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:41,098 Stage-1 map = 100%, reduce = 33%, Cumulative CPU 1.21 sec
    2015-07-06 16:05:42,103 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.63 sec
    2015-07-06 16:05:43,109 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.63 sec
    2015-07-06 16:05:44,115 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 4.63 sec
    MapReduce Total cumulative CPU time: 4 seconds 630 msec
    Ended Job = job_201507050950_0011
    Launching Job 2 out of 2
    Number of reduce tasks not specified. Estimated from input data size: 1
    In order to change the average load for a reducer (in bytes):
    set hive.exec.reducers.bytes.per.reducer=
    In order to limit the maximum number of reducers:
    set hive.exec.reducers.max=
    In order to set a constant number of reducers:
    set mapred.reduce.tasks=
    Starting Job = job_201507050950_0012, Tracking URL = http://master:50030/jobdetails.jsp?jobid=job_201507050950_0012
    Kill Command = /opt/hadoop/hadoop-1.2.1/libexec/../bin/hadoop job -kill job_201507050950_0012
    Hadoop job information for Stage-2: number of mappers: 2; number of reducers: 1
    2015-07-06 16:05:51,818 Stage-2 map = 0%, reduce = 0%
    2015-07-06 16:05:54,833 Stage-2 map = 50%, reduce = 0%, Cumulative CPU 1.0 sec
    2015-07-06 16:05:55,838 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
    2015-07-06 16:05:56,844 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
    2015-07-06 16:05:57,849 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
    2015-07-06 16:05:58,854 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
    2015-07-06 16:05:59,859 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
    2015-07-06 16:06:00,865 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
    2015-07-06 16:06:01,870 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 2.06 sec
    2015-07-06 16:06:02,875 Stage-2 map = 100%, reduce = 33%, Cumulative CPU 2.06 sec
    2015-07-06 16:06:03,881 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.92 sec
    2015-07-06 16:06:04,887 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.92 sec
    2015-07-06 16:06:05,893 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 4.92 sec
    MapReduce Total cumulative CPU time: 4 seconds 920 msec
    Ended Job = job_201507050950_0012
    MapReduce Jobs Launched:
    Job 0: Map: 1 Reduce: 1 Cumulative CPU: 4.63 sec HDFS Read: 377 HDFS Write: 153 SUCCESS
    Job 1: Map: 2 Reduce: 1 Cumulative CPU: 4.92 sec HDFS Read: 1038 HDFS Write: 79 SUCCESS
    Total MapReduce CPU Time Spent: 9 seconds 550 msec
    OK
    5 周梅 1991/12/1 女 5 2
    6 吴兰 1992/3/1 女 6 2
    7 郑竹 1989/7/1 女 7 2
    Time taken: 43.597 seconds, Fetched: 3 row(s)

  • 相关阅读:
    Windows关于文件句柄数的限制
    python 的未来5-10年的就业方向
    想着给要做的软件起一个名儿~
    sql server 特殊sql
    字节序之大小端_Intelx86是小端_网络传输是大端
    维基百科Wikipedia镜像网站列表
    NodeJS开启GZIP功能
    SqlServer2008 跨服务器同步数据
    Asp.net中的ViewState用法
    JAVA Netty入门Demo实例代码(自写测试可用)实现客户端服务器端互传数据
  • 原文地址:https://www.cnblogs.com/mrcharles/p/11879901.html
Copyright © 2011-2022 走看看