SQL syntax supported by Spark SQL
select [distinct] [column names]|[wildcard]
from tableName
[join clause tableName on join condition]
[where condition]
[group by column name]
[having conditions]
[order by column names [asc|desc]]
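The template above can be tried without a Hive table. The following is a minimal sketch, assuming a local Spark 2.x session. The object name SelectSyntaxSketch, the view name emp_sample, and the thresholds in the query are illustrative assumptions, not part of the original material. The rows reuse a few columns of the emp table shown in section 15.4, leaving out the row that contains nulls.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SelectSyntaxSketch {
  // Registers an in-memory sample (hypothetical view name emp_sample) as a temporary view.
  def registerEmpSample(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(
      (7782, "CLARK", "MANAGER", 2450.0, 10),
      (7839, "KING", "PRESIDENT", 5000.0, 10),
      (7934, "MILLER", "CLERK", 1300.0, 10),
      (7369, "SMITH", "CLERK", 800.0, 20),
      (7566, "JONES", "MANAGER", 2975.0, 20),
      (7902, "FORD", "ANALYST", 3000.0, 20),
      (7499, "ALLEN", "SALESMAN", 1600.0, 30),
      (7521, "WARD", "SALESMAN", 1250.0, 30),
      (7654, "MARTIN", "SALESMAN", 1250.0, 30),
      (7698, "BLAKE", "MANAGER", 2850.0, 30),
      (7844, "TURNER", "SALESMAN", 1500.0, 30),
      (7900, "JAMES", "CLERK", 950.0, 30)
    ).toDF("eid", "ename", "job", "sal", "did")
      .createOrReplaceTempView("emp_sample")
  }

  // Uses where, group by, having and order by in the order given by the template.
  def salaryByDept(spark: SparkSession): DataFrame =
    spark.sql(
      """select did, count(*) as headcount, avg(sal) as avg_sal
        |from emp_sample
        |where sal >= 1000
        |group by did
        |having count(*) >= 2
        |order by avg_sal desc""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("select-syntax-sketch")
      .master("local[*]") // assumption: run locally, as in the spark-shell session
      .getOrCreate()
    registerEmpSample(spark)
    salaryByDept(spark).show()
    spark.stop()
  }
}
```

The where clause filters rows before grouping, while having filters the groups afterwards, which is why the two conditions sit on opposite sides of group by.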
When a query uses only joins, the supported syntax is:
select statement
from statement
[join | inner join | left join | left semi join | left outer join | right join | right outer join | full join | full outer join]
on join condition
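Two of the join types listed above can be compared on a small self-join. This is a sketch under stated assumptions: the object name JoinSyntaxSketch and the view name emp_mgr_sample are hypothetical, and the five rows are taken from the eid, ename, and mgr columns of the emp table shown in section 15.4.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JoinSyntaxSketch {
  // Registers (eid, ename, mgr) for five employees as a temporary view.
  def registerSample(spark: SparkSession): Unit = {
    import spark.implicits._
    Seq(
      (7839, "KING", 0),
      (7782, "CLARK", 7839),
      (7934, "MILLER", 7782),
      (7566, "JONES", 7839),
      (7902, "FORD", 7566)
    ).toDF("eid", "ename", "mgr")
      .createOrReplaceTempView("emp_mgr_sample")
  }

  // left join: every employee is kept; manager is null when mgr matches no eid.
  def withManager(spark: SparkSession): DataFrame =
    spark.sql(
      """select e.ename, m.ename as manager
        |from emp_mgr_sample e
        |left join emp_mgr_sample m
        |on e.mgr = m.eid""".stripMargin)

  // left semi join: keeps only left-side rows that have at least one match,
  // and only the left side's columns are available in the select list.
  def managersOnly(spark: SparkSession): DataFrame =
    spark.sql(
      """select m.ename
        |from emp_mgr_sample m
        |left semi join emp_mgr_sample e
        |on m.eid = e.mgr""".stripMargin)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-syntax-sketch")
      .master("local[*]") // assumption: run locally
      .getOrCreate()
    registerSample(spark)
    withManager(spark).show()
    managersOnly(spark).show()
    spark.stop()
  }
}
```

A left semi join returns each matching left-side row once, even when it has several matches on the right, so it behaves like an exists filter rather than a regular join.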
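The SQL framework of Spark SQL
Integration with the Hive Metastore
(1) Spark must be able to locate the HDFS and Hive configuration files
- Method 1: copy core-site.xml, hdfs-site.xml, and hive-site.xml directly into the conf directory under the Spark installation directory. The drawback is that whenever the HDFS or Hive configuration changes, the copies under Spark must be updated by hand.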
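- Method 2: specify the Hadoop configuration directory in Spark's own configuration file, typically by setting HADOOP_CONF_DIR in conf/spark-env.sh.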
(2) Once Spark SQL is connected to the Hive Metastore, Hive tables can be queried directly with spark.sql("select … from table where …")
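In spark-shell the spark session is created for you. In a standalone application the session has to be built explicitly, and Hive support has to be requested. The following is a minimal sketch, assuming Spark was built with Hive support, that hive-site.xml is visible to Spark as described in step (1), and that the job is launched with spark-submit (so no master is set in code). The object name HiveMetastoreSketch is hypothetical, and the query assumes the emp table from section 15.4 exists in the default database.

```scala
import org.apache.spark.sql.SparkSession

object HiveMetastoreSketch {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() asks the session to use the Hive Metastore as its catalog;
    // the metastore location is read from hive-site.xml.
    val spark = SparkSession.builder()
      .appName("hive-metastore-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Same calls as in the interactive session, now against Hive-managed tables.
    spark.sql("show databases").show()
    spark.sql("select ename, sal from emp where did = 10").show()

    spark.stop()
  }
}
```

Without enableHiveSupport(), a standalone session uses an in-memory catalog and does not see the tables registered in the Hive Metastore.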
15.4 Worked example
(1) spark-shell
[root@node1 ~]# spark-shell
17/10/24 10:15:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://192.168.80.131:4040
Spark context available as 'sc' (master = local[*], app id = local-1508854525067).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
      /_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
Type in expressions to have them evaluated.
Type :help for more information.
scala> spark.sql("show databases").show
+------------+
|databaseName|
+------------+
|     default|
|        test|
+------------+
scala> spark.sql("show tables").show
+--------+---------+-----------+
|database|tableName|isTemporary|
+--------+---------+-----------+
| default|  copyemp|      false|
| default|     demo|      false|
| default|     dept|      false|
| default|     dual|      false|
| default|      emp|      false|
| default|   empbak|      false|
| default|employees|      false|
| default|     mytb|      false|
| default|    users|      false|
+--------+---------+-----------+
scala> spark.sql("select * from emp").show
+----+------+---------+----+----------+------+------+----+
| eid| ename|      job| mgr|  hiredate|   sal|  comm| did|
+----+------+---------+----+----------+------+------+----+
|7782| CLARK|  MANAGER|7839|1981-06-09|2450.0|   0.0|  10|
|7839|  KING|PRESIDENT|   0|1981-11-17|5000.0|   0.0|  10|
|7934|MILLER|    CLERK|7782|1982-01-23|1300.0|   0.0|  10|
|7369| SMITH|    CLERK|7902|1980-12-17| 800.0|   0.0|  20|
|7566| JONES|  MANAGER|7839|1981-04-02|2975.0|   0.0|  20|
|7902|  FORD|  ANALYST|7566|1981-12-03|3000.0|   0.0|  20|
|7499| ALLEN| SALESMAN|7698|1981-02-20|1600.0| 300.0|  30|
|7521|  WARD| SALESMAN|7698|1981-02-22|1250.0| 500.0|  30|
|7654|MARTIN| SALESMAN|7698|1981-09-28|1250.0|1400.0|  30|
|7698| BLAKE|  MANAGER|7839|1981-05-01|2850.0|   0.0|  30|
|7844|TURNER| SALESMAN|7698|1981-09-08|1500.0|   0.0|  30|
|7900| JAMES|    CLERK|7698|1981-12-03| 950.0|   0.0|  30|
|8888|HADRON|     null|null|2016-08-31|6666.0|  null|null|
+----+------+---------+----+----------+------+------+----+
(2) spark-sql
[root@node1 ~]# spark-sql
17/10/24 10:17:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/10/24 10:17:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
17/10/24 10:17:32 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
spark-sql> show databases;
default
test
Time taken: 3.93 seconds, Fetched 2 row(s)
spark-sql> show tables;
default copyemp false
default demo false
default dept false
default dual false
default emp false
default empbak false
default employees false
default mytb false
default users false
Time taken: 0.145 seconds, Fetched 9 row(s)
spark-sql> select * from emp;
7782 CLARK MANAGER 7839 1981-06-09 2450.0 0.0 10
7839 KING PRESIDENT 0 1981-11-17 5000.0 0.0 10
7934 MILLER CLERK 7782 1982-01-23 1300.0 0.0 10
7369 SMITH CLERK 7902 1980-12-17 800.0 0.0 20
7566 JONES MANAGER 7839 1981-04-02 2975.0 0.0 20
7902 FORD ANALYST 7566 1981-12-03 3000.0 0.0 20
7499 ALLEN SALESMAN 7698 1981-02-20 1600.0 300.0 30
7521 WARD SALESMAN 7698 1981-02-22 1250.0 500.0 30
7654 MARTIN SALESMAN 7698 1981-09-28 1250.0 1400.0 30
7698 BLAKE MANAGER 7839 1981-05-01 2850.0 0.0 30
7844 TURNER SALESMAN 7698 1981-09-08 1500.0 0.0 30
7900 JAMES CLERK 7698 1981-12-03 950.0 0.0 30
8888 HADRON NULL NULL 2016-08-31 6666.0 NULL NULL
Time taken: 3.266 seconds, Fetched 13 row(s)
spark-sql>