  • Spark 2.x Study Notes: SQL in Spark SQL

    SQL syntax supported by Spark SQL

    select [distinct] [column names]|[wildcard]
    from tableName
    [join clause tableName on join condition]
    [where condition]
    [group by column name]
    [having conditions]
    [order by column names [asc|desc]]
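
    As a rough illustration of the template above, the sketch below runs one query that uses where, group by, having, and order by together. It assumes Spark 2.x is on the classpath; the view name emp_sample, the object name, and the salary thresholds are made up for this example, and the rows are a subset of the emp table shown later in these notes.

```scala
import org.apache.spark.sql.SparkSession

object SelectSyntaxExample {
  // A few rows copied from the emp table in these notes: (eid, ename, job, sal, did).
  val sampleEmp = Seq(
    (7782, "CLARK", "MANAGER", 2450.0, 10),
    (7839, "KING", "PRESIDENT", 5000.0, 10),
    (7369, "SMITH", "CLERK", 800.0, 20),
    (7566, "JONES", "MANAGER", 2975.0, 20),
    (7499, "ALLEN", "SALESMAN", 1600.0, 30),
    (7900, "JAMES", "CLERK", 950.0, 30)
  )

  // Average salary per department, exercising where / group by / having / order by.
  def avgSalByDept(spark: SparkSession): Seq[(Int, Double)] = {
    import spark.implicits._
    // emp_sample is a hypothetical temp view, so no Hive table is needed here.
    sampleEmp.toDF("eid", "ename", "job", "sal", "did").createOrReplaceTempView("emp_sample")
    spark.sql(
      """select did, avg(sal) as avg_sal
        |from emp_sample
        |where sal >= 900
        |group by did
        |having avg(sal) > 1500
        |order by avg_sal desc""".stripMargin
    ).collect().map(r => (r.getInt(0), r.getDouble(1))).toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]") // local mode, as in the spark-shell session in these notes
      .appName("select-syntax-sketch")
      .getOrCreate()
    avgSalByDept(spark).foreach(println)
    spark.stop()
  }
}
```

    With these sample rows, the where clause drops SMITH, the having clause drops department 30, and the result is department 10 (average 3725.0) followed by department 20 (average 2975.0).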

    For queries that use only joins, the supported syntax is:

    select statement
    from statement
    [join | inner join | left join | left semi join | left outer join | right join | right outer join | full join | full outer join]
    on join condition
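
    To make two of the join types concrete, here is a minimal sketch that contrasts left outer join with left semi join. It assumes Spark 2.x is on the classpath. The views emp_sample and dept_sample are hypothetical, and the dept columns (did, dname) and their values are assumptions, since these notes list a dept table but never show its schema.

```scala
import org.apache.spark.sql.SparkSession

object JoinSyntaxExample {
  // Returns (ename, dname) pairs from a left outer join and enames from a left semi join.
  def run(spark: SparkSession): (Seq[(String, String)], Seq[String]) = {
    import spark.implicits._
    // Employee names and department ids from the emp table in these notes; HADRON has no department.
    Seq[(String, Option[Int])](
      ("CLARK", Some(10)), ("SMITH", Some(20)), ("ALLEN", Some(30)), ("HADRON", None)
    ).toDF("ename", "did").createOrReplaceTempView("emp_sample")
    // Hypothetical department rows (assumed schema: did, dname).
    Seq((10, "ACCOUNTING"), (20, "RESEARCH")).toDF("did", "dname").createOrReplaceTempView("dept_sample")

    // left outer join keeps every employee; unmatched rows get a null dname.
    val leftOuter = spark.sql(
      """select e.ename, d.dname
        |from emp_sample e
        |left outer join dept_sample d on e.did = d.did
        |order by e.ename""".stripMargin
    ).collect().map(r => (r.getString(0), Option(r.getString(1)).getOrElse("null"))).toSeq

    // left semi join keeps only employees that have a match, and only the left table's columns.
    val leftSemi = spark.sql(
      """select e.ename
        |from emp_sample e
        |left semi join dept_sample d on e.did = d.did
        |order by e.ename""".stripMargin
    ).collect().map(_.getString(0)).toSeq

    (leftOuter, leftSemi)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-syntax-sketch").getOrCreate()
    val (leftOuter, leftSemi) = run(spark)
    leftOuter.foreach(println)
    leftSemi.foreach(println)
    spark.stop()
  }
}
```

    In this sample the left outer join returns all four employees, with a null department name for ALLEN and HADRON, while the left semi join returns only CLARK and SMITH.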

    Architecture of SQL in Spark SQL

    Integration with the Hive Metastore

    (1) Spark must be able to locate the HDFS and Hive configuration files

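    • Method 1: copy core-site.xml, hdfs-site.xml, and hive-site.xml directly into the conf directory under the Spark installation directory. The drawback is that whenever the HDFS or Hive configuration changes, the copies in Spark's conf directory must be updated by hand.
    • Method 2: specify the Hadoop configuration directory in Spark's configuration file (for example, by setting HADOOP_CONF_DIR in conf/spark-env.sh).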

    (2) Once Spark SQL is connected to the Hive Metastore, query Hive tables directly with spark.sql("select … from table where …")
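
    In spark-shell the spark session is created for you. For a standalone application, a sketch along the following lines may help. It assumes the spark-hive module is on the classpath and that Spark can find hive-site.xml as described in step (1). The object name, the helper employeesInDept, and the did = 10 filter are illustrative and not part of the original notes.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HiveMetastoreExample {
  // Queries the emp table through whatever catalog the session uses; with Hive support
  // enabled, emp resolves through the Hive Metastore.
  def employeesInDept(spark: SparkSession, did: Int): DataFrame =
    spark.sql(s"select ename, sal from emp where did = $did")

  def main(args: Array[String]): Unit = {
    // No master is set here on the assumption that the job is launched with spark-submit.
    val spark = SparkSession.builder()
      .appName("hive-metastore-sketch")
      .enableHiveSupport() // requires Hive classes on the classpath
      .getOrCreate()

    employeesInDept(spark, 10).show()
    spark.stop()
  }
}
```

    Keeping the query in a small helper that takes the session as a parameter means the same code can be tried against a temporary view named emp in a session without Hive support.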

    15.4 Worked examples

    (1) spark-shell

    [root@node1 ~]# spark-shell
    17/10/24 10:15:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Spark context Web UI available at http://192.168.80.131:4040
    Spark context available as 'sc' (master = local[*], app id = local-1508854525067).
    Spark session available as 'spark'.
    Welcome to
          ____              __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/  '_/
       /___/ .__/\_,_/_/ /_/\_\   version 2.2.0
          /_/
    
    Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_112)
    Type in expressions to have them evaluated.
    Type :help for more information.
    
    scala> spark.sql("show databases").show
    +------------+
    |databaseName|
    +------------+
    |     default|
    |        test|
    +------------+
    
    
    scala> spark.sql("show tables").show
    +--------+---------+-----------+
    |database|tableName|isTemporary|
    +--------+---------+-----------+
    | default|  copyemp|      false|
    | default|     demo|      false|
    | default|     dept|      false|
    | default|     dual|      false|
    | default|      emp|      false|
    | default|   empbak|      false|
    | default|employees|      false|
    | default|     mytb|      false|
    | default|    users|      false|
    +--------+---------+-----------+
    
    
    scala> spark.sql("select * from emp").show
    +----+------+---------+----+----------+------+------+----+
    | eid| ename|      job| mgr|  hiredate|   sal|  comm| did|
    +----+------+---------+----+----------+------+------+----+
    |7782| CLARK|  MANAGER|7839|1981-06-09|2450.0|   0.0|  10|
    |7839|  KING|PRESIDENT|   0|1981-11-17|5000.0|   0.0|  10|
    |7934|MILLER|    CLERK|7782|1982-01-23|1300.0|   0.0|  10|
    |7369| SMITH|    CLERK|7902|1980-12-17| 800.0|   0.0|  20|
    |7566| JONES|  MANAGER|7839|1981-04-02|2975.0|   0.0|  20|
    |7902|  FORD|  ANALYST|7566|1981-12-03|3000.0|   0.0|  20|
    |7499| ALLEN| SALESMAN|7698|1981-02-20|1600.0| 300.0|  30|
    |7521|  WARD| SALESMAN|7698|1981-02-22|1250.0| 500.0|  30|
    |7654|MARTIN| SALESMAN|7698|1981-09-28|1250.0|1400.0|  30|
    |7698| BLAKE|  MANAGER|7839|1981-05-01|2850.0|   0.0|  30|
    |7844|TURNER| SALESMAN|7698|1981-09-08|1500.0|   0.0|  30|
    |7900| JAMES|    CLERK|7698|1981-12-03| 950.0|   0.0|  30|
    |8888|HADRON|     null|null|2016-08-31|6666.0|  null|null|
    +----+------+---------+----+----------+------+------+----+

    (2) spark-sql

    [root@node1 ~]# spark-sql
    17/10/24 10:17:21 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    17/10/24 10:17:32 WARN ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.2.0
    17/10/24 10:17:32 WARN ObjectStore: Failed to get database default, returning NoSuchObjectException
    spark-sql> show databases;
    default
    test
    Time taken: 3.93 seconds, Fetched 2 row(s)
    spark-sql> show tables;
    default copyemp false
    default demo    false
    default dept    false
    default dual    false
    default emp false
    default empbak  false
    default employees   false
    default mytb    false
    default users   false
    Time taken: 0.145 seconds, Fetched 9 row(s)
    spark-sql> select * from emp;
    7782    CLARK   MANAGER 7839    1981-06-09  2450.0  0.0 10
    7839    KING    PRESIDENT   0   1981-11-17  5000.0  0.0 10
    7934    MILLER  CLERK   7782    1982-01-23  1300.0  0.0 10
    7369    SMITH   CLERK   7902    1980-12-17  800.0   0.0 20
    7566    JONES   MANAGER 7839    1981-04-02  2975.0  0.0 20
    7902    FORD    ANALYST 7566    1981-12-03  3000.0  0.0 20
    7499    ALLEN   SALESMAN    7698    1981-02-20  1600.0  300.0   30
    7521    WARD    SALESMAN    7698    1981-02-22  1250.0  500.0   30
    7654    MARTIN  SALESMAN    7698    1981-09-28  1250.0  1400.0  30
    7698    BLAKE   MANAGER 7839    1981-05-01  2850.0  0.0 30
    7844    TURNER  SALESMAN    7698    1981-09-08  1500.0  0.0 30
    7900    JAMES   CLERK   7698    1981-12-03  950.0   0.0 30
    8888    HADRON  NULL    NULL    2016-08-31  6666.0  NULL    NULL
    Time taken: 3.266 seconds, Fetched 13 row(s)
    spark-sql> 
  • Original article: https://www.cnblogs.com/itboys/p/9254945.html