  • Example: calling Hive SQL from spark-shell (the MySQL connector JAR is passed with --jars so Spark can reach the MySQL-backed Hive metastore)

    [spark@master ~]$ spark-shell --master yarn-client --jars /app/soft/hive/lib/mysql-connector-java-5.1.44-bin.jar
    
    scala> import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.SQLContext
    
    scala> val sqlContext = new SQLContext(sc)
    warning: there was one deprecation warning; re-run with -deprecation for details
    sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@432a6a69
    
    scala> val res = sqlContext.sql("select * from lb")
    res: org.apache.spark.sql.DataFrame = [cookieid: string, createtime: string ... 1 more field]
    
    scala> res.show()
    +--------+----------+---+
    |cookieid|createtime| pv|
    +--------+----------+---+
    | cookie1|2015-11-11|  1|
    | cookie1|2015-11-12|  4|
    | cookie1|2015-11-13|  5|
    | cookie1|2015-11-14|  4|
    | cookie2|2015-11-11|  7|
    | cookie2|2015-11-12|  3|
    | cookie2|2015-11-13|  8|
    | cookie2|2015-11-14|  2|
    +--------+----------+---+
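
    Note: SQLContext is deprecated in Spark 2.x (hence the warning above); spark-shell already provides a SparkSession bound to `spark`, which can issue the same query directly. A minimal sketch, assuming the same `lb` table exists in the Hive metastore:

    // Spark 2.x: the pre-built SparkSession replaces SQLContext
    val res2 = spark.sql("select * from lb")
    res2.show()

    // the same data via the DataFrame API
    spark.table("lb").orderBy("cookieid", "createtime").show()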
    

      Creating a table

    scala> val path = "hdfs://master:9000/data/Romeo_and_Juliet.txt"
    path: String = hdfs://master:9000/data/Romeo_and_Juliet.txt
    
    scala> val df2 = spark.sparkContext.textFile(path).flatMap(_.split(" ")).map((_,1)).reduceByKey(_+_).toDF("word","count")
    df2: org.apache.spark.sql.DataFrame = [word: string, count: int]
    
    scala> df2.write.mode("overwrite").saveAsTable("badou.test_a")
    18/01/28 08:15:10 WARN metastore.HiveMetaStore: Location: hdfs://master:9000/user/hive/warehouse/badou.db/test_a specified for non-external table:test_a
    
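    Splitting on a single space leaves empty tokens (runs of spaces, blank lines), which is why a blank "word" dominates the counts in the Hive query below. A hedged alternative that splits on runs of non-letters and drops empty tokens before counting; the table name `badou.test_b` is hypothetical:

    // tokenize on non-letter runs, drop empties, normalize case
    val df3 = spark.sparkContext.textFile(path)
      .flatMap(_.split("[^a-zA-Z]+"))
      .filter(_.nonEmpty)
      .map(w => (w.toLowerCase, 1))
      .reduceByKey(_ + _)
      .toDF("word", "count")

    df3.write.mode("overwrite").saveAsTable("badou.test_b")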
    
      Querying the result from the Hive CLI
    
    hive> use badou;
    
    hive> show tables;
    
    hive> select * from test_a order by count desc limit 10;
    Total jobs = 1
    Launching Job 1 out of 1
    Number of reduce tasks determined at compile time: 1
    In order to change the average load for a reducer (in bytes):
      set hive.exec.reducers.bytes.per.reducer=<number>
    In order to limit the maximum number of reducers:
      set hive.exec.reducers.max=<number>
    In order to set a constant number of reducers:
      set mapreduce.job.reduces=<number>
    Starting Job = job_1516801273097_0045, Tracking URL = http://master:8088/proxy/application_1516801273097_0045/
    Kill Command = /app/soft/hadoop/bin/hadoop job  -kill job_1516801273097_0045
    Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
    2018-01-28 09:08:22,144 Stage-1 map = 0%,  reduce = 0%
    2018-01-28 09:08:29,615 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.37 sec
    2018-01-28 09:08:37,987 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 3.18 sec
    MapReduce Total cumulative CPU time: 3 seconds 180 msec
    Ended Job = job_1516801273097_0045
    MapReduce Jobs Launched: 
    Job 0: Map: 1  Reduce: 1   Cumulative CPU: 3.18 sec   HDFS Read: 54970 HDFS Write: 69 SUCCESS
    Total MapReduce CPU Time Spent: 3 seconds 180 msec
    OK
            4132
    the     614
    I       531
    and     462
    to      449
    a       392
    of      364
    my      313
    is      290
    in      282
    Time taken: 28.159 seconds, Fetched: 10 row(s)
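
    The blank top row is the empty-string token produced by the single-space split in df2. For comparison, the same top-10 query can be run from spark-shell, where Spark executes it without launching a MapReduce job. A sketch:

    // same query from spark-shell; backticks guard the `count` column name
    spark.sql("select * from badou.test_a order by `count` desc limit 10").show()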
    

      
