  • Using Hive as a data source in Spark SQL

    1. pom.xml dependencies

    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>${scala.version}</version>
    </dependency>
    <dependency>
      <groupId>junit</groupId>
      <artifactId>junit</artifactId>
      <version>4.4</version>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.specs</groupId>
      <artifactId>specs</artifactId>
      <version>1.2.5</version>
      <scope>test</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/oracle/ojdbc6 -->
    <dependency>
      <groupId>com.oracle</groupId>
      <artifactId>ojdbc6</artifactId>
      <version>11.2.0.3</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/mysql/mysql-connector-java -->
    <dependency>
      <groupId>mysql</groupId>
      <artifactId>mysql-connector-java</artifactId>
      <version>${mysql.version}</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/com.alibaba/druid -->
    <dependency>
      <groupId>com.alibaba</groupId>
      <artifactId>druid</artifactId>
      <version>${druid.version}</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.11</artifactId>
      <version>${spark.version}</version>
      <scope>provided</scope>
    </dependency>

    <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>

    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-hive_2.11</artifactId>
      <version>${spark.version}</version>
    </dependency>
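The _2.11 suffix on each Spark artifactId is the Scala binary version, so ${scala.version} has to be a 2.11.x release; mixing binary versions usually fails only at runtime, often with NoSuchMethodError. The small sketch below makes that rule explicit. The object and method names are hypothetical, and it assumes Scala 2.x versioning (Scala 3 artifacts use the suffix _3 instead).

```scala
object ScalaSuffixCheck {
  // "2.11.12" -> "2.11": the binary version used in artifactId suffixes (Scala 2.x only).
  def binaryVersion(scalaVersion: String): String =
    scalaVersion.split('.').take(2).mkString(".")

  // True if an artifactId such as "spark-core_2.11" targets the given Scala version.
  def matches(artifactId: String, scalaVersion: String): Boolean =
    artifactId.endsWith("_" + binaryVersion(scalaVersion))

  def main(args: Array[String]): Unit = {
    // Scala library version of the JVM running this check.
    val running = scala.util.Properties.versionNumberString
    Seq("spark-core_2.11", "spark-sql_2.11", "spark-hive_2.11").foreach { a =>
      println(s"$a matches Scala $running: ${matches(a, running)}")
    }
  }
}
```

Running it with the same scala-library version the project builds against shows at a glance whether the artifact suffixes line up.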
    

      

    2. Code

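    package com.dsj361
    
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    
    // Define main() rather than extending scala.App: the Spark docs warn that
    // subclasses of scala.App may not work correctly.
    object HiveDataSource {
      def main(args: Array[String]): Unit = {
        // No setMaster() here: a master set on SparkConf would override the
        // --master flag passed to spark-submit.
        val config = new SparkConf().setAppName("HiveDataSource")
        val sc = new SparkContext(config)
    
        // HiveContext reads hive-site.xml from the classpath to reach the Hive metastore.
        // (Deprecated since Spark 2.0 in favor of SparkSession.builder().enableHiveSupport().)
        val sqlContext = new HiveContext(sc)
    
        sqlContext.sql("drop table if exists default.student_infos")
        sqlContext.sql("create table if not exists default.student_infos (name string, age int) row format delimited fields terminated by ',' stored as textfile")
        // LOAD DATA INPATH (without LOCAL) moves, rather than copies, the file at this
        // path on the default filesystem (HDFS) into the table's directory.
        sqlContext.sql("load data inpath '/tmp/student_infos.txt' into table default.student_infos")
    
        // Load data into student_scores the same way
        sqlContext.sql("drop table if exists default.student_scores")
        sqlContext.sql("create table if not exists default.student_scores (name string, score int) row format delimited fields terminated by ',' stored as textfile")
        sqlContext.sql("load data inpath '/tmp/student_scores.txt' into table default.student_scores")
    
        // Join the two tables and keep the students who scored above 80
        val goodStudentDf = sqlContext.sql("select t1.name, t1.age, t2.score from default.student_infos t1 join default.student_scores t2 on t1.name = t2.name where t2.score > 80")
    
        goodStudentDf.show()
    
        sc.stop()
      }
    }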
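The comment in the step 2 code states the goal: join the two tables on name and keep students who scored above 80. To check that logic without a cluster, the plain-Scala sketch below mimics the query over in-memory rows: each line is split on ',' (matching "fields terminated by ','"), the tables are inner-joined on name, and only rows with score > 80 survive. The object, case classes, and sample lines are hypothetical; this only illustrates what the query returns, not how Spark executes it.

```scala
object JoinSemanticsSketch {
  // Hypothetical row types mirroring the two Hive tables.
  final case class StudentInfo(name: String, age: Int)
  final case class StudentScore(name: String, score: Int)
  final case class GoodStudent(name: String, age: Int, score: Int)

  // Parse one comma-delimited text line into a student_infos row.
  def parseInfo(line: String): StudentInfo = {
    val parts = line.split(",").map(_.trim)
    StudentInfo(parts(0), parts(1).toInt)
  }

  // Parse one comma-delimited text line into a student_scores row.
  def parseScore(line: String): StudentScore = {
    val parts = line.split(",").map(_.trim)
    StudentScore(parts(0), parts(1).toInt)
  }

  // Inner join on name, then keep only rows whose score is strictly above 80.
  def goodStudents(infos: Seq[StudentInfo], scores: Seq[StudentScore]): Seq[GoodStudent] =
    for {
      i <- infos
      s <- scores
      if i.name == s.name && s.score > 80
    } yield GoodStudent(i.name, i.age, s.score)

  def main(args: Array[String]): Unit = {
    // Hypothetical sample lines in the layout of the two input text files.
    val infos = Seq("alice,18", "bob,17", "carol,19").map(parseInfo)
    val scores = Seq("alice,88", "bob,99", "carol,76").map(parseScore)
    goodStudents(infos, scores).foreach(println)
  }
}
```

A student with a score of exactly 80, or with no matching row in the other table, is dropped, which is what the inner join plus "where t2.score > 80" does in Hive as well.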
    

      

     
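    3. Copy hive-site.xml from hive/config into src/main/resources, so that it is packaged onto the application's classpath and Spark can find the existing Hive metastore.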
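Maven copies src/main/resources into the root of the jar, and if hive-site.xml is not on the classpath, Spark does not fail; it silently creates a local metastore_db in the current directory instead of using the existing Hive metastore. A quick classpath lookup like the one below can confirm the file made it into the build. The object and method names are hypothetical; Class.getResource returns null when a resource is missing.

```scala
object HiveSiteCheck {
  // True if the named resource is visible on the classpath.
  // Names starting with "/" are resolved from the classpath root.
  def onClasspath(resource: String): Boolean =
    getClass.getResource(resource) != null

  def main(args: Array[String]): Unit = {
    if (onClasspath("/hive-site.xml"))
      println("hive-site.xml found on the classpath")
    else
      println("hive-site.xml missing: Spark would fall back to a local metastore_db")
  }
}
```

Running this from the packaged jar (rather than the IDE) is the more telling check, since that is the classpath spark-submit will use.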
     
     
    4. Compile and package the project (for example with mvn clean package).
     
    5. Copy the jar to the server.
     
    6. Add a submit script:
    /home/hadoop/app/spark/bin/spark-submit \
    --class com.dsj361.HiveDataSource \
    --master local[*] \
    --num-executors 2 \
    --driver-memory 1000m \
    --executor-memory 1000m \
    --executor-cores 2 \
    /home/hadoop/sparksqlapp/jar/sparkSqlStudy.jar
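The script passes --master local[*]; note that a master set in code with SparkConf.setMaster takes precedence over this flag. Per Spark's documented master URL rules, "local" runs with one worker thread, "local[K]" with K threads, and "local[*]" with as many threads as the machine has logical cores. The hypothetical helper below only mirrors those rules to make the difference concrete; it is not part of Spark's API and ignores less common forms such as "local[K,F]".

```scala
object LocalMasterThreads {
  // Matches "local[K]" with a fixed thread count K.
  private val Fixed = """local\[(\d+)\]""".r

  // Worker threads Spark's local mode would use for a master URL, or None for
  // non-local masters (e.g. "yarn") and for forms not covered here.
  def threadsFor(master: String): Option[Int] = master match {
    case "local"    => Some(1)
    case "local[*]" => Some(Runtime.getRuntime.availableProcessors())
    case Fixed(k)   => Some(k.toInt)
    case _          => None
  }

  def main(args: Array[String]): Unit =
    Seq("local", "local[2]", "local[*]", "yarn").foreach { m =>
      println(s"$m -> ${threadsFor(m).getOrElse("not local mode")}")
    }
}
```

So a job forced to "local" in code uses a single thread even when the script asks for local[*].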
     
     
     
     
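    7. Run the script. In the author's test, this was much faster than running the same queries in Hive.
    Note that with --master local[*] everything runs inside the driver JVM, so --num-executors, --executor-memory and --executor-cores have no effect; they only matter when submitting to a cluster manager such as YARN.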
     
     



  • Original article: https://www.cnblogs.com/nicekk/p/10087684.html