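  • Using Spark with Kudu

    Spark's integration with Kudu supports:

    • DDL operations (create/delete)

    • Native Kudu RDD

    • Native Kudu data source for DataFrame integration

    • Reading data from Kudu

    • Insert/update/upsert/delete operations against Kudu

    • Predicate pushdown

    • Schema mapping between Kudu and Spark SQL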

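      So far we have come across several contexts, such as SparkContext, SQLContext, HiveContext, and SparkSession. Working with Kudu introduces one more: the KuduContext. It is the main serializable object that can be broadcast in a Spark application, and it interacts with the Kudu Java client on behalf of the Spark executors.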

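      KuduContext provides the methods needed to perform DDL operations, an interface to the native Kudu RDD, update/insert/delete operations on data, conversion of data types from Kudu to Spark, and so on.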
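
      The DDL and write operations described above might look roughly like the following. This is a minimal sketch rather than the author's code: the table name `demo_users`, its two-column schema, and the master address `kudu-master:7051` are placeholders assumed for illustration, and the calls follow the kudu-spark2 API as documented for Kudu 1.6.

```scala
import org.apache.kudu.client.CreateTableOptions
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}
import scala.collection.JavaConverters._

object KuduContextSketch {
  // Hypothetical table name and schema, used only for illustration.
  val tableName = "demo_users"
  val schema = StructType(Seq(
    StructField("id", IntegerType, nullable = false), // key columns must be non-nullable
    StructField("name", StringType, nullable = true)
  ))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kudu-context-sketch").getOrCreate()
    import spark.implicits._

    // Assumption: a Kudu master is reachable at this placeholder address.
    val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

    // DDL: drop the table if it already exists, then create it.
    if (kuduContext.tableExists(tableName)) kuduContext.deleteTable(tableName)
    val options = new CreateTableOptions()
      .setNumReplicas(1)
      .addHashPartitions(List("id").asJava, 3)
    kuduContext.createTable(tableName, schema, Seq("id"), options)

    // Write operations take a DataFrame whose columns match the table.
    kuduContext.insertRows(Seq((1, "alice"), (2, "bob")).toDF("id", "name"), tableName)
    kuduContext.upsertRows(Seq((2, "bobby"), (3, "carol")).toDF("id", "name"), tableName)
    kuduContext.updateRows(Seq((1, "alicia")).toDF("id", "name"), tableName)
    // Deletes only need the key column(s).
    kuduContext.deleteRows(Seq(3).toDF("id"), tableName)

    spark.stop()
  }
}
```

      A single replica and three hash buckets are chosen only to keep the sketch small; a real table would pick partitioning and replication to suit the cluster.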

      Common operations:

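    import org.apache.kudu.spark.kudu.KuduContext
    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext

    // Create a Spark and SQL context (the application name is a placeholder)
    val sparkConf = new SparkConf().setAppName("spark-kudu-example")
    val sc = new SparkContext(sparkConf)
    val sqlContext = new SQLContext(sc)

    // Comma-separated list of Kudu masters with port numbers
    val master1 = "ip-10-13-4-249.ec2.internal:7051"
    val master2 = "ip-10-13-5-150.ec2.internal:7051"
    val master3 = "ip-10-13-5-56.ec2.internal:7051"
    val kuduMasters = Seq(master1, master2, master3).mkString(",")

    // Create an instance of a KuduContext; in kudu-spark2 1.6.0 the
    // constructor also takes the existing SparkContext
    val kuduContext = new KuduContext(kuduMasters, sc)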
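
    Reading a Kudu table as a DataFrame, with a filter that is a candidate for predicate pushdown, could be sketched as below. The helper `kuduOptions`, the table name `demo_users`, and the master addresses are hypothetical; the data source name `org.apache.kudu.spark.kudu` and the `kudu.master` / `kudu.table` option keys are the ones used by kudu-spark2.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KuduReadSketch {
  // Hypothetical helper: builds the options map for the Kudu data source.
  def kuduOptions(masters: Seq[String], table: String): Map[String, String] =
    Map("kudu.master" -> masters.mkString(","), "kudu.table" -> table)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("kudu-read-sketch").getOrCreate()
    import spark.implicits._

    // Placeholder master addresses and table name.
    val options = kuduOptions(Seq("kudu-master-1:7051", "kudu-master-2:7051"), "demo_users")

    // Load the Kudu table through the Kudu data source.
    val users: DataFrame = spark.read
      .format("org.apache.kudu.spark.kudu")
      .options(options)
      .load()

    // Simple comparison filters such as this one can be pushed down to Kudu.
    users.filter($"id" >= 2).select("id", "name").show()

    // The same table can also be queried through Spark SQL.
    users.createOrReplaceTempView("users")
    spark.sql("SELECT name FROM users WHERE id = 1").show()

    spark.stop()
  }
}
```

    Keeping the option map in a small pure function makes the connection settings easy to check without a running cluster.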

    Maven dependencies

    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>
    
    
    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-client -->
        <!-- Default (compile) scope, so the client is available to application code and not only to tests -->
        <dependency>
            <groupId>org.apache.kudu</groupId>
            <artifactId>kudu-client</artifactId>
            <version>1.6.0-cdh5.14.0</version>
        </dependency>
    
    
        <!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-client-tools -->
        <dependency>
            <groupId>org.apache.kudu</groupId>
            <artifactId>kudu-client-tools</artifactId>
            <version>1.6.0-cdh5.14.0</version>
        </dependency>
    
    
        <!-- https://mvnrepository.com/artifact/org.apache.kudu/kudu-spark2 -->
        <dependency>
            <groupId>org.apache.kudu</groupId>
            <artifactId>kudu-spark2_2.11</artifactId>
            <version>1.6.0-cdh5.14.0</version>
        </dependency>
    
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.11</artifactId>
            <version>2.1.0</version>
        </dependency>
    </dependencies>

    The detailed code is covered in the next chapter.

  • Original article: https://www.cnblogs.com/niutao/p/10555229.html