zoukankan      html  css  js  c++  java
  • elasticsearch+spark+hbase 整合

      1.用到的maven依赖

         

           <dependency>
                <groupId>org.apache.spark</groupId>
                <artifactId>spark-hive_2.11</artifactId>
                <version>1.6.1</version>
            </dependency>
    
            <dependency>
                <groupId>org.elasticsearch</groupId>
                <artifactId>elasticsearch-hadoop</artifactId>
                <version>2.4.0</version>
            </dependency>

      注意:上面两个依赖的顺序不能换,否则编译代码的Scala版本会变成 2.10(这是因为maven顺序加载pom中的依赖jar),会导致下述问题:

          

    15/05/26 21:33:24 INFO cluster.SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0  
    Exception in thread "main" java.lang.NoSuchMethodError: scala.Predef$.ArrowAssoc(Ljava/lang/Object;)Ljava/lang/Object;  
        at stream.tan14.cn.streamTest$.main(streamTest.scala:25)  
        at stream.tan14.cn.streamTest.main(streamTest.scala)  
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)  
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)  
        at java.lang.reflect.Method.invoke(Method.java:606)  
        at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:328)  
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75)  
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)  

    2、spark和elasticsearch 整合查询接口

       1)参考地址 :

             https://www.elastic.co/guide/en/elasticsearch/reference/2.4/query-dsl.html

             https://www.elastic.co/guide/en/elasticsearch/hadoop/2.4/spark.html#spark-installation

       2)接口代码:

         

     val query =
          """{
              "query": {
                "bool": {
                  "must": [{
                    "range":{
                      "updatetime": {
                        "gte": ""
                      }
                    }
                  }]
                }
              }
            }"""

    // 上述query用于过滤es数据,如果没有添加这一项,直接用spark的dataframe 过滤,性能会受到很大的影响!!
    val options = Map("es.nodes" -> ES_URL, "es.port" -> ES_PORT, "es.query" -> query) ctx.read.format("org.elasticsearch.spark.sql").options(options).load("index/type").registerTempTable("test")
  • 相关阅读:
    TCP的初始cwnd和ssthresh
    C/C++ main
    PHP Function
    run bin
    PHP
    LAMP
    PHP MATH
    PHP array sort
    inline
    gcc g++
  • 原文地址:https://www.cnblogs.com/RichardYD/p/6282775.html
Copyright © 2011-2022 走看看