  • Importing search data into Elasticsearch with Spark

    I keep getting more forgetful, so I need to write down what I did!

    ES and Spark versions:

    spark-1.6.0-bin-hadoop2.6

    Elasticsearch for Apache Hadoop 2.1.2

    With other version combinations, writing the index data may fail.

    First, start ES, then launch the Spark shell with the es-hadoop jar:

    cp elasticsearch-hadoop-2.1.2/dist/elasticsearch-spark* spark-1.6.0-bin-hadoop2.6/lib/
    cd spark-1.6.0-bin-hadoop2.6/bin
    ./spark-shell --jars ../lib/elasticsearch-spark-1.2_2.10-2.1.2.jar

    The interactive session looks like this:

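    import org.apache.spark.SparkConf
    import org.elasticsearch.spark._  // brings saveToEs into scope for RDDs
    // In spark-shell, sc already exists, so this conf is not attached to it.
    // The write still succeeds because es-hadoop defaults to a local node
    // with index auto-creation enabled.
    val conf = new SparkConf()
    conf.set("es.index.auto.create", "true")
    conf.set("es.nodes", "127.0.0.1")
    // Each Map becomes one document
    val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
    val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")
    // Write both documents to index "spark", type "docs"
    sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs")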
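
    Because spark-shell builds sc before the session starts, one way to make the connection settings explicit is to pass them straight to saveToEs. The following is only a sketch, assuming the same local ES node as above; esCfg is a name introduced here for illustration. It is meant to be run in place of the final saveToEs line rather than in addition to it, since document IDs are auto-generated and a second write would add duplicate documents.

```scala
// Run inside spark-shell started with the elasticsearch-spark jar; sc is provided by the shell.
import org.elasticsearch.spark._

// esCfg is an illustrative name; the keys are the same es-hadoop settings used above.
val esCfg = Map(
  "es.nodes" -> "127.0.0.1",
  "es.index.auto.create" -> "true"
)

val numbers = Map("one" -> 1, "two" -> 2, "three" -> 3)
val airports = Map("OTP" -> "Otopeni", "SFO" -> "San Fran")

// The settings are handed to this write directly instead of relying on sc's configuration.
sc.makeRDD(Seq(numbers, airports)).saveToEs("spark/docs", esCfg)
```

    Passing the settings per call keeps the example independent of how the shell was launched.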

    Then check the data in ES:

    http://127.0.0.1:9200/spark/docs/_search?q=*

    The result:

    {"took":71,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":2,"max_score":1.0,"hits":[{"_index":"spark","_type":"docs","_id":"AVfhVqPBv9dlWdV2DcbH","_score":1.0,"_source":{"OTP":"Otopeni","SFO":"San Fran"}},{"_index":"spark","_type":"docs","_id":"AVfhVqPOv9dlWdV2DcbI","_score":1.0,"_source":{"one":1,"two":2,"three":3}}]}}
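
    The same documents can also be read back from within spark-shell instead of through the browser. A minimal sketch, assuming the shell was started with the elasticsearch-spark jar as above and ES is still running on the local node; the names docs and hits are introduced here for illustration, and this was not part of the original session.

```scala
// Run inside spark-shell; sc is provided by the shell.
import org.elasticsearch.spark._

// Read every document in spark/docs as (document id, field map) pairs.
val docs = sc.esRDD("spark/docs")
docs.collect().foreach { case (id, source) => println(s"$id -> $source") }

// Same resource, restricted by a URI query (here: match everything).
val hits = sc.esRDD("spark/docs", "?q=*")
println(hits.count())
```

    collect() is fine for a handful of test documents; for a real index it would pull everything to the driver, so a count or a narrower query is the safer check.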

    References:

    https://www.elastic.co/guide/en/elasticsearch/hadoop/2.1/spark.html#spark-installation

    http://spark.apache.org/docs/latest/programming-guide.html

    http://chenlinux.com/2014/09/04/spark-to-elasticsearch/

  • Original article: https://www.cnblogs.com/bonelee/p/5981699.html