zoukankan      html  css  js  c++  java
  • elasticserch-hadoop spak 网络配置异常排查

    elasticserch hadoop

    在本地测试写入 elasticsearch:9200时成功

    线上环境却报错如下

    org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
    at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
    17/12/01 07:47:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
    at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
    
    17/12/01 07:47:46 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
    17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
    17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
    17/12/01 07:47:46 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at EsSpark.scala:102) failed in 0.349 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
    at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)


    排查思路
    1线上使用的 ip 未使用 host,可能会有问题
    2线上 es 集群前置了 nginx 作代理,写入地址并非直接是es 服务地址

    以对 es 已知的了解2的可能性较大,可能会直接通过 es 寻找压力较小的节点

    看官方文档
    https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

    Networkedit
    es.nodes.discovery (default true)
    Whether to discover the nodes within the Elasticsearch cluster or only to use the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.

    es.nodes.client.only (default false)
    Whether to use Elasticsearch client nodes (or load-balancers). When enabled, elasticsearch-hadoop will route all its requests (after nodes discovery, if enabled) through the client nodes within the cluster. Note this typically significantly reduces the node parallelism and thus it is disabled by default. Enabling it also disables es.nodes.data.only (since a client node is a non-data node).

    es.nodes.wan.only (default false)
    Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

    找到几个相关信息 

    配置 "es.nodes.wan.only"->"true" 则服务正常运行

    写入 es 部分如下

    esDatas.saveToEs(Map[String, String](
    "es.resource" -> "{esIndex}/{esType}",
    "es.nodes" -> "ip:19200",
    "es.input.json" -> "false",
    "es.nodes.discovery"->"false",
    "es.nodes.wan.only"->"true",
    "es.write.operation" -> "upsert",
    "es.mapping.exclude" -> "id,esIndex,esType",
    "es.mapping.id" -> "id"
    ))
  • 相关阅读:
    苹果 iOS 8 新固件新功能特性总结汇总 (苹果 iPhone/iPad 最新移动操作系统)
    为什么魔兽世界里男性玩家爱选女性角色?
    腾讯PK微软 王者之战一触即发
    策略模式
    代理模式
    模板模式
    适配器模式
    工厂模式
    随机森林步骤(转)
    马氏距离通俗理解(转)
  • 原文地址:https://www.cnblogs.com/zihunqingxin/p/7992332.html
Copyright © 2011-2022 走看看