zoukankan      html  css  js  c++  java
  • elasticserch-hadoop spak 网络配置异常排查

    elasticserch hadoop

    在本地测试写入 elasticsearch:9200时成功

    线上环境却报错如下

    org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
    at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
    17/12/01 07:47:46 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
    at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
    
    17/12/01 07:47:46 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 1 times; aborting job
    17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
    17/12/01 07:47:46 INFO scheduler.TaskSchedulerImpl: Cancelling stage 1
    17/12/01 07:47:46 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at EsSpark.scala:102) failed in 0.349 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1.0 (TID 1, localhost, executor driver): org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: No data nodes with HTTP-enabled available
    at org.elasticsearch.hadoop.rest.InitializationUtils.filterNonDataNodesIfNeeded(InitializationUtils.java:157)
    at org.elasticsearch.hadoop.rest.RestService.createWriter(RestService.java:575)
    at org.elasticsearch.spark.rdd.EsRDDWriter.write(EsRDDWriter.scala:58)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.elasticsearch.spark.rdd.EsSpark$$anonfun$doSaveToEs$1.apply(EsSpark.scala:102)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)


    排查思路
    1线上使用的 ip 未使用 host,可能会有问题
    2线上 es 集群前置了 nginx 作代理,写入地址并非直接是es 服务地址

    以对 es 已知的了解2的可能性较大,可能会直接通过 es 寻找压力较小的节点

    看官方文档
    https://www.elastic.co/guide/en/elasticsearch/hadoop/current/configuration.html

    Networkedit
    es.nodes.discovery (default true)
    Whether to discover the nodes within the Elasticsearch cluster or only to use the ones given in es.nodes for metadata queries. Note that this setting only applies during start-up; afterwards when reading and writing, elasticsearch-hadoop uses the target index shards (and their hosting nodes) unless es.nodes.client.only is enabled.

    es.nodes.client.only (default false)
    Whether to use Elasticsearch client nodes (or load-balancers). When enabled, elasticsearch-hadoop will route all its requests (after nodes discovery, if enabled) through the client nodes within the cluster. Note this typically significantly reduces the node parallelism and thus it is disabled by default. Enabling it also disables es.nodes.data.only (since a client node is a non-data node).

    es.nodes.wan.only (default false)
    Whether the connector is used against an Elasticsearch instance in a cloud/restricted environment over the WAN, such as Amazon Web Services. In this mode, the connector disables discovery and only connects through the declared es.nodes during all operations, including reads and writes. Note that in this mode, performance is highly affected.

    找到几个相关信息 

    配置 "es.nodes.wan.only"->"true" 则服务正常运行

    写入 es 部分如下

    esDatas.saveToEs(Map[String, String](
    "es.resource" -> "{esIndex}/{esType}",
    "es.nodes" -> "ip:19200",
    "es.input.json" -> "false",
    "es.nodes.discovery"->"false",
    "es.nodes.wan.only"->"true",
    "es.write.operation" -> "upsert",
    "es.mapping.exclude" -> "id,esIndex,esType",
    "es.mapping.id" -> "id"
    ))
  • 相关阅读:
    JNI概述
    Android shape的使用
    全局对象Application的使用,以及如何在任何地方得到Application全局对象
    EditText中禁止输入中文的方法
    利用Selenium实现图片文件上传的两种方式介绍
    LoadRunner结果分析 – TPS
    详解 Spotlight on MySQL监控MySQL服务器
    Linux 服务器运行健康状况监控利器 Spotlight on Unix 的安装与使用
    资源监控工具Spotlight-使用说明
    RobotFrameWork+APPIUM实现对安卓APK的自动化测试----第七篇【元素定位介绍】
  • 原文地址:https://www.cnblogs.com/zihunqingxin/p/7992332.html
Copyright © 2011-2022 走看看