1. Every type of Druid service fails to start with the same error
The error:
2020-01-05T00:24:19,366 INFO [main] io.druid.guice.Exception in thread "main" io.druid.java.util.common.ISE: Extension [/usr/develop/druid-0.10.1/extensions/mysql-metadata-storage] specified in "druid.extensions.loadList" didn't exist!?
Cause: the mysql-metadata-storage extension was not installed.
Fix: install the mysql-metadata-storage extension:
cd /usr/apps/
wget -c http://static.druid.io/artifacts/releases/mysql-metadata-storage-0.10.1.tar.gz
tar -zxvf mysql-metadata-storage-0.10.1.tar.gz -C /usr/develop/druid-0.10.1/extensions/
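This error fires when an entry in druid.extensions.loadList has no matching directory on disk, so after extracting the tarball, make sure the directory name matches the loadList entry. A sketch of the relevant line (file path from the quickstart layout; your common.runtime.properties may differ):

```properties
# conf/druid/_common/common.runtime.properties
druid.extensions.loadList=["mysql-metadata-storage"]
```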
2. Broker fails to start

nohup bin/broker.sh start >log/broker.log 2>&1 &

The error:
Not enough direct memory. Please adjust -XX:MaxDirectMemorySize,
druid.processing.buffer.sizeBytes, druid.processing.numThreads,
or druid.processing.numMergeBuffers: maxDirectMemory[1,179,648,000],
memoryNeeded[1,184,354,560] = druid.processing.buffer.sizeBytes[236,870,912] *
(druid.processing.numMergeBuffers[2] + druid.processing.numThreads[2] + 1)
Cause: too little direct memory is configured.
Fix: on every node in the cluster, edit
vim /usr/develop/druid-0.10.1/conf/druid/broker/jvm.config
and change -XX:MaxDirectMemorySize=1125m to
-XX:MaxDirectMemorySize=1130m
(memoryNeeded is 1,184,354,560 bytes, just under 1130 MB, so 1130m is the smallest round value that passes the check.)
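The numbers in the message come from a simple formula, so the minimum MaxDirectMemorySize can be computed directly; a quick sketch using the values from the error above:

```shell
# memoryNeeded = druid.processing.buffer.sizeBytes * (numMergeBuffers + numThreads + 1)
sizeBytes=236870912
numMergeBuffers=2
numThreads=2
needed=$(( sizeBytes * (numMergeBuffers + numThreads + 1) ))
# round up to whole megabytes for the -XX:MaxDirectMemorySize value
echo "need $needed bytes (~$(( needed / 1024 / 1024 + 1 )) MB)"
```

This prints a requirement of 1,184,354,560 bytes, i.e. about 1130 MB, which is why 1130m clears the check while 1125m does not.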
3. Submitting a local batch indexing task fails

curl -X 'POST' -H 'Content-Type: application/json' -d @quickstart/job/query-adtest.json http://cdh03:8082/druid/v2/?pretty

The error:
Not enough direct memory. Please adjust -XX:MaxDirectMemorySize,
druid.processing.buffer.sizeBytes, druid.processing.numThreads,
or druid.processing.numMergeBuffers: maxDirectMemory[954,728,448],
memoryNeeded[1,184,354,560] =
druid.processing.buffer.sizeBytes[236,870,912] * (druid.processing.numMergeBuffers[2] + druid.processing.numThreads[2] + 1)
Cause: not enough direct memory.
Fix:
At first I tried to raise maxDirectMemory, but that did not take effect, so instead I reduced memoryNeeded. On every node in the cluster, make the following changes:
vim /usr/develop/druid-0.10.1/conf/druid/broker/runtime.properties
change druid.processing.buffer.sizeBytes=236870912 to
druid.processing.buffer.sizeBytes=190945689
vim /usr/develop/druid-0.10.1/conf/druid/historical/runtime.properties
change druid.processing.buffer.sizeBytes=236870912 to
druid.processing.buffer.sizeBytes=190945689
vim /usr/develop/druid-0.10.1/conf/druid/middleManager/runtime.properties
change druid.indexer.fork.property.druid.processing.buffer.sizeBytes=236870912 to
druid.indexer.fork.property.druid.processing.buffer.sizeBytes=190945689
(190945689 × 5 = 954,728,445 bytes, just inside maxDirectMemory[954,728,448].)
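The edits can also be scripted per node; a sketch, assuming the quickstart file layout used throughout this post:

```shell
# Batch-apply the smaller processing buffer on one node. The sed pattern
# also matches the tail of the middleManager's
# druid.indexer.fork.property.* key, leaving that key's prefix intact.
CONF=/usr/develop/druid-0.10.1/conf/druid
for svc in broker historical middleManager; do
  sed -i 's/druid\.processing\.buffer\.sizeBytes=236870912/druid.processing.buffer.sizeBytes=190945689/' \
    "$CONF/$svc/runtime.properties"
done
```

Run it on every node, then restart the services.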
After that error was fixed, the following one appeared:
java.lang.IllegalStateException: Failed to create directory within 10000 attempts (tried 1578222406564-0 to 1578222406564-9999)
at com.google.common.io.Files.createTempDir(Files.java:600) ~[guava-16.0.1.jar:?]
2020-01-05T19:06:47,469 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_ad_event_local_2020-01-05T19:06:15.633+08:00",
"status" : "FAILED",
"duration" : 2191
}
io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner.stop()] on object[io.druid.initialization.Log4jShutterDownerModule$Log4jShutterDowner@6ee5f485].
2020-01-05 19:06:51,872 Thread-3 ERROR Unable to register shutdown hook because JVM is shutting down. java.lang.IllegalStateException: Not started
at io.druid.common.config.Log4jShutdown.addShutdownCallback(Log4jShutdown.java:45)
Cause: Druid failed to create its temporary directory.
Fix: create it manually:
mkdir -p /usr/develop/druid-0.10.1/var/tmp
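The path Druid tried to create under comes from the task JVM's temp-dir setting: in the 0.10.x quickstart configs, java.io.tmpdir points at a relative var/tmp, which is why the directory must exist under the install root. A sketch of the relevant line (file path and value assumed from the quickstart layout; check your own jvm.config):

```properties
# conf/druid/middleManager/jvm.config (quickstart layout, assumed)
-Djava.io.tmpdir=var/tmp
```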
4. Broker fails to start (again)

nohup bin/broker.sh start >log/broker.log 2>&1 &

The error:
2020-01-05T01:21:06,848 ERROR [main] io.druid.cli.CliBroker - Error when starting up. Failing.
java.lang.IllegalStateException: Insufficient threads: max=2 < needed(acceptors=1 + selectors=1 + request=1)
Cause: too few HTTP server threads.
Fix: on every node in the cluster, make the following changes:
vim /usr/develop/druid-0.10.1/conf/druid/broker/runtime.properties
change druid.server.http.numThreads=2 to
druid.server.http.numThreads=3
vim /usr/develop/druid-0.10.1/conf/druid/historical/runtime.properties
change druid.server.http.numThreads=2 to
druid.server.http.numThreads=3
vim /usr/develop/druid-0.10.1/conf/druid/middleManager/runtime.properties
change druid.server.http.numThreads=2 to
druid.server.http.numThreads=3
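As the error spells out, Jetty needs at least acceptors + selectors + one request thread (1 + 1 + 1 = 3 here), so numThreads=2 can never be enough. The three edits can be scripted the same way as before; a sketch assuming the quickstart file layout:

```shell
# Raise the HTTP thread pool to the minimum Jetty will accept.
CONF=/usr/develop/druid-0.10.1/conf/druid
for svc in broker historical middleManager; do
  sed -i 's/^druid\.server\.http\.numThreads=2$/druid.server.http.numThreads=3/' \
    "$CONF/$svc/runtime.properties"
done
```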
5. Submitting an MR task fails (1)

curl -X 'POST' -H 'Content-Type: application/json' -d @quickstart/job/hadoop-index/hadoop-index.json cdh01:8090/druid/indexer/v1/task

The error:
io.druid.java.util.common.ISE: Hadoop dependency [/usr/develop/druid-0.10.1/hadoop-dependencies/hadoop-client/2.6.0] didn't exist!?
Cause: the Hadoop dependency jars do not exist.
Fix:
The Hadoop dependency jars live under /usr/develop/druid-0.10.1/hadoop-dependencies/hadoop-client/.
Pull them with:
cd /usr/develop/druid-0.10.1
# java -classpath "lib/*" io.druid.cli.Main tools pull-deps --defaultVersion 0.10.1 -c io.druid.extensions:mysql-metadata-storage:0.10.1 -c druid-hdfs-storage -h org.apache.hadoop:hadoop-client:2.6.0   (pulls more dependencies than needed)
java -classpath "lib/*" io.druid.cli.Main tools pull-deps --defaultVersion 0.10.1 -h org.apache.hadoop:hadoop-client:2.6.0
Note: this command did not really work for me: the download was extremely slow, or simply stalled on the hadoop-client jars.
What finally worked: download hadoop-2.7.3 from the Apache archive and use it. The Hadoop I used was the stock binary, not recompiled for this cluster, but it still worked.
Download: http://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/
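If you switch to hadoop-2.7.3, the ingestion spec also has to request the matching dependency. A sketch of the relevant field (hadoopDependencyCoordinates is the Hadoop index task's field for this; the 2.7.3 coordinate assumes the jars were placed under hadoop-dependencies/hadoop-client/2.7.3):

```json
"hadoopDependencyCoordinates": ["org.apache.hadoop:hadoop-client:2.7.3"]
```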
6. Submitting an MR task fails (2)

curl -X 'POST' -H 'Content-Type: application/json' -d @quickstart/job/hadoop-index/hadoop-index.json cdh01:8090/druid/indexer/v1/task

The error:
java.net.ConnectException: Call From cdh02.develop.cn/192.168.8.110 to cdh01:9000 failed
on connection exception: java.net.ConnectException: 拒绝连接 (Connection refused);
For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
Cause: cdh01 does not listen on port 9000; HDFS is configured on port 8020, so cdh01:9000 can never be reached. The port is set in core-site.xml:
<property>
<name>fs.default.name</name>
<value>hdfs://cdh01:8020</value>
</property>
Fix:
In the task spec hadoop-index.json, change port 9000 to 8020. Also pay attention to the datanode and resourceManager settings:
"jobProperties": {
  "fs.default.name": "hdfs://cdh01:8020",
  "fs.defaultFS": "hdfs://cdh01:8020",
  "dfs.datanode.address": "cdh02",
  "yarn.resourcemanager.hostname": "cdh01"
}
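Rather than guessing, the NameNode port can be read straight from the cluster's own config. A small sketch (the HADOOP_CONF_DIR fallback path is an assumption; adjust it for your install):

```shell
# Print the fs.default.name / fs.defaultFS value from core-site.xml;
# the port in the task spec must match this URI.
grep -A1 '<name>fs.default' "${HADOOP_CONF_DIR:-/etc/hadoop/conf}/core-site.xml" \
  | grep -o 'hdfs://[^<]*'
```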
7. Submitting a Kafka indexing service task fails

curl -X POST -H 'Content-Type: application/json' -d @quickstart/job/kafka-test/kafka_test_index.json http://cdh01:8090/druid/indexer/v1/supervisor

The error:
2020-01-07T19:02:05,459 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[KafkaIndexTask{id=index_kafka_kafkatest_2ad230647214b53_hbikmkji, type=index_kafka, dataSource=kafkatest}]
io.druid.java.util.common.ISE: Could not allocate segment for row with timestamp[2019-06-05T20:02:45.453+08:00]
at io.druid.indexing.kafka.KafkaIndexTask.run(KafkaIndexTask.java:462) ~[?:?]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.10.1.jar:0.10.1]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.10.1.jar:0.10.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_211]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_211]
at java.lang.Thread.run(Thread.java:748) [?:1.8.0_211]
2020-01-07T14:02:05,477 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_kafka_kafkatest_2ad230647214b53_hbikmkji] status changed to [FAILED].
2020-01-07T14:02:05,483 INFO [main] com.sun.jersey.guice.spi.container.GuiceComponentProviderFactory - Binding io.druid.server.QueryResource to GuiceInstantiatedComponentProvider
2020-01-07T14:02:05,484 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_kafka_kafkatest_2ad230647214b53_hbikmkji",
"status" : "FAILED",
"duration" : 1438
}
Submitting a query afterwards fails with:
[root@cdh02 druid-0.10.1]# curl -X 'POST' -H 'Content-Type: application/json' -d @quickstart/job/query-kafkatest.json http://cdh03:8082/druid/v2/?pretty
{
"error" : "Unknown exception",
"errorMessage" : "Failure getting results for query[c05a55f2-6edf-416f-9414-c62cd03f59d1] url[http://cdh02.develop.cn:8100/druid/v2/] because of [org.jboss.netty.channel.ChannelException: Channel disconnected]",
"errorClass" : "io.druid.java.util.common.RE",
"host" : null
}
The overlord log shows the following warning:
2020-01-08T21:25:40,476 WARN [qtp793483510-63] io.druid.metadata.IndexerSQLMetadataStorageCoordinator - Cannot allocate new segment for dataSource[kafkatest2], interval[2019-06-05T12:02:00.000Z/2019-06-05T12:03:00.000Z], maxVersion[2020-01-08T21:25:40.434+08:00]: conflicting segment[kafkatest2_2019-06-05T08:00:00.000+08:00_2019-06-06T08:00:00.000+08:00_2020-01-08T21:15:23.876+08:00].
Explanations found online (which did not directly help):
1. The corresponding interval in druid_pendingSegments has a conflict.
2. If the overlord log contains "Not updating metadata, existing state is not the expected start state", validation failed because the topic was changed; deleting the corresponding rows from druid_dataSource fixes it.
The exception causes segment hand-off to fail, so the data cannot be retained.
Cause: a conflicting interval in druid_pendingSegments.
Fix: clear the conflicting interval from druid_pendingSegments.
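Clearing the interval means deleting the matching rows from the druid_pendingSegments table in the MySQL metadata store. A sketch, with column names assumed from Druid's metadata schema and the dataSource/interval values taken from the WARN log above (the user and database names are placeholders; stop the supervisor and back up the table before deleting):

```shell
# Delete pending-segment rows overlapping the conflicting interval.
mysql -u druid -p druid <<'SQL'
DELETE FROM druid_pendingSegments
WHERE dataSource = 'kafkatest2'
  AND `start` < '2019-06-05T12:03:00.000Z'
  AND `end`   > '2019-06-05T12:02:00.000Z';
SQL
```

Then resubmit the supervisor.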
Open question: should partitionsSpec be set in tuningConfig?
"tuningConfig" : {
  "type" : "kafka",
  "reportParseExceptions": false,
  "partitionsSpec" : {
    "type" : "hashed",
    "targetPartitionSize": 5000000
  }
}