  • flume: DataNode failures in the cluster cause HDFS write failures (repost)

    Source: http://www.geedoo.info/dfs-client-block-write-replace-datanode-on-failure-enable.html

    These past few days, with the Hangzhou cluster in the middle of an upgrade transition, the task load has been heavy while the cluster is small (only 4 DataNodes). The cluster kept running into problems, which caused errors in flume's data collection and ultimately data loss.

    With data going missing, the first suspect was data collection itself. Fine, let's look at flume's error log:

    Caused by: java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[10.0.2.163:50010, 10.0.2.164:50010], original=[10.0.2.163:50010, 10.0.2.164:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:817)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:877)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:983)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:780)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

    The error:

    Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT

    From the log, it says adding a DataNode failed and suggests turning off the feature via dfs.client.block.write.replace-datanode-on-failure.policy. But I wasn't adding any nodes? Apparently the problem is not that simple.

    Looking up these parameters in the official configuration documentation:

    Parameter: dfs.client.block.write.replace-datanode-on-failure.enable
    Default: true
    Description: If there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. The feature is to add new datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new datanodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy.

    Parameter: dfs.client.block.write.replace-datanode-on-failure.policy
    Default: DEFAULT
    Description: This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. ALWAYS: always add a new datanode when an existing datanode is removed. NEVER: never add a new datanode. DEFAULT: Let r be the replication number. Let n be the number of existing datanodes. Add a new datanode only if r is greater than or equal to 3 and either (1) floor(r/2) is greater than or equal to n; or (2) r is greater than n and the block is hflushed/appended.
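    The DEFAULT rule quoted above can be restated as a small predicate. This is a sketch of the documented decision logic, not the Hadoop source; the function name and signature are my own:

```python
def default_policy_adds_datanode(r: int, n: int, hflushed_or_appended: bool) -> bool:
    """Sketch of the DEFAULT replace-datanode-on-failure policy from
    hdfs-default.xml: r is the replication factor, n is the number of
    datanodes still alive in the write pipeline."""
    if r < 3:
        # With fewer than 3 replicas, never try to add a replacement.
        return False
    # Replace if half the replicas (rounded down) already outnumber the
    # surviving pipeline, or if we are short a replica on an
    # hflushed/appended block (flume's append-style writes fall here).
    return (r // 2 >= n) or (r > n and hflushed_or_appended)

# The situation in the log above: replication 3, only 2 surviving
# datanodes, block being appended -> the client tries to find a new
# datanode, and on a 4-node cluster under load it may fail to find one.
print(default_policy_adds_datanode(3, 2, True))
```

    This matches the error message: with current=[2 nodes] the client decided it must add a datanode, and the add failed because no healthy candidate was available.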

    Source: https://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

    Searching the source, the relevant code is in DFSClient: this happens when the client writes blocks through the pipeline. The two relevant parameters are:
    dfs.client.block.write.replace-datanode-on-failure.enable
    dfs.client.block.write.replace-datanode-on-failure.policy
    The first controls whether the client applies a replacement policy at all when a write fails; the default of true is fine.
    The second selects the concrete replacement policy; the default is DEFAULT.
    Under DEFAULT, with a replication factor of 3 or more the client will try to replace the failed datanode with a new one; with only 2 replicas it does not replace the datanode and simply continues writing with what remains.

    Since my cluster has only 4 nodes, once load gets high enough that two or more DataNodes stop responding at the same time, HDFS writes fail like this. On a small cluster we can simply turn this feature off.
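    Concretely, following the option the documentation quoted above describes for very small clusters, this would go in hdfs-site.xml (a sketch of the "never replace" setting; disabling via the enable flag is the other documented option):

```xml
<!-- hdfs-site.xml: on a very small cluster, never try to replace a
     failed datanode in the write pipeline; keep writing with the
     surviving datanodes instead of failing the write. -->
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
  <value>true</value>
</property>
<property>
  <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
  <value>NEVER</value>
</property>
```

    Note the trade-off: with NEVER, a block written during a failure may temporarily have fewer replicas than the configured replication factor until HDFS re-replicates it.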

    References:

    Notes from tracking down a Hadoop DataNode error

    Where can I set dfs.client.block.write.replace-datanode-on-failure.enable?

    CDH4 vs CDH3: differences in how the client handles DataNode failures

    Flume JIRA discussion:

    https://issues.apache.org/jira/browse/FLUME-2261

  • Original post: https://www.cnblogs.com/sunxucool/p/3957414.html