  • flume: DataNode failures in the cluster cause HDFS write failures (repost)

    Source: http://www.geedoo.info/dfs-client-block-write-replace-datanode-on-failure-enable.html

    These past few days the Hangzhou cluster has been in a transitional upgrade period: the workload was heavy and the cluster small (only 4 DataNodes), so the cluster kept running into problems. This caused flume's data collection to fail, and data was lost.

    When data loss shows up, data collection is the first suspect, so let's start with flume's error log:

    Caused by: java.io.IOException: Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-
    on-failure.policy in configuration, where the current policy is DEFAULT. (Nodes: current=[10.0.2.163:50010, 10.0.2.164:50010], original=[10.0.2.163:50010, 10.0.2.164:50010])
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:817)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:877)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:983)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.processDatanodeError(DFSOutputStream.java:780)
        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:449)

    The error:

    Failed to add a datanode. User may turn off this feature by setting dfs.client.block.write.replace-datanode-on-failure.policy in configuration, where the current policy is DEFAULT

    The log says that adding a DataNode failed, and suggests turning off the feature via dfs.client.block.write.replace-datanode-on-failure.policy. But I wasn't adding any nodes, so the problem clearly isn't that simple.

    Looking up these parameters in the official configuration documentation:

    Parameter: dfs.client.block.write.replace-datanode-on-failure.enable
    Default: true
    Description: If there is a datanode/network failure in the write pipeline, DFSClient will try to remove the failed datanode from the pipeline and then continue writing with the remaining datanodes. As a result, the number of datanodes in the pipeline is decreased. The feature is to add new datanodes to the pipeline. This is a site-wide property to enable/disable the feature. When the cluster size is extremely small, e.g. 3 nodes or less, cluster administrators may want to set the policy to NEVER in the default configuration file or disable this feature. Otherwise, users may experience an unusually high rate of pipeline failures since it is impossible to find new datanodes for replacement. See also dfs.client.block.write.replace-datanode-on-failure.policy.

    Parameter: dfs.client.block.write.replace-datanode-on-failure.policy
    Default: DEFAULT
    Description: This property is used only if the value of dfs.client.block.write.replace-datanode-on-failure.enable is true. ALWAYS: always add a new datanode when an existing datanode is removed. NEVER: never add a new datanode. DEFAULT: Let r be the replication number. Let n be the number of existing datanodes. Add a new datanode only if r is greater than or equal to 3 and either (1) floor(r/2) is greater than or equal to n; or (2) r is greater than n and the block is hflushed/appended.
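    The DEFAULT policy's decision rule above can be sketched as a small predicate. This is a minimal illustration of the documented condition, not the actual Hadoop source; the class and method names are hypothetical:

    ```java
    // Hypothetical sketch of the DEFAULT replace-datanode-on-failure policy,
    // following the documented rule: add a new datanode only if r >= 3 and
    // either floor(r/2) >= n, or (r > n and the block is hflushed/appended).
    public class ReplaceDatanodePolicySketch {
        public static boolean shouldAddDatanode(int r, int n, boolean hflushedOrAppended) {
            if (r < 3) {
                return false; // with fewer than 3 replicas, never replace
            }
            // integer division gives floor(r/2)
            return (r / 2 >= n) || (r > n && hflushedOrAppended);
        }

        public static void main(String[] args) {
            // r=3 replicas, only n=1 datanode left in the pipeline: replace
            System.out.println(shouldAddDatanode(3, 1, false)); // true
            // r=2 replicas: r < 3, so keep writing with the remaining nodes
            System.out.println(shouldAddDatanode(2, 1, false)); // false
        }
    }
    ```

    This makes the failure in the log plausible: with replication 3 and a shrinking pipeline, the client is obliged to find a replacement datanode, and on a 4-node cluster under load there may simply be none available.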

    Source: https://hadoop.apache.org/docs/current2/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

    Tracking down the source code leads to DFSClient: the failure happens while the client is writing blocks through the pipeline, and two related parameters are involved:
    dfs.client.block.write.replace-datanode-on-failure.enable
    dfs.client.block.write.replace-datanode-on-failure.policy
    The former controls whether the client applies a replacement policy at all when a write fails; the default of true is fine.
    The latter specifies the details of that replacement policy; the default is DEFAULT.
    Under DEFAULT, with 3 or more replicas the client will attempt to replace the failed datanode; with 2 replicas it does not replace anything and simply continues writing.

    Since my cluster has only 4 nodes, once the load gets too high and two or more DataNodes stop responding at the same time, HDFS writes fail. On a small cluster like this we can simply turn the feature off.
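    A sketch of that client-side change, using the property names quoted from the official documentation above (values per hdfs-default.xml; set this in the hdfs-site.xml that the flume client reads):

    ```xml
    <!-- hdfs-site.xml (client side): never try to replace a failed
         datanode in the write pipeline; keep writing with the rest -->
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.enable</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.client.block.write.replace-datanode-on-failure.policy</name>
      <value>NEVER</value>
    </property>
    ```

    Note that with NEVER the write continues on fewer datanodes, so blocks may end up under-replicated until the NameNode re-replicates them; this is the documented trade-off for very small clusters.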

    References:

    Tracing a Hadoop DataNode error (记录一次hadoop的datanode的报错追查)

    Where can I set dfs.client.block.write.replace-datanode-on-failure.enable?

    How CDH4 vs CDH3 clients handle DataNode failures differently (cdh4 vs cdh3 client处理DataNode异常的不同)

    flume JIRA issue:

    https://issues.apache.org/jira/browse/FLUME-2261

  • Original post: https://www.cnblogs.com/sunxucool/p/3957414.html