zoukankan      html  css  js  c++  java
  • spark streaming的容错:防止数据丢失

    官方这么说的

    [Since Spark 1.2] Configuring write ahead logs - Since Spark 1.2, we have introduced write ahead logs for achieving strong fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into a write ahead log in the configuration checkpoint directory. This prevents data loss on driver recovery, thus ensuring zero data loss (discussed in detail in the Fault-tolerance Semantics section). This can be enabled by setting the configuration parameter spark.streaming.receiver.writeAheadLog.enable to true. However, these stronger semantics may come at the cost of the receiving throughput of individual receivers. This can be corrected by running more receivers in parallel to increase aggregate throughput. Additionally, it is recommended that the replication of the received data within Spark be disabled when the write ahead log is enabled as the log is already stored in a replicated storage system. This can be done by setting the storage level for the input stream to StorageLevel.MEMORY_AND_DISK_SER.
    

    我理解,当worker或者driver挂掉后,可能会将receive的数据丢失,那么官方给的方案就是将接受的数据checkpoint到本地。

    通过使用spark.streaming.receiver.writeAheadLog.enable=true来启用。 另外,如果启动这个的话, 那么streaming的存储策略就没有必要多个复本了,官方推荐使用StorageLevel.MEMORY_AND_DISK_SER即可

  • 相关阅读:
    点击事件
    php if语句判定my查询是否为空
    php if语句判定ms查询是否为空
    thinkphp 原生sql使用分页类
    从JAVA客户端访问Redis示例(入门)
    Log4j日志级别
    网页正文抽取(包含提取图片)
    网络爬虫基本原理
    Java中替换HTML标签的方法代码
    Java/Js下使用正则表达式匹配嵌套Html标签
  • 原文地址:https://www.cnblogs.com/hark0623/p/4503617.html
Copyright © 2011-2022 走看看