  • Spark Streaming fault tolerance: preventing data loss

    The official documentation puts it this way:

    [Since Spark 1.2] Configuring write ahead logs - Since Spark 1.2, we have introduced write ahead logs for achieving strong fault-tolerance guarantees. If enabled, all the data received from a receiver gets written into a write ahead log in the configuration checkpoint directory. This prevents data loss on driver recovery, thus ensuring zero data loss (discussed in detail in the Fault-tolerance Semantics section). This can be enabled by setting the configuration parameter spark.streaming.receiver.writeAheadLog.enable to true. However, these stronger semantics may come at the cost of the receiving throughput of individual receivers. This can be corrected by running more receivers in parallel to increase aggregate throughput. Additionally, it is recommended that the replication of the received data within Spark be disabled when the write ahead log is enabled as the log is already stored in a replicated storage system. This can be done by setting the storage level for the input stream to StorageLevel.MEMORY_AND_DISK_SER.
    

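    As I understand it, when a worker or the driver fails, data that a receiver has already received but not yet processed can be lost. The official solution is to write the received data into a write-ahead log under the configured checkpoint directory, which should be on fault-tolerant storage such as HDFS rather than on local disk.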

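    It is enabled by setting spark.streaming.receiver.writeAheadLog.enable=true. Once it is on, the input stream's storage level no longer needs multiple replicas, because the log itself already lives on replicated storage; the official recommendation is simply StorageLevel.MEMORY_AND_DISK_SER.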
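    A minimal sketch of how these settings might fit together in a receiver-based Scala job. The checkpoint path, the socket source on localhost:9999, the 10-second batch interval and the object name are all hypothetical. StreamingContext.getOrCreate is used here as the usual way to let a restarted driver recover from the checkpoint instead of starting fresh.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

object WalStreamingExample {
  // Hypothetical path: the WAL is written under the checkpoint directory,
  // so it should sit on fault-tolerant storage such as HDFS.
  val checkpointDir = "hdfs:///tmp/wal-example-checkpoint"

  def createContext(): StreamingContext = {
    val conf = new SparkConf()
      .setAppName("WalStreamingExample")
      // Also persist every block a receiver stores into the write-ahead log.
      .set("spark.streaming.receiver.writeAheadLog.enable", "true")
    val ssc = new StreamingContext(conf, Seconds(10))
    ssc.checkpoint(checkpointDir)

    // Single-copy serialized storage: the WAL already provides durability,
    // so the default MEMORY_AND_DISK_SER_2 replication would be redundant.
    val lines = ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER)
    lines.count().print()
    ssc
  }

  def main(args: Array[String]): Unit = {
    // Recover from the checkpoint if one exists; otherwise build a fresh context.
    val ssc = StreamingContext.getOrCreate(checkpointDir, () => createContext())
    ssc.start()
    ssc.awaitTermination()
  }
}
```

    The stream is defined inside createContext because getOrCreate only calls that function when no checkpoint exists; a restarted driver that finds one rebuilds the DStream graph from the checkpoint instead.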

  • Original article: https://www.cnblogs.com/hark0623/p/4503617.html