  • Spark Streaming optimization: receiving data in parallel

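    import org.apache.spark.streaming.kafka.KafkaUtils

    // Create 5 Kafka input DStreams; each one runs its own receiver on a worker
    val numStreams = 5
    val kafkaStreams = (1 to numStreams).map { i => KafkaUtils.createStream(...) }
    // Union the separate DStreams into one DStream for downstream transformations
    val unifiedStream = streamingContext.union(kafkaStreams)
    unifiedStream.print()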
    

    Here is what the official documentation says:

    Receiving data over the network (like Kafka, Flume, socket, etc.) requires the data to be deserialized and stored in Spark. If the data receiving becomes a bottleneck in the system, then consider parallelizing the data receiving. Note that each input DStream creates a single receiver (running on a worker machine) that receives a single stream of data. Receiving multiple data streams can therefore be achieved by creating multiple input DStreams and configuring them to receive different partitions of the data stream from the source(s). For example, a single Kafka input DStream receiving two topics of data can be split into two Kafka input streams, each receiving only one topic. This would run two receivers on two workers, thus allowing data to be received in parallel, and increasing overall throughput. These multiple DStreams can be unioned together to create a single DStream. Then the transformations that were being applied on the single input DStream can be applied on the unified stream.
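    To make the idea in the quote concrete without a Kafka cluster, here is a rough plain-Scala analogy. It is not Spark's API. Each "receiver" is modeled as a Future running on its own thread, and each one deserializes a single source partition, which stands in for one topic. The per-receiver results are then merged, much like streamingContext.union. The names ParallelReceiveSketch, receive and receiveInParallel are hypothetical. Real Spark receivers also differ in an important way: they run as long-lived tasks on worker machines, not as local threads.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Plain-Scala analogy (no Spark, hypothetical names): each "receiver" runs on
// its own thread and consumes exactly one source partition (e.g. one topic),
// mirroring how each input DStream gets its own receiver.
object ParallelReceiveSketch {

  // A receiver "deserializes" its raw records; here the raw records are
  // strings holding integers, standing in for bytes arriving over the network.
  def receive(partition: Seq[String]): Seq[Int] =
    partition.map(_.trim.toInt)

  // Start one receiver per partition concurrently, then "union" the results
  // into a single collection, analogous to streamingContext.union(...).
  def receiveInParallel(partitions: Seq[Seq[String]]): Seq[Int] = {
    val receivers: Seq[Future[Seq[Int]]] =
      partitions.map(p => Future(receive(p)))
    // Future.sequence keeps the input order of the partitions
    val all: Future[Seq[Seq[Int]]] = Future.sequence(receivers)
    Await.result(all, 10.seconds).flatten
  }

  def main(args: Array[String]): Unit = {
    // Two "topics", each handled by its own receiver
    val topicA = Seq("1", "2", "3")
    val topicB = Seq("10", "20")
    val unified = receiveInParallel(Seq(topicA, topicB))
    // The transformation is applied once, to the unified result
    println(unified.map(_ * 2))
  }
}
```

    Each partition is decoded by its own receiver. Adding partitions, and with them receivers, therefore spreads the deserialization work across threads. That is the sketch's stand-in for the throughput gain the documentation describes. The downstream map is written only once, against the unified result.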
    
  • Original post: https://www.cnblogs.com/hark0623/p/4502800.html