  • Streaming optimization: receiving data in parallel

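    import org.apache.spark.streaming.kafka.KafkaUtils  // receiver-based Kafka integration

    // Each input DStream starts its own receiver on a worker, so 5 streams = 5 receivers
    val numStreams = 5
    val kafkaStreams = (1 to numStreams).map { i => KafkaUtils.createStream(...) }
    // Merge the receivers' streams into one DStream and process it as a single stream
    val unifiedStream = streamingContext.union(kafkaStreams)
    unifiedStream.print()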
    

    The official Spark Streaming documentation says:

    Receiving data over the network (like Kafka, Flume, socket, etc.) requires the data to be deserialized and stored in Spark. If the data receiving becomes a bottleneck in the system, then consider parallelizing the data receiving. Note that each input DStream creates a single receiver (running on a worker machine) that receives a single stream of data. Receiving multiple data streams can therefore be achieved by creating multiple input DStreams and configuring them to receive different partitions of the data stream from the source(s). For example, a single Kafka input DStream receiving two topics of data can be split into two Kafka input streams, each receiving only one topic. This would run two receivers on two workers, thus allowing data to be received in parallel, and increasing overall throughput. These multiple DStreams can be unioned together to create a single DStream. Then the transformations that were being applied on the single input DStream can be applied on the unified stream.
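    To make the idea concrete without a cluster, here is a minimal plain-Scala model (no Spark involved) of "one receiver per source partition, then union". It is a sketch under stated assumptions: each Future stands in for one receiver-backed input DStream, the in-memory `source` map is a hypothetical stand-in for Kafka topics, and `splitTopics`, `receive` and `receiveAll` are illustrative names, not Spark APIs. The `Map[String, Int]` shape of each assignment is meant to mirror the `topics` parameter (topic name to number of consumer threads) of the receiver-based `KafkaUtils.createStream`, as we understand that API.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Plain-Scala model of parallel receiving: NOT Spark code.
// Each Future plays the role of one receiver (one input DStream).
object ParallelReceiveSketch {

  // Assign topics to receivers round-robin, so every topic is read by exactly one receiver.
  // Each assignment is topic -> thread count (always 1 here), mirroring the shape of the
  // `topics` argument of the receiver-based KafkaUtils.createStream (assumption).
  // If there are more receivers than topics, only non-empty assignments are returned.
  def splitTopics(topics: Seq[String], numReceivers: Int): Seq[Map[String, Int]] = {
    require(numReceivers > 0, "need at least one receiver")
    topics.zipWithIndex
      .groupBy { case (_, i) => i % numReceivers }
      .toSeq
      .sortBy(_._1)
      .map { case (_, assigned) => assigned.map { case (t, _) => t -> 1 }.toMap }
  }

  // Hypothetical receiver: reads the records of its assigned topics from an in-memory source.
  def receive(source: Map[String, Seq[String]], assignment: Map[String, Int]): Future[Seq[String]] =
    Future {
      assignment.keys.toSeq.sorted.flatMap(t => source.getOrElse(t, Seq.empty))
    }

  // Start all receivers concurrently (Futures run as soon as they are created),
  // then "union" their outputs into a single collection, in receiver order.
  def receiveAll(source: Map[String, Seq[String]], numReceivers: Int): Seq[String] = {
    val assignments = splitTopics(source.keys.toSeq.sorted, numReceivers)
    val receivers = assignments.map(a => receive(source, a))
    Await.result(Future.sequence(receivers), 10.seconds).flatten
  }
}
```

    For example, `ParallelReceiveSketch.receiveAll(Map("topicA" -> Seq("a1", "a2"), "topicB" -> Seq("b1")), 2)` gives each of the two receivers one topic and returns `Seq("a1", "a2", "b1")`. In a real Spark Streaming job, each receiver runs as a long-lived task that occupies one executor core, so the application needs more cores than receivers; otherwise no cores are left to process the unified stream.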
    
  • Original article: https://www.cnblogs.com/hark0623/p/4502800.html