  • Streaming optimization: receiving data in parallel

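    // Create several Kafka input DStreams; each one runs its own receiver.
    val numStreams = 5
    val kafkaStreams = (1 to numStreams).map { i => KafkaUtils.createStream(...) }
    // Union them into one DStream so the downstream logic is written only once.
    val unifiedStream = streamingContext.union(kafkaStreams)
    unifiedStream.print()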
    

    The official documentation puts it this way:

    Receiving data over the network (like Kafka, Flume, socket, etc.) requires the data to be deserialized and stored in Spark. If the data receiving becomes a bottleneck in the system, then consider parallelizing the data receiving. Note that each input DStream creates a single receiver (running on a worker machine) that receives a single stream of data. Receiving multiple data streams can therefore be achieved by creating multiple input DStreams and configuring them to receive different partitions of the data stream from the source(s). For example, a single Kafka input DStream receiving two topics of data can be split into two Kafka input streams, each receiving only one topic. This would run two receivers on two workers, thus allowing data to be received in parallel, and increasing overall throughput. These multiple DStreams can be unioned together to create a single DStream. Then the transformations that were being applied on the single input DStream can be applied on the unified stream.
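The same pattern works with any receiver-based source, not only Kafka. Below is a minimal sketch using socket streams so that no connector arguments have to be guessed. The object name `ParallelReceive`, the helper `minCores`, the host `localhost`, and the three port numbers are assumptions made for illustration; only `socketTextStream` and `union` come from the Spark Streaming API. It assumes the `spark-streaming` dependency is on the classpath and that something is serving text on those ports.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object ParallelReceive {
  // Each receiver permanently occupies one core, so at least one extra
  // core is needed to process the batches that the receivers produce.
  def minCores(numReceivers: Int): Int = numReceivers + 1

  def main(args: Array[String]): Unit = {
    // Hypothetical sources: three text servers on localhost.
    val ports = Seq(9991, 9992, 9993)

    val conf = new SparkConf()
      .setAppName("ParallelReceive")
      .setMaster(s"local[${minCores(ports.size)}]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // One input DStream per source, hence one receiver per source.
    val streams = ports.map(port => ssc.socketTextStream("localhost", port))

    // Merge the streams so the transformations are written only once.
    val unified = ssc.union(streams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

When running locally, the master needs more cores than there are receivers; otherwise the receivers take every core and no batch is ever processed, which is why the sketch sizes `local[n]` from the number of ports.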
    
  • Original article: https://www.cnblogs.com/hark0623/p/4502800.html