  • Streaming optimization: receiving data in parallel

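    // Number of receivers to run in parallel (one per input DStream).
    val numStreams = 5
    // Each createStream call creates its own input DStream, and therefore its own receiver.
    val kafkaStreams = (1 to numStreams).map { _ => KafkaUtils.createStream(...) }
    // Merge the input DStreams into a single DStream for downstream processing.
    val unifiedStream = streamingContext.union(kafkaStreams)
    unifiedStream.print()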
    

    The official documentation puts it this way:

    Receiving data over the network (like Kafka, Flume, socket, etc.) requires the data to be deserialized and stored in Spark. If the data receiving becomes a bottleneck in the system, then consider parallelizing the data receiving. Note that each input DStream creates a single receiver (running on a worker machine) that receives a single stream of data. Receiving multiple data streams can therefore be achieved by creating multiple input DStreams and configuring them to receive different partitions of the data stream from the source(s). For example, a single Kafka input DStream receiving two topics of data can be split into two Kafka input streams, each receiving only one topic. This would run two receivers on two workers, thus allowing data to be received in parallel, and increasing overall throughput. These multiple DStreams can be unioned together to create a single DStream. Then the transformations that were being applied on the single input DStream can be applied on the unified stream.
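
    The two-topic split described in the quoted passage could look roughly like the sketch below. It assumes the receiver-based Kafka integration (the spark-streaming-kafka 0.8 artifact, which provides org.apache.spark.streaming.kafka.KafkaUtils and is not available in newer Spark releases). The ZooKeeper address, consumer group id, topic names, and the perTopicMaps helper are hypothetical, chosen only for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.dstream.DStream
import org.apache.spark.streaming.kafka.KafkaUtils

object ParallelKafkaReceivers {
  // Hypothetical helper: build one topic map per receiver.
  // Each map has the shape Map(topic -> consumer threads inside that receiver).
  def perTopicMaps(topics: Seq[String]): Seq[Map[String, Int]] =
    topics.map(topic => Map(topic -> 1))

  def main(args: Array[String]): Unit = {
    // Hypothetical connection settings, for illustration only.
    val zkQuorum = "localhost:2181"
    val groupId  = "parallel-receive-demo"
    val topics   = Seq("topicA", "topicB")

    // Every receiver occupies one core, so the application needs more cores
    // than receivers: here 2 receivers plus 2 cores left for processing.
    val conf = new SparkConf().setAppName("ParallelKafkaReceivers").setMaster("local[4]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // One input DStream (and hence one receiver) per topic.
    val perTopicStreams: Seq[DStream[(String, String)]] =
      perTopicMaps(topics).map { topicMap =>
        KafkaUtils.createStream(ssc, zkQuorum, groupId, topicMap)
      }

    // Union the per-topic streams so that the downstream transformations
    // are written once, against a single DStream.
    val unified = ssc.union(perTopicStreams)
    unified.map(_._2).print() // print the message values, dropping the keys

    ssc.start()
    ssc.awaitTermination()
  }
}
```

    Keep in mind that each receiver permanently holds one core. If the application is given no more cores than it has receivers, the data is received but nothing is left to process it.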
    
  • Original article: https://www.cnblogs.com/hark0623/p/4502800.html