
Streaming optimization: receiving data in parallel

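// Create several Kafka input DStreams; each one runs its own receiver.
val numStreams = 5
val kafkaStreams = (1 to numStreams).map { i => KafkaUtils.createStream(...) }
// Merge them into a single DStream so the transformations are written only once.
val unifiedStream = streamingContext.union(kafkaStreams)
unifiedStream.print()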

The official documentation puts it this way:

Receiving data over the network (like Kafka, Flume, socket, etc.) requires the data to be deserialized and stored in Spark. If the data receiving becomes a bottleneck in the system, then consider parallelizing the data receiving. Note that each input DStream creates a single receiver (running on a worker machine) that receives a single stream of data. Receiving multiple data streams can therefore be achieved by creating multiple input DStreams and configuring them to receive different partitions of the data stream from the source(s). For example, a single Kafka input DStream receiving two topics of data can be split into two Kafka input streams, each receiving only one topic. This would run two receivers on two workers, thus allowing data to be received in parallel and increasing overall throughput. These multiple DStreams can be unioned together to create a single DStream. Then the transformations that were being applied on the single input DStream can be applied on the unified stream.
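
The two-topic split described in the quote can be sketched roughly as follows. This is only an illustration, not code from the original post: it assumes the receiver-based `KafkaUtils.createStream` API from the older spark-streaming-kafka (0.8) artifact is on the classpath, and the ZooKeeper address, consumer group, topic names, batch interval, and `local[4]` master are all hypothetical placeholders.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
// Receiver-based Kafka API; assumes the spark-streaming-kafka (0.8) artifact is available.
import org.apache.spark.streaming.kafka.KafkaUtils

object ParallelKafkaReceivers {
  def main(args: Array[String]): Unit = {
    // Hypothetical connection settings, for illustration only.
    val zkQuorum = "localhost:2181"
    val groupId  = "example-group"
    val topics   = Seq("topicA", "topicB")

    // Each receiver occupies one core, so the application needs more cores
    // than receivers (here: 2 receivers plus cores left over for processing).
    val conf = new SparkConf()
      .setAppName("ParallelKafkaReceivers")
      .setMaster("local[4]")
    val ssc = new StreamingContext(conf, Seconds(5))

    // One input DStream, and therefore one receiver, per topic.
    val perTopicStreams = topics.map { topic =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, Map(topic -> 1))
    }

    // Union the per-topic streams, then apply the transformations once.
    val unified = ssc.union(perTopicStreams)
    unified.map(_._2).count().print() // number of messages per batch

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Because every receiver holds on to a core for as long as the job runs, adding receivers only helps when the cluster has cores left over for the actual processing.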



Original article: https://www.cnblogs.com/hark0623/p/4502800.html