Spark RDD

RDD: Resilient Distributed Dataset

1. Spark RDD is immutable

Because an RDD is immutable, a large RDD can be split into smaller pieces, distributed to
various worker nodes for processing, and the partial results combined into the final
result, all without worrying that the underlying data will change along the way.
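
As a minimal sketch of what immutability means in practice, the Scala snippet below applies a transformation to an RDD and shows that the original RDD is left untouched. The object name, the method name, the `local[2]` master, and the sample data are assumptions made for illustration; they do not come from the original post.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession

object ImmutabilitySketch {
  // Returns (contents of the original RDD, contents of the derived RDD).
  def originalAndDoubled(sc: SparkContext): (List[Int], List[Int]) = {
    // Split the data into 2 partitions that can be processed independently.
    val numbers = sc.parallelize(Seq(1, 2, 3, 4), numSlices = 2)

    // map returns a NEW RDD; `numbers` itself is never modified.
    val doubled = numbers.map(_ * 2)

    (numbers.collect().toList, doubled.collect().toList)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("rdd-immutability-sketch")
      .getOrCreate()
    try {
      val (original, doubled) = originalAndDoubled(spark.sparkContext)
      println(s"original: ${original.mkString(", ")}") // original: 1, 2, 3, 4
      println(s"doubled:  ${doubled.mkString(", ")}")  // doubled:  2, 4, 6, 8
    } finally {
      spark.stop()
    }
  }
}
```

Since every transformation produces a new RDD rather than changing an existing one, each partition can be handed to a different worker without any locking.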

2. Spark RDD is distributable

3. Spark RDD lives in memory

Spark keeps RDD data in memory as much as it can. Only when Spark is running out of
memory, or when the data grows beyond the available capacity, is the data written to disk,
and whether that happens depends on the storage level chosen for the RDD. Because most RDD
processing happens in memory, Spark is able to process data very quickly.
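
The sketch below shows one way to ask Spark to keep an RDD in memory and fall back to disk only when it does not fit. It uses `persist` with `StorageLevel.MEMORY_AND_DISK`; note that the plain `cache()` call on an RDD uses `MEMORY_ONLY`, under which partitions that do not fit are recomputed when needed rather than written to disk. The object name, the method name, and the sample data are assumptions made for illustration.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object PersistSketch {
  // Returns (storage level in effect, element count, sum of elements).
  def cachedSquares(sc: SparkContext): (StorageLevel, Long, Long) = {
    val squares = sc.parallelize(1 to 1000).map(n => n.toLong * n)

    // Keep partitions in memory; spill the ones that do not fit to disk.
    squares.persist(StorageLevel.MEMORY_AND_DISK)

    // The first action computes the partitions and stores them.
    val count = squares.count()
    // Later actions can reuse the stored partitions instead of recomputing.
    val sum = squares.reduce(_ + _)

    val level = squares.getStorageLevel
    squares.unpersist() // release the stored partitions when done
    (level, count, sum)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("rdd-persist-sketch")
      .getOrCreate()
    try {
      val (level, count, sum) = cachedSquares(spark.sparkContext)
      println(s"level=$level count=$count sum=$sum")
    } finally {
      spark.stop()
    }
  }
}
```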

4. Spark RDD is strongly typed

A Spark RDD can be created with any supported data type. This can be one of the built-in
types that Scala and Java provide, or a custom type such as one of your own classes. The
biggest advantage of this design decision is freedom from type-related runtime errors: if
the code is going to break because of a data type issue, it breaks at compile time.
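
To illustrate the compile-time checking, here is a minimal Scala sketch of an RDD holding a custom class. The `Reading` case class, its fields, and the `hotSensors` method are hypothetical names invented for this example.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

// Hypothetical custom type, used only for this illustration.
case class Reading(sensor: String, celsius: Double)

object TypedRddSketch {
  // Returns the names of sensors that reported more than 30 degrees.
  def hotSensors(sc: SparkContext): List[String] = {
    // The element type is part of the RDD's type: RDD[Reading].
    val readings: RDD[Reading] = sc.parallelize(Seq(
      Reading("a", 21.5),
      Reading("b", 35.0),
      Reading("c", 40.2)
    ))

    // The compiler knows each element is a Reading, so field access is checked.
    val hot: RDD[String] = readings.filter(_.celsius > 30.0).map(_.sensor)

    // Uncommenting the next line is rejected by the compiler,
    // because Reading has no member named `kelvin`:
    // readings.map(_.kelvin)

    hot.collect().toList
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumed settings).
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("rdd-typed-sketch")
      .getOrCreate()
    try {
      println(hotSensors(spark.sparkContext).mkString(", ")) // b, c
    } finally {
      spark.stop()
    }
  }
}
```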


Original article: https://www.cnblogs.com/ordili/p/6684089.html
