  • Spark RDD

    RDD: Resilient Distributed Dataset

    1. Spark RDD is immutable

    Because an RDD is immutable, a large RDD can be split into smaller pieces, the pieces
    distributed to various worker nodes for processing, and the partial results combined into
    the final result, all without any risk of the underlying data changing along the way.
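
    A minimal sketch of what immutability looks like in practice, assuming the Scala API and
    a local master (`local[2]`); the object name `ImmutableRddSketch` and the sample data are
    made up for illustration. A transformation such as `map` returns a new RDD and leaves the
    source RDD as it was.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical demo object; not part of the original notes.
object ImmutableRddSketch {
  // Returns (contents of the source RDD, contents of the derived RDD).
  def originalAndDoubled(sc: SparkContext): (List[Int], List[Int]) = {
    val numbers: RDD[Int] = sc.parallelize(Seq(1, 2, 3, 4, 5))
    // map does not modify `numbers`; it describes a new RDD derived from it.
    val doubled: RDD[Int] = numbers.map(_ * 2)
    (numbers.collect().toList, doubled.collect().toList)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local[2] runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setAppName("ImmutableRddSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (original, doubled) = originalAndDoubled(sc)
      println(s"original = $original")
      println(s"doubled  = $doubled")
    } finally {
      sc.stop()
    }
  }
}
```

    The source RDD still holds 1 to 5 after the `map`, which is why several workers can read
    it at the same time without coordination.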

    2. Spark RDD is distributable
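
    One way to see the distribution is to look at an RDD's partitions, which are the units
    Spark hands out to workers. The sketch below assumes a local master and uses a made-up
    object name (`PartitionedRddSketch`); on a real cluster the partitions would be spread
    across worker nodes rather than local threads.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical demo object; not part of the original notes.
object PartitionedRddSketch {
  // Splits the numbers 1..8 into `numSlices` partitions and returns each partition's contents.
  def partitionContents(sc: SparkContext, numSlices: Int): List[List[Int]] = {
    val rdd = sc.parallelize(1 to 8, numSlices)
    // glom() turns every partition into one array, making the split visible.
    rdd.glom().collect().map(_.toList).toList
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local[4] simulates four workers with threads in this JVM.
    val conf = new SparkConf().setAppName("PartitionedRddSketch").setMaster("local[4]")
    val sc = new SparkContext(conf)
    try {
      partitionContents(sc, 4).zipWithIndex.foreach { case (part, i) =>
        println(s"partition $i: $part")
      }
    } finally {
      sc.stop()
    }
  }
}
```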

    3. Spark RDD lives in memory

    Spark keeps RDD data in memory as far as it can. Only when Spark runs short of memory,
    or the data outgrows the available capacity, is the data written to disk. Most processing
    on an RDD therefore happens in memory, avoiding repeated disk reads and writes, and that
    is the main reason Spark can process data so quickly.
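
    How an RDD is held can also be requested explicitly. The sketch below assumes the
    `persist` method with the `StorageLevel.MEMORY_AND_DISK` level, which asks Spark to keep
    partitions in memory and write to disk only those that do not fit; the object name
    `PersistedRddSketch` and the data are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical demo object; not part of the original notes.
object PersistedRddSketch {
  // Sums the squares of 1..1000 twice and reports the storage level in use.
  def sumTwice(sc: SparkContext): (Long, Long, StorageLevel) = {
    val squares = sc.parallelize(1 to 1000).map(n => n.toLong * n)
    // Keep partitions in memory; fall back to disk for those that do not fit.
    squares.persist(StorageLevel.MEMORY_AND_DISK)
    val first = squares.reduce(_ + _)  // first action computes the partitions and stores them
    val second = squares.reduce(_ + _) // second action can reuse the stored partitions
    (first, second, squares.getStorageLevel)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local[2] runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setAppName("PersistedRddSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val (first, second, level) = sumTwice(sc)
      println(s"first = $first, second = $second, level = $level")
    } finally {
      sc.stop()
    }
  }
}
```

    Both actions return the same sum; the difference is that the second one does not need to
    rebuild the RDD from its source when the stored partitions are still available.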

    4. Spark RDD is strongly typed

    An RDD can be created over any supported data type. These can be the intrinsic data
    types of Scala/Java or custom types such as your own classes. The biggest advantage of
    this design decision is that data type errors do not surface at run time: if code is
    going to break because of a data type issue, it breaks at compile time.
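
    A small sketch of typed RDDs in Scala, assuming a made-up case class `Purchase` and
    object name `TypedRddSketch`. The element type is part of the RDD's type (`RDD[Purchase]`,
    `RDD[Double]`), so a reference to a field that does not exist is rejected by the compiler
    before any job runs.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical custom type used as the RDD element type.
case class Purchase(customer: String, amount: Double)

// Hypothetical demo object; not part of the original notes.
object TypedRddSketch {
  def totalSpent(sc: SparkContext): Double = {
    val purchases: RDD[Purchase] =
      sc.parallelize(Seq(Purchase("alice", 12.5), Purchase("bob", 7.5)))
    // The compiler knows each element is a Purchase, so `_.amount` is checked statically.
    val amounts: RDD[Double] = purchases.map(_.amount)
    // The following line would not compile, because Purchase has no member `price`:
    // purchases.map(_.price)
    amounts.reduce(_ + _)
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local[2] runs Spark inside this JVM with two worker threads.
    val conf = new SparkConf().setAppName("TypedRddSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      println(s"total = ${totalSpent(sc)}")
    } finally {
      sc.stop()
    }
  }
}
```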

  • Original article: https://www.cnblogs.com/ordili/p/6684089.html