  • Spark RDD

    RDD: Resilient Distributed Dataset

    1. Spark RDD is immutable

    Because an RDD is immutable, a large RDD can be split into smaller pieces, the pieces
    distributed to various worker nodes for processing, and the results finally compiled to
    produce the final answer, all without worrying about the underlying data being changed.
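
    A minimal sketch of this behavior, assuming a SparkContext named sc (as provided by
    spark-shell): a transformation such as map returns a new RDD and leaves the original
    one untouched.

        val numbers = sc.parallelize(Seq(1, 2, 3, 4, 5))
        val doubled = numbers.map(_ * 2)             // new RDD; 'numbers' is not modified
        println(numbers.collect().mkString(","))     // 1,2,3,4,5
        println(doubled.collect().mkString(","))     // 2,4,6,8,10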

    2. Spark RDD is distributable
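
    An RDD is split into partitions, and Spark schedules those partitions across the worker
    nodes of the cluster so they can be processed in parallel. A minimal sketch, again
    assuming a SparkContext named sc, of requesting and inspecting the number of partitions:

        val rdd = sc.parallelize(1 to 1000, 8)   // ask for 8 partitions (numSlices)
        println(rdd.getNumPartitions)            // 8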

    3. Spark RDD lives in memory

    Spark keeps RDDs in memory as much as it can. Only in rare situations, when Spark is
    running out of memory or the data grows beyond the available capacity, are they written
    to disk. Most of the processing on an RDD happens in memory, which is why Spark is able
    to process data at lightning-fast speed.
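
    A minimal sketch of making this spill-to-disk behavior explicit, assuming a SparkContext
    named sc; the input path is hypothetical and only for illustration.

        import org.apache.spark.storage.StorageLevel

        val logs = sc.textFile("hdfs:///data/access.log")   // hypothetical input path
        logs.persist(StorageLevel.MEMORY_AND_DISK)          // keep in memory, spill to disk if it does not fit
        println(logs.count())                               // first action materialises and caches the RDD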

    4. Spark RDD is strongly typed

    A Spark RDD can be created with any supported data type. These can be Scala/Java
    intrinsic types or custom types such as your own classes. The biggest advantage of this
    design decision is that type-related errors are caught early: if the program is going to
    break because of a data type issue, it breaks at compile time rather than at runtime.
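
    A minimal sketch of an RDD parameterised by a user-defined case class, assuming a
    SparkContext named sc; a type mismatch in the map below would be rejected by the
    compiler rather than failing at runtime.

        import org.apache.spark.rdd.RDD

        case class Person(name: String, age: Int)

        val people = sc.parallelize(Seq(Person("Ann", 31), Person("Bo", 27)))
        val ages: RDD[Int] = people.map(_.age)          // checked by the compiler
        // val broken: RDD[String] = people.map(_.age)  // would not compile: found RDD[Int]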

  • Original article: https://www.cnblogs.com/ordili/p/6684089.html