Partitions and Partitioning
Partitions and Partitioning Introduction Depending on how you look at Spark (programmer, devop, admin), an RDD is ...
StorageLevel StorageLevel describes how an RDD is persisted (and addresses the following concerns): Does RDD u ...
RDD Caching and Persistence cache and persist are both used to cache an RDD so that it does not have to be recomputed when it is used later, ...
Actions Actions are RDD operations that produce non-RDD values. They materialize a value in a Spark program. In ot ...
PairRDDFunctions Tip Read up on the scaladoc of PairRDDFunctions. PairRDDFunctions are available in RDDs of ...
Transformations Transformations are lazy operations on an RDD that create one or more new RDDs, e.g. map, filter, ...
Operators - Transformations and Actions RDDs have two types of operations: transformations and actions. Note ...
ShuffledRDD ShuffledRDD is an RDD of key-value pairs that represents the shuffle step in an RDD lineage. It uses cu ...
NewHadoopRDD NewHadoopRDD is an RDD of K keys and V values. NewHadoopRDD is created when: SparkContext.newAP ...
HadoopRDD HadoopRDD is an RDD that provides core functionality for reading data stored in HDFS, a local file syste ...