
Map/Reduce-side Aggregator
Aggregator is a set of functions used to aggregate distributed data sets: ...


ShuffleDependency — Shuffle Dependency

NarrowDependency — Narrow Dependencies

RDD Dependencies Dependency class is the base (abstract) class to model a dependency relationship between two or m ...

CheckpointRDD Caution FIXME

Checkpointing Checkpointing is the process of truncating an RDD lineage graph and saving it to a reliable distributed ( ...

RDD shuffling Tip: Read the official documentation on Shuffle operations. It is still better than ...

HashPartitioner HashPartitioner is a Partitioner that uses a configurable number of partitions (`partitions`) to shuffle ...

Partitioner Caution: FIXME. Partitioner captures the data distribution at the output. A scheduler can optimize ...

Partition Partition is a contract of a partition index of an RDD. Note: A partition is missing when it has no ...