
Map/Reduce-side Aggregator
Map/Reduce-side Aggregator Aggregator is a set of functions used to aggregate distributed data sets: ...
ShuffleDependency — Shuffle Dependency
NarrowDependency — Narrow Dependencies
RDD Dependencies Dependency class is the base (abstract) class to model a dependency relationship between two or m ...
CheckpointRDD Caution FIXME
Checkpointing Checkpointing is the process of truncating an RDD lineage graph and saving it to a reliable distributed ( ...
RDD shuffling Tip Read the official documentation about the topic Shuffle operations. It is still better than ...
HashPartitioner HashPartitioner is a Partitioner that uses a configurable number of partitions to shuffle ...
Partitioner Caution FIXME Partitioner captures data distribution at the output. A scheduler can optimize ...
Partition Partition is a contract of a partition index of an RDD. Note A partition is missing when it has no ...