RDDConversions Helper Object

RDDConversions is a Scala object that defines the productToRowRdd and rowToRowRdd methods.

productToRowRdd Method

productToRowRdd…FIXME

Note
productToRowRdd is used when…FIXME

Converting Scala Objects In Rows to Values Of Catalyst Types — rowToRowRdd Method

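rowToRowRdd takes an RDD[Row] and a Seq[DataType] (the types of the output columns) and gives an RDD[InternalRow]. It maps over the partitions of the input RDD[Row] using the RDD.mapPartitions operator, which creates a MapPartitionsRDD with a “map” function that is applied to every partition.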

Tip
Use RDD.toDebugString to see the additional MapPartitionsRDD in an RDD lineage.
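What makes mapPartitions a good fit here is that its function runs once per partition rather than once per row, so any setup inside it is paid only once per partition. The following is a minimal, Spark-free Scala model of that behaviour, for illustration only: LocalPartitionedData, MapPartitionsDemo and countSetups are hypothetical names, not Spark API, and unlike Spark's lazy mapPartitions transformation this model evaluates eagerly.

```scala
// Spark-free model of RDD.mapPartitions, for illustration only.
// LocalPartitionedData is a hypothetical stand-in for an RDD (a sequence of partitions).
// Unlike Spark's lazy transformation, this model evaluates eagerly.
final case class LocalPartitionedData[A](partitions: Seq[Seq[A]]) {
  // f is applied once per partition, to an iterator over that partition's elements.
  def mapPartitions[B](f: Iterator[A] => Iterator[B]): LocalPartitionedData[B] =
    LocalPartitionedData(partitions.map(p => f(p.iterator).toList))
}

object MapPartitionsDemo {
  // Hypothetical helper: counts how many times the per-partition function runs,
  // i.e. how many times any per-partition setup would be paid for.
  def countSetups[A](data: LocalPartitionedData[A]): (LocalPartitionedData[A], Int) = {
    var setups = 0
    val result = data.mapPartitions { it =>
      setups += 1 // per-partition setup (e.g. allocating a reusable row) would go here
      it          // pass the elements through unchanged
    }
    (result, setups)
  }
}
```

With three partitions holding six elements in total, countSetups reports three invocations, not six.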

The “map” function takes a Scala Iterator of Row objects and does the following:

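  1. Creates a GenericInternalRow with one field per column, i.e. as many fields as there are elements in the input Seq[DataType]

  2. Creates a converter function (using CatalystTypeConverters.createToCatalystConverter) for every DataType in the Seq[DataType]

  3. For every Row object in the partition (iterator), applies the converter function at every position to the Row's value at that position and stores the result at the same position in the GenericInternalRow

  4. Returns the GenericInternalRow for every row. The same GenericInternalRow instance is reused (overwritten) for all the rows of a partition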
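The four steps above can be sketched in plain Scala without Spark on the classpath. This is a minimal model under stated assumptions, not Spark's code: DataType, IntType, DateType, MutableRow, RowConversionSketch, toInternalConverter and convertPartition are hypothetical stand-ins (a Seq[Any] plays the role of a Row and MutableRow plays the role of GenericInternalRow). Only the date converter mimics a real Catalyst convention, namely that a date is stored as the number of days since 1970-01-01.

```scala
import java.time.LocalDate

// Hypothetical, Spark-free stand-ins (NOT Spark's API):
//   DataType   ~ org.apache.spark.sql.types.DataType
//   Seq[Any]   ~ an external org.apache.spark.sql.Row
//   MutableRow ~ org.apache.spark.sql.catalyst.expressions.GenericInternalRow
sealed trait DataType
case object IntType extends DataType
case object DateType extends DataType

final class MutableRow(numFields: Int) {
  private val values = new Array[Any](numFields)
  def update(i: Int, v: Any): Unit = values(i) = v
  def get(i: Int): Any = values(i)
  // Copies the current values out, since the instance itself is overwritten per row.
  def snapshot(): Vector[Any] = values.toVector
}

object RowConversionSketch {
  // Per-type converter to the "internal" representation.
  def toInternalConverter(dt: DataType): Any => Any = dt match {
    case IntType  => (v: Any) => v         // an Int is stored as-is
    case DateType => (v: Any) => v match { // a date becomes days since 1970-01-01
      case null         => null            // nulls pass through
      case d: LocalDate => d.toEpochDay.toInt
    }
  }

  // The "map" function handed to mapPartitions; it is called once per partition.
  def convertPartition(outputTypes: Seq[DataType])(rows: Iterator[Seq[Any]]): Iterator[MutableRow] = {
    val numColumns = outputTypes.length
    val mutableRow = new MutableRow(numColumns)                          // step 1
    val converters = outputTypes.map(toInternalConverter).toIndexedSeq  // step 2
    rows.map { r =>                                                      // step 3
      var i = 0
      while (i < numColumns) {
        mutableRow(i) = converters(i)(r(i))
        i += 1
      }
      mutableRow                                                         // step 4: same instance every time
    }
  }
}
```

Because the row is reused, materializing the iterator (e.g. with toList) yields several references to one row holding the values of the last input row. Code that needs the rows to outlive the iteration has to copy each one first (snapshot() in this sketch, InternalRow.copy() in Spark).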

Note
rowToRowRdd is used exclusively when the DataSourceStrategy execution planning strategy is executed (and requested to convert an RDD[Row] to an RDD[InternalRow] in toCatalystRDD).
Reproduction without permission is prohibited: spark技术分享