DataSourceRDD — Input RDD Of DataSourceV2ScanExec Physical Operator

DataSourceRDD is an RDD that is created exclusively when DataSourceV2ScanExec physical operator is requested for the input RDD (when WholeStageCodegenExec physical operator is executed).

DataSourceRDD uses DataSourceRDDPartition partitions.

Requesting Preferred Locations Of DataReaderFactory (For Partition) — getPreferredLocations Method

Note
getPreferredLocations is part of Spark Core’s RDD Contract to…​FIXME.

getPreferredLocations simply requests the preferred locations of the DataReaderFactory of the input DataSourceRDDPartition partition.
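The delegation described above can be sketched without a Spark dependency. Everything in this block is a simplified stand-in: LocationAwareFactory, SketchPartition and PreferredLocationsSketch are hypothetical names that model only the shape of DataReaderFactory and DataSourceRDDPartition, not Spark's actual classes.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for DataReaderFactory, reduced to the single
// method this sketch needs. Not the real Spark interface.
interface LocationAwareFactory {
    String[] preferredLocations();
}

// Hypothetical stand-in for DataSourceRDDPartition: an index plus the
// factory that knows how to read this partition.
final class SketchPartition {
    final int index;
    final LocationAwareFactory readerFactory;

    SketchPartition(int index, LocationAwareFactory readerFactory) {
        this.index = index;
        this.readerFactory = readerFactory;
    }
}

final class PreferredLocationsSketch {
    // The RDD adds no placement logic of its own: it forwards the question
    // to the factory carried by the partition.
    static List<String> getPreferredLocations(SketchPartition split) {
        return Arrays.asList(split.readerFactory.preferredLocations());
    }
}
```

In this sketch a factory that reports no hosts yields an empty list, which leaves the partition without a placement preference.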

getPartitions Method

Note
getPartitions is part of Spark Core’s RDD Contract to…​FIXME

getPartitions simply creates a DataSourceRDDPartition for every DataReaderFactory in the readerFactories.
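A minimal sketch of the one-partition-per-factory mapping, again using hypothetical stand-ins (SketchReaderFactory, IndexedPartition, GetPartitionsSketch) rather than Spark's own types. Using the factory's position in the collection as the partition index is an assumption of this sketch; the text above only says that every factory gets its own partition.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DataReaderFactory; the mapping below never
// calls it, so it needs no methods here.
interface SketchReaderFactory { }

// Hypothetical stand-in for DataSourceRDDPartition.
final class IndexedPartition {
    final int index;
    final SketchReaderFactory readerFactory;

    IndexedPartition(int index, SketchReaderFactory readerFactory) {
        this.index = index;
        this.readerFactory = readerFactory;
    }
}

final class GetPartitionsSketch {
    // One partition per factory. Assumption: the partition index is the
    // factory's position in the input collection.
    static List<IndexedPartition> getPartitions(List<SketchReaderFactory> readerFactories) {
        List<IndexedPartition> partitions = new ArrayList<>();
        for (int i = 0; i < readerFactories.size(); i++) {
            partitions.add(new IndexedPartition(i, readerFactories.get(i)));
        }
        return partitions;
    }
}
```

The number of partitions, and so the number of tasks that read from the source, equals the number of factories the data source hands over.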

Creating DataSourceRDD Instance

DataSourceRDD takes the following when created: a SparkContext and the collection of DataReaderFactory objects (readerFactories).

Computing Partition (in TaskContext) — compute Method

Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute requests the DataReaderFactory (of the DataSourceRDDPartition partition) to createDataReader.

compute registers a Spark Core TaskCompletionListener that requests the DataReader to close at a task completion.

compute returns a Spark Core InterruptibleIterator that…​FIXME
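The first two steps of compute (create a reader, close it when the task completes) can be sketched as follows. All types here are hypothetical stand-ins: RecordReader and RecordReaderFactory model DataReader and DataReaderFactory, and SketchTaskContext models only the completion-listener part of TaskContext. How the reader is adapted to an iterator (advance with next, fetch with get) is an assumption of this sketch, and the interruption check that InterruptibleIterator adds is left out.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical stand-in for DataReader: advance, read the current record, release resources.
interface RecordReader<T> {
    boolean next();
    T get();
    void close();
}

// Hypothetical stand-in for DataReaderFactory.
interface RecordReaderFactory<T> {
    RecordReader<T> createDataReader();
}

// Hypothetical stand-in for the completion-listener part of TaskContext.
final class SketchTaskContext {
    private final List<Runnable> completionListeners = new ArrayList<>();

    void addTaskCompletionListener(Runnable listener) {
        completionListeners.add(listener);
    }

    // Called once when the task finishes, whether or not the iterator was drained.
    void markTaskCompleted() {
        for (Runnable listener : completionListeners) {
            listener.run();
        }
    }
}

final class ComputeSketch {
    static <T> Iterator<T> compute(RecordReaderFactory<T> factory, SketchTaskContext context) {
        final RecordReader<T> reader = factory.createDataReader();
        // Tie the reader's lifetime to the task rather than to the iterator.
        context.addTaskCompletionListener(reader::close);

        return new Iterator<T>() {
            // True when next() has already advanced the reader to an unread record.
            private boolean valuePrepared = false;

            @Override
            public boolean hasNext() {
                if (!valuePrepared) {
                    valuePrepared = reader.next();
                }
                return valuePrepared;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException("End of stream");
                }
                valuePrepared = false;
                return reader.get();
            }
        };
    }
}
```

Registering the close callback on the task context, instead of closing when the iterator runs dry, means the reader is released even if a downstream operator stops consuming early.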
