DataSourceRDD — Input RDD Of DataSourceV2ScanExec Physical Operator
DataSourceRDD is an RDD that is created exclusively when DataSourceV2ScanExec physical operator is requested for the input RDD (when WholeStageCodegenExec physical operator is executed).
DataSourceRDD uses DataSourceRDDPartition partitions.
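A DataSourceRDDPartition is little more than a partition index paired with the DataReaderFactory that produces the data for that partition. A minimal sketch (assuming the Spark 2.3-era DataSourceV2 API; the actual class in the Spark source may differ slightly in modifiers and type bounds):

```scala
import scala.reflect.ClassTag

import org.apache.spark.Partition
import org.apache.spark.sql.sources.v2.reader.DataReaderFactory

// Sketch: a partition that simply carries the DataReaderFactory for its index
class DataSourceRDDPartition[T: ClassTag](
    val index: Int,
    val readerFactory: DataReaderFactory[T])
  extends Partition with Serializable
```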
Requesting Preferred Locations Of DataReaderFactory (For Partition) — getPreferredLocations Method
```scala
getPreferredLocations(split: Partition): Seq[String]
```

NOTE: getPreferredLocations is part of Spark Core's RDD Contract to specify placement preferences (preferred locations) of a partition.
getPreferredLocations simply requests the preferred locations of the DataReaderFactory of the input DataSourceRDDPartition partition.
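In other words, getPreferredLocations delegates straight to DataReaderFactory.preferredLocations. Roughly (a sketch, not a verbatim copy of the Spark source):

```scala
override def getPreferredLocations(split: Partition): Seq[String] = {
  // Delegate to the DataReaderFactory that backs this partition
  split.asInstanceOf[DataSourceRDDPartition[T]].readerFactory.preferredLocations()
}
```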
getPartitions Method
```scala
getPartitions: Array[Partition]
```

NOTE: getPartitions is part of Spark Core's RDD Contract to return the partitions of this RDD.
getPartitions simply creates a DataSourceRDDPartition for every DataReaderFactory in the readerFactories.
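That amounts to zipping the readerFactories with their indices, roughly as follows (a sketch, not a verbatim copy of the Spark source):

```scala
override protected def getPartitions: Array[Partition] = {
  readerFactories.zipWithIndex.map {
    // One DataSourceRDDPartition per DataReaderFactory, indexed by position
    case (readerFactory, index) => new DataSourceRDDPartition(index, readerFactory)
  }.toArray
}
```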
Creating DataSourceRDD Instance

DataSourceRDD takes the following when created (see the class-declaration sketch below):

- Collection of DataReaderFactory objects
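Being an RDD, DataSourceRDD also needs a SparkContext for the RDD superclass constructor. A sketch of the class declaration (assuming the Spark 2.3-era DataSourceV2 API; abstract RDD members such as compute and getPartitions are elided here and covered in the other sections):

```scala
import scala.reflect.ClassTag

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.sources.v2.reader.DataReaderFactory

// Sketch: an RDD backed by a collection of DataReaderFactory objects
class DataSourceRDD[T: ClassTag](
    sc: SparkContext,
    @transient private val readerFactories: Seq[DataReaderFactory[T]])
  extends RDD[T](sc, Nil) // no parent RDDs
```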
Computing Partition (in TaskContext) — compute Method
```scala
compute(split: Partition, context: TaskContext): Iterator[T]
```

NOTE: compute is part of Spark Core's RDD Contract to compute a partition (in a TaskContext).
compute requests the DataReaderFactory (of the DataSourceRDDPartition partition) to createDataReader.

compute registers a Spark Core TaskCompletionListener that requests the DataReader to close at task completion.
compute returns a Spark Core InterruptibleIterator that wraps an iterator over the records of the DataReader (delegating hasNext and next to the DataReader's next and get methods, respectively).
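Put together, compute looks roughly like the following (a sketch assuming the Spark 2.3-era DataSourceV2 API; the actual Spark source may differ in details):

```scala
override def compute(split: Partition, context: TaskContext): Iterator[T] = {
  // Create a DataReader from the partition's DataReaderFactory
  val reader = split.asInstanceOf[DataSourceRDDPartition[T]].readerFactory.createDataReader()

  // Make sure the DataReader is closed once the task finishes
  context.addTaskCompletionListener(_ => reader.close())

  // Adapt the DataReader's next/get protocol to a Scala Iterator
  val iter = new Iterator[T] {
    private[this] var valuePrepared = false

    override def hasNext: Boolean = {
      if (!valuePrepared) {
        valuePrepared = reader.next()
      }
      valuePrepared
    }

    override def next(): T = {
      if (!hasNext) {
        throw new java.util.NoSuchElementException("End of stream")
      }
      valuePrepared = false
      reader.get()
    }
  }

  // Wrap in an InterruptibleIterator so the task can be killed mid-scan
  new InterruptibleIterator(context, iter)
}
```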