ExternalRDD
ExternalRDD is a leaf logical operator that is a logical representation of (the data from) an RDD in a logical query plan.
ExternalRDD is created when:
- SparkSession is requested to create a DataFrame from RDD of product types (e.g. Scala case classes, tuples) or Dataset from RDD of a given type
- ExternalRDD is requested to create a new instance
```scala
val pairsRDD = sc.parallelize((0, "zero") :: (1, "one") :: (2, "two") :: Nil)

// A tuple of Int and String is a product type
scala> :type pairsRDD
org.apache.spark.rdd.RDD[(Int, String)]

val pairsDF = spark.createDataFrame(pairsRDD)

// ExternalRDD represents the pairsRDD in the query plan
val logicalPlan = pairsDF.queryExecution.logical
scala> println(logicalPlan.numberedTreeString)
00 SerializeFromObject [assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._1 AS _1#10, staticinvoke(class org.apache.spark.unsafe.types.UTF8String, StringType, fromString, assertnotnull(assertnotnull(input[0, scala.Tuple2, true]))._2, true, false) AS _2#11]
01 +- ExternalRDD [obj#9]
```
ExternalRDD is a MultiInstanceRelation and an ObjectProducer.
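Both traits can be checked on a live operator. The sketch below pulls the ExternalRDD node out of a DataFrame's logical plan and upcasts it to each type, which compiles only if the traits are mixed in. It assumes Spark 2.3 or later on the classpath and a local SparkSession; the app name and value names are made up for the illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.analysis.MultiInstanceRelation
import org.apache.spark.sql.catalyst.plans.logical.{LeafNode, ObjectProducer}
import org.apache.spark.sql.execution.ExternalRDD

// Assumption: a local session is enough for inspecting query plans
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-traits").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))
val pairsDF = spark.createDataFrame(pairsRDD)

// Find the ExternalRDD leaf in the logical plan
val externalRDD = pairsDF.queryExecution.logical.collectFirst { case e: ExternalRDD[_] => e }.get

// These ascriptions type-check only because ExternalRDD extends all three types
val leaf: LeafNode = externalRDD
val relation: MultiInstanceRelation = externalRDD
val producer: ObjectProducer = externalRDD

// A leaf has no children; an ObjectProducer outputs the single attribute that holds the object
println(leaf.children.isEmpty)
println(producer.output == Seq(producer.outputObjAttr))
```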
Note: ExternalRDD is resolved to ExternalRDDScanExec when the BasicOperators execution planning strategy is executed.
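The planning step can be observed from the physical plan of a query. The following is a minimal sketch, assuming Spark 2.3 or later and a local SparkSession (the app name and value names are illustrative only). It collects ExternalRDDScanExec nodes from `queryExecution.sparkPlan`, the physical plan produced by the execution planning strategies.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ExternalRDDScanExec

// Assumption: a local session is enough for inspecting query plans
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-scan").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))
val pairsDF = spark.createDataFrame(pairsRDD)

// sparkPlan is the physical plan before the preparation rules are applied
val sparkPlan = pairsDF.queryExecution.sparkPlan
val scans = sparkPlan.collect { case s: ExternalRDDScanExec[_] => s }

println(sparkPlan.numberedTreeString)
println(s"ExternalRDDScanExec operators found: ${scans.size}")
```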
newInstance Method
```scala
newInstance(): ExternalRDD.this.type
```
Note: newInstance is part of the MultiInstanceRelation Contract to…FIXME.
newInstance …FIXME
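While the description above is still to be written, the observable effect of newInstance can be explored from the outside. The sketch below rests on an assumption drawn from reading the Spark 2.3+ sources, not from this page: that the new instance wraps the same RDD but carries a fresh expression ID for its output object attribute. The session setup and value names are illustrative only.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ExternalRDD

// Assumption: a local session is enough for inspecting query plans
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-new-instance").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))
val pairsDF = spark.createDataFrame(pairsRDD)

val externalRDD = pairsDF.queryExecution.logical.collectFirst { case e: ExternalRDD[_] => e }.get
val copy = externalRDD.newInstance()

// Assumed behavior: same underlying RDD, different expression ID
println(s"same RDD:     ${copy.rdd eq externalRDD.rdd}")
println(s"original id:  ${externalRDD.outputObjAttr.exprId}")
println(s"new instance: ${copy.outputObjAttr.exprId}")
```

Distinct expression IDs are what let the same relation appear more than once in a plan (for example in a self-join) without attribute clashes.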
Computing Statistics: computeStats Method
```scala
computeStats(): Statistics
```
Note: computeStats is part of the LeafNode Contract to compute statistics for the cost-based optimizer.
computeStats …FIXME
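Until the description is filled in, the statistics can be inspected directly. The sketch below assumes Spark 2.3 or later (where computeStats takes no arguments and `spark.sessionState` is accessible) and, as a further assumption based on reading the Spark sources, that ExternalRDD reports the default size from `spark.sql.defaultSizeInBytes` because no real size estimate is available for an arbitrary RDD. Names other than the Spark API calls are illustrative.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.ExternalRDD

// Assumption: a local session is enough for inspecting query plans
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-stats").getOrCreate()

val pairsRDD = spark.sparkContext.parallelize(Seq((0, "zero"), (1, "one"), (2, "two")))
val pairsDF = spark.createDataFrame(pairsRDD)

val externalRDD = pairsDF.queryExecution.logical.collectFirst { case e: ExternalRDD[_] => e }.get

val stats = externalRDD.computeStats()
val defaultSize = BigInt(spark.sessionState.conf.defaultSizeInBytes)

// Compare the reported size with the configured default
println(s"sizeInBytes reported by ExternalRDD: ${stats.sizeInBytes}")
println(s"spark.sql.defaultSizeInBytes:        $defaultSize")
```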
Creating ExternalRDD: apply Factory Method
```scala
apply[T: Encoder](rdd: RDD[T], session: SparkSession): LogicalPlan
```
apply …FIXME
Note: apply is used when SparkSession is requested to create a DataFrame from an RDD of product types (e.g. Scala case classes, tuples) or a Dataset from an RDD of a given type.
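To relate the factory method to the public API, the sketch below calls `ExternalRDD.apply` directly and compares the shape of the resulting plan with the logical plan behind `spark.createDataset`. It assumes Spark 2.3 or later and a local SparkSession; the `shape` helper and all value names are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.{Encoders, SparkSession}
import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
import org.apache.spark.sql.execution.ExternalRDD

// Assumption: a local session is enough for inspecting query plans
val spark = SparkSession.builder().master("local[*]").appName("external-rdd-apply").getOrCreate()

val rdd = spark.sparkContext.parallelize(Seq("zero", "one", "two"))

// The Encoder context bound is passed explicitly here
val viaApply: LogicalPlan = ExternalRDD(rdd, spark)(Encoders.STRING)
val viaApi: LogicalPlan = spark.createDataset(rdd)(Encoders.STRING).queryExecution.logical

// Hypothetical helper: operator names in pre-order, top to bottom
def shape(plan: LogicalPlan): Seq[String] = plan.collect { case p => p.nodeName }

println(shape(viaApply).mkString(" <- "))
println(shape(viaApi).mkString(" <- "))
```

Note that apply returns a LogicalPlan, not an ExternalRDD: the ExternalRDD leaf sits under a serializer operator, which matches the tree printed in the example near the top of this page.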