SerializeFromObjectExec

SerializeFromObjectExec Unary Physical Operator

SerializeFromObjectExec is a unary physical operator (i.e. with one child physical operator).

SerializeFromObjectExec supports Java code generation through the doProduce, doConsume and inputRDDs methods.
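
As a rough illustration (not part of the original text), the Java source generated for a query that contains this operator can be dumped with the codegenString helper from the org.apache.spark.sql.execution.debug package. The sketch below assumes a local SparkSession and a toy Dataset[Int]; the object name CodegenPeek and the method generatedCode are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.debug.codegenString

object CodegenPeek {
  // Returns the whole-stage generated Java source for a small typed query.
  def generatedCode(spark: SparkSession): String = {
    import spark.implicits._
    // A typed map works on JVM objects, so its results have to be serialized
    // back to internal rows; that is where SerializeFromObjectExec appears.
    val ds = Seq(1, 2, 3).toDS().map(_ + 1)
    codegenString(ds.queryExecution.executedPlan)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("codegen-peek")
      .getOrCreate()
    try println(generatedCode(spark)) finally spark.stop()
  }
}
```

The printed text is long; searching it for the serializer expressions is usually the quickest way to find the part contributed by doConsume.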

SerializeFromObjectExec is an ObjectConsumerExec.

SerializeFromObjectExec is created exclusively when the BasicOperators execution planning strategy is requested to plan a SerializeFromObject logical operator.

SerializeFromObjectExec uses the child physical operator when requested for the input RDDs and the outputPartitioning.

SerializeFromObjectExec derives its output schema attributes from the serializer expressions.
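
To make these properties concrete, here is a minimal sketch (an illustration, not taken from the original text) that plans a small typed query and picks the SerializeFromObjectExec nodes out of the physical plan. The local SparkSession, the toy Dataset[Int], and the names FindSerializeFromObject and nodes are all assumptions made for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.SerializeFromObjectExec

object FindSerializeFromObject {
  // Collects SerializeFromObjectExec nodes from the planned physical plan
  // (sparkPlan is the plan before the preparation rules are applied).
  def nodes(spark: SparkSession): Seq[SerializeFromObjectExec] = {
    import spark.implicits._
    val ds = Seq(1, 2, 3).toDS().map(_ + 1)
    ds.queryExecution.sparkPlan.collect { case s: SerializeFromObjectExec => s }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("find-serialize-from-object")
      .getOrCreate()
    try {
      nodes(spark).foreach { node =>
        println(node.serializer) // serializer expressions
        println(node.output)     // output attributes built from the serializer
        // the partitioning is simply taken from the child operator
        println(node.outputPartitioning == node.child.outputPartitioning)
      }
    } finally spark.stop()
  }
}
```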

Creating SerializeFromObjectExec Instance

SerializeFromObjectExec takes the serializer (as Seq[NamedExpression]) and the child physical operator when created.

Generating Java Source Code for Consume Path in Whole-Stage Code Generation — doConsume Method

Note
doConsume is part of the CodegenSupport contract to generate the Java source code for the consume path in Whole-Stage Code Generation.

doConsume…​FIXME

Generating Java Source Code for Produce Path in Whole-Stage Code Generation — doProduce Method

Note
doProduce is part of the CodegenSupport contract to generate the Java source code for the produce path in Whole-Stage Code Generation.

doProduce…​FIXME

Executing Physical Operator (Generating RDD[InternalRow]) — doExecute Method

Note
doExecute is part of the SparkPlan contract to generate the runtime representation of a structured query as a distributed computation over internal binary rows on Apache Spark (i.e. RDD[InternalRow]).

doExecute requests the child physical operator to execute (which generates an RDD[InternalRow]) and transforms the result using RDD.mapPartitionsWithIndexInternal (which creates another RDD). For every partition, together with its index, the following function is executed on the internal rows:

  1. Creates an UnsafeProjection for the serializer

  2. Requests the UnsafeProjection to initialize (for the partition index)

  3. Executes the UnsafeProjection on all internal binary rows in the partition
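
The three steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the operator's source: RDD.mapPartitionsWithIndexInternal is private to Spark, so the public RDD.mapPartitionsWithIndex stands in for it, and a single BoundReference expression plays the role of the serializer. The names ProjectionPerPartition and run are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.{BoundReference, Expression, UnsafeProjection}
import org.apache.spark.sql.types.IntegerType

object ProjectionPerPartition {
  def run(spark: SparkSession): Array[Int] = {
    // Hypothetical stand-in for the serializer: read the int in column 0.
    val exprs: Seq[Expression] =
      Seq(BoundReference(0, IntegerType, nullable = false))

    // Stand-in for the child operator's RDD[InternalRow].
    val rows = spark.sparkContext
      .parallelize(1 to 4, numSlices = 2)
      .map(i => InternalRow(i))

    val projected = rows.mapPartitionsWithIndex { (index, iter) =>
      val projection = UnsafeProjection.create(exprs) // step 1
      projection.initialize(index)                    // step 2
      iter.map(projection)                            // step 3
    }

    // The projection reuses its result row, so read the value out of each
    // row right away instead of collecting the rows themselves.
    projected.map(_.getInt(0)).collect()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("projection-per-partition")
      .getOrCreate()
    try println(run(spark).mkString(", ")) finally spark.stop()
  }
}
```

Creating the projection inside the partition function means each task builds and initializes its own instance, which matters for expressions whose state depends on the partition index.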

Note
doExecute (by RDD.mapPartitionsWithIndexInternal) adds a new MapPartitionsRDD to the RDD lineage. Use RDD.toDebugString to see the additional MapPartitionsRDD.
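
One possible way to look at the lineage is sketched below, assuming a local SparkSession and a toy Dataset[Int] (the names LineagePeek and lineage are hypothetical). Whole-stage code generation is switched off in the sketch because otherwise the operators run fused inside a single WholeStageCodegenExec and the operator's own doExecute is not what produces the RDD.

```scala
import org.apache.spark.sql.SparkSession

object LineagePeek {
  // Returns the RDD lineage of a small typed query as text.
  def lineage(spark: SparkSession): String = {
    import spark.implicits._
    // Note: this changes the session configuration as a side effect.
    spark.conf.set("spark.sql.codegen.wholeStage", "false")
    val ds = Seq(1, 2, 3).toDS().map(_ + 1)
    ds.queryExecution.toRdd.toDebugString
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("lineage-peek")
      .getOrCreate()
    try println(lineage(spark)) finally spark.stop()
  }
}
```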