
MultiInstanceRelation

MultiInstanceRelation is a contract of logical operators of which a single instance might appear multiple times in a logical query plan.

When the ResolveReferences logical resolution rule is executed, every MultiInstanceRelation in a logical query plan is requested to produce a new version of itself with globally unique expression IDs.
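The contract itself is tiny. A minimal, self-contained sketch (where LogicalPlan, ToyRelation, and the expression-id generator are stand-ins for illustration, not Spark API) could look like this:

```scala
// Simplified model of the MultiInstanceRelation contract.
// LogicalPlan, ToyRelation and freshExprId are stand-ins, not Spark API.
object MultiInstanceRelationSketch {
  private var nextExprId = 0L
  def freshExprId(): Long = { nextExprId += 1; nextExprId } // stand-in for new expression ids

  trait LogicalPlan // stand-in for Catalyst's LogicalPlan

  // The contract: produce a copy of this relation with fresh, globally unique expression ids
  trait MultiInstanceRelation {
    def newInstance(): LogicalPlan
  }

  // A toy leaf relation whose single output attribute is identified by exprId
  case class ToyRelation(name: String, exprId: Long) extends LogicalPlan with MultiInstanceRelation {
    override def newInstance(): ToyRelation = copy(exprId = freshExprId())
  }
}
```

Without newInstance(), a self-join of the same relation instance would carry conflicting expression ids; the contract lets the analyzer give each occurrence its own ids.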

Table 1. MultiInstanceRelations
MultiInstanceRelation Description

ContinuousExecutionRelation

Used in Spark Structured Streaming

DataSourceV2Relation

ExternalRDD

HiveTableRelation

InMemoryRelation

LocalRelation

LogicalRDD

LogicalRelation

Range

View

StreamingExecutionRelation

Used in Spark Structured Streaming

StreamingRelation

Used in Spark Structured Streaming

StreamingRelationV2

Used in Spark Structured Streaming

CreateStruct Function Builder

CreateStruct is a function builder (i.e. Seq[Expression] ⇒ Expression) that creates CreateNamedStruct expressions and provides the metadata of the struct function.

Metadata of struct Function — registryEntry Property

registryEntry…​FIXME

Note
registryEntry is used exclusively when FunctionRegistry is requested for the function expression registry.

Creating CreateNamedStruct Expression — apply Method

Note
apply is part of Scala’s scala.Function1 contract to create a function of one parameter (e.g. Seq[Expression]).

apply creates a CreateNamedStruct expression with the input children expressions as follows:

  • For NamedExpression expressions that are resolved, apply creates a pair of a Literal expression (with the name of the NamedExpression) and the NamedExpression itself

  • For NamedExpression expressions that are not resolved yet, apply creates a pair of a NamePlaceholder expression and the NamedExpression itself

  • For all other expressions, apply creates a pair of a Literal expression (with a generated name of the form col<position>, e.g. col1, col2) and the Expression itself
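The three cases above can be sketched as a self-contained model (the expression classes below are simplified stand-ins for Catalyst's, and the 1-based col<n> naming is an assumption here):

```scala
// Simplified model of CreateStruct.apply: pair each child expression with its "name".
object CreateStructSketch {
  sealed trait Expression
  case class Literal(value: Any) extends Expression
  case object NamePlaceholder extends Expression
  case class NamedExpr(name: String, resolved: Boolean) extends Expression // stand-in for NamedExpression
  case class OtherExpr(id: Int) extends Expression
  case class CreateNamedStruct(children: Seq[Expression])

  def apply(children: Seq[Expression]): CreateNamedStruct =
    CreateNamedStruct(children.zipWithIndex.flatMap {
      case (e: NamedExpr, _) if e.resolved => Seq(Literal(e.name), e)  // resolved: use its name
      case (e: NamedExpr, _)               => Seq(NamePlaceholder, e)  // unresolved: placeholder
      case (e, index)                      => Seq(Literal(s"col${index + 1}"), e) // generated name
    })
}
```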

Note

apply is used when:

ScalaReflection

ScalaReflection is the contract and the only implementation of the contract with…​FIXME

serializerFor Object Method

serializerFor first finds the local type of the input type T and then its class name.

serializerFor uses the internal version of itself with the input inputObject expression, the tpe type and the walkedTypePath with the class name found earlier (of the input type T).

In the end, serializerFor returns one of the following:

Note
serializerFor is used when…​FIXME
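The first two steps (finding the local type of T and then its class name) can be sketched with Scala 2 runtime reflection, which is close to what ScalaReflection builds on; the method bodies below are simplified approximations, not Spark's exact code:

```scala
import scala.reflect.runtime.universe._

// Simplified approximations of ScalaReflection.localTypeOf and getClassNameFromType.
object ReflectionSketch {
  // The local (dealiased) type of T, recovered from its implicit TypeTag
  def localTypeOf[T: TypeTag]: Type = typeTag[T].tpe.dealias

  // The fully-qualified class name behind a (possibly aliased) type
  def getClassNameFromType(tpe: Type): String = tpe.dealias.erasure.typeSymbol.asClass.fullName
}
```

For example, getClassNameFromType(localTypeOf[String]) yields java.lang.String once the String type alias is dealiased.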

serializerFor Internal Method

serializerFor…​FIXME

Note
serializerFor is used exclusively when ScalaReflection is requested to serializerFor.

localTypeOf Object Method

localTypeOf…​FIXME

Note
localTypeOf is used when…​FIXME

getClassNameFromType Object Method

getClassNameFromType…​FIXME

Note
getClassNameFromType is used when…​FIXME

definedByConstructorParams Object Method

definedByConstructorParams…​FIXME

Note
definedByConstructorParams is used when…​FIXME

AggUtils Helper Object

AggUtils is a Scala object that defines the methods used exclusively when Aggregation execution planning strategy is executed.

planAggregateWithOneDistinct Method

planAggregateWithOneDistinct…​FIXME

Note
planAggregateWithOneDistinct is used exclusively when Aggregation execution planning strategy is executed.

Creating Physical Plan with Two Aggregate Physical Operators for Partial and Final Aggregations — planAggregateWithoutDistinct Method

planAggregateWithoutDistinct is a two-step physical operator generator.

planAggregateWithoutDistinct first creates an aggregate physical operator with aggregateExpressions in Partial mode (for partial aggregations).

Note
requiredChildDistributionExpressions for the aggregate physical operator for partial aggregation “stage” is empty.

In the end, planAggregateWithoutDistinct creates another aggregate physical operator (of the same type as before), but aggregateExpressions are now in Final mode (for final aggregations). The aggregate physical operator becomes the parent of the first aggregate operator.

Note
requiredChildDistributionExpressions for the parent aggregate physical operator for final aggregation “stage” are the attributes of groupingExpressions.
Note
planAggregateWithoutDistinct is used exclusively when Aggregation execution planning strategy is executed (with no AggregateExpressions being distinct).
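The partial/final arrangement can be illustrated with a plain-Scala model of computing an average, where each "partition" produces a partial (sum, count) buffer and the parent step merges them; this models the two aggregation modes only, not Spark's physical operators:

```scala
// Plain-Scala illustration of Partial vs Final aggregation modes (not Spark API).
object TwoStepAggSketch {
  // Partial mode: each partition reduces its rows to a partial aggregation buffer
  def partial(partition: Seq[Int]): (Long, Long) =
    (partition.map(_.toLong).sum, partition.size.toLong)

  // Final mode: the parent operator merges the partial buffers into the final value
  def finalMerge(buffers: Seq[(Long, Long)]): Double = {
    val (sum, count) = buffers.reduce((a, b) => (a._1 + b._1, a._2 + b._2))
    sum.toDouble / count
  }
}
```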

Creating Aggregate Physical Operator — createAggregate Internal Method

createAggregate creates a physical operator given the input aggregateExpressions aggregate expressions.

Table 1. createAggregate’s Aggregate Physical Operator Selection Criteria (in execution order)
Aggregate Physical Operator Selection Criteria

HashAggregateExec

HashAggregateExec supports all aggBufferAttributes of the input aggregateExpressions aggregate expressions.

ObjectHashAggregateExec

  1. spark.sql.execution.useObjectHashAggregateExec internal flag enabled (it is by default)

  2. ObjectHashAggregateExec supports the input aggregateExpressions aggregate expressions.

SortAggregateExec

When none of the above requirements is met.

Note
createAggregate is used when AggUtils is requested to planAggregateWithoutDistinct, planAggregateWithOneDistinct (and planStreamingAggregation for Spark Structured Streaming)
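The selection order in Table 1 can be condensed into a small sketch (stand-in types; the flag and predicate names mirror the criteria above rather than Spark's exact signatures):

```scala
// Sketch of createAggregate's operator selection (stand-in types, not Spark API).
object CreateAggregateSketch {
  sealed trait AggregateExec
  case object HashAggregateExec extends AggregateExec
  case object ObjectHashAggregateExec extends AggregateExec
  case object SortAggregateExec extends AggregateExec

  def select(
      hashSupported: Boolean,     // HashAggregateExec supports all aggBufferAttributes
      useObjectHashFlag: Boolean, // spark.sql.execution.useObjectHashAggregateExec
      objectHashSupported: Boolean): AggregateExec =
    if (hashSupported) HashAggregateExec
    else if (useObjectHashFlag && objectHashSupported) ObjectHashAggregateExec
    else SortAggregateExec
}
```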

SchemaUtils Helper Object

SchemaUtils is a Scala object that provides utility methods for detecting duplicate column names in schemas.

checkColumnNameDuplication Method

  1. Uses the other checkColumnNameDuplication with caseSensitiveAnalysis flag per isCaseSensitiveAnalysis

checkColumnNameDuplication…​FIXME

Note
checkColumnNameDuplication is used when…​FIXME
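A hedged sketch of the idea behind the check, with the case-sensitivity flag deciding how names are compared (the method body and message format below are illustrative, not Spark's exact code):

```scala
// Sketch of duplicate-column detection with a case-sensitivity flag
// (stand-in for SchemaUtils.checkColumnNameDuplication; not Spark's exact code).
object SchemaUtilsSketch {
  def checkColumnNameDuplication(columnNames: Seq[String], caseSensitiveAnalysis: Boolean): Unit = {
    // Case-insensitive analysis compares lower-cased names
    val names = if (caseSensitiveAnalysis) columnNames else columnNames.map(_.toLowerCase)
    val duplicates = names.groupBy(identity).collect { case (name, group) if group.size > 1 => name }
    require(duplicates.isEmpty, s"Found duplicate column(s): ${duplicates.mkString(", ")}")
  }
}
```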

checkSchemaColumnNameDuplication Method

checkSchemaColumnNameDuplication…​FIXME

Note
checkSchemaColumnNameDuplication is used when…​FIXME

isCaseSensitiveAnalysis Internal Method

isCaseSensitiveAnalysis…​FIXME

Note
isCaseSensitiveAnalysis is used when…​FIXME

PredicateHelper Scala Trait

PredicateHelper defines methods for working with predicate expressions.

Table 1. PredicateHelper’s Methods
Method Description

splitConjunctivePredicates

splitDisjunctivePredicates

replaceAlias

canEvaluate

canEvaluateWithinJoin

Splitting Conjunctive Predicates — splitConjunctivePredicates Method

splitConjunctivePredicates takes the input condition expression and splits it into two expressions if it is an And binary expression.

splitConjunctivePredicates recursively splits the child expressions until no conjunctive And binary expressions remain.
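A minimal model of the recursion, using a toy expression tree instead of Catalyst's:

```scala
// Minimal model of recursive conjunct splitting (toy expression tree, not Catalyst's).
object SplitConjunctsSketch {
  sealed trait Expr
  case class And(left: Expr, right: Expr) extends Expr
  case class Pred(name: String) extends Expr

  // Recurse into And nodes; any other expression is a single conjunct
  def splitConjunctivePredicates(condition: Expr): Seq[Expr] = condition match {
    case And(l, r) => splitConjunctivePredicates(l) ++ splitConjunctivePredicates(r)
    case other     => Seq(other)
  }
}
```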

splitDisjunctivePredicates Method

splitDisjunctivePredicates…​FIXME

Note
splitDisjunctivePredicates is used when…​FIXME

replaceAlias Method

replaceAlias…​FIXME

Note
replaceAlias is used when…​FIXME

canEvaluate Method

canEvaluate…​FIXME

Note
canEvaluate is used when…​FIXME

canEvaluateWithinJoin Method

canEvaluateWithinJoin indicates whether a Catalyst expression can be evaluated within a join, i.e. when one of the following conditions holds:

  • Expression is deterministic

  • Expression is not an Unevaluable, a ListQuery or an Exists expression

  • Expression is a SubqueryExpression with no child expressions

  • Expression is an AttributeReference

  • Any expression with child expressions that meet one of the above conditions
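The case analysis above can be sketched with toy expression classes standing in for Catalyst's (the order of cases matters, as in a Scala pattern match; this is a simplified model, not Spark's exact code):

```scala
// Sketch of the canEvaluateWithinJoin case analysis (toy expression tree, not Catalyst's).
object CanEvaluateWithinJoinSketch {
  sealed trait Expr { def children: Seq[Expr] = Nil; def deterministic: Boolean = true }
  case class ListQuery() extends Expr
  case class Exists() extends Expr
  case class SubqueryExpr(override val children: Seq[Expr]) extends Expr
  case class AttributeRef(name: String) extends Expr
  case class Unevaluable() extends Expr
  case class Other(override val children: Seq[Expr], override val deterministic: Boolean) extends Expr

  def canEvaluateWithinJoin(e: Expr): Boolean = e match {
    case _: ListQuery | _: Exists => false
    case s: SubqueryExpr          => s.children.isEmpty // non-correlated subquery only
    case _: AttributeRef          => true
    case _: Unevaluable           => false
    case other => other.deterministic && other.children.forall(canEvaluateWithinJoin)
  }
}
```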

Note

canEvaluateWithinJoin is used when:

  • PushPredicateThroughJoin logical optimization rule is executed

  • ReorderJoin logical optimization rule does createOrderedJoin

SubExprUtils Helper Object

SubExprUtils is a Scala object that is used for…​FIXME

SubExprUtils uses PredicateHelper for…​FIXME

Checking If Condition Expression Has Any Null-Aware Predicate Subqueries Inside Not — hasNullAwarePredicateWithinNot Method

hasNullAwarePredicateWithinNot splits conjunctive predicates (i.e. expressions separated by And expressions).

hasNullAwarePredicateWithinNot is positive (i.e. true), and the condition is considered to have a null-aware predicate subquery inside a Not expression, when the conjunctive predicate expressions include a Not expression with an In predicate expression over a ListQuery subquery expression.

hasNullAwarePredicateWithinNot is negative (i.e. false) for all the other expressions and in particular the following expressions:

  1. Exists predicate subquery expressions

  2. Not expressions with a Exists predicate subquery expression as the child expression

  3. In expressions with a ListQuery subquery expression as the list expression

  4. Not expressions with a In expression (with a ListQuery subquery expression as the list expression)
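A self-contained model of this check, where InListQuery stands in for an In predicate whose list expression is a ListQuery; per the exceptions above, a conjunct that is itself one of those top-level forms is negative, while a Not and an In-ListQuery nested deeper inside a conjunct is positive (toy classes, not Spark's code):

```scala
// Toy model of hasNullAwarePredicateWithinNot (stand-in expression tree).
object NullAwareNotSketch {
  sealed trait Expr { def children: Seq[Expr] = Nil }
  case class And(left: Expr, right: Expr) extends Expr { override def children = Seq(left, right) }
  case class Or(left: Expr, right: Expr) extends Expr { override def children = Seq(left, right) }
  case class Not(child: Expr) extends Expr { override def children = Seq(child) }
  case class InListQuery() extends Expr // stand-in for In(_, ListQuery)
  case class Exists() extends Expr
  case class Attr(name: String) extends Expr

  private def conjuncts(e: Expr): Seq[Expr] = e match {
    case And(l, r) => conjuncts(l) ++ conjuncts(r)
    case other     => Seq(other)
  }
  private def contains(e: Expr, p: Expr => Boolean): Boolean =
    p(e) || e.children.exists(contains(_, p))

  def hasNullAwarePredicateWithinNot(condition: Expr): Boolean =
    conjuncts(condition).exists {
      // the top-level forms listed in the text are handled elsewhere, hence negative here
      case Exists() | Not(Exists()) | InListQuery() | Not(InListQuery()) => false
      case e => contains(e, _.isInstanceOf[Not]) && contains(e, _.isInstanceOf[InListQuery])
    }
}
```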

Note
hasNullAwarePredicateWithinNot is used exclusively when CheckAnalysis analysis validation is requested to validate analysis of a logical plan (with Filter logical operators).

StatFunctions Helper Object

StatFunctions is a Scala object that defines the methods that are used for…​FIXME

Table 1. StatFunctions API
Method Description

calculateCov

crossTabulate

multipleApproxQuantiles

pearsonCorrelation

summary

calculateCov Method

calculateCov…​FIXME

Note
calculateCov is used when…​FIXME

crossTabulate Method

crossTabulate…​FIXME

Note
crossTabulate is used when…​FIXME

multipleApproxQuantiles Method

multipleApproxQuantiles…​FIXME

Note
multipleApproxQuantiles is used when…​FIXME

pearsonCorrelation Method

pearsonCorrelation…​FIXME

Note
pearsonCorrelation is used when…​FIXME

Calculating Statistics For Dataset — summary Method

summary…​FIXME

Note
summary is used exclusively when Dataset.summary action is used.

CatalystTypeConverters Helper Object

CatalystTypeConverters is a Scala object that is used to convert Scala types to Catalyst types and vice versa.

createToCatalystConverter Method

createToCatalystConverter…​FIXME

Note
createToCatalystConverter is used when…​FIXME

convertToCatalyst Method

convertToCatalyst…​FIXME

Note
convertToCatalyst is used when…​FIXME

RDDConversions Helper Object

RDDConversions is a Scala object that defines the productToRowRdd and rowToRowRdd methods.

productToRowRdd Method

productToRowRdd…​FIXME

Note
productToRowRdd is used when…​FIXME

Converting Scala Objects In Rows to Values Of Catalyst Types — rowToRowRdd Method

rowToRowRdd maps over partitions of the input RDD[Row] (using the RDD.mapPartitions operator), which creates a MapPartitionsRDD with a “map” function.

Tip
Use RDD.toDebugString to see the additional MapPartitionsRDD in an RDD lineage.

The “map” function takes a Scala Iterator of Row objects and does the following:

  1. Creates a GenericInternalRow (of the size that is the number of columns per the input Seq[DataType])

  2. Creates a converter function for every DataType in Seq[DataType]

  3. For every Row object in the partition (iterator), applies the converter function per position and adds the result value to the GenericInternalRow

  4. In the end, returns a GenericInternalRow for every row
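The four steps can be modeled in plain Scala; the row, converter, and DataType stand-ins below are illustrative, not CatalystTypeConverters or Spark's row classes:

```scala
// Simplified model of the per-partition "map" function (stand-in row and converter types).
object RowConversionSketch {
  type Converter = Any => Any

  // Stand-in for creating a converter function per DataType (step 2)
  def converterFor(dataType: String): Converter = dataType match {
    case "int"    => v => v.asInstanceOf[Int].toLong // hypothetical int-to-long conversion
    case "string" => v => v.toString
  }

  def convertPartition(rows: Iterator[Seq[Any]], dataTypes: Seq[String]): Iterator[Array[Any]] = {
    val numColumns = dataTypes.length            // step 1: size from Seq[DataType]
    val converters = dataTypes.map(converterFor) // step 2: one converter per DataType
    rows.map { row =>
      val internalRow = new Array[Any](numColumns) // stand-in for GenericInternalRow
      var i = 0
      while (i < numColumns) {
        internalRow(i) = converters(i)(row(i))   // step 3: convert per position
        i += 1
      }
      internalRow                                // step 4: one internal row per input row
    }
  }
}
```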

Note
rowToRowRdd is used exclusively when DataSourceStrategy execution planning strategy is executed (and requested to toCatalystRDD).
