DeserializeToObject

DeserializeToObject Unary Logical Operator

DeserializeToObject is a unary logical operator that takes each input row of the child logical plan and turns it into an object (the outputObjAttr attribute) using the given deserializer expression.

DeserializeToObject is an ObjectProducer, i.e. it produces domain objects as output. DeserializeToObject's output is a single-field safe row containing the produced object.

Note
DeserializeToObject is the result of CatalystSerde.deserialize.

DescribeTableCommand

DescribeTableCommand Logical Command

DescribeTableCommand is a logical command that executes a DESCRIBE TABLE SQL statement.

DescribeTableCommand is created exclusively when SparkSqlAstBuilder is requested to parse a DESCRIBE TABLE SQL statement (with no column specified).

DescribeTableCommand uses the following output schema:

  • col_name as the name of the column

  • data_type as the data type of the column

  • comment as the comment of the column

Executing Logical Command — run Method

Note
run is part of the RunnableCommand Contract to execute (run) a logical command.

run uses the SessionCatalog (of the SessionState of the input SparkSession) and branches off per the type of the table to display.

For a temporary view, run requests the SessionCatalog to lookupRelation to access the schema, which it then passes to describeSchema.

For all other table types, run does the following:

Describing Detailed Partition and Storage Information — describeFormattedDetailedPartitionInfo Internal Method

describeFormattedDetailedPartitionInfo simply adds the following entries (rows) to the input mutable buffer:

  1. A new line

  2. # Detailed Partition Information

  3. Database with the database of the given table

  4. Table with the table of the given tableIdentifier

  5. Partition specification (of the CatalogTablePartition)

  6. A new line

  7. # Storage Information

  8. Bucketing specification of the table (if defined)

  9. Storage specification of the table

Note
describeFormattedDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to describeDetailedPartitionInfo with a non-empty partitionSpec and the isExtended flag is on.
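
The row-appending behaviour listed above can be sketched in plain Python (not Spark's own Scala code). The function name, the (col_name, data_type, comment) tuple layout and the dictionary arguments are assumptions made for illustration only; the real method works with Catalyst rows and catalog metadata, and how it renders the partition, bucketing and storage specifications may differ.

```python
from typing import Dict, List, Optional, Tuple

# One output row, mirroring the (col_name, data_type, comment) output schema.
Row = Tuple[str, str, str]


def describe_formatted_detailed_partition_info(
    database: str,
    table: str,
    partition_spec: Dict[str, str],
    storage: Dict[str, str],
    bucketing: Optional[Dict[str, str]],
    buffer: List[Row],
) -> None:
    """Append partition and storage rows to the caller's mutable buffer."""

    def append(col_name: str, data_type: str = "", comment: str = "") -> None:
        buffer.append((col_name, data_type, comment))

    append("")  # 1. a new (empty) line
    append("# Detailed Partition Information")  # 2. section header
    append("Database", database)  # 3.
    append("Table", table)  # 4.
    for key, value in partition_spec.items():  # 5. partition specification
        append(key, value)
    append("")  # 6. a new (empty) line
    append("# Storage Information")  # 7. section header
    if bucketing is not None:  # 8. bucketing specification, only if defined
        for key, value in bucketing.items():
            append(key, value)
    for key, value in storage.items():  # 9. storage specification
        append(key, value)
```

The function returns nothing on purpose: like the method described above, it only adds entries to the buffer it is given.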

Describing Detailed Table Information — describeFormattedTableInfo Internal Method

describeFormattedTableInfo…​FIXME

Note
describeFormattedTableInfo is used exclusively when DescribeTableCommand is requested to run for a non-temporary table and the isExtended flag is on.

describeDetailedPartitionInfo Internal Method

describeDetailedPartitionInfo…​FIXME

Note
describeDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to run with a non-empty partitionSpec.

Creating DescribeTableCommand Instance

DescribeTableCommand takes the following when created:

  • TableIdentifier

  • TablePartitionSpec

  • isExtended flag

DescribeTableCommand initializes the internal registries and counters.

describeSchema Internal Method

describeSchema…​FIXME

Note
describeSchema is used when…​FIXME

Describing Partition Information — describePartitionInfo Internal Method

describePartitionInfo…​FIXME

Note
describePartitionInfo is used when…​FIXME

DescribeColumnCommand

DescribeColumnCommand Logical Command for DESCRIBE TABLE SQL Command with Column

DescribeColumnCommand is a logical command for DESCRIBE TABLE SQL command with a single column only (i.e. no PARTITION specification).

DescribeColumnCommand defines the output schema with the following columns:

  • info_name with “name of the column info” comment

  • info_value with “value of the column info” comment

Note
DescribeColumnCommand is described by the describeTable labeled alternative of the statement expression in SqlBase.g4 and is parsed using SparkSqlParser.

Executing Logical Command (Describing Column with Optional Statistics) — run Method

Note
run is part of the RunnableCommand Contract to execute (run) a logical command.

run resolves the column name in the table and makes sure that it is a “flat” field (i.e. not of a nested data type).

run requests the SessionCatalog for the table metadata.

Note
run uses the input SparkSession to access the SessionState that in turn is used to access the SessionCatalog.

run takes the column statistics from the table statistics if available.

Note
Column statistics are available (in the table statistics) only after the ANALYZE TABLE FOR COLUMNS SQL command has been run.

run adds comment metadata if available for the column.

run gives the following rows (in that order):

  1. col_name

  2. data_type

  3. comment

If DescribeColumnCommand was executed with the EXTENDED or FORMATTED option, run gives the following additional rows (in that order):

  1. min

  2. max

  3. num_nulls

  4. distinct_count

  5. avg_col_len

  6. max_col_len

  7. histogram

run gives NULL for the value of the comment and of any statistic that is not available.
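
As a rough illustration of the row order and the NULL fallback described above, here is a small plain-Python model. It is a sketch under stated assumptions, not Spark's implementation: the function name describe_column, its parameters, and the use of the string "NULL" as the placeholder value are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Row names in the order listed in the text.
BASE_ROWS = ["col_name", "data_type", "comment"]
STAT_ROWS = ["min", "max", "num_nulls", "distinct_count",
             "avg_col_len", "max_col_len", "histogram"]


def describe_column(
    name: str,
    data_type: str,
    comment: Optional[str] = None,
    stats: Optional[Dict[str, str]] = None,
    extended: bool = False,
) -> List[Tuple[str, str]]:
    """Return (info_name, info_value) rows; missing values become "NULL"."""
    known = {"col_name": name, "data_type": data_type}
    if comment is not None:
        known["comment"] = comment
    rows = [(key, known.get(key, "NULL")) for key in BASE_ROWS]
    if extended:  # models the EXTENDED or FORMATTED option
        available = stats or {}
        rows += [(key, available.get(key, "NULL")) for key in STAT_ROWS]
    return rows
```

Calling describe_column("id", "bigint") yields only the three basic rows, while passing extended=True always yields all ten rows, with "NULL" wherever a statistic was never collected.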

histogramDescription Internal Method

histogramDescription…​FIXME

Note
histogramDescription is used exclusively when DescribeColumnCommand is executed with the EXTENDED or FORMATTED option turned on.

Creating DescribeColumnCommand Instance

DescribeColumnCommand takes the following when created:

DataSourceV2Relation

DataSourceV2Relation Leaf Logical Operator

DataSourceV2Relation is…​FIXME

CreateViewCommand

CreateViewCommand Logical Command

CreateViewCommand is created to represent the following:

Caution
FIXME What’s the difference between CreateTempViewUsing?

CreateViewCommand works with different view types.

Table 1. CreateViewCommand Behaviour Per View Type
View Type Description / Side Effect

LocalTempView

A session-scoped local temporary view that is available until the session that created it is stopped.

When executed, CreateViewCommand requests the current SessionCatalog to create a temporary view.

GlobalTempView

A cross-session global temporary view that is available until the Spark application stops.

When executed, CreateViewCommand requests the current SessionCatalog to create a global view.

PersistedView

A cross-session persisted view that is available until dropped.

When executed, CreateViewCommand checks if the table exists. If it does and replace is enabled, CreateViewCommand requests the current SessionCatalog to alter the table. Otherwise, when the table does not exist, CreateViewCommand requests the current SessionCatalog to create it.

CreateViewCommand returns the child logical query plan when requested for the inner nodes (that should be shown as an inner nested tree of this node).

Creating CatalogTable — prepareTable Internal Method

prepareTable…​FIXME

Note
prepareTable is used exclusively when CreateViewCommand logical command is executed.

Executing Logical Command — run Method

Note
run is part of the RunnableCommand Contract to execute (run) a logical command.

run requests the input SparkSession for the SessionState that is in turn requested to execute the child logical plan (which simply creates a QueryExecution).

Note
run uses a common idiom in Spark SQL to make sure that a logical plan can be analyzed, i.e.

run requests the input SparkSession for the SessionState that is in turn requested for the SessionCatalog.

run then branches off per the ViewType:

run throws an AnalysisException for persisted views when they already exist, the allowExisting flag is off and the table type is not a view.

run throws an AnalysisException for persisted views when they already exist and the allowExisting and replace flags are off.

run throws an AnalysisException if the userSpecifiedColumns are defined and their number is different from the number of output schema attributes of the analyzed logical plan.
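
The branching per view type and the three error conditions above can be modelled with a toy in-memory catalog. This is a simplified sketch in plain Python, not Spark's code: ToyCatalog, create_view and AnalysisError are hypothetical names, and two details are assumptions rather than statements from the text, namely the order of the checks for an existing persisted view and the no-op when allowExisting is on.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Sequence


class ViewType(Enum):
    LOCAL_TEMP = "LocalTempView"
    GLOBAL_TEMP = "GlobalTempView"
    PERSISTED = "PersistedView"


class AnalysisError(Exception):
    """Stand-in for Spark's AnalysisException."""


@dataclass
class ToyCatalog:
    """Hypothetical catalog that records which action was requested."""
    tables: Dict[str, str] = field(default_factory=dict)  # name -> "VIEW" | "TABLE"
    log: List[str] = field(default_factory=list)


def create_view(
    catalog: ToyCatalog,
    name: str,
    view_type: ViewType,
    plan_columns: Sequence[str],
    user_columns: Sequence[str] = (),
    allow_existing: bool = False,
    replace: bool = False,
) -> None:
    # User-specified columns must match the analyzed plan's output in number.
    if user_columns and len(user_columns) != len(plan_columns):
        raise AnalysisError("user-specified columns do not match the query output")

    if view_type is ViewType.LOCAL_TEMP:
        catalog.log.append("createTempView")
    elif view_type is ViewType.GLOBAL_TEMP:
        catalog.log.append("createGlobalTempView")
    elif name in catalog.tables:  # persisted view that already exists
        if allow_existing:
            catalog.log.append("noop")  # assumption: nothing to do
        elif catalog.tables[name] != "VIEW":
            raise AnalysisError(f"{name} is not a view")
        elif replace:
            catalog.log.append("alterTable")
        else:
            raise AnalysisError(f"view {name} already exists")
    else:  # persisted view that does not exist yet
        catalog.tables[name] = "VIEW"
        catalog.log.append("createTable")
```

Recording the requested action in a log, instead of mutating real metadata, keeps the model small while still showing which catalog request each combination of flags leads to.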

Creating CreateViewCommand Instance

CreateViewCommand takes the following when created:

  • TableIdentifier

  • User-defined columns (as Seq[(String, Option[String])])

  • Optional comment

  • Properties (as Map[String, String])

  • Optional DDL statement

  • Child logical plan

  • allowExisting flag

  • replace flag

  • ViewType

verifyTemporaryObjectsNotExists Internal Method

verifyTemporaryObjectsNotExists…​FIXME

Note
verifyTemporaryObjectsNotExists is used exclusively when CreateViewCommand logical command is executed.

aliasPlan Internal Method

aliasPlan…​FIXME

Note
aliasPlan is used when CreateViewCommand logical command is executed (and in prepareTable).

CreateTempViewUsing

CreateTempViewUsing Logical Command

CreateTempViewUsing is a logical command for creating or replacing a temporary view (global or not) using a data source.

CreateTempViewUsing is created to represent CREATE TEMPORARY VIEW … USING SQL statements.

Executing Logical Command — run Method

Note
run is part of the RunnableCommand Contract to execute (run) a logical command.

run creates a DataSource and requests it to resolve itself (i.e. create a BaseRelation).

run then requests the input SparkSession to create a DataFrame from the BaseRelation that is used to get the analyzed logical plan (that is the view definition of the temporary table).

Depending on the global flag, run requests the SessionCatalog to createGlobalTempView (global flag is on) or createTempView (global flag is off).

run throws an AnalysisException when executed with the hive provider.
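
The flow of run described above (reject the hive provider, resolve the data source, then register the view per the global flag) might be sketched as follows. This is plain Python with hypothetical names (create_temp_view_using, resolve_relation, a dictionary-based catalog); the case-insensitive provider check and the error raised when the view exists and replace is off are assumptions, not statements from the text.

```python
from typing import Callable, Dict


class AnalysisError(Exception):
    """Stand-in for Spark's AnalysisException."""


def create_temp_view_using(
    name: str,
    provider: str,
    options: Dict[str, str],
    global_flag: bool,
    replace: bool,
    catalog: Dict[str, Dict[str, object]],
    resolve_relation: Callable[[str, Dict[str, str]], object],
) -> str:
    """Register a temporary view and return the catalog request that was used."""
    # Hive tables cannot back a temporary view created with USING.
    if provider.lower() == "hive":  # assumption: case-insensitive comparison
        raise AnalysisError("hive provider is not supported for temporary views")

    # Models "create a DataSource and resolve it"; the result plays the role
    # of the analyzed logical plan that defines the view.
    plan = resolve_relation(provider, options)

    scope = "global_temp_views" if global_flag else "temp_views"
    views = catalog.setdefault(scope, {})
    if name in views and not replace:  # assumption: existing views need replace
        raise AnalysisError(f"temporary view {name} already exists")
    views[name] = plan
    return "createGlobalTempView" if global_flag else "createTempView"
```

Passing resolve_relation as a callable keeps the sketch independent of any real data source; a caller could supply, for example, a lambda that returns a (provider, options) tuple.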

Creating CreateTempViewUsing Instance

CreateTempViewUsing takes the following when created:

  • TableIdentifier

  • Optional user-defined schema (as StructType)

  • replace flag

  • global flag

  • Name of the data source provider

  • Options (as Map[String, String])

argString Method

Note
argString is part of the TreeNode Contract to…​FIXME.

argString…​FIXME

CreateTableCommand

CreateTableCommand Logical Command

CreateTableCommand is a logical command that FIXME.

Executing Logical Command — run Method

Note
run is part of the RunnableCommand Contract to execute (run) a logical command.

run…​FIXME

CreateTable

CreateTable Logical Operator

CreateTable is a logical operator that represents (is created for) the following:

CreateTable requires that the table provider of the CatalogTable is defined or throws an AssertionError:

CreateTable can never be resolved and is replaced (resolved) with a logical command at analysis phase in the following rules:

Creating CreateTable Instance

CreateTable takes the following when created:

CreateTable initializes the internal registries and counters.

CreateHiveTableAsSelectCommand

CreateHiveTableAsSelectCommand Logical Command

CreateHiveTableAsSelectCommand is a logical command that FIXME.

Executing Logical Command — run Method

Note
run is part of the RunnableCommand Contract to execute (run) a logical command.

run…​FIXME

CreateDataSourceTableCommand

CreateDataSourceTableCommand Logical Command

CreateDataSourceTableCommand is a logical command that creates a new table (in a session-scoped SessionCatalog).

CreateDataSourceTableCommand is created exclusively when the DataSourceAnalysis posthoc logical resolution rule resolves a CreateTable logical operator for a non-Hive table provider with no query.

CreateDataSourceTableCommand takes table metadata and an ignoreIfExists flag.

Executing Logical Command — run Method

Note
run is part of the RunnableCommand Contract to execute (run) a logical command.

run creates a new table in a session-scoped SessionCatalog.

Note
run uses the input SparkSession to access the SessionState that in turn is used to access the current SessionCatalog.

Internally, run creates a BaseRelation to access the table’s schema.

Caution
FIXME

Note
run accepts tables only (not views) with the provider defined.
