
SessionStateBuilder


SessionStateBuilder is…​FIXME

BaseSessionStateBuilder — Generic Builder of SessionState


BaseSessionStateBuilder is the contract of builder objects that coordinate construction of a new SessionState.

Table 1. BaseSessionStateBuilders

  • SessionStateBuilder

  • HiveSessionStateBuilder

BaseSessionStateBuilder is created when SparkSession is requested for a SessionState.


BaseSessionStateBuilder requires that implementations define the newBuilder method that SparkSession uses (indirectly) when requested for a SessionState (per the spark.sql.catalogImplementation internal configuration property).

Note
BaseSessionStateBuilder and spark.sql.catalogImplementation configuration property allow for Hive and non-Hive Spark deployments.
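The selection can be sketched in plain Scala (a simplified illustration, not Spark's actual dispatch code; the two builder class names follow the table above):

```scala
// Illustrative model: spark.sql.catalogImplementation maps to one of the
// two known session-state builder classes.
object BuilderSelection {
  val builders: Map[String, String] = Map(
    "in-memory" -> "org.apache.spark.sql.internal.SessionStateBuilder",
    "hive"      -> "org.apache.spark.sql.hive.HiveSessionStateBuilder")

  def builderClassFor(catalogImplementation: String): String =
    builders.getOrElse(
      catalogImplementation,
      sys.error(s"Unexpected catalog implementation: $catalogImplementation"))
}
```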

BaseSessionStateBuilder holds properties that (together with newBuilder) are used to create a SessionState.

Table 2. BaseSessionStateBuilder’s Properties

  • analyzer: Logical analyzer

  • catalog: Used to create Analyzer, Optimizer and a SessionState itself. Note: HiveSessionStateBuilder manages its own Hive-aware HiveSessionCatalog.

  • conf: SQLConf

  • customOperatorOptimizationRules: Custom operator optimization rules to add to the base Operator Optimization batch. When requested for the custom rules, customOperatorOptimizationRules simply requests the SparkSessionExtensions to buildOptimizerRules.

  • experimentalMethods: ExperimentalMethods

  • extensions: SparkSessionExtensions

  • functionRegistry: FunctionRegistry

  • listenerManager: ExecutionListenerManager

  • optimizer: SparkOptimizer (downcast to the base Optimizer) that is created with the SessionCatalog and the ExperimentalMethods. Note that the SparkOptimizer adds the customOperatorOptimizationRules to the operator optimization rules. optimizer is used when BaseSessionStateBuilder is requested to create a SessionState (for the optimizerBuilder function to create an Optimizer when requested for the Optimizer).

  • planner: SparkPlanner

  • resourceLoader: SessionResourceLoader

  • sqlParser: ParserInterface

  • streamingQueryManager: Spark Structured Streaming’s StreamingQueryManager

  • udfRegistration: UDFRegistration

Note

BaseSessionStateBuilder defines a type alias NewBuilder for a function to create a BaseSessionStateBuilder.

Note
BaseSessionStateBuilder is an experimental and unstable API.

Creating Function to Build SessionState — createClone Method

createClone gives a function that takes a SparkSession and a SessionState and executes newBuilder followed by build.

Note
createClone is used exclusively when BaseSessionStateBuilder is requested for a SessionState.
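The NewBuilder alias and createClone can be modeled with plain Scala (all types below are simplified stand-ins, not Spark's actual classes):

```scala
// Illustrative model: createClone returns a function of (session, state)
// that runs newBuilder (seeded with the parent state) followed by build.
case class SparkSessionLike(name: String)
case class SessionStateLike(config: Map[String, String])

class SessionStateBuilderLike(
    session: SparkSessionLike,
    parentState: Option[SessionStateLike]) {

  // NewBuilder: creates a builder from a session and an optional parent state
  type NewBuilder = (SparkSessionLike, Option[SessionStateLike]) => SessionStateBuilderLike

  protected def newBuilder: NewBuilder =
    (s, parent) => new SessionStateBuilderLike(s, parent)

  // build creates a SessionState (here: just copying the parent's config)
  def build(): SessionStateLike =
    SessionStateLike(parentState.map(_.config).getOrElse(Map.empty))

  // createClone: newBuilder followed by build
  def createClone: (SparkSessionLike, SessionStateLike) => SessionStateLike =
    (s, state) => newBuilder(s, Some(state)).build()
}
```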

Creating SessionState Instance — build Method

build creates a SessionState with the following:

Note

build is used when:

Creating BaseSessionStateBuilder Instance

BaseSessionStateBuilder takes the following when created:

Getting Function to Create QueryExecution For LogicalPlan — createQueryExecution Method

createQueryExecution simply returns a function that takes a LogicalPlan and creates a QueryExecution with the SparkSession and the logical plan.

Note
createQueryExecution is used exclusively when BaseSessionStateBuilder is requested to create a SessionState instance.
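The shape of createQueryExecution is a curried function; a minimal plain-Scala sketch (illustrative types only):

```scala
// Illustrative model: createQueryExecution returns a function that takes a
// logical plan and pairs it with the session into a query execution.
case class LogicalPlanLike(sql: String)
case class QueryExecutionLike(session: String, plan: LogicalPlanLike)

def createQueryExecution(session: String): LogicalPlanLike => QueryExecutionLike =
  plan => QueryExecutionLike(session, plan)
```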

SessionState — State Separation Layer Between SparkSessions

SessionState is the state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, SQL parser, and everything else that depends on a SQLConf.

SessionState is available as the sessionState property of a SparkSession.

SessionState is created when SparkSession is requested to instantiateSessionState (when requested for the SessionState per spark.sql.catalogImplementation configuration property).

Figure 1. Creating SessionState
Note

When requested for the SessionState, SparkSession uses spark.sql.catalogImplementation configuration property to load and create a BaseSessionStateBuilder that is then requested to create a SessionState instance.

There are two BaseSessionStateBuilders available:

  • SessionStateBuilder for the in-memory catalog

  • HiveSessionStateBuilder for the hive catalog

The hive catalog is set when the SparkSession was created with Hive support enabled (using Builder.enableHiveSupport).
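For example, a Hive-enabled session is created with Spark's public SparkSession.Builder API (a configuration fragment; it requires a Spark deployment to run):

```scala
import org.apache.spark.sql.SparkSession

// enableHiveSupport sets spark.sql.catalogImplementation to "hive"
// under the covers, so SessionState is built by HiveSessionStateBuilder.
val spark = SparkSession.builder()
  .appName("hive-enabled-session")
  .enableHiveSupport()
  .getOrCreate()
```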

Table 1. SessionState’s (Lazily-Initialized) Attributes

  • analyzer (Analyzer): Spark Analyzer. Initialized lazily (i.e. only when requested the first time) using the analyzerBuilder factory function. Used when…FIXME

  • catalog (SessionCatalog): Metastore of tables and databases. Used when…FIXME

  • conf (SQLConf): FIXME. Used when…FIXME

  • experimentalMethods (ExperimentalMethods): FIXME. Used when…FIXME

  • functionRegistry (FunctionRegistry): FIXME. Used when…FIXME

  • functionResourceLoader (FunctionResourceLoader): FIXME. Used when…FIXME

  • listenerManager (ExecutionListenerManager): FIXME. Used when…FIXME

  • optimizer (Optimizer): Logical query plan optimizer. Used exclusively when QueryExecution creates an optimized logical plan.

  • resourceLoader (SessionResourceLoader): FIXME. Used when…FIXME

  • sqlParser (ParserInterface): FIXME. Used when…FIXME

  • streamingQueryManager (StreamingQueryManager): Used to manage streaming queries in Spark Structured Streaming

  • udfRegistration (UDFRegistration): Interface to register user-defined functions. Used when…FIXME
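The lazy initialization of these attributes follows the standard Scala lazy val pattern; a minimal self-contained illustration (hypothetical names, not Spark's classes):

```scala
// Illustrative model: the analyzerBuilder factory function is invoked only
// the first time the attribute is requested, and its result is cached.
class LazySessionStateLike(analyzerBuilder: () => String) {
  var builderCalls = 0 // for illustration only

  lazy val analyzer: String = {
    builderCalls += 1
    analyzerBuilder()
  }
}
```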

Note
SessionState is a private[sql] class and, given the package org.apache.spark.sql.internal, SessionState should be considered internal.

Creating SessionState Instance

SessionState takes the following when created:

apply Factory Methods

Caution
FIXME

  1. Passes sparkSession to the other apply with a new SQLConf

Note
apply is used when SparkSession is requested for SessionState.

clone Method

Caution
FIXME
Note
clone is used when…​

createAnalyzer Internal Method

createAnalyzer creates a logical query plan Analyzer with rules specific to a non-Hive SessionState.

Table 2. Analyzer’s Evaluation Rules for non-Hive SessionState (in the order of execution)

  • extendedResolutionRules:

    • FindDataSourceTable: Replaces InsertIntoTable (with CatalogRelation) and CatalogRelation logical plans with LogicalRelation

    • ResolveSQLOnFile

  • postHocResolutionRules:

    • PreprocessTableCreation

    • PreprocessTableInsertion

    • DataSourceAnalysis

  • extendedCheckRules:

    • PreWriteCheck

    • HiveOnlyCheck

Note
createAnalyzer is used when SessionState is created or cloned.

“Executing” Logical Plan (Creating QueryExecution For LogicalPlan) — executePlan Method

executePlan executes the createQueryExecution function on the input logical plan (which simply creates a QueryExecution with the current SparkSession and the input logical plan).

refreshTable Method

refreshTable is…​

addJar Method

addJar is…​

analyze Method

analyze is…​

Creating New Hadoop Configuration — newHadoopConf Method

newHadoopConf returns a Hadoop Configuration (with the SparkContext.hadoopConfiguration and all the configuration properties of the SQLConf).

Note
newHadoopConf is used by ScriptTransformation, ParquetRelation, StateStoreRDD, and SessionState itself, among a few other places.

Creating New Hadoop Configuration With Extra Options — newHadoopConfWithOptions Method

newHadoopConfWithOptions creates a new Hadoop Configuration with the input options set (except path and paths options that are skipped).
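The option filtering can be sketched with a plain Scala Map standing in for a Hadoop Configuration (an illustration, not the actual implementation):

```scala
// Illustrative model: apply the input options onto a base configuration,
// skipping the "path" and "paths" options.
def newHadoopConfWithOptions(
    base: Map[String, String],
    options: Map[String, String]): Map[String, String] =
  base ++ options.filter { case (k, _) => k != "path" && k != "paths" }
```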

Note

newHadoopConfWithOptions is used when:

HiveMetastoreCatalog — Legacy SessionCatalog for Converting Hive Metastore Relations to Data Source Relations


HiveMetastoreCatalog is a legacy session-scoped catalog of relational entities that HiveSessionCatalog uses exclusively for converting Hive metastore relations to data source relations (when RelationConversions logical evaluation rule is executed).

HiveMetastoreCatalog is created exclusively when HiveSessionStateBuilder is requested for SessionCatalog (and creates a HiveSessionCatalog).

Figure 1. HiveMetastoreCatalog, HiveSessionCatalog and HiveSessionStateBuilder


HiveMetastoreCatalog takes a SparkSession when created.

Converting HiveTableRelation to LogicalRelation — convertToLogicalRelation Method

convertToLogicalRelation…​FIXME

Note
convertToLogicalRelation is used exclusively when RelationConversions logical evaluation rule is requested to convert a HiveTableRelation to a LogicalRelation for parquet, native and hive ORC storage formats.

inferIfNeeded Internal Method

inferIfNeeded…​FIXME

Note
inferIfNeeded is used exclusively when HiveMetastoreCatalog is requested to convertToLogicalRelation.

HiveSessionCatalog — Hive-Specific Catalog of Relational Entities


HiveSessionCatalog is a session-scoped catalog of relational entities that is used when SparkSession was created with Hive support enabled.

Figure 1. HiveSessionCatalog and HiveSessionStateBuilder

HiveSessionCatalog is available as catalog property of SessionState when SparkSession was created with Hive support enabled (that in the end sets spark.sql.catalogImplementation internal configuration property to hive).

HiveSessionCatalog is created exclusively when HiveSessionStateBuilder is requested for the SessionCatalog.

HiveSessionCatalog uses the legacy HiveMetastoreCatalog (which is another session-scoped catalog of relational entities) exclusively to allow RelationConversions logical evaluation rule to convert Hive metastore relations to data source relations when executed.

Creating HiveSessionCatalog Instance

HiveSessionCatalog takes the following when created:

lookupFunction0 Internal Method

lookupFunction0…​FIXME

Note
lookupFunction0 is used when…​FIXME

BucketSpec — Bucketing Specification of Table


BucketSpec is the bucketing specification of a table, i.e. the metadata of the bucketing of a table.

BucketSpec includes the following:

  • Number of buckets

  • Bucket column names – the names of the columns used for buckets (at least one)

  • Sort column names – the names of the columns used to sort data in buckets

The number of buckets has to be between 0 and 100000 exclusive (or an AnalysisException is thrown).
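The bucketing metadata and its validation can be sketched as a simplified case class (a stand-in for the real one; Spark throws an AnalysisException, modeled here as the IllegalArgumentException that require raises):

```scala
// Simplified model of BucketSpec with the documented bounds check.
case class BucketSpecLike(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  require(numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets should be greater than 0 but less than 100000. Got $numBuckets")
  require(bucketColumnNames.nonEmpty, "At least one bucket column is required")
}
```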

BucketSpec is created when:

  1. DataFrameWriter is requested to saveAsTable (and does getBucketSpec)

  2. HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps

  3. HiveClientImpl is requested to retrieve a table metadata

  4. SparkSqlAstBuilder is requested to visitBucketSpec (for CREATE TABLE SQL statement with CLUSTERED BY and INTO n BUCKETS with optional SORTED BY clauses)

BucketSpec uses the following text representation (i.e. toString):

Converting Bucketing Specification to LinkedHashMap — toLinkedHashMap Method

toLinkedHashMap converts the bucketing specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

toLinkedHashMap quotes the column names.
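A sketch of the conversion (field names and backtick quoting are assumptions for illustration, not necessarily Spark's exact output):

```scala
import scala.collection.mutable

// Illustrative model: build an insertion-ordered map of the bucketing
// fields, quoting column names with backticks.
def bucketSpecToLinkedHashMap(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]): mutable.LinkedHashMap[String, String] = {
  def quoted(names: Seq[String]): String =
    names.map(n => s"`$n`").mkString("[", ", ", "]")
  mutable.LinkedHashMap(
    "Num Buckets" -> numBuckets.toString,
    "Bucket Columns" -> quoted(bucketColumnNames),
    "Sort Columns" -> quoted(sortColumnNames))
}
```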

Note

toLinkedHashMap is used when:

CatalogTablePartition — Partition Specification of Table


CatalogTablePartition is the partition specification of a table, i.e. the metadata of the partitions of a table.

CatalogTablePartition is created when:

CatalogTablePartition can hold the table statistics that…​FIXME

The readable text representation of a CatalogTablePartition (aka simpleString) is…​FIXME

Note
simpleString is used exclusively when ShowTablesCommand is executed (with a partition specification).

CatalogTablePartition uses the following text representation (i.e. toString)…​FIXME

Creating CatalogTablePartition Instance

CatalogTablePartition takes the following when created:

Converting Partition Specification to LinkedHashMap — toLinkedHashMap Method

toLinkedHashMap converts the partition specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

Note

toLinkedHashMap is used when:

location Method

location simply returns the location URI of the CatalogStorageFormat or throws an AnalysisException:

Note
location is used when…​FIXME

CatalogStorageFormat — Storage Specification of Table or Partition


CatalogStorageFormat is the storage specification of a partition or a table, i.e. the metadata that includes the following:

  • Location URI

  • Input format

  • Output format

  • SerDe

  • compressed flag

  • Properties (as Map[String, String])

CatalogStorageFormat is created when:

CatalogStorageFormat uses the following text representation (i.e. toString)…​FIXME

Converting Storage Specification to LinkedHashMap — toLinkedHashMap Method

toLinkedHashMap…​FIXME

Note

toLinkedHashMap is used when:

CatalogTable — Table Specification (Native Table Metadata)


CatalogTable is the table specification, i.e. the metadata of a table that is stored in a session-scoped catalog of relational entities (i.e. SessionCatalog).

CatalogTable is created when:

The readable text representation of a CatalogTable (aka simpleString) is…​FIXME

Note
simpleString is used exclusively when ShowTablesCommand logical command is executed (with a partition specification).

CatalogTable uses the following text representation (i.e. toString)…​FIXME

CatalogTable is created with the optional bucketing specification that is used for the following:

Table Statistics for Query Planning (Auto Broadcast Joins and Cost-Based Optimization)

You manage table metadata using the catalog interface (aka metastore). Among the management tasks is getting the statistics of a table (used for cost-based query optimization).

Note
The CatalogStatistics are optional when CatalogTable is created.
Caution
FIXME When are stats specified? What if there are not?

Unless CatalogStatistics are available in a table metadata (in a catalog) for a non-streaming file data source table, DataSource creates a HadoopFsRelation with the table size specified by spark.sql.defaultSizeInBytes internal property (default: Long.MaxValue) for query planning of joins (and possibly to auto broadcast the table).

Internally, Spark alters table statistics using ExternalCatalog.doAlterTableStats.

Unless CatalogStatistics are available in a table metadata (in a catalog) for HiveTableRelation (with the hive provider), the DetermineTableStats logical resolution rule can compute the table size using HDFS (if the spark.sql.statistics.fallBackToHdfs property is turned on) or assume spark.sql.defaultSizeInBytes (which effectively disables table broadcasting).

You can use AnalyzeColumnCommand, AnalyzePartitionCommand, AnalyzeTableCommand commands to record statistics in a catalog.

The table statistics can be automatically updated (after executing commands like AlterTableAddPartitionCommand) when spark.sql.statistics.size.autoUpdate.enabled property is turned on.

You can use DESCRIBE SQL command to show the histogram of a column if stored in a catalog.

dataSchema Method

dataSchema…​FIXME

Note
dataSchema is used when…​FIXME

partitionSchema Method

partitionSchema…​FIXME

Note
partitionSchema is used when…​FIXME

Converting Table Specification to LinkedHashMap — toLinkedHashMap Method

toLinkedHashMap converts the table specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

Note

toLinkedHashMap is used when:

Creating CatalogTable Instance

CatalogTable takes the following when created:

  • TableIdentifier

  • CatalogTableType (i.e. EXTERNAL, MANAGED or VIEW)

  • CatalogStorageFormat

  • Schema

  • Name of the table provider (optional)

  • Partition column names

  • Optional Bucketing specification (default: None)

  • Owner

  • Create time

  • Last access time

  • Create version

  • Properties

  • Optional table statistics

  • Optional view text

  • Optional comment

  • Unsupported features

  • tracksPartitionsInCatalog flag

  • schemaPreservesCase flag

  • Ignored properties

database Method

database simply returns the database (of the TableIdentifier) or throws an AnalysisException:

Note
database is used when…​FIXME

SessionCatalog — Session-Scoped Catalog of Relational Entities


SessionCatalog is the catalog (registry) of relational entities, i.e. databases, tables, views, partitions, and functions (in a SparkSession).

Figure 1. SessionCatalog and Spark SQL Services

SessionCatalog uses the ExternalCatalog for the metadata of permanent entities (i.e. tables).

Note
SessionCatalog is a layer over ExternalCatalog in a SparkSession which allows for different metastores (i.e. in-memory or hive) to be used.

SessionCatalog is available through SessionState (of a SparkSession).

SessionCatalog is created when BaseSessionStateBuilder is requested for the SessionCatalog (when SessionState is requested for it).

Among the notable usages of SessionCatalog are creating an Analyzer and a SparkOptimizer.

Table 1. SessionCatalog’s Internal Properties (e.g. Registries, Counters and Flags)

  • currentDb: FIXME. Used when…FIXME

  • tableRelationCache: A cache of fully-qualified table names to table relation plans (i.e. LogicalPlan). Used when SessionCatalog refreshes a table

  • tempViews: Registry of temporary views (i.e. non-global temporary tables)

requireTableExists Internal Method

requireTableExists…​FIXME

Note
requireTableExists is used when…​FIXME

databaseExists Method

databaseExists…​FIXME

Note
databaseExists is used when…​FIXME

listTables Method

  1. Uses "*" as the pattern

listTables…​FIXME

Note

listTables is used when:

  • ShowTablesCommand logical command is requested to run

  • SessionCatalog is requested to reset (for testing)

  • CatalogImpl is requested to listTables (for testing)

Checking Whether Table Is Temporary View — isTemporaryTable Method

isTemporaryTable…​FIXME

Note
isTemporaryTable is used when…​FIXME

alterPartitions Method

alterPartitions…​FIXME

Note
alterPartitions is used when…​FIXME

listPartitions Method

listPartitions…​FIXME

Note
listPartitions is used when…​FIXME

alterTable Method

alterTable…​FIXME

Note
alterTable is used when AlterTableSetPropertiesCommand, AlterTableUnsetPropertiesCommand, AlterTableChangeColumnCommand, AlterTableSerDePropertiesCommand, AlterTableRecoverPartitionsCommand, AlterTableSetLocationCommand, AlterViewAsCommand (for permanent views) logical commands are executed.

Altering Table Statistics in Metastore (and Invalidating Internal Cache) — alterTableStats Method

alterTableStats requests ExternalCatalog to alter the statistics of the table (per identifier) followed by invalidating the table relation cache.

alterTableStats reports a NoSuchDatabaseException if the database does not exist.

alterTableStats reports a NoSuchTableException if the table does not exist.

Note

alterTableStats is used when the following logical commands are executed:

tableExists Method

tableExists…​FIXME

Note
tableExists is used when…​FIXME

functionExists Method

functionExists…​FIXME

Note

functionExists is used in:

listFunctions Method

listFunctions…​FIXME

Note
listFunctions is used when…​FIXME

Invalidating Table Relation Cache (aka Refreshing Table) — refreshTable Method

refreshTable…​FIXME

Note
refreshTable is used when…​FIXME

loadFunctionResources Method

loadFunctionResources…​FIXME

Note
loadFunctionResources is used when…​FIXME

Altering (Updating) Temporary View (Logical Plan) — alterTempViewDefinition Method

alterTempViewDefinition alters a temporary view by updating an in-memory temporary table (when no database is specified and the table has already been registered) or a global temporary table (when the specified database is the database of global temporary views).

Note
“Temporary table” and “temporary view” are synonyms.

alterTempViewDefinition returns true when an update could be executed and finished successfully.

Note
alterTempViewDefinition is used exclusively when AlterViewAsCommand logical command is executed.

Creating (Registering) Or Replacing Local Temporary View — createTempView Method

createTempView…​FIXME

Note
createTempView is used when…​FIXME

Creating (Registering) Or Replacing Global Temporary View — createGlobalTempView Method

createGlobalTempView simply requests the GlobalTempViewManager to create a global temporary view.

Note

createGlobalTempView is used when:

  • CreateViewCommand logical command is executed (for a global temporary view, i.e. when the view type is GlobalTempView)

  • CreateTempViewUsing logical command is executed (for a global temporary view, i.e. when the global flag is on)

createTable Method

createTable…​FIXME

Note
createTable is used when…​FIXME

Creating SessionCatalog Instance

SessionCatalog takes the following when created:

SessionCatalog initializes the internal registries and counters.

Finding Function by Name (Using FunctionRegistry) — lookupFunction Method

lookupFunction finds a function by name.

For a function with no database defined that exists in FunctionRegistry, lookupFunction requests FunctionRegistry to find the function (by its unqualified name, i.e. with no database).

If the function name has a database defined or the function does not exist in FunctionRegistry, lookupFunction uses the fully-qualified function name to check if the function exists in FunctionRegistry (by its fully-qualified name, i.e. with a database).

For other cases, lookupFunction requests ExternalCatalog to find the function and loads its resources. It then creates a corresponding temporary function and looks up the function again.
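The resolution order above can be sketched with plain Scala maps standing in for FunctionRegistry and ExternalCatalog (all names and types are illustrative):

```scala
import scala.collection.mutable

// Illustrative model of lookupFunction's resolution order:
// 1. unqualified name in the function registry (when no database is given)
// 2. fully-qualified name in the function registry
// 3. external catalog: load the function, register it, then return it
def lookupFunction(
    registry: mutable.Map[String, String],
    externalCatalog: Map[String, String],
    db: Option[String],
    name: String): Option[String] = {
  val qualified = s"${db.getOrElse("default")}.$name"
  if (db.isEmpty && registry.contains(name)) registry.get(name)
  else if (registry.contains(qualified)) registry.get(qualified)
  else externalCatalog.get(qualified).map { fn =>
    registry(qualified) = fn // register as a temporary function
    fn
  }
}
```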

Note

lookupFunction is used when:

Finding Relation (Table or View) in Catalogs — lookupRelation Method

lookupRelation finds the given table in the catalogs (i.e. GlobalTempViewManager, ExternalCatalog or the registry of temporary views) and gives a SubqueryAlias per table type.

Internally, lookupRelation looks up the name table using:

  1. GlobalTempViewManager when the database name of the table matches the name of GlobalTempViewManager

    1. Gives SubqueryAlias or reports a NoSuchTableException

  2. ExternalCatalog when the database name of the table is specified explicitly or the registry of temporary views does not contain the table

    1. Gives SubqueryAlias with View when the table is a view (aka temporary table)

    2. Gives SubqueryAlias with UnresolvedCatalogRelation otherwise

  3. The registry of temporary views

    1. Gives SubqueryAlias with the logical plan per the table as registered in the registry of temporary views
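The three-step resolution above can be sketched in plain Scala (illustrative types; strings stand in for logical plans):

```scala
// Illustrative model of lookupRelation's resolution order.
case class SubqueryAliasLike(alias: String, plan: String)

def lookupRelation(
    globalTempDb: String,
    globalTempViews: Map[String, String],
    tempViews: Map[String, String],
    externalCatalog: Map[String, String],
    db: Option[String],
    table: String): SubqueryAliasLike = db match {
  case Some(d) if d == globalTempDb =>
    // 1. GlobalTempViewManager, or NoSuchTableException
    globalTempViews.get(table)
      .map(SubqueryAliasLike(table, _))
      .getOrElse(sys.error(s"Table or view '$table' not found"))
  case Some(d) =>
    // 2. ExternalCatalog when a database is given explicitly
    SubqueryAliasLike(table, externalCatalog(s"$d.$table"))
  case None if !tempViews.contains(table) =>
    // 2. ExternalCatalog (default database) when not a temporary view
    SubqueryAliasLike(table, externalCatalog(s"default.$table"))
  case None =>
    // 3. Registry of temporary views
    SubqueryAliasLike(table, tempViews(table))
}
```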

Note
lookupRelation considers default to be the name of the database if the table name does not specify the database explicitly.
Note

lookupRelation is used when:

Retrieving Table Metadata from External Catalog (Metastore) — getTableMetadata Method

getTableMetadata simply requests external catalog (metastore) for the table metadata.

Before requesting the external metastore, getTableMetadata makes sure that the database and table (of the input TableIdentifier) both exist. If either does not exist, getTableMetadata reports a NoSuchDatabaseException or NoSuchTableException, respectively.

Retrieving Table Metadata — getTempViewOrPermanentTableMetadata Method

Internally, getTempViewOrPermanentTableMetadata branches off per database.

When a database name is not specified, getTempViewOrPermanentTableMetadata finds a local temporary view and creates a CatalogTable (with VIEW table type and an undefined storage) or retrieves the table metadata from an external catalog.

With the database name of the GlobalTempViewManager, getTempViewOrPermanentTableMetadata requests GlobalTempViewManager for the global view definition and creates a CatalogTable (with the name of GlobalTempViewManager in table identifier, VIEW table type and an undefined storage) or reports a NoSuchTableException.

With the database name not of GlobalTempViewManager, getTempViewOrPermanentTableMetadata simply retrieves the table metadata from an external catalog.

Note

getTempViewOrPermanentTableMetadata is used when:

Reporting NoSuchDatabaseException When Specified Database Does Not Exist — requireDbExists Internal Method

requireDbExists reports a NoSuchDatabaseException if the specified database does not exist. Otherwise, requireDbExists does nothing.

reset Method

reset…​FIXME

Note
reset is used exclusively in the Spark SQL internal tests.

Dropping Global Temporary View — dropGlobalTempView Method

dropGlobalTempView simply requests the GlobalTempViewManager to remove the given global temporary view.

Note
dropGlobalTempView is used when…​FIXME

Dropping Table — dropTable Method

dropTable…​FIXME

Note

dropTable is used when:

Getting Global Temporary View (Definition) — getGlobalTempView Method

getGlobalTempView…​FIXME

Note
getGlobalTempView is used when…​FIXME

registerFunction Method

registerFunction…​FIXME

Note

registerFunction is used when:

  • SessionCatalog is requested to lookupFunction

  • HiveSessionCatalog is requested to lookupFunction0

  • CreateFunctionCommand logical command is executed

lookupFunctionInfo Method

lookupFunctionInfo…​FIXME

Note
lookupFunctionInfo is used when…​FIXME

alterTableDataSchema Method

alterTableDataSchema…​FIXME

Note
alterTableDataSchema is used when…​FIXME
