SessionState — State Separation Layer Between SparkSessions
`SessionState` is the state separation layer between Spark SQL sessions, including SQL configuration, tables, functions, UDFs, the SQL parser, and everything else that depends on a SQLConf.

`SessionState` is available as the `sessionState` property of a `SparkSession`.
```scala
scala> :type spark
org.apache.spark.sql.SparkSession

scala> :type spark.sessionState
org.apache.spark.sql.internal.SessionState
```
`SessionState` is created when `SparkSession` is requested to instantiateSessionState (i.e. when requested for the `SessionState` for the first time, per the spark.sql.catalogImplementation configuration property).
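The dispatch on spark.sql.catalogImplementation can be sketched as follows. This is a minimal stand-in for what `SparkSession.instantiateSessionState` does (the two builder class names are the real ones from Spark 2.x; the helper function itself is hypothetical):

```scala
// Sketch: mapping spark.sql.catalogImplementation to the session state
// builder class that SparkSession instantiates reflectively (Spark 2.x names).
def sessionStateBuilderClassName(catalogImplementation: String): String =
  catalogImplementation match {
    case "hive"      => "org.apache.spark.sql.hive.HiveSessionStateBuilder"
    case "in-memory" => "org.apache.spark.sql.internal.SessionStateBuilder"
    case other =>
      throw new IllegalArgumentException(s"Unknown catalog implementation: $other")
  }

println(sessionStateBuilderClassName("in-memory"))
```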
Note: When requested for the `SessionState`, `SparkSession` uses spark.sql.catalogImplementation to choose between the two session state builders: `SessionStateBuilder` (for the in-memory catalog) and `HiveSessionStateBuilder` (for hive).
| Name | Type | Description |
|---|---|---|
| analyzer | Analyzer | Initialized lazily (i.e. only when requested the first time) using the analyzerBuilder factory function. Used when…FIXME |
| catalog | SessionCatalog | Metastore of tables and databases. Used when…FIXME |
| streamingQueryManager | StreamingQueryManager | Used to manage streaming queries in Spark Structured Streaming |
| udfRegistration | UDFRegistration | Interface to register user-defined functions. Used when…FIXME |
Note: `SessionState` is a `private[sql]` class and, given the package `org.apache.spark.sql.internal`, should be considered internal.
Creating SessionState Instance

`SessionState` takes the following when created:

- `catalogBuilder` function to create a SessionCatalog (i.e. `() ⇒ SessionCatalog`)
- `analyzerBuilder` function to create an Analyzer (i.e. `() ⇒ Analyzer`)
- `optimizerBuilder` function to create an Optimizer (i.e. `() ⇒ Optimizer`)
- `resourceLoaderBuilder` function to create a `SessionResourceLoader` (i.e. `() ⇒ SessionResourceLoader`)
- `createQueryExecution` function to create a QueryExecution given a LogicalPlan (i.e. `LogicalPlan ⇒ QueryExecution`)
- `createClone` function to clone the `SessionState` given a SparkSession (i.e. `(SparkSession, SessionState) ⇒ SessionState`)
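The builder functions above let each component be created lazily, on first access. A minimal sketch of that pattern (with stub types, not Spark's real classes):

```scala
// Stub stand-ins for the real Spark classes, for illustration only.
class Analyzer
class SessionCatalog

// Each component is produced by a builder function passed at construction
// time and cached in a lazy val, mirroring how SessionState defers work
// until a component is first requested.
class MiniSessionState(
    catalogBuilder: () => SessionCatalog,
    analyzerBuilder: () => Analyzer) {
  lazy val catalog: SessionCatalog = catalogBuilder()
  lazy val analyzer: Analyzer = analyzerBuilder()
}

var analyzerBuilt = 0
val state = new MiniSessionState(
  () => new SessionCatalog,
  () => { analyzerBuilt += 1; new Analyzer })

println(analyzerBuilt) // builder not invoked yet
state.analyzer         // first access triggers the builder
state.analyzer         // second access reuses the cached value
println(analyzerBuilt)
```

The lazy vals guarantee each builder runs at most once per `SessionState`, which matters because building an Analyzer or SessionCatalog is not free.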
apply Factory Methods

Caution: FIXME

```scala
apply(sparkSession: SparkSession): SessionState  // (1)
apply(sparkSession: SparkSession, sqlConf: SQLConf): SessionState
```

1. Passes `sparkSession` to the other `apply` with a new `SQLConf`

Note: `apply` is used when `SparkSession` is requested for the `SessionState`.
createAnalyzer Internal Method

```scala
createAnalyzer(
  sparkSession: SparkSession,
  catalog: SessionCatalog,
  sqlConf: SQLConf): Analyzer
```

`createAnalyzer` creates a logical query plan Analyzer with rules specific to a non-Hive `SessionState`.
| Method | Rules | Description |
|---|---|---|
| extendedResolutionRules | FindDataSourceTable | Replaces InsertIntoTable (with…) |
| | ResolveSQLOnFile | |
| postHocResolutionRules | PreprocessTableInsertion | |
| extendedCheckRules | PreWriteCheck | |
| | HiveOnlyCheck | |
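The rule-extension mechanism in the table above can be sketched with stub types (this is a simplified model, not Spark's real Analyzer, and `findDataSourceTable` here is a hypothetical stand-in):

```scala
// A logical plan and a rule are stubbed as a String and a String transform.
type LogicalPlan = String
type Rule = LogicalPlan => LogicalPlan

// The base analyzer applies its standard rules plus whatever the
// session-specific builder contributes via extendedResolutionRules.
class MiniAnalyzer(extendedResolutionRules: Seq[Rule]) {
  private val baseRules: Seq[Rule] = Seq(plan => s"resolved($plan)")
  def execute(plan: LogicalPlan): LogicalPlan =
    (baseRules ++ extendedResolutionRules).foldLeft(plan)((p, rule) => rule(p))
}

// Stand-in for a session-specific rule such as FindDataSourceTable.
val findDataSourceTable: Rule = plan => s"withTables($plan)"

val analyzer = new MiniAnalyzer(Seq(findDataSourceTable))
println(analyzer.execute("select"))
```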
“Executing” Logical Plan (Creating QueryExecution For LogicalPlan) — executePlan Method

```scala
executePlan(plan: LogicalPlan): QueryExecution
```

`executePlan` simply executes the createQueryExecution function on the input logical plan, which creates a QueryExecution with the current SparkSession and that plan.
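A minimal sketch of that delegation, with stub types standing in for Spark's classes:

```scala
// Stub stand-ins for the real Spark classes, for illustration only.
case class LogicalPlan(name: String)
case class QueryExecution(plan: LogicalPlan)

// executePlan just applies the createQueryExecution function that the
// SessionState was given at construction time.
class MiniSessionState(createQueryExecution: LogicalPlan => QueryExecution) {
  def executePlan(plan: LogicalPlan): QueryExecution = createQueryExecution(plan)
}

val state = new MiniSessionState(plan => QueryExecution(plan))
println(state.executePlan(LogicalPlan("Range")).plan.name)
```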
Creating New Hadoop Configuration — newHadoopConf Method

```scala
newHadoopConf(): Configuration
newHadoopConf(hadoopConf: Configuration, sqlConf: SQLConf): Configuration
```

`newHadoopConf` returns a new Hadoop Configuration (with the `SparkContext.hadoopConfiguration` and all the configuration properties of the SQLConf).
Note: `newHadoopConf` is used by `ScriptTransformation`, `ParquetRelation`, `StateStoreRDD`, `SessionState` itself, and a few other places.
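The overlay semantics can be sketched with plain Maps standing in for Hadoop `Configuration` and `SQLConf` (a simplified model, not the real API):

```scala
// Sketch: newHadoopConf starts from the SparkContext's Hadoop configuration
// and overlays every SQL configuration property on top of it.
def newHadoopConf(
    hadoopConf: Map[String, String],
    sqlConf: Map[String, String]): Map[String, String] =
  hadoopConf ++ sqlConf // SQL properties win on key collisions

val merged = newHadoopConf(
  Map("fs.defaultFS" -> "hdfs://nn:8020", "io.file.buffer.size" -> "4096"),
  Map("spark.sql.shuffle.partitions" -> "200"))
println(merged.size)
```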
Creating New Hadoop Configuration With Extra Options — newHadoopConfWithOptions Method

```scala
newHadoopConfWithOptions(options: Map[String, String]): Configuration
```

`newHadoopConfWithOptions` creates a new Hadoop Configuration with the input `options` set (except the `path` and `paths` options, which are skipped).
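The filtering step can be sketched with a plain Map standing in for Hadoop `Configuration` (a simplified model, not the real API):

```scala
// Sketch: the input options are copied onto the configuration, except the
// "path" and "paths" options, which are skipped.
def newHadoopConfWithOptions(
    hadoopConf: Map[String, String],
    options: Map[String, String]): Map[String, String] =
  hadoopConf ++ options.filter { case (key, _) => key != "path" && key != "paths" }

val conf = newHadoopConfWithOptions(
  Map.empty,
  Map("path" -> "/data/in", "compression" -> "snappy"))
println(conf.keySet)
```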