SharedState — State Shared Across SparkSessions
SharedState holds the state that is shared across multiple SparkSessions.
SharedState is available as the sharedState property of a SparkSession.

    scala> :type spark
    org.apache.spark.sql.SparkSession

    scala> :type spark.sharedState
    org.apache.spark.sql.internal.SharedState

SharedState is shared across SparkSessions.

    scala> spark.newSession.sharedState == spark.sharedState
    res1: Boolean = true

SharedState is created only when it is accessed through the sharedState property of a SparkSession.
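
The two statements above (one instance shared by every session, created only on access) can be pictured with a small toy model. This is a sketch under assumptions, not Spark's code: ToySharedState, ToySession and the created counter are hypothetical names introduced for illustration.

```scala
// Toy model only: none of these types are Spark classes.
object SharedStateSketch {
  // Counts how many ToySharedState instances have been built.
  var created = 0

  final class ToySharedState { created += 1 }

  final class ToySession(existing: Option[ToySharedState] = None) {
    // Built on first access only, unless a parent session supplied an instance.
    lazy val sharedState: ToySharedState = existing.getOrElse(new ToySharedState)

    // A new session reuses the shared state of the session it was derived from.
    def newSession(): ToySession = new ToySession(Some(sharedState))
  }
}
```

In this model, constructing a ToySession builds nothing. The first read of sharedState (or a call to newSession) creates the single instance that all derived sessions then see.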
Tip: Enable INFO logging level for the org.apache.spark.sql.internal.SharedState logger to see what happens inside. Refer to Logging.
warehousePath Property

    warehousePath: String

warehousePath is the warehouse path with the value of:
- hive.metastore.warehouse.dir if it is defined and spark.sql.warehouse.dir is not
- spark.sql.warehouse.dir otherwise, in particular when hive.metastore.warehouse.dir is undefined
You should see the following INFO message in the logs when SharedState is created:

    INFO Warehouse path is '[warehousePath]'.

warehousePath is used exclusively when SharedState initializes ExternalCatalog (and creates the default database in the metastore).
When initialized, warehousePath does the following:
- Loads hive-site.xml if available on CLASSPATH, i.e. adds it as a configuration resource to Hadoop’s Configuration (of SparkContext).
- Removes hive.metastore.warehouse.dir from SparkConf (of SparkContext), so the property only takes effect when it is defined in one of the Hadoop configuration resources.
- Sets spark.sql.warehouse.dir or hive.metastore.warehouse.dir in the Hadoop configuration (of SparkContext):
  - If hive.metastore.warehouse.dir has been defined in any of the Hadoop configuration resources but spark.sql.warehouse.dir has not, spark.sql.warehouse.dir becomes the value of hive.metastore.warehouse.dir. You should see the following INFO message in the logs:

        spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('[hiveWarehouseDir]').

  - Otherwise, the Hadoop configuration’s hive.metastore.warehouse.dir is set to spark.sql.warehouse.dir. You should see the following INFO message in the logs:

        Setting hive.metastore.warehouse.dir ('[hiveWarehouseDir]') to the value of spark.sql.warehouse.dir ('[sparkWarehouseDir]').

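
The resolution rules above can be sketched as a pure function. This is a minimal model under assumptions, not the actual implementation: two mutable maps stand in for SparkConf and Hadoop's Configuration, the hive-site.xml loading step is left out, and defaultSparkDir is a hypothetical parameter for the case where spark.sql.warehouse.dir is not set at all.

```scala
import scala.collection.mutable

object WarehousePathSketch {
  val HiveKey  = "hive.metastore.warehouse.dir"
  val SparkKey = "spark.sql.warehouse.dir"

  // sparkConf and hadoopConf are plain maps standing in for SparkConf and
  // Hadoop's Configuration; defaultSparkDir is a hypothetical fallback value.
  def resolve(
      sparkConf: mutable.Map[String, String],
      hadoopConf: mutable.Map[String, String],
      defaultSparkDir: String): String = {
    // The Hive property is only honoured when it comes from the Hadoop side.
    sparkConf.remove(HiveKey)
    hadoopConf.get(HiveKey) match {
      case Some(hiveDir) if !sparkConf.contains(SparkKey) =>
        // Hive value defined, Spark value not: the Spark property takes the Hive value.
        sparkConf(SparkKey) = hiveDir
        hiveDir
      case _ =>
        // Otherwise the Spark value wins and is copied to the Hadoop side.
        val sparkDir = sparkConf.getOrElse(SparkKey, defaultSparkDir)
        hadoopConf(HiveKey) = sparkDir
        sparkDir
    }
  }
}
```

Calling resolve with only the Hive property present in hadoopConf returns the Hive directory and records it under spark.sql.warehouse.dir. When both properties are present, the Spark directory is returned and it overwrites the Hive entry in hadoopConf.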
externalCatalog Property

    externalCatalog: ExternalCatalog

externalCatalog is created reflectively per the spark.sql.catalogImplementation internal configuration property (with the current Hadoop’s Configuration as SparkContext.hadoopConfiguration):
- HiveExternalCatalog for hive
- InMemoryCatalog for in-memory (default)
When initialized, externalCatalog:
- Creates the default database (with the "default database" description and the warehousePath location) if it doesn’t exist.
- Registers an ExternalCatalogEventListener that propagates external catalog events to the Spark listener bus.
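
The two initialization steps above can be illustrated with a toy catalog. Everything here is a hypothetical stand-in (ToyCatalog, ToyDatabase, ToyCatalogEvent, and a ListBuffer playing the role of the listener bus). The ordering simply follows the list above and is an assumption about the real code.

```scala
import scala.collection.mutable

object ExternalCatalogInitSketch {
  // Hypothetical stand-ins for a catalog database entry and a catalog event.
  final case class ToyDatabase(name: String, description: String, location: String)
  final case class ToyCatalogEvent(message: String)

  final class ToyCatalog {
    private val databases = mutable.Map.empty[String, ToyDatabase]
    private val listeners = mutable.ListBuffer.empty[ToyCatalogEvent => Unit]

    def addListener(listener: ToyCatalogEvent => Unit): Unit = listeners += listener
    def databaseExists(name: String): Boolean = databases.contains(name)
    def createDatabase(db: ToyDatabase): Unit = {
      databases(db.name) = db
      // Every registered listener is told about the new database.
      listeners.foreach(_(ToyCatalogEvent(s"created database ${db.name}")))
    }
  }

  // Step 1: create the default database if missing. Step 2: forward events to the bus.
  def initialize(
      catalog: ToyCatalog,
      warehousePath: String,
      bus: mutable.ListBuffer[ToyCatalogEvent]): ToyCatalog = {
    if (!catalog.databaseExists("default")) {
      catalog.createDatabase(ToyDatabase("default", "default database", warehousePath))
    }
    catalog.addListener(event => bus += event)
    catalog
  }
}
```

Because the listener is attached after the default database is created, only later catalog changes reach the bus in this model.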
externalCatalogClassName Internal Method

    externalCatalogClassName(conf: SparkConf): String

externalCatalogClassName gives the name of the class of the ExternalCatalog per spark.sql.catalogImplementation, i.e. the HiveExternalCatalog class for hive or the InMemoryCatalog class for in-memory.
Note: externalCatalogClassName is used exclusively when SharedState is requested for the ExternalCatalog.
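
The mapping from configuration value to class can be sketched as follows. This is an illustration under assumptions: a plain Map stands in for SparkConf, only the simple class names given on this page are returned, and the error for an unrecognized value is a choice made for this sketch.

```scala
object CatalogClassNameSketch {
  val CatalogImplKey = "spark.sql.catalogImplementation"

  // conf is a plain Map standing in for SparkConf. The returned strings are the
  // simple class names used on this page, not fully qualified names.
  def externalCatalogClassName(conf: Map[String, String]): String =
    conf.getOrElse(CatalogImplKey, "in-memory") match {
      case "hive"      => "HiveExternalCatalog"
      case "in-memory" => "InMemoryCatalog"
      // Rejecting other values is an assumption of this sketch.
      case other =>
        throw new IllegalArgumentException(s"Unsupported catalog implementation: $other")
    }
}
```

With an empty map the sketch falls back to in-memory, matching the default noted for externalCatalog.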
Accessing Management Interface of Global Temporary Views — globalTempViewManager
Property

    globalTempViewManager: GlobalTempViewManager

When accessed for the very first time, globalTempViewManager gets the name of the global temporary view database (as the value of the spark.sql.globalTempDatabase internal static configuration property).
In the end, globalTempViewManager creates a new GlobalTempViewManager (with the database name).
globalTempViewManager throws a SparkException when the global temporary view database already exists in the ExternalCatalog:

    [globalTempDB] is a system preserved database, please rename your existing database to resolve the name conflict, or set a different value for spark.sql.globalTempDatabase, and launch your Spark application again.

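
The name-conflict check can be modelled without Spark. In this sketch, which rests on stated assumptions, a Set of database names stands in for the ExternalCatalog lookup, ToyGlobalTempViewManager is a hypothetical placeholder type, and IllegalStateException replaces SparkException to keep the example dependency-free.

```scala
object GlobalTempViewSketch {
  // Hypothetical placeholder for the manager that owns the global temp database.
  final class ToyGlobalTempViewManager(val database: String)

  // existingDatabases stands in for asking the ExternalCatalog whether the
  // database exists. The real property throws SparkException instead.
  def globalTempViewManager(
      globalTempDB: String,
      existingDatabases: Set[String]): ToyGlobalTempViewManager = {
    if (existingDatabases.contains(globalTempDB)) {
      throw new IllegalStateException(
        s"$globalTempDB is a system preserved database, please rename your existing " +
          "database to resolve the name conflict, or set a different value for " +
          "spark.sql.globalTempDatabase, and launch your Spark application again.")
    }
    new ToyGlobalTempViewManager(globalTempDB)
  }
}
```

The check runs before the manager is built, so a conflicting database name never yields a manager instance.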
Note: globalTempViewManager is used when BaseSessionStateBuilder and HiveSessionStateBuilder are requested for the SessionCatalog.