SharedState — State Shared Across SparkSessions
SharedState holds the shared state across multiple SparkSessions.
SharedState is available as the sharedState property of a SparkSession.

    scala> :type spark
    org.apache.spark.sql.SparkSession

    scala> :type spark.sharedState
    org.apache.spark.sql.internal.SharedState

SharedState is shared across SparkSessions.

    scala> spark.newSession.sharedState == spark.sharedState
    res1: Boolean = true

SharedState is created only when it is accessed through the sharedState property of a SparkSession.
Tip: Enable INFO logging for the org.apache.spark.sql.internal.SharedState logger to see the log messages described below. Refer to Logging.
warehousePath Property

    warehousePath: String

warehousePath is the warehouse path with the value of:
- hive.metastore.warehouse.dir if defined and spark.sql.warehouse.dir is not
- spark.sql.warehouse.dir if hive.metastore.warehouse.dir is undefined
You should see the following INFO message in the logs when SharedState is created:

    INFO Warehouse path is '[warehousePath]'.

warehousePath is used exclusively when SharedState initializes ExternalCatalog (and creates the default database in the metastore).
While being initialized, warehousePath does the following:
- Loads hive-site.xml if available on CLASSPATH, i.e. adds it as a configuration resource to Hadoop’s Configuration (of SparkContext).
- Removes hive.metastore.warehouse.dir from SparkConf (of SparkContext) and leaves it off if defined using any of the Hadoop configuration resources.
- Sets spark.sql.warehouse.dir or hive.metastore.warehouse.dir in the Hadoop configuration (of SparkContext):
  - If hive.metastore.warehouse.dir has been defined in any of the Hadoop configuration resources but spark.sql.warehouse.dir has not, spark.sql.warehouse.dir becomes the value of hive.metastore.warehouse.dir. You should see the following INFO message in the logs:

        spark.sql.warehouse.dir is not set, but hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir to the value of hive.metastore.warehouse.dir ('[hiveWarehouseDir]').

  - Otherwise, the Hadoop configuration’s hive.metastore.warehouse.dir is set to spark.sql.warehouse.dir. You should see the following INFO message in the logs:

        Setting hive.metastore.warehouse.dir ('[hiveWarehouseDir]') to the value of spark.sql.warehouse.dir ('[sparkWarehouseDir]').

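The precedence between the two warehouse properties can be sketched as a small, dependency-free Scala model. This is an illustration only, not Spark's implementation: `WarehousePathSketch`, `resolve`, and the `DefaultWarehouseDir` placeholder are hypothetical names, and plain `Option` values stand in for the Hadoop Configuration and SparkConf lookups.

```scala
// Simplified, hypothetical model of the warehouse path resolution rules.
// It does not touch Spark or Hadoop; Options stand in for property lookups.
object WarehousePathSketch {
  // Placeholder used when spark.sql.warehouse.dir is not set explicitly
  // (an assumption for this sketch, not necessarily Spark's actual default).
  val DefaultWarehouseDir = "spark-warehouse"

  // Returns (resolved warehouse path, log message for the branch taken).
  def resolve(
      hiveWarehouseDir: Option[String],  // hive.metastore.warehouse.dir
      sparkWarehouseDir: Option[String]  // spark.sql.warehouse.dir
  ): (String, String) =
    (hiveWarehouseDir, sparkWarehouseDir) match {
      case (Some(hiveDir), None) =>
        // Only the Hive property is defined, so its value is used.
        val msg = "spark.sql.warehouse.dir is not set, but " +
          "hive.metastore.warehouse.dir is set. Setting spark.sql.warehouse.dir " +
          s"to the value of hive.metastore.warehouse.dir ('$hiveDir')."
        (hiveDir, msg)
      case _ =>
        // Otherwise the Spark property (or the placeholder default) is used.
        val sparkDir = sparkWarehouseDir.getOrElse(DefaultWarehouseDir)
        val msg = "Setting hive.metastore.warehouse.dir " +
          s"('${hiveWarehouseDir.orNull}') to the value of " +
          s"spark.sql.warehouse.dir ('$sparkDir')."
        (sparkDir, msg)
    }
}
```

For instance, `WarehousePathSketch.resolve(Some("/user/hive/warehouse"), None)` takes the first branch and yields the Hive directory, while supplying both values yields the spark.sql.warehouse.dir value.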
externalCatalog Property

    externalCatalog: ExternalCatalog

externalCatalog is created reflectively per spark.sql.catalogImplementation internal configuration property (with the current Hadoop’s Configuration as SparkContext.hadoopConfiguration):
- HiveExternalCatalog for hive
- InMemoryCatalog for in-memory (default)
While being initialized, externalCatalog:
- Creates the default database (with "default database" description and warehousePath location) if it doesn’t exist.
- Registers an ExternalCatalogEventListener that propagates external catalog events to the Spark listener bus.
externalCatalogClassName Internal Method

    externalCatalogClassName(conf: SparkConf): String

externalCatalogClassName gives the name of the class of the ExternalCatalog per spark.sql.catalogImplementation, i.e. HiveExternalCatalog for hive and InMemoryCatalog for in-memory.
Note: externalCatalogClassName is used exclusively when SharedState is requested for the ExternalCatalog.
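The mapping from the spark.sql.catalogImplementation value to a catalog class name can be sketched as follows. This is a minimal model rather than Spark's own method: `CatalogClassNameSketch` is a hypothetical wrapper, it takes the property value directly instead of a SparkConf, and the error raised for an unknown value is an assumption made for this sketch.

```scala
// Hypothetical stand-in for the lookup described above. It takes the
// property value directly instead of a SparkConf, so it runs without Spark.
object CatalogClassNameSketch {
  def externalCatalogClassName(catalogImplementation: String): String =
    catalogImplementation match {
      case "hive" =>
        "org.apache.spark.sql.hive.HiveExternalCatalog"
      case "in-memory" =>
        "org.apache.spark.sql.catalyst.catalog.InMemoryCatalog"
      case other =>
        // Assumption for this sketch: reject values other than the two above.
        throw new IllegalArgumentException(
          s"Unsupported spark.sql.catalogImplementation: $other")
    }
}
```

The returned name is what a reflective instantiation would load, which is why the Hive-specific class only has to be on the classpath when hive is selected.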
Accessing Management Interface of Global Temporary Views — globalTempViewManager Property

    globalTempViewManager: GlobalTempViewManager

When accessed for the very first time, globalTempViewManager gets the name of the global temporary view database (as the value of spark.sql.globalTempDatabase internal static configuration property).
In the end, globalTempViewManager creates a new GlobalTempViewManager (with the database name).
globalTempViewManager throws a SparkException when the global temporary view database already exists in the ExternalCatalog:

    [globalTempDB] is a system preserved database, please rename your existing database to resolve the name conflict, or set a different value for spark.sql.globalTempDatabase, and launch your Spark application again.

Note: globalTempViewManager is used when BaseSessionStateBuilder and HiveSessionStateBuilder are requested for the SessionCatalog.
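The name-conflict check performed on first access can be modelled in plain Scala. The sketch below is illustrative only: `GlobalTempViewManagerSketch`, `ManagerStandIn`, and `create` are hypothetical names, a `String => Boolean` function stands in for the ExternalCatalog's database lookup, and `IllegalStateException` replaces SparkException so the code runs without Spark on the classpath.

```scala
// Hypothetical model of the conflict check for the global temp database.
object GlobalTempViewManagerSketch {
  // Stand-in for GlobalTempViewManager; it only remembers the database name.
  final case class ManagerStandIn(database: String)

  def create(
      globalTempDB: String,              // value of spark.sql.globalTempDatabase
      databaseExists: String => Boolean  // stand-in for the catalog lookup
  ): ManagerStandIn = {
    if (databaseExists(globalTempDB)) {
      // The real code throws SparkException; this sketch avoids the dependency.
      throw new IllegalStateException(
        s"$globalTempDB is a system preserved database, please rename your " +
          "existing database to resolve the name conflict, or set a different " +
          "value for spark.sql.globalTempDatabase, and launch your Spark " +
          "application again.")
    }
    ManagerStandIn(globalTempDB)
  }
}
```

Because a `Set[String]` is also a `String => Boolean`, a call such as `GlobalTempViewManagerSketch.create("global_temp", Set("default"))` succeeds, while passing a set that already contains `global_temp` fails with the message above.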