Follow spark技术分享:
dig into the Spark source code and play with Spark best practices

SharedState — State Shared Across SparkSessions


SharedState holds the state that is shared across multiple SparkSessions.

Table 1. SharedState’s Properties

Name                   Type                              Description
cacheManager           CacheManager
externalCatalog        ExternalCatalog                   Metastore of permanent relational entities, i.e. databases, tables, partitions, and functions. Note: externalCatalog is initialized lazily on the first access.
globalTempViewManager  GlobalTempViewManager             Management interface of global temporary views
jarClassLoader         NonClosableMutableURLClassLoader
sparkContext           SparkContext                      Spark Core’s SparkContext
statusStore            SQLAppStatusStore
warehousePath          String                            Warehouse path

SharedState is available as the sharedState property of a SparkSession, and it is created lazily, on the first access to that property.

SharedState is shared across SparkSessions: a session created with SparkSession.newSession reuses the SharedState of the session it was created from.
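A minimal pure-Python model of this relationship, assuming only what this section states plus the documented behavior of SparkSession.newSession (a new session shares the underlying SparkContext and cached data). SharedStateModel and SessionModel are hypothetical stand-ins, not PySpark classes; in Spark itself sharedState is a lazy property of the Scala SparkSession.

```python
from functools import cached_property


class SharedStateModel:
    """Hypothetical stand-in for SharedState."""

    instances_created = 0  # counts how many shared states were built

    def __init__(self, spark_context):
        SharedStateModel.instances_created += 1
        self.spark_context = spark_context


class SessionModel:
    """Hypothetical stand-in for SparkSession (only the sharedState part)."""

    def __init__(self, spark_context, existing_shared_state=None):
        self.spark_context = spark_context
        self._existing_shared_state = existing_shared_state

    @cached_property
    def shared_state(self):
        # Evaluated on first access only; reuse a parent's state if one was passed in.
        if self._existing_shared_state is not None:
            return self._existing_shared_state
        return SharedStateModel(self.spark_context)

    def new_session(self):
        # The child session gets the parent's SharedState (forcing its creation).
        return SessionModel(self.spark_context, existing_shared_state=self.shared_state)


sc = object()  # placeholder for a SparkContext
parent = SessionModel(sc)
child = parent.new_session()
print(child.shared_state is parent.shared_state)  # True
print(SharedStateModel.instances_created)  # 1
```

Caching the property is what makes "created on first access" and "shared by derived sessions" hold at the same time: the parent builds the state once and hands the same object to every session derived from it.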

Tip

Enable INFO logging level for org.apache.spark.sql.internal.SharedState logger to see what happens inside.

Add the following line to conf/log4j.properties:

log4j.logger.org.apache.spark.sql.internal.SharedState=INFO

Refer to Logging.

warehousePath Property

warehousePath is the warehouse path with the value of:

  1. hive.metastore.warehouse.dir if defined and spark.sql.warehouse.dir is not

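  2. spark.sql.warehouse.dir otherwise, i.e. when spark.sql.warehouse.dir is defined or hive.metastore.warehouse.dir is not (the default value of spark.sql.warehouse.dir is used when neither is defined)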

When SharedState is created, it logs the resolved warehouse path as an INFO message.

warehousePath is used exclusively when SharedState initializes ExternalCatalog (and creates the default database in the metastore).

When initialized, warehousePath does the following:

  1. Loads hive-site.xml if available on the CLASSPATH, i.e. adds it as a configuration resource to the Hadoop Configuration of SparkContext.

  2. Removes hive.metastore.warehouse.dir from the SparkConf of SparkContext, so that the property can only be defined in the Hadoop configuration resources (e.g. hive-site.xml).

  3. Sets either spark.sql.warehouse.dir (in SparkConf) or hive.metastore.warehouse.dir (in the Hadoop configuration of SparkContext):

    1. If hive.metastore.warehouse.dir has been defined in any of the Hadoop configuration resources but spark.sql.warehouse.dir has not, spark.sql.warehouse.dir becomes the value of hive.metastore.warehouse.dir.

      SharedState logs an INFO message saying that spark.sql.warehouse.dir is set to the value of hive.metastore.warehouse.dir.

    2. Otherwise, the Hadoop configuration's hive.metastore.warehouse.dir is set to the value of spark.sql.warehouse.dir (by default, spark-warehouse in the current working directory).

      SharedState logs an INFO message saying that hive.metastore.warehouse.dir is set to the value of spark.sql.warehouse.dir.
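The initialization steps above can be modeled as a small pure-Python function. This is a sketch under simplifying assumptions: plain dicts stand in for SparkConf and the Hadoop Configuration, hive_site is a hypothetical dict holding the contents of hive-site.xml (None when the file is not on the CLASSPATH), Hadoop's resource-precedence rules are not modeled, and DEFAULT_SPARK_WAREHOUSE is a simplified placeholder for the real default of spark.sql.warehouse.dir. Logging is left out.

```python
HIVE_KEY = "hive.metastore.warehouse.dir"
SPARK_KEY = "spark.sql.warehouse.dir"
# Assumption: simplified default; Spark resolves spark-warehouse under the working directory.
DEFAULT_SPARK_WAREHOUSE = "spark-warehouse"


def init_warehouse_path(spark_conf, hadoop_conf, hive_site=None):
    """Model of warehousePath initialization; mutates both dicts and returns the path."""
    # 1. Load hive-site.xml (if found) into the Hadoop configuration.
    #    Simplified merge: keys already present are kept.
    if hive_site is not None:
        for key, value in hive_site.items():
            hadoop_conf.setdefault(key, value)

    # 2. hive.metastore.warehouse.dir may only come from the Hadoop configuration.
    spark_conf.pop(HIVE_KEY, None)

    hive_dir = hadoop_conf.get(HIVE_KEY)
    if hive_dir is not None and SPARK_KEY not in spark_conf:
        # 3.1. Only the Hive property is set: adopt it as spark.sql.warehouse.dir.
        spark_conf[SPARK_KEY] = hive_dir
        return hive_dir

    # 3.2. Otherwise spark.sql.warehouse.dir (or its default) is copied to Hadoop.
    spark_dir = spark_conf.get(SPARK_KEY, DEFAULT_SPARK_WAREHOUSE)
    hadoop_conf[HIVE_KEY] = spark_dir
    return spark_dir


spark_conf = {HIVE_KEY: "/ignored"}
hadoop_conf = {}
path = init_warehouse_path(spark_conf, hadoop_conf, hive_site={HIVE_KEY: "/user/hive/warehouse"})
print(path)  # /user/hive/warehouse
```

In this run the Hive property wins only because spark.sql.warehouse.dir was not set; the hive.metastore.warehouse.dir value placed directly in spark_conf is dropped by step 2.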

externalCatalog Property

externalCatalog is created reflectively, using the class that corresponds to the spark.sql.catalogImplementation internal configuration property, with the current Hadoop Configuration (i.e. SparkContext.hadoopConfiguration).

When initialized, externalCatalog does the following:

  1. Creates the default database (with default database as the description and warehousePath as the location) if it does not exist.

  2. Registers an ExternalCatalogEventListener that propagates external catalog events to the Spark listener bus.
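The steps above, together with the lazy initialization noted in the properties table, can be sketched in pure Python. Reflection is modeled with importlib: the catalog class is loaded from a fully qualified name. The module hypothetical_catalogs, the class InMemoryCatalogModel and its methods, the SharedStateWithCatalog class, and the plain list used as a listener bus are all assumptions for illustration, not Spark's API.

```python
import importlib
import sys
import types
from functools import cached_property


class InMemoryCatalogModel:
    """Hypothetical catalog with just enough API for the sketch."""

    def __init__(self, hadoop_conf):
        self.hadoop_conf = hadoop_conf
        self.databases = {}
        self.listeners = []

    def database_exists(self, name):
        return name in self.databases

    def create_database(self, name, description, location):
        self.databases[name] = {"description": description, "location": location}
        for listener in self.listeners:  # notify registered listeners
            listener(("CreateDatabaseEvent", name))

    def add_listener(self, listener):
        self.listeners.append(listener)


# Register the class under a fully qualified name so it can be loaded "by name".
_catalogs = types.ModuleType("hypothetical_catalogs")
_catalogs.InMemoryCatalogModel = InMemoryCatalogModel
sys.modules[_catalogs.__name__] = _catalogs


def load_class(qualified_name):
    """Load a class from 'module.ClassName' (reflective lookup)."""
    module_name, _, class_name = qualified_name.rpartition(".")
    return getattr(importlib.import_module(module_name), class_name)


class SharedStateWithCatalog:
    def __init__(self, catalog_class_name, hadoop_conf, warehouse_path, listener_bus):
        self._catalog_class_name = catalog_class_name
        self._hadoop_conf = hadoop_conf
        self._warehouse_path = warehouse_path
        self._listener_bus = listener_bus  # a list stands in for the listener bus

    @cached_property
    def external_catalog(self):
        # Built on first access only.
        catalog = load_class(self._catalog_class_name)(self._hadoop_conf)
        if not catalog.database_exists("default"):
            catalog.create_database("default", "default database", self._warehouse_path)
        # Forward every later catalog event to the listener bus.
        catalog.add_listener(self._listener_bus.append)
        return catalog


bus = []
state = SharedStateWithCatalog("hypothetical_catalogs.InMemoryCatalogModel", {}, "/tmp/warehouse", bus)
catalog = state.external_catalog
print(catalog.databases["default"]["location"])  # /tmp/warehouse
catalog.create_database("sales", "", "/tmp/warehouse/sales.db")
print(bus)  # [('CreateDatabaseEvent', 'sales')]
```

In this model the default database is created before the listener is registered, so only later catalog events reach the bus.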

externalCatalogClassName Internal Method

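externalCatalogClassName gives the name of the class of the ExternalCatalog per spark.sql.catalogImplementation, i.e. org.apache.spark.sql.hive.HiveExternalCatalog for hive and org.apache.spark.sql.catalyst.catalog.InMemoryCatalog for in-memory.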

Note
externalCatalogClassName is used exclusively when SharedState is requested for the ExternalCatalog.

Accessing Management Interface of Global Temporary Views — globalTempViewManager Property

When accessed for the very first time, globalTempViewManager gets the name of the global temporary view database (as the value of spark.sql.globalTempDatabase internal static configuration property).

globalTempViewManager then checks the ExternalCatalog: if a database with that name already exists, it throws a SparkException, since the name is reserved for global temporary views.

Otherwise, globalTempViewManager creates a new GlobalTempViewManager (with the database name).
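A minimal sketch of the check-then-create logic, assuming a dict for the configuration and a set of existing database names in place of the ExternalCatalog. GlobalTempViewManagerModel, SparkExceptionModel and create_global_temp_view_manager are hypothetical stand-ins, the exception message is illustrative rather than Spark's, and the lazy, first-access evaluation is not modeled. The fallback value global_temp is the default of spark.sql.globalTempDatabase.

```python
class GlobalTempViewManagerModel:
    """Hypothetical stand-in for GlobalTempViewManager."""

    def __init__(self, database):
        self.database = database
        self.views = {}


class SparkExceptionModel(Exception):
    """Hypothetical stand-in for org.apache.spark.SparkException."""


def create_global_temp_view_manager(conf, existing_databases):
    # Read the global temporary view database name (default: global_temp).
    db_name = conf.get("spark.sql.globalTempDatabase", "global_temp")
    # The name is reserved, so a clash with an existing database is fatal.
    if db_name in existing_databases:
        raise SparkExceptionModel(
            f"database {db_name!r} already exists; the name is reserved for global temporary views"
        )
    return GlobalTempViewManagerModel(db_name)


manager = create_global_temp_view_manager({}, {"default"})
print(manager.database)  # global_temp
try:
    create_global_temp_view_manager({}, {"default", "global_temp"})
except SparkExceptionModel as e:
    print("rejected:", e)
```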

Note
globalTempViewManager is used when BaseSessionStateBuilder and HiveSessionStateBuilder are requested for the SessionCatalog.