HiveClientImpl — The One and Only HiveClient

HiveClientImpl is the only available HiveClient in Spark SQL that does/uses…​FIXME

HiveClientImpl is created exclusively when IsolatedClientLoader is requested to create a new Hive client. When created, HiveClientImpl is given the location of the default database for the Hive metastore warehouse (i.e. warehouseDir, which is the value of the hive.metastore.warehouse.dir Hive-specific Hadoop configuration property).

Note
The location of the default database for the Hive metastore warehouse is /user/hive/warehouse by default.
Note
If you use Spark before 2.1 (which you really should not, as it is no longer supported), you may be interested in SPARK-19664 (put ‘hive.metastore.warehouse.dir’ in hadoopConf place).
Note
The Hadoop configuration is the one HiveExternalCatalog was given when created, i.e. the default Hadoop configuration from Spark Core’s SparkContext.hadoopConfiguration, which includes the Spark properties with the spark.hadoop prefix.
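
The notes above can be illustrated with a small model. This is a minimal sketch in Python, not Spark's Scala code: it assumes that Spark properties with the spark.hadoop prefix end up in the Hadoop configuration with the prefix stripped, and that warehouseDir is simply the value of hive.metastore.warehouse.dir when that property is defined. The function names hadoop_conf_from_spark, warehouse_dir and effective_warehouse_location are hypothetical and exist only for illustration.

```python
from typing import Dict, Optional

SPARK_HADOOP_PREFIX = "spark.hadoop."
WAREHOUSE_KEY = "hive.metastore.warehouse.dir"
# Default location mentioned in the note above.
HIVE_DEFAULT_WAREHOUSE = "/user/hive/warehouse"


def hadoop_conf_from_spark(spark_props: Dict[str, str],
                           base_conf: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Model of a Hadoop configuration built from Spark properties.

    Assumption: every 'spark.hadoop.<key>' property is copied as '<key>';
    all other Spark properties are ignored.
    """
    conf = dict(base_conf or {})
    for key, value in spark_props.items():
        if key.startswith(SPARK_HADOOP_PREFIX):
            conf[key[len(SPARK_HADOOP_PREFIX):]] = value
    return conf


def warehouse_dir(hadoop_conf: Dict[str, str]) -> Optional[str]:
    """Return warehouseDir if hive.metastore.warehouse.dir is defined, else None."""
    return hadoop_conf.get(WAREHOUSE_KEY)


def effective_warehouse_location(hadoop_conf: Dict[str, str]) -> str:
    """Fall back to the default location when the property is not set."""
    return warehouse_dir(hadoop_conf) or HIVE_DEFAULT_WAREHOUSE
```

For example, passing {"spark.hadoop.hive.metastore.warehouse.dir": "/tmp/warehouse"} to hadoop_conf_from_spark yields a configuration whose warehouse_dir is /tmp/warehouse, while an empty set of properties leaves warehouse_dir undefined and the default location applies.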
Tip

Enable DEBUG logging level for org.apache.spark.sql.hive.client.HiveClientImpl logger to see what happens inside.

Add the following line to conf/log4j.properties:
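
Assuming the log4j 1.x properties syntax used by conf/log4j.properties, a line that sets the logger named above to the DEBUG level would look as follows:

```
log4j.logger.org.apache.spark.sql.hive.client.HiveClientImpl=DEBUG
```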

Refer to Logging.

renamePartitions Method

Note
renamePartitions is part of HiveClient Contract to…​FIXME.

renamePartitions…​FIXME

alterPartitions Method

Note
alterPartitions is part of HiveClient Contract to…​FIXME.

alterPartitions…​FIXME

client Internal Method

client…​FIXME

Note
client is used…​FIXME

getPartitions Method

Note
getPartitions is part of HiveClient Contract to…​FIXME.

getPartitions…​FIXME

getPartitionsByFilter Method

Note
getPartitionsByFilter is part of HiveClient Contract to…​FIXME.

getPartitionsByFilter…​FIXME

getPartitionOption Method

Note
getPartitionOption is part of HiveClient Contract to…​FIXME.

getPartitionOption…​FIXME

Creating HiveClientImpl Instance

HiveClientImpl takes the following when created:

  • HiveVersion

  • Location of the default database for the Hive metastore warehouse if defined (aka warehouseDir)

  • SparkConf

  • Hadoop configuration

  • Extra configuration

  • Initial ClassLoader

  • IsolatedClientLoader

HiveClientImpl initializes the internal registries and counters.

Retrieving Table Metadata If Available — getTableOption Method

Note
getTableOption is part of HiveClient Contract to…​FIXME.

When executed, getTableOption prints out a DEBUG message to the logs.

getTableOption then requests the Hive client to retrieve the metadata of the table and, if the table is available, creates a CatalogTable from it.

Creating Table Statistics from Hive’s Table or Partition Parameters — readHiveStats Internal Method

readHiveStats creates a CatalogStatistics from the input Hive table or partition parameters (if available and greater than 0).

Table 1. Table Statistics and Hive Parameters

Hive Parameter    Table Statistics
totalSize         sizeInBytes
rawDataSize       sizeInBytes
numRows           rowCount

Note
The totalSize Hive parameter takes precedence over rawDataSize for the sizeInBytes table statistic.
Note
readHiveStats is used when HiveClientImpl is requested for the metadata of a table or table partition.
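
The mapping and precedence rules described for readHiveStats can be sketched as follows. This is a simplified Python model under stated assumptions, not the Scala implementation: CatalogStatistics here is a hypothetical stand-in with only two fields, the helper _positive_int is invented for illustration, and the sketch assumes that no statistics are created at all when neither totalSize nor rawDataSize is greater than 0.

```python
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CatalogStatistics:
    """Simplified stand-in for Spark SQL's CatalogStatistics (illustration only)."""
    size_in_bytes: int
    row_count: Optional[int] = None


def _positive_int(params: Mapping[str, str], key: str) -> Optional[int]:
    """Return the parameter as an int if it is present and greater than 0."""
    raw = params.get(key)
    if raw is None:
        return None
    value = int(raw)
    return value if value > 0 else None


def read_hive_stats(params: Mapping[str, str]) -> Optional[CatalogStatistics]:
    """Build statistics from Hive table or partition parameters.

    totalSize takes precedence over rawDataSize for size_in_bytes;
    numRows maps to row_count only when it is greater than 0.
    """
    total_size = _positive_int(params, "totalSize")
    raw_data_size = _positive_int(params, "rawDataSize")
    row_count = _positive_int(params, "numRows")

    size = total_size if total_size is not None else raw_data_size
    if size is None:
        # Assumption: without a positive size there are no statistics.
        return None
    return CatalogStatistics(size_in_bytes=size, row_count=row_count)
```

With parameters such as totalSize=100, rawDataSize=40 and numRows=5, the sketch picks 100 for the size and 5 for the row count; if totalSize is 0, it falls back to rawDataSize.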

Retrieving Table Partition Metadata (Converting Table Partition Metadata from Hive Format to Spark SQL Format) — fromHivePartition Method

fromHivePartition simply creates a CatalogTablePartition from the input Hive partition.

Note
fromHivePartition is used when HiveClientImpl is requested for getPartitionOption, getPartitions and getPartitionsByFilter.

Converting Native Table Metadata to Hive’s Table — toHiveTable Method

toHiveTable simply creates a new Hive Table and copies the properties from the input CatalogTable.

Note

toHiveTable is used when:
