DescribeTableCommand Logical Command
DescribeTableCommand is a logical command that executes a DESCRIBE TABLE SQL statement.
DescribeTableCommand is created exclusively when SparkSqlAstBuilder is requested to parse a DESCRIBE TABLE SQL statement (with no column specified).
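A quick way to confirm that the parsed logical plan of such a statement is indeed a DescribeTableCommand (a small sketch against Spark 2.3.x, which this page targets; the demo temp view is arbitrary):

```scala
import org.apache.spark.sql.execution.command.DescribeTableCommand

spark.range(1).createOrReplaceTempView("demo")

// The parsed logical plan of a DESCRIBE TABLE query
val plan = sql("DESCRIBE TABLE demo").queryExecution.logical
assert(plan.isInstanceOf[DescribeTableCommand])
```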
DescribeTableCommand uses the following output schema (checked programmatically in the sketch after this list):

- col_name as the name of the column
- data_type as the data type of the column
- comment as the comment of the column
```scala
spark.range(1).createOrReplaceTempView("demo")

// DESC view
scala> sql("DESC EXTENDED demo").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|   bigint|   null|
+--------+---------+-------+

// DESC table
// Make the demo reproducible
spark.sharedState.externalCatalog.dropTable(
  db = "default",
  table = "bucketed",
  ignoreIfNotExists = true,
  purge = true)
spark.range(10).write.bucketBy(5, "id").saveAsTable("bucketed")
assert(spark.catalog.tableExists("bucketed"))

// EXTENDED to include Detailed Table Information
// Note no partitions used
// Could also be FORMATTED
scala> sql("DESC EXTENDED bucketed").show(numRows = 50, truncate = false)
+----------------------------+-----------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+-----------------------------------------------------------------------------+-------+
|id |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |bucketed | |
|Owner |jacek | |
|Created Time |Sun Sep 30 20:57:22 CEST 2018 | |
|Last Access |Thu Jan 01 01:00:00 CET 1970 | |
|Created By |Spark 2.3.1 | |
|Type |MANAGED | |
|Provider |parquet | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[] | |
|Table Properties |[transient_lastDdlTime=1538333842] | |
|Statistics |3740 bytes | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/bucketed| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+-----------------------------------------------------------------------------+-------+

// Make the demo reproducible
val tableName = "partitioned_bucketed_sorted"
val partCol = "part"
spark.sharedState.externalCatalog.dropTable(
  db = "default",
  table = tableName,
  ignoreIfNotExists = true,
  purge = true)
spark.range(10)
  .withColumn("part", $"id" % 2) // extra column for partitions
  .write
  .partitionBy(partCol)
  .bucketBy(5, "id")
  .sortBy("id")
  .saveAsTable(tableName)
assert(spark.catalog.tableExists(tableName))

scala> sql(s"DESC EXTENDED $tableName").show(numRows = 50, truncate = false)
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|id |bigint |null |
|part |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|part |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |partitioned_bucketed_sorted | |
|Owner |jacek | |
|Created Time |Mon Oct 01 10:05:32 CEST 2018 | |
|Last Access |Thu Jan 01 01:00:00 CET 1970 | |
|Created By |Spark 2.3.1 | |
|Type |MANAGED | |
|Provider |parquet | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[`id`] | |
|Table Properties |[transient_lastDdlTime=1538381132] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
+----------------------------+------------------------------------------------------------------------------------------------+-------+

scala> sql(s"DESCRIBE EXTENDED $tableName PARTITION ($partCol=1)").show(numRows = 50, truncate = false)
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|id |bigint |null |
|part |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|part |bigint |null |
| | | |
|# Detailed Partition Information| | |
|Database |default | |
|Table |partitioned_bucketed_sorted | |
|Partition Values |[part=1] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted/part=1 | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[path=file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted, serialization.format=1]| |
|Partition Parameters |{totalSize=1870, numFiles=5, transient_lastDdlTime=1538381329} | |
|Partition Statistics |1870 bytes | |
| | | |
|# Storage Information | | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[`id`] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
```
Executing Logical Command — run Method
```scala
run(sparkSession: SparkSession): Seq[Row]
```
Note: run is part of the RunnableCommand Contract to execute (run) a logical command.
run uses the SessionCatalog (of the SessionState of the input SparkSession) and branches off based on the type of the table to describe.

For a temporary view, run requests the SessionCatalog to lookupRelation (to access the view's schema) and then describeSchema.
For all other table types, run does the following (sketched in the code block after this list):

- Requests the SessionCatalog to retrieve the table metadata from the external catalog (metastore) (as a CatalogTable) and describeSchema (with the schema)
- describeDetailedPartitionInfo if the TablePartitionSpec is available or describeFormattedTableInfo when the isExtended flag is on
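Put together, a simplified, hedged sketch of run's control flow (written as if inside DescribeTableCommand, so the table, partitionSpec and isExtended properties and the helper methods on this page are in scope; the describePartitionInfo step is an assumption based on the # Partition Information section of the demo output):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{Row, SparkSession}

// Simplified sketch, not the verbatim Spark source
def run(sparkSession: SparkSession): Seq[Row] = {
  val catalog = sparkSession.sessionState.catalog
  val result = new ArrayBuffer[Row]
  if (catalog.isTemporaryTable(table)) {
    // Temporary view: describe the schema of the looked-up relation
    describeSchema(catalog.lookupRelation(table).schema, result, header = false)
  } else {
    // Table in the external catalog (metastore)
    val metadata = catalog.getTableMetadata(table)
    describeSchema(metadata.schema, result, header = false)
    describePartitionInfo(metadata, result)  // assumed helper for "# Partition Information"
    if (partitionSpec.nonEmpty) {
      describeDetailedPartitionInfo(sparkSession, catalog, metadata, result)
    } else if (isExtended) {
      describeFormattedTableInfo(metadata, result)
    }
  }
  result
}
```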
Describing Detailed Partition and Storage Information — describeFormattedDetailedPartitionInfo Internal Method
```scala
describeFormattedDetailedPartitionInfo(
  tableIdentifier: TableIdentifier,
  table: CatalogTable,
  partition: CatalogTablePartition,
  buffer: ArrayBuffer[Row]): Unit
```
describeFormattedDetailedPartitionInfo simply adds the following entries (rows) to the input mutable buffer (see the sketch after the list):

- A new line
- # Detailed Partition Information
- Database with the database of the given table
- Table with the table of the given tableIdentifier
- A new line
- # Storage Information
- Bucketing specification of the table (if defined)
- Storage specification of the table
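A minimal sketch of those appends, assuming the command's three-column output of col_name, data_type and comment; the partition-specific rows visible in the demo output (Partition Values, partition location, parameters) are elided:

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition}

// Sketch only: mirrors the list of entries above
def describeFormattedDetailedPartitionInfo(
    tableIdentifier: TableIdentifier,
    table: CatalogTable,
    partition: CatalogTablePartition,
    buffer: ArrayBuffer[Row]): Unit = {
  buffer += Row("", "", "")                                 // a new line
  buffer += Row("# Detailed Partition Information", "", "")
  buffer += Row("Database", table.database, "")
  buffer += Row("Table", tableIdentifier.table, "")
  // ...partition-specific entries elided...
  buffer += Row("", "", "")                                 // a new line
  buffer += Row("# Storage Information", "", "")
  table.bucketSpec.foreach { spec =>                        // bucketing specification, if defined
    buffer += Row("Num Buckets", spec.numBuckets.toString, "")
    buffer += Row("Bucket Columns", spec.bucketColumnNames.mkString("[", ", ", "]"), "")
    buffer += Row("Sort Columns", spec.sortColumnNames.mkString("[", ", ", "]"), "")
  }
  // storage specification of the table (location shown as one representative entry)
  buffer += Row("Location", table.storage.locationUri.map(_.toString).getOrElse(""), "")
}
```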
Note: describeFormattedDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to describeDetailedPartitionInfo with a non-empty partitionSpec and the isExtended flag on.
Describing Detailed Table Information — describeFormattedTableInfo Internal Method
```scala
describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit
```
describeFormattedTableInfo…FIXME
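While the details are left as FIXME, a minimal sketch of what the method appends, inferred from the # Detailed Table Information section in the demo output above (the toLinkedHashMap call is an assumption):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Sketch only: an empty separator row, the section header and
// one row per table property (Database, Table, Owner, Created Time, ...)
def describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
  buffer += Row("", "", "")
  buffer += Row("# Detailed Table Information", "", "")
  table.toLinkedHashMap.foreach { case (key, value) =>
    buffer += Row(key, value, "")
  }
}
```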
Note: describeFormattedTableInfo is used exclusively when DescribeTableCommand is requested to run for a non-temporary table and the isExtended flag on.
describeDetailedPartitionInfo Internal Method
```scala
describeDetailedPartitionInfo(
  spark: SparkSession,
  catalog: SessionCatalog,
  metadata: CatalogTable,
  result: ArrayBuffer[Row]): Unit
```
describeDetailedPartitionInfo…FIXME
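Although the details are left as FIXME, a rough, hedged sketch of the flow (written as if inside DescribeTableCommand, so table, partitionSpec, isExtended and describeFormattedDetailedPartitionInfo are in scope; the rejection of views is an assumption):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType, SessionCatalog}

// Sketch only: resolve the requested partition and, for EXTENDED/FORMATTED,
// delegate to describeFormattedDetailedPartitionInfo
def describeDetailedPartitionInfo(
    spark: SparkSession,
    catalog: SessionCatalog,
    metadata: CatalogTable,
    result: ArrayBuffer[Row]): Unit = {
  // DESC PARTITION does not make sense for a view (assumption)
  require(metadata.tableType != CatalogTableType.VIEW, "DESC PARTITION is not allowed on a view")
  val partition = catalog.getPartition(table, partitionSpec)
  if (isExtended) {
    describeFormattedDetailedPartitionInfo(table, metadata, partition, result)
  }
}
```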
Note: describeDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to run with a non-empty partitionSpec.
Creating DescribeTableCommand Instance
DescribeTableCommand takes the following when created:

- TableIdentifier of the table to describe
- TablePartitionSpec (partitionSpec)
- isExtended flag
DescribeTableCommand initializes the internal registries and counters.
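As a usage sketch, the command can be constructed and run directly (normally SparkSqlAstBuilder creates it while parsing; the parameter order is inferred from the properties above, and the bucketed table comes from the demo session):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.DescribeTableCommand

// table, partitionSpec (TablePartitionSpec is Map[String, String]), isExtended
val cmd = DescribeTableCommand(TableIdentifier("bucketed", Some("default")), Map.empty, true)

// run returns the (col_name, data_type, comment) rows described above
val rows = cmd.run(spark)
rows.take(3).foreach(println)
```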