DescribeTableCommand Logical Command
DescribeTableCommand is a logical command that executes a DESCRIBE TABLE SQL statement.
DescribeTableCommand is created exclusively when SparkSqlAstBuilder is requested to parse a DESCRIBE TABLE SQL statement (with no column specified).
DescribeTableCommand uses the following output schema:

- col_name as the name of the column
- data_type as the data type of the column
- comment as the comment of the column
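As a sketch, the schema maps to three string attributes of which only comment is nullable (the metadata comments mirror the descriptions above; an approximation of the actual declaration, not verbatim Spark source):

```scala
import org.apache.spark.sql.catalyst.expressions.AttributeReference
import org.apache.spark.sql.types.{MetadataBuilder, StringType}

// Sketch: the three-column output schema of DESCRIBE TABLE
val output = Seq(
  AttributeReference("col_name", StringType, nullable = false,
    new MetadataBuilder().putString("comment", "name of the column").build())(),
  AttributeReference("data_type", StringType, nullable = false,
    new MetadataBuilder().putString("comment", "data type of the column").build())(),
  AttributeReference("comment", StringType, nullable = true,
    new MetadataBuilder().putString("comment", "comment of the column").build())())
```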
```text
spark.range(1).createOrReplaceTempView("demo")

// DESC view
scala> sql("DESC EXTENDED demo").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|   bigint|   null|
+--------+---------+-------+

// DESC table
// Make the demo reproducible
spark.sharedState.externalCatalog.dropTable(
  db = "default",
  table = "bucketed",
  ignoreIfNotExists = true,
  purge = true)
spark.range(10).write.bucketBy(5, "id").saveAsTable("bucketed")
assert(spark.catalog.tableExists("bucketed"))

// EXTENDED to include Detailed Table Information
// Note no partitions used
// Could also be FORMATTED
scala> sql("DESC EXTENDED bucketed").show(numRows = 50, truncate = false)
+----------------------------+-----------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                    |comment|
+----------------------------+-----------------------------------------------------------------------------+-------+
|id                          |bigint                                                                       |null   |
|                            |                                                                             |       |
|# Detailed Table Information|                                                                             |       |
|Database                    |default                                                                      |       |
|Table                       |bucketed                                                                     |       |
|Owner                       |jacek                                                                        |       |
|Created Time                |Sun Sep 30 20:57:22 CEST 2018                                                |       |
|Last Access                 |Thu Jan 01 01:00:00 CET 1970                                                 |       |
|Created By                  |Spark 2.3.1                                                                  |       |
|Type                        |MANAGED                                                                      |       |
|Provider                    |parquet                                                                      |       |
|Num Buckets                 |5                                                                            |       |
|Bucket Columns              |[`id`]                                                                       |       |
|Sort Columns                |[]                                                                           |       |
|Table Properties            |[transient_lastDdlTime=1538333842]                                           |       |
|Statistics                  |3740 bytes                                                                   |       |
|Location                    |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/bucketed|       |
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                           |       |
|InputFormat                 |org.apache.hadoop.mapred.SequenceFileInputFormat                             |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                    |       |
|Storage Properties          |[serialization.format=1]                                                     |       |
+----------------------------+-----------------------------------------------------------------------------+-------+

// Make the demo reproducible
val tableName = "partitioned_bucketed_sorted"
val partCol = "part"
spark.sharedState.externalCatalog.dropTable(
  db = "default",
  table = tableName,
  ignoreIfNotExists = true,
  purge = true)
spark.range(10)
  .withColumn("part", $"id" % 2) // extra column for partitions
  .write
  .partitionBy(partCol)
  .bucketBy(5, "id")
  .sortBy("id")
  .saveAsTable(tableName)
assert(spark.catalog.tableExists(tableName))

scala> sql(s"DESC EXTENDED $tableName").show(numRows = 50, truncate = false)
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|col_name                    |data_type                                                                                       |comment|
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|id                          |bigint                                                                                          |null   |
|part                        |bigint                                                                                          |null   |
|# Partition Information     |                                                                                                |       |
|# col_name                  |data_type                                                                                       |comment|
|part                        |bigint                                                                                          |null   |
|                            |                                                                                                |       |
|# Detailed Table Information|                                                                                                |       |
|Database                    |default                                                                                         |       |
|Table                       |partitioned_bucketed_sorted                                                                     |       |
|Owner                       |jacek                                                                                           |       |
|Created Time                |Mon Oct 01 10:05:32 CEST 2018                                                                   |       |
|Last Access                 |Thu Jan 01 01:00:00 CET 1970                                                                    |       |
|Created By                  |Spark 2.3.1                                                                                     |       |
|Type                        |MANAGED                                                                                         |       |
|Provider                    |parquet                                                                                         |       |
|Num Buckets                 |5                                                                                               |       |
|Bucket Columns              |[`id`]                                                                                          |       |
|Sort Columns                |[`id`]                                                                                          |       |
|Table Properties            |[transient_lastDdlTime=1538381132]                                                              |       |
|Location                    |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted|       |
|Serde Library               |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                              |       |
|InputFormat                 |org.apache.hadoop.mapred.SequenceFileInputFormat                                                |       |
|OutputFormat                |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                                      |       |
|Storage Properties          |[serialization.format=1]                                                                        |       |
|Partition Provider          |Catalog                                                                                         |       |
+----------------------------+------------------------------------------------------------------------------------------------+-------+

scala> sql(s"DESCRIBE EXTENDED $tableName PARTITION ($partCol=1)").show(numRows = 50, truncate = false)
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name                        |data_type                                                                                                                      |comment|
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|id                              |bigint                                                                                                                         |null   |
|part                            |bigint                                                                                                                         |null   |
|# Partition Information         |                                                                                                                               |       |
|# col_name                      |data_type                                                                                                                      |comment|
|part                            |bigint                                                                                                                         |null   |
|                                |                                                                                                                               |       |
|# Detailed Partition Information|                                                                                                                               |       |
|Database                        |default                                                                                                                        |       |
|Table                           |partitioned_bucketed_sorted                                                                                                    |       |
|Partition Values                |[part=1]                                                                                                                       |       |
|Location                        |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted/part=1                        |       |
|Serde Library                   |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                                                             |       |
|InputFormat                     |org.apache.hadoop.mapred.SequenceFileInputFormat                                                                               |       |
|OutputFormat                    |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                                                                      |       |
|Storage Properties              |[path=file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted, serialization.format=1]|       |
|Partition Parameters            |{totalSize=1870, numFiles=5, transient_lastDdlTime=1538381329}                                                                 |       |
|Partition Statistics            |1870 bytes                                                                                                                     |       |
|                                |                                                                                                                               |       |
|# Storage Information           |                                                                                                                               |       |
|Num Buckets                     |5                                                                                                                              |       |
|Bucket Columns                  |[`id`]                                                                                                                         |       |
|Sort Columns                    |[`id`]                                                                                                                         |       |
|Location                        |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted                               |       |
|Serde Library                   |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe                                                                             |       |
|InputFormat                     |org.apache.hadoop.mapred.SequenceFileInputFormat                                                                               |       |
|OutputFormat                    |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat                                                                      |       |
|Storage Properties              |[serialization.format=1]                                                                                                       |       |
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
```
Executing Logical Command — run Method
```scala
run(sparkSession: SparkSession): Seq[Row]
```
Note: run is part of the RunnableCommand Contract to execute (run) a logical command.
run uses the SessionCatalog (of the SessionState of the input SparkSession) and branches off per the type of the table to display.

For a temporary view, run requests the SessionCatalog to lookupRelation to access the schema and describeSchema.
For all other table types, run does the following:

- Requests the SessionCatalog to retrieve the table metadata from the external catalog (metastore) (as a CatalogTable) and describeSchema (with the schema)
- describeDetailedPartitionInfo if the TablePartitionSpec is available, or describeFormattedTableInfo when the isExtended flag is on
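A simplified sketch of that control flow follows. It is a member-level approximation (it reads the command's table, partitionSpec and isExtended properties); helper names such as describePartitionInfo and the exact checks are assumptions, not the verbatim Spark source:

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{Row, SparkSession}

// Sketch: how run branches per the type of the table to display
override def run(sparkSession: SparkSession): Seq[Row] = {
  val result = new ArrayBuffer[Row]
  val catalog = sparkSession.sessionState.catalog
  if (catalog.isTemporaryTable(table)) {
    // Temporary view: resolve the relation and describe its schema only
    describeSchema(catalog.lookupRelation(table).schema, result, header = false)
  } else {
    // Other tables: fetch the CatalogTable from the external catalog (metastore)
    val metadata = catalog.getTableMetadata(table)
    describeSchema(metadata.schema, result, header = false)
    describePartitionInfo(metadata, result)
    if (partitionSpec.nonEmpty) {
      // DESCRIBE ... PARTITION (...)
      describeDetailedPartitionInfo(sparkSession, catalog, metadata, result)
    } else if (isExtended) {
      // DESCRIBE EXTENDED|FORMATTED ...
      describeFormattedTableInfo(metadata, result)
    }
  }
  result
}
```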
Describing Detailed Partition and Storage Information — describeFormattedDetailedPartitionInfo Internal Method
```scala
describeFormattedDetailedPartitionInfo(
  tableIdentifier: TableIdentifier,
  table: CatalogTable,
  partition: CatalogTablePartition,
  buffer: ArrayBuffer[Row]): Unit
```
describeFormattedDetailedPartitionInfo simply adds the following entries (rows) to the input mutable buffer:

- A new line
- # Detailed Partition Information
- Database with the database of the given table
- Table with the table of the given tableIdentifier
- A new line
- # Storage Information
- Bucketing specification of the table (if defined)
- Storage specification of the table
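Put together, a member-level sketch of the method body (assuming an append(buffer, name, dataType, comment) helper and the toLinkedHashMap renderings of the partition, bucketing and storage metadata; an approximation, not the verbatim Spark source):

```scala
private def describeFormattedDetailedPartitionInfo(
    tableIdentifier: TableIdentifier,
    table: CatalogTable,
    partition: CatalogTablePartition,
    buffer: ArrayBuffer[Row]): Unit = {
  append(buffer, "", "", "")  // a new line
  append(buffer, "# Detailed Partition Information", "", "")
  append(buffer, "Database", table.database, "")
  append(buffer, "Table", tableIdentifier.table, "")
  // Partition details (Partition Values, Location, ...) as in the demo output above
  partition.toLinkedHashMap.foreach { case (key, value) => append(buffer, key, value, "") }
  append(buffer, "", "", "")  // a new line
  append(buffer, "# Storage Information", "", "")
  // Bucketing specification of the table (if defined)
  table.bucketSpec.foreach(_.toLinkedHashMap.foreach { case (k, v) => append(buffer, k, v, "") })
  // Storage specification of the table
  table.storage.toLinkedHashMap.foreach { case (k, v) => append(buffer, k, v, "") }
}
```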
Note: describeFormattedDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to describeDetailedPartitionInfo with a non-empty partitionSpec and the isExtended flag on.
Describing Detailed Table Information — describeFormattedTableInfo Internal Method
```scala
describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit
```
describeFormattedTableInfo …FIXME
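Judging by the # Detailed Table Information section in the DESC EXTENDED output above, a minimal member-level sketch could look as follows (same assumed append helper and CatalogTable.toLinkedHashMap; an approximation, not the verbatim Spark source):

```scala
private def describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
  append(buffer, "", "", "")  // a new line
  append(buffer, "# Detailed Table Information", "", "")
  // Render the CatalogTable properties (Database, Table, Owner, Created Time, ...)
  // as (col_name, data_type, comment) rows with empty data_type and comment
  table.toLinkedHashMap.foreach { case (key, value) => append(buffer, key, value, "") }
}
```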
Note: describeFormattedTableInfo is used exclusively when DescribeTableCommand is requested to run for a non-temporary table with the isExtended flag on.
describeDetailedPartitionInfo Internal Method
```scala
describeDetailedPartitionInfo(
  spark: SparkSession,
  catalog: SessionCatalog,
  metadata: CatalogTable,
  result: ArrayBuffer[Row]): Unit
```
describeDetailedPartitionInfo …FIXME
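A member-level sketch consistent with the Note below (assuming SessionCatalog.getPartition resolves the partition for the command's partitionSpec; an approximation, not the verbatim Spark source):

```scala
private def describeDetailedPartitionInfo(
    spark: SparkSession,
    catalog: SessionCatalog,
    metadata: CatalogTable,
    result: ArrayBuffer[Row]): Unit = {
  // Resolve the partition for the requested TablePartitionSpec
  val partition = catalog.getPartition(table, partitionSpec)
  // With EXTENDED (or FORMATTED), add the detailed partition and storage sections
  if (isExtended) {
    describeFormattedDetailedPartitionInfo(table, metadata, partition, result)
  }
}
```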
Note: describeDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to run with a non-empty partitionSpec.
Creating DescribeTableCommand Instance
DescribeTableCommand takes the following when created:

- TableIdentifier
- TablePartitionSpec
- isExtended flag

DescribeTableCommand initializes the internal registries and counters.
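As a sketch, the shape of the declaration could be the following (an approximation based on the properties above; the real class also extends RunnableCommand and defines the output schema and run described on this page):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTypes.TablePartitionSpec

// Sketch: the three properties DescribeTableCommand takes when created
case class DescribeTableCommand(
    table: TableIdentifier,
    partitionSpec: TablePartitionSpec, // e.g. Map("part" -> "1") for DESC ... PARTITION (part=1)
    isExtended: Boolean)
```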