DescribeTableCommand Logical Command
DescribeTableCommand is a logical command that executes a DESCRIBE TABLE SQL statement.
DescribeTableCommand is created exclusively when SparkSqlAstBuilder is requested to parse a DESCRIBE TABLE SQL statement (with no column specified).
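A quick way to confirm that the parsed logical plan of such a statement is indeed a DescribeTableCommand (a small sketch against Spark 2.3.x, which this page targets; the demo temp view is arbitrary):

```scala
import org.apache.spark.sql.execution.command.DescribeTableCommand

spark.range(1).createOrReplaceTempView("demo")

// The parsed logical plan of a DESCRIBE TABLE query
val plan = sql("DESCRIBE TABLE demo").queryExecution.logical
assert(plan.isInstanceOf[DescribeTableCommand])
```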
DescribeTableCommand uses the following output schema (checked programmatically in the sketch after this list):

- col_name as the name of the column
- data_type as the data type of the column
- comment as the comment of the column
```scala
spark.range(1).createOrReplaceTempView("demo")

// DESC view
scala> sql("DESC EXTENDED demo").show
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|      id|   bigint|   null|
+--------+---------+-------+

// DESC table
// Make the demo reproducible
spark.sharedState.externalCatalog.dropTable(
  db = "default",
  table = "bucketed",
  ignoreIfNotExists = true,
  purge = true)
spark.range(10).write.bucketBy(5, "id").saveAsTable("bucketed")
assert(spark.catalog.tableExists("bucketed"))

// EXTENDED to include Detailed Table Information
// Note no partitions used
// Could also be FORMATTED
scala> sql("DESC EXTENDED bucketed").show(numRows = 50, truncate = false)
+----------------------------+-----------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+-----------------------------------------------------------------------------+-------+
|id |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |bucketed | |
|Owner |jacek | |
|Created Time |Sun Sep 30 20:57:22 CEST 2018 | |
|Last Access |Thu Jan 01 01:00:00 CET 1970 | |
|Created By |Spark 2.3.1 | |
|Type |MANAGED | |
|Provider |parquet | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[] | |
|Table Properties |[transient_lastDdlTime=1538333842] | |
|Statistics |3740 bytes | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/bucketed| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+----------------------------+-----------------------------------------------------------------------------+-------+

// Make the demo reproducible
val tableName = "partitioned_bucketed_sorted"
val partCol = "part"
spark.sharedState.externalCatalog.dropTable(
  db = "default",
  table = tableName,
  ignoreIfNotExists = true,
  purge = true)
spark.range(10)
  .withColumn("part", $"id" % 2) // extra column for partitions
  .write
  .partitionBy(partCol)
  .bucketBy(5, "id")
  .sortBy("id")
  .saveAsTable(tableName)
assert(spark.catalog.tableExists(tableName))

scala> sql(s"DESC EXTENDED $tableName").show(numRows = 50, truncate = false)
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+----------------------------+------------------------------------------------------------------------------------------------+-------+
|id |bigint |null |
|part |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|part |bigint |null |
| | | |
|# Detailed Table Information| | |
|Database |default | |
|Table |partitioned_bucketed_sorted | |
|Owner |jacek | |
|Created Time |Mon Oct 01 10:05:32 CEST 2018 | |
|Last Access |Thu Jan 01 01:00:00 CET 1970 | |
|Created By |Spark 2.3.1 | |
|Type |MANAGED | |
|Provider |parquet | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[`id`] | |
|Table Properties |[transient_lastDdlTime=1538381132] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted| |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
|Partition Provider |Catalog | |
+----------------------------+------------------------------------------------------------------------------------------------+-------+

scala> sql(s"DESCRIBE EXTENDED $tableName PARTITION ($partCol=1)").show(numRows = 50, truncate = false)
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|col_name |data_type |comment|
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
|id |bigint |null |
|part |bigint |null |
|# Partition Information | | |
|# col_name |data_type |comment|
|part |bigint |null |
| | | |
|# Detailed Partition Information| | |
|Database |default | |
|Table |partitioned_bucketed_sorted | |
|Partition Values |[part=1] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted/part=1 | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[path=file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted, serialization.format=1]| |
|Partition Parameters |{totalSize=1870, numFiles=5, transient_lastDdlTime=1538381329} | |
|Partition Statistics |1870 bytes | |
| | | |
|# Storage Information | | |
|Num Buckets |5 | |
|Bucket Columns |[`id`] | |
|Sort Columns |[`id`] | |
|Location |file:/Users/jacek/dev/apps/spark-2.3.1-bin-hadoop2.7/spark-warehouse/partitioned_bucketed_sorted | |
|Serde Library |org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe | |
|InputFormat |org.apache.hadoop.mapred.SequenceFileInputFormat | |
|OutputFormat |org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat | |
|Storage Properties |[serialization.format=1] | |
+--------------------------------+-------------------------------------------------------------------------------------------------------------------------------+-------+
```
Executing Logical Command — run Method
```scala
run(sparkSession: SparkSession): Seq[Row]
```
Note: run is part of the RunnableCommand Contract to execute (run) a logical command.
run uses the SessionCatalog (of the SessionState of the input SparkSession) and branches off based on the type of the table to describe.

For a temporary view, run requests the SessionCatalog to lookupRelation (to access the view's schema) and then describeSchema.
For all other table types, run does the following (sketched in the code block after this list):

- Requests the SessionCatalog to retrieve the table metadata from the external catalog (metastore) (as a CatalogTable) and describeSchema (with the schema)
- describeDetailedPartitionInfo if the TablePartitionSpec is available or describeFormattedTableInfo when the isExtended flag is on
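Put together, a simplified, hedged sketch of run's control flow (written as if inside DescribeTableCommand, so the table, partitionSpec and isExtended properties and the helper methods on this page are in scope; the describePartitionInfo step is an assumption based on the # Partition Information section of the demo output):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{Row, SparkSession}

// Simplified sketch, not the verbatim Spark source
def run(sparkSession: SparkSession): Seq[Row] = {
  val catalog = sparkSession.sessionState.catalog
  val result = new ArrayBuffer[Row]
  if (catalog.isTemporaryTable(table)) {
    // Temporary view: describe the schema of the looked-up relation
    describeSchema(catalog.lookupRelation(table).schema, result, header = false)
  } else {
    // Table in the external catalog (metastore)
    val metadata = catalog.getTableMetadata(table)
    describeSchema(metadata.schema, result, header = false)
    describePartitionInfo(metadata, result)  // assumed helper for "# Partition Information"
    if (partitionSpec.nonEmpty) {
      describeDetailedPartitionInfo(sparkSession, catalog, metadata, result)
    } else if (isExtended) {
      describeFormattedTableInfo(metadata, result)
    }
  }
  result
}
```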
Describing Detailed Partition and Storage Information — describeFormattedDetailedPartitionInfo Internal Method
```scala
describeFormattedDetailedPartitionInfo(
  tableIdentifier: TableIdentifier,
  table: CatalogTable,
  partition: CatalogTablePartition,
  buffer: ArrayBuffer[Row]): Unit
```
describeFormattedDetailedPartitionInfo simply adds the following entries (rows) to the input mutable buffer (see the sketch after the list):

- A new line
- # Detailed Partition Information
- Database with the database of the given table
- Table with the table of the given tableIdentifier
- A new line
- # Storage Information
- Bucketing specification of the table (if defined)
- Storage specification of the table
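A minimal sketch of those appends, assuming the command's three-column output of col_name, data_type and comment; the partition-specific rows visible in the demo output (Partition Values, partition location, parameters) are elided:

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTablePartition}

// Sketch only: mirrors the list of entries above
def describeFormattedDetailedPartitionInfo(
    tableIdentifier: TableIdentifier,
    table: CatalogTable,
    partition: CatalogTablePartition,
    buffer: ArrayBuffer[Row]): Unit = {
  buffer += Row("", "", "")                                 // a new line
  buffer += Row("# Detailed Partition Information", "", "")
  buffer += Row("Database", table.database, "")
  buffer += Row("Table", tableIdentifier.table, "")
  // ...partition-specific entries elided...
  buffer += Row("", "", "")                                 // a new line
  buffer += Row("# Storage Information", "", "")
  table.bucketSpec.foreach { spec =>                        // bucketing specification, if defined
    buffer += Row("Num Buckets", spec.numBuckets.toString, "")
    buffer += Row("Bucket Columns", spec.bucketColumnNames.mkString("[", ", ", "]"), "")
    buffer += Row("Sort Columns", spec.sortColumnNames.mkString("[", ", ", "]"), "")
  }
  // storage specification of the table (location shown as one representative entry)
  buffer += Row("Location", table.storage.locationUri.map(_.toString).getOrElse(""), "")
}
```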
Note: describeFormattedDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to describeDetailedPartitionInfo with a non-empty partitionSpec and the isExtended flag on.
Describing Detailed Table Information — describeFormattedTableInfo Internal Method
```scala
describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit
```
describeFormattedTableInfo…FIXME
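While the details are left as FIXME, a minimal sketch of what the method appends, inferred from the # Detailed Table Information section in the demo output above (the toLinkedHashMap call is an assumption):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.catalog.CatalogTable

// Sketch only: an empty separator row, the section header and
// one row per table property (Database, Table, Owner, Created Time, ...)
def describeFormattedTableInfo(table: CatalogTable, buffer: ArrayBuffer[Row]): Unit = {
  buffer += Row("", "", "")
  buffer += Row("# Detailed Table Information", "", "")
  table.toLinkedHashMap.foreach { case (key, value) =>
    buffer += Row(key, value, "")
  }
}
```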
Note: describeFormattedTableInfo is used exclusively when DescribeTableCommand is requested to run for a non-temporary table and the isExtended flag on.
describeDetailedPartitionInfo Internal Method
```scala
describeDetailedPartitionInfo(
  spark: SparkSession,
  catalog: SessionCatalog,
  metadata: CatalogTable,
  result: ArrayBuffer[Row]): Unit
```
describeDetailedPartitionInfo…FIXME
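Although the details are left as FIXME, a rough, hedged sketch of the flow (written as if inside DescribeTableCommand, so table, partitionSpec, isExtended and describeFormattedDetailedPartitionInfo are in scope; the rejection of views is an assumption):

```scala
import scala.collection.mutable.ArrayBuffer
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.catalyst.catalog.{CatalogTable, CatalogTableType, SessionCatalog}

// Sketch only: resolve the requested partition and, for EXTENDED/FORMATTED,
// delegate to describeFormattedDetailedPartitionInfo
def describeDetailedPartitionInfo(
    spark: SparkSession,
    catalog: SessionCatalog,
    metadata: CatalogTable,
    result: ArrayBuffer[Row]): Unit = {
  // DESC PARTITION does not make sense for a view (assumption)
  require(metadata.tableType != CatalogTableType.VIEW, "DESC PARTITION is not allowed on a view")
  val partition = catalog.getPartition(table, partitionSpec)
  if (isExtended) {
    describeFormattedDetailedPartitionInfo(table, metadata, partition, result)
  }
}
```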
Note: describeDetailedPartitionInfo is used exclusively when DescribeTableCommand is requested to run with a non-empty partitionSpec.
Creating DescribeTableCommand Instance
DescribeTableCommand takes the following when created:

- TableIdentifier of the table to describe
- TablePartitionSpec (partitionSpec)
- isExtended flag
DescribeTableCommand initializes the internal registries and counters.
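As a usage sketch, the command can be constructed and run directly (normally SparkSqlAstBuilder creates it while parsing; the parameter order is inferred from the properties above, and the bucketed table comes from the demo session):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.DescribeTableCommand

// table, partitionSpec (TablePartitionSpec is Map[String, String]), isExtended
val cmd = DescribeTableCommand(TableIdentifier("bucketed", Some("default")), Map.empty, true)

// run returns the (col_name, data_type, comment) rows described above
val rows = cmd.run(spark)
rows.take(3).foreach(println)
```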