DescribeColumnCommand: Logical Command for DESCRIBE TABLE SQL Command with Column
DescribeColumnCommand is a logical command for the DESCRIBE TABLE SQL command with a single column only (i.e. no PARTITION specification).
```
[DESC|DESCRIBE] TABLE? [EXTENDED|FORMATTED] table_name column_name
```
```scala
// Make the example reproducible
val tableName = "t1"
import org.apache.spark.sql.catalyst.TableIdentifier
val tableId = TableIdentifier(tableName)

val sessionCatalog = spark.sessionState.catalog
sessionCatalog.dropTable(tableId, ignoreIfNotExists = true, purge = true)

val df = Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2")
df.write.saveAsTable("t1")

// DescribeColumnCommand represents DESC EXTENDED tableName colName SQL command
val descExtSQL = "DESC EXTENDED t1 p1"
val plan = spark.sql(descExtSQL).queryExecution.logical
import org.apache.spark.sql.execution.command.DescribeColumnCommand
val cmd = plan.asInstanceOf[DescribeColumnCommand]
scala> println(cmd)
DescribeColumnCommand `t1`, [p1], true

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|      NULL|
|           max|      NULL|
|     num_nulls|      NULL|
|distinct_count|      NULL|
|   avg_col_len|      NULL|
|   max_col_len|      NULL|
|     histogram|      NULL|
+--------------+----------+

// Run ANALYZE TABLE...FOR COLUMNS SQL command to compute the column statistics
val allCols = df.columns.mkString(",")
val analyzeTableSQL = s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS $allCols"
spark.sql(analyzeTableSQL)

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|       0.0|
|           max|       1.4|
|     num_nulls|         0|
|distinct_count|         2|
|   avg_col_len|         8|
|   max_col_len|         8|
|     histogram|      NULL|
+--------------+----------+
```
DescribeColumnCommand defines the output schema with the following columns:
- info_name with “name of the column info” comment
- info_value with “value of the column info” comment
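The comments of the two output columns can be read back from the schema of the result. The following is a minimal sketch, assuming a spark-shell session (with `spark` in scope) and the table t1 from the example above; it relies on `StructField.getComment`, which reads the "comment" entry of a field's metadata.

```scala
// Assumes a spark-shell session (`spark` in scope) and the table t1 created earlier
val desc = spark.sql("DESC t1 p1")

// Each output field keeps its description under the "comment" key of its metadata
desc.schema.fields.foreach { f =>
  println(s"${f.name}: ${f.getComment().getOrElse("(no comment)")}")
}
```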
Note: DescribeColumnCommand is described by the describeTable labeled alternative in the statement expression in SqlBase.g4 and parsed using SparkSqlParser.
Executing Logical Command (Describing Column with Optional Statistics): run Method
```scala
run(session: SparkSession): Seq[Row]
```
Note: run is part of the RunnableCommand Contract to execute (run) a logical command.
run resolves the column name in the table and makes sure that it is a “flat” field (i.e. not a field nested inside a column of a nested data type).
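The two checks (the name must resolve, and the result must be "flat") can be pictured with a small plain-Scala model. This is only a sketch under stated assumptions: `Resolved`, `TopLevelColumn`, `NestedField`, `resolve` and `requireFlatColumn` are hypothetical names, the schema is reduced to a map from column name to nested field names, and the error texts are illustrative rather than Spark's own messages. It also assumes that a reference to a whole column passes the check while a reference into a nested field is rejected.

```scala
// Hypothetical stand-ins for what name resolution can return (not Spark's own types)
sealed trait Resolved
final case class TopLevelColumn(name: String) extends Resolved
final case class NestedField(path: Seq[String]) extends Resolved

// Simplified schema model: top-level column name -> names of the fields nested inside it
// (an empty set means the column has no nested fields)
def resolve(schema: Map[String, Set[String]], nameParts: Seq[String]): Option[Resolved] =
  nameParts match {
    case Seq(col) if schema.contains(col) =>
      Some(TopLevelColumn(col))
    case Seq(col, field) if schema.get(col).exists(_.contains(field)) =>
      Some(NestedField(nameParts))
    case _ =>
      None
  }

// First check: the name resolves. Second check: it is not a nested field.
def requireFlatColumn(
    schema: Map[String, Set[String]],
    nameParts: Seq[String]): Either[String, String] =
  resolve(schema, nameParts) match {
    case None                       => Left(s"column ${nameParts.mkString(".")} does not exist")
    case Some(NestedField(path))    => Left(s"nested field ${path.mkString(".")} is not supported")
    case Some(TopLevelColumn(name)) => Right(name)
  }

// Illustrative schema: two flat columns and one column "s" with nested fields a and b
val schema = Map("id" -> Set.empty[String], "p1" -> Set.empty[String], "s" -> Set("a", "b"))
```

With this model, `requireFlatColumn(schema, Seq("p1"))` is a `Right`, while `requireFlatColumn(schema, Seq("s", "a"))` and a lookup of an unknown name are both `Left`.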
run requests the SessionCatalog for the table metadata.
Note: run uses the input SparkSession to access SessionState that in turn is used to access the SessionCatalog.
run takes the column statistics from the table statistics if available.
Note: Column statistics are available (in the table statistics) only after the ANALYZE TABLE FOR COLUMNS SQL command has been run.
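To see which columns currently have statistics recorded, the table metadata can be inspected directly. This is a sketch, assuming a spark-shell session and the table t1 from the example at the top of the page; it uses `SessionCatalog.getTableMetadata`, which may not be the exact lookup that run itself performs.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier

// Assumes a spark-shell session (`spark` in scope) and the table t1 created earlier
val metadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier("t1"))

// stats is an Option: it may be empty when no statistics have been recorded for the table
val analyzedColumns = metadata.stats.map(_.colStats.keySet).getOrElse(Set.empty[String])
println(analyzedColumns)
```

Running it before and after ANALYZE TABLE ... FOR COLUMNS shows whether per-column statistics have been stored for the table.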
run adds the comment metadata if available for the column.
run gives the following rows (in that order):
- col_name
- data_type
- comment
If the DescribeColumnCommand command was executed with the EXTENDED or FORMATTED option, run gives the following additional rows (in that order):
- min
- max
- num_nulls
- distinct_count
- avg_col_len
- max_col_len
- histogram (as in the example output above)
run gives NULL for the value of the comment and statistics if not available.
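The row assembly described above can be sketched in plain Scala. This is an illustration under stated assumptions rather than the actual implementation: `ColStats` and `describeRows` are hypothetical names, rows are modelled as plain string pairs instead of Spark's Row, and the histogram row seen in the example output is left out.

```scala
// Hypothetical, simplified stand-in for per-column statistics (not Spark's own class)
final case class ColStats(
    min: Option[String],
    max: Option[String],
    nullCount: Option[Long],
    distinctCount: Option[Long],
    avgLen: Option[Long],
    maxLen: Option[Long])

def describeRows(
    name: String,
    dataType: String,
    comment: Option[String],
    stats: Option[ColStats],
    isExtended: Boolean): Seq[(String, String)] = {
  // Missing values are rendered as the string "NULL"
  def orNull[A](value: Option[A]): String = value.map(_.toString).getOrElse("NULL")

  // Rows that are always present, in order
  val base = Seq(
    "col_name" -> name,
    "data_type" -> dataType,
    "comment" -> orNull(comment))

  // Statistics rows are added only for EXTENDED or FORMATTED
  val extended =
    if (isExtended) Seq(
      "min" -> orNull(stats.flatMap(_.min)),
      "max" -> orNull(stats.flatMap(_.max)),
      "num_nulls" -> orNull(stats.flatMap(_.nullCount)),
      "distinct_count" -> orNull(stats.flatMap(_.distinctCount)),
      "avg_col_len" -> orNull(stats.flatMap(_.avgLen)),
      "max_col_len" -> orNull(stats.flatMap(_.maxLen)))
    else Seq.empty

  base ++ extended
}
```

For instance, `describeRows("p1", "double", None, None, isExtended = true)` yields nine rows in which the comment and every statistic are "NULL", which matches the shape of the first DESC EXTENDED output in the example (minus the histogram row).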
histogramDescription Internal Method
```scala
histogramDescription(histogram: Histogram): Seq[Row]
```
histogramDescription …FIXME
Note: histogramDescription is used exclusively when DescribeColumnCommand is executed with the EXTENDED or FORMATTED option turned on.
Creating DescribeColumnCommand Instance
DescribeColumnCommand takes the following when created:
- TableIdentifier of the table
- Column name (as name parts)
- isExtended flag that indicates whether the EXTENDED or FORMATTED option was used or not