DescribeColumnCommand Logical Command for DESCRIBE TABLE SQL Command with Column
DescribeColumnCommand is a logical command for DESCRIBE TABLE SQL command with a single column only (i.e. no PARTITION specification).
(DESC | DESCRIBE) TABLE? (EXTENDED | FORMATTED)? table_name column_name
// Make the example reproducible
val tableName = "t1"
import org.apache.spark.sql.catalyst.TableIdentifier
val tableId = TableIdentifier(tableName)
val sessionCatalog = spark.sessionState.catalog
sessionCatalog.dropTable(tableId, ignoreIfNotExists = true, purge = true)

val df = Seq((0, 0.0, "zero"), (1, 1.4, "one")).toDF("id", "p1", "p2")
df.write.saveAsTable("t1")

// DescribeColumnCommand represents DESC EXTENDED tableName colName SQL command
val descExtSQL = "DESC EXTENDED t1 p1"
val plan = spark.sql(descExtSQL).queryExecution.logical
import org.apache.spark.sql.execution.command.DescribeColumnCommand
val cmd = plan.asInstanceOf[DescribeColumnCommand]
scala> println(cmd)
DescribeColumnCommand `t1`, [p1], true

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|      NULL|
|           max|      NULL|
|     num_nulls|      NULL|
|distinct_count|      NULL|
|   avg_col_len|      NULL|
|   max_col_len|      NULL|
|     histogram|      NULL|
+--------------+----------+

// Run ANALYZE TABLE...FOR COLUMNS SQL command to compute the column statistics
val allCols = df.columns.mkString(",")
val analyzeTableSQL = s"ANALYZE TABLE $tableName COMPUTE STATISTICS FOR COLUMNS $allCols"
spark.sql(analyzeTableSQL)

scala> spark.sql(descExtSQL).show
+--------------+----------+
|     info_name|info_value|
+--------------+----------+
|      col_name|        p1|
|     data_type|    double|
|       comment|      NULL|
|           min|       0.0|
|           max|       1.4|
|     num_nulls|         0|
|distinct_count|         2|
|   avg_col_len|         8|
|   max_col_len|         8|
|     histogram|      NULL|
+--------------+----------+
DescribeColumnCommand defines the output schema with the following columns:
- info_name with “name of the column info” comment
- info_value with “value of the column info” comment
Note
DescribeColumnCommand is described by the describeTable labeled alternative in the statement expression in SqlBase.g4 and is parsed using SparkSqlParser.
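The shape of the DESCRIBE syntax for a single column can be approximated outside Spark with a regular expression. This is only a toy sketch and not how Spark parses SQL (Spark uses the ANTLR grammar in SqlBase.g4). The names DescribeColumn and DescribeColumnSyntax are hypothetical, and only simple unquoted identifiers are handled (no backticks, no database qualifier, no nested column names).

```scala
// Hypothetical stand-in for the parsed command; this is NOT Spark's DescribeColumnCommand.
final case class DescribeColumn(table: String, column: String, isExtended: Boolean)

object DescribeColumnSyntax {
  // Approximates: (DESC | DESCRIBE) TABLE? (EXTENDED | FORMATTED)? table_name column_name
  // The negative lookahead keeps the keywords from being taken as the table name.
  private val Pattern = (
    """(?i)\s*(?:DESC|DESCRIBE)(?:\s+TABLE)?(?:\s+(EXTENDED|FORMATTED))?""" +
    """\s+(?!(?:TABLE|EXTENDED|FORMATTED)\b)(\w+)\s+(\w+)\s*"""
  ).r

  def parse(sql: String): Option[DescribeColumn] = sql match {
    // An optional group that did not match is bound to null.
    case Pattern(option, table, column) =>
      Some(DescribeColumn(table, column, isExtended = option != null))
    case _ => None
  }
}
```

Calling DescribeColumnSyntax.parse with a statement that names no column (e.g. a plain DESC of a table) gives None, which mirrors the fact that such statements are not represented by DescribeColumnCommand.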
Executing Logical Command (Describing Column with Optional Statistics) — run Method
run(session: SparkSession): Seq[Row]
Note
|
run is part of RunnableCommand Contract to execute (run) a logical command.
|
run resolves the column name in the table and makes sure that it is a “flat” field (i.e. not of a nested data type).
run requests the SessionCatalog for the table metadata.
Note
run uses the input SparkSession to access the SessionState that in turn is used to access the SessionCatalog.
run takes the column statistics from the table statistics if available.
Note
Column statistics are available (in the table statistics) only after the ANALYZE TABLE ... COMPUTE STATISTICS FOR COLUMNS SQL command was run.
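One way to check whether column statistics are already recorded is to look at the table metadata in the SessionCatalog directly. The helper below is a minimal sketch under assumptions: hasColumnStats is a hypothetical name (not part of Spark), it relies on CatalogTable.stats and its colStats map as found in Spark 2.3 and later, and it needs a Spark runtime on the classpath (e.g. spark-shell).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

// Hypothetical helper: true when the catalog holds statistics for the given column.
def hasColumnStats(spark: SparkSession, table: String, column: String): Boolean = {
  // Same access path as described above: SparkSession -> SessionState -> SessionCatalog
  val metadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
  // Table statistics are optional; column statistics are keyed by column name
  metadata.stats.exists(_.colStats.contains(column))
}
```

With the t1 table from the earlier example, hasColumnStats(spark, "t1", "p1") could be called before and after the ANALYZE TABLE command to see the difference.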
run adds comment metadata if available for the column.
run gives the following rows (in that order):
- col_name
- data_type
- comment
If DescribeColumnCommand was executed with the EXTENDED or FORMATTED option, run gives the following additional rows (in that order):
- min
- max
- num_nulls
- distinct_count
- avg_col_len
- max_col_len
- histogram
run gives NULL for the value of the comment and statistics if not available.
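The row assembly described above can be modelled in plain Scala. This is a simplified sketch under assumptions and not Spark's implementation: ColumnInfo, ColumnStats and DescribeColumnModel are hypothetical names, rows are plain (String, String) pairs rather than Row objects, and the histogram row that appears in the example output is left out.

```scala
// Hypothetical, simplified stand-ins for the column metadata and its statistics.
final case class ColumnInfo(name: String, dataType: String, comment: Option[String])

final case class ColumnStats(
    min: Option[String],
    max: Option[String],
    nullCount: Option[Long],
    distinctCount: Option[Long],
    avgLen: Option[Long],
    maxLen: Option[Long])

object DescribeColumnModel {
  // Missing values are rendered as the string "NULL"
  private def orNull[A](value: Option[A]): String =
    value.map(_.toString).getOrElse("NULL")

  def describe(
      column: ColumnInfo,
      stats: Option[ColumnStats],
      isExtended: Boolean): Seq[(String, String)] = {
    // Rows that are always present
    val basic = Seq(
      "col_name" -> column.name,
      "data_type" -> column.dataType,
      "comment" -> orNull(column.comment))
    // Rows added only for EXTENDED or FORMATTED
    val extended =
      if (isExtended) Seq(
        "min" -> orNull(stats.flatMap(_.min)),
        "max" -> orNull(stats.flatMap(_.max)),
        "num_nulls" -> orNull(stats.flatMap(_.nullCount)),
        "distinct_count" -> orNull(stats.flatMap(_.distinctCount)),
        "avg_col_len" -> orNull(stats.flatMap(_.avgLen)),
        "max_col_len" -> orNull(stats.flatMap(_.maxLen)))
      else Seq.empty
    basic ++ extended
  }
}
```

Passing stats = None with isExtended = true corresponds to describing a column before any column statistics were computed: every statistics row carries NULL.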
histogramDescription Internal Method
histogramDescription(histogram: Histogram): Seq[Row]
histogramDescription…FIXME
Note
histogramDescription is used exclusively when DescribeColumnCommand is executed with the EXTENDED or FORMATTED option turned on.
Creating DescribeColumnCommand Instance
DescribeColumnCommand takes the following when created:
- TableIdentifier of the table
- Column name (as name parts)
- isExtended flag that indicates whether the EXTENDED or FORMATTED option was used or not