AnalyzeTableCommand Logical Command — Computing Table-Level Statistics (Total Size and Row Count)
AnalyzeTableCommand is a logical command that computes statistics (i.e. total size and row count) for a table and stores the stats in a metastore.
AnalyzeTableCommand is created exclusively for ANALYZE TABLE with neither a PARTITION specification nor a FOR COLUMNS clause.
// Seq((0, 0, "zero"), (1, 1, "one")).toDF("id", "p1", "p2").write.partitionBy("p1", "p2").saveAsTable("t1")
val sqlText = "ANALYZE TABLE t1 COMPUTE STATISTICS NOSCAN"
val plan = spark.sql(sqlText).queryExecution.logical
import org.apache.spark.sql.execution.command.AnalyzeTableCommand
val cmd = plan.asInstanceOf[AnalyzeTableCommand]
scala> println(cmd)
AnalyzeTableCommand `t1`, false
Executing Logical Command (Computing Table-Level Statistics and Altering Metastore) — run Method
run(sparkSession: SparkSession): Seq[Row]
Note
run is part of the RunnableCommand Contract to execute (run) a logical command.
run requests the session-specific SessionCatalog for the metadata of the table and makes sure that the table is not a view.
Note
run uses the input SparkSession to access the session-specific SessionState that in turn gives access to the current SessionCatalog.
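The lookup chain described above (SparkSession, then SessionState, then SessionCatalog, then table metadata) can be tried out by hand. The following is a minimal sketch, not the command's actual code: the table name t_meta, the local session settings and the temporary warehouse directory are assumptions made for illustration, and sessionState is an unstable developer API whose accessibility may vary across Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// A local session for experimenting; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("analyze-metadata")
  .config("spark.sql.warehouse.dir",
    java.nio.file.Files.createTempDirectory("warehouse").toString)
  .getOrCreate()
import spark.implicits._

// Hypothetical demo table
Seq((0, "zero"), (1, "one")).toDF("id", "name")
  .write.mode("overwrite").saveAsTable("t_meta")

// SparkSession -> SessionState -> SessionCatalog -> table metadata
val catalog = spark.sessionState.catalog
val tableMeta = catalog.getTableMetadata(TableIdentifier("t_meta"))

// The kind of check that has to pass before statistics are computed
val isView = tableMeta.tableType == CatalogTableType.VIEW
println(s"${tableMeta.identifier} is a view: $isView")
```

A table created with saveAsTable is a regular table, so the check should report that it is not a view.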
run computes the total size and, unless the NOSCAN flag is used, the row count statistics of the table.
Note
run uses SparkSession to find the table in a metastore.
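To see the difference the NOSCAN flag makes, the statistics stored for a table can be read back from the catalog after each variant of the command. This is a sketch under assumptions: the table t_stats, the storedStats helper and the session settings are made up for the example, and the exact size reported depends on the file format and Spark version.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("analyze-stats")
  .config("spark.sql.warehouse.dir",
    java.nio.file.Files.createTempDirectory("warehouse").toString)
  .getOrCreate()
import spark.implicits._

// Hypothetical demo table with two rows
Seq((0, "zero"), (1, "one")).toDF("id", "name")
  .write.mode("overwrite").saveAsTable("t_stats")

// Helper (not part of Spark) that reads the statistics currently stored for the table
def storedStats() =
  spark.sessionState.catalog.getTableMetadata(TableIdentifier("t_stats")).stats

// With NOSCAN only the total size is computed
spark.sql("ANALYZE TABLE t_stats COMPUTE STATISTICS NOSCAN")
val afterNoscan = storedStats()

// Without NOSCAN the rows are counted as well
spark.sql("ANALYZE TABLE t_stats COMPUTE STATISTICS")
val afterScan = storedStats()

println(s"NOSCAN: size=${afterNoscan.map(_.sizeInBytes)} rows=${afterNoscan.flatMap(_.rowCount)}")
println(s"scan:   size=${afterScan.map(_.sizeInBytes)} rows=${afterScan.flatMap(_.rowCount)}")
```

The same information is also visible in the Statistics row of DESCRIBE EXTENDED output.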
In the end, run alters the table statistics if they differ from the existing table statistics in the metastore.
run throws an AnalysisException when executed on a view.
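The "alter only if different" decision can be modelled in plain Scala. The sketch below is a deliberate simplification and not Spark's code: TableStats and statsToStore are hypothetical names, and the real command works with the catalog's own statistics type and handles size and row count separately.

```scala
// Simplified stand-in for table statistics (hypothetical type, not Spark's own)
final case class TableStats(sizeInBytes: BigInt, rowCount: Option[BigInt])

// Returns Some(stats) when the metastore entry should be altered, None when nothing changed
def statsToStore(existing: Option[TableStats], computed: TableStats): Option[TableStats] =
  if (existing.contains(computed)) None else Some(computed)

val fresh = TableStats(BigInt(1024), Some(BigInt(2)))

// No statistics stored yet: the computed ones are written
val firstRun = statsToStore(None, fresh)

// Same statistics computed again: the metastore is left alone
val secondRun = statsToStore(Some(fresh), fresh)

println(s"first run stores: $firstRun, second run stores: $secondRun")
```

Skipping the update when nothing changed avoids a needless round trip to the metastore.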
ANALYZE TABLE is not supported on views.
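The rejection of views can be reproduced with a permanent view over a small table. This is a hedged sketch: the names t_base and v_base and the session settings are assumptions, and the exact message, or whether the command is rejected at all (for example for cached views), may differ between Spark versions.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.sql.{AnalysisException, SparkSession}

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("analyze-view")
  .config("spark.sql.warehouse.dir",
    java.nio.file.Files.createTempDirectory("warehouse").toString)
  .getOrCreate()
import spark.implicits._

// Hypothetical base table and a view on top of it
Seq((0, "zero"), (1, "one")).toDF("id", "name")
  .write.mode("overwrite").saveAsTable("t_base")
spark.sql("CREATE OR REPLACE VIEW v_base AS SELECT * FROM t_base")

// Commands run eagerly, so the failure surfaces at the spark.sql call
val outcome = Try(spark.sql("ANALYZE TABLE v_base COMPUTE STATISTICS"))

outcome match {
  case Failure(e: AnalysisException) => println(s"Rejected: ${e.getMessage}")
  case Failure(other)                => println(s"Unexpected failure: $other")
  case Success(_)                    => println("Command succeeded in this Spark version")
}
```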
Note
Row count statistics triggers a Spark job to count the number of rows in a table (that happens with
Creating AnalyzeTableCommand Instance
AnalyzeTableCommand takes the following when created:
- noscan flag (enabled by default) that indicates whether the NOSCAN option was used or not
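The command can also be created directly rather than through SQL parsing. The sketch below assumes that the first constructor argument is a TableIdentifier, as the printed form of the command shown earlier suggests, and uses positional arguments because the parameter name of the flag may differ between Spark versions.

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.AnalyzeTableCommand

// Only the table identifier is given, so the noscan flag takes its default value
val defaultCmd = AnalyzeTableCommand(TableIdentifier("t1"))

// Explicitly ask for a full scan (noscan disabled)
val scanCmd = AnalyzeTableCommand(TableIdentifier("t1"), false)

// Pattern matching extracts the constructor arguments by position
val AnalyzeTableCommand(_, defaultFlag) = defaultCmd
val AnalyzeTableCommand(_, scanFlag) = scanCmd

println(s"default noscan = $defaultFlag, explicit noscan = $scanFlag")
```

Creating the command does not touch the metastore; statistics are only computed when the command is executed against a session.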