AnalyzePartitionCommand Logical Command — Computing Partition-Level Statistics (Total Size and Row Count)
AnalyzePartitionCommand is a logical command that computes statistics (i.e. total size and row count) for table partitions and stores the stats in a metastore.
AnalyzePartitionCommand is created exclusively for the ANALYZE TABLE SQL statement with a PARTITION specification (and no FOR COLUMNS clause).
```scala
// Seq((0, 0, "zero"), (1, 1, "one")).toDF("id", "p1", "p2").write.partitionBy("p1", "p2").saveAsTable("t1")
val analyzeTable = "ANALYZE TABLE t1 PARTITION (p1, p2) COMPUTE STATISTICS"
val plan = spark.sql(analyzeTable).queryExecution.logical

import org.apache.spark.sql.execution.command.AnalyzePartitionCommand
val cmd = plan.asInstanceOf[AnalyzePartitionCommand]

scala> println(cmd)
AnalyzePartitionCommand `t1`, Map(p1 -> None, p2 -> None), false
```
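The trailing `false` in the output above is the noscan flag of the command, which is off because the statement uses COMPUTE STATISTICS without the NOSCAN option. As a quick cross-check (a sketch reusing the t1 table above; the printed plan may vary slightly across Spark versions):

```scala
val noscanPlan = spark
  .sql("ANALYZE TABLE t1 PARTITION (p1, p2) COMPUTE STATISTICS NOSCAN")
  .queryExecution
  .logical

scala> println(noscanPlan)
AnalyzePartitionCommand `t1`, Map(p1 -> None, p2 -> None), true
```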
Executing Logical Command (Computing Partition-Level Statistics and Altering Metastore) — run Method
```scala
run(sparkSession: SparkSession): Seq[Row]
```
Note: run is part of the RunnableCommand Contract to execute (run) a logical command.
run requests the session-specific SessionCatalog for the table metadata and makes sure that the table is not a view.
Note: run uses the input SparkSession to access the session-specific SessionState that in turn is used to access the current SessionCatalog.
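That chain can be reproduced directly in spark-shell as a rough sketch of the lookup run performs (using the t1 table from the first example; an illustration, not the code of run itself):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.catalyst.catalog.CatalogTableType

// SparkSession -> SessionState -> SessionCatalog
val catalog = spark.sessionState.catalog

// table metadata as recorded in the metastore
val tableMeta = catalog.getTableMetadata(TableIdentifier("t1"))

// AnalyzePartitionCommand supports tables only, not views
assert(tableMeta.tableType != CatalogTableType.VIEW)
```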
run then gets the partition specification of the table (getPartitionSpec).
run requests the session-specific SessionCatalog for the partitions per the partition specification.
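Continuing the sketch above, the partitions can be listed through SessionCatalog.listPartitions, which accepts an optional partial partition specification (output shown for the two-partition t1 table; ordering may differ):

```scala
// all partitions of t1 (no partition values given)
val partitions = catalog.listPartitions(tableMeta.identifier, partialSpec = None)

scala> partitions.map(_.spec).foreach(println)
Map(p1 -> 0, p2 -> zero)
Map(p1 -> 1, p2 -> one)
```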
run finishes early when the table has no partitions defined in the metastore.
run computes row count statistics per partition unless the noscan flag is enabled.
run calculates the total size (in bytes), i.e. the partition location size, of every table partition and creates a CatalogStatistics with the current statistics (including the row count computed earlier) if they are different from the statistics recorded in the metastore.
In the end, run alters the table partition metadata for the partitions with changed statistics.
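Put together, the last three steps could be approximated as follows (a simplified sketch reusing catalog, tableMeta and partitions from the snippets above; rowCounts stands for the result of calculateRowCountsPerPartition, and the size calculation here is a plain FileSystem walk rather than what run actually does):

```scala
import org.apache.hadoop.fs.Path
import org.apache.spark.sql.catalyst.catalog.CatalogStatistics

// row counts per partition, e.g. as computed by calculateRowCountsPerPartition
// (empty when the noscan flag is enabled)
val rowCounts: Map[Map[String, String], BigInt] = Map.empty

val hadoopConf = spark.sessionState.newHadoopConf()

val newPartitions = partitions.flatMap { p =>
  // total size (in bytes) of the partition location
  val location = new Path(p.location)
  val fs = location.getFileSystem(hadoopConf)
  val sizeInBytes = fs.getContentSummary(location).getLength

  val newStats = CatalogStatistics(sizeInBytes = sizeInBytes, rowCount = rowCounts.get(p.spec))
  // keep only the partitions whose statistics changed
  if (p.stats.contains(newStats)) None else Some(p.copy(stats = Some(newStats)))
}

// alter the partition metadata in the metastore
if (newPartitions.nonEmpty) {
  catalog.alterPartitions(tableMeta.identifier, newPartitions)
}
```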
run reports a NoSuchPartitionException when the specified partitions do not match the partitions in the metastore.
run reports an AnalysisException when executed on a view:

```
ANALYZE TABLE is not supported on views.
```
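For example (assuming a view v1 over the t1 table from the first example):

```scala
scala> spark.sql("CREATE VIEW v1 AS SELECT * FROM t1")

scala> spark.sql("ANALYZE TABLE v1 PARTITION (p1, p2) COMPUTE STATISTICS")
org.apache.spark.sql.AnalysisException: ANALYZE TABLE is not supported on views.
```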
Computing Row Count Statistics Per Partition — calculateRowCountsPerPartition Internal Method
```scala
calculateRowCountsPerPartition(
  sparkSession: SparkSession,
  tableMeta: CatalogTable,
  partitionValueSpec: Option[TablePartitionSpec]): Map[TablePartitionSpec, BigInt]
```
calculateRowCountsPerPartition …FIXME
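The implementation is not described here, but the general idea can be sketched with the DataFrame API: take the table, filter it by the given partition values (if any), group by the partition columns and count. Using the t1 table from the first example (an illustration only, not the Spark source; row order may differ):

```scala
// rows per (p1, p2) partition of t1
val rowCountsDF = spark.table("t1").groupBy("p1", "p2").count()

scala> rowCountsDF.show()
+---+----+-----+
| p1|  p2|count|
+---+----+-----+
|  0|zero|    1|
|  1| one|    1|
+---+----+-----+
```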
Note: calculateRowCountsPerPartition is used exclusively when AnalyzePartitionCommand is executed.
getPartitionSpec Internal Method
```scala
getPartitionSpec(table: CatalogTable): Option[TablePartitionSpec]
```
getPartitionSpec …FIXME
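Conceptually, getPartitionSpec turns the partition specification the command was created with (a Map[String, Option[String]]) into a TablePartitionSpec by keeping only the partition columns that were given explicit values, and yields None when no values were specified at all. A simplified illustration of that idea (a hypothetical helper, not the Spark source; column name normalization and validation are omitted):

```scala
// Map(p1 -> Some("0"), p2 -> None) => Some(Map(p1 -> "0"))
// Map(p1 -> None, p2 -> None)      => None
def toPartitionValueSpec(
    partitionSpec: Map[String, Option[String]]): Option[Map[String, String]] = {
  val withValues = partitionSpec.collect { case (column, Some(value)) => column -> value }
  if (withValues.isEmpty) None else Some(withValues)
}
```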
Note: getPartitionSpec is used exclusively when AnalyzePartitionCommand is executed.
Creating AnalyzePartitionCommand Instance
AnalyzePartitionCommand takes the following when created (see the example below):

- TableIdentifier of the table to analyze
- Partition specification (partition column names with their optional values)
- noscan flag (enabled by default) that indicates whether the NOSCAN option was used or not
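The command from the first example corresponds to the following instance (a sketch mirroring the println output above; the parameter names are assumptions):

```scala
import org.apache.spark.sql.catalyst.TableIdentifier
import org.apache.spark.sql.execution.command.AnalyzePartitionCommand

// ANALYZE TABLE t1 PARTITION (p1, p2) COMPUTE STATISTICS
val cmd = AnalyzePartitionCommand(
  tableIdent = TableIdentifier("t1"),
  partitionSpec = Map("p1" -> None, "p2" -> None),
  noscan = false)
```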