Statistics — Estimates of Plan Statistics and Query Hints
Statistics
holds the statistics estimates and query hints of a logical operator:
Note
|
Cost statistics, plan statistics or query statistics are all synonyms and used interchangeably. |
You can access statistics and query hints of a logical plan using stats property.
1 2 3 4 5 6 7 8 9 10 11 12 13 |
val q = spark.range(5).hint("broadcast").join(spark.range(1), "id") val plan = q.queryExecution.optimizedPlan val stats = plan.stats scala> :type stats org.apache.spark.sql.catalyst.plans.logical.Statistics scala> println(stats.simpleString) sizeInBytes=213.0 B, hints=none |
Note
|
Use ANALYZE TABLE COMPUTE STATISTICS SQL command to compute total size and row count statistics of a table. |
Note
|
Use ANALYZE TABLE COMPUTE STATISTICS FOR COLUMNS SQL Command to generate column (equi-height) histograms of a table. |
Note
|
Use Dataset.hint or SELECT SQL statement with hints to specify query hints.
|
Statistics
is created when:
-
Leaf logical operators (specifically) and logical operators (in general) are requested for statistics estimates
-
HiveTableRelation and LogicalRelation are requested for statistics estimates (through CatalogStatistics)
Note
|
row count estimate is used in CostBasedJoinReorder logical optimization when cost-based optimization is enabled. |
Note
|
CatalogStatistics is a “subset” of all possible
|
Statistics
comes with simpleString
method that is used for the readable text representation (that is toString
with Statistics prefix).
1 2 3 4 5 6 7 8 9 10 11 12 13 |
import org.apache.spark.sql.catalyst.plans.logical.Statistics import org.apache.spark.sql.catalyst.plans.logical.HintInfo val stats = Statistics(sizeInBytes = 10, rowCount = Some(20), hints = HintInfo(broadcast = true)) scala> println(stats) Statistics(sizeInBytes=10.0 B, rowCount=20, hints=(broadcast)) scala> println(stats.simpleString) sizeInBytes=10.0 B, rowCount=20, hints=(broadcast) |