LogicalPlanStats — Statistics Estimates and Query Hints of Logical Operator

LogicalPlanStats adds statistics support to logical operators and is used for query planning (with or without cost-based optimization, e.g. CostBasedJoinReorder or JoinSelection, respectively).

With LogicalPlanStats, every logical operator has statistics that are computed once, on first request, and cached until invalidated. The next request after invalidation computes them again.

Depending on whether cost-based optimization is enabled or not, stats computes the statistics with BasicStatsPlanVisitor or SizeInBytesOnlyStatsPlanVisitor, respectively.

Note
Cost-based optimization is enabled when the spark.sql.cbo.enabled configuration property is turned on (i.e. true). It is disabled by default.

Use the EXPLAIN COST SQL command to explain a query with the statistics.

You can also access the statistics of a logical plan directly using the stats method, or indirectly by requesting QueryExecution for its text representation with statistics.
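The three ways of looking at the statistics can be sketched as follows. This is a minimal example that assumes Spark SQL is on the classpath (a version where LogicalPlan.stats takes no arguments) and runs a local session; the object name, the app name and the numbers view are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object StatsAccessDemo {
  // Local session for illustration only; master and app name are assumptions.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("logical-plan-stats-demo")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    val ds = spark.range(100)
    ds.createOrReplaceTempView("numbers") // hypothetical view name

    // 1. EXPLAIN COST prints the plan together with the statistics.
    spark.sql("EXPLAIN COST SELECT * FROM numbers WHERE id > 10")
      .show(truncate = false)

    // 2. Direct access through the stats method of the optimized logical plan.
    val stats = ds.queryExecution.optimizedPlan.stats
    println(s"sizeInBytes=${stats.sizeInBytes}, rowCount=${stats.rowCount}")

    // 3. Indirect access: the text representation of QueryExecution with statistics.
    println(ds.queryExecution.stringWithStats)

    spark.stop()
  }
}
```

The same expressions can be pasted into spark-shell, where a spark session already exists and the builder part is unnecessary.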

Note
The statistics of a Dataset are unaffected by caching it.

Note
LogicalPlanStats is a Scala trait with self: LogicalPlan as part of its definition. This self-type annotation restricts the set of classes that the trait can be mixed into to LogicalPlan subtypes, and makes the members of the target type available inside the trait at compile time.
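To illustrate the self-type idea in isolation, here is a small standalone sketch. Plan, PlanStats, Leaf and Join are hypothetical stand-ins invented for this example and are not Spark's classes; only the self: Plan => pattern mirrors what the note describes.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's classes.
abstract class Plan {
  def name: String
  def children: Seq[Plan]
}

// The self-type says: PlanStats may only be mixed into a Plan.
trait PlanStats { self: Plan =>
  // name and children are members of Plan, usable here thanks to the self-type.
  def describe: String = s"$name has ${children.size} child operator(s)"
}

final case class Leaf(name: String) extends Plan with PlanStats {
  def children: Seq[Plan] = Nil
}

final case class Join(name: String, children: Seq[Plan]) extends Plan with PlanStats

// class NotAPlan extends PlanStats
// would be rejected by the compiler because NotAPlan does not conform to Plan.

object SelfTypeDemo {
  def main(args: Array[String]): Unit =
    println(Join("join", Seq(Leaf("left"), Leaf("right"))).describe)
}
```

The trait never extends Plan itself, so it stays a pure add-on, yet the compiler guarantees that every class mixing it in is a Plan.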

Computing (and Caching) Statistics and Query Hints — stats Method

stats gets the statistics from statsCache if they were already computed. Otherwise, stats branches depending on whether cost-based optimization is enabled or not.

Use SQLConf.cboEnabled to access the current value of the spark.sql.cbo.enabled property.
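A short sketch of reading and flipping the flag, assuming a local Spark session. It reaches SQLConf through spark.sessionState.conf, which is an unstable developer API; the object name and app name are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession

object CboFlagDemo {
  // Local session for illustration only; master and app name are assumptions.
  lazy val spark: SparkSession = SparkSession.builder()
    .master("local[*]")
    .appName("cbo-flag-demo")
    .getOrCreate()

  // Reads the flag through the session's SQLConf.
  def cboEnabled: Boolean = spark.sessionState.conf.cboEnabled

  def main(args: Array[String]): Unit = {
    println(s"cbo enabled before: $cboEnabled")

    // Turn cost-based optimization on for this session.
    spark.conf.set("spark.sql.cbo.enabled", true)
    println(s"cbo enabled after: $cboEnabled")

    spark.stop()
  }
}
```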

With cost-based optimization disabled, stats requests SizeInBytesOnlyStatsPlanVisitor to compute the statistics.

With cost-based optimization enabled, stats requests BasicStatsPlanVisitor to compute the statistics.

In the end, statsCache caches the statistics for later use.
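The flow described above (look in the cache, otherwise pick a visitor based on the CBO flag, then cache the result) can be modelled with a small standalone sketch. It is not Spark's source code: Statistics, Plan, Scan, the two visitor objects and the computations counter are simplified, hypothetical stand-ins, and the visitor bodies are placeholders rather than Spark's estimation logic.

```scala
object StatsCacheSketch {
  // Simplified stand-in for Spark's Statistics.
  final case class Statistics(sizeInBytes: BigInt, rowCount: Option[BigInt] = None)

  abstract class Plan {
    def rows: BigInt
    def rowWidth: Int
    def cboEnabled: Boolean // stands in for the session's SQLConf.cboEnabled
  }

  // Hypothetical placeholders for the two visitors named in the text.
  object SizeOnlyVisitor {
    def visit(p: Plan): Statistics = Statistics(p.rows * p.rowWidth)
  }
  object BasicVisitor {
    def visit(p: Plan): Statistics = Statistics(p.rows * p.rowWidth, Some(p.rows))
  }

  trait PlanStats { self: Plan =>
    protected var statsCache: Option[Statistics] = None
    var computations: Int = 0 // demo-only counter of actual computations

    def stats: Statistics = statsCache.getOrElse {
      computations += 1
      val computed =
        if (cboEnabled) BasicVisitor.visit(self) else SizeOnlyVisitor.visit(self)
      statsCache = Some(computed) // cache for later requests
      computed
    }
  }

  final class Scan(val rows: BigInt, val rowWidth: Int, val cboEnabled: Boolean)
    extends Plan with PlanStats

  def main(args: Array[String]): Unit = {
    val scan = new Scan(BigInt(10), 8, cboEnabled = false)
    scan.stats
    scan.stats // served from statsCache, no second computation
    println(s"stats=${scan.stats}, computations=${scan.computations}")
  }
}
```

Keeping the cache in an Option makes the "not computed yet" state explicit, so a single getOrElse covers both the cached and the uncached path.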

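Note
stats is used during query planning, e.g. by the CostBasedJoinReorder logical optimization and the JoinSelection execution planning strategy.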

Invalidating Statistics Cache (of All Operators in Logical Plan) — invalidateStatsCache Method

invalidateStatsCache clears the statsCache of the current logical operator and then requests its child logical operators to do the same, so the cache is invalidated for every operator in the logical plan.
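The recursive invalidation can be sketched with a tiny standalone model. The Op class is a hypothetical simplification (its "statistics" are just a Long that sums the operator's own size and its children's), not Spark's LogicalPlan; only the shape of invalidateStatsCache follows the description above.

```scala
object InvalidateSketch {
  // Hypothetical, simplified operator; statistics are reduced to a single Long.
  final class Op(val name: String, ownSize: Long, val children: Seq[Op] = Nil) {
    private var statsCache: Option[Long] = None

    def isCached: Boolean = statsCache.isDefined

    // Computes on first request (which also fills the children's caches), then caches.
    def stats: Long = statsCache.getOrElse {
      val computed = ownSize + children.map(_.stats).sum
      statsCache = Some(computed)
      computed
    }

    // Clears this operator's cache, then asks every child to do the same.
    def invalidateStatsCache(): Unit = {
      statsCache = None
      children.foreach(_.invalidateStatsCache())
    }
  }

  def main(args: Array[String]): Unit = {
    val left = new Op("left", 10L)
    val right = new Op("right", 20L)
    val join = new Op("join", 5L, Seq(left, right))

    println(s"stats=${join.stats}, left cached=${left.isCached}")
    join.invalidateStatsCache()
    println(s"after invalidation: join cached=${join.isCached}, left cached=${left.isCached}")
  }
}
```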
