Statistics — Estimates of Plan Statistics and Query Hints

Statistics holds the statistics estimates and query hints of a logical operator:

  • Total (output) size (in bytes)

  • Estimated number of rows (aka row count)

  • Column (attribute) statistics, which can include column (equi-height) histograms

  • Query hints

Note
Cost statistics, plan statistics and query statistics are synonyms and are used interchangeably.

You can access the statistics and query hints of a logical plan using the stats property.
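
As a minimal sketch, the stats property can be read from the optimized logical plan of any Dataset. The local session, the application name and the sample range are assumptions made for illustration. The no-argument form of stats assumes Spark 2.3 or later, since earlier releases expected a configuration object to be passed in.

```scala
import org.apache.spark.sql.SparkSession

object StatsOfLogicalPlan {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (an assumption, not part of the original text)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("stats-of-logical-plan")
      .getOrCreate()

    // Any Dataset will do; the optimized logical plan exposes the stats property
    val plan = spark.range(100).queryExecution.optimizedPlan
    val stats = plan.stats

    // Total (output) size in bytes
    println(s"sizeInBytes: ${stats.sizeInBytes}")
    // Estimated number of rows (an Option, empty when no estimate is available)
    println(s"rowCount: ${stats.rowCount}")
    // Number of columns that have column statistics attached
    println(s"columns with statistics: ${stats.attributeStats.size}")

    spark.stop()
  }
}
```

Going through the optimized plan rather than the analyzed one is a choice made here so that the estimates reflect the plan Spark would actually execute.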

Note
Use the ANALYZE TABLE COMPUTE STATISTICS SQL command to compute the total size and row count statistics of a table.

Note
Use Dataset.hint or a SELECT SQL statement with hints to specify query hints.
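
The two notes above could be tried out along these lines. This is a sketch only: the table name t1, its single id column and the local session are hypothetical. The FOR COLUMNS variant is an addition beyond what the notes mention, included to show how column statistics can also be requested.

```scala
import org.apache.spark.sql.SparkSession

object ComputeStatsAndHints {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (an assumption)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("compute-stats-and-hints")
      .getOrCreate()

    // t1 is a hypothetical table created just for this example
    spark.range(1000).write.mode("overwrite").saveAsTable("t1")

    // Compute total size and row count statistics of the table
    spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS")
    // Additionally compute column statistics for the id column
    spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS id")

    // Query hint specified with Dataset.hint
    val viaDataset = spark.table("t1").hint("broadcast")
    // The same kind of hint specified in a SELECT statement
    val viaSql = spark.sql("SELECT /*+ BROADCAST(t1) */ * FROM t1")

    // The logical plans show how each hint is represented
    println(viaDataset.queryExecution.logical)
    println(viaSql.queryExecution.logical)

    spark.stop()
  }
}
```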

Statistics is created when:

Note
The row count estimate is used in the CostBasedJoinReorder logical optimization when cost-based optimization is enabled.
Note

CatalogStatistics is a “subset” of all possible Statistics (as there are no concepts of attributes and query hints in a metastore).

CatalogStatistics are the statistics stored in an external catalog (usually a Hive metastore) and are often referred to as Hive statistics, while Statistics represents the Spark statistics.
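
To see the two kinds of statistics side by side, something like the following sketch could be used. The table name t1 and the local session are hypothetical. The lookup goes through spark.sessionState, which is an unstable developer API, so treat this as an exploration aid rather than a recommended access path.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

object CatalogVsPlanStatistics {
  def main(args: Array[String]): Unit = {
    // Local session for illustration only (an assumption)
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("catalog-vs-plan-statistics")
      .getOrCreate()

    // t1 is a hypothetical table created just for this example
    spark.range(1000).write.mode("overwrite").saveAsTable("t1")
    spark.sql("ANALYZE TABLE t1 COMPUTE STATISTICS")

    // Statistics as stored in the catalog: an Option[CatalogStatistics]
    val catalogStats = spark.sessionState.catalog
      .getTableMetadata(TableIdentifier("t1"))
      .stats
    println(s"catalog statistics: $catalogStats")

    // Statistics of the logical plan that reads the same table
    val planStats = spark.table("t1").queryExecution.optimizedPlan.stats
    println(s"plan statistics: $planStats")

    spark.stop()
  }
}
```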


Statistics comes with a simpleString method that gives the readable text representation (toString is the same text with the Statistics prefix).
