
AnalyzeColumnCommand

AnalyzeColumnCommand: Logical Command for ANALYZE TABLE…COMPUTE STATISTICS FOR COLUMNS SQL Command

AnalyzeColumnCommand is a logical command for ANALYZE TABLE with FOR COLUMNS clause (and no PARTITION specification).
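To make the shape of the SQL statement concrete, the following minimal sketch assembles its text in Python. The helper name analyze_columns_sql and the table and column names are hypothetical and used only for illustration; only the ANALYZE TABLE, COMPUTE STATISTICS and FOR COLUMNS keywords come from the description above.

```python
def analyze_columns_sql(table, columns):
    """Build an ANALYZE TABLE ... FOR COLUMNS statement (hypothetical helper)."""
    if not columns:
        # The FOR COLUMNS clause needs at least one column name.
        raise ValueError("at least one column name is required")
    return "ANALYZE TABLE {} COMPUTE STATISTICS FOR COLUMNS {}".format(
        table, ", ".join(columns)
    )


# Hypothetical table and column names.
statement = analyze_columns_sql("sales", ["id", "amount"])
print(statement)
```

With a Spark session at hand, the resulting string could be passed to spark.sql(statement) in PySpark.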

AnalyzeColumnCommand can generate column histograms when the spark.sql.statistics.histogram.enabled configuration property is enabled (it is disabled by default). Column histograms are supported for the following data types:

  • IntegralType

  • DecimalType

  • DoubleType

  • FloatType

  • DateType

  • TimestampType

Note
Histograms can improve estimation accuracy. Spark currently supports only equi-height histograms. Collecting histograms has an extra cost: collecting column statistics usually takes a single table scan, whereas generating an equi-height histogram requires one more table scan.
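The idea behind an equi-height histogram can be sketched in plain Python: every bin covers roughly the same number of rows, so the bin boundaries follow the data distribution. This is a simplified illustration, not Spark's implementation; the function name equi_height_histogram, the (lo, hi, distinct count) bin layout and the sample values are assumptions made for the example.

```python
def equi_height_histogram(values, num_bins):
    """Return (height, bins); each bin is a (lo, hi, distinct_count) tuple."""
    if not values or num_bins < 1:
        raise ValueError("need at least one value and one bin")
    # Sorting stands in for the extra pass over the data that finds bin boundaries.
    data = sorted(values)
    n = len(data)
    bins = []
    for i in range(num_bins):
        # Slice the sorted data into num_bins chunks of (almost) equal size.
        chunk = data[i * n // num_bins:(i + 1) * n // num_bins]
        if chunk:  # chunks can be empty when there are fewer values than bins
            bins.append((chunk[0], chunk[-1], len(set(chunk))))
    height = n / len(bins)  # average number of rows per bin
    return height, bins


# Hypothetical sample column values.
height, bins = equi_height_histogram([1, 2, 2, 3, 4, 5, 6, 7, 8, 9, 10, 10], 3)
print(height)  # 4.0
print(bins)    # [(1, 3, 3), (4, 7, 4), (8, 10, 3)]
```

Each bin holds four rows here, while the value ranges differ in width, which is what distinguishes an equi-height histogram from an equi-width one.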

Note
AnalyzeColumnCommand is described by the analyze labeled alternative of the statement rule in SqlBase.g4 and is parsed using SparkSqlAstBuilder.

Note
AnalyzeColumnCommand is not supported on views.

Executing Logical Command — run Method

Note
run is part of the RunnableCommand contract for executing (running) a logical command.

run calculates the following statistics:

  • sizeInBytes

  • stats for each column
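As a rough illustration of what per-column statistics gathered in a single pass could look like, here is a minimal Python sketch. It is not the code of run: the function name column_stats, the choice of statistics (distinct count, null count, minimum, maximum) and the sample rows are assumptions for the example, and an exact set is used for the distinct count purely for simplicity.

```python
def column_stats(rows, column):
    """Collect basic statistics for one column in a single pass over rows (dicts)."""
    distinct = set()
    null_count = 0
    minimum = maximum = None
    for row in rows:
        value = row.get(column)
        if value is None:
            null_count += 1  # nulls are counted but excluded from min/max
            continue
        distinct.add(value)
        if minimum is None or value < minimum:
            minimum = value
        if maximum is None or value > maximum:
            maximum = value
    return {
        "distinctCount": len(distinct),
        "nullCount": null_count,
        "min": minimum,
        "max": maximum,
    }


# Hypothetical sample rows.
rows = [
    {"id": 1, "amount": 10.0},
    {"id": 2, "amount": None},
    {"id": 3, "amount": 10.0},
    {"id": 4, "amount": 2.5},
]
amount_stats = column_stats(rows, "amount")
print(amount_stats)
```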

Caution
FIXME

Computing Statistics for Specified Columns — computeColumnStats Internal Method

computeColumnStats…​FIXME

Note
computeColumnStats is used exclusively when AnalyzeColumnCommand is executed.

computePercentiles Internal Method

computePercentiles…​FIXME

Note
computePercentiles is used exclusively when AnalyzeColumnCommand is executed (and computes column statistics).

Creating AnalyzeColumnCommand Instance

AnalyzeColumnCommand takes the following when created:

  • TableIdentifier

  • Column names
