BucketSpec — Bucketing Specification of Table

BucketSpec is the bucketing specification of a table, i.e. the metadata of the bucketing of a table.

BucketSpec includes the following:

  • Number of buckets

  • Bucket column names – the names of the columns used for buckets (at least one)

  • Sort column names – the names of the columns used to sort data in buckets

The number of buckets has to be greater than 0 and less than 100000 (otherwise an AnalysisException is thrown).
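The bucket-count check can be illustrated with a small standalone model. This is only a sketch: `BucketSpecSketch` is a hypothetical stand-in, not Spark's own class, and it uses `require` (which throws `IllegalArgumentException`) in place of `AnalysisException` so that it runs without Spark on the classpath. The error message wording is likewise an assumption.

```scala
// Hypothetical stand-in for Spark's BucketSpec, reduced to the three fields
// listed above plus the range check on the number of buckets.
final case class BucketSpecSketch(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {

  // Spark raises AnalysisException here; require(...) is used instead so the
  // sketch has no Spark dependency. Valid values are 1 to 99999 inclusive.
  require(
    numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets should be greater than 0 but less than 100000. Got $numBuckets")
}
```

With this model, `BucketSpecSketch(8, Seq("id"), Seq("name"))` is accepted, while a bucket count of 0 or 100000 fails at construction time.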

BucketSpec is created when:

  1. DataFrameWriter is requested to saveAsTable (and does getBucketSpec)

  2. HiveExternalCatalog is requested to getBucketSpecFromTableProperties and tableMetaToTableProps

  3. HiveClientImpl is requested to retrieve a table metadata

  4. SparkSqlAstBuilder is requested to visitBucketSpec (for CREATE TABLE SQL statement with CLUSTERED BY and INTO n BUCKETS with optional SORTED BY clauses)
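The first and last cases in the list above can be tried from user code. The following is a sketch under several assumptions: Spark 2.2 or later on the classpath, a local master, the default parquet data source, and the made-up table names `bucketed_by_writer` and `bucketed_by_sql`. It reads the resulting specification back through `spark.sessionState.catalog`, which is an unstable internal API and may differ between Spark versions.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.TableIdentifier

object BucketSpecDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("bucket-spec-demo")
      .getOrCreate()

    // Case 1: DataFrameWriter.saveAsTable with bucketBy and sortBy.
    spark.range(100)
      .selectExpr("id", "cast(id % 10 as string) as name")
      .write
      .bucketBy(4, "id")
      .sortBy("name")
      .mode("overwrite")
      .saveAsTable("bucketed_by_writer")

    // Case 4: CREATE TABLE with CLUSTERED BY, SORTED BY and INTO n BUCKETS.
    spark.sql("DROP TABLE IF EXISTS bucketed_by_sql")
    spark.sql(
      """CREATE TABLE bucketed_by_sql (id BIGINT, name STRING)
        |USING parquet
        |CLUSTERED BY (id) SORTED BY (name) INTO 4 BUCKETS""".stripMargin)

    // Read the bucketing specification back from the table metadata.
    Seq("bucketed_by_writer", "bucketed_by_sql").foreach { table =>
      val metadata = spark.sessionState.catalog.getTableMetadata(TableIdentifier(table))
      println(s"$table -> ${metadata.bucketSpec}")
    }

    spark.stop()
  }
}
```

Both tables should report a bucketing specification with 4 buckets on `id`, sorted by `name`. The exact printed form depends on the Spark version.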

BucketSpec uses the following text representation (i.e. toString):
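A minimal sketch of such a text representation follows. It assumes the format found in Spark 2.x sources, where the sort columns are appended only when present. `BucketSpecText` is a hypothetical stand-in class, not Spark's own.

```scala
// Hypothetical stand-in that models only the toString behaviour.
final case class BucketSpecText(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {

  override def toString: String = {
    val bucketCols = bucketColumnNames.mkString(", ")
    val sortCols = sortColumnNames.mkString(", ")
    // The sort-column part is left out when no sort columns are defined.
    val sortString = if (sortColumnNames.nonEmpty) s", sort columns: [$sortCols]" else ""
    s"$numBuckets buckets, bucket columns: [$bucketCols]$sortString"
  }
}
```

For example, `BucketSpecText(4, Seq("id"), Seq("name")).toString` gives `4 buckets, bucket columns: [id], sort columns: [name]`, and with no sort columns it gives `4 buckets, bucket columns: [id]`.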

Converting Bucketing Specification to LinkedHashMap — toLinkedHashMap Method

toLinkedHashMap converts the bucketing specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:

toLinkedHashMap quotes the column names.
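The conversion can be sketched as follows. The field names `Num Buckets`, `Bucket Columns` and `Sort Columns` are assumed from Spark 2.x sources. `BucketSpecMap` and its `quote` helper are hypothetical; the helper approximates Spark's identifier quoting by wrapping a name in backticks and doubling any embedded backtick.

```scala
import scala.collection.mutable

// Hypothetical stand-in that models only the toLinkedHashMap behaviour.
final case class BucketSpecMap(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {

  // Assumed quoting rule: `name`, with embedded backticks doubled.
  private def quote(name: String): String = "`" + name.replace("`", "``") + "`"

  // LinkedHashMap preserves insertion order, so fields come out in a fixed order.
  def toLinkedHashMap: mutable.LinkedHashMap[String, String] =
    mutable.LinkedHashMap(
      "Num Buckets" -> numBuckets.toString,
      "Bucket Columns" -> bucketColumnNames.map(quote).mkString("[", ", ", "]"),
      "Sort Columns" -> sortColumnNames.map(quote).mkString("[", ", ", "]"))
}
```

An insertion-ordered map suits this purpose because the pairs are meant for display, where a stable field order matters.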

Note

toLinkedHashMap is used when:
