BucketSpec — Bucketing Specification of a Table
BucketSpec is the bucketing specification of a table, i.e. the metadata describing how a table is bucketed.
BucketSpec includes the following: the number of buckets, the bucket column names, and the (optional) sort column names.
The number of buckets has to be between 0 and 100000 (exclusive), or an AnalysisException is thrown.
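The validation above can be sketched in plain Scala. Note that `SimpleBucketSpec` is a hypothetical stand-in used for illustration, not Spark's actual class (which reads the upper bound from configuration and throws AnalysisException):

```scala
// A minimal sketch (not Spark's source) of the bucket-count validation
// that BucketSpec performs at construction time.
case class SimpleBucketSpec(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {
  // The real BucketSpec throws AnalysisException; require is used here
  // to keep the sketch self-contained.
  require(numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets should be greater than 0 but less than 100000; got `$numBuckets`")
}
```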
BucketSpec is created when:

- `DataFrameWriter` is requested to saveAsTable (and does getBucketSpec)
- `HiveExternalCatalog` is requested to getBucketSpecFromTableProperties and tableMetaToTableProps
- `HiveClientImpl` is requested to retrieve a table metadata
- `SparkSqlAstBuilder` is requested to visitBucketSpec (for a `CREATE TABLE` SQL statement with `CLUSTERED BY` and `INTO n BUCKETS` with optional `SORTED BY` clauses)
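As an illustration of the last case, a `CREATE TABLE` statement of the following shape (the table and column names are hypothetical) carries the clauses that visitBucketSpec parses into a BucketSpec:

```sql
-- The CLUSTERED BY ... INTO n BUCKETS clause (with an optional SORTED BY)
-- is what SparkSqlAstBuilder turns into a BucketSpec.
CREATE TABLE bucketed_table (col1 INT, col2 STRING)
USING parquet
CLUSTERED BY (col1)
SORTED BY (col2)
INTO 8 BUCKETS
```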
BucketSpec uses the following text representation (i.e. toString):

```text
[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]
```

```scala
import org.apache.spark.sql.catalyst.catalog.BucketSpec

val bucketSpec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))

scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]
```
Converting Bucketing Specification to LinkedHashMap — toLinkedHashMap Method
```scala
toLinkedHashMap: mutable.LinkedHashMap[String, String]
```
toLinkedHashMap converts the bucketing specification to a collection of pairs (LinkedHashMap[String, String]) with the following fields and their values:
- Num Buckets with the numBuckets
- Bucket Columns with the bucketColumnNames
- Sort Columns with the sortColumnNames
toLinkedHashMap quotes the column names.
```scala
scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
```
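The behavior above can be sketched in plain Scala. This is an illustrative reimplementation that mirrors the documented output, not Spark's source:

```scala
import scala.collection.mutable

// A sketch of how toLinkedHashMap could assemble its ordered
// key-value pairs, preserving insertion order via LinkedHashMap.
def toLinkedHashMap(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]): mutable.LinkedHashMap[String, String] = {
  // Column names are backtick-quoted, as in the example output above.
  def quote(names: Seq[String]): String =
    names.map(n => s"`$n`").mkString("[", ", ", "]")
  mutable.LinkedHashMap(
    "Num Buckets" -> numBuckets.toString,
    "Bucket Columns" -> quote(bucketColumnNames),
    "Sort Columns" -> quote(sortColumnNames))
}
```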