BucketSpec — Bucketing Specification of a Table
`BucketSpec` is the bucketing specification of a table, i.e. the metadata describing how a table is bucketed.
`BucketSpec` includes the following:

- numBuckets (the number of buckets)
- bucketColumnNames (the names of the columns used for bucketing)
- sortColumnNames (the names of the columns used for sorting)

The number of buckets has to be between 0 and 100000 (exclusive) or an `AnalysisException` is thrown.
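The range check above can be sketched in plain Scala. This is a hypothetical helper for illustration only, not Spark's actual code (Spark itself throws an `AnalysisException` when the rule is violated):

```scala
// Hypothetical helper mirroring the documented rule: the number of buckets
// must be greater than 0 and less than 100000 (both bounds exclusive).
def requireValidNumBuckets(numBuckets: Int): Unit =
  require(numBuckets > 0 && numBuckets < 100000,
    s"Number of buckets ($numBuckets) must be between 0 and 100000 exclusive")

requireValidNumBuckets(8)    // fine
// requireValidNumBuckets(0) // would fail the check
```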
`BucketSpec` is created when:

- `DataFrameWriter` is requested to saveAsTable (and does getBucketSpec)
- `HiveExternalCatalog` is requested to getBucketSpecFromTableProperties and tableMetaToTableProps
- `HiveClientImpl` is requested to retrieve table metadata
- `SparkSqlAstBuilder` is requested to visitBucketSpec (for a `CREATE TABLE` SQL statement with `CLUSTERED BY` and `INTO n BUCKETS`, and an optional `SORTED BY` clause)
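For illustration, a `CREATE TABLE` statement of the shape that `visitBucketSpec` parses might look as follows (the table and column names are made up):

```sql
CREATE TABLE bucketed_table (col1 INT, col2 STRING)
USING parquet
CLUSTERED BY (col1)
SORTED BY (col2)
INTO 8 BUCKETS
```

`CLUSTERED BY` and `INTO n BUCKETS` define the bucketing columns and the bucket count, while the `SORTED BY` clause is optional.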
`BucketSpec` uses the following text representation (i.e. `toString`):

```text
[numBuckets] buckets, bucket columns: [[bucketColumnNames]], sort columns: [[sortColumnNames]]
```

```scala
import org.apache.spark.sql.catalyst.catalog.BucketSpec

val bucketSpec = BucketSpec(
  numBuckets = 8,
  bucketColumnNames = Seq("col1"),
  sortColumnNames = Seq("col2"))

scala> println(bucketSpec)
8 buckets, bucket columns: [col1], sort columns: [col2]
```
Converting Bucketing Specification to LinkedHashMap — toLinkedHashMap Method

```scala
toLinkedHashMap: mutable.LinkedHashMap[String, String]
```
`toLinkedHashMap` converts the bucketing specification to a collection of pairs (`LinkedHashMap[String, String]`) with the following fields and their values:

- Num Buckets with the numBuckets
- Bucket Columns with the bucketColumnNames
- Sort Columns with the sortColumnNames
`toLinkedHashMap` quotes the column names.

```scala
scala> println(bucketSpec.toLinkedHashMap)
Map(Num Buckets -> 8, Bucket Columns -> [`col1`], Sort Columns -> [`col2`])
```
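The quoting behavior above can be sketched in plain Scala. `BucketSpecSketch` is a hypothetical stand-in for illustration, not Spark's actual `BucketSpec` implementation; only the field names and the backtick-quoted output format are taken from this page:

```scala
import scala.collection.mutable

// Hypothetical mirror of BucketSpec.toLinkedHashMap: each column name is
// wrapped in backticks and the lists are rendered as [a, b, c].
case class BucketSpecSketch(
    numBuckets: Int,
    bucketColumnNames: Seq[String],
    sortColumnNames: Seq[String]) {

  def toLinkedHashMap: mutable.LinkedHashMap[String, String] =
    mutable.LinkedHashMap(
      "Num Buckets" -> numBuckets.toString,
      "Bucket Columns" -> bucketColumnNames.map(c => s"`$c`").mkString("[", ", ", "]"),
      "Sort Columns" -> sortColumnNames.map(c => s"`$c`").mkString("[", ", ", "]"))
}
```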