KeyValueGroupedDataset — Typed Grouping

KeyValueGroupedDataset is an experimental interface to calculate aggregates over groups of objects in a typed Dataset.

Note	RelationalGroupedDataset is used for untyped `Row`-based aggregates.

KeyValueGroupedDataset is created using the Dataset.groupByKey operator.



val dataset: Dataset[Token] = ...
scala> val tokensByName = dataset.groupByKey(_.name)
tokensByName: org.apache.spark.sql.KeyValueGroupedDataset[String,Token] = org.apache.spark.sql.KeyValueGroupedDataset@1e3aad46
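
The snippet above leaves the dataset itself elided. As a minimal, self-contained sketch, the following builds a small `Dataset[Token]` and groups it by name. The `Token` case class, its fields, the sample rows and the local `SparkSession` settings are all assumptions made for illustration; they do not come from the original example.

```scala
import org.apache.spark.sql.{Dataset, KeyValueGroupedDataset, SparkSession}

// Hypothetical record type: the fields are assumed for illustration only
case class Token(name: String, productId: Int, score: Double)

// Local session for experimenting (in spark-shell, `spark` already exists)
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("groupByKey-sketch")
  .getOrCreate()
import spark.implicits._

// Made-up sample data
val dataset: Dataset[Token] = Seq(
  Token("aaa", 100, 0.12),
  Token("aaa", 200, 0.29),
  Token("bbb", 200, 0.53),
  Token("bbb", 300, 0.42)).toDS()

// groupByKey takes a function that extracts the grouping key from every object
val tokensByName: KeyValueGroupedDataset[String, Token] = dataset.groupByKey(_.name)
```

The key type (`String`) and the value type (`Token`) are both kept in the type of the result, which is what makes the grouping typed.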

Table 1. KeyValueGroupedDataset’s Aggregate Operators (KeyValueGroupedDataset API)
Operator	Description
`agg`
`cogroup`
`count`
`flatMapGroups`
`flatMapGroupsWithState`
`keys`
`keyAs`
`mapGroups`
`mapGroupsWithState`
`mapValues`
`reduceGroups`
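
A few of the operators listed above can be sketched as follows. This is an illustrative example only: the `Token` case class, the sample rows and the local `SparkSession` are assumptions, and only `count`, `mapGroups` and `reduceGroups` are shown.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type: the fields are assumed for illustration only
case class Token(name: String, productId: Int, score: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("kvgd-operators-sketch")
  .getOrCreate()
import spark.implicits._

val dataset: Dataset[Token] = Seq(
  Token("aaa", 100, 0.12),
  Token("aaa", 200, 0.29),
  Token("bbb", 200, 0.53),
  Token("bbb", 300, 0.42)).toDS()

val tokensByName = dataset.groupByKey(_.name)

// count: number of objects per key, as Dataset[(String, Long)]
val counts = tokensByName.count().collect().toMap

// mapGroups: one result per group, computed from the key and an iterator over the group
val productSums = tokensByName
  .mapGroups { (name: String, tokens: Iterator[Token]) => (name, tokens.map(_.productId).sum) }
  .collect()
  .toMap

// reduceGroups: pairwise reduction within each group (here: keep the largest productId)
val largest = tokensByName
  .reduceGroups((a: Token, b: Token) => if (a.productId >= b.productId) a else b)
  .collect()
  .toMap
```

`mapGroups` receives the whole group as an iterator, while `reduceGroups` only ever sees two values at a time, so the latter can usually be the better fit when a group is simply folded down to one object.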

KeyValueGroupedDataset holds the keys that were used to group the objects. Use the `keys` operator to get them as a Dataset with one row per distinct key.



scala> tokensByName.keys.show
+-----+
|value|
+-----+
|  aaa|
|  bbb|
+-----+
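
To illustrate that `keys` yields every key once regardless of how many objects share it, here is a small sketch that compares it with extracting the keys by hand and deduplicating them. The `Token` case class, the sample rows and the local `SparkSession` are assumptions for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type: the fields are assumed for illustration only
case class Token(name: String, productId: Int, score: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("kvgd-keys-sketch")
  .getOrCreate()
import spark.implicits._

val dataset: Dataset[Token] = Seq(
  Token("aaa", 100, 0.12),
  Token("aaa", 200, 0.29),
  Token("bbb", 200, 0.53),
  Token("bbb", 300, 0.42)).toDS()

val tokensByName = dataset.groupByKey(_.name)

// Four objects, but only two distinct keys
val keys = tokensByName.keys.collect().sorted

// The same keys obtained manually: extract the key, then deduplicate
val viaDistinct = dataset.map(_.name).distinct().collect().sorted
```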

`aggUntyped` Internal Method



aggUntyped(columns: TypedColumn[_, _]*): Dataset[_]

aggUntyped…FIXME

Note	`aggUntyped` is used exclusively when the KeyValueGroupedDataset.agg typed operator is used.
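
From the user's side only the public `agg` operator is visible. The sketch below calls it with a single `TypedColumn`, built by giving a standard aggregate function a result type with `as[Long]`. The `Token` case class, the sample rows and the local `SparkSession` are assumptions for illustration, and the example does not show how `aggUntyped` works internally.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.sum

// Hypothetical record type: the fields are assumed for illustration only
case class Token(name: String, productId: Int, score: Double)

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("kvgd-agg-sketch")
  .getOrCreate()
import spark.implicits._

val dataset: Dataset[Token] = Seq(
  Token("aaa", 100, 0.12),
  Token("aaa", 200, 0.29),
  Token("bbb", 200, 0.53),
  Token("bbb", 300, 0.42)).toDS()

val tokensByName = dataset.groupByKey(_.name)

// `.as[Long]` turns the untyped Column into a TypedColumn, which is what agg accepts.
// The result pairs every key with its aggregate: Dataset[(String, Long)]
val sums = tokensByName.agg(sum($"productId").as[Long])

val sumsByName = sums.collect().toMap
```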

`logicalPlan` Internal Method



logicalPlan: AnalysisBarrier

logicalPlan…FIXME

Note	`logicalPlan` is used when…FIXME
