关注 spark技术分享,
撸spark源码 玩spark最佳实践

KeyValueGroupedDataset — Streaming Aggregation

KeyValueGroupedDataset — Streaming Aggregation

KeyValueGroupedDataset represents a grouped dataset as a result of Dataset.groupByKey operator (that aggregates records by a grouping function).

KeyValueGroupedDataset is also created for KeyValueGroupedDataset.keyAs and KeyValueGroupedDataset.mapValues operators.

KeyValueGroupedDataset works for batch and streaming aggregations, but shines the most when used for streaming aggregation (with streaming Datasets).

The most prestigious use case of KeyValueGroupedDataset however is stateful streaming aggregation that allows for accumulating streaming state (by means of GroupState) using mapGroupsWithState and the more advanced flatMapGroupsWithState operators.

Table 1. KeyValueGroupedDataset’s Operators
Operator Description

agg

cogroup

count

flatMapGroups

flatMapGroupsWithState

Creates a new Dataset with FlatMapGroupsWithState logical operator

Note
The difference between flatMapGroupsWithState and mapGroupsWithState is the state function that generates zero or more elements (that are in turn the rows in the result streaming Dataset).

keyAs

mapGroups

mapGroupsWithState

Creates a new Dataset with FlatMapGroupsWithState logical operator

Note
The difference between mapGroupsWithState and flatMapGroupsWithState is the state function that generates exactly one element (that is in turn the row in the result Dataset).

mapValues

reduceGroups

Creating KeyValueGroupedDataset Instance

KeyValueGroupedDataset takes the following when created:

  • Encoder for keys

  • Encoder for values

  • QueryExecution

  • Data attributes

  • Grouping attributes

赞(0) 打赏
未经允许不得转载:spark技术分享 » KeyValueGroupedDataset — Streaming Aggregation
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏