关注 spark技术分享,
撸spark源码 玩spark最佳实践

window Function — Stream Time Windows

window Function — Stream Time Windows

window is a standard function that generates tumbling, sliding or delayed stream time window ranges (on a timestamp column).

  1. Creates a tumbling time window with slideDuration as windowDuration and 0 second for startTime

  2. Creates a sliding time window with 0 second for startTime

  3. Creates a delayed time window

Note

Tumbling windows are a series of fixed-sized, non-overlapping and contiguous time intervals.

Note

Tumbling windows group elements of a stream into finite sets where each set corresponds to an interval.

Tumbling windows discretize a stream into non-overlapping windows.

timeColumn should be of TimestampType, i.e. with java.sql.Timestamp values.

Tip
Use java.sql.Timestamp.from or java.sql.Timestamp.valueOf factory methods to create Timestamp instances.

windowDuration and slideDuration are strings specifying the width of the window for duration and sliding identifiers, respectively.

Tip
Use CalendarInterval for valid window identifiers.

There are a couple of rules governing the durations:

  1. The window duration must be greater than 0

  2. The slide duration must be greater than 0.

  3. The start time must be greater than or equal to 0.

  4. The slide duration must be less than or equal to the window duration.

  5. The start time must be less than the slide duration.

Note
Only one window expression is supported in a query.
Note
null values are filtered out in window expression.

Internally, window creates a Column with TimeWindow Catalyst expression under window alias.

Internally, TimeWindow Catalyst expression is simply a struct type with two fields, i.e. start and end, both of TimestampType type.

Note

TimeWindow time window Catalyst expression is planned (i.e. converted) in TimeWindowing logical optimization rule (i.e. Rule[LogicalPlan]) of the Spark SQL logical query plan analyzer.

Find more about the Spark SQL logical query plan analyzer in Mastering Apache Spark 2 gitbook.

Example — Traffic Sensor

Note
The example is borrowed from Introducing Stream Windows in Apache Flink.

The example shows how to use window function to model a traffic sensor that counts every 15 seconds the number of vehicles passing a certain location.

赞(0) 打赏
未经允许不得转载:spark技术分享 » window Function — Stream Time Windows
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏