
Demo: current_timestamp Function For Processing Time in Streaming Queries

The demo shows what happens when you use the current_timestamp function in your structured streaming queries.

Note

The main motivation was to answer the question "How to achieve ingestion time?" in Spark Structured Streaming.

You’re very welcome to upvote the question and answers at your earliest convenience. Thanks!

Event time is the time that each individual event occurred on its producing device. This time is typically embedded within the records before they enter Flink and that event timestamp can be extracted from the record.

That is exactly how event time is treated by the withWatermark operator, which you use to specify the column that holds the event time. The column could be part of the input dataset or generated, as in the sketch below.
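Here is a minimal sketch of the first case, where the event-time column is already part of the input. It is not the original demo: the built-in rate source, the window duration, and the watermark delay are assumptions chosen only to illustrate withWatermark.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder().appName("event-time-sketch").getOrCreate()
import spark.implicits._

// The built-in rate source emits (timestamp, value) rows; its "timestamp"
// column stands in here for an event time embedded in the records.
val events = spark
  .readStream
  .format("rate")
  .option("rowsPerSecond", "1")
  .load()

// withWatermark marks "timestamp" as the event-time column and tolerates
// records arriving up to 10 seconds late.
val withEventTime = events.withWatermark("timestamp", "10 seconds")

// A windowed aggregation keyed on the event-time column
val counts = withEventTime
  .groupBy(window($"timestamp", "1 minute"))
  .count()
```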

And that is the moment where my confusion starts.

In order to generate the event-time column for the withWatermark operator, you could use the current_timestamp or current_date standard functions.

Both are special in Spark Structured Streaming, as StreamExecution replaces their underlying Catalyst expressions, CurrentTimestamp and CurrentDate respectively, with a CurrentBatchTimestamp expression carrying the time of the current batch.
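A minimal sketch of the second case follows. It is not the original demo: the "batch_time" column name, the rate source, and the console sink are assumptions for illustration. The generated column is then declared as the event-time column with withWatermark; per the description above, StreamExecution substitutes CurrentBatchTimestamp, so every row in a micro-batch carries that batch's time rather than a per-row clock read.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.current_timestamp

val spark = SparkSession.builder().appName("current-timestamp-sketch").getOrCreate()

val input = spark
  .readStream
  .format("rate")
  .load()

// current_timestamp generates the column; it is not embedded in the records.
val withBatchTime = input
  .withColumn("batch_time", current_timestamp())
  .withWatermark("batch_time", "10 seconds")

val query = withBatchTime
  .writeStream
  .format("console")
  .option("truncate", "false")
  .start()

// Block until the query is stopped
query.awaitTermination()
```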

That seems to be closer to processing time than ingestion time given the definition from the Apache Flink documentation:

Processing time refers to the system time of the machine that is executing the respective operation.

Ingestion time is the time that events enter Flink.

What do you think?
