CumeDist-spark技术分享

CumeDist Declarative Window Aggregate Function Expression

CumeDist is a SizeBasedWindowFunction and a RowNumberLike expression that is used for the following:

cume_dist standard function (Dataset API)
cume_dist SQL function

CumeDist takes no input parameters when created.



import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()

import org.apache.spark.sql.catalyst.expressions.CumeDist

val cume_dist = CumeDist()

CumeDist uses cume_dist for the user-facing name.



import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
scala> println(cume_dist)
cume_dist()

import org.apache.spark.sql.catalyst.expressions.CumeDist

val cume_dist = CumeDist()

scala> println(cume_dist)

cume_dist()

As an WindowFunction expression (indirectly), CumeDist requires the SpecifiedWindowFrame (with the RangeFrame frame type, the UnboundedPreceding lower and the CurrentRow upper frame boundaries) as the frame.

Note	The frame for `CumeDist` expression is range-based instead of row-based, because it has to return the same value for tie values in a window (equal values per `ORDER BY` specification).

As a DeclarativeAggregate expression (indirectly), CumeDist defines the evaluateExpression expression which returns the final value when CumeDist is evaluated. The value uses the formula rowNumber / n where rowNumber is the row number in a window frame (the number of values before and including the current row) divided by the number of rows in the window frame.



import org.apache.spark.sql.catalyst.expressions.CumeDist
val cume_dist = CumeDist()
scala> println(cume_dist.evaluateExpression.numberedTreeString)
00 (cast(rowNumber#0 as double) / cast(window__partition__size#1 as double))
01 :- cast(rowNumber#0 as double)
02 :  +- rowNumber#0: int
03 +- cast(window__partition__size#1 as double)
04    +- window__partition__size#1: int

import org.apache.spark.sql.catalyst.expressions.CumeDist

val cume_dist = CumeDist()

scala> println(cume_dist.evaluateExpression.numberedTreeString)

00 (cast(rowNumber#0 as double) / cast(window__partition__size#1 as double))

01 :- cast(rowNumber#0 as double)

02 : +- rowNumber#0: int

03 +- cast(window__partition__size#1 as double)

04 +- window__partition__size#1: int

CumeDist

CumeDist Declarative Window Aggregate Function Expression

相关推荐

欢迎关注：spark技术分享

热门标签

近期文章

分类目录

关注公众号：spark技术分享

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏

QQ咨询

回顶部