Column

Column represents a column in a Dataset that holds a Catalyst Expression that produces a value per row.

Note
A Column is a value generator for every row in a Dataset.

A special column * references all columns in a Dataset.
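As a minimal sketch of the star column, assuming Spark 2.x or 3.x on the classpath and a local master (the DataFrame and its column names are made up for illustration):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("star-column").getOrCreate()
import spark.implicits._

// Hypothetical two-column dataset
val people = Seq((1, "Ann"), (2, "Bob")).toDF("id", "name")

// col("*") expands to every column of the Dataset it is used with
val everything = people.select(col("*"))
```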

With the implicit conversions imported, you can create “free” column references using Scala’s symbols.

Note
“Free” column references are Columns with no association to a Dataset.
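A sketch of a symbol-based free column reference, assuming Spark 2.x or 3.x and a local SparkSession. Symbol("id") is used instead of the 'id literal because the literal syntax is deprecated in newer Scala versions:

```scala
import org.apache.spark.sql.{Column, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("symbol-column").getOrCreate()
import spark.implicits._

// The implicit conversion turns a Scala symbol into a Column
// (equivalent to writing 'id on Scala 2.12)
val idCol: Column = Symbol("id")

// The free reference is resolved only once it is used with a Dataset
val ids = spark.range(3).select(idCol)
```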

You can also create free column references from $-prefixed strings.
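For illustration, a $-prefixed string could be used like this (a sketch assuming Spark 2.x or 3.x and a local SparkSession):

```scala
import org.apache.spark.sql.{Column, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("dollar-column").getOrCreate()
import spark.implicits._

// The $ string interpolator comes from spark.implicits._
val idCol: Column = $"id"

// Like any free reference, it is tied to a Dataset only when used in a query
val shifted = spark.range(3).select((idCol + 1).as("next"))
```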

Besides using the implicit conversions, you can create columns using the col and column functions.
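A short sketch of both functions, assuming Spark 2.x or 3.x on the classpath. No implicits are needed here:

```scala
import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.{col, column}

// col and column both build a free Column reference from a name
val viaCol: Column = col("id")
val viaColumn: Column = column("id")
```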

Finally, you can create a bound Column from the Dataset the column belongs to, using either the Dataset.apply factory method or the Dataset.col operator.

Note
You can use bound Column references only with the Datasets they have been created from.
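The two ways of creating a bound reference can be sketched as follows, assuming Spark 2.x or 3.x and a local SparkSession (the people dataset is hypothetical):

```scala
import org.apache.spark.sql.{Column, SparkSession}

val spark = SparkSession.builder().master("local[*]").appName("bound-column").getOrCreate()
import spark.implicits._

// Hypothetical dataset used only for illustration
val people = Seq((1, "Ann"), (2, "Bob")).toDF("id", "name")

// Dataset.apply: people("id") is shorthand for people.apply("id")
val idViaApply: Column = people("id")

// Dataset.col does the same thing with an explicit method name
val idViaCol: Column = people.col("id")

// Bound references are used with the Dataset they came from
val onlyIds = people.select(idViaApply)
```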

You can reference nested columns using . (dot).
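A sketch of dot notation on a struct column, assuming Spark 2.x or 3.x and a local SparkSession (the person struct and its fields are invented for the example):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, struct}

val spark = SparkSession.builder().master("local[*]").appName("nested-column").getOrCreate()
import spark.implicits._

// Build a dataset with a nested struct column called person
val nested = Seq((1, "Ann", "Oslo"))
  .toDF("id", "name", "city")
  .select(col("id"), struct(col("name"), col("city")).as("person"))

// The dot reaches into the struct and selects a single field
val cities = nested.select(col("person.city"))
```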

Table 1. Column Operators

Operator    Description
as          Specifying type hint about the expected return value of the column
name

Note

A Column keeps a reference to the Catalyst Expression it was created for, available through the expr method.
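As a sketch, assuming Spark 2.x or 3.x (where expr is a public member of Column; newer releases may expose this differently), the underlying expression can be inspected like this:

```scala
import org.apache.spark.sql.catalyst.expressions.Expression
import org.apache.spark.sql.functions.col

// id + 1 is represented as an arithmetic expression with two children:
// the unresolved attribute id and the literal 1
val plusOne: Expression = (col("id") + 1).expr
```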

Tip
Read about typed column references in TypedColumn Expressions.

Specifying Type Hint — as Operator

as creates a TypedColumn (that gives a type hint about the expected return value of the column).
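A minimal sketch of as with a type hint, assuming Spark 2.x or 3.x and a local SparkSession. The Long encoder is taken from spark.implicits._:

```scala
import org.apache.spark.sql.{Dataset, SparkSession, TypedColumn}

val spark = SparkSession.builder().master("local[*]").appName("typed-column").getOrCreate()
import spark.implicits._

// as[Long] turns the untyped column into a TypedColumn that yields Long values
val idTyped: TypedColumn[Any, Long] = $"id".as[Long]

// Selecting a TypedColumn gives a typed Dataset rather than a DataFrame
val ids: Dataset[Long] = spark.range(3).select(idTyped)
```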

name Operator

name…​FIXME

Note
name is used when…​FIXME

Adding Column to Dataset — withColumn Method

withColumn returns a new DataFrame with the column col added under the name colName.

Note
withColumn can replace an existing colName column.

You can add new columns to a Dataset using the withColumn method.
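Both behaviours (adding a column and replacing an existing one) can be sketched as follows, assuming Spark 2.x or 3.x and a local SparkSession with made-up data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().master("local[*]").appName("with-column").getOrCreate()
import spark.implicits._

// Hypothetical dataset used only for illustration
val people = Seq((1, "Ann"), (2, "Bob")).toDF("id", "name")

// Adds a new column named doubled
val added = people.withColumn("doubled", col("id") * 2)

// Reusing an existing name replaces that column instead of adding one
val replaced = people.withColumn("id", col("id") + 100)
```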

Creating Column Instance For Catalyst Expression — apply Factory Method

like Operator

Caution
FIXME

Symbols As Column Names

Defining Windowing Column (Analytic Clause) — over Operator

over creates a windowing column (aka analytic clause) that lets you execute an aggregate function over a window (i.e. a group of records that are in some relation to the current record).

Tip
Read up on windowed aggregation in Spark SQL in Window Aggregate Functions.
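A sketch of over with a simple partitioned window, assuming Spark 2.x or 3.x and a local SparkSession (the sales data and column names are hypothetical):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, sum}

val spark = SparkSession.builder().master("local[*]").appName("over-window").getOrCreate()
import spark.implicits._

// Hypothetical sales records
val sales = Seq(("a", 1), ("a", 2), ("b", 5)).toDF("dept", "amount")

// The window groups records that share the same dept
val byDept = Window.partitionBy("dept")

// sum(...) is evaluated per window, so every row keeps its own amount
// and additionally carries the total of its department
val withTotal = sales.withColumn("total", sum(col("amount")).over(byDept))
```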

cast Operator

cast casts a column to a data type. This makes for type-safe map operations, because values read from Row objects then have the proper type (not Any).

cast uses CatalystSqlParser to parse the data type from its canonical string representation.

cast Example
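The following is a sketch rather than the original listing. It assumes Spark 2.x or 3.x and a local SparkSession, with invented data, and shows both the string form (parsed into a data type) and the DataType form of cast:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DoubleType, StringType}

val spark = SparkSession.builder().master("local[*]").appName("cast-column").getOrCreate()
import spark.implicits._

// Hypothetical dataset: an Int id and a price stored as text
val items = Seq((1, "1.5"), (2, "2.5")).toDF("id", "price")

val casted = items.select(
  col("id").cast("string"),     // data type given as a string
  col("price").cast(DoubleType) // data type given as a DataType object
)

// After the cast, rows can be read with typed getters
val prices = casted.collect().map(_.getDouble(1))
```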

generateAlias Method

generateAlias…​FIXME

Note

generateAlias is used when:

  • Column is requested to named

  • RelationalGroupedDataset is requested to alias

named Method

named…​FIXME

Note

named is used when the following operators are used:
