Dataset API — Untyped Transformations

Untyped transformations are part of the Dataset API for transforming a Dataset to a DataFrame, a Column, a RelationalGroupedDataset, a DataFrameNaFunctions or a DataFrameStatFunctions (and hence untyped).

Note
Untyped transformations are the methods in the Dataset Scala class that are grouped in untypedrel group name, i.e. @group untypedrel.

Table 1. Dataset API’s Untyped Transformations

Transformation      Description
agg
apply               Selects a column based on the column name (i.e. maps a Dataset onto a Column)
col                 Selects a column based on the column name (i.e. maps a Dataset onto a Column)
colRegex            Selects a column based on the column name specified as a regex (i.e. maps a Dataset onto a Column)
crossJoin
cube
drop
groupBy
join
na
rollup
select
selectExpr
stat
withColumn
withColumnRenamed

agg Untyped Transformation

agg…​FIXME

apply Untyped Transformation

apply selects a column based on the column name (i.e. maps a Dataset onto a Column).

col Untyped Transformation

col selects a column based on the column name (i.e. maps a Dataset onto a Column).

Internally, col branches off per the input column name.

If the column name is * (a star), col simply creates a Column with a ResolvedStar expression (with the schema output attributes of the analyzed logical plan of the QueryExecution).

Otherwise, when the spark.sql.parser.quotedRegexColumnNames configuration property is enabled, col delegates to the colRegex untyped transformation.

When the column name is not * and the spark.sql.parser.quotedRegexColumnNames configuration property is disabled, col creates a Column with the column name resolved (as a NamedExpression).
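
The following is a minimal sketch of how apply and col can be used, assuming a local SparkSession and a made-up two-column DataFrame called people (the session settings, the data and the value names are illustrative and not part of the original text). It shows that apply and col both turn a column name into a Column and that the star is accepted as a column name.

```scala
import org.apache.spark.sql.{Column, SparkSession}

// Assumption: a throwaway local session, just for illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("col-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data
val people = Seq((1, "Ann"), (2, "Bob")).toDF("id", "name")

// apply and col both map the Dataset onto a Column
val viaApply: Column = people("id")
val viaCol: Column = people.col("id")

// The star is handled by the first branch of col
val star: Column = people.col("*")

// Column names of the resulting projections
val idOnly = people.select(viaCol).columns
val idOnlyViaApply = people.select(viaApply).columns
val everything = people.select(star).columns
```

Selecting the star Column should give back every column of people, while the other two projections contain only id.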

colRegex Untyped Transformation

colRegex selects a column based on the column name specified as a regex (i.e. maps a Dataset onto a Column).

Note
colRegex is used in col when spark.sql.parser.quotedRegexColumnNames configuration property is enabled (and the column name is not *).

Internally, colRegex matches the input column name against the following regular expressions (in this order):

  1. For column names with quotes without a qualifier, colRegex simply creates a Column with a UnresolvedRegex (with no table)

  2. For column names with quotes with a qualifier, colRegex simply creates a Column with a UnresolvedRegex (with a table specified)

  3. For other column names, colRegex (behaves like col and) creates a Column with the column name resolved (as a NamedExpression)
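
The first and the third cases above can be sketched as follows. This is an illustration only: it assumes a local SparkSession, a made-up DataFrame, and that "quotes" means backquotes around the regex (as in the example in the Spark API documentation for colRegex).

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session, just for illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("colRegex-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data
val df = Seq((1, "a", "b")).toDF("id", "col_a", "col_b")

// Case 1: a backquoted name without a qualifier is treated as a regex
val regexColumns = df.select(df.colRegex("`col_.*`")).columns

// Case 3: a plain name is resolved the same way col resolves it
val plainColumns = df.select(df.colRegex("id")).columns
```

With this data, the backquoted pattern should select col_a and col_b, and the plain name should select id alone.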

crossJoin Untyped Transformation

crossJoin…​FIXME

cube Untyped Transformation

cube…​FIXME

Dropping One or More Columns — drop Untyped Transformation

drop…​FIXME

groupBy Untyped Transformation

groupBy…​FIXME

join Untyped Transformation

join…​FIXME

na Untyped Transformation

na simply creates a DataFrameNaFunctions to work with missing data.
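
As a rough sketch of what working with missing data through na can look like (assuming a local SparkSession and a made-up DataFrame with nulls, neither of which comes from the original text), the example below uses two DataFrameNaFunctions methods, fill and drop.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session, just for illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("na-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data; None becomes a null in the DataFrame
val df = Seq[(Option[Int], Option[String])](
  (Some(1), Some("a")),
  (None, Some("b")),
  (Some(3), None)
).toDF("n", "s")

// Replace nulls in the numeric column n with 0
val filled = df.na.fill(0L, Seq("n"))

// Drop every row that contains at least one null
val dropped = df.na.drop()

val nullsLeftInN = filled.filter($"n".isNull).count()
val rowsWithoutNulls = dropped.count()
```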

rollup Untyped Transformation

rollup…​FIXME

select Untyped Transformation

select…​FIXME

Projecting Columns using SQL Expressions — selectExpr Untyped Transformation

selectExpr is like select, but accepts SQL expressions (as strings).

Internally, it executes select with every expression in exprs mapped to a Column (using SparkSqlParser.parseExpression).
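
The relationship between selectExpr and select can be illustrated with the sketch below. It assumes a local SparkSession and a made-up DataFrame, and it uses the expr standard function as a stand-in for the string-to-Column mapping described above (that substitution is an assumption made for the example, not a claim about the exact internal code path).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Assumption: a throwaway local session, just for illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("selectExpr-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data
val df = Seq((1, 2), (3, 4)).toDF("a", "b")

// SQL expressions passed as plain strings
val viaSelectExpr = df.selectExpr("a + b AS total", "a * 2 AS doubled")

// The same projection with every string turned into a Column first
val viaSelect = df.select(expr("a + b AS total"), expr("a * 2 AS doubled"))

val sameColumns = viaSelectExpr.columns.sameElements(viaSelect.columns)
val sameRows = viaSelectExpr.collect().sameElements(viaSelect.collect())
```

Both queries should produce the same column names and the same rows.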

stat Untyped Transformation

stat simply creates a DataFrameStatFunctions to work with statistical functions.
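
A small sketch of stat in use, assuming a local SparkSession and made-up, perfectly linear data (both are illustrative), could compute the correlation of two columns with DataFrameStatFunctions.corr.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a throwaway local session, just for illustration
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("stat-sketch")
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data where y is always 2 * x
val df = Seq((1, 2.0), (2, 4.0), (3, 6.0)).toDF("x", "y")

// Pearson correlation coefficient of the two columns
val correlation: Double = df.stat.corr("x", "y")
```

Since y is an exact multiple of x here, the coefficient should be very close to 1.0.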

withColumn Untyped Transformation

withColumn…​FIXME

withColumnRenamed Untyped Transformation

withColumnRenamed…​FIXME
