Collection Functions

Standard Functions for Collections (Collection Functions)

Table 1. (Subset of) Standard Functions for Handling Collections
| Name | Description |
|---|---|
| array_contains | |
| explode | |
| explode_outer | Creates a new row for each element in the given array or map column. If the array/map is null or empty then null is produced. |
| from_json | Extract data from arbitrary JSON-encoded values into a StructType or ArrayType of StructType elements with the specified schema |
| map_keys | |
| map_values | |
| posexplode | |
| posexplode_outer | |
| reverse | Returns a reversed string or an array with reverse order of elements. Note: Support for reversing arrays is new in 2.4.0. |
| size | Returns the size of the given array or map. Returns -1 if null. |

reverse Collection Function

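reverse returns a reversed string or an array with the elements in reverse order (support for arrays is new in 2.4.0).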
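
The table entry for reverse says it handles both strings and arrays (the latter from 2.4.0 on). A minimal sketch of both uses, assuming Spark 2.4 or later on the classpath and a local master; the object name, column names and sample data are made up for illustration:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.reverse

// Hypothetical demo object; only `reverse` itself comes from the text.
object ReverseSketch {
  def reversed(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("abc", Seq(1, 2, 3)))
      .toDF("s", "xs")
      // The same function is applied to a string column and to an array column.
      .select(reverse($"s") as "s_rev", reverse($"xs") as "xs_rev")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("reverse-sketch").getOrCreate()
    reversed(spark).show() // s_rev is "cba", xs_rev is [3, 2, 1]
    spark.stop()
  }
}
```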

size Collection Function

size returns the size of the given array or map. Returns -1 if null.

Internally, size creates a Column with Size unary expression.
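
The following sketch applies size to a populated, an empty and a null array. It assumes a local Spark session; the object name, column names and rows are illustrative only. The -1 result for null is what the text describes; as far as I know, releases that set spark.sql.legacy.sizeOfNull to false or enable ANSI mode return null instead, so the null row carries no expected value here.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.size

// Hypothetical demo object; the sample rows are made up.
object SizeSketch {
  def sizes(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Seq[Int]])](
      ("full", Some(Seq(1, 2, 3))),   // size 3
      ("empty", Some(Seq.empty[Int])), // size 0
      ("null", None)                   // null array: see the note on -1 above
    ).toDF("id", "xs")
      .select($"id", size($"xs") as "n")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("size-sketch").getOrCreate()
    sizes(spark).show()
    spark.stop()
  }
}
```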

posexplode Collection Function

posexplode…​FIXME

posexplode_outer Collection Function

posexplode_outer…​FIXME

explode Collection Function

Caution
FIXME

Note
The explode function is the equivalent of the flatMap operator for Dataset.
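
To make the comparison with flatMap concrete, here is a small sketch that flattens the same nested data both ways. It assumes a local Spark session; the object and method names are hypothetical.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}
import org.apache.spark.sql.functions.explode

// Hypothetical demo object comparing the two approaches on identical input.
object ExplodeVsFlatMapSketch {
  // Untyped route: explode an array column, one output row per element.
  def viaExplode(spark: SparkSession): Dataset[Int] = {
    import spark.implicits._
    Seq(Seq(1, 2), Seq(3)).toDF("xs").select(explode($"xs") as "x").as[Int]
  }

  // Typed route: flatMap over a Dataset[Seq[Int]].
  def viaFlatMap(spark: SparkSession): Dataset[Int] = {
    import spark.implicits._
    Seq(Seq(1, 2), Seq(3)).toDS().flatMap(xs => xs)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("explode-sketch").getOrCreate()
    viaExplode(spark).show() // rows 1, 2, 3
    viaFlatMap(spark).show() // same rows
    spark.stop()
  }
}
```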

explode_outer Collection Function

explode_outer generates a new row for each element in the given array or map column (e).

Note
Unlike explode, explode_outer generates null when the array or map is null or empty.

Internally, explode_outer creates a Column with GeneratorOuter and Explode Catalyst expressions.
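
The difference from explode shows up only for null or empty collections, so the sketch below runs both functions over the same three rows. It assumes a local Spark session (explode_outer needs Spark 2.2 or later, to my knowledge); names and data are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{explode, explode_outer}

// Hypothetical demo object; the input has one populated, one empty and one null array.
object ExplodeOuterSketch {
  private def input(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq[(String, Option[Seq[Int]])](
      ("a", Some(Seq(1, 2))),
      ("b", Some(Seq.empty[Int])),
      ("c", None)
    ).toDF("id", "xs")
  }

  // explode drops the rows for "b" and "c" entirely.
  def inner(spark: SparkSession): DataFrame = {
    import spark.implicits._
    input(spark).select($"id", explode($"xs") as "x")
  }

  // explode_outer keeps "b" and "c", each with a null in column x.
  def outer(spark: SparkSession): DataFrame = {
    import spark.implicits._
    input(spark).select($"id", explode_outer($"xs") as "x")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("explode-outer-sketch").getOrCreate()
    inner(spark).show() // 2 rows
    outer(spark).show() // 4 rows
    spark.stop()
  }
}
```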

Extracting Data from Arbitrary JSON-Encoded Values — from_json Collection Function

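from_json comes in the following variants (numbered for cross-reference):

  1. from_json(e: Column, schema: StructType, options: Map[String, String]): Column
     Calls <2> with StructType converted to DataType

  2. from_json(e: Column, schema: DataType, options: Map[String, String]): Column
     (fixme)

  3. from_json(e: Column, schema: StructType): Column
     Calls <1> with empty options

  4. from_json(e: Column, schema: DataType): Column
     Relays to the other from_json with empty options

  5. from_json(e: Column, schema: String, options: Map[String, String]): Column
     Uses schema as DataType in the JSON format or falls back to StructType in the DDL format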

from_json parses a column with a JSON-encoded value into a StructType or ArrayType of StructType elements with the specified schema.

Note

A schema can be one of the following:

  1. DataType as a Scala object or in the JSON format

  2. StructType in the DDL format
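
The schema forms listed above can be tried side by side. This is a sketch only, assuming Spark 2.3 or later and a local session; the object name, the field names id and name, and the sample JSON are invented for the example.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.from_json
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical demo object: one schema expressed in three ways.
object FromJsonSchemaSketch {
  // 1. DataType as a Scala object...
  val asObject: StructType =
    StructType(Seq(StructField("id", IntegerType), StructField("name", StringType)))
  // ...or the same DataType in the JSON format
  val asJson: String = asObject.json
  // 2. StructType in the DDL format
  val asDDL: String = "id INT, name STRING"

  def parsed(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq("""{"id": 1, "name": "one"}""").toDF("json").select(
      from_json($"json", asObject) as "from_object",
      from_json($"json", asJson, Map.empty[String, String]) as "from_json_schema",
      from_json($"json", asDDL, Map.empty[String, String]) as "from_ddl")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("from-json-schema-sketch").getOrCreate()
    parsed(spark).show(truncate = false) // three identical struct columns
    spark.stop()
  }
}
```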

Note
options controls how JSON is parsed and accepts the same options as the json format.

Internally, from_json creates a Column with JsonToStructs unary expression.

Note
from_json (through the JsonToStructs it creates) uses a JSON parser in FAILFAST parsing mode, which fails as soon as a corrupted or malformed record is found, and hence does not support the columnNameOfCorruptRecord JSON option.

Note
from_json corresponds to SQL’s from_json.
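
A short sketch of the SQL counterpart, run through spark.sql. It assumes a local Spark session and a release where the SQL function accepts a DDL schema string (2.3 or later, as far as I know); the JSON literal and field names are illustrative.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical demo object calling SQL's from_json on a string literal.
object FromJsonSqlSketch {
  def parsed(spark: SparkSession): DataFrame =
    spark.sql(
      """SELECT from_json('{"id": 1, "name": "one"}', 'id INT, name STRING') AS parsed""")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("from-json-sql-sketch").getOrCreate()
    parsed(spark).printSchema() // parsed: struct<id:int,name:string>
    parsed(spark).show()
    spark.stop()
  }
}
```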

array_contains Collection Function

array_contains creates a Column that checks whether the array in the column argument contains value, which must be of the same type as the elements of the array.

Internally, array_contains creates a Column with an ArrayContains expression.

array_contains corresponds to SQL’s array_contains.

Tip
Use SQL’s array_contains to use values from columns for the column and value arguments.
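
The tip can be illustrated by putting the two styles next to each other: the functions API with a literal value, and the SQL form (via expr) where the value comes from another column. This is a sketch under the assumption of a local Spark session; the object name, the columns xs and x, and the rows are made up.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array_contains, expr}

// Hypothetical demo object; xs is the array column, x a per-row value to look up.
object ArrayContainsSketch {
  def checks(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq((Seq(1, 2, 3), 2), (Seq(4, 5), 9))
      .toDF("xs", "x")
      .select(
        // functions API: the value is a literal
        array_contains($"xs", 2) as "has_2",
        // SQL form: both arguments are columns
        expr("array_contains(xs, x)") as "has_x")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]").appName("array-contains-sketch").getOrCreate()
    checks(spark).show() // first row: true, true; second row: false, false
    spark.stop()
  }
}
```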

map_keys Collection Function

map_keys…​FIXME

map_values Collection Function

map_values…​FIXME
