
JDBCRelation

JDBCRelation — Relation with Inserting or Overwriting Data, Column Pruning and Filter Pushdown

As a BaseRelation, JDBCRelation defines the schema of tuples (data) and the SQLContext.

As an InsertableRelation, JDBCRelation supports inserting or overwriting data.

JDBCRelation is created when JdbcRelationProvider is requested to create a BaseRelation (i.e. for loading from or writing to a table over JDBC).

When requested for a human-friendly text representation, JDBCRelation requests the JDBCOptions for the name of the table and the number of partitions (if defined).

Figure 1. JDBCRelation in web UI (Details for Query)
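The text representation above can be sketched as follows (based on the internal Spark source, so details may differ across Spark versions):

```scala
// Sketch of JDBCRelation.toString (internal Spark code; may vary by version).
// The table name always appears; the number of partitions only when defined.
override def toString: String = {
  val partitioningInfo = if (parts.nonEmpty) s" [numPartitions=${parts.length}]" else ""
  s"JDBCRelation(${jdbcOptions.table})" + partitioningInfo
}
```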

JDBCRelation uses the SparkSession to return a SQLContext.

JDBCRelation turns the needConversion flag off (to announce that buildScan returns an RDD[InternalRow] already and DataSourceStrategy execution planning strategy does not have to do the RDD conversion).
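The two BaseRelation overrides described above can be sketched like this (internal Spark code, so signatures may vary by version):

```scala
// The SQLContext comes straight from the SparkSession the relation was created with
override def sqlContext: SQLContext = sparkSession.sqlContext

// false announces that buildScan already produces an RDD[InternalRow],
// so DataSourceStrategy skips the Row-to-InternalRow conversion
override val needConversion: Boolean = false
```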

Creating JDBCRelation Instance

JDBCRelation takes the following when created: an array of Spark Core's Partitions, a JDBCOptions, and a SparkSession.
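As a sketch, the constructor looks roughly like this in Spark 2.x (internal API; later versions differ):

```scala
// Sketch of the JDBCRelation constructor (internal Spark code; may vary by version)
private[sql] case class JDBCRelation(
    parts: Array[Partition],        // partitions of the distributed data scan
    jdbcOptions: JDBCOptions)(      // JDBC options (url, table name, ...)
    @transient val sparkSession: SparkSession)
  extends BaseRelation
  with PrunedFilteredScan
  with InsertableRelation
```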

Finding Unhandled Filter Predicates — unhandledFilters Method

Note
unhandledFilters is part of BaseRelation Contract to find unhandled Filter predicates.

unhandledFilters returns the Filter predicates in the input filters that could not be converted to a SQL expression (and are therefore unhandled by the JDBC data source natively).
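In a sketch (internal Spark code; may vary by version), a filter is unhandled exactly when JDBCRDD.compileFilter cannot turn it into a SQL expression for the JDBC dialect of the connection URL:

```scala
// Sketch of unhandledFilters (internal Spark code; may vary by version).
// compileFilter returns None for predicates with no SQL translation;
// those stay unhandled and are evaluated by Spark after the scan.
override def unhandledFilters(filters: Array[Filter]): Array[Filter] =
  filters.filter(JDBCRDD.compileFilter(_, JdbcDialects.get(jdbcOptions.url)).isEmpty)
```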

Schema of Tuples (Data) — schema Property

Note
schema is part of BaseRelation Contract to return the schema of the tuples in a relation.

schema uses JDBCRDD to resolveTable given the JDBCOptions (that simply returns the Catalyst schema of the table, also known as the default table schema).

If customSchema JDBC option was defined, schema uses JdbcUtils to replace the data types in the default table schema.
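The two steps above can be sketched as follows (internal Spark code; names and signatures may vary by version):

```scala
// Sketch of the schema property (internal Spark code; may vary by version)
override val schema: StructType = {
  val tableSchema = JDBCRDD.resolveTable(jdbcOptions)  // default table schema
  jdbcOptions.customSchema match {
    case Some(customSchema) =>
      // replace data types per the customSchema JDBC option
      JdbcUtils.getCustomSchema(
        tableSchema, customSchema, sparkSession.sessionState.conf.resolver)
    case None => tableSchema
  }
}
```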

Inserting or Overwriting Data to JDBC Table — insert Method

Note
insert is part of InsertableRelation Contract that inserts or overwrites data in a relation.

insert simply requests the input DataFrame for a DataFrameWriter that in turn is requested to save the data to a table using the JDBC data source (itself!) with the url, table and all options.

insert also requests the DataFrameWriter to set the save mode as Overwrite or Append per the input overwrite flag.

Note
insert uses a “trick” to reuse the code responsible for saving data to a JDBC table.
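The “trick” can be sketched like this (internal Spark code; may vary by version): insert simply goes back through the public DataFrameWriter.jdbc API, so the regular JDBC write path does the actual work.

```scala
// Sketch of insert (internal Spark code; may vary by version).
// Save mode is Overwrite or Append per the overwrite flag.
override def insert(data: DataFrame, overwrite: Boolean): Unit = {
  data.write
    .mode(if (overwrite) SaveMode.Overwrite else SaveMode.Append)
    .jdbc(jdbcOptions.url, jdbcOptions.table, jdbcOptions.asProperties)
}
```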

Building Distributed Data Scan with Column Pruning and Filter Pushdown — buildScan Method

Note
buildScan is part of PrunedFilteredScan Contract to build a distributed data scan (as a RDD[Row]) with support for column pruning and filter pushdown.

buildScan uses the JDBCRDD object to create a RDD[Row] for a distributed data scan.
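As a sketch (internal Spark code; may vary by version), buildScan delegates to JDBCRDD.scanTable, which builds a SELECT with only the required columns and the pushed-down filters, issuing one JDBC query per partition:

```scala
// Sketch of buildScan (internal Spark code; may vary by version).
// The cast is safe because needConversion is false: the rows produced
// are already InternalRows.
override def buildScan(
    requiredColumns: Array[String],
    filters: Array[Filter]): RDD[Row] = {
  JDBCRDD.scanTable(
    sparkSession.sparkContext,
    schema,
    requiredColumns,
    filters,
    parts,
    jdbcOptions).asInstanceOf[RDD[Row]]
}
```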
