JDBCRDD - Spark Technology Sharing

JDBCRDD

JDBCRDD is an RDD of internal binary rows that represents a structured query over a table in a database accessed via JDBC.

Note	`JDBCRDD` represents a “SELECT requiredColumns FROM table” query.

JDBCRDD is created exclusively when JDBCRDD is requested to scanTable (when JDBCRelation is requested to build a scan).

Table 1. JDBCRDD’s Internal Properties (e.g. Registries, Counters and Flags)
Name	Description
`columnList`	Column names. Used when…FIXME
`filterWhereClause`	Filters as a SQL `WHERE` clause. Used when…FIXME
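
The two properties above feed the “SELECT requiredColumns FROM table” query. The following is a minimal, Spark-free sketch of how such a query string could be assembled. The object `QuerySketch` and its method names are hypothetical, and the fallback to the constant `1` for an empty column list is an assumption, not something this page states.

```scala
// Simplified model only: these helpers are illustrative, not Spark's code.
object QuerySketch {
  // Join the (already quoted) column names with commas.
  // Assumption: select the constant 1 when no columns are required.
  def columnList(columns: Seq[String]): String =
    if (columns.isEmpty) "1" else columns.mkString(",")

  // Wrap every compiled predicate in parentheses and AND them together.
  def filterWhereClause(compiledFilters: Seq[String]): String =
    compiledFilters.map(p => s"($p)").mkString(" AND ")

  // Put both pieces into a single SELECT statement,
  // omitting the WHERE keyword when there are no filters.
  def selectQuery(
      table: String,
      columns: Seq[String],
      compiledFilters: Seq[String]): String = {
    val where = filterWhereClause(compiledFilters)
    val whereClause = if (where.isEmpty) "" else s" WHERE $where"
    s"SELECT ${columnList(columns)} FROM $table$whereClause"
  }
}
```

For example, `QuerySketch.selectQuery("people", Seq("id", "name"), Seq("age > 21"))` gives `SELECT id,name FROM people WHERE (age > 21)`.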

Computing Partition (in TaskContext) — `compute` Method



compute(thePart: Partition, context: TaskContext): Iterator[InternalRow]


Note	`compute` is part of Spark Core’s `RDD` Contract to compute a partition (in a `TaskContext`).

compute…FIXME

`resolveTable` Method



resolveTable(options: JDBCOptions): StructType


resolveTable…FIXME

Note	`resolveTable` is used exclusively when `JDBCRelation` is requested for the schema.

Creating RDD for Distributed Data Scan — `scanTable` Object Method



scanTable(
  sc: SparkContext,
  schema: StructType,
  requiredColumns: Array[String],
  filters: Array[Filter],
  parts: Array[Partition],
  options: JDBCOptions): RDD[InternalRow]


scanTable takes the url option from the input JDBCOptions.

scanTable finds the corresponding JDBC dialect (per the url option) and requests it to quote the column identifiers in the input requiredColumns.

scanTable uses the JdbcUtils object to createConnectionFactory, and prunes columns from the input schema to include the input requiredColumns only.

In the end, scanTable creates a new JDBCRDD.
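
The quoting and pruning steps above can be sketched without a Spark dependency. In this simplified model, `Field` and `Schema` are hypothetical stand-ins for Spark's `StructField` and `StructType`, and `quoterFor` is a hypothetical replacement for the real dialect lookup. It assumes backtick quoting for MySQL URLs and double quotes otherwise.

```scala
// Simplified model only: names and types here are illustrative assumptions.
object ScanTableSketch {
  // Hypothetical stand-ins for Spark's StructField / StructType.
  final case class Field(name: String, dataType: String)
  final case class Schema(fields: Seq[Field])

  // Hypothetical dialect lookup: pick a quoting function from the URL prefix.
  def quoterFor(url: String): String => String =
    if (url.startsWith("jdbc:mysql")) (c: String) => s"`$c`"
    else (c: String) => "\"" + c + "\""

  // Keep only the required columns, in the order they were requested.
  // Throws NoSuchElementException if a requested column is not in the schema.
  def pruneSchema(schema: Schema, columns: Seq[String]): Schema = {
    val byName = schema.fields.map(f => f.name -> f).toMap
    Schema(columns.map(byName))
  }

  // Quote every required column with the dialect chosen for the URL.
  def quotedColumns(url: String, requiredColumns: Seq[String]): Seq[String] =
    requiredColumns.map(quoterFor(url))
}
```

Note that the pruned schema follows the order of the requested columns rather than the order of the original schema, so the result rows line up with the generated column list.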

Note	`scanTable` is used exclusively when `JDBCRelation` is requested to build a distributed data scan with column pruning and filter pushdown.

Creating JDBCRDD Instance

JDBCRDD takes the following when created:

SparkContext
Function to create a Connection (() ⇒ Connection)
Schema (StructType)
Array of column names
Array of Filter predicates
Array of Spark Core’s Partitions
Connection URL
JDBCOptions

JDBCRDD initializes the internal registries and counters.

`getPartitions` Method



getPartitions: Array[Partition]


Note	`getPartitions` is part of Spark Core’s `RDD` Contract to…FIXME

getPartitions simply returns the partitions that this JDBCRDD was created with.

`pruneSchema` Internal Method



pruneSchema(schema: StructType, columns: Array[String]): StructType


pruneSchema…FIXME

Note	`pruneSchema` is used when…FIXME

Converting Filter Predicate to SQL Expression — `compileFilter` Object Method



compileFilter(f: Filter, dialect: JdbcDialect): Option[String]


compileFilter…FIXME

Note	`compileFilter` is used when `JDBCRelation` is requested to find unhandled Filter predicates and when `JDBCRDD` is created.
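
The `Option[String]` return type suggests that a predicate either compiles to a SQL expression or cannot be pushed down at all. The following is a minimal sketch of that idea over a reduced, hypothetical predicate hierarchy. `Predicate` and its case classes stand in for Spark's `Filter`, and quoting is hard-coded to double quotes instead of being delegated to a `JdbcDialect`. It is not the actual `compileFilter` implementation.

```scala
// Simplified model only: a reduced predicate hierarchy, not Spark's Filter API.
object CompileFilterSketch {
  sealed trait Predicate
  final case class EqualTo(attribute: String, value: Any) extends Predicate
  final case class GreaterThan(attribute: String, value: Any) extends Predicate
  final case class IsNull(attribute: String) extends Predicate
  final case class And(left: Predicate, right: Predicate) extends Predicate
  // Stands for any predicate that has no SQL translation.
  final case class Unsupported(description: String) extends Predicate

  // Assumption: double-quote identifiers (a real dialect may differ).
  private def quote(column: String): String = "\"" + column + "\""

  // Render a value as a SQL literal, escaping single quotes in strings.
  private def literal(value: Any): String = value match {
    case s: String => "'" + s.replace("'", "''") + "'"
    case other     => other.toString
  }

  // Some(sql) when the predicate can be expressed in SQL, None otherwise.
  def compile(p: Predicate): Option[String] = p match {
    case EqualTo(a, v)     => Some(s"${quote(a)} = ${literal(v)}")
    case GreaterThan(a, v) => Some(s"${quote(a)} > ${literal(v)}")
    case IsNull(a)         => Some(s"${quote(a)} IS NULL")
    case And(l, r) =>
      // A conjunction compiles only if both sides compile.
      for (ls <- compile(l); rs <- compile(r)) yield s"($ls) AND ($rs)"
    case Unsupported(_)    => None
  }
}
```

Returning `None` for anything that does not translate makes it straightforward to separate handled from unhandled predicates, for example with `filters.filter(f => CompileFilterSketch.compile(f).isEmpty)`.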
