JdbcUtils Helper Object

JdbcUtils is a Scala object with methods to support JDBCRDD, JDBCRelation and JdbcRelationProvider.

Table 1. JdbcUtils API

| Name | Description |
| --- | --- |
| createConnectionFactory | Used when: |
| createTable | |
| dropTable | |
| getCommonJDBCType | |
| getCustomSchema | Replaces data types in a table schema. Used exclusively when JDBCRelation is created (and customSchema JDBC option was defined) |
| getInsertStatement | |
| getSchema | Used when JDBCRDD is requested to resolveTable |
| getSchemaOption | Used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table |
| resultSetToRows | Used when… FIXME |
| resultSetToSparkInternalRows | Used when JDBCRDD is requested to compute a partition |
| schemaString | |
| saveTable | |
| tableExists | Used when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table |
| truncateTable | Used when… FIXME |

createConnectionFactory Method

createConnectionFactory…​FIXME

Note

createConnectionFactory is used when:

getCommonJDBCType Method

getCommonJDBCType…​FIXME

Note
getCommonJDBCType is used when…​FIXME

getCatalystType Internal Method

getCatalystType…​FIXME

Note
getCatalystType is used when…​FIXME

getSchemaOption Method

getSchemaOption…​FIXME

Note
getSchemaOption is used when…​FIXME

getSchema Method

getSchema…​FIXME

Note
getSchema is used when…​FIXME

resultSetToRows Method

resultSetToRows…​FIXME

Note
resultSetToRows is used when…​FIXME

resultSetToSparkInternalRows Method

resultSetToSparkInternalRows…​FIXME

Note
resultSetToSparkInternalRows is used when…​FIXME

schemaString Method

schemaString…​FIXME

Note
schemaString is used exclusively when JdbcUtils is requested to create a table.

parseUserSpecifiedCreateTableColumnTypes Internal Method

parseUserSpecifiedCreateTableColumnTypes…​FIXME

Note
parseUserSpecifiedCreateTableColumnTypes is used exclusively when JdbcUtils is requested to schemaString.

saveTable Method

saveTable takes the url, table, batchSize and isolationLevel options and creates a connection factory (using createConnectionFactory).

saveTable then builds the INSERT statement (using getInsertStatement).

saveTable takes the numPartitions option and applies the coalesce operator to the input DataFrame if the number of partitions of its RDD is greater than the numPartitions option.

In the end, saveTable requests the possibly-repartitioned DataFrame for its RDD (it may have changed after the coalesce operator) and executes savePartition for every partition (using RDD.foreachPartition).

Note
saveTable is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.
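
The repartitioning decision can be modelled without Spark. What follows is a minimal sketch, not Spark's code: targetPartitions is a hypothetical helper, currentPartitions stands in for the number of partitions of the DataFrame's RDD, and validation of the option value is left out.

```scala
object SaveTablePartitioning {
  // Hypothetical helper: the number of partitions the write would use.
  // currentPartitions models the partition count of the DataFrame's RDD,
  // numPartitions models the optional numPartitions JDBC option.
  def targetPartitions(currentPartitions: Int, numPartitions: Option[Int]): Int =
    numPartitions match {
      // coalesce only ever reduces the partition count
      case Some(n) if n < currentPartitions => n
      // option undefined, or the RDD is already within the limit
      case _ => currentPartitions
    }
}
```

With 8 partitions and numPartitions set to 4, the sketch yields 4; with 2 partitions and the same option, the count stays at 2.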

Replacing Data Types In Table Schema — getCustomSchema Method

getCustomSchema replaces the data type of the fields in the input tableSchema schema that are included in the input customSchema (if defined).

Internally, getCustomSchema branches off per the input customSchema.

If the input customSchema is undefined or empty, getCustomSchema simply returns the input tableSchema unchanged.

Otherwise, if the input customSchema is not empty, getCustomSchema requests CatalystSqlParser to parse it (i.e. create a new StructType for the given customSchema canonical schema representation).

getCustomSchema then uses SchemaUtils to checkColumnNameDuplication (in the column names of the user-defined customSchema schema with the input nameEquality).

In the end, getCustomSchema replaces the data type of the fields in the input tableSchema that are included in the user-defined schema (parsed from the input customSchema).

Note
getCustomSchema is used exclusively when JDBCRelation is created (and customSchema JDBC option was defined).
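
The final replacement step can be illustrated with plain Scala collections. This is a sketch under assumptions: Field is a simplified stand-in for a schema field (a name and a type name only), replaceTypes is a hypothetical helper, and the user-defined schema is passed in already parsed, so the parsing and duplicate-name check are not modelled.

```scala
object CustomSchemaSketch {
  // Simplified stand-in for a schema field: a column name and a type name
  final case class Field(name: String, dataType: String)

  // Hypothetical model of the last step: every field of tableSchema keeps
  // its name, and takes its type from userSchema when a field with an
  // equal name (per nameEquality) exists there.
  def replaceTypes(
      tableSchema: Seq[Field],
      userSchema: Seq[Field],
      nameEquality: (String, String) => Boolean): Seq[Field] =
    tableSchema.map { field =>
      userSchema.find(u => nameEquality(u.name, field.name)) match {
        case Some(u) => field.copy(dataType = u.dataType)
        case None    => field
      }
    }

  // Case-insensitive comparison of column names
  val caseInsensitive: (String, String) => Boolean =
    (a, b) => a.equalsIgnoreCase(b)
}
```

Fields that the user-defined schema does not mention pass through unchanged, which matches the behaviour described for an undefined or empty customSchema.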

dropTable Method

dropTable…​FIXME

Note
dropTable is used when…​FIXME

Creating Table Using JDBC — createTable Method

createTable builds the table schema as text (using schemaString with the input DataFrame and the url and createTableColumnTypes options).

createTable uses the table and createTableOptions options.

In the end, createTable concatenates all the above texts into a CREATE TABLE [table] ([strSchema]) [createTableOptions] SQL DDL statement and executes it (using the input JDBC Connection).

Note
createTable is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.
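
The concatenation can be sketched as a one-line string template. createTableStatement is a hypothetical helper and the sample values in the usage note are illustrative; trimming the result when no options are given is a choice made for this sketch only.

```scala
object CreateTableSketch {
  // Hypothetical helper: assembles the DDL text from the table name, the
  // schema text and the (possibly empty) createTableOptions text.
  def createTableStatement(
      table: String,
      strSchema: String,
      createTableOptions: String): String =
    // trim drops the trailing space left behind by empty options
    s"CREATE TABLE $table ($strSchema) $createTableOptions".trim
}
```

For a table named people, a schema text of id INTEGER NOT NULL, name TEXT and an option text of ENGINE=InnoDB, the helper returns CREATE TABLE people (id INTEGER NOT NULL, name TEXT) ENGINE=InnoDB.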

getInsertStatement Method

getInsertStatement…​FIXME

Note
getInsertStatement is used when…​FIXME

getJdbcType Internal Method

getJdbcType…​FIXME

Note
getJdbcType is used when…​FIXME

tableExists Method

tableExists…​FIXME

Note
tableExists is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.

truncateTable Method

truncateTable…​FIXME

Note
truncateTable is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.

Saving Rows (Per Partition) to Table — savePartition Method

savePartition creates a JDBC Connection using the input getConnection function.

savePartition tries to set the input isolationLevel if it is different from TRANSACTION_NONE and the database supports transactions.

savePartition then writes the rows (in the input Iterator[Row]) in batches, submitting a batch every time batchSize rows have been added.

Note
savePartition is used exclusively when JdbcUtils is requested to saveTable.
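
The batching loop can be modelled without a database. The sketch below is an illustration, not Spark's code: writeInBatches and its submit callback are hypothetical, and the final submit for a partial batch is an assumption of the sketch. In JDBC terms, buffering a row corresponds to PreparedStatement.addBatch and submit to PreparedStatement.executeBatch.

```scala
import scala.collection.mutable.ArrayBuffer

object SavePartitionSketch {
  // Hypothetical model of the batching loop: rows are buffered and handed
  // to submit after every batchSize rows, with one more submit for any
  // remainder. Returns the number of batches submitted.
  def writeInBatches[A](rows: Iterator[A], batchSize: Int)(submit: Seq[A] => Unit): Int = {
    require(batchSize >= 1, "batchSize must be positive")
    val buffer = ArrayBuffer.empty[A]
    var batches = 0
    rows.foreach { row =>
      buffer += row
      if (buffer.size == batchSize) {
        submit(buffer.toList) // full batch: submit a copy, then reset
        buffer.clear()
        batches += 1
      }
    }
    if (buffer.nonEmpty) {
      submit(buffer.toList) // leftover rows that did not fill a batch
      batches += 1
    }
    batches
  }
}
```

Five rows with a batchSize of 2 produce three submissions: two full batches and one batch holding the single remaining row.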