JdbcUtils Helper Object

JdbcUtils is a Scala object with methods to support JDBCRDD, JDBCRelation and JdbcRelationProvider.

Table 1. JdbcUtils API
Name	Description
createConnectionFactory	Used when `JDBCRDD` is requested to scanTable and resolveTable, and when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table
createTable
dropTable
getCommonJDBCType
getCustomSchema	Replaces data types in a table schema. Used exclusively when `JDBCRelation` is created (and the customSchema JDBC option is defined)
getInsertStatement
getSchema	Used when `JDBCRDD` is requested to resolveTable
getSchemaOption	Used when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table
resultSetToRows	Used when…FIXME
resultSetToSparkInternalRows	Used when `JDBCRDD` is requested to compute a partition
schemaString
saveTable
tableExists	Used when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table
truncateTable	Used when…FIXME

`createConnectionFactory` Method



createConnectionFactory(options: JDBCOptions): () => Connection

createConnectionFactory…FIXME

Note	`createConnectionFactory` is used when `JDBCRDD` is requested to scanTable (and in turn creates a JDBCRDD) and resolveTable, when `JdbcRelationProvider` is requested to create a BaseRelation, and when `JdbcUtils` is requested to saveTable.

`getCommonJDBCType` Method



getCommonJDBCType(dt: DataType): Option[JdbcType]

getCommonJDBCType…FIXME

Note	`getCommonJDBCType` is used when…FIXME

`getCatalystType` Internal Method



getCatalystType(
  sqlType: Int,
  precision: Int,
  scale: Int,
  signed: Boolean): DataType

getCatalystType…FIXME

Note	`getCatalystType` is used when…FIXME

`getSchemaOption` Method



getSchemaOption(conn: Connection, options: JDBCOptions): Option[StructType]

getSchemaOption…FIXME

Note	`getSchemaOption` is used when…FIXME

`getSchema` Method



getSchema(
  resultSet: ResultSet,
  dialect: JdbcDialect,
  alwaysNullable: Boolean = false): StructType

getSchema…FIXME

Note	`getSchema` is used when…FIXME

`resultSetToRows` Method



resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row]

resultSetToRows…FIXME

Note	`resultSetToRows` is used when…FIXME

`resultSetToSparkInternalRows` Method



resultSetToSparkInternalRows(
  resultSet: ResultSet,
  schema: StructType,
  inputMetrics: InputMetrics): Iterator[InternalRow]

resultSetToSparkInternalRows…FIXME

Note	`resultSetToSparkInternalRows` is used when…FIXME

`schemaString` Method



schemaString(
  df: DataFrame,
  url: String,
  createTableColumnTypes: Option[String] = None): String

schemaString…FIXME

Note	`schemaString` is used exclusively when `JdbcUtils` is requested to create a table.

`parseUserSpecifiedCreateTableColumnTypes` Internal Method



parseUserSpecifiedCreateTableColumnTypes(
  df: DataFrame,
  createTableColumnTypes: String): Map[String, String]

parseUserSpecifiedCreateTableColumnTypes…FIXME

Note	`parseUserSpecifiedCreateTableColumnTypes` is used exclusively when `JdbcUtils` is requested to schemaString.

`saveTable` Method



saveTable(
  df: DataFrame,
  tableSchema: Option[StructType],
  isCaseSensitive: Boolean,
  options: JDBCOptions): Unit

saveTable takes the url, table, batchSize and isolationLevel options and creates a connection factory (using createConnectionFactory).

saveTable builds the INSERT statement (using getInsertStatement).

saveTable takes the numPartitions option and applies the coalesce operator to the input DataFrame if the numPartitions option is less than the number of partitions of its RDD.

In the end, saveTable requests the possibly-repartitioned DataFrame for its RDD (it may have changed after the coalesce operator) and executes savePartition for every partition (using RDD.foreachPartition).

Note	`saveTable` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.
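
The partition handling in saveTable can be sketched without Spark. The following is a simplified model and not Spark's code: `SaveTableSketch` and `partitionsForWrite` are hypothetical names, and rejecting non-positive values is an assumption of this sketch. Since the coalesce operator only ever reduces the number of partitions, the model lowers the partition count only when numPartitions is smaller than the current count.

```scala
// A Spark-free model of how the numPartitions option affects a JDBC write.
// All names here are hypothetical and exist only for illustration.
object SaveTableSketch {

  /** Returns the number of partitions (and so of per-partition writers) the write would use. */
  def partitionsForWrite(currentPartitions: Int, numPartitions: Option[Int]): Int =
    numPartitions match {
      case Some(n) if n <= 0 =>
        // Assumption of this sketch: a non-positive value is rejected.
        throw new IllegalArgumentException("numPartitions must be positive, got " + n)
      case Some(n) if n < currentPartitions =>
        n // corresponds to df.coalesce(n)
      case _ =>
        currentPartitions // option undefined or not smaller: DataFrame left as is
    }
}
```

With 8 partitions and numPartitions set to 4, the model returns 4. With 2 partitions and numPartitions set to 4, it returns 2, because coalesce cannot increase the partition count.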

Replacing Data Types In Table Schema — `getCustomSchema` Method



getCustomSchema(
  tableSchema: StructType,
  customSchema: String,
  nameEquality: Resolver): StructType

getCustomSchema replaces the data type of the fields in the input tableSchema schema that are included in the input customSchema (if defined).

Internally, getCustomSchema branches off per the input customSchema.

If the input customSchema is undefined or empty, getCustomSchema simply returns the input tableSchema unchanged.

Otherwise, if the input customSchema is not empty, getCustomSchema requests CatalystSqlParser to parse it (i.e. create a new StructType for the given customSchema canonical schema representation).

getCustomSchema then uses SchemaUtils to checkColumnNameDuplication (in the column names of the user-defined customSchema schema with the input nameEquality).

In the end, getCustomSchema replaces the data type of the fields in the input tableSchema that are included in the parsed customSchema (fields that are not listed keep their original data type).

Note	`getCustomSchema` is used exclusively when `JDBCRelation` is created (and customSchema JDBC option was defined).
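
The replacement step can be illustrated with a small Spark-free model. This is a sketch under stated assumptions and not Spark's implementation: `Field` is a hypothetical stand-in for a schema field, the custom schema is assumed to be already parsed into fields (the CatalystSqlParser step is skipped), and `replaceTypes` and `caseInsensitive` are illustrative names. The `Resolver` alias has the same shape as the name-equality function that getCustomSchema accepts.

```scala
// A Spark-free model of replacing data types in a table schema.
// All names here are hypothetical and exist only for illustration.
object CustomSchemaSketch {

  // Stand-in for a schema field: a column name and a data type name.
  final case class Field(name: String, dataType: String)

  // Decides whether two column names are considered equal.
  type Resolver = (String, String) => Boolean
  val caseInsensitive: Resolver = (a, b) => a.equalsIgnoreCase(b)

  def replaceTypes(
      tableSchema: Seq[Field],
      userSchema: Seq[Field],
      nameEquality: Resolver): Seq[Field] = {
    if (userSchema.isEmpty) {
      tableSchema // nothing to replace
    } else {
      // Reject a user schema that names the same column twice.
      val names = userSchema.map(_.name)
      val hasDuplicate = names.indices.exists { i =>
        names.indices.exists(j => i < j && nameEquality(names(i), names(j)))
      }
      val joined = names.mkString(", ")
      require(!hasDuplicate, "Duplicate column(s) in the custom schema: " + joined)

      // Keep the table's column names and order; swap in user-defined types.
      tableSchema.map { field =>
        userSchema.find(u => nameEquality(u.name, field.name)) match {
          case Some(u) => field.copy(dataType = u.dataType)
          case None    => field
        }
      }
    }
  }
}
```

In this model the result always has the columns and the order of the table schema. Only the data types of the matching columns change.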

`dropTable` Method



dropTable(conn: Connection, table: String): Unit

dropTable…FIXME

Note	`dropTable` is used when…FIXME

Creating Table Using JDBC — `createTable` Method



createTable(
  conn: Connection,
  df: DataFrame,
  options: JDBCOptions): Unit

createTable builds the table schema as a text (using schemaString with the input DataFrame and the url and createTableColumnTypes options).

createTable uses the table and createTableOptions options.

In the end, createTable concatenates all the above texts into a CREATE TABLE [table] ([strSchema]) [createTableOptions] SQL DDL statement and executes it (using the input JDBC Connection).

Note	`createTable` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.
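
As a rough illustration, the statement can be assembled and executed with plain JDBC. This is a minimal sketch, not Spark's code: `CreateTableSketch`, `createTableSql` and `execute` are hypothetical names, and the schema text is passed in ready-made rather than derived from a DataFrame.

```scala
import java.sql.Connection

// A plain-JDBC sketch of building and running a CREATE TABLE statement.
// All names here are hypothetical and exist only for illustration.
object CreateTableSketch {

  /** Concatenates the table name, the schema text and the extra options. */
  def createTableSql(table: String, strSchema: String, createTableOptions: String): String =
    s"CREATE TABLE $table ($strSchema) $createTableOptions"

  /** Executes the DDL statement on the given connection and closes the statement. */
  def execute(conn: Connection, sql: String): Unit = {
    val statement = conn.createStatement()
    try {
      statement.executeUpdate(sql)
    } finally {
      statement.close()
    }
  }
}
```

Because the options text is appended verbatim after the column list, the sketch leaves a trailing space when createTableOptions is empty.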

`getInsertStatement` Method



getInsertStatement(
  table: String,
  rddSchema: StructType,
  tableSchema: Option[StructType],
  isCaseSensitive: Boolean,
  dialect: JdbcDialect): String

getInsertStatement…FIXME

Note	`getInsertStatement` is used when…FIXME

`getJdbcType` Internal Method



getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType

getJdbcType…FIXME

Note	`getJdbcType` is used when…FIXME

`tableExists` Method



tableExists(conn: Connection, options: JDBCOptions): Boolean

tableExists…FIXME

Note	`tableExists` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.

`truncateTable` Method



truncateTable(conn: Connection, options: JDBCOptions): Unit

truncateTable…FIXME

Note	`truncateTable` is used exclusively when `JdbcRelationProvider` is requested to write the rows of a structured query (a DataFrame) to a table.

Saving Rows (Per Partition) to Table — `savePartition` Method



savePartition(
  getConnection: () => Connection,
  table: String,
  iterator: Iterator[Row],
  rddSchema: StructType,
  insertStmt: String,
  batchSize: Int,
  dialect: JdbcDialect,
  isolationLevel: Int): Iterator[Byte]

savePartition creates a JDBC Connection using the input getConnection function.

savePartition tries to set the input isolationLevel if it is different from TRANSACTION_NONE and the database supports transactions.

savePartition then writes rows (in the input Iterator[Row]) in batches, submitting a batch every time batchSize rows were added.

Note	`savePartition` is used exclusively when `JdbcUtils` is requested to saveTable.
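
The batching pattern can be sketched with plain JDBC. This is a simplified illustration and not Spark's code: `SavePartitionSketch` and `saveRows` are hypothetical names, rows are modelled as `Seq[AnyRef]` instead of Row, values are bound with `setObject` for brevity, and error handling (such as rolling back on failure) is left out.

```scala
import java.sql.Connection

// A plain-JDBC sketch of writing one partition's rows in batches.
// All names here are hypothetical and exist only for illustration.
object SavePartitionSketch {

  def saveRows(
      getConnection: () => Connection,
      insertStmt: String,
      rows: Iterator[Seq[AnyRef]],
      batchSize: Int,
      isolationLevel: Int): Unit = {
    val conn = getConnection()
    try {
      // Use a transaction only if one was requested and the database supports it.
      val useTransactions = isolationLevel != Connection.TRANSACTION_NONE &&
        conn.getMetaData.supportsTransactions()
      if (useTransactions) {
        conn.setAutoCommit(false)
        conn.setTransactionIsolation(isolationLevel)
      }

      val stmt = conn.prepareStatement(insertStmt)
      try {
        var rowCount = 0
        rows.foreach { row =>
          // JDBC parameter indices start at 1.
          row.zipWithIndex.foreach { case (value, i) => stmt.setObject(i + 1, value) }
          stmt.addBatch()
          rowCount += 1
          // Submit a full batch every batchSize rows.
          if (rowCount % batchSize == 0) stmt.executeBatch()
        }
        // Submit the remaining rows that did not fill a whole batch.
        if (rowCount % batchSize != 0) stmt.executeBatch()
      } finally {
        stmt.close()
      }

      if (useTransactions) conn.commit()
    } finally {
      conn.close()
    }
  }
}
```

Passing a connection factory rather than a Connection lets every partition open its own connection where the function runs, which fits how saveTable executes savePartition for every partition.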
