JdbcUtils Helper Object
JdbcUtils is a Scala object with methods to support JDBCRDD, JDBCRelation and JdbcRelationProvider.
| Name | Description |
|---|---|
|
Used when:
|
|
|
Replaces data types in a table schema Used exclusively when |
|
|
Used when |
|
|
Used when |
|
|
Used when…FIXME |
|
|
Used when |
|
|
Used when |
|
|
Used when…FIXME |
createConnectionFactory Method
|
1 2 3 4 5 |
createConnectionFactory(options: JDBCOptions): () => Connection |
createConnectionFactory…FIXME
|
Note
|
|
getCommonJDBCType Method
|
1 2 3 4 5 |
getCommonJDBCType(dt: DataType): Option[JdbcType] |
getCommonJDBCType…FIXME
|
Note
|
getCommonJDBCType is used when…FIXME
|
getCatalystType Internal Method
|
1 2 3 4 5 6 7 8 9 |
getCatalystType( sqlType: Int, precision: Int, scale: Int, signed: Boolean): DataType |
getCatalystType…FIXME
|
Note
|
getCatalystType is used when…FIXME
|
getSchemaOption Method
|
1 2 3 4 5 |
getSchemaOption(conn: Connection, options: JDBCOptions): Option[StructType] |
getSchemaOption…FIXME
|
Note
|
getSchemaOption is used when…FIXME
|
getSchema Method
|
1 2 3 4 5 6 7 8 |
getSchema( resultSet: ResultSet, dialect: JdbcDialect, alwaysNullable: Boolean = false): StructType |
getSchema…FIXME
|
Note
|
getSchema is used when…FIXME
|
resultSetToRows Method
|
1 2 3 4 5 |
resultSetToRows(resultSet: ResultSet, schema: StructType): Iterator[Row] |
resultSetToRows…FIXME
|
Note
|
resultSetToRows is used when…FIXME
|
resultSetToSparkInternalRows Method
|
1 2 3 4 5 6 7 8 |
resultSetToSparkInternalRows( resultSet: ResultSet, schema: StructType, inputMetrics: InputMetrics): Iterator[InternalRow] |
resultSetToSparkInternalRows…FIXME
|
Note
|
resultSetToSparkInternalRows is used when…FIXME
|
schemaString Method
|
1 2 3 4 5 6 7 8 |
schemaString( df: DataFrame, url: String, createTableColumnTypes: Option[String] = None): String |
schemaString…FIXME
|
Note
|
schemaString is used exclusively when JdbcUtils is requested to create a table.
|
parseUserSpecifiedCreateTableColumnTypes Internal Method
|
1 2 3 4 5 6 7 |
parseUserSpecifiedCreateTableColumnTypes( df: DataFrame, createTableColumnTypes: String): Map[String, String] |
parseUserSpecifiedCreateTableColumnTypes…FIXME
|
Note
|
parseUserSpecifiedCreateTableColumnTypes is used exclusively when JdbcUtils is requested to schemaString.
|
saveTable Method
|
1 2 3 4 5 6 7 8 9 |
saveTable( df: DataFrame, tableSchema: Option[StructType], isCaseSensitive: Boolean, options: JDBCOptions): Unit |
saveTable takes the url, table, batchSize, isolationLevel options and createConnectionFactory.
saveTable getInsertStatement.
saveTable takes the numPartitions option and applies coalesce operator to the input DataFrame if the number of partitions of its RDD is less than the numPartitions option.
In the end, saveTable requests the possibly-repartitioned DataFrame for its RDD (it may have changed after the coalesce operator) and executes savePartition for every partition (using RDD.foreachPartition).
|
Note
|
saveTable is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.
|
Replacing Data Types In Table Schema — getCustomSchema Method
|
1 2 3 4 5 6 7 8 |
getCustomSchema( tableSchema: StructType, customSchema: String, nameEquality: Resolver): StructType |
getCustomSchema replaces the data type of the fields in the input tableSchema schema that are included in the input customSchema (if defined).
Internally, getCustomSchema branches off per the input customSchema.
If the input customSchema is undefined or empty, getCustomSchema simply returns the input tableSchema unchanged.
Otherwise, if the input customSchema is not empty, getCustomSchema requests CatalystSqlParser to parse it (i.e. create a new StructType for the given customSchema canonical schema representation).
getCustomSchema then uses SchemaUtils to checkColumnNameDuplication (in the column names of the user-defined customSchema schema with the input nameEquality).
In the end, getCustomSchema replaces the data type of the fields in the input tableSchema that are included in the input userSchema.
|
Note
|
getCustomSchema is used exclusively when JDBCRelation is created (and customSchema JDBC option was defined).
|
dropTable Method
|
1 2 3 4 5 |
dropTable(conn: Connection, table: String): Unit |
dropTable…FIXME
|
Note
|
dropTable is used when…FIXME
|
Creating Table Using JDBC — createTable Method
|
1 2 3 4 5 6 7 8 |
createTable( conn: Connection, df: DataFrame, options: JDBCOptions): Unit |
createTable builds the table schema (given the input DataFrame with the url and createTableColumnTypes options).
createTable uses the table and createTableOptions options.
In the end, createTable concatenates all the above texts into a CREATE TABLE
SQL DDL statement followed by executing it (using the input JDBC
([strSchema]) [createTableOptions]Connection).
|
Note
|
createTable is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.
|
getInsertStatement Method
|
1 2 3 4 5 6 7 8 9 10 |
getInsertStatement( table: String, rddSchema: StructType, tableSchema: Option[StructType], isCaseSensitive: Boolean, dialect: JdbcDialect): String |
getInsertStatement…FIXME
|
Note
|
getInsertStatement is used when…FIXME
|
getJdbcType Internal Method
|
1 2 3 4 5 |
getJdbcType(dt: DataType, dialect: JdbcDialect): JdbcType |
getJdbcType…FIXME
|
Note
|
getJdbcType is used when…FIXME
|
tableExists Method
|
1 2 3 4 5 |
tableExists(conn: Connection, options: JDBCOptions): Boolean |
tableExists…FIXME
|
Note
|
tableExists is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.
|
truncateTable Method
|
1 2 3 4 5 |
truncateTable(conn: Connection, options: JDBCOptions): Unit |
truncateTable…FIXME
|
Note
|
truncateTable is used exclusively when JdbcRelationProvider is requested to write the rows of a structured query (a DataFrame) to a table.
|
Saving Rows (Per Partition) to Table — savePartition Method
|
1 2 3 4 5 6 7 8 9 10 11 12 13 |
savePartition( getConnection: () => Connection, table: String, iterator: Iterator[Row], rddSchema: StructType, insertStmt: String, batchSize: Int, dialect: JdbcDialect, isolationLevel: Int): Iterator[Byte] |
savePartition creates a JDBC Connection using the input getConnection function.
savePartition tries to set the input isolationLevel if it is different than TRANSACTION_NONE and the database supports transactions.
savePartition then writes rows (in the input Iterator[Row]) using batches that are submitted after batchSize rows where added.
|
Note
|
savePartition is used exclusively when JdbcUtils is requested to saveTable.
|
spark技术分享