DataSource — Pluggable Data Source
DataSource is…FIXME
DataSource is created when…FIXME
Tip: Read DataSource — Pluggable Data Sources (for Spark SQL's batch structured queries).
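For orientation, a hedged sketch of where the streaming DataSource typically comes into play (paths are hypothetical; the snippet assumes spark-shell, where `spark` is predefined): a streaming read is defined with DataStreamReader and a streaming write is created and started with DataStreamWriter.

```scala
// In spark-shell, `spark: SparkSession` is predefined.
// DataStreamReader.load resolves the "text" FileFormat through DataSource.
val lines = spark.readStream
  .format("text")
  .load("/tmp/demo-input")

// DataStreamWriter.start creates and starts the streaming query
// (see the Note on createSink below).
val query = lines.writeStream
  .format("console")
  .start()
```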
| Name | Description |
|---|---|
| providingClass | java.lang.Class that corresponds to the className (that can be a fully-qualified class name or an alias of the data source). Used when, among others, createSink creates a new instance of this class to create a streaming sink. |
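To illustrate how a className alias maps to the providingClass, the DataSource companion object offers a lookup helper; a hedged sketch follows (the signature shown matches Spark 2.3+ and differs in earlier versions).

```scala
// Resolving a data source alias (or fully-qualified class name) to the
// implementation class that providingClass would hold.
import org.apache.spark.sql.execution.datasources.DataSource
import org.apache.spark.sql.internal.SQLConf

val providerClass: Class[_] = DataSource.lookupDataSource("parquet", new SQLConf)
println(providerClass.getName)
// e.g. org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat
```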
Describing Name and Schema of Streaming Source — sourceSchema Internal Method
```scala
sourceSchema(): SourceInfo
```

sourceSchema…FIXME

Note: sourceSchema is used exclusively when DataSource is requested for the SourceInfo.
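For context, SourceInfo (the return type above) is a small case class defined alongside DataSource. A sketch of its approximate shape in Spark 2.x, for reference only:

```scala
// Approximate shape of SourceInfo in Spark 2.x: the resolved name, schema and
// partitioning of a streaming source.
import org.apache.spark.sql.types.StructType

case class SourceInfo(
    name: String,
    schema: StructType,
    partitionColumns: Seq[String])
```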
Creating DataSource Instance
DataSource takes the following when created:
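A sketch of the constructor as declared in Spark 2.x (parameter names and defaults can differ slightly between versions; treat this as an approximation rather than the authoritative declaration):

```scala
// Approximate declaration of DataSource in Spark 2.x (for reference only).
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.catalyst.catalog.{BucketSpec, CatalogTable}
import org.apache.spark.sql.types.StructType

case class DataSource(
    sparkSession: SparkSession,
    className: String,
    paths: Seq[String] = Nil,
    userSpecifiedSchema: Option[StructType] = None,
    partitionColumns: Seq[String] = Seq.empty,
    bucketSpec: Option[BucketSpec] = None,
    options: Map[String, String] = Map.empty,
    catalogTable: Option[CatalogTable] = None)
```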
DataSource initializes the internal registries and counters.
createSource Method
```scala
createSource(metadataPath: String): Source
```

createSource…FIXME

Note: createSource is used when…FIXME
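For data sources that implement Spark's StreamSourceProvider developer API, sourceSchema and createSource presumably hand over to the provider. A minimal, hypothetical provider sketch (class name and schema are made up; the trait itself is Spark's public API):

```scala
// A hypothetical provider showing the contract that sourceSchema and
// createSource deal with for StreamSourceProvider-based data sources.
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.execution.streaming.Source
import org.apache.spark.sql.sources.StreamSourceProvider
import org.apache.spark.sql.types.{StringType, StructField, StructType}

class DemoSourceProvider extends StreamSourceProvider {
  private val demoSchema = StructType(StructField("value", StringType) :: Nil)

  // Reports the name and schema of the streaming source
  override def sourceSchema(
      sqlContext: SQLContext,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): (String, StructType) =
    (providerName, schema.getOrElse(demoSchema))

  // Creates the actual streaming Source (left unimplemented in this sketch)
  override def createSource(
      sqlContext: SQLContext,
      metadataPath: String,
      schema: Option[StructType],
      providerName: String,
      parameters: Map[String, String]): Source = ???
}
```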
Creating Streaming Sink — createSink Method
```scala
createSink(outputMode: OutputMode): Sink
```

createSink creates a streaming sink for StreamSinkProvider or FileFormat data sources.

Tip: Find out more on FileFormat data sources in the FileFormat — Data Sources to Read and Write Data In Files section of The Internals of Spark SQL book.
Internally, createSink creates a new instance of the providingClass and branches off per type (see the sketch after this list):

- For a StreamSinkProvider, createSink simply delegates the call and requests it to create a streaming sink
- For a FileFormat, createSink creates a FileStreamSink when the path option is specified and the output mode is Append
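To make the StreamSinkProvider branch concrete, here is a minimal, hypothetical sink provider (class name and the println behaviour are made up; StreamSinkProvider and Sink are Spark's V1 streaming developer APIs) whose createSink would be called by DataSource.createSink:

```scala
// A hypothetical sink provider showing the StreamSinkProvider contract that
// DataSource.createSink delegates to.
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.execution.streaming.Sink
import org.apache.spark.sql.sources.StreamSinkProvider
import org.apache.spark.sql.streaming.OutputMode

class DemoSinkProvider extends StreamSinkProvider {
  override def createSink(
      sqlContext: SQLContext,
      parameters: Map[String, String],
      partitionColumns: Seq[String],
      outputMode: OutputMode): Sink = new Sink {
    // Called once per micro-batch with the batch's data
    override def addBatch(batchId: Long, data: DataFrame): Unit =
      println(s"batch $batchId: ${data.count()} rows")
  }
}
```

Such a provider is typically referenced by its fully-qualified class name in writeStream.format(...), which is resolved to the providingClass as described earlier.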
createSink throws an IllegalArgumentException when the path option is not specified for a FileFormat data source:

```
'path' is not specified
```
createSink throws an AnalysisException when the given OutputMode is different from Append for a FileFormat data source:
```
Data source [className] does not support [outputMode] output mode
```
createSink throws an UnsupportedOperationException for unsupported data source formats:
```
Data source [className] does not support streamed writing
```
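Putting the FileFormat requirements together, a write that satisfies createSink (path specified, Append output mode) looks roughly like the following hedged sketch (paths are hypothetical; the snippet assumes spark-shell, where `spark` is predefined):

```scala
// In spark-shell, `spark: SparkSession` is predefined; paths are hypothetical.
val lines = spark.readStream.format("text").load("/tmp/demo-input")

val query = lines.writeStream
  .format("parquet")                                    // a FileFormat data source
  .option("path", "/tmp/demo-output")                   // required: 'path'
  .option("checkpointLocation", "/tmp/demo-checkpoint") // required for file sinks
  .outputMode("append")                                 // required: Append
  .start()
```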
Note: createSink is used exclusively when DataStreamWriter is requested to create and start a streaming query.