TextSocketSourceProvider

TextSocketSourceProvider is a StreamSourceProvider for TextSocketSource, which reads records (lines of text) from a given host and port.
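To make "records" concrete: the socket source treats each newline-terminated line of text it receives as one record. The plain-Python sketch below models that behaviour; it is not Spark code, the helper name read_text_records is hypothetical, and UTF-8 decoding is an assumption of the sketch.

```python
import socket
from typing import List


def read_text_records(sock: socket.socket, encoding: str = "utf-8") -> List[str]:
    """Read newline-delimited text records from a connected socket until EOF.

    Hypothetical helper (not Spark code): it models how a text socket source
    turns a raw byte stream into one string record per line.
    """
    buffer = b""
    records: List[str] = []
    while True:
        chunk = sock.recv(4096)
        if not chunk:  # empty read: the peer closed the connection
            break
        buffer += chunk
        # Emit every complete line currently held in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            records.append(line.decode(encoding))
    if buffer:  # trailing text that was not terminated by a newline
        records.append(buffer.decode(encoding))
    return records


# Usage: a connected socket pair stands in for a real host and port.
writer, reader = socket.socketpair()
writer.sendall(b"hello\nworld\n")
writer.close()
print(read_text_records(reader))  # ['hello', 'world']
reader.close()
```

Splitting on the newline byte before decoding is safe for UTF-8, because that byte never occurs inside a multi-byte character.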

TextSocketSourceProvider is a DataSourceRegister, too.

The short name of the data source is socket, i.e. the name you pass to the format method to select it.
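Being a DataSourceRegister is what lets Spark resolve the name given to format against the short names of registered providers. The sketch below models that lookup in plain Python; DataSourceRegister, TextSocketSourceProvider and lookup_data_source here are hypothetical stand-ins, and the case-insensitive match and "exactly one match" rule are assumptions of the sketch.

```python
from typing import Iterable


class DataSourceRegister:
    """Hypothetical Python stand-in for Spark's DataSourceRegister trait."""

    def short_name(self) -> str:
        raise NotImplementedError


class TextSocketSourceProvider(DataSourceRegister):
    """Hypothetical stand-in: only the short-name registration is modelled."""

    def short_name(self) -> str:
        return "socket"


def lookup_data_source(name: str, providers: Iterable[DataSourceRegister]) -> DataSourceRegister:
    """Return the single provider whose short name matches `name`, ignoring case."""
    matches = [p for p in providers if p.short_name().lower() == name.lower()]
    if len(matches) != 1:
        raise LookupError(f"expected exactly one data source named {name!r}, found {len(matches)}")
    return matches[0]


provider = lookup_data_source("socket", [TextSocketSourceProvider()])
print(type(provider).__name__)  # TextSocketSourceProvider
```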

It requires two mandatory options, which you set using the option method:

  1. host: the host name.

  2. port: the port number, which must be an integer.

TextSocketSourceProvider also supports the includeTimestamp option, a boolean flag that adds a timestamp field to the schema.
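Data source options reach a provider as key/value strings, so a boolean flag such as includeTimestamp has to be parsed from text. The sketch below shows one way to do that; the helper name parse_include_timestamp is hypothetical, and the exact rules (default false, only "true"/"false" accepted in any letter case) are assumptions rather than a statement of Spark's implementation.

```python
from typing import Mapping


def parse_include_timestamp(options: Mapping[str, str]) -> bool:
    """Read the optional includeTimestamp flag, defaulting to False.

    Hypothetical helper: option values are treated as strings and only
    "true" or "false" (in any letter case) are accepted.
    """
    raw = options.get("includeTimestamp", "false")
    lowered = raw.lower()
    if lowered == "true":
        return True
    if lowered == "false":
        return False
    raise ValueError(f'includeTimestamp must be "true" or "false", got {raw!r}')


print(parse_include_timestamp({}))                            # False
print(parse_include_timestamp({"includeTimestamp": "TRUE"}))  # True
```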

includeTimestamp Option

Caution
FIXME

createSource

createSource reads the two mandatory options, host and port, and returns a TextSocketSource.
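The following is a minimal Python model of what createSource does with its options: look up host and port, convert the port to an integer, and hand both to the new source. TextSocketSource and create_source here are hypothetical stand-ins (the real source also opens the socket and produces data, which is not modelled), and the error raised for a non-integer port is an assumption of the sketch.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class TextSocketSource:
    """Hypothetical stand-in that only records where the source connects."""
    host: str
    port: int


def create_source(parameters: Mapping[str, str]) -> TextSocketSource:
    """Model of createSource: read host and port, then build the source."""
    host = parameters["host"]
    port_text = parameters["port"]
    try:
        port = int(port_text)  # the port option must be an integer
    except ValueError:
        raise ValueError(f"port must be an integer, got {port_text!r}") from None
    return TextSocketSource(host=host, port=port)


print(create_source({"host": "localhost", "port": "9999"}))
# TextSocketSource(host='localhost', port=9999)
```

Making the model immutable (frozen=True) reflects that the connection settings are fixed once the source has been created.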

sourceSchema

sourceSchema returns textSocket as the name of the source, together with one of two schemas:

  1. SCHEMA_REGULAR (the default): a single value field of StringType.

  2. SCHEMA_TIMESTAMP, used when the includeTimestamp option is enabled (it is false by default): a value field of StringType and a timestamp field of TimestampType, in the format yyyy-MM-dd HH:mm:ss.
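The schema choice above amounts to a single conditional on the includeTimestamp flag. The sketch below models it in plain Python, representing each schema as a list of (field name, type name) pairs; these lists and the source_schema function are hypothetical stand-ins for Spark's StructType values, not Spark's own code. It also spells out the yyyy-MM-dd HH:mm:ss pattern with Python strftime directives.

```python
from datetime import datetime
from typing import List, Tuple

# Schemas modelled as (field name, Spark SQL type name) pairs; hypothetical
# Python stand-ins for Spark's StructType values.
Schema = List[Tuple[str, str]]

SCHEMA_REGULAR: Schema = [("value", "StringType")]
SCHEMA_TIMESTAMP: Schema = [("value", "StringType"), ("timestamp", "TimestampType")]

# yyyy-MM-dd HH:mm:ss expressed with Python strftime directives.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def source_schema(include_timestamp: bool = False) -> Tuple[str, Schema]:
    """Return the source name together with the schema chosen by the flag."""
    schema = SCHEMA_TIMESTAMP if include_timestamp else SCHEMA_REGULAR
    return "textSocket", schema


print(source_schema())            # ('textSocket', [('value', 'StringType')])
print(source_schema(True)[1][1])  # ('timestamp', 'TimestampType')
print(datetime(2017, 1, 2, 3, 4, 5).strftime(TIMESTAMP_FORMAT))  # 2017-01-02 03:04:05
```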

Tip
Read about schema.

Internally, sourceSchema first prints the following WARN message to the logs:

It then checks whether the host and port options are defined and, if either is missing, throws an AnalysisException:
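The validation step can be sketched as a loop over the two mandatory keys. In the Python model below, AnalysisException is a hypothetical stand-in for Spark's exception class, check_mandatory_options is a hypothetical name, and the error message is invented for illustration (it is not Spark's wording).

```python
from typing import Mapping


class AnalysisException(Exception):
    """Hypothetical Python stand-in for Spark's AnalysisException."""


def check_mandatory_options(parameters: Mapping[str, str]) -> None:
    """Raise AnalysisException if host or port is missing (host checked first)."""
    for key in ("host", "port"):
        if key not in parameters:
            raise AnalysisException(f"missing mandatory option: {key}")


check_mandatory_options({"host": "localhost", "port": "9999"})  # passes silently
```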

Reproduction without permission is not allowed. Source: spark技术分享 » TextSocketSourceProvider