关注 spark技术分享,
撸spark源码 玩spark最佳实践

TextSocketSource

TextSocketSource

TextSocketSource is a streaming source that reads lines from a socket at the host and port (defined by parameters).

It uses lines internal in-memory buffer to keep all of the lines that were read from a socket forever.

Caution

This source is not for production use due to design contraints, e.g. infinite in-memory collection of lines read and no fault recovery.

It is designed only for tutorials and debugging.

lines Internal Buffer

lines is the internal buffer of all the lines TextSocketSource read from the socket.

Maximum Available Offset (getOffset method)

Note
getOffset is a part of the Streaming Source Contract.

TextSocketSource‘s offset can either be none or LongOffset of the number of lines in the internal lines buffer.

Schema (schema method)

TextSocketSource supports two schemas:

  1. A single value field of String type.

  2. value field of StringType type and timestamp field of TimestampType type of format yyyy-MM-dd HH:mm:ss.

Tip
Refer to sourceSchema for TextSocketSourceProvider.

Creating TextSocketSource Instance

When TextSocketSource is created (see TextSocketSourceProvider), it gets 4 parameters passed in:

  1. host

  2. port

  3. includeTimestamp flag

  4. SQLContext

Caution
It appears that the source did not get “renewed” to use SparkSession instead.

It opens a socket at given host and port parameters and reads a buffering character-input stream using the default charset and the default-sized input buffer (of 8192 bytes) line by line.

Caution
FIXME Review Java’s Charset.defaultCharset()

It starts a readThread daemon thread (called TextSocketSource(host, port)) to read lines from the socket. The lines are added to the internal lines buffer.

Stopping TextSocketSource (stop method)

When stopped, TextSocketSource closes the socket connection.

赞(0) 打赏
未经允许不得转载:spark技术分享 » TextSocketSource
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏