关注 spark技术分享,
撸spark源码 玩spark最佳实践

KafkaWriter Helper Object — Writing Structured Queries to Kafka

KafkaWriter Helper Object — Writing Structured Queries to Kafka

KafkaWriter is a Scala object that is used to write the rows of a batch (or a streaming) structured query to Apache Kafka.

spark sql KafkaWriter write webui.png
Figure 1. KafkaWriter (write) in web UI

KafkaWriter validates the schema of a structured query that it contains the following columns (output schema attributes):

  • Either topic of type StringType or the topic option are defined

  • Optional key of type StringType or BinaryType

  • Required value of type StringType or BinaryType

Writing Rows of Structured Query to Kafka Topic — write Method

write gets the output schema of the analyzed logical plan of the input QueryExecution.

In the end, write requests the QueryExecution for RDD[InternalRow] (that represents the structured query as an RDD) and executes the following function on every partition of the RDD (using RDD.foreachPartition operation):

  1. Creates a KafkaWriteTask (for the input kafkaParameters, the schema and the input topic)

  2. Requests the KafkaWriteTask to write the rows (of the partition) to Kafka topic

  3. Requests the KafkaWriteTask to close

Note

write is used when:

Validating Schema (Attributes) of Structured Query and Topic Option Availability — validateQuery Method

validateQuery makes sure that the following attributes are in the input schema (or their alternatives) and of the right data types:

  • Either topic attribute of type StringType or the topic option are defined

  • If key attribute is defined it is of type StringType or BinaryType

  • value attribute is of type StringType or BinaryType

If any of the requirements are not met, validateQuery throws an AnalysisException.

Note

validateQuery is used when:

赞(0) 打赏
未经允许不得转载:spark技术分享 » KafkaWriter Helper Object — Writing Structured Queries to Kafka
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏