关注 spark技术分享,
撸spark源码 玩spark最佳实践

KafkaWriteTask

KafkaWriteTask

KafkaWriteTask is used to write rows (from a structured query) to Apache Kafka.

KafkaWriteTask is created exclusively when KafkaWriter is requested to write the rows of a structured query to a Kafka topic.

KafkaWriteTask writes keys and values in their binary format (as JVM’s bytes) and so uses the raw-memory unsafe row format only (i.e. UnsafeRow). That is supposed to save time for reconstructing the rows to very tiny JVM objects (i.e. byte arrays).

Table 1. KafkaWriteTask’s Internal Properties
Name Description

callback

failedWrite

projection

UnsafeProjection

Created once when KafkaWriteTask is created.

Writing Rows to Kafka Asynchronously — execute Method

execute uses Apache Kafka’s Producer API to create a KafkaProducer and ProducerRecord for every row in iterator, and sends the rows to Kafka in batches asynchronously.

Internally, execute creates a KafkaProducer using Array[Byte] for the keys and values, and producerConfiguration for the producer’s configuration.

Note
execute creates a single KafkaProducer for all rows.

For every row in the iterator, execute uses the internal UnsafeProjection to project (aka convert) binary internal row format to a UnsafeRow object and take 0th, 1st and 2nd fields for a topic, key and value, respectively.

execute then creates a ProducerRecord and sends it to Kafka (using the KafkaProducer). execute registers a asynchronous Callback to monitor the writing.

Note

The send() method is asynchronous. When called it adds the record to a buffer of pending record sends and immediately returns. This allows the producer to batch together individual records for efficiency.

Creating UnsafeProjection — createProjection Internal Method

createProjection creates a UnsafeProjection with topic, key and value expressions and the inputSchema.

createProjection makes sure that the following holds (and reports an IllegalStateException otherwise):

  • topic was defined (either as the input topic or in inputSchema) and is of type StringType

  • Optional key is of type StringType or BinaryType if defined

  • value was defined (in inputSchema) and is of type StringType or BinaryType

createProjection casts key and value expressions to BinaryType in UnsafeProjection.

Note
createProjection is used exclusively when KafkaWriteTask is created (as projection).

close Method

close…​FIXME

Note
close is used when…​FIXME

Creating KafkaWriteTask Instance

KafkaWriteTask takes the following when created:

  • Kafka Producer configuration (as Map[String, Object])

  • Input schema (as Seq[Attribute])

  • Topic name

KafkaWriteTask initializes the internal registries and counters.

赞(0) 打赏
未经允许不得转载:spark技术分享 » KafkaWriteTask
分享到: 更多 (0)

关注公众号:spark技术分享

联系我们联系我们

觉得文章有用就打赏一下文章作者

支付宝扫一扫打赏

微信扫一扫打赏