KafkaSourceRDD
KafkaSourceRDD is an RDD of Kafka’s ConsumerRecord objects whose keys and values are byte arrays, i.e. ConsumerRecord[Array[Byte], Array[Byte]].
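The sketch below makes the element type concrete. It uses a plain Scala case class in place of Kafka's `ConsumerRecord[Array[Byte], Array[Byte]]` so that it runs without the Kafka client library. `ByteRecord` and `decodeUtf8` are hypothetical names and are not part of Spark or Kafka. The record carries only raw bytes, so decoding is left to whoever consumes it. The sketch assumes UTF-8 payloads.

```scala
import java.nio.charset.StandardCharsets

// Hypothetical stand-in for ConsumerRecord[Array[Byte], Array[Byte]]:
// record coordinates (topic, partition, offset) plus raw key and value bytes.
final case class ByteRecord(
    topic: String,
    partition: Int,
    offset: Long,
    key: Array[Byte],
    value: Array[Byte])

// Turning bytes into something meaningful is the caller's job;
// this sketch simply assumes the payloads are UTF-8 text.
def decodeUtf8(bytes: Array[Byte]): String =
  new String(bytes, StandardCharsets.UTF_8)

val record = ByteRecord(
  topic = "events",
  partition = 0,
  offset = 42L,
  key = "user-1".getBytes(StandardCharsets.UTF_8),
  value = "clicked".getBytes(StandardCharsets.UTF_8))

val decodedKey: String = decodeUtf8(record.key)
val decodedValue: String = decodeUtf8(record.value)
```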
KafkaSourceRDD is created when:
Creating KafkaSourceRDD Instance
KafkaSourceRDD takes the following when created:
- Collection of key-value settings for executors reading records from Kafka topics
- Timeout (in milliseconds) to poll data from Kafka

  Used when KafkaSourceRDD is requested for records (for given offsets) and, in turn, requests CachedKafkaConsumer to poll for Kafka’s ConsumerRecords.
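The sketch below shows where the poll timeout comes into play when records are read for a given offset range. It is a simplified model and not Spark's actual code. It loops over the range and gives up as soon as a poll returns nothing within `pollTimeoutMs`. `OffsetRange`, `SimpleConsumer`, `InMemoryConsumer` and `readRange` are all hypothetical names. The in-memory consumer is a fake that stands in for CachedKafkaConsumer. Reporting a missed record as a `java.util.concurrent.TimeoutException` is a choice made for this sketch only.

```scala
import java.util.concurrent.TimeoutException

// Hypothetical half-open offset range [fromOffset, untilOffset) of one topic partition.
final case class OffsetRange(topic: String, partition: Int, fromOffset: Long, untilOffset: Long)

// Hypothetical consumer abstraction: returns the value stored at `offset`,
// or None if nothing arrived within `timeoutMs` milliseconds.
trait SimpleConsumer {
  def poll(offset: Long, timeoutMs: Long): Option[Array[Byte]]
}

// In-memory fake used only to exercise the sketch; it ignores the timeout
// and returns None for any offset it does not hold.
final class InMemoryConsumer(data: Map[Long, Array[Byte]]) extends SimpleConsumer {
  def poll(offset: Long, timeoutMs: Long): Option[Array[Byte]] = data.get(offset)
}

// Reads every offset in the range, failing as soon as one poll comes back empty.
def readRange(
    consumer: SimpleConsumer,
    range: OffsetRange,
    pollTimeoutMs: Long): Seq[(Long, Array[Byte])] =
  (range.fromOffset until range.untilOffset).map { offset =>
    consumer.poll(offset, pollTimeoutMs) match {
      case Some(value) => offset -> value
      case None =>
        throw new TimeoutException(
          s"Cannot fetch offset $offset of ${range.topic}-${range.partition} in $pollTimeoutMs ms")
    }
  }
```

Each poll has an upper bound on how long it waits. Because of that, a task reading a partition with no data at the requested offset fails with an error instead of blocking indefinitely.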
KafkaSourceRDD initializes the internal registries and counters.