KafkaSourceRDD
KafkaSourceRDD is an RDD of Kafka’s ConsumerRecords (with keys and values being collections of bytes, i.e. Array[Byte]).
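To illustrate what records with Array[Byte] keys and values look like to downstream code, here is a minimal sketch. `RawRecord` and `decode` are hypothetical names, not part of Spark or Kafka: `RawRecord` stands in for a Kafka ConsumerRecord with byte-array key and value so the example runs without the Kafka client on the classpath, and UTF-8 payloads are an assumption.

```scala
import java.nio.charset.StandardCharsets.UTF_8

object RawRecordDemo {
  // Hypothetical stand-in for a Kafka ConsumerRecord whose key and value
  // are both raw bytes (Array[Byte]).
  final case class RawRecord(
      topic: String,
      partition: Int,
      offset: Long,
      key: Array[Byte],
      value: Array[Byte])

  // Kafka allows null keys and null values, so both sides are wrapped
  // in Option before decoding. UTF-8 is assumed for this sketch.
  def decode(record: RawRecord): (Option[String], Option[String]) = {
    val key = Option(record.key).map(bytes => new String(bytes, UTF_8))
    val value = Option(record.value).map(bytes => new String(bytes, UTF_8))
    (key, value)
  }

  def main(args: Array[String]): Unit = {
    val record = RawRecord("events", 0, 42L, "k1".getBytes(UTF_8), "hello".getBytes(UTF_8))
    println(decode(record))
  }
}
```

Keeping the bytes untouched inside the RDD and decoding only at the edge leaves the choice of deserialization to the consumer of the records.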
KafkaSourceRDD is created when:
Creating KafkaSourceRDD Instance
KafkaSourceRDD takes the following when created:
- Collection of key-value settings for executors reading records from Kafka topics
- Timeout (in milliseconds) to poll data from Kafka
  Used when KafkaSourceRDD is requested for records (for given offsets) and in turn requests CachedKafkaConsumer to poll for Kafka’s ConsumerRecords.
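The two creation arguments listed above can be sketched as plain values. This is an illustration under stated assumptions, not the actual constructor: `ReadSettings` and `byteArrayParams` are hypothetical names, the `java.util.Map[String, Object]` shape of the executor settings is assumed, and the timeout of 512 ms is an arbitrary example value. The configuration keys are standard Kafka consumer settings.

```scala
import java.{util => ju}

object SourceSettingsDemo {
  // Hypothetical container for the two creation arguments described above:
  // the executor-side Kafka settings and the poll timeout in milliseconds.
  final case class ReadSettings(
      executorKafkaParams: ju.Map[String, Object],
      pollTimeoutMs: Long) {
    require(pollTimeoutMs > 0, "poll timeout must be positive")
  }

  // Builds consumer settings that keep keys and values as raw bytes,
  // matching the Array[Byte] element type of the records.
  def byteArrayParams(bootstrapServers: String, groupId: String): ju.Map[String, Object] = {
    val deserializer = "org.apache.kafka.common.serialization.ByteArrayDeserializer"
    val params = new ju.HashMap[String, Object]()
    params.put("bootstrap.servers", bootstrapServers)
    params.put("group.id", groupId)
    params.put("key.deserializer", deserializer)
    params.put("value.deserializer", deserializer)
    params
  }

  def main(args: Array[String]): Unit = {
    // 512 ms is an arbitrary example value, not a documented default.
    val settings = ReadSettings(byteArrayParams("localhost:9092", "demo-group"), 512L)
    println(s"poll timeout: ${settings.pollTimeoutMs} ms")
    println(s"settings: ${settings.executorKafkaParams}")
  }
}
```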
KafkaSourceRDD initializes the internal registries and counters.