KafkaSourceRDD
KafkaSourceRDD is an RDD of Kafka’s ConsumerRecords (with keys and values being collections of bytes, i.e. Array[Byte]).
KafkaSourceRDD uses KafkaSourceRDDPartition for the partitions.
KafkaSourceRDD has a specialized API for the following RDD operators: count, countApprox, isEmpty, and persist.
KafkaSourceRDD is created when:
- KafkaRelation is requested to build a distributed data scan with column pruning (as a TableScan)
- (Spark Structured Streaming) KafkaSource is requested to getBatch
Creating KafkaSourceRDD Instance
KafkaSourceRDD takes the following when created:
- Collection of key-value settings for executors reading records from Kafka topics
- Collection of KafkaSourceRDDOffsetRanges
- Timeout (in milliseconds) to poll data from Kafka. Used exclusively when KafkaSourceRDD is requested to compute an RDD partition (and requests a KafkaDataConsumer for a ConsumerRecord)
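To make the three inputs listed above more concrete, here is a minimal plain-Scala sketch that models them without any Spark or Kafka dependency. `OffsetRangeSketch`, `InputsSketch`, and `InputsExample` are hypothetical names, not the real KafkaSourceRDDOffsetRange or the real constructor, and the field names, broker address, and topic are assumptions chosen for illustration only.

```scala
// Hypothetical stand-in for KafkaSourceRDDOffsetRange: a half-open range
// [fromOffset, untilOffset) of offsets within one topic partition.
final case class OffsetRangeSketch(
    topic: String,
    partition: Int,
    fromOffset: Long,
    untilOffset: Long) {
  // Number of offsets covered by the range
  def size: Long = untilOffset - fromOffset
}

// Hypothetical bundle of the three inputs (field names are assumptions).
final case class InputsSketch(
    executorKafkaParams: Map[String, String], // settings for executor-side consumers
    offsetRanges: Seq[OffsetRangeSketch],     // the offset ranges to read
    pollTimeoutMs: Long)                      // timeout (ms) for polling Kafka

object InputsExample {
  // Placeholder values only: the broker address and topic are made up.
  val example: InputsSketch = InputsSketch(
    executorKafkaParams = Map("bootstrap.servers" -> "localhost:9092"),
    offsetRanges = Seq(
      OffsetRangeSketch("topic1", 0, 0L, 100L),
      OffsetRangeSketch("topic1", 1, 50L, 80L)),
    pollTimeoutMs = 512L)
}
```

The model keeps the offset ranges as plain data, which is what lets the range sizes be inspected without contacting Kafka.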
KafkaSourceRDD initializes the internal registries and counters.
Computing Partition (in TaskContext) — compute Method
    compute(
      thePart: Partition,
      context: TaskContext): Iterator[ConsumerRecord[Array[Byte], Array[Byte]]]

Note: compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).
compute…FIXME
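The description of compute is still a FIXME, so the following is only a rough plain-Scala sketch of the general idea stated elsewhere on this page: a partition covers an offset range, and records are requested one at a time with a poll timeout. `RecordSketch`, `fetch`, and `computeSketch` are hypothetical names. The real method works with a KafkaDataConsumer and ConsumerRecord, and its exact behavior is not covered here.

```scala
// Hypothetical, simplified record; the real RDD yields
// ConsumerRecord[Array[Byte], Array[Byte]].
final case class RecordSketch(offset: Long, key: Array[Byte], value: Array[Byte])

object ComputeSketch {
  // Lazily yields one record per offset in [fromOffset, untilOffset).
  // `fetch` is an assumed stand-in for asking a consumer for the record at
  // a given offset, waiting at most pollTimeoutMs milliseconds.
  def computeSketch(
      fromOffset: Long,
      untilOffset: Long,
      pollTimeoutMs: Long)(
      fetch: (Long, Long) => RecordSketch): Iterator[RecordSketch] =
    (fromOffset until untilOffset).iterator
      .map(offset => fetch(offset, pollTimeoutMs))
}
```

Because the result is an Iterator, nothing is fetched until the caller starts consuming it, which matches how an RDD partition is normally processed record by record.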
count Operator
    count(): Long

Note: count is part of Spark Core’s RDD Contract to…FIXME.
count…FIXME
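One plausible reason to specialize count is that the number of records can be derived from the offset ranges alone, without reading anything from Kafka. The plain-Scala sketch below illustrates that idea. `RangeSketch` and `CountSketch` are hypothetical names, the sketch assumes every offset in a range corresponds to exactly one record, and whether the real count behaves exactly like this should be checked against the Spark source for your version.

```scala
// Hypothetical stand-in for KafkaSourceRDDOffsetRange: a half-open range
// [fromOffset, untilOffset).
final case class RangeSketch(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

object CountSketch {
  // Total number of offsets across all ranges; no records are read.
  def count(ranges: Seq[RangeSketch]): Long = ranges.map(_.size).sum

  // The same total can answer an emptiness check.
  def isEmpty(ranges: Seq[RangeSketch]): Boolean = count(ranges) == 0L
}
```

For example, ranges of [0, 100) and [50, 80) give a count of 130 without a single poll.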
countApprox Operator
    countApprox(timeout: Long, confidence: Double): PartialResult[BoundedDouble]

Note: countApprox is part of Spark Core’s RDD Contract to…FIXME.
countApprox…FIXME
isEmpty Operator
    isEmpty(): Boolean

Note: isEmpty is part of Spark Core’s RDD Contract to…FIXME.
isEmpty…FIXME
persist Operator
    persist(newLevel: StorageLevel): this.type

Note: persist is part of Spark Core’s RDD Contract to…FIXME.
persist…FIXME
getPartitions Method
    getPartitions: Array[Partition]

Note: getPartitions is part of Spark Core’s RDD Contract to…FIXME
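Since KafkaSourceRDD is built from a collection of offset ranges and uses KafkaSourceRDDPartition for its partitions, a natural reading is that each offset range becomes one partition. The plain-Scala sketch below shows that mapping under this assumption. `OffsetRange`, `PartitionSketch`, and `GetPartitionsSketch` are hypothetical names, not the real Spark classes.

```scala
// Hypothetical stand-in for KafkaSourceRDDOffsetRange.
final case class OffsetRange(
    topic: String,
    partition: Int,
    fromOffset: Long,
    untilOffset: Long)

// Hypothetical stand-in for KafkaSourceRDDPartition: the RDD partition index
// together with the offset range it covers.
final case class PartitionSketch(index: Int, offsetRange: OffsetRange)

object GetPartitionsSketch {
  // One partition per offset range, indexed in input order (an assumption).
  def getPartitions(ranges: Seq[OffsetRange]): Array[PartitionSketch] =
    ranges.zipWithIndex.map { case (range, i) => PartitionSketch(i, range) }.toArray
}
```

Note that the RDD partition index is independent of the Kafka partition number: it only reflects the position of the range in the input collection.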
getPreferredLocations Method
    getPreferredLocations(split: Partition): Seq[String]

Note: getPreferredLocations is part of the RDD Contract to…FIXME.
getPreferredLocations…FIXME
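As a hedged illustration only: if each offset range carried an optional preferred host (an assumption, the page does not say so), preferred locations could be derived as in this plain-Scala sketch. `LocatedRange`, `preferredLoc`, and `PreferredLocationsSketch` are hypothetical names used for the example.

```scala
// Hypothetical offset range that optionally names a preferred host.
final case class LocatedRange(
    topic: String,
    partition: Int,
    preferredLoc: Option[String])

object PreferredLocationsSketch {
  // Zero or one preferred host, depending on whether the range names one.
  def getPreferredLocations(range: LocatedRange): Seq[String] =
    range.preferredLoc.toList
}
```

Returning an empty sequence tells the scheduler there is no placement preference, which is the usual convention for this RDD method.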
resolveRange Internal Method
    resolveRange(
      consumer: KafkaDataConsumer,
      range: KafkaSourceRDDOffsetRange): KafkaSourceRDDOffsetRange

resolveRange…FIXME

Note: resolveRange is used exclusively when KafkaSourceRDD is requested to compute a partition.