Follow the spark技术分享 (Spark Tech Sharing) blog: digging into the Spark source code and Spark best practices

KafkaSourceRDD

KafkaSourceRDD is an RDD of Kafka ConsumerRecords, with both keys and values as byte arrays, i.e. ConsumerRecord[Array[Byte], Array[Byte]].

KafkaSourceRDD uses KafkaSourceRDDPartition for the partitions.

KafkaSourceRDD has a specialized API for the following RDD operators:

  • count

  • countApprox

  • isEmpty

  • persist

KafkaSourceRDD is created when:

  • KafkaRelation is requested to build a distributed data scan (for batch queries)

  • KafkaSource is requested to generate a streaming DataFrame with records for a streaming batch

Creating KafkaSourceRDD Instance

KafkaSourceRDD takes the following when created:

  • SparkContext

  • Collection of key-value settings for executors reading records from Kafka topics

  • Collection of KafkaSourceRDDOffsetRanges

  • Timeout (in milliseconds) to poll data from Kafka

    Used exclusively when KafkaSourceRDD is requested to compute an RDD partition (and requests a KafkaDataConsumer for a ConsumerRecord)

  • failOnDataLoss flag to control…​FIXME

  • reuseKafkaConsumer flag to control…​FIXME

KafkaSourceRDD initializes the internal registries and counters.
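The offset ranges above determine what each partition reads. The following is a minimal sketch of the idea, using a simplified stand-in for KafkaSourceRDDOffsetRange (the field names and the optional preferred location are illustrative, not the exact class definition):

```scala
// Simplified, hypothetical model of an offset range: which topic partition
// to read and between which offsets (fromOffset inclusive, untilOffset exclusive).
case class TopicPartition(topic: String, partition: Int)

case class KafkaSourceRDDOffsetRange(
    topicPartition: TopicPartition,
    fromOffset: Long,              // inclusive
    untilOffset: Long,             // exclusive
    preferredLoc: Option[String]) {
  def size: Long = untilOffset - fromOffset
}

val ranges = Seq(
  KafkaSourceRDDOffsetRange(TopicPartition("events", 0), 0L, 100L, Some("host-1")),
  KafkaSourceRDDOffsetRange(TopicPartition("events", 1), 50L, 50L, None))

// Each range knows its own size, so the total record count is known up front.
val totalRecords = ranges.map(_.size).sum
```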

Computing Partition (in TaskContext) — compute Method

Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute…​FIXME
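Conceptually, compute walks the partition's offset range and fetches one ConsumerRecord per offset from Kafka. A sketch of that loop, where fetchRecord is a hypothetical stand-in for the KafkaDataConsumer lookup (the real call polls Kafka with the configured timeout and can fail or time out):

```scala
// Simplified record model; the real ConsumerRecord also carries topic,
// partition, key, timestamp, etc.
case class ConsumerRecord(offset: Long, value: Array[Byte])

val fromOffset  = 10L  // inclusive
val untilOffset = 13L  // exclusive

// Stand-in for the per-offset Kafka read; here it fabricates records in memory.
def fetchRecord(offset: Long): ConsumerRecord =
  ConsumerRecord(offset, s"payload-$offset".getBytes("UTF-8"))

// The partition's iterator: one record per offset in [fromOffset, untilOffset).
val records: Iterator[ConsumerRecord] =
  (fromOffset until untilOffset).iterator.map(fetchRecord)

val offsets = records.map(_.offset).toList
```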

count Operator

Note
count is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

count…​FIXME
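Since every offset range carries both a start and an end offset, the count can be derived without reading any data from Kafka. A sketch of that idea, with a simplified offset-range model:

```scala
// Count needs no Kafka read: it is the sum of the offset-range sizes.
case class OffsetRange(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

def count(ranges: Seq[OffsetRange]): Long = ranges.map(_.size).sum

val n = count(Seq(OffsetRange(0L, 100L), OffsetRange(50L, 75L)))
```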

countApprox Operator

Note
countApprox is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

countApprox…​FIXME
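Because the exact sizes are known up front, an approximate count can answer immediately and exactly; it is sketched here as a confidence interval that has already collapsed to a single value (the real operator wraps this in Spark's PartialResult/BoundedDouble types):

```scala
case class OffsetRange(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

// Returns (low, high) bounds; with known range sizes they coincide.
def countApprox(ranges: Seq[OffsetRange]): (Double, Double) = {
  val exact = ranges.map(_.size).sum.toDouble
  (exact, exact)
}

val bounds = countApprox(Seq(OffsetRange(0L, 10L)))
```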

isEmpty Operator

Note
isEmpty is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

isEmpty…​FIXME
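By the same reasoning as count, emptiness can be decided from the offset ranges alone. A sketch (simplified range model assumed):

```scala
case class OffsetRange(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

// The RDD is empty exactly when every offset range is empty.
def isEmpty(ranges: Seq[OffsetRange]): Boolean = ranges.forall(_.size <= 0)

val empty    = isEmpty(Seq(OffsetRange(5L, 5L)))
val nonEmpty = isEmpty(Seq(OffsetRange(5L, 7L)))
```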

persist Operator

Note
persist is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

persist…​FIXME
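In Spark's Kafka sources, persist is commonly overridden to warn that ConsumerRecord is not serializable and that fields should be extracted with map before caching. A sketch of that extract-fields-first pattern, modeled without Spark (all names illustrative):

```scala
// Simplified stand-in for the non-serializable ConsumerRecord.
case class ConsumerRecord(key: Array[Byte], value: Array[Byte])

val records = Seq(ConsumerRecord("k1".getBytes("UTF-8"), "v1".getBytes("UTF-8")))

// Extract plain, serializable fields first; this is what you would persist.
val cacheable: Seq[(String, String)] =
  records.map(r => (new String(r.key, "UTF-8"), new String(r.value, "UTF-8")))
```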

getPartitions Method

Note
getPartitions is part of Spark Core’s RDD Contract to…FIXME.
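Conceptually, getPartitions turns each of the constructor's offset ranges into an indexed KafkaSourceRDDPartition. A sketch with simplified models of both classes:

```scala
// Simplified models: one RDD partition per offset range, indexed by position.
case class OffsetRange(fromOffset: Long, untilOffset: Long)
case class KafkaSourceRDDPartition(index: Int, offsetRange: OffsetRange)

def getPartitions(ranges: Seq[OffsetRange]): Array[KafkaSourceRDDPartition] =
  ranges.zipWithIndex.map { case (range, i) =>
    KafkaSourceRDDPartition(i, range)
  }.toArray

val parts = getPartitions(Seq(OffsetRange(0L, 10L), OffsetRange(10L, 20L)))
```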

getPreferredLocations Method

Note
getPreferredLocations is part of the RDD Contract to…​FIXME.

getPreferredLocations…​FIXME
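Assuming each offset range can carry a preferred executor location (so the same executor can keep reusing its cached Kafka consumer), getPreferredLocations would simply surface it. A hypothetical sketch:

```scala
// Simplified models; preferredLoc is an assumed field on the offset range.
case class OffsetRange(preferredLoc: Option[String])
case class KafkaPartition(offsetRange: OffsetRange)

// Surface the range's preferred location, or nothing if none was set.
def getPreferredLocations(p: KafkaPartition): Seq[String] =
  p.offsetRange.preferredLoc.toList

val locs   = getPreferredLocations(KafkaPartition(OffsetRange(Some("host-1"))))
val noLocs = getPreferredLocations(KafkaPartition(OffsetRange(None)))
```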

resolveRange Internal Method

resolveRange…​FIXME

Note
resolveRange is used exclusively when KafkaSourceRDD is requested to compute a partition.
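One plausible reading of resolveRange, sketched under stated assumptions: before compute reads a partition, sentinel offsets in the range (assumed here: -2 for earliest, -1 for latest, mirroring a common Kafka convention) are replaced with concrete offsets reported by the consumer. The availableFrom/availableUntil parameters below stand in for what the consumer would report:

```scala
// Assumed sentinel values for "earliest" and "latest" offsets.
val EARLIEST = -2L
val LATEST   = -1L

case class OffsetRange(fromOffset: Long, untilOffset: Long)

// Replace sentinels with the consumer's actual available offsets.
def resolveRange(range: OffsetRange,
                 availableFrom: Long,
                 availableUntil: Long): OffsetRange = {
  val from =
    if (range.fromOffset == EARLIEST) availableFrom
    else if (range.fromOffset == LATEST) availableUntil
    else range.fromOffset
  val until =
    if (range.untilOffset == LATEST) availableUntil
    else if (range.untilOffset == EARLIEST) availableFrom
    else range.untilOffset
  OffsetRange(from, until)
}

val resolved = resolveRange(OffsetRange(EARLIEST, LATEST), 5L, 42L)
```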