Follow the spark技术分享 (Spark Tech Sharing) blog: digging into the Spark source code and Spark best practices

KafkaSourceRDD

KafkaSourceRDD is an RDD of Kafka ConsumerRecords, with both keys and values as byte arrays, i.e. ConsumerRecord[Array[Byte], Array[Byte]].

KafkaSourceRDD uses KafkaSourceRDDPartition for the partitions.

KafkaSourceRDD has a specialized API for the following RDD operators:

  • count

  • countApprox

  • isEmpty

  • persist

KafkaSourceRDD is created when:

  • KafkaRelation is requested to build a distributed data scan (for batch queries)

  • KafkaSource is requested to generate a streaming DataFrame with records for a streaming batch

Creating KafkaSourceRDD Instance

KafkaSourceRDD takes the following when created:

  • SparkContext

  • Collection of key-value settings for executors reading records from Kafka topics

  • Collection of KafkaSourceRDDOffsetRanges

  • Timeout (in milliseconds) to poll data from Kafka

    Used exclusively when KafkaSourceRDD is requested to compute an RDD partition (and requests a KafkaDataConsumer for a ConsumerRecord)

  • failOnDataLoss flag to control…​FIXME

  • reuseKafkaConsumer flag to control…​FIXME

KafkaSourceRDD initializes the internal registries and counters.
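The offset ranges above determine what each partition reads. The following is a minimal sketch of the idea, using a simplified stand-in for KafkaSourceRDDOffsetRange (the field names and the optional preferred location are illustrative, not the exact class definition):

```scala
// Simplified, hypothetical model of an offset range: which topic partition
// to read and between which offsets (fromOffset inclusive, untilOffset exclusive).
case class TopicPartition(topic: String, partition: Int)

case class KafkaSourceRDDOffsetRange(
    topicPartition: TopicPartition,
    fromOffset: Long,              // inclusive
    untilOffset: Long,             // exclusive
    preferredLoc: Option[String]) {
  def size: Long = untilOffset - fromOffset
}

val ranges = Seq(
  KafkaSourceRDDOffsetRange(TopicPartition("events", 0), 0L, 100L, Some("host-1")),
  KafkaSourceRDDOffsetRange(TopicPartition("events", 1), 50L, 50L, None))

// Each range knows its own size, so the total record count is known up front.
val totalRecords = ranges.map(_.size).sum
```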

Computing Partition (in TaskContext) — compute Method

Note
compute is part of Spark Core’s RDD Contract to compute a partition (in a TaskContext).

compute…​FIXME
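Conceptually, compute walks the partition's offset range and fetches one ConsumerRecord per offset from Kafka. A sketch of that loop, where fetchRecord is a hypothetical stand-in for the KafkaDataConsumer lookup (the real call polls Kafka with the configured timeout and can fail or time out):

```scala
// Simplified record model; the real ConsumerRecord also carries topic,
// partition, key, timestamp, etc.
case class ConsumerRecord(offset: Long, value: Array[Byte])

val fromOffset  = 10L  // inclusive
val untilOffset = 13L  // exclusive

// Stand-in for the per-offset Kafka read; here it fabricates records in memory.
def fetchRecord(offset: Long): ConsumerRecord =
  ConsumerRecord(offset, s"payload-$offset".getBytes("UTF-8"))

// The partition's iterator: one record per offset in [fromOffset, untilOffset).
val records: Iterator[ConsumerRecord] =
  (fromOffset until untilOffset).iterator.map(fetchRecord)

val offsets = records.map(_.offset).toList
```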

count Operator

Note
count is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

count…​FIXME
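Since every offset range carries both a start and an end offset, the count can be derived without reading any data from Kafka. A sketch of that idea, with a simplified offset-range model:

```scala
// Count needs no Kafka read: it is the sum of the offset-range sizes.
case class OffsetRange(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

def count(ranges: Seq[OffsetRange]): Long = ranges.map(_.size).sum

val n = count(Seq(OffsetRange(0L, 100L), OffsetRange(50L, 75L)))
```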

countApprox Operator

Note
countApprox is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

countApprox…​FIXME
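Because the exact sizes are known up front, an approximate count can answer immediately and exactly; it is sketched here as a confidence interval that has already collapsed to a single value (the real operator wraps this in Spark's PartialResult/BoundedDouble types):

```scala
case class OffsetRange(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

// Returns (low, high) bounds; with known range sizes they coincide.
def countApprox(ranges: Seq[OffsetRange]): (Double, Double) = {
  val exact = ranges.map(_.size).sum.toDouble
  (exact, exact)
}

val bounds = countApprox(Seq(OffsetRange(0L, 10L)))
```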

isEmpty Operator

Note
isEmpty is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

isEmpty…​FIXME
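By the same reasoning as count, emptiness can be decided from the offset ranges alone. A sketch (simplified range model assumed):

```scala
case class OffsetRange(fromOffset: Long, untilOffset: Long) {
  def size: Long = untilOffset - fromOffset
}

// The RDD is empty exactly when every offset range is empty.
def isEmpty(ranges: Seq[OffsetRange]): Boolean = ranges.forall(_.size <= 0)

val empty    = isEmpty(Seq(OffsetRange(5L, 5L)))
val nonEmpty = isEmpty(Seq(OffsetRange(5L, 7L)))
```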

persist Operator

Note
persist is an RDD operator (from Spark Core) that KafkaSourceRDD overrides to…FIXME.

persist…​FIXME
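In Spark's Kafka sources, persist is commonly overridden to warn that ConsumerRecord is not serializable and that fields should be extracted with map before caching. A sketch of that extract-fields-first pattern, modeled without Spark (all names illustrative):

```scala
// Simplified stand-in for the non-serializable ConsumerRecord.
case class ConsumerRecord(key: Array[Byte], value: Array[Byte])

val records = Seq(ConsumerRecord("k1".getBytes("UTF-8"), "v1".getBytes("UTF-8")))

// Extract plain, serializable fields first; this is what you would persist.
val cacheable: Seq[(String, String)] =
  records.map(r => (new String(r.key, "UTF-8"), new String(r.value, "UTF-8")))
```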

getPartitions Method

Note
getPartitions is part of Spark Core’s RDD Contract to…FIXME.
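Conceptually, getPartitions turns each of the constructor's offset ranges into an indexed KafkaSourceRDDPartition. A sketch with simplified models of both classes:

```scala
// Simplified models: one RDD partition per offset range, indexed by position.
case class OffsetRange(fromOffset: Long, untilOffset: Long)
case class KafkaSourceRDDPartition(index: Int, offsetRange: OffsetRange)

def getPartitions(ranges: Seq[OffsetRange]): Array[KafkaSourceRDDPartition] =
  ranges.zipWithIndex.map { case (range, i) =>
    KafkaSourceRDDPartition(i, range)
  }.toArray

val parts = getPartitions(Seq(OffsetRange(0L, 10L), OffsetRange(10L, 20L)))
```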

getPreferredLocations Method

Note
getPreferredLocations is part of the RDD Contract to…​FIXME.

getPreferredLocations…​FIXME
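Assuming each offset range can carry a preferred executor location (so the same executor can keep reusing its cached Kafka consumer), getPreferredLocations would simply surface it. A hypothetical sketch:

```scala
// Simplified models; preferredLoc is an assumed field on the offset range.
case class OffsetRange(preferredLoc: Option[String])
case class KafkaPartition(offsetRange: OffsetRange)

// Surface the range's preferred location, or nothing if none was set.
def getPreferredLocations(p: KafkaPartition): Seq[String] =
  p.offsetRange.preferredLoc.toList

val locs   = getPreferredLocations(KafkaPartition(OffsetRange(Some("host-1"))))
val noLocs = getPreferredLocations(KafkaPartition(OffsetRange(None)))
```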

resolveRange Internal Method

resolveRange…​FIXME

Note
resolveRange is used exclusively when KafkaSourceRDD is requested to compute a partition.
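One plausible reading of resolveRange, sketched under stated assumptions: before compute reads a partition, sentinel offsets in the range (assumed here: -2 for earliest, -1 for latest, mirroring a common Kafka convention) are replaced with concrete offsets reported by the consumer. The availableFrom/availableUntil parameters below stand in for what the consumer would report:

```scala
// Assumed sentinel values for "earliest" and "latest" offsets.
val EARLIEST = -2L
val LATEST   = -1L

case class OffsetRange(fromOffset: Long, untilOffset: Long)

// Replace sentinels with the consumer's actual available offsets.
def resolveRange(range: OffsetRange,
                 availableFrom: Long,
                 availableUntil: Long): OffsetRange = {
  val from =
    if (range.fromOffset == EARLIEST) availableFrom
    else if (range.fromOffset == LATEST) availableUntil
    else range.fromOffset
  val until =
    if (range.untilOffset == LATEST) availableUntil
    else if (range.untilOffset == EARLIEST) availableFrom
    else range.untilOffset
  OffsetRange(from, until)
}

val resolved = resolveRange(OffsetRange(EARLIEST, LATEST), 5L, 42L)
```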