NewHadoopRDD

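NewHadoopRDD is an RDD of K keys and V values that reads its data through Hadoop's new InputFormat API (org.apache.hadoop.mapreduce). It is created by the following SparkContext methods: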

  • SparkContext.newAPIHadoopFile

  • SparkContext.newAPIHadoopRDD

  • (indirectly) SparkContext.binaryFiles

  • (indirectly) SparkContext.wholeTextFiles
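
As a rough illustration of the first entry in the list above, the following sketch reads a text file through SparkContext.newAPIHadoopFile. The input path, the application name, and the local master are assumptions made for the example only, and TextInputFormat is just one possible InputFormat.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.NewHadoopRDD

object NewAPIHadoopFileExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary application name
    val conf = new SparkConf().setAppName("NewAPIHadoopFileExample").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Hypothetical input path; replace with a file that exists
      val path = "input.txt"

      // K = LongWritable (byte offset), V = Text (line), F = TextInputFormat
      val rdd = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](path)

      // Check which RDD implementation SparkContext returned
      println(rdd.isInstanceOf[NewHadoopRDD[_, _]])

      // Hadoop record readers reuse Writable objects, so copy the
      // contents into plain values before collecting them
      val lines = rdd.map { case (offset, line) => (offset.get, line.toString) }
      lines.take(5).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

The map step matters in practice: collecting or caching the raw Writable pairs can yield repeated values, because the same objects are reused for every record.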

Note
NewHadoopRDD is the base RDD of BinaryFileRDD and WholeTextFileRDD.

getPreferredLocations Method

Caution
FIXME

Creating NewHadoopRDD Instance

NewHadoopRDD takes the following when created:

  • SparkContext

  • Hadoop's InputFormat[K, V] class

  • Class of the K keys

  • Class of the V values

  • transient Hadoop Configuration

NewHadoopRDD initializes the internal registries and counters.
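
The constructor arguments listed above can be sketched as follows. This is a minimal example and not necessarily how applications are expected to create the RDD (SparkContext.newAPIHadoopRDD is the usual entry point). The input path, the application name, and the local master are assumptions for illustration.

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.NewHadoopRDD

object NewHadoopRDDDirectExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local master and an arbitrary application name
    val sparkConf = new SparkConf().setAppName("NewHadoopRDDDirectExample").setMaster("local[*]")
    val sc = new SparkContext(sparkConf)
    try {
      // Copy Spark's Hadoop configuration into a Job and register the
      // (hypothetical) input path on it
      val job = Job.getInstance(sc.hadoopConfiguration)
      FileInputFormat.setInputPaths(job, "input.txt")

      // Arguments in the order listed above: SparkContext, InputFormat class,
      // key class, value class, Hadoop Configuration
      val rdd = new NewHadoopRDD[LongWritable, Text](
        sc,
        classOf[TextInputFormat],
        classOf[LongWritable],
        classOf[Text],
        job.getConfiguration)

      // One partition per Hadoop input split
      println(rdd.getNumPartitions)
    } finally {
      sc.stop()
    }
  }
}
```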
