
Kafka Data Source


Spark SQL supports reading data from or writing data to one or more topics in Apache Kafka.

Note

Apache Kafka stores records durably, in a format-independent and fault-tolerant way.

Read up on Apache Kafka in the official documentation or in my other gitbook Mastering Apache Kafka.

Kafka Data Source supports options for tuning the performance of structured queries that use it.

Reading Data from Kafka Topics

As a Spark developer, you use the DataFrameReader.format method to specify Apache Kafka as the external data source to load data from.

You use kafka (or org.apache.spark.sql.kafka010.KafkaSourceProvider) as the input data source format.
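A minimal sketch of such loads, assuming the spark-sql-kafka-0-10 module is on the classpath and a Kafka broker is reachable at localhost:9092. The broker address, the topic names (topic1, topic2), the topic pattern, and the object name KafkaReadSketch are illustrative assumptions, not values from the original text.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object KafkaReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-read-sketch")
      .master("local[*]")
      .getOrCreate()

    // Load from an explicit, comma-separated list of topics (assumed names)
    val fromTopics: DataFrame = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("subscribe", "topic1,topic2")
      .load()

    // Load from every topic whose name matches a regular expression (assumed pattern)
    val fromPattern: DataFrame = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribePattern", "topic.*")
      .load()

    // key and value arrive as binary columns; cast them to strings to inspect them
    fromTopics
      .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "topic", "partition", "offset")
      .show(truncate = false)

    fromPattern.printSchema()

    spark.stop()
  }
}
```

Nothing is read from Kafka when load returns; the records are fetched only when an action such as show runs.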

These one-liners create a DataFrame that represents the distributed process of loading data from one or many Kafka topics (with additional properties).

Writing Data to Kafka Topics

As a Spark developer,…​FIXME
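A minimal sketch of a batch write, assuming the spark-sql-kafka-0-10 module is on the classpath and a Kafka broker is reachable at localhost:9092. The broker address, the topic name topic1, the sample rows, and the object name KafkaWriteSketch are illustrative assumptions. The Kafka sink expects a value column (string or binary) and accepts an optional key column.

```scala
import org.apache.spark.sql.SparkSession

object KafkaWriteSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-write-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample records with the column names the Kafka sink looks for
    val records = Seq(("k1", "hello"), ("k2", "world")).toDF("key", "value")

    records.write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker address
      .option("topic", "topic1")                           // assumed target topic
      .save()

    spark.stop()
  }
}
```

Instead of the topic option, the target topic can be given per row through a topic column in the DataFrame.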
