
Datasets vs DataFrames vs RDDs

Many of you may have asked yourselves why you should use Datasets rather than the foundation of all of Spark, RDDs, with case classes.

This document collects the advantages of Dataset over RDD[CaseClass] to answer the question Dan asked on Twitter:

“In #Spark, what is the advantage of a DataSet over an RDD[CaseClass]?”

Reading from or Writing to Data Sources

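With the Dataset API, loading data from a data source or saving it to one is as simple as using the SparkSession.read or Dataset.write methods, respectively. They return a DataFrameReader and a DataFrameWriter, on which you pick the format and options.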
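A minimal sketch of such a round trip follows. It assumes Spark 2.x or later with spark-sql on the classpath, a local master, and Parquet as the format; the Person case class, the ReadWriteSketch object, and the temporary output path are hypothetical names chosen only for illustration.

```scala
import java.nio.file.Files
import org.apache.spark.sql.{SaveMode, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder for it.
final case class Person(name: String, age: Long)

object ReadWriteSketch {
  // Writes a small Dataset to Parquet, reads it back as a typed Dataset, and returns the rows.
  def roundTrip(): Seq[Person] = {
    val spark = SparkSession.builder()
      .appName("read-write-sketch")
      .master("local[*]") // assumption: run locally for the sketch
      .getOrCreate()
    import spark.implicits._

    try {
      // Assumed output location: a fresh subdirectory of a temporary directory.
      val path = Files.createTempDirectory("people").resolve("parquet").toString

      val people = Seq(Person("Alice", 30L), Person("Bob", 25L)).toDS()

      // Dataset.write returns a DataFrameWriter; the save mode and format are set on it.
      people.write.mode(SaveMode.Overwrite).parquet(path)

      // SparkSession.read returns a DataFrameReader; .as[Person] turns the
      // resulting DataFrame back into a typed Dataset[Person].
      val loaded = spark.read.parquet(path).as[Person]
      loaded.orderBy("name").collect().toSeq
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    roundTrip().foreach(println)
}
```

Doing the same with an RDD[Person] would mean choosing a serialization format yourself (for example text or object files) and parsing records back by hand, since an RDD carries no schema.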

Accessing Fields / Columns

In a Dataset, you select columns by name without worrying about their positions.

With an RDD, you have to make an additional hop through the case class and access its fields by name.
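To make the contrast concrete, here is a small sketch that extracts the same field both ways. It assumes spark-sql on the classpath and a local master; Employee and ColumnAccessSketch are hypothetical names used only for illustration.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, defined at top level so Spark can derive an encoder for it.
final case class Employee(name: String, dept: String, salary: Long)

object ColumnAccessSketch {
  // Returns the employee names obtained via a Dataset and via an RDD, both sorted.
  def run(): (Seq[String], Seq[String]) = {
    val spark = SparkSession.builder()
      .appName("column-access-sketch")
      .master("local[*]") // assumption: run locally for the sketch
      .getOrCreate()
    import spark.implicits._

    try {
      val rows = Seq(Employee("Ann", "eng", 100L), Employee("Ben", "ops", 80L))

      // Dataset: refer to the column by name; Spark resolves it against the schema,
      // so the column's position in the record does not matter.
      val ds: Dataset[Employee] = rows.toDS()
      val fromDs = ds.select($"name").as[String].collect().toSeq.sorted

      // RDD[Employee]: there is no schema, so you go through each case class
      // instance and call its field accessor.
      val rdd: RDD[Employee] = spark.sparkContext.parallelize(rows)
      val fromRdd = rdd.map(_.name).collect().toSeq.sorted

      (fromDs, fromRdd)
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(run())
}
```

Both paths yield the same names. The Dataset version checks the column name against the schema when the query is analyzed, while the RDD version relies on the compiler knowing the Employee type.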

Reproduction without permission is prohibited. Source: spark技术分享, "Datasets vs DataFrames vs RDDs".




