Spark SQL — Batch and Streaming Queries Over Structured Data on Massive Scale

2015-03-10 Category: spark-core

Like Apache Spark in general, Spark SQL is all about distributed in-memory computation at massive scale.

The primary difference between the Spark SQL and the “bare” Spark Core RDD computation models is that Spark SQL adds a framework for loading, querying and persisting structured and semi-structured data. Its structured queries can be expressed in good ol’ SQL, in HiveQL, or in the Structured Query DSL, a custom high-level, SQL-like, declarative, type-safe Dataset API.
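
To make the two query styles concrete, here is a minimal sketch that runs the same structured query once as SQL text and once through the Dataset API. It assumes Spark 2.0 or later (for `SparkSession`) running in local mode; the `Person` case class, the `people` view name, and the sample rows are made up for illustration and do not come from any real dataset.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type, used only for this illustration.
case class Person(name: String, age: Int)

object StructuredQueryDemo {

  // Runs the same query in both styles and returns (SQL result, DSL result).
  def adultNames(spark: SparkSession): (Seq[String], Seq[String]) = {
    import spark.implicits._ // encoders for case classes and primitives

    // Made-up sample data.
    val people: Dataset[Person] =
      Seq(Person("Ann", 34), Person("Bob", 17)).toDS()

    // Style 1: plain SQL against a temporary view.
    people.createOrReplaceTempView("people")
    val viaSql = spark
      .sql("SELECT name FROM people WHERE age >= 18")
      .as[String]

    // Style 2: the type-safe Dataset API; the lambdas are checked at compile time.
    val viaDsl = people.filter(_.age >= 18).map(_.name)

    (viaSql.collect().toSeq, viaDsl.collect().toSeq)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("structured-query-demo")
      .master("local[*]") // assumption: local mode, no cluster required
      .getOrCreate()
    try {
      val (sqlResult, dslResult) = adultNames(spark)
      println(s"SQL: $sqlResult")
      println(s"DSL: $dslResult")
    } finally {
      spark.stop()
    }
  }
}
```

Both variants describe the same query. The SQL string is only checked when it is parsed at runtime, whereas a misspelled field in the Dataset version (say `_.agee`) fails at compile time, which is what "type-safe" refers to here.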