
Anatomy of Spark Application

Every Spark application starts by creating a SparkContext.

Note
Without a SparkContext, no computation (in the form of a Spark job) can be started.
Note
A Spark application is an instance of SparkContext. Put differently, a Spark context constitutes a Spark application.

A Spark application is uniquely identified by a pair of ids: the application id and the application attempt id.

A minimal Spark application typically goes through the following steps (see the sketch after this list):

  1. Master URL to connect the application to

  2. Create Spark configuration

  3. Create Spark context

  4. Create lines RDD

  5. Execute count action
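
The numbered steps above correspond to the callouts in the following minimal sketch of a standalone Spark application (the object name, application name, default file name, and master URL are illustrative placeholders, not part of the original text):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkMeApp {
  def main(args: Array[String]): Unit = {
    val masterURL = "local[*]"                  // (1) master URL to connect the application to

    val conf = new SparkConf()                  // (2) create Spark configuration
      .setAppName("SparkMe Application")
      .setMaster(masterURL)

    val sc = new SparkContext(conf)             // (3) create Spark context

    val fileName = util.Try(args(0)).getOrElse("build.sbt")

    val lines = sc.textFile(fileName).cache()   // (4) create lines RDD

    val c = lines.count()                       // (5) execute count action
    println(s"There are $c lines in $fileName")

    sc.stop()
  }
}
```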

Tip
Spark shell creates a Spark context and SQL context for you at startup.
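
For example, in spark-shell you can use the pre-created context right away; a minimal sketch (the exact companion variables, e.g. the SQL context, depend on the Spark version):

```scala
// Typed at the spark-shell prompt; `sc` (the SparkContext) is created at startup,
// so no explicit initialization is needed.
sc.master                          // the master URL the shell connected to
sc.parallelize(1 to 5).count()     // run a small job using the pre-created context
```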

When a Spark application starts (whether launched with the spark-submit script or as a standalone application), it connects to the Spark master described by the master URL. This connection is part of the Spark context’s initialization.

Figure 1. Submitting Spark application to master using master URL
Note
Your Spark application can run locally or on a cluster, depending on the cluster manager and the deploy mode (--deploy-mode). Refer to Deployment Modes.
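
The master URL determines where the application runs. A minimal sketch of the common forms (the host name and port below are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MasterUrlDemo")              // placeholder application name
  .setMaster("spark://master-host:7077")    // Spark standalone cluster (placeholder host:port)
  // .setMaster("local[*]")                 // run locally, one thread per core
  // .setMaster("yarn")                     // submit to a YARN cluster
```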

You can then create RDDs, transform them into other RDDs, and ultimately execute actions. You can also cache interim RDDs to speed up data processing.

After all the data processing is completed, the Spark application finishes by stopping the Spark context.
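
Putting the last two paragraphs together, a minimal sketch of the processing phase and shutdown (the data is made up for illustration; `sc` is a SparkContext created as above):

```scala
val numbers = sc.parallelize(1 to 100)     // create an RDD
val evens   = numbers.filter(_ % 2 == 0)   // transform it into another RDD
evens.cache()                              // cache the interim RDD to speed up reuse
println(evens.count())                     // execute an action
println(evens.sum())                       // reuse the cached RDD in another action

sc.stop()                                  // finish by stopping the Spark context
```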
