
Anatomy of Spark Application

Every Spark application starts by creating a SparkContext.

Note
Without a SparkContext, no computation (in the form of a Spark job) can be started.
Note
A Spark application is an instance of SparkContext. Put differently, a Spark context constitutes a Spark application.

A Spark application is uniquely identified by a pair of ids: the application id and the application attempt id.

A minimal Spark application typically goes through the following steps (see the sketch after this list):

  1. Master URL to connect the application to

  2. Create Spark configuration

  3. Create Spark context

  4. Create lines RDD

  5. Execute count action
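
The numbered steps above correspond to the callouts in the following minimal sketch of a standalone Spark application (the object name, application name, default file name, and master URL are illustrative placeholders, not part of the original text):

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkMeApp {
  def main(args: Array[String]): Unit = {
    val masterURL = "local[*]"                  // (1) master URL to connect the application to

    val conf = new SparkConf()                  // (2) create Spark configuration
      .setAppName("SparkMe Application")
      .setMaster(masterURL)

    val sc = new SparkContext(conf)             // (3) create Spark context

    val fileName = util.Try(args(0)).getOrElse("build.sbt")

    val lines = sc.textFile(fileName).cache()   // (4) create lines RDD

    val c = lines.count()                       // (5) execute count action
    println(s"There are $c lines in $fileName")

    sc.stop()
  }
}
```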

Tip
Spark shell creates a Spark context and SQL context for you at startup.
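
For example, in spark-shell you can use the pre-created context right away; a minimal sketch (the exact companion variables, e.g. the SQL context, depend on the Spark version):

```scala
// Typed at the spark-shell prompt; `sc` (the SparkContext) is created at startup,
// so no explicit initialization is needed.
sc.master                          // the master URL the shell connected to
sc.parallelize(1 to 5).count()     // run a small job using the pre-created context
```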

When a Spark application starts (whether launched with the spark-submit script or as a standalone application), it connects to the Spark master described by the master URL. This connection is part of the Spark context’s initialization.

Figure 1. Submitting Spark application to master using master URL
Note
Your Spark application can run locally or on a cluster, depending on the cluster manager and the deploy mode (--deploy-mode). Refer to Deployment Modes.
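
The master URL determines where the application runs. A minimal sketch of the common forms (the host name and port below are placeholders):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("MasterUrlDemo")              // placeholder application name
  .setMaster("spark://master-host:7077")    // Spark standalone cluster (placeholder host:port)
  // .setMaster("local[*]")                 // run locally, one thread per core
  // .setMaster("yarn")                     // submit to a YARN cluster
```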

You can then create RDDs, transform them into other RDDs, and ultimately execute actions. You can also cache interim RDDs to speed up data processing.

After all the data processing is completed, the Spark application finishes by stopping the Spark context.
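
Putting the last two paragraphs together, a minimal sketch of the processing phase and shutdown (the data is made up for illustration; `sc` is a SparkContext created as above):

```scala
val numbers = sc.parallelize(1 to 100)     // create an RDD
val evens   = numbers.filter(_ % 2 == 0)   // transform it into another RDD
evens.cache()                              // cache the interim RDD to speed up reuse
println(evens.count())                     // execute an action
println(evens.sum())                       // reuse the cached RDD in another action

sc.stop()                                  // finish by stopping the Spark context
```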
