
Spark Shell — spark-shell shell script

Spark shell is an interactive environment where you can learn how to make the most out of Apache Spark quickly and conveniently.

Tip
Spark shell is particularly helpful for fast interactive prototyping.

Under the covers, Spark shell is a standalone Spark application written in Scala. It offers an environment with auto-completion (using the TAB key) where you can run ad-hoc queries and get familiar with the features of Spark, which helps you develop your own standalone Spark applications. It is a very convenient tool to explore the many things available in Spark with immediate feedback, and one of the many reasons why Spark is so helpful for processing datasets of any size.

There are variants of Spark shell for different languages: spark-shell for Scala, pyspark for Python and sparkR for R.

Note
This document (and the book in general) uses spark-shell for Scala only.

You can start Spark shell using the spark-shell script.

spark-shell is an extension of the Scala REPL with automatic instantiation of SparkSession as spark (and SparkContext as sc).

spark-shell also imports Spark SQL's implicits and the sql method.
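For example, you can use the imported implicits and the sql method right away. A sample spark-shell session (the exact res numbering and toString output vary by Spark version):

```scala
scala> val ds = Seq(1, 2, 3).toDS   // toDS is available thanks to the imported implicits
ds: org.apache.spark.sql.Dataset[Int] = [value: int]

scala> sql("SELECT 1 + 1 AS sum").show
+---+
|sum|
+---+
|  2|
+---+
```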

Note

When you execute spark-shell you actually execute spark-submit, as follows:
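The hand-off happens at the end of the bin/spark-shell script, which delegates to spark-submit with the REPL's main class (quoted from the Apache Spark distribution; the resolved paths depend on your SPARK_HOME):

```shell
# last line of bin/spark-shell
exec "${SPARK_HOME}"/bin/spark-submit --class org.apache.spark.repl.Main --name "Spark shell" "$@"
```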

Set SPARK_PRINT_LAUNCH_COMMAND to see the entire command to be executed. Refer to Print Launch Command of Spark Scripts.
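For example (the launcher then prints the full java command line before the REPL starts):

```shell
$ SPARK_PRINT_LAUNCH_COMMAND=1 ./bin/spark-shell
```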

Using Spark shell

You start Spark shell using the spark-shell script (available in the bin directory).

Spark shell creates an instance of SparkSession under the name spark for you (so you don't have to know the details of how to create it yourself on day one).

Besides, there is also sc value created which is an instance of SparkContext.
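A sample session that shows both values (the toString output and res numbering depend on your Spark version and configuration):

```scala
scala> spark
res0: org.apache.spark.sql.SparkSession = org.apache.spark.sql.SparkSession@...

scala> sc
res1: org.apache.spark.SparkContext = org.apache.spark.SparkContext@...
```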

To close Spark shell, press Ctrl+D or type :q (or any unambiguous prefix of :quit).

Settings

Table 1. Spark Properties

Spark Property: spark.repl.class.uri
Default Value: null
Description: Used in spark-shell to create a REPL ClassLoader to load new classes defined in the Scala REPL as a user types code.

Enable INFO logging level for org.apache.spark.executor.Executor logger to have the value printed out to the logs:

INFO Using REPL class URI: [classUri]
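With Spark's default log4j-based logging (Spark 2.x; Spark 3.3+ switched to conf/log4j2.properties with a different syntax), you can do that by adding a line to conf/log4j.properties:

```
log4j.logger.org.apache.spark.executor.Executor=INFO
```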
