
Building Apache Spark from Sources

You can download pre-packaged versions of Apache Spark from the project’s web site. The packages are built for different Hadoop versions, all for Scala 2.11.

Note
Since [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version, the default version of Scala in Apache Spark is 2.11.

The build process for Scala 2.11 takes less than 15 minutes (on a decent machine like my shiny MacBook Pro with 8 cores and 16 GB RAM) and is so simple that it’s hard to resist the urge to do it yourself.

You can use sbt or Maven as the build command.

Using sbt as the build tool

The build command with sbt as the build tool is as follows:
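For example (a sketch, not the only valid invocation; the Hadoop, YARN and Hive profiles below are common choices for your environment, and -DskipTests skips running the test suites):

./build/sbt -Phadoop-2.7 -Pyarn -Phive -Phive-thriftserver -DskipTests clean package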

Using Java 8, building Spark with sbt takes about 10 minutes.

Build Profiles

Caution
FIXME Describe yarn profile and others

hive-thriftserver Maven profile for Spark Thrift Server

Caution
FIXME

Using Apache Maven as the build tool

The build command with Apache Maven is as follows:
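For example (again a sketch; pick the Hadoop version and profiles that match your environment):

./build/mvn -Pyarn -Phadoop-2.7 -Dhadoop.version=2.7.2 -Phive -Phive-thriftserver -DskipTests clean install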

After a couple of minutes your freshly baked distro is ready to fly!

I’m using Oracle Java 8 to build Spark.

Please note the messages that show the version of Spark (Building Spark Project Parent POM 2.0.0-SNAPSHOT), the Scala version (maven-clean-plugin:2.6.1:clean (default-clean) @ spark-parent_2.11), and the Spark modules being built.

The above command gives you the latest version of Apache Spark 2.0.0-SNAPSHOT built for Scala 2.11.8 (see the configuration of scala-2.11 profile).

Tip
You can also check the version of Spark using ./bin/spark-shell --version.

Making Distribution

./make-distribution.sh is the shell script that builds a distribution. It accepts the same profiles as the sbt and Maven builds.

Use the --tgz option to get a tar.gz version of the Spark distribution.
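For example (a sketch; --name sets the suffix of the distribution’s file name, here chosen to match the Hadoop version):

./make-distribution.sh --name 2.7.2 --tgz -Phadoop-2.7 -Phive -Phive-thriftserver -Pyarn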

Once finished, you will have the distribution in the current directory, i.e. spark-2.0.0-SNAPSHOT-bin-2.7.2.tgz.
