Building Apache Spark from Sources
You can download pre-packaged versions of Apache Spark from the project’s web site. The packages are built for different Hadoop versions and for Scala 2.11.
Note
Since [SPARK-6363][BUILD] Make Scala 2.11 the default Scala version, Scala 2.11 is the default version of Scala in Apache Spark.
The build process for Scala 2.11 takes less than 15 minutes (on a decent machine like my shiny MacBook Pro with 8 cores and 16 GB RAM) and is so simple that you are unlikely to resist the urge to do it yourself.
Using sbt as the build tool
The build command with sbt as the build tool is as follows:
./build/sbt -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean assembly
Building Spark with sbt on Java 8 takes about 10 minutes (496 s in the run below).
➜ spark git:(master) ✗ ./build/sbt -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean assembly
...
[success] Total time: 496 s, completed Dec 7, 2015 8:24:41 PM
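The timings here were taken on Java 8. If you want to confirm which Java version the build will pick up before starting it, the following minimal sketch parses the output of java -version. It assumes java is on the PATH and that the output carries a quoted version string such as java version "1.8.0_102"; the function name java_major_version is made up for this example.

```shell
#!/usr/bin/env bash
# Print the Java major version from `java -version` output (hypothetical helper).
# Handles the legacy "1.8.0_102" scheme (-> 8) and the newer "11.0.2" scheme (-> 11).
java_major_version() {
  local v
  # Take the first quoted version string, e.g. java version "1.8.0_102"
  v=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p' | head -n 1)
  case "$v" in
    1.*) v=${v#1.}; printf '%s\n' "${v%%.*}" ;;   # 1.8.0_102 -> 8
    *)   printf '%s\n' "${v%%[.+-]*}" ;;          # 11.0.2 -> 11, 17 -> 17
  esac
}

# Warn (but do not stop) when the java on the PATH is not Java 8.
if command -v java >/dev/null 2>&1; then
  major=$(java_major_version "$(java -version 2>&1)")
  if [ "$major" != "8" ]; then
    echo "warning: found Java ${major:-unknown}; this walkthrough used Java 8" >&2
  fi
fi
```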
Build Profiles
Caution
FIXME Describe yarn profile and others
hive-thriftserver
Maven profile for Spark Thrift Server
Caution
FIXME
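To see which profiles you can pass to -P, you can scan the root pom.xml of the Spark source tree. This is a rough text scan rather than an XML parser, and it assumes each profile's <id> is the first <id> inside its <profile> element (the usual layout); list_pom_profiles is a hypothetical name, and profiles declared in module POMs are not included. The Maven Help Plugin's help:all-profiles goal is an alternative that asks Maven directly.

```shell
#!/usr/bin/env bash
# Print the <id> of every <profile> in a Maven POM (default: ./pom.xml).
# Assumption: the first <id> after <profile> is the profile's own id.
list_pom_profiles() {
  awk '
    /<profile>/ { in_profile = 1 }
    in_profile && /<id>/ {
      id = $0
      sub(/.*<id>/, "", id)     # drop everything up to <id>
      sub(/<\/id>.*/, "", id)   # drop </id> and what follows
      print id
      in_profile = 0            # ignore any later <id> in this profile
    }
  ' "${1:-pom.xml}"
}

# Run from the top of the Spark source tree.
if [ -f pom.xml ]; then
  list_pom_profiles pom.xml
fi
```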
Using Apache Maven as the build tool
The build command with Apache Maven is as follows:
$ ./build/mvn -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean install
After about a dozen minutes (12:29 min in the run below) your freshly baked Spark is ready to fly!
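The sbt and Maven commands pass the same comma-separated list to -P. A minimal sketch for keeping that list in one place, assuming bash (for the array); PROFILES and profile_flag are hypothetical names, and the script only prints the commands so you can check them before running anything.

```shell
#!/usr/bin/env bash
# One place for the profile list shared by the sbt and Maven builds.
PROFILES=(hadoop-2.7 yarn mesos hive hive-thriftserver)

# Join the profile names with commas into a single -P flag.
profile_flag() {
  local IFS=,              # "$*" joins arguments with the first char of IFS
  printf '%s\n' "-P$*"
}

flag=$(profile_flag "${PROFILES[@]}")

# Print the commands instead of running them; drop the echo to build for real.
echo ./build/sbt "$flag" -DskipTests clean assembly
echo ./build/mvn "$flag" -DskipTests clean install
```

Adding or dropping a profile (say, mesos) then becomes a one-word change to the array.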
I’m using Oracle Java 8 to build Spark.
➜ spark git:(master) ✗ java -version
java version "1.8.0_102"
Java(TM) SE Runtime Environment (build 1.8.0_102-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.102-b14, mixed mode)

➜ spark git:(master) ✗ ./build/mvn -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests clean install
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
Using `mvn` from path: /usr/local/bin/mvn
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=512M; support was removed in 8.0
[INFO] Scanning for projects...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Build Order:
[INFO]
[INFO] Spark Project Parent POM
[INFO] Spark Project Tags
[INFO] Spark Project Sketch
[INFO] Spark Project Networking
[INFO] Spark Project Shuffle Streaming Service
[INFO] Spark Project Unsafe
[INFO] Spark Project Launcher
[INFO] Spark Project Core
[INFO] Spark Project GraphX
[INFO] Spark Project Streaming
[INFO] Spark Project Catalyst
[INFO] Spark Project SQL
[INFO] Spark Project ML Local Library
[INFO] Spark Project ML Library
[INFO] Spark Project Tools
[INFO] Spark Project Hive
[INFO] Spark Project REPL
[INFO] Spark Project YARN Shuffle Service
[INFO] Spark Project YARN
[INFO] Spark Project Hive Thrift Server
[INFO] Spark Project Assembly
[INFO] Spark Project External Flume Sink
[INFO] Spark Project External Flume
[INFO] Spark Project External Flume Assembly
[INFO] Spark Integration for Kafka 0.8
[INFO] Spark Project Examples
[INFO] Spark Project External Kafka Assembly
[INFO] Spark Integration for Kafka 0.10
[INFO] Spark Integration for Kafka 0.10 Assembly
[INFO] Spark Project Java 8 Tests
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Parent POM 2.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
...
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [ 4.186 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 4.893 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 5.066 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 11.108 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [ 7.051 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 7.650 s]
[INFO] Spark Project Launcher ............................. SUCCESS [ 9.905 s]
[INFO] Spark Project Core ................................. SUCCESS [02:09 min]
[INFO] Spark Project GraphX ............................... SUCCESS [ 19.317 s]
[INFO] Spark Project Streaming ............................ SUCCESS [ 42.077 s]
[INFO] Spark Project Catalyst ............................. SUCCESS [01:32 min]
[INFO] Spark Project SQL .................................. SUCCESS [01:47 min]
[INFO] Spark Project ML Local Library ..................... SUCCESS [ 10.049 s]
[INFO] Spark Project ML Library ........................... SUCCESS [01:36 min]
[INFO] Spark Project Tools ................................ SUCCESS [ 3.520 s]
[INFO] Spark Project Hive ................................. SUCCESS [ 52.528 s]
[INFO] Spark Project REPL ................................. SUCCESS [ 7.243 s]
[INFO] Spark Project YARN Shuffle Service ................. SUCCESS [ 7.898 s]
[INFO] Spark Project YARN ................................. SUCCESS [ 15.380 s]
[INFO] Spark Project Hive Thrift Server ................... SUCCESS [ 24.876 s]
[INFO] Spark Project Assembly ............................. SUCCESS [ 2.971 s]
[INFO] Spark Project External Flume Sink .................. SUCCESS [ 7.377 s]
[INFO] Spark Project External Flume ....................... SUCCESS [ 10.752 s]
[INFO] Spark Project External Flume Assembly .............. SUCCESS [ 1.695 s]
[INFO] Spark Integration for Kafka 0.8 .................... SUCCESS [ 13.013 s]
[INFO] Spark Project Examples ............................. SUCCESS [ 31.728 s]
[INFO] Spark Project External Kafka Assembly .............. SUCCESS [ 3.472 s]
[INFO] Spark Integration for Kafka 0.10 ................... SUCCESS [ 12.297 s]
[INFO] Spark Integration for Kafka 0.10 Assembly .......... SUCCESS [ 3.789 s]
[INFO] Spark Project Java 8 Tests ......................... SUCCESS [ 4.267 s]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 12:29 min
[INFO] Finished at: 2016-07-07T22:29:56+02:00
[INFO] Final Memory: 110M/913M
[INFO] ------------------------------------------------------------------------
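The Reactor Summary shows where the time goes: Spark Project Core, SQL, ML Library and Catalyst each take well over a minute. A minimal sketch that ranks modules by build time, assuming you saved the output first (for example by appending | tee build.log to the Maven command; the file name is only an example) and that summary lines keep the "name ..... SUCCESS [time]" layout shown above. slowest_modules is a hypothetical name, and lines reporting FAILURE or SKIPPED are ignored.

```shell
#!/usr/bin/env bash
# Rank the modules of a Maven "Reactor Summary" by build time, slowest first.
# Reads the given file, or standard input when no file is given.
slowest_modules() {
  awk '
    /SUCCESS \[/ {
      line = $0
      sub(/^\[INFO\] /, "", line)
      # Module name: everything before the dotted leader.
      name = line
      sub(/ \.+ .*$/, "", name)
      # Duration: the text inside the trailing brackets, "4.186 s" or "02:09 min".
      t = line
      sub(/.*\[ */, "", t)
      sub(/\].*$/, "", t)
      split(t, parts, " ")
      if (parts[2] == "min") {
        split(parts[1], ms, ":")
        secs = ms[1] * 60 + ms[2]
      } else {
        secs = parts[1] + 0
      }
      printf "%8.1f s  %s\n", secs, name
    }
  ' "${1:--}" | sort -rn
}
```

For example, slowest_modules build.log | head -n 5 would list the five slowest modules of a saved build.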
Note the messages that show the version of Spark (Building Spark Project Parent POM 2.0.0-SNAPSHOT), the Scala version (maven-clean-plugin:2.6.1:clean (default-clean) @ spark-parent_2.11) and the Spark modules built.
The above command gives you the latest version of Apache Spark, 2.0.0-SNAPSHOT, built for Scala 2.11.8 (see the configuration of the scala-2.11 profile).
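If you keep the build output in a file, the two version messages mentioned above can be pulled out with sed. A sketch under the assumption that the log lines look like the ones quoted above; the function names are hypothetical and the patterns are guesses about the log format, not a Spark or Maven API.

```shell
#!/usr/bin/env bash
# Spark version from "Building Spark Project Parent POM <version>".
spark_version_from_log() {
  sed -n 's/.*Building Spark Project Parent POM \([^ ]*\).*/\1/p' "$1" | head -n 1
}

# Scala binary version from the artifact id, e.g. "@ spark-parent_2.11".
scala_binary_version_from_log() {
  sed -n 's/.*@ spark-parent_\([0-9][0-9.]*[0-9]\).*/\1/p' "$1" | head -n 1
}

# Example use with a saved log (build.log is only an example name).
if [ -f build.log ]; then
  echo "Spark $(spark_version_from_log build.log), Scala $(scala_binary_version_from_log build.log)"
fi
```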
Tip
You can also check the version of Spark using ./bin/spark-shell --version.
Making a Distribution
./make-distribution.sh is the shell script to make a distribution. It uses the same profiles as sbt and Maven.
Use the --tgz option to get a tar.gz archive of the Spark distribution.
➜ spark git:(master) ✗ ./make-distribution.sh --tgz -Phadoop-2.7,yarn,mesos,hive,hive-thriftserver -DskipTests
Once finished, you will have the distribution in the current directory, i.e. spark-2.0.0-SNAPSHOT-bin-2.7.2.tgz.
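A quick sanity check of the archive before copying it anywhere, as a minimal sketch. It assumes the archive unpacks into a single top-level directory holding the usual bin/spark-shell launcher, and that the file name follows the spark-<version>-bin-<name>.tgz pattern seen above, where <name> is the Hadoop version 2.7.2 in this build; dist_tarball_name and has_spark_shell are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Expected archive name from the Spark version and the distribution name.
dist_tarball_name() {
  printf 'spark-%s-bin-%s.tgz\n' "$1" "$2"
}

# Succeed when the archive's top-level directory contains bin/spark-shell.
has_spark_shell() {
  local top
  top=$(tar -tzf "$1" | head -n 1 | cut -d/ -f1)
  tar -tzf "$1" | grep -qxF "$top/bin/spark-shell"
}

tgz=$(dist_tarball_name 2.0.0-SNAPSHOT 2.7.2)
if [ -f "$tgz" ] && has_spark_shell "$tgz"; then
  echo "$tgz looks like a Spark distribution"
fi
```

Once it passes, unpack the archive with tar -xzf and run ./bin/spark-shell --version from the unpacked directory to confirm the version.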