SparkSubmitCommandBuilder Command Builder
SparkSubmitCommandBuilder is used to build a command that spark-submit and SparkLauncher use to launch a Spark application.
SparkSubmitCommandBuilder uses the first argument to distinguish between shells:
- pyspark-shell-main
- sparkr-shell-main
- run-example
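As a rough illustration of the dispatch on the first argument, the sketch below maps the three special values listed above to a launch kind and treats anything else as a regular spark-submit invocation. ShellDispatchSketch, Kind, and classify are hypothetical names introduced for this example only; they are not part of Spark.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: classify a launch by its first argument. */
final class ShellDispatchSketch {
    enum Kind { PYSPARK_SHELL, SPARKR_SHELL, RUN_EXAMPLE, SPARK_SUBMIT }

    /** Only the three special names come from the text; the rest is assumed. */
    static Kind classify(List<String> args) {
        if (args.isEmpty()) {
            return Kind.SPARK_SUBMIT;
        }
        switch (args.get(0)) {
            case "pyspark-shell-main": return Kind.PYSPARK_SHELL;
            case "sparkr-shell-main":  return Kind.SPARKR_SHELL;
            case "run-example":        return Kind.RUN_EXAMPLE;
            default:                   return Kind.SPARK_SUBMIT;
        }
    }

    public static void main(String[] argv) {
        System.out.println(classify(Arrays.asList("run-example", "SparkPi")));
    }
}
```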
Caution
|
FIXME Describe run-example
|
SparkSubmitCommandBuilder parses command-line arguments using OptionParser (which is a SparkSubmitOptionParser). OptionParser comes with the following methods:
- handle to handle the known options (see the table below). It sets up the master, deployMode, propertiesFile, conf, mainClass, and sparkArgs internal properties.
- handleUnknown to handle unrecognized options that usually lead to an Unrecognized option error message.
- handleExtraArgs to handle extra arguments that are considered a Spark application's arguments.
Note
|
For spark-shell it assumes that the application arguments come after spark-submit's arguments.
|
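The interplay of the three callbacks can be pictured with a small stand-in parser. The sketch below is not Spark's SparkSubmitOptionParser: MiniOptionParser, RecordingParser, and the tiny set of "known" options are assumptions made for illustration. It also assumes that a callback returning false stops parsing and that whatever remains is passed to handleExtraArgs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical mini parser; NOT Spark's SparkSubmitOptionParser. */
abstract class MiniOptionParser {
    // Assumption: only these two options are "known", and both take a value.
    private static final List<String> KNOWN = Arrays.asList("--master", "--class");

    /** Walks the arguments and fires one callback per argument kind. */
    final void parse(List<String> args) {
        int i = 0;
        for (; i < args.size(); i++) {
            String arg = args.get(i);
            if (KNOWN.contains(arg)) {
                if (i + 1 >= args.size()) {
                    throw new IllegalArgumentException("Missing value for " + arg);
                }
                // Consume the option's value; a false result stops parsing.
                if (!handle(arg, args.get(++i))) { i++; break; }
            } else if (!handleUnknown(arg)) {
                i++;
                break;
            }
        }
        // Everything left over is treated as application arguments.
        handleExtraArgs(new ArrayList<>(args.subList(i, args.size())));
    }

    abstract boolean handle(String opt, String value);
    abstract boolean handleUnknown(String opt);
    abstract void handleExtraArgs(List<String> extra);
}

/** Records which callback fired, to make the control flow visible. */
final class RecordingParser extends MiniOptionParser {
    final List<String> events = new ArrayList<>();

    @Override boolean handle(String opt, String value) {
        events.add("handle " + opt + "=" + value);
        return true;   // keep parsing
    }
    @Override boolean handleUnknown(String opt) {
        events.add("unknown " + opt);
        return false;  // stop parsing at the first unknown argument
    }
    @Override void handleExtraArgs(List<String> extra) {
        events.add("extra " + extra);
    }
}
```

Calling parse on a RecordingParser with a known option, then an unknown argument, then trailing values shows each callback firing once, in that order.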
SparkSubmitCommandBuilder.buildCommand / buildSparkSubmitCommand
public List<String> buildCommand(Map<String, String> env)
Note
|
buildCommand is part of the AbstractCommandBuilder public API.
|
SparkSubmitCommandBuilder.buildCommand simply delegates to the private buildSparkSubmitCommand method (unless it was executed for the pyspark or sparkr scripts, which this document does not cover).
buildSparkSubmitCommand Internal Method
private List<String> buildSparkSubmitCommand(Map<String, String> env)
buildSparkSubmitCommand starts by building the so-called effective config. When in client mode, buildSparkSubmitCommand adds spark.driver.extraClassPath to the resulting Spark command.
Note
|
Use spark-submit to have spark.driver.extraClassPath in effect.
|
buildSparkSubmitCommand builds the first part of the Java command, passing in the extra classpath (only for client deploy mode).
Caution
|
FIXME Add isThriftServer case.
|
buildSparkSubmitCommand appends the contents of the SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.
(only for client deploy mode) …
Caution
|
FIXME Elaborate on the client deploy mode case.
|
addPermGenSizeOpt
case…elaborate
Caution
|
FIXME Elaborate on addPermGenSizeOpt
|
buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using buildSparkSubmitArgs).
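The overall shape of the resulting command can be sketched as follows. This is a simplified illustration under stated assumptions, not the actual implementation: CommandAssemblySketch and assemble are hypothetical names, the environment is passed in as a map, option strings are split naively on whitespace, and only the extra classpath (not the full Spark classpath) is added.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of how the launch command might be assembled. */
final class CommandAssemblySketch {
    static List<String> assemble(Map<String, String> env,
                                 Map<String, String> effectiveConfig,
                                 boolean clientMode,
                                 List<String> submitArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");

        // Extra driver classpath is only relevant in client deploy mode.
        if (clientMode) {
            String extra = effectiveConfig.get("spark.driver.extraClassPath");
            if (extra != null && !extra.isEmpty()) {
                cmd.add("-cp");
                cmd.add(extra);
            }
        }

        // Assumption: JVM options are whitespace-separated (no quoting support).
        for (String name : new String[] {"SPARK_SUBMIT_OPTS", "SPARK_JAVA_OPTS"}) {
            String opts = env.get(name);
            if (opts != null && !opts.trim().isEmpty()) {
                cmd.addAll(Arrays.asList(opts.trim().split("\\s+")));
            }
        }

        // Main class first, then the spark-submit arguments.
        cmd.add("org.apache.spark.deploy.SparkSubmit");
        cmd.addAll(submitArgs);
        return cmd;
    }
}
```

The ordering (JVM options before the main class, spark-submit arguments after it) follows the usual layout of a java command line.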
buildSparkSubmitArgs method
List<String> buildSparkSubmitArgs()
buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.
buildSparkSubmitArgs uses a SparkSubmitOptionParser to add the command-line arguments that spark-submit recognizes (when it is executed later on and uses the very same SparkSubmitOptionParser parser to parse command-line arguments).
SparkSubmitCommandBuilder Property |
SparkSubmitOptionParser Attribute |
---|---|
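A minimal sketch of turning builder properties back into spark-submit arguments might look like the following. SubmitArgsSketch and its field layout are hypothetical; the option spellings (--master, --deploy-mode, --properties-file, --conf, --class) are standard spark-submit options, but the ordering chosen here is an assumption rather than a statement about the real method.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: rebuild spark-submit arguments from builder state. */
final class SubmitArgsSketch {
    String master;
    String deployMode;
    String propertiesFile;
    String mainClass;
    String appResource;
    final Map<String, String> conf = new LinkedHashMap<>();
    final List<String> sparkArgs = new ArrayList<>();
    final List<String> appArgs = new ArrayList<>();

    List<String> buildArgs() {
        List<String> args = new ArrayList<>();
        addOption(args, "--master", master);
        addOption(args, "--deploy-mode", deployMode);
        addOption(args, "--properties-file", propertiesFile);
        // Each Spark property becomes its own --conf key=value pair.
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        addOption(args, "--class", mainClass);
        args.addAll(sparkArgs);
        if (appResource != null) {
            args.add(appResource);
        }
        args.addAll(appArgs);
        return args;
    }

    /** Adds "name value" only when the value has been set. */
    private static void addOption(List<String> args, String name, String value) {
        if (value != null) {
            args.add(name);
            args.add(value);
        }
    }
}
```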
getEffectiveConfig Internal Method
Map<String, String> getEffectiveConfig()
The getEffectiveConfig internal method builds effectiveConfig, that is, conf with the Spark properties file loaded (using the loadPropertiesFile internal method), skipping keys that have already been set (which happens when the command-line options are parsed in the handle method).
Note
|
Command-line options (e.g. --driver-class-path) have higher precedence than their corresponding Spark settings in a Spark properties file (e.g. spark.driver.extraClassPath). You can therefore control the final settings by overriding Spark settings on the command line using the command-line options. […] charset and trims white spaces around values. |
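The precedence rule can be illustrated with a short merge routine. This is a sketch under assumptions: EffectiveConfigSketch is a hypothetical class, the properties file is represented by an already-loaded java.util.Properties object, and values are trimmed when copied.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

/** Hypothetical sketch of merging command-line conf with a properties file. */
final class EffectiveConfigSketch {
    static Map<String, String> effectiveConfig(Map<String, String> conf,
                                               Properties fileProps) {
        // Start from what the command line already set.
        Map<String, String> effective = new HashMap<>(conf);
        for (String key : fileProps.stringPropertyNames()) {
            // putIfAbsent keeps the command-line value when both define a key.
            effective.putIfAbsent(key, fileProps.getProperty(key).trim());
        }
        return effective;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("spark.driver.extraClassPath", "/from/cli");
        Properties file = new Properties();
        file.setProperty("spark.driver.extraClassPath", "/from/file");
        file.setProperty("spark.master", " local[2] ");
        System.out.println(effectiveConfig(conf, file));
    }
}
```

Copying conf into a new map leaves the original untouched, so the merge can be repeated safely.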
isClientMode Internal Method
private boolean isClientMode(Map<String, String> userProps)
isClientMode checks master first (from the command-line options) and then the spark.master Spark property. The same applies to deployMode and spark.submit.deployMode.
Caution
|
FIXME Review master and deployMode . How are they set?
|
isClientMode returns true when no master is set explicitly or when the deploy mode is explicitly set to client.
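The lookup order can be sketched as below, under the reading that client mode is reported when no master is set or when the deploy mode is explicitly client. ClientModeSketch and firstNonEmpty are names chosen for this example, and master and deployMode are passed as parameters rather than read from fields. Combinations not described in the text (for instance a master with no deploy mode) are not modeled, and Spark may handle them differently.

```java
import java.util.Map;

/** Hypothetical sketch of the client-mode check described in the text. */
final class ClientModeSketch {
    /** Returns the first candidate that is neither null nor empty, else null. */
    static String firstNonEmpty(String... candidates) {
        for (String s : candidates) {
            if (s != null && !s.isEmpty()) {
                return s;
            }
        }
        return null;
    }

    static boolean isClientMode(String master, String deployMode,
                                Map<String, String> userProps) {
        // Command-line values win over the corresponding Spark properties.
        String userMaster = firstNonEmpty(master, userProps.get("spark.master"));
        String userDeployMode =
            firstNonEmpty(deployMode, userProps.get("spark.submit.deployMode"));
        // Assumption: only the two conditions named in the text are covered.
        return userMaster == null || "client".equals(userDeployMode);
    }
}
```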
OptionParser
OptionParser is a custom SparkSubmitOptionParser that SparkSubmitCommandBuilder uses to parse command-line arguments. It defines all the SparkSubmitOptionParser callbacks, i.e. handle, handleUnknown, and handleExtraArgs, for command-line argument handling.
OptionParser's handle Callback
boolean handle(String opt, String value)
OptionParser comes with a custom handle callback (from the SparkSubmitOptionParser callbacks).
Command-Line Option | Property / Behaviour |
---|---|
|
|
|
|
|
|
|
Sets |
|
Sets |
|
Sets |
|
Sets |
|
Expects a |
|
Sets It may also set |
|
Disables |
|
Disables |
|
Disables |
anything else |
Adds an element to |
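To give a feel for what such a callback does, here is a hedged sketch of a handle method that populates the internal properties named earlier (master, deployMode, propertiesFile, conf, mainClass, sparkArgs). HandleSketch is a hypothetical class; the option spellings are standard spark-submit options, while the exact mapping and the key=value handling of --conf are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a handle callback; not the Spark implementation. */
final class HandleSketch {
    String master;
    String deployMode;
    String propertiesFile;
    String mainClass;
    final Map<String, String> conf = new LinkedHashMap<>();
    final List<String> sparkArgs = new ArrayList<>();

    boolean handle(String opt, String value) {
        switch (opt) {
            case "--master":          master = value; break;
            case "--deploy-mode":     deployMode = value; break;
            case "--properties-file": propertiesFile = value; break;
            case "--class":           mainClass = value; break;
            case "--conf": {
                // Assumption: the value has the form key=value.
                String[] kv = value.split("=", 2);
                if (kv.length != 2) {
                    throw new IllegalArgumentException("Invalid argument to --conf: " + value);
                }
                conf.put(kv[0], kv[1]);
                break;
            }
            default:
                // Anything else is passed through to spark-submit untouched.
                sparkArgs.add(opt);
                if (value != null) {
                    sparkArgs.add(value);
                }
        }
        return true; // continue parsing
    }
}
```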
OptionParser's handleUnknown Method
boolean handleUnknown(String opt)
If allowsMixedArguments is enabled, handleUnknown simply adds the input opt to appArgs and allows for further parsing of the argument list.
Caution
|
FIXME Where’s allowsMixedArguments enabled?
|
If isExample is enabled, handleUnknown sets mainClass to org.apache.spark.examples.[opt] (unless the input opt already has the package prefix) and stops further parsing of the argument list.
Caution
|
FIXME Where’s isExample enabled?
|
Otherwise, handleUnknown sets appResource and stops further parsing of the argument list.
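The three branches of handleUnknown can be summarized in a short sketch. HandleUnknownSketch is a hypothetical class, and the convention that a true return value means "keep parsing" while false means "stop" is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the three handleUnknown branches. */
final class HandleUnknownSketch {
    static final String EXAMPLE_PREFIX = "org.apache.spark.examples.";

    boolean allowsMixedArguments;
    boolean isExample;
    String mainClass;
    String appResource;
    final List<String> appArgs = new ArrayList<>();

    /** Returns true to keep parsing, false to stop (assumed convention). */
    boolean handleUnknown(String opt) {
        if (allowsMixedArguments) {
            appArgs.add(opt);
            return true;
        }
        if (isExample) {
            // Prepend the examples package unless it is already there.
            mainClass = opt.startsWith(EXAMPLE_PREFIX) ? opt : EXAMPLE_PREFIX + opt;
            return false;
        }
        appResource = opt;
        return false;
    }
}
```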