SparkSubmitCommandBuilder Command Builder
SparkSubmitCommandBuilder is used to build a command that spark-submit and SparkLauncher use to launch a Spark application.
SparkSubmitCommandBuilder uses the first argument to distinguish between shells:
- pyspark-shell-main
- sparkr-shell-main
- run-example
Caution: FIXME Describe run-example
SparkSubmitCommandBuilder parses command-line arguments using OptionParser (which is a SparkSubmitOptionParser). OptionParser comes with the following methods:
- handle to handle the known options (see the table below). It sets up the master, deployMode, propertiesFile, conf, mainClass, and sparkArgs internal properties.
- handleUnknown to handle unrecognized options, which usually lead to an Unrecognized option error message.
- handleExtraArgs to handle extra arguments, which are treated as the Spark application’s arguments.
Note: For spark-shell it assumes that the application arguments come after spark-submit’s arguments.
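The division of labour between the three callbacks can be pictured with a small stand-alone sketch. It does not use Spark's SparkSubmitOptionParser: MiniParser, RecordingParser, the two options in KNOWN, and the convention that a callback returns true to keep parsing are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a callback-driven option parser; not Spark's class.
abstract class MiniParser {
    // Options this toy parser recognizes; each one expects a value (assumption).
    private static final List<String> KNOWN = Arrays.asList("--master", "--deploy-mode");

    protected abstract boolean handle(String opt, String value);
    protected abstract boolean handleUnknown(String opt);
    protected abstract void handleExtraArgs(List<String> extra);

    final void parse(List<String> args) {
        int i = 0;
        while (i < args.size()) {
            String arg = args.get(i);
            boolean keepGoing;
            if (KNOWN.contains(arg)) {
                if (i + 1 >= args.size()) {
                    throw new IllegalArgumentException("Missing argument for option " + arg);
                }
                keepGoing = handle(arg, args.get(i + 1));
                i += 2;
            } else {
                keepGoing = handleUnknown(arg);
                i += 1;
            }
            if (!keepGoing) {
                break;
            }
        }
        // Whatever was not consumed is handed over as extra arguments.
        handleExtraArgs(new ArrayList<>(args.subList(i, args.size())));
    }
}

// Records what each callback received so the flow can be inspected.
final class RecordingParser extends MiniParser {
    String master;
    String appResource;
    final List<String> appArgs = new ArrayList<>();

    @Override protected boolean handle(String opt, String value) {
        if (opt.equals("--master")) {
            master = value;
        }
        return true; // keep parsing
    }

    @Override protected boolean handleUnknown(String opt) {
        appResource = opt;
        return false; // the first unrecognized token ends option parsing
    }

    @Override protected void handleExtraArgs(List<String> extra) {
        appArgs.addAll(extra);
    }
}
```

Parsing `--master local[2] app.jar a b` with RecordingParser leaves master set to local[2], appResource set to app.jar, and appArgs holding a and b.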
SparkSubmitCommandBuilder.buildCommand / buildSparkSubmitCommand
public List<String> buildCommand(Map<String, String> env)
Note: buildCommand is part of the AbstractCommandBuilder public API.
SparkSubmitCommandBuilder.buildCommand simply delegates to the buildSparkSubmitCommand private method (unless it was executed for the pyspark or sparkr scripts, which are outside the scope of this document).
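As a rough illustration of that dispatch, the sketch below classifies an argument list by its first element and reports whether the buildSparkSubmitCommand path would be taken. LaunchKind and FirstArgDispatch are hypothetical names; only the three marker strings come from the text above.

```java
import java.util.List;

// Hypothetical labels for the kinds of launch the first argument can select.
enum LaunchKind { PYSPARK_SHELL, SPARKR_SHELL, RUN_EXAMPLE, SPARK_SUBMIT }

final class FirstArgDispatch {
    // Looks only at the first argument, as described in the text.
    static LaunchKind classify(List<String> args) {
        if (args.isEmpty()) {
            return LaunchKind.SPARK_SUBMIT;
        }
        switch (args.get(0)) {
            case "pyspark-shell-main": return LaunchKind.PYSPARK_SHELL;
            case "sparkr-shell-main":  return LaunchKind.SPARKR_SHELL;
            case "run-example":        return LaunchKind.RUN_EXAMPLE;
            default:                   return LaunchKind.SPARK_SUBMIT;
        }
    }

    // Everything except the pyspark and sparkr shells takes the spark-submit path.
    static boolean usesBuildSparkSubmitCommand(List<String> args) {
        LaunchKind kind = classify(args);
        return kind != LaunchKind.PYSPARK_SHELL && kind != LaunchKind.SPARKR_SHELL;
    }
}
```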
buildSparkSubmitCommand Internal Method
private List<String> buildSparkSubmitCommand(Map<String, String> env)
buildSparkSubmitCommand starts by building the so-called effective config. In client mode, buildSparkSubmitCommand adds spark.driver.extraClassPath to the resulting Spark command.
Note: Use spark-submit to have spark.driver.extraClassPath in effect.
buildSparkSubmitCommand builds the first part of the Java command passing in the extra classpath (only for client deploy mode).
Caution: FIXME Add isThriftServer case.
buildSparkSubmitCommand appends SPARK_SUBMIT_OPTS and SPARK_JAVA_OPTS environment variables.
(only for client deploy mode) …
Caution: FIXME Elaborate on the client deploy mode case.
addPermGenSizeOpt case…elaborate
Caution: FIXME Elaborate on addPermGenSizeOpt
buildSparkSubmitCommand appends org.apache.spark.deploy.SparkSubmit and the command-line arguments (using buildSparkSubmitArgs).
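The order of the steps above can be sketched as follows. This is a simplified illustration, not the launcher's code: SubmitCommandSketch, the bare "java" executable, the "-cp" flag and the naive whitespace splitting of the environment variables are assumptions, and the parts marked FIXME above are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper that mirrors the order of the steps described above.
final class SubmitCommandSketch {
    static List<String> build(Map<String, String> env,
                              Map<String, String> effectiveConfig,
                              boolean clientMode,
                              List<String> submitArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java"); // assumption: the real builder resolves the Java executable itself

        // Extra driver classpath is only added in client deploy mode.
        if (clientMode) {
            String extraClassPath = effectiveConfig.get("spark.driver.extraClassPath");
            if (extraClassPath != null && !extraClassPath.isEmpty()) {
                cmd.add("-cp");
                cmd.add(extraClassPath);
            }
        }

        // JVM options taken from the environment (naive split; no quoting support).
        for (String name : new String[] {"SPARK_SUBMIT_OPTS", "SPARK_JAVA_OPTS"}) {
            String opts = env.get(name);
            if (opts != null && !opts.trim().isEmpty()) {
                for (String opt : opts.trim().split("\\s+")) {
                    cmd.add(opt);
                }
            }
        }

        // Main class followed by the spark-submit arguments.
        cmd.add("org.apache.spark.deploy.SparkSubmit");
        cmd.addAll(submitArgs);
        return cmd;
    }
}
```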
buildSparkSubmitArgs method
List<String> buildSparkSubmitArgs()
buildSparkSubmitArgs builds a list of command-line arguments for spark-submit.
buildSparkSubmitArgs uses a SparkSubmitOptionParser to add the command-line arguments that spark-submit recognizes (when it is executed later on and uses the very same SparkSubmitOptionParser parser to parse command-line arguments).
| SparkSubmitCommandBuilder Property | SparkSubmitOptionParser Attribute |
|---|---|
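A minimal sketch of how builder properties could be turned back into a spark-submit argument list is given below. SubmitArgsSketch is a hypothetical class; --master, --deploy-mode, --conf and --class are real spark-submit options, but which property feeds which option, and the order in which they are emitted, are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical holder of a few builder properties.
final class SubmitArgsSketch {
    String master;
    String deployMode;
    String mainClass;
    String appResource;
    final Map<String, String> conf = new LinkedHashMap<>();
    final List<String> appArgs = new ArrayList<>();

    List<String> buildArgs() {
        List<String> args = new ArrayList<>();
        addOption(args, "--master", master);
        addOption(args, "--deploy-mode", deployMode);
        // Each Spark property becomes its own --conf key=value pair.
        for (Map.Entry<String, String> e : conf.entrySet()) {
            args.add("--conf");
            args.add(e.getKey() + "=" + e.getValue());
        }
        addOption(args, "--class", mainClass);
        if (appResource != null) {
            args.add(appResource);
        }
        // Application arguments always come last.
        args.addAll(appArgs);
        return args;
    }

    // Unset properties produce no option at all.
    private static void addOption(List<String> args, String name, String value) {
        if (value != null) {
            args.add(name);
            args.add(value);
        }
    }
}
```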
getEffectiveConfig Internal Method
Map<String, String> getEffectiveConfig()
getEffectiveConfig builds effectiveConfig, which is conf with the Spark properties file loaded (using the loadPropertiesFile internal method). Keys that are already set are skipped; they were set when the command-line options were parsed in the handle method.
Note: Command-line options (e.g. --driver-class-path) have higher precedence than their corresponding Spark settings in a Spark properties file (e.g. spark.driver.extraClassPath). You can therefore control the final settings by overriding Spark settings on the command line.
… charset and trims white spaces around values.
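The precedence rule can be illustrated with a short sketch: start from the settings collected from the command line and add a file entry only when its key is not already present. EffectiveConfigSketch and its method are hypothetical; the sketch takes an already loaded java.util.Properties rather than reading a file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper showing "command line wins over properties file".
final class EffectiveConfigSketch {
    static Map<String, String> effectiveConfig(Map<String, String> conf, Properties fileProps) {
        // Start from what the command-line options already set.
        Map<String, String> effective = new HashMap<>(conf);
        for (String key : fileProps.stringPropertyNames()) {
            // putIfAbsent skips keys that are already present; values are trimmed.
            effective.putIfAbsent(key, fileProps.getProperty(key).trim());
        }
        return effective;
    }
}
```

With spark.driver.extraClassPath present in both inputs, the value from conf is kept and the file's value is ignored, while keys found only in the file are added.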
isClientMode Internal Method
private boolean isClientMode(Map<String, String> userProps)
isClientMode checks master first (from the command-line options) and then spark.master Spark property. Same with deployMode and spark.submit.deployMode.
Caution: FIXME Review master and deployMode. How are they set?
isClientMode returns true when no master is set explicitly or when the deploy mode is explicitly set to client.
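The lookup order and the two conditions can be sketched as below. ClientModeSketch and firstNonEmpty are hypothetical, master and deployMode are passed in as parameters rather than read from builder fields, and the actual method may take further cases into account.

```java
import java.util.Map;

// Hypothetical sketch covering only the two conditions stated in the text.
final class ClientModeSketch {
    // Returns the first candidate that is neither null nor empty, else null.
    static String firstNonEmpty(String... candidates) {
        for (String candidate : candidates) {
            if (candidate != null && !candidate.isEmpty()) {
                return candidate;
            }
        }
        return null;
    }

    static boolean isClientMode(String master, String deployMode, Map<String, String> userProps) {
        // Command-line values are consulted first, then the Spark properties.
        String userMaster = firstNonEmpty(master, userProps.get("spark.master"));
        String userDeployMode = firstNonEmpty(deployMode, userProps.get("spark.submit.deployMode"));
        return userMaster == null || "client".equals(userDeployMode);
    }
}
```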
OptionParser
OptionParser is a custom SparkSubmitOptionParser that SparkSubmitCommandBuilder uses to parse command-line arguments. It defines all the SparkSubmitOptionParser callbacks, i.e. handle, handleUnknown, and handleExtraArgs, for command-line argument handling.
OptionParser’s handle Callback
boolean handle(String opt, String value)
OptionParser comes with a custom handle callback (from the SparkSubmitOptionParser callbacks).
| Command-Line Option | Property / Behaviour |
|---|---|
| … | Sets … |
| … | Sets … |
| … | Sets … |
| … | Sets … |
| … | Expects a … |
| … | Sets … It may also set … |
| … | Disables … |
| … | Disables … |
| … | Disables … |
| anything else | Adds an element to … |
OptionParser’s handleUnknown Method
boolean handleUnknown(String opt)
If allowsMixedArguments is enabled, handleUnknown simply adds the input opt to appArgs and allows for further parsing of the argument list.
Caution: FIXME Where’s allowsMixedArguments enabled?
If isExample is enabled, handleUnknown sets mainClass to be org.apache.spark.examples.[opt] (unless the input opt has already the package prefix) and stops further parsing of the argument list.
Caution: FIXME Where’s isExample enabled?
Otherwise, handleUnknown sets appResource and stops further parsing of the argument list.
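The three branches of handleUnknown can be sketched in a self-contained class. UnknownOptionSketch is hypothetical, its fields only mirror the names used above, and the convention that returning true means "continue parsing" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class mirroring the three branches described above.
final class UnknownOptionSketch {
    static final String EXAMPLE_PREFIX = "org.apache.spark.examples.";

    boolean allowsMixedArguments;
    boolean isExample;
    String mainClass;
    String appResource;
    final List<String> appArgs = new ArrayList<>();

    boolean handleUnknown(String opt) {
        if (allowsMixedArguments) {
            appArgs.add(opt);
            return true; // keep parsing the rest of the argument list
        }
        if (isExample) {
            // Add the package prefix unless the name already carries it.
            mainClass = opt.startsWith(EXAMPLE_PREFIX) ? opt : EXAMPLE_PREFIX + opt;
            return false; // stop parsing
        }
        appResource = opt;
        return false; // stop parsing
    }
}
```

With isExample enabled, an input of SparkPi yields org.apache.spark.examples.SparkPi as mainClass, and an already qualified name is left unchanged.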