SparkHadoopUtil

Tip

Enable the DEBUG logging level for the org.apache.spark.deploy.SparkHadoopUtil logger to see what happens inside.

Add the following line to conf/log4j.properties:



log4j.logger.org.apache.spark.deploy.SparkHadoopUtil=DEBUG

Refer to Logging.

Creating SparkHadoopUtil Instance — `get` Method

Caution

FIXME

`substituteHadoopVariables` Method

Caution

FIXME

`transferCredentials` Method

Caution

FIXME

`newConfiguration` Method

Caution

FIXME

`conf` Method

Caution

FIXME

`stopCredentialUpdater` Method

Caution

FIXME

Running Executable Block As Spark User — `runAsSparkUser` Method



runAsSparkUser(func: () => Unit)

runAsSparkUser runs the func function with Hadoop's UserGroupInformation of the current user set as a thread-local variable (which child threads inherit). That UserGroupInformation is later used to authenticate HDFS and YARN calls.

Internally, runAsSparkUser reads the current username from the SPARK_USER environment variable or, when that variable is not set, from the short user name of Hadoop's UserGroupInformation.
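The username lookup can be sketched as follows. This is a minimal illustration, not Spark's own code: it assumes SPARK_USER takes precedence when it is set, and UserNameLookup, resolveUserName and both parameters are hypothetical names. The fallback argument stands in for the short user name that Hadoop's UserGroupInformation would report.

```scala
object UserNameLookup {
  // Hypothetical helper (not part of Spark): prefer SPARK_USER from the
  // given environment; otherwise evaluate the fallback, which stands in
  // for the short user name from Hadoop's UserGroupInformation.
  def resolveUserName(env: Map[String, String], fallback: => String): String =
    env.getOrElse("SPARK_USER", fallback)
}
```

For example, UserNameLookup.resolveUserName(sys.env, System.getProperty("user.name")) uses the JVM's user.name property as the stand-in fallback. The fallback is passed by name, so it is only evaluated when SPARK_USER is absent.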

Caution

FIXME How to use SPARK_USER to change the current user name?

You should see the current username printed out in the following DEBUG message in the logs:



DEBUG YarnSparkHadoopUtil: running as user: [user]

runAsSparkUser then creates a remote user for that username (using UserGroupInformation.createRemoteUser), transfers the current user's credential tokens to it, and runs the input func function as that user in a privileged action.
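The sequence of steps can be sketched with Hadoop's UserGroupInformation API as shown below. This is an illustrative approximation rather than Spark's actual source: RunAsUserSketch and runAs are hypothetical names, the credential transfer is assumed to be a plain copy through addCredentials, and the hadoop-common library is assumed to be on the classpath.

```scala
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

object RunAsUserSketch {
  // Hypothetical stand-in for the steps described in the text.
  def runAs(user: String)(func: () => Unit): Unit = {
    // Create a UserGroupInformation for the target user (no login happens here).
    val ugi = UserGroupInformation.createRemoteUser(user)
    // Copy the tokens and secret keys held by the current user to the new UGI.
    ugi.addCredentials(UserGroupInformation.getCurrentUser().getCredentials())
    // Run func as the newly created user.
    ugi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = func()
    })
  }
}
```

A call such as RunAsUserSketch.runAs("alice") { () => println("hello") } runs the block as the (hypothetical) user alice. Using PrivilegedExceptionAction rather than PrivilegedAction lets func throw checked exceptions, which doAs then propagates to the caller.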
