[SPARK-2669] [yarn] Distribute client configuration to AM.
Currently, when Spark launches the Yarn AM, the process will use
the local Hadoop configuration on the node where the AM launches,
if one is present. A more correct approach is to use the same
configuration used to launch the Spark job, since the user may
have made modifications (such as adding app-specific configs).
The approach taken here is to use the distributed cache to make
all files in the Hadoop configuration directory available to the
AM. This is a little overkill since only the AM needs them (the
executors use the broadcast Hadoop configuration from the driver),
but is the easier approach.
Even though only a few files in that directory may end up being
used, all of them are uploaded. This allows supporting use cases
such as when auxiliary configuration files are used for SSL
configuration, or when uploading a Hive configuration directory.
Not all of these files may be reflected in an o.a.h.conf.Configuration
object, but they may be needed when a driver in cluster mode instantiates,
for example, a HiveConf object instead.
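
The "zip all config files and upload them as an archive" idea can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code in this commit. `zip_conf_dir` and `demo` are hypothetical names, the file names are illustrative, and the actual upload to the distributed cache is left out.

```python
import os
import tempfile
import zipfile


def zip_conf_dir(conf_dir, archive_path):
    """Zip every regular file directly inside conf_dir (hypothetical helper)."""
    names = []
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for name in sorted(os.listdir(conf_dir)):
            path = os.path.join(conf_dir, name)
            if os.path.isfile(path):
                # Store entries by base name so the archive unpacks flat.
                archive.write(path, arcname=name)
                names.append(name)
    return names


def demo():
    """Build a throwaway config directory, archive it, and list the result."""
    with tempfile.TemporaryDirectory() as tmp:
        conf_dir = os.path.join(tmp, "conf")
        # The subdirectory is skipped: only top-level files are archived.
        os.makedirs(os.path.join(conf_dir, "subdir"))
        for name in ("core-site.xml", "yarn-site.xml", "ssl-client.xml"):
            with open(os.path.join(conf_dir, name), "w") as handle:
                handle.write("<configuration/>\n")
        archive_path = os.path.join(tmp, "conf.zip")
        archived = zip_conf_dir(conf_dir, archive_path)
        with zipfile.ZipFile(archive_path) as archive:
            return archived, sorted(archive.namelist())


if __name__ == "__main__":
    print(demo())
```

Archiving every file, rather than only the ones a Configuration object would load, mirrors the reasoning above: auxiliary files such as SSL settings travel along without the client having to know which ones matter.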
Author: Marcelo Vanzin <[email protected]>
Closes #4142 from vanzin/SPARK-2669 and squashes the following commits:
f5434b9 [Marcelo Vanzin] Merge branch 'master' into SPARK-2669
013f0fb [Marcelo Vanzin] Review feedback.
f693152 [Marcelo Vanzin] Le sigh.
ed45b7d [Marcelo Vanzin] Zip all config files and upload them as an archive.
5927b6b [Marcelo Vanzin] Merge branch 'master' into SPARK-2669
cbb9fb3 [Marcelo Vanzin] Remove stale test.
e3e58d0 [Marcelo Vanzin] Merge branch 'master' into SPARK-2669
e3d0613 [Marcelo Vanzin] Review feedback.
34bdbd8 [Marcelo Vanzin] Fix test.
022a688 [Marcelo Vanzin] Merge branch 'master' into SPARK-2669
a77ddd5 [Marcelo Vanzin] Merge branch 'master' into SPARK-2669
79221c7 [Marcelo Vanzin] [SPARK-2669] [yarn] Distribute client configuration to AM.
docs/running-on-yarn.md (5 additions & 1 deletion)
@@ -211,7 +211,11 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
 # Launching Spark on YARN
 
 Ensure that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` points to the directory which contains the (client side) configuration files for the Hadoop cluster.
-These configs are used to write to the dfs and connect to the YARN ResourceManager.
+These configs are used to write to the dfs and connect to the YARN ResourceManager. The
+configuration contained in this directory will be distributed to the YARN cluster so that all
+containers used by the application use the same configuration. If the configuration references
+Java system properties or environment variables not managed by YARN, they should also be set in the
+Spark application's configuration (driver, executors, and the AM when running in client mode).
 
 There are two deploy modes that can be used to launch Spark applications on YARN. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by YARN on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from YARN.
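
The requirement in the documentation change above, that `HADOOP_CONF_DIR` or `YARN_CONF_DIR` point to a configuration directory, could be checked before submitting a job with something like the sketch below. It is written in Python and is only an illustration: `find_conf_dirs` is a hypothetical helper, not part of Spark, and the choice to consult both variables and skip duplicates is an assumption rather than documented behavior.

```python
import os


def find_conf_dirs(env):
    """Return existing directories named by the two variables, in order.

    `env` is any mapping of variable names to values, e.g. os.environ.
    """
    dirs = []
    for var in ("HADOOP_CONF_DIR", "YARN_CONF_DIR"):
        path = env.get(var)
        # Ignore unset variables, paths that are not directories, and repeats.
        if path and os.path.isdir(path) and path not in dirs:
            dirs.append(path)
    return dirs


if __name__ == "__main__":
    found = find_conf_dirs(os.environ)
    if not found:
        print("Neither HADOOP_CONF_DIR nor YARN_CONF_DIR names a directory.")
    for conf_dir in found:
        print("Client configuration directory:", conf_dir)
```

Taking the environment as a parameter keeps the helper easy to exercise with a plain dictionary instead of the real process environment.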