`SparkHadoopWriter` utility is used to [write a key-value RDD (as a Hadoop OutputFormat)](#write).

`SparkHadoopWriter` utility is used by the [saveAsNewAPIHadoopDataset](rdd:PairRDDFunctions.md#saveAsNewAPIHadoopDataset) and [saveAsHadoopDataset](rdd:PairRDDFunctions.md#saveAsHadoopDataset) transformations.
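
For example, the following save action eventually gets to `SparkHadoopWriter.write` (the RDD, the Writable types and the output path are made up for illustration; `sc` is an active `SparkContext`):

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// A key-value RDD to be written out using a Hadoop OutputFormat
val counts = sc.parallelize(Seq(("spark", 1), ("hadoop", 2)))
  .map { case (k, v) => (new Text(k), new IntWritable(v)) }

// saveAsNewAPIHadoopFile sits on top of saveAsNewAPIHadoopDataset
// and so ends up in SparkHadoopWriter.write
counts.saveAsNewAPIHadoopFile[TextOutputFormat[Text, IntWritable]]("/tmp/counts")
```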
## <span id="logging"> Logging

Enable `ALL` logging level for `org.apache.spark.internal.io.SparkHadoopWriter` logger to see what happens inside.

Add the following line to `conf/log4j.properties`:
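
```text
log4j.logger.org.apache.spark.internal.io.SparkHadoopWriter=ALL
```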
## <span id="write"> Writing Key-Value RDD Out (As Hadoop OutputFormat)

`write` uses the id of the given RDD as the `commitJobId`.
<span id="write-jobTrackerId">
`write` creates a `jobTrackerId` with the current date.
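
Schematically, the two identifiers could be built as follows (the `yyyyMMddHHmmss` format and the use of `SimpleDateFormat` are assumptions drawn from the Spark codebase; `rdd` stands for the RDD being written out):

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}

// jobTrackerId: the current date rendered as a compact timestamp (assumed format)
val jobTrackerId = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US).format(new Date())

// commitJobId: the id of the RDD being written out
val commitJobId = rdd.id
```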
<span id="write-jobContext">
`write` requests the given `HadoopWriteConfigUtil` to [create a Hadoop JobContext](HadoopWriteConfigUtil.md#createJobContext) (for the [jobTrackerId](#write-jobTrackerId) and [commitJobId](#write-commitJobId)).
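
For the new MapReduce API, creating the `JobContext` comes down to something like this sketch (the Hadoop classes are real, the wiring is simplified; `jobTrackerId` and `commitJobId` are the values from the previous steps):

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.JobID
import org.apache.hadoop.mapreduce.task.JobContextImpl

// jobTrackerId and commitJobId together make up the Hadoop JobID of the write job
val jobId = new JobID(jobTrackerId, commitJobId)

// The JobContext pairs the Hadoop configuration with that JobID
val jobContext = new JobContextImpl(new Configuration(), jobId)
```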
`write` requests the given `HadoopWriteConfigUtil` to [initOutputFormat](HadoopWriteConfigUtil.md#initOutputFormat) with the Hadoop [JobContext]({{ hadoop.api }}/api/org/apache/hadoop/mapreduce/JobContext.html).
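
In the new-API variant, `initOutputFormat` amounts to little more than resolving (and caching) the `OutputFormat` class the job was configured with; roughly:

```scala
import org.apache.hadoop.mapreduce.{JobContext, OutputFormat}

// Resolve the OutputFormat class of the job
// (getOutputFormatClass reads mapreduce.job.outputformat.class)
def initOutputFormat(jobContext: JobContext): Class[_ <: OutputFormat[_, _]] =
  jobContext.getOutputFormatClass
```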
`write` requests the given `HadoopWriteConfigUtil` to [assertConf](HadoopWriteConfigUtil.md#assertConf).
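
`assertConf` is where output-spec validation happens. A sketch (instantiating the `OutputFormat` reflectively here is an assumption; the flag corresponds to the `spark.hadoop.validateOutputSpecs` configuration property):

```scala
import org.apache.hadoop.mapreduce.JobContext

// When validation is enabled (it is by default), the OutputFormat can veto the
// job up front, e.g. FileOutputFormat fails fast if the output path already exists.
def assertConf(jobContext: JobContext, validateOutputSpecs: Boolean): Unit =
  if (validateOutputSpecs) {
    jobContext.getOutputFormatClass
      .getConstructor()
      .newInstance()
      .checkOutputSpecs(jobContext)
  }
```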
`write` requests the given `HadoopWriteConfigUtil` to [create a HadoopMapReduceCommitProtocol committer](HadoopWriteConfigUtil.md#createCommitter) for the [commitJobId](#write-commitJobId).
`write` requests the `HadoopMapReduceCommitProtocol` to [setupJob](HadoopMapReduceCommitProtocol.md#setupJob) (with the [jobContext](#write-jobContext)).
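
These two steps could look as follows (a sketch; the constructor arguments of `HadoopMapReduceCommitProtocol` are simplified and the output path is made up):

```scala
import org.apache.spark.internal.io.HadoopMapReduceCommitProtocol

// The committer is keyed by the commitJobId and set up once per job,
// before any task starts writing
val committer = new HadoopMapReduceCommitProtocol(commitJobId.toString, "/tmp/counts")
committer.setupJob(jobContext)
```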
<span id="write-runJob"><span id="write-executeTask">
`write` uses the `SparkContext` (of the given RDD) to [run a Spark job](SparkContext.md#runJob) for the given RDD with the [executeTask](#executeTask) partition function.
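
The fan-out can be sketched as follows (`counts` is the RDD from the opening example; the partition function is a stand-in for `executeTask`, which in the real `write` returns a task commit message per partition):

```scala
import org.apache.spark.TaskContext

// runJob invokes the partition function once per partition and
// returns one result per partition to the driver
val results = sc.runJob(
  counts,
  (context: TaskContext, iterator: Iterator[(Text, IntWritable)]) =>
    iterator.size  // stand-in for executeTask(context, iterator)
)
```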
<span id="write-commitJob">
In the end, `write` requests the [HadoopMapReduceCommitProtocol](#write-committer) to [commit the job](HadoopMapReduceCommitProtocol.md#commitJob) and prints out the following INFO message to the logs:
```text
Job [getJobID] committed.
```
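
The driver-side commit could look like this (a sketch; `committer` and `jobContext` come from the earlier steps, the task commit messages come back from the per-partition `executeTask` calls, and `logInfo` is from Spark's internal `Logging` trait):

```scala
import org.apache.spark.internal.io.FileCommitProtocol.TaskCommitMessage

// In the real write, these are the per-partition results of runJob
val taskCommitMessages: Seq[TaskCommitMessage] = Seq.empty

// Hand all task commit messages to the committer and announce success
// (getJobID fills in the [getJobID] placeholder of the message above)
committer.commitJob(jobContext, taskCommitMessages)
logInfo(s"Job ${jobContext.getJobID} committed.")
```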
`write` is used when `PairRDDFunctions` is requested to [saveAsNewAPIHadoopDataset](rdd:PairRDDFunctions.md#saveAsNewAPIHadoopDataset) and [saveAsHadoopDataset](rdd:PairRDDFunctions.md#saveAsHadoopDataset).
### <span id="write-Throwable"> Throwables

In case of any `Throwable`, `write` prints out the following ERROR message to the logs:
```text
Aborting job [getJobID].
```
<span id="write-abortJob">
`write` requests the [HadoopMapReduceCommitProtocol](#write-committer) to [abort the job](HadoopMapReduceCommitProtocol.md#abortJob) and throws a `SparkException`:
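
```text
Job aborted.
```

Put together, the error path looks roughly like this sketch (`jobContext`, `committer` and `logError` are the values and logging helper from the preceding steps):

```scala
import org.apache.spark.SparkException

try {
  // ... run the write job and commit it ...
} catch {
  case cause: Throwable =>
    // log the ERROR message quoted above, abort the job on the committer,
    // and rethrow the cause wrapped in a SparkException
    logError(s"Aborting job ${jobContext.getJobID}.", cause)
    committer.abortJob(jobContext)
    throw new SparkException("Job aborted.", cause)
}
```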