Commit a13531f

HadoopWriteConfigUtils

1 parent e460e1c · commit a13531f

7 files changed (+50 -74 lines changed)

docs/HadoopMapRedCommitProtocol.md

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
-= HadoopMapRedCommitProtocol
+# HadoopMapRedCommitProtocol

 `HadoopMapRedCommitProtocol` is...FIXME

docs/HadoopMapRedWriteConfigUtil.md

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-= HadoopMapRedWriteConfigUtil
+# HadoopMapRedWriteConfigUtil

 `HadoopMapRedWriteConfigUtil` is...FIXME


docs/HadoopMapReduceCommitProtocol.md

Lines changed: 1 addition & 1 deletion

@@ -1,3 +1,3 @@
-= HadoopMapReduceCommitProtocol
+# HadoopMapReduceCommitProtocol

 `HadoopMapReduceCommitProtocol` is...FIXME

docs/HadoopMapReduceWriteConfigUtil.md

Lines changed: 1 addition & 1 deletion

@@ -1,4 +1,4 @@
-= HadoopMapReduceWriteConfigUtil
+# HadoopMapReduceWriteConfigUtil

 `HadoopMapReduceWriteConfigUtil` is...FIXME


docs/HadoopWriteConfigUtil.md

Lines changed: 39 additions & 66 deletions
@@ -1,113 +1,86 @@
-= HadoopWriteConfigUtil
+# HadoopWriteConfigUtil

-`HadoopWriteConfigUtil[K, V]` is an <<contract, abstraction>> of <<implementations, writer configurers>>.
+`HadoopWriteConfigUtil[K, V]` is an [abstraction](#contract) of [writer configurers](#implementations) for [SparkHadoopWriter](SparkHadoopWriter.md) to [write a key-value RDD](SparkHadoopWriter.md#write) (for [RDD.saveAsNewAPIHadoopDataset](rdd/PairRDDFunctions.md#saveAsNewAPIHadoopDataset) and [RDD.saveAsHadoopDataset](rdd/PairRDDFunctions.md#saveAsHadoopDataset) operators).

-`HadoopWriteConfigUtil` is used for <<spark-internal-io-SparkHadoopWriter.md#, SparkHadoopWriter>> utility when requested to <<spark-internal-io-SparkHadoopWriter.md#write, write an RDD of key-value pairs>> (for rdd:PairRDDFunctions.md#saveAsNewAPIHadoopDataset[saveAsNewAPIHadoopDataset] and rdd:PairRDDFunctions.md#saveAsHadoopDataset[saveAsHadoopDataset] transformations).
+## Contract

-[[contract]]
-.HadoopWriteConfigUtil Contract
-[cols="30m,70",options="header",width="100%"]
-|===
-| Method
-| Description
+### <span id="assertConf"> assertConf

-| assertConf
-a| [[assertConf]]
-
-[source, scala]
-----
+```scala
 assertConf(
   jobContext: JobContext,
   conf: SparkConf): Unit
-----
+```

-| closeWriter
-a| [[closeWriter]]
+### <span id="closeWriter"> closeWriter

-[source, scala]
-----
+```scala
 closeWriter(
   taskContext: TaskAttemptContext): Unit
-----
+```

-| createCommitter
-a| [[createCommitter]]
+### <span id="createCommitter"> createCommitter

-[source, scala]
-----
+```scala
 createCommitter(
   jobId: Int): HadoopMapReduceCommitProtocol
-----
+```
+
+Creates a [HadoopMapReduceCommitProtocol](HadoopMapReduceCommitProtocol.md) committer
+
+Used when:

-| createJobContext
-a| [[createJobContext]]
+* `SparkHadoopWriter` is requested to [write data out](SparkHadoopWriter.md#write)

-[source, scala]
-----
+### <span id="createJobContext"> createJobContext
+
+```scala
 createJobContext(
   jobTrackerId: String,
   jobId: Int): JobContext
-----
+```

-| createTaskAttemptContext
-a| [[createTaskAttemptContext]]
+### <span id="createTaskAttemptContext"> createTaskAttemptContext

-[source, scala]
-----
+```scala
 createTaskAttemptContext(
   jobTrackerId: String,
   jobId: Int,
   splitId: Int,
   taskAttemptId: Int): TaskAttemptContext
-----
+```

-Creates a Hadoop https://hadoop.apache.org/docs/r2.7.3/api/org/apache/hadoop/mapreduce/TaskAttemptContext.html[TaskAttemptContext]
+Creates a Hadoop [TaskAttemptContext]({{ hadoop.api }}/org/apache/hadoop/mapreduce/TaskAttemptContext.html)

-| initOutputFormat
-a| [[initOutputFormat]]
+### <span id="initOutputFormat"> initOutputFormat

-[source, scala]
-----
+```scala
 initOutputFormat(
   jobContext: JobContext): Unit
-----
+```

-| initWriter
-a| [[initWriter]]
+### <span id="initWriter"> initWriter

-[source, scala]
-----
+```scala
 initWriter(
   taskContext: TaskAttemptContext,
   splitId: Int): Unit
-----
+```

-| write
-a| [[write]]
+### <span id="write"> write

-[source, scala]
-----
+```scala
 write(
   pair: (K, V)): Unit
-----
+```

 Writes out the key-value pair

-Used when `SparkHadoopWriter` is requested to <<spark-internal-io-SparkHadoopWriter.md#executeTask, executeTask>> (while <<spark-internal-io-SparkHadoopWriter.md#write, writing out key-value pairs of a partition>>)
-
-|===
-
-[[implementations]]
-.HadoopWriteConfigUtils
-[cols="30,70",options="header",width="100%"]
-|===
-| HadoopWriteConfigUtil
-| Description
+Used when:

-| <<spark-internal-io-HadoopMapReduceWriteConfigUtil.md#, HadoopMapReduceWriteConfigUtil>>
-| [[HadoopMapReduceWriteConfigUtil]]
+* `SparkHadoopWriter` is requested to [executeTask](SparkHadoopWriter.md#executeTask)

-| <<spark-internal-io-HadoopMapRedWriteConfigUtil.md#, HadoopMapRedWriteConfigUtil>>
-| [[HadoopMapRedWriteConfigUtil]]
+## Implementations

-|===
+* [HadoopMapReduceWriteConfigUtil](HadoopMapReduceWriteConfigUtil.md)
+* [HadoopMapRedWriteConfigUtil](HadoopMapRedWriteConfigUtil.md)
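
Read as a whole, the new contract page describes the lifecycle of a single writer task. The sketch below shows how those methods could fit together; it is illustrative only (the contract is Spark-internal), the `runWriterTask` helper and its parameter list are hypothetical, and failure handling and the driver-side commit coordination are omitted:

```scala
import org.apache.spark.internal.io.HadoopWriteConfigUtil

// Hypothetical helper: one writer task driving the HadoopWriteConfigUtil
// contract for a single RDD partition (simplified sketch).
def runWriterTask[K, V](
    config: HadoopWriteConfigUtil[K, V],
    jobTrackerId: String,  // date-based id created by the driver
    commitJobId: Int,      // the id of the RDD being written out
    splitId: Int,          // the partition this task writes
    taskAttemptId: Int,
    rows: Iterator[(K, V)]): Unit = {
  // One Hadoop TaskAttemptContext per task attempt
  val taskContext = config.createTaskAttemptContext(
    jobTrackerId, commitJobId, splitId, taskAttemptId)
  // The committer decides how task output becomes job output
  val committer = config.createCommitter(commitJobId)
  committer.setupTask(taskContext)
  // Let the configurer create the underlying Hadoop RecordWriter
  config.initWriter(taskContext, splitId)
  try {
    rows.foreach(config.write) // write out every key-value pair
  } finally {
    config.closeWriter(taskContext)
  }
  committer.commitTask(taskContext)
}
```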

docs/SparkHadoopWriter.md

Lines changed: 6 additions & 3 deletions
@@ -8,11 +8,14 @@ write[K, V: ClassTag](
   config: HadoopWriteConfigUtil[K, V]): Unit
 ```

-!!! FIXME
-    Review Me
+`write` [runs a Spark job](SparkContext.md#runJob) to [write out partition records](#executeTask) (for all partitions of the given key-value `RDD`) with the given [HadoopWriteConfigUtil](HadoopWriteConfigUtil.md) and a [HadoopMapReduceCommitProtocol](HadoopMapReduceCommitProtocol.md) committer.
+
+The number of writer tasks (_parallelism_) is the number of partitions in the given key-value `RDD`.
+
+### <span id="write-internals"> Internals

 <span id="write-commitJobId">
-`write` uses the id of the given RDD as the `commitJobId`.
+Internally, `write` uses the id of the given RDD as the `commitJobId`.

 <span id="write-jobTrackerId">
 `write` creates a `jobTrackerId` with the current date.
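
For context, `RDD.saveAsNewAPIHadoopDataset` is one of the operators that ends up in this `write` path. A minimal sketch of driving it (assuming an active `SparkContext` in `sc`; the data and output directory are made up for illustration):

```scala
import org.apache.hadoop.io.Text
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat

// A made-up key-value RDD
val counts = sc.parallelize(Seq(("a", 1), ("b", 2)))

// Configure a new-API Hadoop job; saveAsNewAPIHadoopDataset then runs
// SparkHadoopWriter.write with a HadoopMapReduceWriteConfigUtil
val job = Job.getInstance(sc.hadoopConfiguration)
job.setOutputKeyClass(classOf[Text])
job.setOutputValueClass(classOf[Text])
job.setOutputFormatClass(classOf[TextOutputFormat[Text, Text]])
job.getConfiguration.set(
  "mapreduce.output.fileoutputformat.outputdir", "/tmp/counts")

counts.saveAsNewAPIHadoopDataset(job.getConfiguration)
```

One writer task is scheduled per partition of `counts`, which is the parallelism noted above.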

mkdocs.yml

Lines changed: 1 addition & 1 deletion
@@ -362,10 +362,10 @@ nav:
   - Workers: workers.md
   - Internal IO:
     - SparkHadoopWriter: SparkHadoopWriter.md
+    - HadoopWriteConfigUtil: HadoopWriteConfigUtil.md
     - FileCommitProtocol: FileCommitProtocol.md
     - HadoopMapReduceCommitProtocol: HadoopMapReduceCommitProtocol.md
     - HadoopMapRedCommitProtocol: HadoopMapRedCommitProtocol.md
-    - HadoopWriteConfigUtil: HadoopWriteConfigUtil.md
     - HadoopMapReduceWriteConfigUtil: HadoopMapReduceWriteConfigUtil.md
     - HadoopMapRedWriteConfigUtil: HadoopMapRedWriteConfigUtil.md
   - Stage-Level Scheduling:
