Commit 2b5247b

Merge pull request #6 from civitaspo/v0.1.0
V0.1.0
2 parents 91fa8f0 + 519576c commit 2b5247b

10 files changed, +518 -236 lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -7,3 +7,5 @@
 build/
 .idea
 *.iml
+.ruby-version
+

README.md

Lines changed: 12 additions & 1 deletion
@@ -14,6 +14,7 @@ Read files on Hdfs.
 - **config** overwrites configuration parameters (hash, default: `{}`)
 - **input_path** file path on Hdfs. you can use glob and Date format like `%Y%m%d/%s`.
 - **rewind_seconds** When you use Date format in input_path property, the format is executed by using the time which is Now minus this property.
+- **partition** when this is true, partition input files and increase task count. (default: `true`)

 ## Example

@@ -24,12 +25,13 @@ in:
     - /opt/analytics/etc/hadoop/conf/core-site.xml
     - /opt/analytics/etc/hadoop/conf/hdfs-site.xml
   config:
-    fs.defaultFS: 'hdfs://hdp-nn1:8020'
+    fs.defaultFS: 'hdfs://hadoop-nn1:8020'
     dfs.replication: 1
     fs.hdfs.impl: 'org.apache.hadoop.hdfs.DistributedFileSystem'
     fs.file.impl: 'org.apache.hadoop.fs.LocalFileSystem'
   input_path: /user/embulk/test/%Y-%m-%d/*
   rewind_seconds: 86400
+  partition: true
   decoders:
     - {type: gzip}
   parser:
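
The interaction between `input_path` and `rewind_seconds` in the example config can be sketched in Ruby. This is a minimal illustration, assuming the date format follows `strftime` conventions (as the `%Y%m%d/%s` example in the README suggests). `resolve_input_path` is a hypothetical helper and not part of the plugin.

```ruby
# Hypothetical helper (not the plugin's API): expands strftime-style
# placeholders in `pattern` using the time `now - rewind_seconds`.
def resolve_input_path(pattern, rewind_seconds, now = Time.now)
  (now - rewind_seconds).strftime(pattern)
end

# With rewind_seconds: 86400 the path points at the previous day's directory.
now = Time.utc(2015, 9, 10, 12, 0, 0)
puts resolve_input_path('/user/embulk/test/%Y-%m-%d/*', 86_400, now)
# prints "/user/embulk/test/2015-09-09/*"
```

The glob `*` passes through untouched because `strftime` only rewrites `%` sequences.
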
@@ -50,6 +52,15 @@ in:
     - {name: c3, type: long}
 ```

+## Note
+- the feature of the partition supports only 3 line terminators.
+    - `\n`
+    - `\r`
+    - `\r\n`
+
+## The Reference Implementation
+- [hito4t/embulk-input-filesplit](https://github.com/hito4t/embulk-input-filesplit)
+
 ## Build

 ```
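
The README note about line terminators suggests that partition boundaries have to land on line ends. Below is a rough sketch of that idea, not the plugin's actual code. `split_on_line_ends` is a hypothetical helper that cuts a string into roughly equal chunks and pushes each cut forward to the next `\n`, `\r`, or `\r\n`.

```ruby
# Hypothetical helper (an assumption, not taken from the plugin): splits
# `data` into chunks of roughly bytesize / parts bytes, moving every cut
# forward so that it falls right after a line terminator.
def split_on_line_ends(data, parts)
  bytes = data.b                                   # work on raw bytes
  target = (bytes.bytesize.to_f / parts).ceil      # desired chunk size
  chunks = []
  start = 0
  while start < bytes.bytesize
    stop = [start + target, bytes.bytesize].min
    # Advance until the chunk ends with "\n" or "\r" (or the data ends).
    stop += 1 while stop < bytes.bytesize && !["\n", "\r"].include?(bytes[stop - 1])
    # Keep a "\r\n" pair in the same chunk.
    stop += 1 if stop < bytes.bytesize && bytes[stop - 1] == "\r" && bytes[stop] == "\n"
    chunks << bytes[start...stop]
    start = stop
  end
  chunks
end

chunks = split_on_line_ends("aaaa\r\nbb\n", 2)
puts chunks.length
# prints 2
```

Without the `\r\n` check, a cut could fall between `\r` and `\n`, and the next chunk would start with a stray empty line.
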

build.gradle

Lines changed: 2 additions & 2 deletions
@@ -12,7 +12,7 @@ configurations {
     provided
 }

-version = "0.0.3"
+version = "0.1.0"

 sourceCompatibility = 1.7
 targetCompatibility = 1.7
@@ -22,7 +22,7 @@ dependencies {
     provided "org.embulk:embulk-core:0.7.0"
     // compile "YOUR_JAR_DEPENDENCY_GROUP:YOUR_JAR_DEPENDENCY_MODULE:YOUR_JAR_DEPENDENCY_VERSION"
     compile 'org.apache.hadoop:hadoop-client:2.6.0'
-    compile 'com.google.guava:guava:14.0'
+    compile 'com.google.guava:guava:15.0'
     testCompile "junit:junit:4.+"
 }

lib/embulk/input/hdfs.rb

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 Embulk::JavaPlugin.register_input(
-  "hdfs", "org.embulk.input.HdfsFileInputPlugin",
+  "hdfs", "org.embulk.input.hdfs.HdfsFileInputPlugin",
   File.expand_path('../../../../classpath', __FILE__))

src/main/java/org/embulk/input/HdfsFileInputPlugin.java

Lines changed: 0 additions & 231 deletions
This file was deleted.
