Commit 7e8c7c2

Source features support for V2.0 (#2090)
1 parent 4cd9978 commit 7e8c7c2

28 files changed: 549 additions and 76 deletions

.github/workflows/push.yml

Lines changed: 30 additions & 0 deletions
@@ -42,6 +42,16 @@ jobs:
           -src_vocab /tmp/onmt.vocab.src \
           -tgt_vocab /tmp/onmt.vocab.tgt \
           && rm -rf /tmp/sample
+    - name: Test vocabulary build with features
+      run: |
+        python onmt/bin/build_vocab.py \
+          -config data/features_data.yaml \
+          -save_data /tmp/onmt_feat \
+          -src_vocab /tmp/onmt_feat.vocab.src \
+          -tgt_vocab /tmp/onmt_feat.vocab.tgt \
+          -src_feats_vocab '{"feat0": "/tmp/onmt_feat.vocab.feat0"}' \
+          -n_sample -1 \
+          && rm -rf /tmp/sample
     - name: Test field/transform dump
       run: |
         # The dumped fields are used later when testing tools
@@ -169,6 +179,26 @@ jobs:
           -state_dim 256 \
           -n_steps 10 \
           -n_node 64
+    - name: Testing training with features
+      run: |
+        python onmt/bin/train.py \
+          -config data/features_data.yaml \
+          -src_vocab /tmp/onmt_feat.vocab.src \
+          -tgt_vocab /tmp/onmt_feat.vocab.tgt \
+          -src_feats_vocab '{"feat0": "/tmp/onmt_feat.vocab.feat0"}' \
+          -src_vocab_size 1000 -tgt_vocab_size 1000 \
+          -rnn_size 2 -batch_size 10 \
+          -word_vec_size 5 -rnn_size 10 \
+          -report_every 5 -train_steps 10 \
+          -save_model /tmp/onmt.model \
+          -save_checkpoint_steps 10
+    - name: Testing translation with features
+      run: |
+        python translate.py \
+          -model /tmp/onmt.model_step_10.pt \
+          -src data/data_features/src-test.txt \
+          -src_feats "{'feat0': 'data/data_features/src-test.feat0'}" \
+          -verbose
     - name: Test RNN translation
       run: |
         head data/src-test.txt > /tmp/src-test.txt
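
The new CI steps pass the feature mapping in two spellings: `-src_feats_vocab` uses a double-quoted, JSON-style string, while `-src_feats` uses a single-quoted, Python-style string. The sketch below shows one way a string in either spelling could be turned into a dictionary. `parse_feats_mapping` is a hypothetical helper written for illustration; it is not taken from this commit and makes no claim about how OpenNMT-py parses these options.

```python
import ast
import json


def parse_feats_mapping(raw):
    """Parse a feature-name -> path mapping given as a string.

    Hypothetical helper, not part of OpenNMT-py: it tries strict JSON
    first, then falls back to a Python literal, which allows single quotes.
    """
    try:
        mapping = json.loads(raw)
    except json.JSONDecodeError:
        # literal_eval only evaluates literals, so no code is executed.
        mapping = ast.literal_eval(raw)
    if not isinstance(mapping, dict):
        raise ValueError("expected a mapping of feature name to path")
    return {str(name): str(path) for name, path in mapping.items()}


# The two argument values exactly as they appear in the workflow steps.
vocab_arg = '{"feat0": "/tmp/onmt_feat.vocab.feat0"}'
feats_arg = "{'feat0': 'data/data_features/src-test.feat0'}"

print(parse_feats_mapping(vocab_arg))
print(parse_feats_mapping(feats_arg))
```

Trying JSON first keeps the strict form as the default and treats the single-quoted form as a tolerated alternative.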

data/data_features/src-test.feat0

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+C B A B

data/data_features/src-test.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+she is a hard-working.

data/data_features/src-train.feat0

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+A A A A B A A A C
+A B C D E
+C B A B
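
To give a rough idea of what a vocabulary for `feat0` has to cover, the sketch below counts the feature values in the three training lines above. It is only an illustration: the lines are inlined rather than read from disk, `count_feature_values` is a hypothetical name, and the actual file format written by `build_vocab.py` is not shown in this commit.

```python
from collections import Counter

# Contents of data/data_features/src-train.feat0 as added in this commit.
FEAT0_LINES = [
    "A A A A B A A A C",
    "A B C D E",
    "C B A B",
]


def count_feature_values(lines):
    """Count whitespace-separated feature values over all lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


counts = count_feature_values(FEAT0_LINES)
for value, freq in counts.most_common():
    print(value, freq)
```

The training file therefore uses five distinct feature values, `A` through `E`.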

data/data_features/src-train.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+however, according to the logs, she is a hard-working.
+however, according to the logs,
+she is a hard-working.
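
In this sample data, each line of `src-train.feat0` carries exactly one feature value per whitespace-separated token of the matching line in `src-train.txt`. The sketch below checks that property on the inlined sample lines. `misaligned_lines` is a hypothetical helper for illustration, and whitespace tokenization is an assumption; the sketch does not reproduce any transform from this commit.

```python
# Sample lines copied from the files added in this commit.
SRC_LINES = [
    "however, according to the logs, she is a hard-working.",
    "however, according to the logs,",
    "she is a hard-working.",
]
FEAT0_LINES = [
    "A A A A B A A A C",
    "A B C D E",
    "C B A B",
]


def misaligned_lines(src_lines, feat_lines):
    """Return 1-based numbers of lines whose token and feature counts differ."""
    if len(src_lines) != len(feat_lines):
        raise ValueError("source and feature files differ in line count")
    bad = []
    for number, (src, feats) in enumerate(zip(src_lines, feat_lines), start=1):
        # Assumption: tokens and feature values are both split on whitespace.
        if len(src.split()) != len(feats.split()):
            bad.append(number)
    return bad


print(misaligned_lines(SRC_LINES, FEAT0_LINES))
```

An empty list means every source token has a feature value on the same line.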

data/data_features/src-val.feat0

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+C B A B

data/data_features/src-val.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+she is a hard-working.

data/data_features/tgt-train.txt

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+however, according to the logs, she is a hard-working.
+however, according to the logs,
+she is a hard-working.

data/data_features/tgt-val.txt

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+she is a hard-working.

data/features_data.yaml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+# Corpus opts:
+data:
+    corpus_1:
+        path_src: data/data_features/src-train.txt
+        path_tgt: data/data_features/tgt-train.txt
+        src_feats:
+            feat0: data/data_features/src-train.feat0
+        transforms: [filterfeats, inferfeats]
+    valid:
+        path_src: data/data_features/src-val.txt
+        path_tgt: data/data_features/tgt-val.txt
