
Commit c795a1e

achoum authored and copybara-github committed
Internal change
PiperOrigin-RevId: 406833837
1 parent 3b85dbe commit c795a1e

File tree

22 files changed: +267 −130 lines

CHANGELOG.md

Lines changed: 5 additions & 3 deletions

@@ -1,6 +1,6 @@
 # Changelog

-## 0.2.0 - ????
+## 0.2.0 - 2021-10-29

 ### Features

@@ -11,8 +11,10 @@
 - Add support for permutation variable importance in the GBT learner with the
   `compute_permutation_variable_importance` parameter.
 - Support for tf.int8 and tf.int16 values.
-- Support for distributed gradient boosted trees learning using the
-  ParameterServerStrategy distribution strategy.
+- Support for distributed gradient boosted trees learning. Currently, the TF
+  ParameterServerStrategy distribution strategy is only available in
+  monolithic TF-DF builds. The Yggdrasil Decision Forest GRPC distribute
+  strategy can be used instead.
 - Support for training from dataset stored on disk in CSV and RecordIO format
   (instead of creating a tensorflow dataset). This option is currently more
   efficient for distributed training (until the ParameterServerStrategy
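The changelog above mentions permutation variable importance, exposed in the GBT learner through the `compute_permutation_variable_importance` parameter. For readers unfamiliar with the technique, here is a minimal NumPy sketch of the general idea (an illustrative implementation, not TF-DF's actual code): a feature's importance is the drop in a quality metric when that feature's column is randomly shuffled.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Importance of feature j = mean drop in `metric` when column j is shuffled."""
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            rng.shuffle(X_perm[:, j])  # destroy the feature/label relationship
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[j] = np.mean(drops)
    return importances

# Toy check: the label depends only on feature 0, so shuffling it hurts accuracy,
# while shuffling the ignored feature 1 changes nothing.
X = np.random.default_rng(1).uniform(size=(200, 2))
y = (X[:, 0] > 0.5).astype(int)
predict = lambda X: (X[:, 0] > 0.5).astype(int)  # a "perfect" model on feature 0
accuracy = lambda y, p: float(np.mean(y == p))
imp = permutation_importance(predict, X, y, accuracy)
```

In TF-DF itself, the importances are computed by the learner and reported in the model inspector; this sketch only shows the underlying principle.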

WORKSPACE

Lines changed: 6 additions & 0 deletions

@@ -8,6 +8,11 @@ load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
 # absl used by tensorflow.
 http_archive(
     name = "org_tensorflow",
+
+    # sha256 = "4896b49c4088030f62b98264441475c09569ea6e49cfb270e2e1f3ef0f743a2f",
+    # strip_prefix = "tensorflow-2.7.0-rc1",
+    # urls = ["https://github.com/tensorflow/tensorflow/archive/refs/tags/v2.7.0-rc1.zip"],
+
     sha256 = "40d3203ab5f246d83bae328288a24209a2b85794f1b3e2cd0329458d8e7c1985",
     strip_prefix = "tensorflow-2.6.0",
     urls = ["https://github.com/tensorflow/tensorflow/archive/refs/tags/v2.6.0.zip"],
@@ -58,6 +63,7 @@ ydf_load_deps(
         "absl",
         "protobuf",
         "zlib",
+        "farmhash",
     ],
     repo_name = "@ydf",
 )

configure/MANIFEST.in

Lines changed: 1 addition & 0 deletions

@@ -5,3 +5,4 @@ recursive-include * *.so
 recursive-include * *.so.[0-9]
 recursive-include * *.dylib
 recursive-include * *.dll
+recursive-include * grpc_worker_main

configure/setup.py

Lines changed: 2 additions & 2 deletions

@@ -20,15 +20,15 @@
 from setuptools.command.install import install
 from setuptools.dist import Distribution

-_VERSION = "0.1.9"
+_VERSION = "0.2.0"

 with open("README.md", "r", encoding="utf-8") as fh:
   long_description = fh.read()

 REQUIRED_PACKAGES = [
     "numpy",
     "pandas",
-    "tensorflow~=2.6",
+    "tensorflow~=2.6",  # "tensorflow >= 2.7.0rc0, < 2.8'",
     "six",
     "absl_py",
     "wheel",

documentation/distributed_training.md

Lines changed: 48 additions & 10 deletions

@@ -16,15 +16,19 @@

 Distributed training makes it possible to train models quickly on larger
 datasets. Distributed training in TF-DF relies on the TensorFlow
-ParameterServerV2 distribution strategy. Only some of the TF-DF models support
-distributed training.
+ParameterServerV2 distribution strategy or the Yggdrasil Decision Forest GRPC
+distribute strategy. Only some of the TF-DF models support distributed training.

 See the
 [distributed training](https://github.com/google/yggdrasil-decision-forests/documentation/user_manual.md?#distributed-training)
 section in the Yggdrasil Decision Forests user manual for details about the
-available distributed training algorithms. When using distributed training in
-TF-DF, Yggdrasil Decision Forests is effectively running the `TF_DIST distribute
-implementation`.
+available distributed training algorithms. When using distributed training with
+TF Parameter Server in TF-DF, Yggdrasil Decision Forests is effectively running
+the `TF_DIST` distribute implementation.
+
+**Note:** Currently (Oct. 2021), the shared (i.e. != monolithic) OSS build of
+TF-DF does not support TF ParameterServer distribution strategy. Please use the
+Yggdrasil DF GRPC distribute strategy instead.

 ## Dataset

@@ -40,21 +44,20 @@ As of today ( Oct 2021), the following solutions are available for TF-DF:
    solution is the fastest and the one that gives the best results as it is
    currently the only one that guarantees that each example is read only once.
    The downside is that this solution does not support TensorFlow
-   pre-processing.
+   pre-processing. The "Yggdrasil DF GRPC distribute strategy" only support
+   this option for dataset reading.

 2. To use **ParameterServerV2 distributed dataset** with dataset file sharding
    using TF-DF worker index. This solution is the most natural for TF users.

 Currently, using ParameterServerV2 distributed dataset with context or
 tf.data.service are not compatible with TF-DF.

-Note that in all cases, ParameterServerV2 is used to distribute the computation.
-
 ## Examples

 Following are some examples of distributed training.

-### Distribution with Yggdrasil distributed dataset reading
+### Distribution with Yggdrasil distributed dataset reading and TF ParameterServerV2 strategy

 ```python
 import tensorflow_decision_forests as tfdf
@@ -78,7 +81,7 @@ See Yggdrasil Decision Forests
 [supported formats](https://github.com/google/yggdrasil-decision-forests/blob/main/documentation/user_manual.md#dataset-path-and-format)
 for the possible values of `dataset_format`.

-### Distribution with ParameterServerV2 distributed dataset
+### Distribution with ParameterServerV2 distributed dataset and TF ParameterServerV2 strategy

 ```python
 import tensorflow_decision_forests as tfdf
@@ -149,3 +152,38 @@ model.fit(
 print("Trained model")
 model.summary()
 ```
+
+### Distribution with Yggdrasil distributed dataset reading and Yggdrasil DF GRPC distribute strategy
+
+```python
+import tensorflow_decision_forests as tfdf
+import tensorflow as tf
+
+deployment_config = tfdf.keras.core.YggdrasilDeploymentConfig()
+deployment_config.try_resume_training = True
+deployment_config.distribute.implementation_key = "GRPC"
+socket_addresses = deployment_config.distribute.Extensions[
+    tfdf.keras.core.grpc_pb2.grpc].socket_addresses
+
+# Socket addresses of ":grpc_worker_main" running instances.
+socket_addresses.addresses.add(ip="127.0.0.1", port=2001)
+socket_addresses.addresses.add(ip="127.0.0.2", port=2001)
+socket_addresses.addresses.add(ip="127.0.0.3", port=2001)
+socket_addresses.addresses.add(ip="127.0.0.4", port=2001)
+
+model = tfdf.keras.DistributedGradientBoostedTreesModel(
+    advanced_arguments=tfdf.keras.AdvancedArguments(
+        yggdrasil_deployment_config=deployment_config))
+
+model.fit_on_dataset_path(
+    train_path="/path/to/dataset@100000",
+    label_key="label_key",
+    dataset_format="tfrecord+tfe")
+
+print("Trained model")
+model.summary()
+```
+
+See Yggdrasil Decision Forests
+[supported formats](https://github.com/google/yggdrasil-decision-forests/blob/main/documentation/user_manual.md#dataset-path-and-format)
+for the possible values of `dataset_format`.
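The GRPC example in the new documentation assumes a `:grpc_worker_main` binary (built by this commit and shipped via `MANIFEST.in`) is already running on each listed socket address. Assuming the worker binary exposes a `--port` flag (an assumption; check the binary's `--help` output), launching a worker might look like:

```shell
# On each worker machine, one per socket address configured in the trainer.
# The --port flag is an assumed name; verify with `./grpc_worker_main --help`.
./grpc_worker_main --port=2001
```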

documentation/known_issues.md

Lines changed: 1 addition & 0 deletions

@@ -47,6 +47,7 @@ The following table shows the compatibility between

 tensorflow_decision_forests | tensorflow
 --------------------------- | ----------
+0.2.0                       | 2.6
 0.1.9                       | 2.6
 0.1.1 - 0.1.8               | 2.5
 0.1.0                       | 2.4

tensorflow_decision_forests/BUILD

Lines changed: 11 additions & 0 deletions

@@ -31,3 +31,14 @@ config_setting(
     name = "stop_training_on_interrupt",
     values = {"define": "stop_training_on_interrupt=1"},
 )
+
+# If "disable_tf_ps_distribution_strategy" is true, the TF Parameter Server
+# distribution strategy is not available for distributed training.
+#
+# Distribution with TF PS is currently NOT supported for OSS TF-DF with shared
+# build (monolithic build works however) and TF<2.7. In this case, the GRPC
+# Worker Server can be used instead.
+config_setting(
+    name = "disable_tf_ps_distribution_strategy",
+    values = {"define": "tf_ps_distribution_strategy=0"},
+)
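The new `config_setting` keys off a Bazel `--define` value. Based on the `values` attribute above, a build that disables the TF Parameter Server distribution strategy (e.g. for a shared OSS build) would plausibly be invoked like this; the target pattern is illustrative, not taken from this commit:

```shell
# Disable the TF PS distribution strategy; the GRPC worker can be used instead.
bazel build //tensorflow_decision_forests/... --define=tf_ps_distribution_strategy=0
```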

tensorflow_decision_forests/__init__.py

Lines changed: 1 addition & 1 deletion

@@ -45,7 +45,7 @@

 """

-__version__ = "0.1.9"
+__version__ = "0.2.0"
 __author__ = "Mathieu Guillame-Bert"

 from tensorflow_decision_forests import keras

tensorflow_decision_forests/keras/BUILD

Lines changed: 16 additions & 4 deletions

@@ -75,6 +75,7 @@ py_library(
         "@ydf//yggdrasil_decision_forests/dataset:data_spec_py_proto",
         "@ydf//yggdrasil_decision_forests/learner:abstract_learner_py_proto",
         "@ydf//yggdrasil_decision_forests/model:abstract_model_py_proto",
+        "@ydf//yggdrasil_decision_forests/utils/distribute/implementations/grpc:grpc_py_proto",
     ],
 )

@@ -112,13 +113,15 @@

 # This test relies on the support of TF PS distribution strategy and TF-DF.
 # Note: TF PS distribution strategy and TF-DF are currently not compatible in non-monolithic build of TensorFlow+TFDF (e.g. OSS TFDF).
+#
+# This test is expected to fail TF PS distributed training is disabled (i.e.
+# enabling the ":disable_tf_ps_distribution_strategy" rule).
 py_test(
     name = "keras_distributed_test",
     size = "large",
     srcs = ["keras_distributed_test.py"],
     data = [
-        ":synthetic_dataset",
-        ":test_runner",
+        ":grpc_worker_main",
         "@ydf//yggdrasil_decision_forests/test_data",
     ],
     python_version = "PY3",
@@ -132,10 +135,10 @@ py_test(
     # absl/testing:parameterized dep,
     # numpy dep,
     # pandas dep,
-    "//third_party/py/portpicker",
+    # portpicker dep,
     "@org_tensorflow//tensorflow/python",
     "@org_tensorflow//tensorflow/python/distribute:distribute_lib",
-    "//third_party/tensorflow_decision_forests",
+    "//tensorflow_decision_forests",
     ],
 )

@@ -164,3 +167,12 @@ tf_cc_binary(
         "@ydf//yggdrasil_decision_forests/cli/utils:synthetic_dataset_lib_with_main",
     ],
 )
+
+tf_cc_binary(
+    name = "grpc_worker_main",
+    deps = [
+        "@org_tensorflow//tensorflow/core:framework",
+        "@org_tensorflow//tensorflow/core:lib",
+        "@ydf//yggdrasil_decision_forests/utils/distribute/implementations/grpc:grpc_worker_lib_with_main",
+    ],
+)

tensorflow_decision_forests/keras/core.py

Lines changed: 1 addition & 0 deletions

@@ -63,6 +63,7 @@
 from yggdrasil_decision_forests.dataset import data_spec_pb2
 from yggdrasil_decision_forests.learner import abstract_learner_pb2
 from yggdrasil_decision_forests.model import abstract_model_pb2  # pylint: disable=unused-import
+from yggdrasil_decision_forests.utils.distribute.implementations.grpc import grpc_pb2  # pylint: disable=unused-import

 layers = tf.keras.layers
 models = tf.keras.models
