It is also recommended to use the HPU-optimized version of transformers:

```python
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

# Patch transformers classes with their Gaudi-optimized counterparts
adapt_transformers_to_gaudi()
```

## Bucketing

The Multipack sampler produces a wide range of batches with different sample lengths and numbers of samples. Each such combination triggers a graph recompilation, and recompilation takes time and slows down training. To reduce the number of recompilations, the HPU implementation uses a bucketing approach: the maximum sample length in the batch is aligned to a predefined value. It is similar to padding, except that all samples in the batch are padded not to the longest sample but to a slightly larger value.

To compute the bucket size, we use the following algorithm:
- First, we find the MSB of the longest sample in the batch; call it S.
- Then we slice the range [2 ** S, 2 ** (S+1)] into 16 buckets of the same size.
- Finally, we use the top boundary of the smallest suitable bucket as the padding value.

This approach limits the bucketing overhead to at most 1/16th of the longest sample and significantly reduces the number of recompilations.
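
A minimal Python sketch of this computation, assuming a hypothetical helper `bucket_size` (illustrative only, not the actual implementation):

```python
def bucket_size(max_len: int, num_buckets: int = 16) -> int:
    """Round max_len up to the top boundary of its bucket."""
    s = max_len.bit_length() - 1            # S = index of the MSB of max_len
    step = max(2**s // num_buckets, 1)      # bucket width within [2**S, 2**(S+1)]
    # Top boundary of the smallest bucket that fits max_len:
    return ((max_len + step - 1) // step) * step

# A longest sample of 1000 tokens lies in [512, 1024); buckets are 32 tokens
# wide, so the whole batch is padded to 1024 rather than to 1000.
assert bucket_size(1000) == 1024
assert bucket_size(513) == 544
```

Since a bucket is at most `2 ** S / 16` wide and the longest sample is at least `2 ** S` tokens, the extra padding never exceeds 1/16th of that sample's length.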
## How to run
To run training, make the following changes to the config file:
```yaml
train:
  device: hpu
  distributed_backend: fsdp
  fsdp_cpu_offload_optimizer: false
  is_padding_free: true
  pipeline: accelerated
  disable_flash_attn: true
```
And use this command line:
```bash
ilab --config=./config.yaml model train --pipeline accelerated --data-path ./data.jsonl
```