
Commit 3114f2f

[llama3-8B] add flex_attention model flavor

ghstack-source-id: 2e7013d
Pull Request resolved: #1884
1 parent 3d3726c

File tree: 1 file changed, +11 −0 lines

torchtitan/models/llama3/__init__.py

Lines changed: 11 additions & 0 deletions
@@ -48,6 +48,17 @@
         multiple_of=1024,
         rope_theta=500000,
     ),
+    "8B_flex_attn": TransformerModelArgs(
+        dim=4096,
+        n_layers=32,
+        n_heads=32,
+        n_kv_heads=8,
+        ffn_dim_multiplier=1.3,
+        multiple_of=1024,
+        rope_theta=500000,
+        use_flex_attn=True,
+        attn_mask_type="block_causal",
+    ),
     "70B": TransformerModelArgs(
         dim=8192,
         n_layers=80,
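The new "8B_flex_attn" flavor is the standard Llama 3 8B configuration with FlexAttention enabled under a block-causal mask. A minimal self-contained sketch of how such a flavor registry entry fits together is below; the `TransformerModelArgs` dataclass here is a hypothetical stand-in whose fields mirror the diff, not torchtitan's actual class definition, and the default values are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical stand-in for torchtitan's TransformerModelArgs; field names
# mirror the diff above, but defaults and types are assumptions.
@dataclass
class TransformerModelArgs:
    dim: int = 4096
    n_layers: int = 32
    n_heads: int = 32
    n_kv_heads: Optional[int] = None
    ffn_dim_multiplier: Optional[float] = None
    multiple_of: int = 256
    rope_theta: float = 10000.0
    use_flex_attn: bool = False
    attn_mask_type: str = "causal"


# Flavor registry in the style of torchtitan/models/llama3/__init__.py:
# "8B_flex_attn" reuses the 8B hyperparameters and additionally turns on
# FlexAttention with a block-causal attention mask.
llama3_configs = {
    "8B_flex_attn": TransformerModelArgs(
        dim=4096,
        n_layers=32,
        n_heads=32,
        n_kv_heads=8,
        ffn_dim_multiplier=1.3,
        multiple_of=1024,
        rope_theta=500000,
        use_flex_attn=True,
        attn_mask_type="block_causal",
    ),
}

args = llama3_configs["8B_flex_attn"]
print(args.use_flex_attn)  # True
```

Registering the FlexAttention variant as a separate flavor, rather than a flag on "8B", lets training configs select it by name without touching the base 8B entry.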
