
Commit f8eb594

Add comments and description in the MegatronConfig(TypedDict)

Signed-off-by: Kate Cheng <[email protected]>
1 parent 0063601

File tree

4 files changed (+7, -3 lines)


examples/configs/grpo_math_1B.yaml

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ policy:
     moe_permute_fusion: false
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
-    # gives training perf speedup
+    # gives ~25% training perf speedup with sequence packing and apply_rope_fusion
     bias_activation_fusion: True
     defer_fp32_logits: null

examples/configs/sft.yaml

Lines changed: 1 addition & 1 deletion
@@ -90,7 +90,7 @@ policy:
     moe_permute_fusion: false
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
-    # gives training perf speedup
+    # gives ~25% training perf speedup with sequence packing and apply_rope_fusion
     bias_activation_fusion: True

   optimizer:

examples/configs/sft_openmathinstruct2_megatron.yaml

Lines changed: 1 addition & 1 deletion
@@ -85,7 +85,7 @@ policy:
     moe_permute_fusion: false
     #gives ~20% training perf speedup with sequence packing
     apply_rope_fusion: True
-    # gives training perf speedup
+    # gives ~25% training perf speedup with sequence packing and apply_rope_fusion
     bias_activation_fusion: True

 env_vars:
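All three example configs receive the same one-line comment fix. As a minimal sketch of reading these flags back, the snippet below uses PyYAML and assumes the keys sit under policy.megatron_cfg; neither the loader nor the exact nesting is visible in these hunks, so both are assumptions for illustration.

import yaml  # PyYAML is an assumption, not necessarily the repo's own loader

# Load one of the example configs touched by this commit.
with open("examples/configs/grpo_math_1B.yaml") as f:
    cfg = yaml.safe_load(f)

# The policy.megatron_cfg nesting is an assumption for this sketch.
megatron = cfg["policy"]["megatron_cfg"]
print(megatron["apply_rope_fusion"])       # True in these examples
print(megatron["bias_activation_fusion"])  # True in these examples

Both flags default to True in these examples, so the updated comments mainly document why they should stay enabled when sequence packing is used.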

nemo_rl/models/policy/__init__.py

Lines changed: 4 additions & 0 deletions
@@ -99,6 +99,10 @@ class MegatronConfig(TypedDict):
     expert_tensor_parallel_size: int
     expert_model_parallel_size: int
     defer_fp32_logits: NotRequired[bool]
+    # gives ~20% training perf speedup with sequence packing
+    apply_rope_fusion: bool
+    # gives ~25% training perf speedup with sequence packing and apply_rope_fusion
+    bias_activation_fusion: bool

     optimizer: NotRequired[MegatronOptimizerConfig]
     scheduler: NotRequired[MegatronSchedulerConfig]
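For context, here is a self-contained sketch of how the touched region of MegatronConfig reads after this commit. The optimizer and scheduler config classes are stubbed out for the sketch, and the real class in nemo_rl/models/policy/__init__.py carries additional fields not shown here.

from typing_extensions import NotRequired, TypedDict


class MegatronOptimizerConfig(TypedDict):
    # stub for this sketch; the real class holds the optimizer settings
    pass


class MegatronSchedulerConfig(TypedDict):
    # stub for this sketch; the real class holds the scheduler settings
    pass


class MegatronConfig(TypedDict):
    expert_tensor_parallel_size: int
    expert_model_parallel_size: int
    defer_fp32_logits: NotRequired[bool]
    # gives ~20% training perf speedup with sequence packing
    apply_rope_fusion: bool
    # gives ~25% training perf speedup with sequence packing and apply_rope_fusion
    bias_activation_fusion: bool

    optimizer: NotRequired[MegatronOptimizerConfig]
    scheduler: NotRequired[MegatronSchedulerConfig]

Because apply_rope_fusion and bias_activation_fusion are declared as required keys (not NotRequired), a type checker will flag configs that omit them, and the new comments surface the performance rationale right where editors show the field definitions.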
