Conversation

yaoyu-33 (Contributor) commented Oct 1, 2025

No description provided.

copy-pr-bot (bot) commented Oct 1, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Comment on lines +172 to +188
```python
# Handle Hugging Face GenerationConfig / PretrainedConfig by converting to a callable dict
# compatible with our YAML representer logic
try:
    from transformers import GenerationConfig, PretrainedConfig  # type: ignore

    if isinstance(val_to_convert, (GenerationConfig, PretrainedConfig)):
        cfg_class = val_to_convert.__class__
        target = f"{inspect.getmodule(cfg_class).__name__}.{cfg_class.__qualname__}.from_dict"
        logger.debug(f"Converting {cfg_class.__qualname__} at {current_path} to callable dict")
        return {
            "_target_": target,
            "_call_": True,
            "config_dict": val_to_convert.to_dict(),
        }
except ModuleNotFoundError:
    # transformers is optional; if unavailable, fall through to other handlers
    pass
```
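
For context, a minimal sketch of the round trip this encoding enables, assuming a resolver that honors `_target_`/`_call_`; the module path shown is what `inspect.getmodule` would produce for `GenerationConfig`:

```python
# Hedged sketch: encode an HF GenerationConfig as a callable dict, then rebuild it.
from transformers import GenerationConfig

gen_cfg = GenerationConfig(max_new_tokens=64, temperature=0.7)
encoded = {
    "_target_": "transformers.generation.configuration_utils.GenerationConfig.from_dict",
    "_call_": True,
    "config_dict": gen_cfg.to_dict(),
}
# A resolver that imports `_target_` and calls it would effectively do:
rebuilt = GenerationConfig.from_dict(encoded["config_dict"])
assert rebuilt.max_new_tokens == 64
```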
Contributor:

This should be handled here:

```python
# Try to add GenerationConfig representer if available
try:
    from transformers import GenerationConfig

    yaml.SafeDumper.add_representer(GenerationConfig, _generation_config_representer)
except ModuleNotFoundError:
    pass

# Try to add PretrainedConfig representer if available (generic for HF configs)
try:
    from transformers import PretrainedConfig

    # Use multi-representer so subclasses of PretrainedConfig are also handled
    yaml.SafeDumper.add_multi_representer(PretrainedConfig, _pretrained_config_representer)
except ModuleNotFoundError:
    pass
```

Were you seeing serialization issues outside of this?
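
The representer functions referenced above are not shown in this thread; a minimal sketch of what such a PyYAML representer could look like, assuming it serializes the config via `to_dict()`:

```python
import yaml

def _pretrained_config_representer(dumper: yaml.Dumper, data) -> yaml.Node:
    # Hypothetical body: emit the HF config object as a plain YAML mapping.
    return dumper.represent_dict(data.to_dict())
```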

Author (yaoyu-33):

It's because we are using transformers' model config inside our config; that's what is causing the issues here.

Author (yaoyu-33):

I don't think the YAML representer would work here; this path isn't for saving, it's just for OmegaConf creation.
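
A minimal illustration of that failure mode, assuming standard OmegaConf behavior (it rejects arbitrary objects inside containers):

```python
from omegaconf import OmegaConf
from transformers import GenerationConfig

raw = {"generation_config": GenerationConfig(max_new_tokens=8)}
# OmegaConf.create() accepts primitives, dicts/lists, and dataclasses, so the
# raw HF object would raise omegaconf.errors.UnsupportedValueType:
# OmegaConf.create(raw)
# After the callable-dict conversion the structure is plain data and works:
ok = OmegaConf.create({
    "generation_config": {
        "_target_": "transformers.generation.configuration_utils.GenerationConfig.from_dict",
        "_call_": True,
        "config_dict": raw["generation_config"].to_dict(),
    }
})
```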

@@ -0,0 +1,233 @@
# Copyright (c) 2025, NVIDIA CORPORATION. All rights reserved.
Contributor:

Should we offer higher-level functions like pretrain_gpt / finetune_gpt / pretrain_qwen25_vl / finetune_qwen25_vl that automatically select the right step for users? Or how else should we associate the custom forward step function with the recipe? Otherwise it might not be obvious to users that an implementation exists.

Previously this was bound to the model config, but we want to keep the model providers independent of the training side.

Author (yaoyu-33):

Good point. What do you think about having the recipe return both the cfg and the step function?

Contributor:

Returning the config and the step function would be a breaking change for existing users.
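
One non-breaking alternative suggested by this thread: keep recipes returning only the config and expose the step function through a separate lookup. A sketch, with all names hypothetical rather than the repo's actual API:

```python
from typing import Callable, Dict, Optional

# Hypothetical registry associating recipe names with custom forward steps.
_FORWARD_STEPS: Dict[str, Callable] = {}

def register_forward_step(recipe_name: str) -> Callable:
    def decorator(fn: Callable) -> Callable:
        _FORWARD_STEPS[recipe_name] = fn
        return fn
    return decorator

def get_forward_step(recipe_name: str) -> Optional[Callable]:
    # Returns None for recipes using the default step, so existing callers
    # that never ask for a custom step are unaffected.
    return _FORWARD_STEPS.get(recipe_name)

@register_forward_step("pretrain_qwen25_vl")
def qwen25_vl_forward_step(state, data_iterator, model):
    ...
```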

Comment on lines 25 to 28
```python
from megatron.bridge.training.gpt_step import (
    _create_loss_function,
    get_packed_seq_params,
)
```
Contributor:

We can split these functions out of gpt_step to indicate they are more generic.

Author (yaoyu-33):

I feel gpt_step is the general one lol.

Author (yaoyu-33):

Moved them out to utils.
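
A sketch of how such a move can stay non-breaking, with the utils module path assumed for illustration:

```python
# megatron/bridge/training/gpt_step.py (hypothetical post-move shim):
# thin re-exports keep existing `from ...gpt_step import ...` lines working
# after the functions live in the shared utils module.
from megatron.bridge.training.utils import (  # noqa: F401
    _create_loss_function,
    get_packed_seq_params,
)
```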

```python
def forward_step(
    state: GlobalState, data_iterator: Iterable, model: GPTModel, return_schedule_plan: bool = False
```
Contributor:

Is it still a GPTModel?
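
If the answer is no, one option is to widen the annotation to Megatron's common module base class. A sketch; the `MegatronModule` import path is Megatron-Core's, while the `GlobalState` path is assumed:

```python
from typing import Iterable

from megatron.bridge.training.state import GlobalState  # path assumed
from megatron.core.transformer.module import MegatronModule

def forward_step(
    state: GlobalState,
    data_iterator: Iterable,
    model: MegatronModule,  # was GPTModel; also covers VL model variants
    return_schedule_plan: bool = False,
):
    ...
```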
