[Qwen3 MoE] Add initial implementation #1674

Conversation
Add Qwen3 MoE experiment with model args, architecture, and train spec registration.
Hi @SonicSaurav! Thank you for your pull request and welcome to our community.

Action Required: In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process: In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with the CLA signed label.

If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
- Fix config validation by removing invalid TOML fields
- Add proper model registration in experiments/__init__.py
- Fix missing model attributes: qk_norm, initializer_range, hidden_dim, pad_token_id, stage_idx, num_stages, head_dim
- Correct TransformerBlock forward signature to accept attention_mask and position_ids (see the sketch below)
- Add RoPE cache initialization and proper forwarding to decoder layers
- Implement required init_weights methods for model initialization
- Fix parallelize function signature and implementation for TorchTitan compatibility
- Correct attribute naming (hidden_size → dim, num_hidden_layers → n_layers)
- Update model configuration with accurate HuggingFace Qwen3-30B-A3B parameters
- Enable activation checkpointing and optimize memory usage settings
- Successfully tested: model builds, initializes, and trains without errors

The Qwen3 MoE model now fully integrates with the TorchTitan framework and trains successfully.
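To make the forward-signature fix concrete, here is a minimal sketch of a block whose forward matches the decoder-layer call site shown in this PR. The attribute names (attention_norm, ffn_norm, attn, mlp) and the use of LayerNorm and nn.MultiheadAttention are illustrative assumptions, not the exact code in this PR:

```python
from typing import Optional

import torch
import torch.nn as nn


class TransformerBlock(nn.Module):
    """Illustrative block whose forward matches the call site in model.py:
    decoder_layer(hidden_states, rope_cache, attention_mask=..., position_ids=...)."""

    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.attention_norm = nn.LayerNorm(dim)  # RMSNorm in the real model
        self.ffn_norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim), nn.SiLU(), nn.Linear(hidden_dim, dim)
        )

    def forward(
        self,
        hidden_states: torch.Tensor,
        rope_cache: torch.Tensor,
        attention_mask: Optional[torch.Tensor] = None,
        position_ids: Optional[torch.LongTensor] = None,
    ) -> torch.Tensor:
        # rope_cache and position_ids would be consumed by rotary embeddings
        # inside attention; the RoPE application is omitted to keep the sketch short.
        h = self.attention_norm(hidden_states)
        attn_out, _ = self.attn(h, h, h, attn_mask=attention_mask, need_weights=False)
        hidden_states = hidden_states + attn_out
        return hidden_states + self.mlp(self.ffn_norm(hidden_states))
```

A quick smoke test under these assumptions: `block = TransformerBlock(64, 256); out = block(torch.randn(2, 16, 64), torch.empty(0))` returns a `(2, 16, 64)` tensor, so the outer model can call every layer with one uniform signature.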
Hi @SonicSaurav, thanks for contributing! From my reading, the model in this PR is a dense model and the MoE part has not been added yet. Also, I would suggest reusing the experiments/qwen model, since they share common parts, rather than starting a new folder under experiments for the same model.
In torchtitan/experiments/qwen3_moe/model/model.py:

```python
        # decoder layers
        for decoder_layer in self.layers.values():
            hidden_states = decoder_layer(
                hidden_states,
                self.rope_cache,
                attention_mask=attention_mask,
                position_ids=position_ids,
            )

        hidden_states = (
            self.norm(hidden_states) if self.norm is not None else hidden_states
        )
        return hidden_states


class QwenForCausalLM(torch.nn.Module):
```
This name is adopted from `transformers`?
r""" | ||
Example: | ||
|
||
```python |
Remove those `transformers`-related comments.
I will do all of this. Also, could you please implement the MoE part in case I made some mistakes? I am still in the learning phase right now but need this urgently, so any help would be really appreciated.
Hey @SonicSaurav, I made it in #1685. Please take a look, thank you!
Can you please fix and merge it? I am new and just trying to do something useful with the Qwen models.
Maybe we should put this under the same folder `qwen3` and reuse as much as possible. cc @wwwjn
We may want to add a2a in the selective AC policy with this PR, similar to #1672, since the save lists are now model-specific.
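For readers unfamiliar with the a2a suggestion, here is a rough sketch, assuming a distributed-enabled PyTorch build with functional collectives. The variable name `_op_sac_save_list` mirrors the save-list idea used by torchtitan's selective activation checkpointing, but its exact name, location, and contents are assumptions here:

```python
import torch

# Hypothetical, model-specific save list for selective activation checkpointing.
# Saving the all-to-all output avoids recomputing the MoE token-dispatch
# collective during the backward pass. Requires a distributed-enabled PyTorch build.
_op_sac_save_list = {
    torch.ops.aten.mm.default,                             # matmul outputs
    torch.ops._c10d_functional.all_to_all_single.default,  # a2a used by MoE routing
}
```

The selective-AC policy would then consult this set to decide which op outputs are saved rather than recomputed, which is why a per-model list makes it natural to include a2a only for MoE variants.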
Closing this PR because of #1685 to avoid confusion. Please feel free to re-open it if needed :)