
Conversation


yiming0416 (Contributor) commented Nov 5, 2025

Currently we hardcode the "tp" dimension when parallelizing inputs (converting plain tensors to DTensors). This won't work when there is no TP in our parallelisms (e.g., SimpleFSDP only, or SimpleFSDP + EP).

The fundamental reason is that SimpleFSDP currently only accepts plain tensor inputs. Making it accept DTensor inputs would require an eager rewrite of SimpleFSDP's frontend.

As a workaround, we only DTensorize the inputs when there is a "tp" dimension in the world_mesh. A minimal sketch of that guard is shown below.
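
The sketch below illustrates the guard under stated assumptions: the function name `maybe_dtensorize_inputs` and the replicated placement are illustrative, not the PR's actual code; only the "check `mesh_dim_names` for "tp" before converting" logic comes from the description above.

```python
# Illustrative sketch only -- function name, placement choice, and input
# handling are assumptions, not this PR's actual implementation.
import torch
from torch.distributed.tensor import DTensor, Replicate

def maybe_dtensorize_inputs(inputs, world_mesh):
    # Skip DTensorization entirely when the mesh has no "tp" dimension
    # (e.g., SimpleFSDP only, or SimpleFSDP + EP), since SimpleFSDP's
    # frontend only accepts plain tensor inputs.
    dim_names = world_mesh.mesh_dim_names
    if dim_names is None or "tp" not in dim_names:
        return inputs
    # With TP present, convert plain tensors to DTensors on the "tp"
    # submesh; non-tensor inputs pass through unchanged.
    tp_mesh = world_mesh["tp"]
    return tuple(
        DTensor.from_local(x, tp_mesh, [Replicate()])
        if isinstance(x, torch.Tensor) else x
        for x in inputs
    )
```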

SimpleFSDP Only on llama3

NGPU=8 CONFIG_FILE=./torchtitan/models/llama3/train_configs/debug_model.toml ./run_train.sh --model.name compiler_toolkit.llama3 --parallelism.data_parallel_shard_degree=8

SimpleFSDP + EP on dsv3

NGPU=4 CONFIG_FILE=./torchtitan/models/deepseek_v3/train_configs/debug_model.toml ./run_train.sh --model.name compiler_toolkit.deepseek_v3 --parallelism.data_parallel_shard_degree=4 --parallelism.expert_parallel_degree=2 --activation_checkpoint.mode none

meta-cla bot added the CLA Signed label Nov 5, 2025
yiming0416 marked this pull request as ready for review November 5, 2025 22:41
yiming0416 force-pushed the yiming/compiler_toolkit_without_tp branch from 6e062f2 to 2380ed3 on November 6, 2025 17:18