Use new DeviceMesh unflatten to rewrite parallel_dims #1660
base: main
Conversation
Force-pushed from 12eca61 to 19e4a23
return mesh
if self._meshes[dim].size() == 1:
    return None
Not sure if this will break user expectations. We've received asks that DTensor redistribute running on a mesh of size 1 should be a no-op.
But even in current TorchTitan, we won't create any DeviceMesh if the parallelism degree is 1. So it is unclear to me how a DeviceMesh with size 1 would exist.
Not in torchtitan, in internal code.
PyTorch? Then it is okay, right? DeviceMesh still supports the case, but TorchTitan makes a stronger assumption in our use case.
six
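Returning to the size-1 discussion above: a minimal sketch of the guard under debate (not the actual torchtitan code; the method and attribute names are assumed from the quoted diff).

# Hypothetical sketch of a get_mesh with the size-1 guard from the diff above.
# Returning None for degree-1 dims forces callers to special-case them; the
# alternative is to return the size-1 mesh and rely on DTensor treating
# redistribute on it as a no-op, which is the expectation mentioned earlier.
def get_mesh(self, dim: str):
    mesh = self._meshes[dim]
    if mesh.size() == 1:
        return None
    return mesh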
Force-pushed from 19e4a23 to 178bc11
fsdp = self.dp_shard * self.cp
efsdp = fsdp * self.tp // (self.etp * self.ep)
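A quick worked example of the formula above, with hypothetical parallelism degrees (not taken from this PR):

# Hypothetical degrees, only to illustrate the efsdp formula quoted above.
dp_shard, cp, tp, etp, ep = 4, 2, 4, 2, 4
fsdp = dp_shard * cp              # 4 * 2 = 8
efsdp = fsdp * tp // (etp * ep)   # 8 * 4 // (2 * 4) = 4
assert (fsdp, efsdp) == (8, 4)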
self._world_mesh = init_device_mesh(
Does this initialize a world PG?
It may be fine to just ignore this for now in torchtitan, but I am wondering: if users want control over world group creation, what would that look like?
cc @fduwjj, are we able to disable the global PG initialization?
I think so. Right now we don't use split, so we can make it a fake PG. But if split is needed, then we need to materialize the world PG anyway.
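For context, a minimal sketch of what running on a fake PG could look like, a pattern commonly used for single-process dry runs. It assumes the "fake" backend and FakeStore registered by torch.testing._internal.distributed.fake_pg, which is a private testing utility.

# Sketch: initialize a fake process group so DeviceMesh/DTensor code paths can run
# in a single process without real collectives. The world_size below is hypothetical.
import torch.distributed as dist
from torch.testing._internal.distributed.fake_pg import FakeStore

world_size = 8
dist.init_process_group("fake", rank=0, world_size=world_size, store=FakeStore())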
We should modify FLUX train.py as it's in core now.
@ruisizhang123 let's adapt SimpleFSDP after this PR is merged.
Oh, it seems this is being fixed in #1959.
self._world_mesh = init_device_mesh(
    device_type, (self.world_size,), mesh_dim_names=("world",)
)
dataloading_mesh = unflatten_mesh(
Curious what will happen if self.pp * batch * self.cp * self.tp != world_size. Will _unflatten() fail?
Yes, it will fail
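A small sketch of the constraint implied by the answer above. The helper name is hypothetical; it only mirrors the check that unflattening enforces, namely that the target dim sizes must multiply out to the size of the dim being unflattened (here, world_size).

import math

# Hypothetical validation mirroring the failure mode discussed above: unflattening the
# 1-D world mesh into (pp, batch, cp, tp) only works if the sizes multiply to world_size.
def check_unflatten_sizes(world_size: int, dim_sizes: dict[str, int]) -> None:
    product = math.prod(dim_sizes.values())
    if product != world_size:
        raise ValueError(
            f"cannot unflatten mesh of size {world_size} into {dim_sizes}: "
            f"product is {product}"
        )

check_unflatten_sizes(8, {"pp": 2, "batch": 2, "cp": 1, "tp": 2})    # ok: 2*2*1*2 == 8
# check_unflatten_sizes(8, {"pp": 2, "batch": 2, "cp": 2, "tp": 2})  # raises: product is 16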
Force-pushed from 20910ef to a67e87a
Force-pushed from 99c46dc to d6eae58
  # reductions are performed during backward.
  routed_input = DTensor.from_local(
-     routed_input, device_mesh["tp"], (Replicate(),)
+     routed_input, device_mesh["etp"], (Replicate(),)
This should be "tp".
# IF PP is also used, this seed is unique per PP rank.
if duplicate_seed_mesh and duplicate_seed_mesh.get_coordinate() is not None:
    torch.distributed.tensor._random.manual_seed(seed, duplicate_seed_mesh)
# TODO: remove the need of duplicate_seed_meshes once torch.distributed.tensor._random.manual_seed
Is this on your list? @wconstab
I'm not quite sure what 'duplicate_seed_meshes' is supposed to mean. Anyway, the plan is to change manual_seed from taking a mesh to taking a device. The device (id) is the only thing the DTensor OffsetBasedRNGTracker needs from the mesh; it doesn't use the mesh for other purposes. It would look worse, but it would be functionally fine to just pass 'world_mesh' to manual_seed in all cases, since all meshes on one process share the same device. As for making the change, I'm happy to do it, but @fegin said he was going to do it, so I did not.
Thinking back on this, probably what happened is that I wrote that duplicate_seed_mesh code because I didn't understand the manual_seed API at the time, and I assumed that it was important to pass the correct mesh into each API call. I think we just need to keep track of the distinct_mesh_dims part and can simplify this code now.
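A sketch of the simplification suggested above, i.e. seeding once with the world mesh since the RNG tracker only needs the device. This is a hypothetical illustration, not what the PR currently does, and torch.distributed.tensor._random is a private API.

import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.tensor import _random as dtensor_random

# Assumes the process group is already initialized and there is one GPU per rank.
world_size = dist.get_world_size()
world_mesh = init_device_mesh("cuda", (world_size,), mesh_dim_names=("world",))

seed = 1234  # hypothetical base seed
# Per the comment above, passing the world mesh is functionally equivalent to passing
# any sub-mesh, because all meshes on this process share the same device.
dtensor_random.manual_seed(seed, world_mesh)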
  assert hasattr(transformer_block, "moe")
  if (
-     dp_mod_ep_mesh.size() * parallel_dims.ep
+     edp_mesh.size() * parallel_dims.ep
The logic seems wrong even before this PR.
Here it should be efsdp_mesh, because we only do sharding on efsdp, not dp_replicate.
Similar for the FSDP2 application in llama4.
    dp_mesh_dim_names = ["dp_replicate", "efsdp"]
else:
    dp_mesh_dim_names = ["efsdp"]
edp_mesh = parallel_dims.get_mesh(dp_mesh_dim_names)
edp sounds like the right name here.
| "dp_replicate_fsdp": hsdp_mesh, | ||
| "dp_replicate_efsdp": ehsdp_mesh, | ||
| "ep_etp": ep_etp_mesh, |
It seems only "ep_etp" is used explicitly; the other two are not. I think we should be consistent -- e.g. we can disallow 2D slicing using get_mesh. Every time we use "dp_replicate_fsdp", we ask the user to send in "hsdp" instead of ["dp_replicate", "fsdp"]. This aligns with the requirement of predefining everything in parallel_dims.py.
My guess at the motivation for erroring out if it's not a pre-created n-D sliced mesh is:
- We didn't keep references to all the global meshes (e.g. dense_mesh, sparse_mesh) which we are going to slice submeshes from, so without concatenate we don't know where to slice from. I think this is workaroundable by keeping references to all global meshes.
- I had the comment that concatenate is too powerful.
I think a concern with the current approach is that users may not be able to extend to other fancy nD submesh use cases without modifying parallel_dims.py, which they probably don't need to do in most cases. If they are developing outside torchtitan, they may use concatenate anyway.
In short, I think:
- If we are going with the pre-defining-everything approach, we can ban get_mesh with list inputs.
- If we are going with the flexible approach, we need to keep references to global meshes to look up from.
Either is fine with me.
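A rough sketch of the two options above; all names and the dict-based structure are hypothetical, not the actual ParallelDims API.

from torch.distributed.device_mesh import DeviceMesh

# Option 1: pre-defined names only. Every composite mesh must be registered up front in
# parallel_dims.py, and get_mesh rejects anything else (no list inputs).
PREDEFINED = {"hsdp", "ehsdp", "ep_etp"}

def get_mesh_strict(meshes: dict[str, DeviceMesh], name: str) -> DeviceMesh:
    if name not in PREDEFINED:
        raise ValueError(f"unknown mesh {name!r}; predefine it in parallel_dims.py")
    return meshes[name]

# Option 2: flexible slicing. Keep references to the global meshes and slice arbitrary
# dim-name combinations on demand, e.g. ["dp_replicate", "fsdp"] instead of "hsdp".
def get_mesh_flexible(global_meshes: dict[str, DeviceMesh], dims: list[str]) -> DeviceMesh:
    for root in global_meshes.values():
        if root.mesh_dim_names and all(d in root.mesh_dim_names for d in dims):
            return root[tuple(dims)]
    raise ValueError(f"no global mesh contains dims {dims}")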
# the mesh dim names of which the MoE params are sharded on via FSDP/HSDP
dp_mod_ep_mesh_dim_names = []
dp_mod_ep_mesh = None
We should change the name of this parameter to edp?
Also, the code here seems wrong (https://github.com/pytorch/torchtitan/pull/1660/files#diff-2656e3a28f6b9141967a7e6ce9552879b330db03043333302081e9b8800a6a75R329), as it didn't consider dp_replicate in edp.
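A sketch of what the suggested fix could look like, mirroring the dp_mesh_dim_names pattern quoted earlier in this conversation; the property and method names are assumed from this PR, not confirmed.

# Hypothetical: build the expert-data-parallel (edp) mesh so that dp_replicate is
# included when replication is enabled, instead of using efsdp alone.
if parallel_dims.dp_replicate_enabled:
    edp_mesh_dim_names = ["dp_replicate", "efsdp"]
else:
    edp_mesh_dim_names = ["efsdp"]
edp_mesh = parallel_dims.get_mesh(edp_mesh_dim_names)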
    if parallel_dims.ep_enabled
    else None
),
dp_mod_ep_mesh=dp_mod_ep_mesh,
Also here, and in all other occurrences.
Summary
This PR utilizes the latest APIs provided by DeviceMesh to simplify the creation of all the different meshes.
The design philosophy is as follows: