
Conversation

fegin (Contributor) commented on Sep 12, 2025:

Similar to #1696, but this PR uses parallelize_module, similar to TP/SP.

This PR also requires pytorch/pytorch#162542.

device_mesh=world_mesh["cp"],
parallelize_plan=_ContextParallel(
    seq_dim=2,
    attention_type=_ContextParallel.AttentionType.FLEX,
),
A reviewer (Contributor) commented on this hunk:
Does this only work for FlexAttention?
Is there a plan to consolidate SDPA and FlexAttention in terms of how CP is applied?

fegin (Contributor, Author) replied:
This will work for both SDPA and Flex. We just need to pass in a different type based on what attention is used.
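To make the reply concrete, here is a minimal sketch of selecting the plan's attention type per backend. The `apply_cp` helper and the `use_flex_attn` flag are hypothetical names for illustration; the import path for `_ContextParallel` is an assumption based on where PyTorch's experimental context-parallel code lives (see pytorch/pytorch#162542), and the `SDPA` enum member is assumed by analogy with the `FLEX` member shown in the diff.

```python
import torch.nn as nn
from torch.distributed.device_mesh import DeviceMesh
from torch.distributed.tensor.parallel import parallelize_module

# Assumed import path; _ContextParallel is the experimental plan
# introduced in pytorch/pytorch#162542.
from torch.distributed.tensor.experimental._attention import _ContextParallel


def apply_cp(
    attention_module: nn.Module,
    world_mesh: DeviceMesh,
    use_flex_attn: bool,
) -> None:
    # The plan is the same for both backends; only the attention_type
    # enum differs. AttentionType.SDPA is assumed to mirror the FLEX
    # member that appears in this PR's diff.
    attn_type = (
        _ContextParallel.AttentionType.FLEX
        if use_flex_attn
        else _ContextParallel.AttentionType.SDPA
    )
    parallelize_module(
        module=attention_module,
        device_mesh=world_mesh["cp"],  # the context-parallel sub-mesh
        parallelize_plan=_ContextParallel(seq_dim=2, attention_type=attn_type),
    )
```

Here `seq_dim=2` matches the diff above: with attention inputs in a (batch, heads, seq, head_dim) layout, dim 2 is the sequence dimension that gets sharded across CP ranks.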
