
Conversation

fegin (Contributor) commented on Sep 12, 2025:

Similar to #1696, but this PR uses parallelize_module, similar to TP/SP.

This PR also requires pytorch/pytorch#162542.

device_mesh=world_mesh["cp"],
parallelize_plan=_ContextParallel(
    seq_dim=2,
    attention_type=_ContextParallel.AttentionType.FLEX,
),
A reviewer (Contributor) commented on this hunk:
Does this only work for FlexAttention?
Is there a plan to consolidate SDPA and FlexAttention in terms of how CP is applied?

fegin (Contributor, Author) replied:
This will work for both SDPA and Flex. We just need to pass in a different type based on what attention is used.
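To make the reply concrete, here is a minimal sketch of selecting the plan's attention type per backend. The `apply_cp` helper and the `use_flex_attn` flag are hypothetical names for illustration; the import path for `_ContextParallel` is an assumption based on where PyTorch's experimental context-parallel code lives (see pytorch/pytorch#162542), and the `SDPA` enum member is assumed by analogy with the `FLEX` member shown in the diff.

```python
import torch.nn as nn
from torch.distributed.device_mesh import DeviceMesh
from torch.distributed.tensor.parallel import parallelize_module

# Assumed import path; _ContextParallel is the experimental plan
# introduced in pytorch/pytorch#162542.
from torch.distributed.tensor.experimental._attention import _ContextParallel


def apply_cp(
    attention_module: nn.Module,
    world_mesh: DeviceMesh,
    use_flex_attn: bool,
) -> None:
    # The plan is the same for both backends; only the attention_type
    # enum differs. AttentionType.SDPA is assumed to mirror the FLEX
    # member that appears in this PR's diff.
    attn_type = (
        _ContextParallel.AttentionType.FLEX
        if use_flex_attn
        else _ContextParallel.AttentionType.SDPA
    )
    parallelize_module(
        module=attention_module,
        device_mesh=world_mesh["cp"],  # the context-parallel sub-mesh
        parallelize_plan=_ContextParallel(seq_dim=2, attention_type=attn_type),
    )
```

Here `seq_dim=2` matches the diff above: with attention inputs in a (batch, heads, seq, head_dim) layout, dim 2 is the sequence dimension that gets sharded across CP ranks.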
