Conversation

@allenwang28
Contributor

Adding this as a development platform for weight sync optimizations.

@meta-cla meta-cla bot added the CLA Signed label Nov 6, 2025
@casteryh
Contributor

casteryh commented Nov 6, 2025

Do we want this to run multiple iterations, or indefinitely?

@allenwang28
Contributor Author

> Do we want this to run multiple iterations, or indefinitely?

For this sandbox, just a single run, so we can time how long a single weight sync step takes.
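
A minimal sketch of what timing one sync step could look like. `sync_weights` is a hypothetical stand-in for the sandbox's actual weight sync, and `time_once` is an illustrative helper; neither is taken from this PR's code.

```python
import time
from typing import Callable


def time_once(step: Callable[[], None]) -> float:
    """Run `step` exactly once and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    step()
    return time.perf_counter() - start


def sync_weights() -> None:
    # Hypothetical placeholder: a real sandbox would push trainer weights
    # to the policy here. We copy a buffer so there is something to time.
    src = bytearray(8 * 1024 * 1024)
    dst = bytearray(len(src))
    dst[:] = src


if __name__ == "__main__":
    elapsed = time_once(sync_weights)
    print(f"weight sync took {elapsed:.4f}s")
```

If the real step launches asynchronous GPU work, the timer would also need a device synchronization before the clock is stopped, otherwise it only measures the launch.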

@JenniferWang
Contributor

I have no objection to adding a new sandbox test; it's just that I've been using this one: https://fburl.com/code/8400zng6
So is it reasonable to consolidate the two apps?

logging_mode: global_reduce

policy:
  prefetch_weights_to_shm: false  # Disable to avoid shared memory warnings in test
Contributor

what warnings are you seeing?

Contributor Author

It spams resource_tracking stuff saying that the shared memory files don't exist anymore. Claude couldn't figure it out so I just disabled it lol
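
For context, messages like this often come from Python's `multiprocessing.resource_tracker`, which tracks named shared memory segments and complains when a segment it registered has already been removed or was never unlinked. Whether that is the cause here is an assumption; the thread does not say. The sketch below only shows the standard-library lifecycle (create, attach, close, unlink exactly once) and is not this project's prefetch code; `roundtrip` is an illustrative name.

```python
from multiprocessing import shared_memory


def roundtrip(payload: bytes) -> bytes:
    """Write a non-empty payload into a named segment and read it back
    through a second handle."""
    owner = shared_memory.SharedMemory(create=True, size=len(payload))
    try:
        owner.buf[: len(payload)] = payload
        # Attach by name, the way a separate consumer would.
        reader = shared_memory.SharedMemory(name=owner.name)
        try:
            # Copy the data out before closing; the view dies with the handle.
            data = bytes(reader.buf[: len(payload)])
        finally:
            reader.close()  # every handle closes its own mapping
    finally:
        owner.close()
        owner.unlink()  # only the owner removes the segment, and only once
    return data


if __name__ == "__main__":
    print(roundtrip(b"weights"))
```

The point of the sketch is ownership: if more than one party unlinks a segment, or a party exits while the tracker still lists a segment that is already gone, "file does not exist" style messages are a plausible result.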

# Weight Sync Sandbox Configuration
# >>> python -m tests.sandbox.weight_sync.main --config tests/sandbox/weight_sync/qwen3_1_7b.yaml

model: "Qwen/Qwen3-1.7B"
Contributor

I feel like we could use a larger model like 8b

Contributor Author

we can add more model configs as needed

@casteryh
Contributor

casteryh commented Nov 6, 2025

lgtm but doesn't the integration test do exactly this (+ verification)? Why do we need a separate one?

@allenwang28
Contributor Author

> So is it reasonable to consolidate the two apps?

> doesn't the integration test do exactly this (+ verification)? Why do we need a separate one?

I envisioned this as a temporary sandbox that prioritizes hacking and fast iteration. I find that developing against pytest can add overhead (in logging, etc.), so I'm OK with these two being separate things. The pytest is still very helpful, e.g., in CI, for making sure that this passes consistently.

Does this separation make sense?

@allenwang28 allenwang28 merged commit f55bac8 into meta-pytorch:main Nov 6, 2025
10 checks passed
@allenwang28 allenwang28 deleted the weight_sync branch November 6, 2025 19:42