Add a simple weight sync sandbox #531
Do we want this to run multiple iterations, or indefinitely?
For this sandbox, just a single run, so we can time how long a single weight sync step takes.
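Timing one step only needs a monotonic, high-resolution clock around a single call. Below is a minimal sketch in which `time_single_step` and `sync_weights` are hypothetical names. The thread does not say what the sandbox actually calls, so `sync_weights` just copies a fake state dict.

```python
import time
from typing import Any, Callable, Tuple


def time_single_step(step: Callable[[], Any]) -> Tuple[Any, float]:
    """Run `step` exactly once and return (result, elapsed_seconds).

    time.perf_counter() is monotonic and has the highest available
    resolution, so it suits short intervals better than time.time().
    """
    start = time.perf_counter()
    result = step()
    elapsed = time.perf_counter() - start
    return result, elapsed


def sync_weights() -> int:
    # Hypothetical placeholder for one weight sync step: copies a fake
    # "state dict" so the sketch runs on its own. Returns the tensor count.
    src = {f"layer{i}.weight": [0.0] * 1024 for i in range(8)}
    dst = {name: list(values) for name, values in src.items()}
    return len(dst)


if __name__ == "__main__":
    n_tensors, seconds = time_single_step(sync_weights)
    print(f"synced {n_tensors} tensors in {seconds * 1000:.2f} ms")
```

A single run includes any one-time costs, such as first-call allocations. If the real step launches asynchronous GPU work (for example through PyTorch), call `torch.cuda.synchronize()` before the second clock read. Otherwise the measurement can stop before the GPU work finishes.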
I don't have any objection to adding a new sandbox test; it's just that I've been using this one: https://fburl.com/code/8400zng6
```yaml
logging_mode: global_reduce

policy:
  prefetch_weights_to_shm: false # Disable to avoid shared memory warnings in test
```
What warnings are you seeing?
It spams resource-tracking warnings saying that the shared memory files no longer exist. Claude couldn't figure it out, so I just disabled it lol
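This thread does not confirm that the prefetch path uses Python's `multiprocessing.shared_memory`. If it does, that module registers each segment with a background resource-tracker process, which tries to clean up any segments still tracked at shutdown. When a segment has already been unlinked elsewhere, that cleanup can emit warnings about missing files. The usual mitigation is for exactly one party to call `unlink()`. Below is a minimal sketch of that single-owner lifecycle, using the hypothetical helper names `stage_in_shm` and `consume_from_shm`.

```python
from multiprocessing import shared_memory


def stage_in_shm(payload: bytes) -> str:
    """Copy payload into a new shared memory segment and return its name.

    The creator only closes its own mapping; it does NOT unlink, because
    the consumer below is the single owner of unlink().
    """
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[: len(payload)] = payload
    name = shm.name
    shm.close()  # unmap this handle; the segment itself stays alive
    return name


def consume_from_shm(name: str, size: int) -> bytes:
    """Attach by name, copy the data out, then release the segment once."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        # The mapping may be rounded up to a page size, so slice to `size`.
        data = bytes(shm.buf[:size])
    finally:
        shm.close()   # unmap in this process
        shm.unlink()  # destroy the segment; done by one party only
    return data


if __name__ == "__main__":
    weights = b"fake weight bytes"
    seg = stage_in_shm(weights)
    assert consume_from_shm(seg, len(weights)) == weights
    print("round trip ok")
```

Every process that attaches should call `close()`, and only one should call `unlink()`. Python 3.13 also added a `track` parameter to `SharedMemory` for opting a handle out of resource tracking.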
```yaml
# Weight Sync Sandbox Configuration
# >>> python -m tests.sandbox.weight_sync.main --config tests/sandbox/weight_sync/qwen3_1_7b.yaml

model: "Qwen/Qwen3-1.7B"
```
I feel like we could use a larger model, like 8B.
We can add more model configs as needed.
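Additional per-model configs could follow the naming visible in the existing file, where `Qwen/Qwen3-1.7B` maps to `qwen3_1_7b.yaml`. Below is a minimal sketch that derives the config path and sandbox command from a model ID. The naming rule is inferred from that single example, so it is an assumption rather than a documented convention. The helpers `config_stem`, `config_path`, and `sandbox_command` are hypothetical.

```python
from pathlib import PurePosixPath
from typing import List

CONFIG_DIR = PurePosixPath("tests/sandbox/weight_sync")


def config_stem(model_id: str) -> str:
    """Hypothetical rule: 'Qwen/Qwen3-1.7B' -> 'qwen3_1_7b'."""
    name = model_id.rsplit("/", 1)[-1]  # drop the org prefix
    return name.lower().replace("-", "_").replace(".", "_")


def config_path(model_id: str) -> str:
    return str(CONFIG_DIR / f"{config_stem(model_id)}.yaml")


def sandbox_command(model_id: str) -> List[str]:
    # Mirrors the invocation in the config file's header comment.
    return [
        "python", "-m", "tests.sandbox.weight_sync.main",
        "--config", config_path(model_id),
    ]


if __name__ == "__main__":
    for model in ["Qwen/Qwen3-1.7B", "Qwen/Qwen3-8B"]:
        print(" ".join(sandbox_command(model)))
```

Under this rule, an 8B config would live at `tests/sandbox/weight_sync/qwen3_8b.yaml`. `Qwen/Qwen3-8B` is used here only as an example model ID.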
LGTM, but doesn't the integration test do exactly this (plus verification)? Why do we need a separate one?
I envisioned this as a temporary sandbox that prioritizes hacking and fast iteration. I find that developing against pytest adds overhead (logging, etc.), so I'm OK with keeping these two separate. The pytest test is very helpful in CI, e.g., for making sure this passes consistently. Does this separation make sense?
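One way to keep a sandbox and a test separate without duplicating logic is to put the sync step in a plain function that both entry points call. The sandbox runs it once and prints timing, while the pytest-style test adds verification. Below is a minimal sketch with hypothetical names (`sync_step`, `run_sandbox`, `test_sync_step_matches_source`) and a fake in-memory "sync". It does not describe how the repo's integration test is actually structured.

```python
import time
from typing import Dict, List

Weights = Dict[str, List[float]]


def sync_step(src: Weights, dst: Weights) -> None:
    """Hypothetical stand-in for one weight sync: copy every tensor into dst."""
    for name, values in src.items():
        dst[name] = list(values)


def run_sandbox() -> float:
    """Sandbox entry point: one run, minimal logging, returns elapsed seconds."""
    src = {f"layer{i}.weight": [float(i)] * 4096 for i in range(16)}
    dst: Weights = {}
    start = time.perf_counter()
    sync_step(src, dst)
    return time.perf_counter() - start


def test_sync_step_matches_source() -> None:
    """Pytest-style check (the CI path): same step, plus verification."""
    src = {"a": [1.0, 2.0], "b": [3.0]}
    dst: Weights = {}
    sync_step(src, dst)
    assert dst == src
    # The copy must not alias the source buffers.
    src["a"][0] = -1.0
    assert dst["a"][0] == 1.0


if __name__ == "__main__":
    print(f"weight sync step: {run_sandbox() * 1000:.2f} ms")
```

Pytest collects `test_*` functions automatically, so the same module could serve both roles. The sandbox path stays free of test-runner overhead.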
Adding this as a development platform for weight sync optimizations.