optimize infer_auto_device_map for multi-GPU allocation #3321
Conversation
…tion
- This feature can be enabled by setting `reserve_max_layer` to `False`. By default, the parameter is set to `True`, preserving the original behavior.
- When multiple GPUs are present, all modules can be allocated across them; however, reserving space for the largest layer size may cause unnecessary offloading.
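To illustrate the trade-off described in the commit message, here is a toy allocator. This is not accelerate's actual code; the `allocate` function, its signature, and the example sizes are invented for this sketch. It shows how reserving room for the largest layer on each GPU can push layers to the CPU even when they would otherwise fit:

```python
# Toy sketch (not accelerate's implementation) of why reserving space for the
# largest layer can force offloading on multi-GPU setups. The names
# `allocate` and `reserve_max_layer` are illustrative, not the library API.

def allocate(module_sizes, device_capacities, reserve_max_layer=True):
    """Greedily place modules on devices in order; leftovers go to 'cpu'."""
    max_layer = max(module_sizes.values())
    device_map, remaining = {}, dict(device_capacities)
    for name, size in module_sizes.items():
        placed = False
        for dev, free in remaining.items():
            # When reserving, keep room for the largest layer on each device.
            budget = free - (max_layer if reserve_max_layer else 0)
            if size <= budget:
                device_map[name] = dev
                remaining[dev] = free - size
                placed = True
                break
        if not placed:
            device_map[name] = "cpu"  # offloaded
    return device_map

sizes = {"layer0": 4, "layer1": 4, "layer2": 4, "layer3": 4}
caps = {0: 8, 1: 8}

# Reserving max_layer (4) leaves only 4 usable units per GPU, so two of the
# four layers are offloaded even though total GPU capacity would hold them all.
print(allocate(sizes, caps, reserve_max_layer=True))
# {'layer0': 0, 'layer1': 1, 'layer2': 'cpu', 'layer3': 'cpu'}

# Without the reservation, all four layers fit across the two GPUs.
print(allocate(sizes, caps, reserve_max_layer=False))
# {'layer0': 0, 'layer1': 0, 'layer2': 1, 'layer3': 1}
```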
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Thanks again for your contribution! Let me know when it is ready to be reviewed!
Hey @SunMarc, currently my approach adds a parameter to `infer_auto_device_map` that enables or disables the use of `max_layer_size`. If there are offloaded layers, the function calls `infer_auto_device_map` internally with this parameter set to `False`. Does this approach make sense to you? If so, should this behavior be enabled by default? If not, do you have any alternative suggestions?
Hi @SunMarc, thank you for your patience. I believe this PR is finally ready for review. I implemented the new module allocation strategy as you suggested in this PR comment. Sorry for such a long wait!
Hey @SunMarc, I hope you are doing well. I am following up on my last comment about whether we should proceed with this PR, since you might have missed it. I understand that you have many duties across multiple repos, and this PR was opened a long time ago, which makes it take a lot of time and effort to review. However, if you still think the PR is worth merging, I am more than happy to provide a detailed explanation of what I have done. I am looking forward to your insights. Thank you so much!
What does this PR do?
This PR continues to solve issues raised in #3041 and discussed in #3066. When multiple GPUs are present, reserving memory for `max_layer_size` can cause unnecessary offloading to the CPU or disk. The PR implements the approach proposed by @SunMarc: it works by first assuming no offloading is necessary, and if there are offloaded modules in the resulting device map, the map is recomputed assuming offloading will occur.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
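The recompute-on-offload strategy described in the PR description can be sketched as follows. This is an assumed shape, not the actual implementation; `compute_map` stands in for an internal call to `infer_auto_device_map` with the `reserve_max_layer` flag proposed in this PR:

```python
def infer_map_two_pass(compute_map):
    """Two-pass allocation: optimistic first, conservative on failure.

    `compute_map(reserve_max_layer=...)` is a stand-in for the internal
    call to infer_auto_device_map with the flag proposed in this PR.
    """
    # First pass: assume every module fits on the GPUs, so there is no
    # need to reserve space for the largest layer on each device.
    device_map = compute_map(reserve_max_layer=False)
    if any(dev in ("cpu", "disk") for dev in device_map.values()):
        # Offloading turned out to be necessary after all: recompute while
        # reserving room on each GPU to load the largest layer, which is
        # required for correct execution when weights stream in from
        # CPU/disk.
        device_map = compute_map(reserve_max_layer=True)
    return device_map
```

The design choice this captures: the reservation only matters when offloading actually happens, so the optimistic first pass avoids paying its cost on machines whose GPUs can hold the whole model.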
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.