optimize infer_auto_device_map for multi-GPU allocation #3321
Conversation
…tion
- This feature can be enabled by setting `reserve_max_layer` to `False`. By default, the parameter is set to `True`, preserving the original behavior.
- When multiple GPUs are present, all modules can be allocated across them; however, reserving space for the largest layer size may cause unnecessary offloading.
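To illustrate the trade-off described in the commit message, here is a toy allocator. This is not accelerate's actual code; the `allocate` function, its signature, and the example sizes are invented for this sketch. It shows how reserving room for the largest layer on each GPU can push layers to the CPU even when they would otherwise fit:

```python
# Toy sketch (not accelerate's implementation) of why reserving space for the
# largest layer can force offloading on multi-GPU setups. The names
# `allocate` and `reserve_max_layer` are illustrative, not the library API.

def allocate(module_sizes, device_capacities, reserve_max_layer=True):
    """Greedily place modules on devices in order; leftovers go to 'cpu'."""
    max_layer = max(module_sizes.values())
    device_map, remaining = {}, dict(device_capacities)
    for name, size in module_sizes.items():
        placed = False
        for dev, free in remaining.items():
            # When reserving, keep room for the largest layer on each device.
            budget = free - (max_layer if reserve_max_layer else 0)
            if size <= budget:
                device_map[name] = dev
                remaining[dev] = free - size
                placed = True
                break
        if not placed:
            device_map[name] = "cpu"  # offloaded
    return device_map

sizes = {"layer0": 4, "layer1": 4, "layer2": 4, "layer3": 4}
caps = {0: 8, 1: 8}

# Reserving max_layer (4) leaves only 4 usable units per GPU, so two of the
# four layers are offloaded even though total GPU capacity would hold them all.
print(allocate(sizes, caps, reserve_max_layer=True))
# {'layer0': 0, 'layer1': 1, 'layer2': 'cpu', 'layer3': 'cpu'}

# Without the reservation, all four layers fit across the two GPUs.
print(allocate(sizes, caps, reserve_max_layer=False))
# {'layer0': 0, 'layer1': 0, 'layer2': 1, 'layer3': 1}
```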
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread. Please note that issues that do not follow the contributing guidelines are likely to be ignored.
Thanks again for your contribution! Let me know when it is ready to be reviewed!
Hey @SunMarc, currently my approach adds a parameter to `infer_auto_device_map` that enables or disables the use of `max_layer_size`. If there are offloaded layers, the function calls `infer_auto_device_map` internally with this parameter set to `False`. Does this approach make sense to you? If so, should this behavior be enabled by default? If not, do you have any alternative suggestions?
Hi @SunMarc, thank you for your patience. I believe this PR is finally ready for review. I implemented the new module allocation strategy as you suggested in this PR comment. Sorry for such a long wait!
Hey @SunMarc, I hope you are doing well. I am following up on my last comment about whether we should proceed with this PR, since you might have missed it. I understand that you have many duties across multiple repos, and this PR was opened a long time ago, which makes it take a lot of time and effort to review. However, if you still think the PR is worth merging, I am more than happy to provide a detailed explanation of what I have done. I am looking forward to your insights. Thank you so much!
What does this PR do?
This PR continues to solve issues raised in #3041 and discussed in #3066. When multiple GPUs are present, reserving memory for `max_layer_size` can cause unnecessary offloading to the CPU or disk. The PR implements the approach proposed by @SunMarc: it works by first assuming no offloading is necessary, and if there are offloaded modules in the resulting device map, the map is recomputed assuming offloading will occur.

Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
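The recompute-on-offload strategy described in the PR description can be sketched as follows. This is an assumed shape, not the actual implementation; `compute_map` stands in for an internal call to `infer_auto_device_map` with the `reserve_max_layer` flag proposed in this PR:

```python
def infer_map_two_pass(compute_map):
    """Two-pass allocation: optimistic first, conservative on failure.

    `compute_map(reserve_max_layer=...)` is a stand-in for the internal
    call to infer_auto_device_map with the flag proposed in this PR.
    """
    # First pass: assume every module fits on the GPUs, so there is no
    # need to reserve space for the largest layer on each device.
    device_map = compute_map(reserve_max_layer=False)
    if any(dev in ("cpu", "disk") for dev in device_map.values()):
        # Offloading turned out to be necessary after all: recompute while
        # reserving room on each GPU to load the largest layer, which is
        # required for correct execution when weights stream in from
        # CPU/disk.
        device_map = compute_map(reserve_max_layer=True)
    return device_map
```

The design choice this captures: the reservation only matters when offloading actually happens, so the optimistic first pass avoids paying its cost on machines whose GPUs can hold the whole model.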
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.