
Conversation

Nech-C
Contributor

@Nech-C Nech-C commented Jan 2, 2025

What does this PR do?

This PR continues the work on the issues raised in #3041 and discussed in #3066. When multiple GPUs are present, reserving memory for max_layer_size can cause unnecessary offloading to the CPU or disk. The PR implements the approach proposed by @SunMarc: it first assumes no offloading is necessary, and if the resulting device map contains offloaded modules, the map is recomputed under the assumption that offloading will occur.
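
For illustration only, here is a minimal, self-contained sketch of that two-pass idea. It uses a toy greedy allocator rather than accelerate's actual `infer_auto_device_map`, and every name, size, and heuristic in it is invented for clarity:

```python
# Illustrative sketch only: a toy greedy allocator, NOT accelerate's
# infer_auto_device_map. Sizes and budgets are in arbitrary units.

def toy_allocate(module_sizes, max_memory, reserve_max_layer):
    """Place modules on GPUs in order, spilling to CPU when nothing fits."""
    max_layer = max(module_sizes.values())
    device_map, free = {}, dict(max_memory)
    for name, size in module_sizes.items():
        for device in (d for d in free if d != "cpu"):
            # Optionally keep room on each GPU for the largest layer.
            reserve = max_layer if reserve_max_layer else 0
            if free[device] - reserve >= size:
                device_map[name] = device
                free[device] -= size
                break
        else:
            device_map[name] = "cpu"  # offload
    return device_map


def two_pass_device_map(module_sizes, max_memory):
    # Pass 1: assume no offloading is needed, so don't reserve the largest layer.
    device_map = toy_allocate(module_sizes, max_memory, reserve_max_layer=False)
    if any(dev in ("cpu", "disk") for dev in device_map.values()):
        # Offloading turned out to be necessary: recompute with the reservation,
        # which is needed to move offloaded layers onto a GPU at execution time.
        device_map = toy_allocate(module_sizes, max_memory, reserve_max_layer=True)
    return device_map


if __name__ == "__main__":
    sizes = {"layer.0": 4, "layer.1": 4, "layer.2": 4, "layer.3": 4}
    budget = {0: 9, 1: 9, "cpu": 100}
    # With the reservation, two layers would be pushed to the CPU; without it,
    # everything fits on the two GPUs, which is what this PR aims to keep.
    print(two_pass_device_map(sizes, budget))  # all modules on GPUs 0 and 1
```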

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

Nech-C added 2 commits January 2, 2025 12:57
…tion

- This feature can be enabled by setting `reserve_max_layer` to `False`. By default, the parameter is set to `True`, preserving the original behavior (see the usage sketch below).
- When multiple GPUs are present, all modules can be allocated across them. However, reserving space for the largest layer size may cause unnecessary offloading.
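
If the `reserve_max_layer` flag described in these commits is merged as-is, opting in might look roughly like the following sketch; the exact parameter name, default, and signature are still subject to review, and the model id and memory budgets are only placeholders:

```python
# Hypothetical usage of the flag added in this PR; `reserve_max_layer` and its
# default come from the commit description above and may change in review.
from accelerate import infer_auto_device_map, init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

config = AutoConfig.from_pretrained("gpt2")  # placeholder model id
with init_empty_weights():
    # Build the model on the meta device so no real memory is used.
    model = AutoModelForCausalLM.from_config(config)

device_map = infer_auto_device_map(
    model,
    max_memory={0: "6GiB", 1: "6GiB", "cpu": "30GiB"},  # placeholder budgets
    no_split_module_classes=["GPT2Block"],
    reserve_max_layer=False,  # opt in to the new multi-GPU allocation
)
print(device_map)
```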
Contributor

github-actions bot commented Feb 2, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@SunMarc
Member

SunMarc commented Feb 4, 2025

Thanks again for your contribution! Let me know when it is ready to be reviewed!

@Nech-C
Contributor Author

Nech-C commented Feb 13, 2025

Thanks again for your contribution! Let me know when it is ready to be reviewed!

Hey @SunMarc,
Thanks for the reminder about this PR! After implementing a basic version, I got stuck on some design choices and then completely forgot about it. I’d really appreciate your guidance on how to proceed.

Currently, my approach adds a parameter to infer_auto_device_map that enables or disables the use of max_layer_size. If there are offloaded layers, the function calls infer_auto_device_map internally with this parameter set to False.

Does this approach make sense to you? If so, should this behavior be enabled by default? If not, do you have any alternative suggestions?

Contributor

github-actions bot commented Mar 9, 2025

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

@github-actions github-actions bot closed this Mar 17, 2025
@SunMarc SunMarc reopened this Mar 17, 2025
@github-actions github-actions bot closed this Mar 26, 2025
@SunMarc SunMarc added the wip Work in progress label Mar 26, 2025
@SunMarc SunMarc reopened this Mar 26, 2025
@Nech-C Nech-C marked this pull request as ready for review August 22, 2025 19:43
@Nech-C Nech-C changed the title [WIP] optimize infer_auto_device_map for multi-GPU allocation optimize infer_auto_device_map for multi-GPU allocation Aug 23, 2025
@Nech-C
Contributor Author

Nech-C commented Sep 5, 2025

Hi @SunMarc, thank you for your patience. I believe this PR is finally ready for review. I implemented the new module allocation strategy as you suggested in this PR comment. Sorry for such a long wait!

@Nech-C
Contributor Author

Nech-C commented Oct 16, 2025

Hey @SunMarc, I hope you are doing well. I am following up on my last comment about whether we should proceed with this PR, since you might have missed it. I understand that you have many duties across multiple repos, and that reviewing a PR opened this long ago takes a lot of time and effort. However, if you still think the PR is worth merging, I am more than happy to provide a detailed explanation of what I have done.

I am looking forward to your insights. Thank you so much!

