[Feature] Offload all text encoders by default #594
Need to rebase for the Buildkite CI fix.
Branch force-pushed from 6c5f09f to 62ec225.
@SolitaryThinker Done.
Thanks, testing now on multi-node.
Where is the actual logic for offloading the text encoders? I think you forgot to include it in this PR.
# Load the module
- return loader.load(component_model_path, architecture, fastvideo_args)
+ return loader.load(component_model_path, architecture, fastvideo_args)
Revert this?
# choose `assign=True` since we cannot call `copy_` on meta tensor
- return model.load_state_dict(sharded_sd, strict=strict, assign=True)
+ return model.load_state_dict(sharded_sd, strict=strict, assign=True)
Also here.
# If there were no matches, return the untouched param name
- return name
+ return name
Here too.
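The hunk above is the fall-through of a parameter-name remapping helper. A minimal sketch of that pattern, assuming the mapping is a dict from regex patterns to replacement strings; `PARAM_NAMES_MAPPING`, `map_param_name`, and the example parameter names are hypothetical and not FastVideo's actual API.

```python
import re

# Hypothetical mapping from checkpoint parameter-name patterns to the
# names used by the target model. Backreferences (\1, \2) carry over
# the captured layer index and suffix.
PARAM_NAMES_MAPPING = {
    r"^text_model\.encoder\.layers\.(\d+)\.(.*)$": r"encoder.layers.\1.\2",
    r"^text_model\.final_layer_norm\.(.*)$": r"final_norm.\1",
}


def map_param_name(name: str, mapping: dict = PARAM_NAMES_MAPPING) -> str:
    """Rename a checkpoint parameter using the first pattern that matches."""
    for pattern, replacement in mapping.items():
        if re.match(pattern, name):
            return re.sub(pattern, replacement, name)
    # If there were no matches, return the untouched param name
    return name


print(map_param_name("text_model.encoder.layers.3.mlp.fc1.weight"))
print(map_param_name("logit_scale"))
```

Returning the name unchanged on a miss lets parameters that already match the target model's naming pass straight through.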
Added.
Part of #572. Please check that this doesn't cause multi-node training to hang.
cc @SolitaryThinker