[Feature] Offload all text encoders by default #594

Edenzzzz · 2025-07-03T03:00:10Z

Part of #572. Please check that this doesn't cause multi-node training to hang.
cc @SolitaryThinker

SolitaryThinker · 2025-07-03T05:37:36Z

Needs a rebase to pick up the Buildkite CI fix.

Edenzzzz · 2025-07-03T19:31:17Z

@SolitaryThinker Done

SolitaryThinker · 2025-07-03T20:42:15Z

thanks, testing now on multi-node

SolitaryThinker

where is the actual logic for offloading text encoders? I think you forgot to include that logic in this PR?

SolitaryThinker · 2025-07-03T22:20:46Z

fastvideo/v1/models/loader/component_loader.py


        # Load the module
-        return loader.load(component_model_path, architecture, fastvideo_args)
+        return loader.load(component_model_path, architecture, fastvideo_args)


revert this?

SolitaryThinker · 2025-07-03T22:20:55Z

fastvideo/v1/models/loader/fsdp_load.py


    # choose `assign=True` since we cannot call `copy_` on meta tensor
-    return model.load_state_dict(sharded_sd, strict=strict, assign=True)
+    return model.load_state_dict(sharded_sd, strict=strict, assign=True)


SolitaryThinker · 2025-07-03T22:21:00Z

fastvideo/v1/models/loader/weight_utils.py


    # If there were no matches, return the untouched param name
-    return name
+    return name


Edenzzzz · 2025-07-03T23:26:09Z

> where is the actual logic for offloading text encoders? I think you forgot to include that logic in this PR?

Added

SolitaryThinker added the "go" label (Trigger Buildkite CI) Jul 3, 2025

SolitaryThinker self-requested a review July 3, 2025 04:31

Edenzzzz and others added 5 commits July 3, 2025 10:11

Revert "[Revert] "[Feature] Load weights from distributed" (hao-ai-la…

08a7810

…b#571)" This reverts commit 5f938b5.

fix group

17cc0cd

fix fsdp to cpu

2fa4ecf

remove dist load related

99cbe15

pre-commit

62ec225

Edenzzzz force-pushed the offload_text_enc branch from 6c5f09f to 62ec225 Compare July 3, 2025 17:12

SolitaryThinker requested changes Jul 3, 2025

Edenzzzz added 5 commits July 3, 2025 16:17

add missing offload logic

47c1c7c

remove useless args

a92d1e6

fix

9a773d2

fix

6913a7e

fix

4d4842b

SolitaryThinker approved these changes Jul 4, 2025

SolitaryThinker merged commit ed1e8d6 into hao-ai-lab:main Jul 4, 2025
1 check passed

Edenzzzz deleted the offload_text_enc branch July 4, 2025 00:46

Edenzzzz mentioned this pull request Jul 5, 2025

Set encoder TP size to 1 by default #569

Merged
