
Conversation

@Edenzzzz (Collaborator) commented on Jun 29, 2025

After #594, encoder offload is enabled by default, so tensor parallelism (TP) will almost always be slower than offload + layer-wise prefetch. This PR sets the default TP size to 1 for both training and inference.
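For context, here is a minimal sketch of what "TP size defaults to 1" could look like at the argument level. The argument names (`--tp-size`, `--sp-size`) and the argparse wiring are illustrative assumptions, not the repository's actual interface:

```python
# Hypothetical sketch of the default change described above; the real
# argument names and plumbing in the repo may differ.
import argparse


def add_parallelism_args(parser: argparse.ArgumentParser) -> None:
    # Tensor parallelism now defaults to 1: with encoder offload plus
    # layer-wise prefetch on by default, sharding the model across GPUs
    # (TP > 1) is almost always slower than leaving TP disabled.
    parser.add_argument(
        "--tp-size", type=int, default=1,
        help="Tensor-parallel degree (default: 1, i.e. disabled)",
    )
    # Other parallelism degrees (e.g. sequence parallelism) are assumed
    # unchanged and still pick up the remaining GPUs.
    parser.add_argument(
        "--sp-size", type=int, default=-1,
        help="Sequence-parallel degree (-1 = use all available GPUs)",
    )


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    add_parallelism_args(parser)
    args = parser.parse_args([])  # rely on the built-in defaults
    assert args.tp_size == 1      # TP is off unless explicitly requested
```

Existing scripts that explicitly pass a TP degree would keep their behavior; only runs that relied on the old default would change.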

I will need to spend some time finishing up the LoRA features, so please ensure this doesn't break any training or inference scripts.

cc @BrianChen1129 @SolitaryThinker

@Edenzzzz temporarily deployed to runpod-runners on June 29, 2025 00:18 with GitHub Actions (now inactive)
@SolitaryThinker added the `go` (Trigger Buildkite CI) label on Jun 30, 2025
@Edenzzzz force-pushed the wenxuan/deprecate_tp branch from 8206640 to 53a6f4a on July 5, 2025 01:08
@Edenzzzz merged commit 65ed588 into main on Jul 9, 2025 (3 of 4 checks passed)
@Edenzzzz deleted the wenxuan/deprecate_tp branch on July 9, 2025 22:44