Conversation

Collaborator

JerryZhou54 commented May 28, 2025

  • Save to parquet files periodically
  • Allow resuming from the middle (see the sketch below)
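
A minimal sketch of how the periodic-save-plus-resume flow could look (illustrative only; the helper names, shard layout, and pandas usage here are assumptions, not the actual implementation in this PR):

import glob
import os

import pandas as pd

def existing_shards(out_dir):
    # Resume support: count shards already written by a previous, interrupted run.
    return len(glob.glob(os.path.join(out_dir, "shard_*.parquet")))

def flush_shard(rows, out_dir, shard_idx):
    # Periodic save: write the buffered samples as one numbered parquet shard.
    os.makedirs(out_dir, exist_ok=True)
    pd.DataFrame(rows).to_parquet(
        os.path.join(out_dir, f"shard_{shard_idx:05d}.parquet"))

def run(samples, out_dir, shard_size=1024):
    shard_idx = existing_shards(out_dir)
    buffer = []
    # Skip samples already covered by previously written shards.
    for sample in samples[shard_idx * shard_size:]:
        buffer.append(sample)
        if len(buffer) == shard_size:
            flush_shard(buffer, out_dir, shard_idx)
            buffer, shard_idx = [], shard_idx + 1
    if buffer:
        flush_shard(buffer, out_dir, shard_idx)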

Collaborator Author

JerryZhou54 commented May 28, 2025

Need to merge #438 first, because this PR requires v1/datasets

JerryZhou54 force-pushed the wei/preprocess branch 2 times, most recently from 93fd4ac to 5c3ec38 on May 28, 2025 06:00
Comment on lines 20 to 21
local_dir=os.path.join(
    'data', BASE_MODEL_PATH))
Collaborator

I'm not sure why we should use local_dir; it can make cache sharing more complicated. I've opened a PR to remove it.
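
For reference, a minimal sketch of the alternative (assuming huggingface_hub handles the download; omitting local_dir keeps the files in the shared hub cache):

from huggingface_hub import snapshot_download

# Without local_dir, the snapshot lands in the default HF hub cache
# (HF_HOME / ~/.cache/huggingface/hub), which is easy to share across jobs.
model_dir = snapshot_download(repo_id="Wan-AI/Wan2.1-T2V-1.3B-Diffusers")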

Collaborator

It may have been me who added the hardcoded path. @JerryZhou54, perhaps let's just use the model_path arg here and have the registry detect the correct pipeline config directly from the HF string instead of a local path?

# export WANDB_MODE="offline"
GPU_NUM=1 # 2,4,8
MODEL_PATH="Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
TEXT_ENCODER_PATH="/Wan-AI/Wan2.1-T2V-1.3B-Diffusers/tokenizer"
Collaborator

instead of hardcoding this TEXT_ENCODER_PATH, we can simply do:

path = maybe_download_model(args.model_path)
encoder_path = os.path.join(path, 'tokenizer')


logger = init_logger(__name__)

BASE_MODEL_PATH = "/workspace/data/Wan-AI/Wan2.1-T2V-1.3B-Diffusers"
Collaborator

let's just use args.model_path here instead?

@SolitaryThinker
Collaborator

feel free to merge after addressing my comments

JerryZhou54 merged commit a004408 into main May 29, 2025
6 checks passed
SolitaryThinker deleted the wei/preprocess branch May 29, 2025 21:31
if __name__ == "__main__":
parser = argparse.ArgumentParser()
# dataset & dataloader
parser.add_argument("--model_path", type=str, default="data/mochi")
Collaborator

it's better not to use data/ except for testers running on Runpod; just use HF's default cache path
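
A small illustrative tweak (using the Wan repo id from this PR purely as an example default):

# Defaulting --model_path to an HF repo id rather than a local data/ path lets
# the weights resolve through the standard hub cache; Runpod testers can still
# pass a local directory explicitly.
parser.add_argument("--model_path", type=str,
                    default="Wan-AI/Wan2.1-T2V-1.3B-Diffusers")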
