[Feature] Explicit failure for unmatched model and checkpoints #415

@KiddoZhu

Description

Currently nemo-rl always tries to resume from the last checkpoint in the checkpoint path. When we change the policy model, loading the old checkpoints into the new model fails silently, which has two negative consequences:

  1. New checkpoints will overwrite old checkpoints from a different model.
  2. The training step counter resumes from the old checkpoint, even though the new model is actually trained from scratch.

I think it's better to fail explicitly when the policy model doesn't match the checkpoints, to prevent this kind of undefined behavior.
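A minimal sketch of what such an explicit check could look like, assuming (hypothetically) that each checkpoint is a `step_<N>` directory and that the saver also writes a small `model_info.json` recording the policy model name. None of these names (`model_info.json`, `CheckpointMismatchError`, `write_model_info`, `check_model_matches`, `resume_step`) come from nemo-rl; they are illustrative only. Resuming either returns the step of the latest checkpoint written by the same model, or raises, instead of silently loading mismatched weights and continuing the old step count.

```python
import json
from pathlib import Path
from typing import Optional

# Hypothetical metadata file; nemo-rl's real checkpoint layout may differ.
METADATA_FILE = "model_info.json"
STEP_PREFIX = "step_"


class CheckpointMismatchError(RuntimeError):
    """Raised when a checkpoint cannot be verified as coming from the current policy model."""


def write_model_info(checkpoint_dir: Path, model_name: str) -> None:
    """Record which policy model produced this checkpoint (call alongside saving weights)."""
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    (checkpoint_dir / METADATA_FILE).write_text(json.dumps({"model_name": model_name}))


def check_model_matches(checkpoint_dir: Path, model_name: str) -> None:
    """Fail loudly instead of silently resuming from another model's checkpoint."""
    info_path = checkpoint_dir / METADATA_FILE
    if not info_path.exists():
        raise CheckpointMismatchError(
            f"{checkpoint_dir} has no {METADATA_FILE}; cannot verify the policy model"
        )
    saved = json.loads(info_path.read_text())["model_name"]
    if saved != model_name:
        raise CheckpointMismatchError(
            f"checkpoint at {checkpoint_dir} was written by {saved!r}, "
            f"but the current policy model is {model_name!r}; "
            "use a new checkpoint directory or remove the old checkpoints"
        )


def latest_checkpoint(root: Path) -> Optional[Path]:
    """Return the step_<N> directory with the largest N, or None if there is none."""
    steps = [
        p
        for p in root.glob(f"{STEP_PREFIX}*")
        if p.is_dir() and p.name[len(STEP_PREFIX):].isdigit()
    ]
    return max(steps, key=lambda p: int(p.name[len(STEP_PREFIX):]), default=None)


def resume_step(root: Path, model_name: str) -> int:
    """Return the step to resume from (0 for a fresh run), raising on a model mismatch."""
    ckpt = latest_checkpoint(root)
    if ckpt is None:
        return 0
    # Verify before loading anything, so old checkpoints are never overwritten
    # and the step counter never carries over from a different model.
    check_model_matches(ckpt, model_name)
    return int(ckpt.name[len(STEP_PREFIX):])
```

Comparing only the model name is the cheapest check, since it needs no weights to be loaded; a stricter variant could additionally compare parameter names and shapes from the checkpoint against the new model, at the cost of reading the checkpoint first.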

Metadata

Assignees: No one assigned
Labels: bug (Something isn't working)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests