Currently, nemo-rl always tries to resume from the last checkpoint in the checkpoint path. When the policy model is changed, the new model silently fails to load the old checkpoints, which has two negative consequences:
- New checkpoints will overwrite old checkpoints from a different model.
- The training step is counted from the old checkpoint, even though the new model is actually being trained from scratch.
I feel it would be better to fail explicitly when the policy model doesn't match the existing checkpoints, to prevent this kind of undefined behavior.
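
A minimal sketch of what the explicit check could look like. The function name, metadata filename, and JSON key below are hypothetical and not nemo-rl's actual API; the idea is simply to record which policy model produced a checkpoint directory and refuse to resume when the current config names a different one.

```python
# Hypothetical sketch, not nemo-rl's actual API. Assumes each checkpoint
# directory carries a small metadata file recording the policy model that
# produced it.
import json
import os


def verify_checkpoint_compatibility(checkpoint_dir: str, policy_model_name: str) -> None:
    """Raise instead of silently ignoring checkpoints saved by a different model."""
    metadata_path = os.path.join(checkpoint_dir, "policy_metadata.json")  # assumed filename
    if not os.path.exists(metadata_path):
        # No metadata to compare against; fall back to the existing behavior.
        return

    with open(metadata_path) as f:
        saved_model_name = json.load(f).get("policy_model_name")  # assumed key

    if saved_model_name != policy_model_name:
        raise ValueError(
            f"Checkpoints in {checkpoint_dir!r} were produced by "
            f"{saved_model_name!r}, but the current config specifies "
            f"{policy_model_name!r}. Refusing to resume: either point the "
            "checkpoint path at a fresh directory or restore the original "
            "policy model."
        )
```

The saving side would write the same metadata file alongside each checkpoint, so the cost is one extra small JSON file per checkpoint directory.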