LudWittg

The original regex didn’t correctly match newlines inside the reasoning and answer tags. As a result, soft_format_reward_func always returned 0, since strict_format_reward_func required exactly two newlines inside each tag.
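
To illustrate the newline issue: in Python's `re` module, `.` does not match `\n` unless `re.DOTALL` is set, so a pattern such as `<reasoning>.*?</reasoning>` cannot match a completion whose tag contents span several lines. Below is a minimal sketch of that behavior. The pattern, the helper name `soft_format_reward`, and the 0.5 reward value are assumptions made for illustration; the actual regexes changed by this PR are not reproduced in this thread.

```python
import re

# Hypothetical soft-format pattern, for illustration only.
SOFT_PATTERN = r"<reasoning>.*?</reasoning>\s*<answer>.*?</answer>"


def soft_format_reward(text: str, flags: int = 0) -> float:
    """Return 0.5 if both tag pairs appear in order, else 0.0 (assumed reward value)."""
    return 0.5 if re.search(SOFT_PATTERN, text, flags) else 0.0


# A completion whose tag contents contain newlines.
completion = "<reasoning>\nstep 1\nstep 2\n</reasoning>\n<answer>\n42\n</answer>\n"

# Without re.DOTALL, "." stops at the first newline, so the pattern never matches.
without_dotall = soft_format_reward(completion)

# With re.DOTALL, "." also matches "\n", so the multi-line content is accepted.
with_dotall = soft_format_reward(completion, re.DOTALL)

print(without_dotall, with_dotall)
```

Passing `re.DOTALL` (or prefixing the pattern with `(?s)`) is one way to let `.` span newlines; whether the PR uses this or an explicit character class such as `[\s\S]` is not stated here.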

@Erland366
Collaborator

I moved this PR to #96 so that it is also up to date with the new notebook.

Will close this for now

@Erland366 closed this on Sep 1, 2025