huggingface / trl Public

generated from fastai/nbdev_template

Pull requests: huggingface/trl

66 Open 1,849 Closed

Pull requests list

fix bug when using dataset streaming by accelerate

#3950 opened Aug 25, 2025 by kaixuanliu

add default dataset and related precessing to make this example work

#3949 opened Aug 25, 2025 by kaixuanliu

Update grpo_trainer.py to fix the dim error.

#3943 opened Aug 23, 2025 by HelloWorldLTY

Return position_ids for flash_attention_3

#3942 opened Aug 23, 2025 by jue-jue-zi

5 tasks

Docker update

#3931 opened Aug 20, 2025 by qgallouedec

5 tasks

[SFTTrainer]: Check for assistant mask up to max_length

#3930 opened Aug 20, 2025 by pramodith

2 of 5 tasks

Support for pre-defined image positions in VLM training data

#3911 opened Aug 17, 2025 by YeFD

3 of 5 tasks

[DRAFT] Refactor DPO

#3906 opened Aug 15, 2025 by qgallouedec • Draft

5 tasks

Test in distributed setting

#3902 opened Aug 15, 2025 by qgallouedec

5 tasks

BEMA for ref model

#3898 opened Aug 14, 2025 by qgallouedec

5 tasks

validate examples on xpu

#3897 opened Aug 14, 2025 by yao-matrix

🧭 HF jobs x TRL guide

#3890 opened Aug 13, 2025 by sergiopaniego

3 of 12 tasks

GRPOTrainer : fix prompt truncation for multimodal inputs with multiple image tokens

#3879 opened Aug 11, 2025 by artem-spector

4 tasks

vLLM rollout numerical differences causing off-policy RL.

#3867 opened Aug 7, 2025 by LeonEricsson • Draft

5 tasks

Implement DPOP

#3864 opened Aug 7, 2025 by 1485840691

[#3647] Fix: Assign default values in the GKDTrainer's constructor only when …

#3851 opened Aug 5, 2025 by seungduk-yanolja

2 of 5 tasks

Update profiling.py: fix scoping problems for wandb and mlflow

#3845 opened Aug 4, 2025 by markshinyounglee

5 tasks done

dynamic temperature

#3844 opened Aug 4, 2025 by shirinyamani • Draft

5 tasks

Optimize RLOO Trainer memory usage with string-level processing

#3837 opened Aug 2, 2025 by luckyvickyricky

2 of 5 tasks

[GSPO]: Refactor _compute_loss

#3835 opened Aug 1, 2025 by pramodith

2 of 5 tasks

support GSPO-token

#3820 opened Jul 31, 2025 by hjh0119

GSPO docs - Sequence importance ratio and differences in relation to GRPO

#3816 opened Jul 31, 2025 by almeidava93

2 of 5 tasks

Rloo final

#3801 opened Jul 29, 2025 by shirinyamani

5 tasks

Add vLLM server mode and VLM support to OnlineDPOTrainer

#3783 opened Jul 27, 2025 by vaelev

6 tasks done

change doc for num_iterations and steps_per_generation to hopefully make them more clear and differentiate between them more clearly

#3761 opened Jul 23, 2025 by avishaiElmakies

2 of 5 tasks
