
Conversation

@riedgar-ms (Collaborator) commented Jul 9, 2024

Create a very basic function to extract a guidance.ChatTemplate from a given Transformers Tokenizer. This can be extended over time as we discover new and exciting tokenizers. However, this will have to be balanced against the probability that it can never be fully general: Transformers uses Jinja2 templates, which have all sorts of goodies like loops and branches.
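To make the shape of the idea concrete, here is a minimal sketch of the string-matching approach described above. All names here (`LLAMA2_TEMPLATE_STRING`, `Llama2ChatTemplate`, `ChatMLTemplate`, `chat_template_from_tokenizer`) are illustrative assumptions, not guidance's actual API; the real PR's implementation may differ.

```python
# Hypothetical sketch: map a tokenizer's raw Jinja2 chat_template string to a
# known guidance-style ChatTemplate class via exact string matching.

# Placeholder for a known template's full Jinja2 source.
LLAMA2_TEMPLATE_STRING = "{% for message in messages %}...{% endfor %}"

class Llama2ChatTemplate:
    """Stand-in for a guidance-style chat template class."""

class ChatMLTemplate:
    """Stand-in fallback template."""

# Registry of template strings we recognize, extended as new tokenizers appear.
KNOWN_TEMPLATES = {
    LLAMA2_TEMPLATE_STRING: Llama2ChatTemplate,
}

def chat_template_from_tokenizer(tokenizer, default=ChatMLTemplate):
    # Transformers tokenizers expose the raw Jinja2 source on .chat_template.
    template_str = getattr(tokenizer, "chat_template", None)
    if template_str in KNOWN_TEMPLATES:
        return KNOWN_TEMPLATES[template_str]
    # Unknown template: fall back rather than trying to interpret arbitrary
    # Jinja2, whose loops and branches make full generality impractical.
    return default
```

Exact matching is deliberately conservative: it can never misrender a known template, at the cost of falling back for any template it has not seen.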

@hudson-ai (Collaborator) commented:

Just thinking out loud a bit... But could structured state (in some form -- not necessarily the rough implementation in my open PR) allow us to call tokenizer.apply_chat_template directly, rather than "simulating" it like this?

It might be arbitrarily hard to make that work, but again, just thinking out loud ;)

@riedgar-ms (Collaborator, Author) commented:

> Just thinking out loud a bit... But could structured state (in some form -- not necessarily the rough implementation in my open PR) allow us to call tokenizer.apply_chat_template directly, rather than "simulating" it like this?

I believe so, yes. There may be a few edge cases (e.g. what to do when a 'system' prompt isn't available... some templates will error, while others quietly prepend it to the first 'user' prompt), but that should be far more reliable than this entire approach.
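The system-prompt edge case mentioned above can be illustrated with a small sketch. This is not guidance's or Transformers' code; the two functions below merely mimic the two behaviors real Jinja2 chat templates exhibit when a 'system' message has no dedicated slot.

```python
# Illustrative only: two ways chat templates handle a 'system' message
# when the underlying template has no system slot.

def render_strict(messages):
    # Mimics templates that raise on an unsupported role.
    out = []
    for m in messages:
        if m["role"] == "system":
            raise ValueError("System role not supported by this template")
        out.append(f"[{m['role']}] {m['content']}")
    return "\n".join(out)

def render_folding(messages):
    # Mimics templates that quietly prepend the system prompt
    # to the first 'user' turn.
    pending_system = [m["content"] for m in messages if m["role"] == "system"]
    out = []
    for m in messages:
        if m["role"] == "system":
            continue
        content = m["content"]
        if pending_system and m["role"] == "user":
            content = pending_system.pop(0) + "\n" + content
        out.append(f"[{m['role']}] {content}")
    return "\n".join(out)
```

Calling `tokenizer.apply_chat_template` directly would inherit whichever of these behaviors the model's template author chose, rather than guidance having to reproduce it.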

@codecov-commenter commented Jul 9, 2024

⚠️ Please install the Codecov GitHub app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

Attention: Patch coverage is 87.09677% with 4 lines in your changes missing coverage. Please review.

Project coverage is 50.77%. Comparing base (4f7bf9e) to head (99c6178).
Report is 284 commits behind head on main.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| guidance/chat.py | 87.09% | 4 Missing ⚠️ |

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

❗ There is a different number of reports uploaded between BASE (4f7bf9e) and HEAD (99c6178). Click for more details.

HEAD has 12 uploads less than BASE
| Flag | BASE (4f7bf9e) | HEAD (99c6178) |
| --- | --- | --- |
| | 16 | 4 |
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #947      +/-   ##
==========================================
- Coverage   56.45%   50.77%   -5.68%     
==========================================
  Files          63       63              
  Lines        4793     4823      +30     
==========================================
- Hits         2706     2449     -257     
- Misses       2087     2374     +287     

☔ View full report in Codecov by Sentry.

@microdev1 mentioned this pull request Nov 11, 2024
@xruifan (Contributor) commented Nov 13, 2024

> Just thinking out loud a bit... But could structured state (in some form -- not necessarily the rough implementation in my open PR) allow us to call tokenizer.apply_chat_template directly, rather than "simulating" it like this?

Hi @hudson-ai, I would like to understand your approach. Would you mind pointing me to the relevant part of the code in your open PR, or giving me an example? Thank you very much.

fan

4 participants