-
Notifications
You must be signed in to change notification settings - Fork 108
Fix: Added check the type of tokenizer.json's model_merges #987
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
Mistral |
|
Gpt-2
|
awesome, thanks for this! could you add a unit test here as well to ensure expected functionality: https://github.com/microsoft/onnxruntime-extensions/tree/main/test? |
|
/azp run onnxruntime-extensions.CI |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
Using phi-4 as an example |
|
The following Functions is only called when |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use test/data/phi-4-base or test/data/phi-4-mini-reasoning, we'd like to minimize tokenizer files checked into the repo; (also, FYI onnxruntime-extensions only needs the tokenizer.json and tokenizer_config.json files to load tokenizers).
|
Commenter does not have sufficient privileges for PR 987 in repo microsoft/onnxruntime-extensions |
|
Hi @sayanshaw24 |
|
/azp run onnxruntime-extensions.CI |
|
Azure Pipelines successfully started running 1 pipeline(s). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge the latest code .Please rerun the pipeline
|
/azp where |
|
Azure DevOps orgs getting events for this repository: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sayanshaw24 pr please merge my PR
Modern Tokenizer store merge's value as list of lists of strings. But models like GPT-2 or gemma-2 store as a list of strings
Ex Mistral
Added a check for this fix