
Conversation

2015aroras
Contributor

@2015aroras 2015aroras commented Sep 15, 2025

This PR adds the upcoming Olmo 3. The main architectural differences from Olmo 2 are:

  • Sliding window attention is used for 3 out of 4 layers. RoPE scaling is not applied to the sliding window attention layers (a sketch of the per-layer pattern follows below).

Since the architecture is very similar to Olmo 2, this PR opts to merge Olmo 3 changes into the Olmo 2 implementation (similar to vllm-project/vllm#24534). I can create a separate Olmo 3 implementation instead if preferred.
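
For illustration, here is a minimal Python sketch of what the 3-of-4 layer pattern described above could look like. The assumption that the last layer in each group of four uses full attention, and the names `n_layers` and `is_sliding`, are mine for this example only; they are not taken from the PR's actual implementation.

```python
# Hypothetical sketch of a 3-of-4 sliding-window layer pattern.
# Assumption: the last layer in each group of four uses full attention,
# the other three use sliding window attention (not confirmed by the PR).
n_layers = 32  # example value only

is_sliding = [(i + 1) % 4 != 0 for i in range(n_layers)]

for i, sliding in enumerate(is_sliding):
    if sliding:
        # Sliding window attention layer: RoPE scaling (e.g. YaRN) is not applied.
        attn_type, rope_scaling = "sliding_window", False
    else:
        # Full attention layer: RoPE scaling applies as configured.
        attn_type, rope_scaling = "full", True
    print(f"layer {i:2d}: {attn_type:15s} rope_scaling={rope_scaling}")
```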

@github-actions github-actions bot added the python python script changes label Sep 15, 2025
@2015aroras 2015aroras marked this pull request as ready for review September 15, 2025 20:08
@2015aroras
Contributor Author

2015aroras commented Sep 15, 2025

I used the model conversion example for testing. Below are the results for shanearora/2025-sep-a-base-model in bf16, modified to have YaRN RoPE scaling enabled.

📈 METRICS
==============================
MSE (Mean Squared Error):     1.592396e-02
Reference Variance:           6.831117e+00
NMSE:                         2.331092e-03
Max Absolute Error:           0.438750
Mean Absolute Error:          0.116665
NMSE (dB):                    -26.32 dB

🎯 INTERPRETATION
==============================
👍 Good match

📋 GUIDANCE
==============================
👍 GOOD: Your GGML conversion is working well.
   Small differences are likely due to precision/quantization.

📚 NMSE BENCHMARKS
==============================
✅ RESULT: PASS (NMSE = 2.33e-03)

Also, below are the results for allenai/OLMo-2-0425-1B in fp32.

📈 METRICS
==============================
MSE (Mean Squared Error):     1.594746e-03
Reference Variance:           9.219801e+00
NMSE:                         1.729697e-04
Max Absolute Error:           0.168732
Mean Absolute Error:          0.033951
NMSE (dB):                    -37.62 dB

🎯 INTERPRETATION
==============================
👍 Very good match

📋 GUIDANCE
==============================
✅ EXCELLENT: Your GGML conversion is working very well!
   The differences are negligible for practical use.

📚 NMSE BENCHMARKS
==============================
✅ RESULT: PASS (NMSE = 1.73e-04)
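
For readers unfamiliar with the metric: NMSE here is the MSE normalized by the variance of the reference logits, and NMSE (dB) is 10·log10(NMSE), which is consistent with the numbers above (e.g. 1.59e-02 / 6.83 ≈ 2.33e-03 ≈ -26.3 dB). A minimal numpy sketch, with placeholder arrays rather than the tool's actual data:

```python
import numpy as np

# Placeholder logits for illustration: a reference (HF) output and a
# "converted" output with small perturbations standing in for GGUF results.
rng = np.random.default_rng(0)
ref_logits = rng.standard_normal((4, 32000)).astype(np.float32)
test_logits = ref_logits + 0.01 * rng.standard_normal((4, 32000)).astype(np.float32)

mse = np.mean((test_logits - ref_logits) ** 2)   # Mean Squared Error
ref_var = np.var(ref_logits)                     # Reference Variance
nmse = mse / ref_var                             # Normalized MSE
nmse_db = 10.0 * np.log10(nmse)                  # NMSE in dB

print(f"MSE: {mse:.6e}  NMSE: {nmse:.6e}  NMSE (dB): {nmse_db:.2f} dB")
print(f"Max Absolute Error:  {np.max(np.abs(test_logits - ref_logits)):.6f}")
print(f"Mean Absolute Error: {np.mean(np.abs(test_logits - ref_logits)):.6f}")
```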

2015aroras and others added 2 commits September 15, 2025 15:13
@pwilkin
Collaborator

pwilkin commented Sep 16, 2025

@2015aroras What tool are you using to compare the conversion?

@2015aroras
Contributor Author

@pwilkin I am using the model conversion tools inside this repo. They were created to help verify that HF to llama.cpp conversions are accurate. The logs above are from the model logits verification step.
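
As a rough illustration of where the reference side of that comparison comes from (not the repo's actual verification script, which may differ), one could capture HF logits with transformers like this; the prompt and model id are placeholders:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model id; the results above used shanearora/2025-sep-a-base-model
# (bf16) and allenai/OLMo-2-0425-1B (fp32).
model_id = "allenai/OLMo-2-0425-1B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)
model.eval()

inputs = tokenizer("Hello, world", return_tensors="pt")
with torch.no_grad():
    # Reference logits to compare against the converted GGUF model's logits.
    ref_logits = model(**inputs).logits

print(ref_logits.shape)
```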

@pwilkin
Collaborator

pwilkin commented Sep 16, 2025

Ah, that's nice, I haven't used that specific one yet :)

@2015aroras
Contributor Author

2015aroras commented Sep 16, 2025

All the check failures seem to be unrelated to this change. Before merging master again, a different iOS check was failing instead. So IMO this is ready to merge.

@CISC
Collaborator

CISC commented Sep 16, 2025

All the check failures seem to be unrelated to this change. Before merging master again, a different iOS check was failing instead. So IMO this is ready to merge.

Yes, sorry for the delay, just a minor cosmetic change and we'll merge. :)

Co-authored-by: Sigbjørn Skjæret <[email protected]>
@CISC CISC merged commit 85286f3 into ggml-org:master Sep 17, 2025
52 checks passed
angt pushed a commit to angt/llama.cpp that referenced this pull request Sep 17, 2025
* Add HF to gguf conversion logic for Olmo3

* Add Olmo3 implementation

* Update rope comment

* Fix indentation

Co-authored-by: Sigbjørn Skjæret <[email protected]>

* Apply suggestion from @CISC

Co-authored-by: Sigbjørn Skjæret <[email protected]>

---------

Co-authored-by: Sigbjørn Skjæret <[email protected]>