Introduce Fused MoE to Nomic MoE #717

kozistr · 2025-09-09T14:35:16Z

What does this PR do?

related to #596

I've finished testing the fused MoE kernel, which was originally implemented by @EricLBuehler. (thanks!)

The fused MoE implementation lives in this repository: https://github.com/kozistr/candle-moe. (I adapted Eric's baseline to work with the Nomic MoE version.)

Main Changes

Nomic MoE model
- topk_softmax
- fused MoE
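
For reviewers less familiar with these two pieces, here is a minimal pure-Python sketch of what a top-k softmax router and a naive MoE forward pass compute for a single token. It is not the candle-moe kernel: the names `topk_softmax`, `moe_forward`, and the toy `experts` are hypothetical, and it assumes the softmax is taken over all experts with no renormalization of the selected weights, which may differ from the actual Nomic MoE configuration.

```python
import math


def topk_softmax(logits, k):
    """Return (weights, indices) of the k most probable experts for one token."""
    if not 0 < k <= len(logits):
        raise ValueError("k must be between 1 and len(logits)")
    # Numerically stable softmax over all experts.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick the k largest probabilities; ties resolve to the lower expert index.
    indices = sorted(range(len(probs)), key=lambda i: (-probs[i], i))[:k]
    # Assumption: the selected weights are NOT renormalized to sum to 1.
    weights = [probs[i] for i in indices]
    return weights, indices


def moe_forward(x, router_logits, experts, k):
    """Naive MoE for one token: weighted sum of the selected experts' outputs."""
    weights, indices = topk_softmax(router_logits, k)
    out = [0.0] * len(x)
    for w, i in zip(weights, indices):
        y = experts[i](x)
        out = [o + w * v for o, v in zip(out, y)]
    return out


# Hypothetical toy experts: each one scales its input by a constant.
experts = [lambda v, s=s: [s * t for t in v] for s in (1.0, 2.0, 3.0, 4.0)]

router_logits = [0.0, 2.0, 1.0, -1.0]
weights, indices = topk_softmax(router_logits, k=2)
output = moe_forward([1.0, 1.0], router_logits, experts, k=2)
print(indices, weights, output)
```

A fused kernel generally aims to produce the same result as this loop while doing the routing and expert computation in fewer GPU launches.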

I've also verified that it produces (almost) identical results to the naive implementation.
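
To make "(almost) identical" concrete: a different order of floating-point operations can change the last bits of a result, so an equivalence check usually uses a tolerance rather than exact equality. The sketch below is only an illustration. The helpers `allclose` and `max_abs_diff`, the tolerance values, and the sample numbers are all assumptions, not the check or the data used in this PR.

```python
def allclose(a, b, rtol=1e-5, atol=1e-6):
    """True if |x - y| <= atol + rtol * |y| for every pair of elements."""
    if len(a) != len(b):
        return False
    return all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two flat lists."""
    return max(abs(x - y) for x, y in zip(a, b))


# Made-up stand-ins for the two outputs: the first element differs only
# by floating-point rounding (0.1 + 0.2 is not exactly 0.3).
naive = [0.1 + 0.2, 1.0, -3.5]
fused = [0.3, 1.0, -3.5]

exactly_equal = naive == fused
close_enough = allclose(naive, fused)
print(exactly_equal, close_enough, max_abs_diff(naive, fused))
```

Reporting the maximum absolute difference alongside the pass/fail result makes it easier to judge whether a tolerance is too loose for the dtype in use.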

Honestly, I haven't yet run extensive benchmarks across multiple settings due to time and resource constraints :(, but I've observed a latency improvement in wall-clock time. (This still needs more verification, along with precise kernel-level timing.)
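
As a rough illustration of the wall-clock measurement mentioned above, here is a minimal timing harness with warm-up runs and a median over repeated iterations. The `bench` helper and the CPU `workload` are hypothetical and unrelated to the actual benchmark setup. For GPU code, the device would also need to be synchronized before each timestamp, and precise kernel timing would require CUDA events or a profiler, which this sketch does not cover.

```python
import statistics
import time


def bench(fn, warmup=3, iters=20):
    """Return (median, best) wall-clock seconds for fn over `iters` timed runs."""
    # Warm-up runs are executed but not timed.
    for _ in range(warmup):
        fn()
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples), min(samples)


def workload():
    # Placeholder CPU workload standing in for a model forward pass.
    return sum(i * i for i in range(10_000))


median_s, best_s = bench(workload)
print(f"median {median_s * 1e6:.1f} us, best {best_s * 1e6:.1f} us")
```

The median is less sensitive to occasional slow runs than the mean, which matters when only a few iterations are affordable.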

Also, I'm very new to CUDA programming, so any feedback or suggestions would be greatly appreciated :)

Before submitting

- This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
- Did you read the contributor guideline?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
- Did you write any new necessary tests? If applicable, did you include or update the insta snapshots?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

@Narsil OR @alvarobartt

alvarobartt · 2025-09-09T16:02:10Z

Hey @kozistr, thanks for this PR! I'll try to run some benchmarks on my end in order to compare both solutions!

Also, do you think it would make sense to add this to https://github.com/huggingface/candle-extensions instead of a separate repository? I'm asking because it might be easier to maintain in the long term that way. cc @ivarflakstad too, as per the latter!

Thanks again, I'll come back to you once I've tested + reviewed @kozistr 🤗

kozistr · 2025-09-09T16:10:20Z

> Hey @kozistr, thanks for this PR! I'll try to run some benchmarks on my end in order to compare both solutions!
>
> Also, do you think it would make sense to add this to https://github.com/huggingface/candle-extensions instead of a separate repository? I'm asking because it might be easier to maintain in the long term that way. cc @ivarflakstad too, as per the latter!
>
> Thanks again, I'll come back to you once I've tested + reviewed @kozistr 🤗

Hi @alvarobartt! I also think it'd be much better to add the MoE kernel to the candle-extensions repository!

Actually, I previously opened a PR there with an incomplete, only partially working version of the MoE kernel, so this would be a good time to refresh that PR with the new implementation 🤗

I'll get back to you when I'm ready to reopen the PR to candle-extensions! Thanks for your suggestion :)

alvarobartt · 2025-09-09T16:16:59Z

Hey @kozistr thanks for flagging, I missed that! Thanks for the work, and please do let us know if there's anything other than reviews that we can do to help 🤗

kozistr · 2025-09-10T12:38:04Z

> Hey @kozistr thanks for flagging, I missed that! Thanks for the work, and please do let us know if there's anything other than reviews that we can do to help 🤗

Thanks for your help :) I'll definitely get back to you if I need anything 🤗

Btw, I've just opened a PR at candle-extensions! Could you please review it if you have the bandwidth? Thanks!

kozistr added 9 commits September 8, 2025 21:07

- build(deps): candle-moe (3d290ce)
- feature: fused moe kernel (ed96f76)
- update: nomic (e505350)
- fix: weight name (dd9be80)
- update: fused moe (7aa38f7)
- fix: experts mlp (d00a1a1)
- refactor: fused moe to nomic (da9064f)
- update: fused moe (4eeb1b7)
- Merge branch 'main' into feature/fused-moe-kernel (e5919ab)

kozistr mentioned this pull request Sep 9, 2025

Introduce topk_softmax kernel from the MoE kernel huggingface/candle-extensions#2

Closed

kozistr mentioned this pull request Sep 10, 2025

Introduce fused MoE kernel huggingface/candle-extensions#5

Open
