
Conversation


@kozistr commented Jun 13, 2025

Note

I'm working on this in my repository, candle-moe. Currently, only the topk_softmax kernel is verified; the other kernels (moe_sum, moe_align_block_size) still need proper testing, and moe_wna16_gemm is under heavy development.
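For context, verification here presumably boils down to comparing the kernel output against a naive CPU reference: a per-row softmax over the expert logits followed by top-k selection. The sketch below assumes the kernel takes router logits of shape [num_tokens, num_experts]; it is illustrative Rust only, not the actual candle-moe API.

```rust
/// Naive CPU reference for topk_softmax: a numerically stable softmax per
/// row of expert logits, then the k largest (expert index, probability)
/// pairs. A test could compare the CUDA kernel's output against this on
/// random inputs. (Sketch only; not taken from candle-moe.)
fn topk_softmax_reference(
    logits: &[f32],    // flattened [num_tokens, num_experts]
    num_experts: usize,
    top_k: usize,
) -> Vec<Vec<(usize, f32)>> {
    logits
        .chunks(num_experts)
        .map(|row| {
            // numerically stable softmax: subtract the row max before exp
            let max = row.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
            let exps: Vec<f32> = row.iter().map(|&x| (x - max).exp()).collect();
            let sum: f32 = exps.iter().sum();

            // pair each probability with its expert index and keep the k largest
            let mut probs: Vec<(usize, f32)> =
                exps.iter().map(|&e| e / sum).enumerate().collect();
            probs.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
            probs.truncate(top_k);
            probs
        })
        .collect()
}
```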

Actually, it's my first time dealing with CUDA and Rust FFI, so there may be some rough edges.

I pinned the candle version to 0.8.0 because of cudarc. IIRC, the latest candle release (0.9.0) depends on cudarc 0.16, which introduces some API changes (e.g. device_ptr).

Please feel free to leave any comments or feedback :)

Performance

I haven't profiled its performance with GPU event timers, only wall-clock time. It looks like a large improvement (roughly 4x to 10x faster, depending on the environment) over the naive candle implementation.
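For reference, wall-clock timing here means something like the sketch below: plain std::time::Instant around repeated calls, with the kernel invocation left as a placeholder closure. A proper benchmark would instead use CUDA event timers and synchronize the device around the measured window, since wall-clock time also includes launch and host-side overhead.

```rust
use std::time::Instant;

/// Rough wall-clock timing of a GPU op. This also measures launch overhead
/// and any host-side work, which is why GPU event timers would give a more
/// precise picture. (Sketch only; device synchronization is not shown.)
fn time_it<F: FnMut()>(label: &str, iters: u32, mut f: F) {
    // warm-up to exclude one-time kernel loading / allocation costs
    f();
    // NOTE: a real benchmark would synchronize the device here and again
    // after the loop so pending kernels don't leak across the window.
    let start = Instant::now();
    for _ in 0..iters {
        f();
    }
    let elapsed = start.elapsed();
    println!("{label}: {:?} / iter", elapsed / iters);
}
```

The warm-up call matters because the first invocation typically includes kernel loading or JIT compilation, which would otherwise dominate the measurement.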

Reviewer(s)

@Narsil

@kozistr (Author) commented Sep 9, 2025

huggingface/text-embeddings-inference#717

I will open another PR to add the fully working MoE kernel.

@kozistr closed this on Sep 9, 2025
@kozistr deleted the feature/topk-softmax-kernel branch on September 10, 2025