Introduce topk_softmax kernel from the MoE kernel #2
Note
I'm working on this in my repository, candle-moe. Currently, only the topk_softmax kernel is verified; the other kernels (moe_sum, moe_align_block_size) still need proper testing, and moe_wna16_gemm is under heavy development. This is my first time dealing with CUDA and Rust FFI, so there may be gaps.
I pinned the candle version to 0.8.0 because of cudarc. IIRC, the latest version of candle (0.9.0) depends on cudarc 0.16, which has some API changes (e.g. device_ptr).
Please feel free to leave any comments or feedback :)
Performance
I haven't profiled it with GPU event timers yet; I have only measured wall-clock time. It looks like a big improvement (10x or 4x faster, depending on the environment) over the naive candle implementation.
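For anyone reviewing without MoE context, here is a minimal CPU sketch of what a top-k softmax over router logits computes for a single token. It is only an illustration in plain Rust, not the kernel in this PR and not the naive candle implementation used for the comparison. The function name topk_softmax_ref is hypothetical, and it assumes the selected weights are returned without renormalization, which may differ from the actual kernel.

```rust
/// Hypothetical CPU reference (std only) for one token's router logits.
/// Returns the k largest softmax probabilities and their expert indices,
/// ordered by descending probability.
/// Assumption: the selected weights are NOT renormalized to sum to 1.
fn topk_softmax_ref(logits: &[f32], k: usize) -> (Vec<f32>, Vec<usize>) {
    assert!(k <= logits.len(), "k must not exceed the number of experts");

    // Numerically stable softmax: subtract the max before exponentiating.
    let max = logits.iter().copied().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = logits.iter().map(|&x| (x - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let probs: Vec<f32> = exps.iter().map(|&e| e / sum).collect();

    // Sort expert indices by descending probability and keep the first k.
    // The sort is stable, so ties keep the lower expert index first.
    let mut idx: Vec<usize> = (0..probs.len()).collect();
    idx.sort_by(|&a, &b| probs[b].total_cmp(&probs[a]));
    idx.truncate(k);

    let weights: Vec<f32> = idx.iter().map(|&i| probs[i]).collect();
    (weights, idx)
}
```

Calling topk_softmax_ref(&[1.0, 3.0, 0.5, 2.0], 2) selects experts 1 and 3. A fused GPU kernel would do the same per token across the whole batch in one launch, which is where the speedup over a chain of separate tensor ops would be expected to come from.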
Reviewer(s)
@Narsil