Commit b74da07

Add trust remote code for Kimi-K2-Thinking (#116)

Signed-off-by: Mohammad Miadh Angkad <[email protected]>

1 parent cf5bdee · commit b74da07

File tree: 1 file changed, +3 −1 lines


moonshotai/Kimi-K2-Think.md

Lines changed: 3 additions & 1 deletion
@@ -28,6 +28,7 @@ run tensor-parallel like this:
 
 ```bash
 vllm serve moonshotai/Kimi-K2-Thinking \
+    --trust-remote-code \
     --tensor-parallel-size 8 \
     --enable-auto-tool-choice \
     --tool-call-parser kimi_k2 \
@@ -114,6 +115,7 @@ vLLM supports [Decode Context Parallel](https://docs.vllm.ai/en/latest/serving/c
 ```bash
 
 vllm serve moonshotai/Kimi-K2-Thinking \
+    --trust-remote-code \
     --tensor-parallel-size 8 \
     --decode-context-parallel-size 8 \
     --enable-auto-tool-choice \
@@ -217,4 +219,4 @@ You can observe from the service startup logs that the kv cache token number has
 ```
 
 
-Enabling DCP delivers strong advantages (43% faster token generation, 26% higher throughput) with minimal drawbacks (marginal median latency improvement). We recommend reading our [DCP DOC](https://docs.vllm.ai/en/latest/serving/context_parallel_deployment.html#decode-context-parallel) and trying out DCP in your LLM workloads.
+Enabling DCP delivers strong advantages (43% faster token generation, 26% higher throughput) with minimal drawbacks (marginal median latency improvement). We recommend reading our [DCP DOC](https://docs.vllm.ai/en/latest/serving/context_parallel_deployment.html#decode-context-parallel) and trying out DCP in your LLM workloads.
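For reference, a minimal sketch of what the two serve commands look like after this change, assembled only from the flags visible in the hunks above (the recipe file may pass additional flags outside this diff context):

```bash
# Tensor-parallel serving (first hunk). --trust-remote-code lets vLLM load the
# model's custom code shipped with the Hugging Face repository.
vllm serve moonshotai/Kimi-K2-Thinking \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --enable-auto-tool-choice \
    --tool-call-parser kimi_k2

# Decode Context Parallel serving (second hunk), with the same flag added.
vllm serve moonshotai/Kimi-K2-Thinking \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --decode-context-parallel-size 8 \
    --enable-auto-tool-choice
```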
