Commit b74da07

Add trust remote code for Kimi-K2-Thinking (#116)

Signed-off-by: Mohammad Miadh Angkad <[email protected]>

1 parent cf5bdee · commit b74da07

File tree: 1 file changed, +3 −1 lines


moonshotai/Kimi-K2-Think.md

Lines changed: 3 additions & 1 deletion
@@ -28,6 +28,7 @@ run tensor-parallel like this:
 
 ```bash
 vllm serve moonshotai/Kimi-K2-Thinking \
+    --trust-remote-code \
     --tensor-parallel-size 8 \
     --enable-auto-tool-choice \
     --tool-call-parser kimi_k2 \
@@ -114,6 +115,7 @@ vLLM supports [Decode Context Parallel](https://docs.vllm.ai/en/latest/serving/c
 ```bash
 
 vllm serve moonshotai/Kimi-K2-Thinking \
+    --trust-remote-code \
     --tensor-parallel-size 8 \
     --decode-context-parallel-size 8 \
     --enable-auto-tool-choice \
@@ -217,4 +219,4 @@ You can observe from the service startup logs that the kv cache token number has
 ```
 
 
-Enabling DCP delivers strong advantages (43% faster token generation, 26% higher throughput) with minimal drawbacks (marginal median latency improvement). We recommend reading our [DCP DOC](https://docs.vllm.ai/en/latest/serving/context_parallel_deployment.html#decode-context-parallel) and trying out DCP in your LLM workloads.
+Enabling DCP delivers strong advantages (43% faster token generation, 26% higher throughput) with minimal drawbacks (marginal median latency improvement). We recommend reading our [DCP DOC](https://docs.vllm.ai/en/latest/serving/context_parallel_deployment.html#decode-context-parallel) and trying out DCP in your LLM workloads.
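For reference, a minimal sketch of what the two serve commands look like after this change, assembled only from the flags visible in the hunks above (the recipe file may pass additional flags outside this diff context):

```bash
# Tensor-parallel serving (first hunk). --trust-remote-code lets vLLM load the
# model's custom code shipped with the Hugging Face repository.
vllm serve moonshotai/Kimi-K2-Thinking \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --enable-auto-tool-choice \
    --tool-call-parser kimi_k2

# Decode Context Parallel serving (second hunk), with the same flag added.
vllm serve moonshotai/Kimi-K2-Thinking \
    --trust-remote-code \
    --tensor-parallel-size 8 \
    --decode-context-parallel-size 8 \
    --enable-auto-tool-choice
```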
