
Commit 049167e

remove benchmarking script to reduce notes
1 parent 2c378c2 commit 049167e

File tree

1 file changed (+0, -4 lines)

blog/2025-09-24_kvcache-wins-you-can-see.md

Lines changed: 0 additions & 4 deletions
@@ -63,10 +63,6 @@ vLLM takes this further with **Automatic Prefix Caching**: it intelligently iden
 
 In a simple test sending a request with a \~10,000 token prompt to a Qwen/Qwen3-32B instance a second time, time-to-first-token drops from **4.3 seconds** to just **0.6 seconds**.
 
-:::info vLLM benchmark script
-For deeper analysis, see the vLLM [`benchmark_prefix_caching.py`](https://github.com/vllm-project/vllm/blob/65a5910ce35f889740bddb2e19dad35c83278873/benchmarks/benchmark_prefix_caching.py) script.
-:::
-
 ## **Prefix Reuse in Practical Use Cases**
 
 The power of vLLM's caching isn't theoretical; it directly maps to the structure of the most common and valuable LLM workloads. By understanding this pattern, we can see exactly what's at stake when serving in production.
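
The paragraph kept in the diff describes a cold-versus-warm time-to-first-token comparison. As a rough sketch of how such a measurement can be reproduced by hand against a vLLM server's OpenAI-compatible endpoint, the snippet below times a streaming request with a long shared prefix twice; the base URL, model name, prompt construction, and token counts are illustrative assumptions and are not taken from the removed `benchmark_prefix_caching.py` script.

```python
# Rough TTFT probe for vLLM's Automatic Prefix Caching (illustrative only).
# Assumes a vLLM server exposing its OpenAI-compatible API at
# http://localhost:8000/v1 and serving Qwen/Qwen3-32B; the URL, model name,
# and prompt sizing below are assumptions, not part of the original post.
import time

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Build a long shared prefix (very roughly on the order of 10,000 tokens).
long_prefix = "You are a meticulous analyst. " * 2000


def time_to_first_token(prompt: str) -> float:
    """Send a streaming chat request and return seconds until the first content chunk."""
    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="Qwen/Qwen3-32B",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        max_tokens=32,
    )
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            return time.perf_counter() - start
    return time.perf_counter() - start


prompt = long_prefix + "\n\nSummarize the instructions above in one sentence."
cold = time_to_first_token(prompt)  # first request: prefill runs in full
warm = time_to_first_token(prompt)  # second request: cached prefix can be reused
print(f"cold TTFT: {cold:.2f}s, warm TTFT: {warm:.2f}s")
```

On the second call, Automatic Prefix Caching lets the server skip the prefill work for the shared prefix, so the warm TTFT should be markedly lower; the absolute numbers depend on hardware, model, and server configuration.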
