
Commit 1d38367: "minor fix"

1 parent 08cc02e

File tree

1 file changed (+4, -4)


blog/2025-09-24_kvcache-wins-you-can-see.md

Lines changed: 4 additions & 4 deletions
@@ -271,19 +271,19 @@ The journey of llm-d reflects a broader shift in how we think about LLM inferenc
 By moving from AI-blind routing to a precise, KV-cache aware strategy, **we can unlock order-of-magnitude improvements in latency and throughput on the exact same hardware**. The well-lit path of precise prefix-cache awareness offers a tested, benchmarked solution to make your distributed deployments dramatically more efficient.
 
 :::tip Choosing the Right Strategy
-The optimal scheduler depends on the complexity of the workload. Below is a hierarchy of common strategies, where each level addresses the limitations of the one before it.
+The optimal scheduler depends on the complexity of the workload. Below is a hierarchy of supported strategies, where each level addresses the limitations of the one before it.
 
 * **1. Random/Round-Robin Scheduling**
   This simplest approach works well for symmetric workloads where all requests have similar computational costs and minimal cache reuse.
-  * **Its Limitation:** It creates load imbalance when workloads are asymmetric
+  * **Limitation:** It creates load imbalance when workloads are asymmetric.
 
 * **2. Load-Aware Scheduling**
   The necessary next step for asymmetric workloads. By routing requests based on serving capacity, it prevents overload and improves resource utilization.
-  * **Its Limitation:** It cannot exploit caching opportunities, resulting in redundant computation.
+  * **Limitation:** It cannot exploit caching opportunities, resulting in redundant computation.
 
 * **3. Approximate Prefix-Cache Scheduling**
   This strategy introduces cache-awareness for workloads with predictable prefix reuse. It is effective when its estimations of the cache state are reliable.
-  * **Its Limitation:** The estimations can become stale at high scale or with dynamic workloads, leading to suboptimal routing.
+  * **Limitation:** The estimations can become stale at high scale or with dynamic workloads, leading to suboptimal routing.
 
 * **4. Precise Prefix-Cache Aware Scheduling**
   In production environments with tight SLOs, this is the most effective strategy for dynamic, high-scale workloads where maximizing the cache-hit ratio is a primary performance driver.
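The strategy hierarchy in the patched text can be sketched in a few lines of code. The following is a minimal, hypothetical Python sketch — `Pod`, `prefix_hit_blocks`, `pick_pod`, and the scoring weights are illustrative names and values, not llm-d's actual API — showing how a precise prefix-cache aware scorer (strategy 4) might combine cache-hit length with a load signal:

```python
from dataclasses import dataclass, field

@dataclass
class Pod:
    """Illustrative model of one serving replica (not llm-d's actual API)."""
    name: str
    queue_depth: int  # in-flight requests, used as the load signal
    cached_prefixes: set = field(default_factory=set)  # block hashes known to be cached

def prefix_hit_blocks(prompt_blocks: list, pod: Pod) -> int:
    """Count how many *leading* prompt blocks are already cached on the pod.
    Only a contiguous prefix counts: KV-cache reuse stops at the first miss."""
    hits = 0
    for block in prompt_blocks:
        if block not in pod.cached_prefixes:
            break
        hits += 1
    return hits

def pick_pod(prompt_blocks: list, pods: list, cache_weight: float = 2.0) -> Pod:
    """Strategy 4 sketch: score = weighted cache hits minus load; highest wins.
    Strategies 1-3 fall out as special cases: ignore both terms (random),
    keep only -queue_depth (load-aware), or feed an approximate rather than
    precise view of cached_prefixes (approximate prefix-cache)."""
    return max(
        pods,
        key=lambda p: cache_weight * prefix_hit_blocks(prompt_blocks, p) - p.queue_depth,
    )

# Example: pod "a" is busier but holds the prompt's first two blocks in cache.
pods = [
    Pod("a", queue_depth=3, cached_prefixes={"h1", "h2"}),
    Pod("b", queue_depth=0),
]
print(pick_pod(["h1", "h2", "h3"], pods).name)  # → "a": cache hits outweigh load
```

A purely load-aware scheduler would have sent the request to the idle pod "b"; the cache term is what tips the decision toward "a", avoiding recomputation of the shared prefix.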
