
Commit 4a7d096

Update links in blog post for precise prefix-cache aware scheduling and intelligent inference scheduling to use relative paths for improved navigation.
Signed-off-by: Pete Cheslock <[email protected]>
1 parent: 0539927

File tree: 1 file changed (+6, -6 lines)

blog/2025-09-24_kvcache-wins-you-can-see.md: 6 additions & 6 deletions
```diff
@@ -16,9 +16,9 @@ authors:
 tags: [blog, updates, llm-d]
 ---
 
-The llm-d project provides a series of “well-lit paths” \- tested, benchmarked solutions for deploying large language models in production. Our first path, [**Intelligent Inference Scheduling**](https://llm-d.ai/blog/intelligent-inference-scheduling-with-llm-d), established a baseline for AI-aware routing by balancing both cluster load and prefix-cache affinities. The default configuration for that path uses an *approximate* method for the latter, predicting cache locality based on request traffic.
+The llm-d project provides a series of “well-lit paths” \- tested, benchmarked solutions for deploying large language models in production. Our first path, [**Intelligent Inference Scheduling**](/blog/intelligent-inference-scheduling-with-llm-d), established a baseline for AI-aware routing by balancing both cluster load and prefix-cache affinities. The default configuration for that path uses an *approximate* method for the latter, predicting cache locality based on request traffic.
 
-This blog illuminates a more advanced and powerful path: [**precise prefix-cache aware scheduling**](https://github.com/llm-d/llm-d/blob/main/guides/precise-prefix-cache-aware/README.md).
+This blog illuminates a more advanced and powerful path: [**precise prefix-cache aware scheduling**](/docs/guide/Installation/precise-prefix-cache-aware).
 
 We take a deep dive into the next generation of this feature, which moves beyond prediction and gives the scheduler direct introspection into distributed vLLM caches. This precision is key to maximizing cache hit rates and achieving a new level of performance and maximizing cost-efficiency in your distributed deployments.
 
```
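Every change in this commit follows the same pattern: an absolute `https://llm-d.ai/...` URL becomes a root-relative site path (and, in one case above, the guide link moves from GitHub to the site docs). A root-relative link resolves against whichever host serves the page, so the same markdown works on the production site, a preview deploy, or a local build. A minimal Go sketch of that resolution rule, using the standard library's `net/url`; the localhost base is an illustrative assumption, not llm-d infrastructure:

```go
package main

import (
	"fmt"
	"net/url"
)

func main() {
	// A root-relative reference, as used in the updated links.
	ref, err := url.Parse("/blog/intelligent-inference-scheduling-with-llm-d")
	if err != nil {
		panic(err)
	}
	// The same reference resolves against whichever host serves the page.
	// The localhost base is illustrative only.
	for _, base := range []string{"https://llm-d.ai", "http://localhost:3000"} {
		b, err := url.Parse(base)
		if err != nil {
			panic(err)
		}
		fmt.Println(b.ResolveReference(ref))
	}
	// Output:
	// https://llm-d.ai/blog/intelligent-inference-scheduling-with-llm-d
	// http://localhost:3000/blog/intelligent-inference-scheduling-with-llm-d
}
```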

```diff
@@ -154,7 +154,7 @@ With an accurate, real-time global view of the cache, the scheduler can now perf
 
 This scorer provides a strong **stickiness** signal, scheduling requests to maximize the probability of a cache hit. However, relying solely on stickiness can create new problems, like sending a stream of requests to an already overloaded pod while others sit idle.
 
-Therefore, the final routing decision isn't based on this score alone. As detailed in our previous post on the [**Intelligent Inference Scheduling**](https://llm-d.ai/blog/intelligent-inference-scheduling-with-llm-d) well-lit path, the KV-cache affinity score is combined with distributive, load-aware scores, creating a balanced decision.
+Therefore, the final routing decision isn't based on this score alone. As detailed in our previous post on the [**Intelligent Inference Scheduling**](/blog/intelligent-inference-scheduling-with-llm-d) well-lit path, the KV-cache affinity score is combined with distributive, load-aware scores, creating a balanced decision.
 
 ## **Performance Results**
 
```
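For context on the paragraph changed above: the balanced decision it describes can be pictured as a weighted blend of a per-pod cache-affinity score and a load-aware score. The sketch below is illustrative only; the type, field names, and weights are assumptions made for this illustration, not the llm-d scheduler's actual interfaces:

```go
package main

import "fmt"

// pod holds the two per-replica signals the post describes: a prefix-cache
// affinity score (stickiness) and a load score (higher = less loaded).
// The field names and value ranges are hypothetical.
type pod struct {
	name     string
	affinity float64 // estimated KV-cache hit probability, 0..1
	load     float64 // 1 - normalized queue depth, 0..1
}

// pick blends stickiness with load awareness, so a cache-hot but
// overloaded pod does not win on affinity alone.
func pick(pods []pod, wAffinity, wLoad float64) pod {
	best, bestScore := pods[0], -1.0
	for _, p := range pods {
		score := wAffinity*p.affinity + wLoad*p.load
		if score > bestScore {
			best, bestScore = p, score
		}
	}
	return best
}

func main() {
	pods := []pod{
		{"pod-a", 0.9, 0.1}, // cache-hot but heavily loaded
		{"pod-b", 0.5, 0.9}, // lukewarm cache, mostly idle
	}
	fmt.Println(pick(pods, 1.0, 1.0).name) // pod-b: load breaks the tie
}
```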

```diff
@@ -274,10 +274,10 @@ By moving from AI-blind routing to a precise, KV-cache aware strategy, **we can
 
 The llm-d project thrives on community contributions, and there are many ways to get involved:
 
-* Explore the llm-d Community Quickstart Guide → [Start here](https://llm-d.ai/docs/community) to learn more about getting involved in the llm-d project.
-* Join our Slack → [Get your invite](https://llm-d.ai/slack) and connect with maintainers and contributors
+* Explore the llm-d Community Quickstart Guide → [Start here](/docs/community) to learn more about getting involved in the llm-d project.
+* Join our Slack → [Get your invite](/slack) and connect with maintainers and contributors
 * Explore the code → Browse our [GitHub organization](https://github.com/llm-d) and find issues that interest you
-* Attend meetings → All meetings are open\! Add our [public calendar](https://llm-d.ai/docs/community#public-meeting-calendar) and join discussions\`
+* Attend meetings → All meetings are open\! Add our [public calendar](/docs/community#public-meeting-calendar) and join discussions\`
 
 ---
 
```
