
Commit 93904d6

Fixing broken links. (#62)

* Fixing broken links. This should fix the broken links from Jun 25th's check.
* Fixed MDX and relative paths for docs.
* Removed endings of files. Not sure if this will actually work...
* Fixing the architecture links: fixed some whitespace and fixed the architecture link.
* We need "I" not "i": Docusaurus can redirect, but it still gives a 404 first.
* Removed the hard-coded llm-d.ai URL. This will work relative now.

Signed-off-by: JJ Asghar <[email protected]>

1 parent b3a3188 · commit 93904d6

File tree: 6 files changed, +43 −46 lines

docs/architecture/00_architecture.mdx

Lines changed: 5 additions & 4 deletions
````diff
@@ -21,7 +21,7 @@ import VideoEmbed from '@site/src/components/VideoEmbed';
 
 
 ![llm-d Architecture](../assets/images/llm-d-arch-simplified.svg)
-
+
 
 Key features of `llm-d` include:
 
@@ -54,12 +54,13 @@ See the guided experience with our [quickstart](https://github.com/llm-d/llm-d-d
 llm-d repo is a metaproject with subcomponents that can be cloned individually.
 
 To clone all the components:
+
+```bash
+git clone --recurse-submodules https://github.com/llm-d/llm-d.git
 ```
-git clone --recurse-submodules https://github.com/llm-d/llm-d.git
-```
 
 **Tip**
-As a customization example, see [here](https://github.com/llm-d/llm-d/tree/dev) a template for adding a scheduler scorer.
+As a customization example, see [here](https://github.com/llm-d/llm-d/tree/dev) a template for adding a scheduler scorer.
 
 ## Releases
 
````

docs/architecture/Components/06_kv-cache.md

Lines changed: 23 additions & 23 deletions
````diff
@@ -6,24 +6,24 @@ sidecar_label: KV-Cache Manager
 
 ## Introduction
 
-LLM inference can be computationally expensive due to the sequential nature of token generation.
-KV-caching plays a critical role in optimizing this process. By storing previously computed key and value attention vectors,
-KV-cache reuse avoids redundant computations during inference, significantly reducing latency and resource consumption.
-This is particularly beneficial for long context multi-turn conversations or Agentic (&RAG) applications where
-previously computed information can be leveraged effectively.
+LLM inference can be computationally expensive due to the sequential nature of token generation.
+KV-caching plays a critical role in optimizing this process. By storing previously computed key and value attention vectors,
+KV-cache reuse avoids redundant computations during inference, significantly reducing latency and resource consumption.
+This is particularly beneficial for long context multi-turn conversations or Agentic (&RAG) applications where
+previously computed information can be leveraged effectively.
 Efficient KV-cache management and routing are essential for scaling LLM inference and delivering a responsive user experience.
 
 llmd-kv-cache-manager is a pluggable KV-cache Manager for KV-cache Aware Routing in LLM serving platforms.
 
 This initial work will expand in capacity as development continues.
-
+
 See the [docs folder in the repository](https://github.com/llm-d/llm-d-kv-cache-manager/blob/main/docs/README.md) for more information on goals, architecture and more.
 
 ## Goals
 
 The KV-Cache-Manager is designed to connect high-level serving-stack goals with concrete system capabilities through a layered objective structure:
 
-- **Improve user experience**
+- **Improve user experience**
   - By reducing Time-To-First-Token (TTFT)
   - Enabled through higher KVCache hit rates and reduced tensor transfers
   - Supported by smart routing and distributed cache availability
@@ -38,26 +38,26 @@ The KV-Cache-Manager is designed to connect high-level serving-stack goals with
 - User session duplication/migration for true and seamless load balancing
 
 
-## Vision
+## Vision
 
-This goal structure above is shaped by our vision for emerging use cases like RAG and agentic workflows,
-which involve heavy context-reuse across sessions and instances.
-Shared documents, tool prompts, and workflow steps create overlapping token streams that benefit significantly from
-cross-instance KVCache coordination.
+This goal structure above is shaped by our vision for emerging use cases like RAG and agentic workflows,
+which involve heavy context-reuse across sessions and instances.
+Shared documents, tool prompts, and workflow steps create overlapping token streams that benefit significantly from
+cross-instance KVCache coordination.
 
-To implement this vision, the KVCache-Manager incorporates proactive cache placement, session duplication,
-and cluster-level cache APIs - bridging gaps in current serving stacks where KVCache management and utilization is
+To implement this vision, the KVCache-Manager incorporates proactive cache placement, session duplication,
+and cluster-level cache APIs - bridging gaps in current serving stacks where KVCache management and utilization is
 not yet treated as a first-class concern.
 
 ## Architecture Overview
 
-The code defines a [kvcache.Indexer](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/pkg/kv-cache/indexer.go) module that efficiently maintains a global view of KV-cache states and localities.
+The code defines a [kvcache.Indexer](https://github.com/llm-d/llm-d-kv-cache-manager/blob/main/pkg/kvcache/indexer.go) module that efficiently maintains a global view of KV-cache states and localities.
 In the current state of vLLM, the only available information on KV-cache availability is that of the offloaded tensors to KV-cache Engines via the Connector API.
 
 The `kvcache.Indexer` module is a pluggable Go package designed for use by orchestrators to enable KV-cache-aware scheduling decisions.
 
 ```mermaid
-graph
+graph
   subgraph Cluster
     Router
     subgraph KVCacheManager[KV-cache Manager]
@@ -75,7 +75,7 @@ graph
   Router -->|"Score(prompt, ModelName, relevantPods)"| kvcache.Indexer
   kvcache.Indexer -->|"{Pod to Scores map}"| Router
   Router -->|Route| vLLMNode
-
+
   kvcache.Indexer -->|"FindLongestTokenizedPrefix(prompt, ModelName) -> tokens"| PrefixStore
   PrefixStore -->|"DigestPromptAsync"| PrefixStore
   kvcache.Indexer -->|"GetPodsForKeys(tokens) -> {KVBlock keys to Pods} availability map"| KVBlockToPodIndex
@@ -88,7 +88,7 @@ This overview greatly simplifies the actual architecture and combines steps acro
 
 
 
-## Architecture
+## Architecture
 
 For even more a detailed architecture, refer to the [architecture](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/docs/architecture.md) document.
 
@@ -98,7 +98,7 @@ The architecture is designed to efficiently maintain a global view of KV-cache s
 
 ```mermaid
 sequenceDiagram
-  participant U as User
+  participant U as User
   participant KVI as kvcache.Indexer
   box
   participant KVBS as KVBlockScorer
@@ -130,7 +130,7 @@ KVI->>PS: 2. FindLongestTokenizedPrefix(prompt, ModelName)
 end
 PS->>KVI: 2.2 Tokens of longest prefix
 
-# get block keys
+# get block keys
 KVI->>TPR: 3 GetBlockKeys(tokens, ModelName)
 TPR->>KVI: 3.1 BlockKeys
 
@@ -207,8 +207,8 @@ Future enhancements will enable the `llm-d-kv-cache-manager` component to proces
 
 ## Examples
 
-- [KV-cache Indexer](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/examples/kv-cache-index/):
+- [KV-cache Indexer](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/examples/kv-cache-index/):
   - A reference implementation of using the `kvcache.Indexer` module.
-- [KV-cache Aware Scorer](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/examples/kv-cache-aware-scorer/):
-  - A reference implementation of integrating the `kvcache.Indexer` module in
+- [KV-cache Aware Scorer](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/examples/kv-cache-aware-scorer/):
+  - A reference implementation of integrating the `kvcache.Indexer` module in
   [llm-d-inference-scheduler](https://github.com/llm-d/llm-d-inference-scheduler) in a KV-cache aware scorer.
````
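
For orientation, here is a minimal, self-contained Go sketch of the scoring flow the diagrams in this diff describe: a router asks an indexer to score candidate pods for a prompt and routes to the warmest one. The `Indexer` interface, `toyIndexer` type, and fixed-size character "blocks" are hypothetical stand-ins for the real tokenizer-based prefix store and KV-block-to-pod index; this is not the actual `llm-d-kv-cache-manager` API.

```go
package main

import "fmt"

// Indexer mirrors the documented call shape: Score(prompt, ModelName,
// relevantPods) returning a pod-to-score map. Everything here is an
// illustrative assumption, not the real package surface.
type Indexer interface {
	Score(prompt, modelName string, relevantPods []string) (map[string]int, error)
}

// toyIndexer stands in for the PrefixStore + KVBlockToPodIndex pair: it
// chunks the prompt into fixed-size character "blocks" (a crude proxy for
// tokenized KV-block keys) and counts how many blocks each pod already holds.
type toyIndexer struct {
	blockToPods map[string][]string // KV-block key -> pods caching that block
}

var _ Indexer = (*toyIndexer)(nil)

func (idx *toyIndexer) Score(prompt, modelName string, relevantPods []string) (map[string]int, error) {
	relevant := make(map[string]bool, len(relevantPods))
	for _, p := range relevantPods {
		relevant[p] = true
	}
	scores := make(map[string]int, len(relevantPods))
	const blockSize = 16 // stand-in for the tokenizer's KV-block granularity
	for i := 0; i+blockSize <= len(prompt); i += blockSize {
		key := fmt.Sprintf("%s/%s", modelName, prompt[i:i+blockSize])
		for _, pod := range idx.blockToPods[key] {
			if relevant[pod] {
				scores[pod]++ // more cached blocks -> warmer pod
			}
		}
	}
	return scores, nil
}

func main() {
	idx := &toyIndexer{blockToPods: map[string][]string{
		"llama/You are a helpfu": {"pod-a"}, // pod-a already caches this prefix block
	}}
	scores, _ := idx.Score("You are a helpful assistant.", "llama", []string{"pod-a", "pod-b"})
	fmt.Println(scores) // a router would route to the highest-scoring pod
}
```

Running this prints `map[pod-a:1]`: only pod-a holds a cached block for the prompt's prefix, so a KV-cache-aware router would prefer it.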

docs/guide/guide.md

Lines changed: 4 additions & 5 deletions
````diff
@@ -10,16 +10,15 @@ The user guide is organized in sections to help you get started with llm-d and t
 
 llm-d is an open source project providing distributed inferencing for GenAI runtimes on any Kubernetes cluster. Its highly performant, scalable architecture helps reduce costs through a spectrum of hardware efficiency improvements. The project prioritizes ease of deployment+use as well as SRE needs + day 2 operations associated with running large GPU clusters.
 
-[For more information check out the Architecture Documentation](/architecture/00_architecture.md)
+[For more information check out the Architecture Documentation](./architecture/architecture)
 
 ## Installation: Start here to minimize your frustration
 
 This guide will walk you through the steps to install and deploy the llm-d quickstart demo on a Kubernetes cluster.
 
+- [Prerequisites](./guide/Installation/prerequisites) Make sure your compute resources and system configuration are ready
+- [Quick Start](./guide/Installation/quickstart) If your resources are ready, "kick the tires" with our Quick Start!
+
 
-- [Prerequisites](./Installation/prerequisites.md) Make sure your compute resources and system configuration are ready
-- [Quick Start](./Installation/quickstart.md) If your resources are ready, "kick the tires" with our Quick Start!
-
 
 
-
````
docs/intro.md

Lines changed: 3 additions & 3 deletions
````diff
@@ -6,7 +6,7 @@ sidebar_position: 1
 
 ## Fork and Clone the Repository
 
-Fork the repositorty of the site on our [GitHub](https://https://github.com/RedHatOfficial/static-website-template-for-ospo).
+Fork the repositorty of the site on our [GitHub](https://https://github.com/RedHatOfficial/static-website-template-for-ospo).
 
 Clone your fork to your own computer then using this command and replacing the link with your own HTTPS clone link found underneath the **Code** button (see image below):
 
@@ -35,12 +35,12 @@ Run the server itself with this command:
 npm start
 ```
 
-The `cd` command changes the directory you're working with.
+The `cd` command changes the directory you're working with.
 
 The `npm start` command builds your website locally and serves it through a development server, ready for you to view at http://localhost:3000/.
 
 Open `docs/intro.md` (this page) and edit some lines: the site **reloads automatically** and displays your changes.
 
 ## Using Docusaurus
 
-Docusaurus is a static-site generator that convetakes Markdown files and donverts them into a documentation website. It is written in JavaScript however no prior knowledge of JavaScript is needed to edit the website.
+Docusaurus is a static-site generator that convetakes Markdown files and donverts them into a documentation website. It is written in JavaScript however no prior knowledge of JavaScript is needed to edit the website.
````

src/components/Install/index.js

Lines changed: 3 additions & 3 deletions
````diff
@@ -22,7 +22,7 @@ export default function Install() {
   alt="1. "
   src={require('/docs/assets/counting-01.png').default}
 ></img>
-<a className="link" href="docs/guide/installation/prerequisites#compute">
+<a className="link" href="docs/guide/Installation/prerequisites#compute">
   Check the Prerequisites
 </a>
 </h3>
@@ -34,7 +34,7 @@ export default function Install() {
   alt="2. "
   src={require('/docs/assets/counting-02.png').default}
 ></img>
-<a className="link" href="docs/guide/installation/quickstart#install">
+<a className="link" href="docs/guide/Installation/quickstart#install">
   Run the Quickstart
 </a>
 </h3>
@@ -46,7 +46,7 @@ export default function Install() {
   alt="3. "
   src={require('/docs/assets/counting-03.png').default}
 ></img>
-<a className="link" href="docs/guide/installation/quickstart#examples">Explore llm-d!</a></h3>
+<a className="link" href="docs/guide/Installation/quickstart#examples">Explore llm-d!</a></h3>
 {/* -------------------------------------------------------------------------- */}
 <a className="static-button install-button button-link" href="docs/guide">
   Complete install methods here
````

src/components/Welcome/index.js

Lines changed: 5 additions & 8 deletions
````diff
@@ -14,16 +14,13 @@ export default function Welcome() {
 <h2 className="welcome-h2">
   llm-d: a Kubernetes-native high-performance distributed LLM inference framework
 </h2>
-
+
 
 <div className="button-group">
 <a className="static-button button-link" href="docs/architecture/architecture">
   Architecture
 </a>
-<a
-  className="static-button button-link"
-  href="docs/guide/Installation/Prerequisites"
->
+<a className="static-button button-link" href="docs/guide/Installation/prerequisites" >
   {/* Link to install page on the docs */}
   Installation
 </a>
@@ -35,13 +32,13 @@ export default function Welcome() {
 
 <div className="hidden-for-mobile">
 <p>
-  llm-d is a well-lit path for anyone to serve at scale,
-  with the fastest time-to-value and competitive performance per dollar,
+  llm-d is a well-lit path for anyone to serve at scale,
+  with the fastest time-to-value and competitive performance per dollar,
   for most models across a diverse and comprehensive set of hardware accelerators.
 </p>
 
 </div>
-
+
 </div>
 </div>
 );
````
