9 changes: 5 additions & 4 deletions docs/architecture/00_architecture.mdx
@@ -21,7 +21,7 @@ import VideoEmbed from '@site/src/components/VideoEmbed';


![llm-d Architecture](../assets/images/llm-d-arch-simplified.svg)


Key features of `llm-d` include:

@@ -54,12 +54,13 @@ See the guided experience with our [quickstart](https://github.com/llm-d/llm-d-d
The llm-d repo is a metaproject with subcomponents that can be cloned individually.

To clone all the components:

```bash
git clone --recurse-submodules https://github.com/llm-d/llm-d.git
```

**Tip**
As a customization example, see [this template](https://github.com/llm-d/llm-d/tree/dev) for adding a scheduler scorer.
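For a sense of what a scorer computes (the `Pod` type and function below are purely illustrative, not the scheduler's real plugin interface — follow the linked template for that), a scorer maps candidate pods to per-pod scores that the scheduler combines when picking a target:

```go
// Illustrative only: a stand-alone sketch of what a "scorer" computes.
// A real scorer plugs into the llm-d-inference-scheduler plugin framework;
// its actual interface is defined by the template linked above.
package main

import "fmt"

// Pod is a hypothetical view of a candidate endpoint.
type Pod struct {
	Name        string
	QueueLength int
}

// queueScorer favors pods with shorter request queues, returning a score per pod.
func queueScorer(pods []Pod) map[string]float64 {
	scores := make(map[string]float64, len(pods))
	for _, p := range pods {
		scores[p.Name] = 1.0 / float64(1+p.QueueLength)
	}
	return scores
}

func main() {
	pods := []Pod{{"pod-a", 4}, {"pod-b", 0}, {"pod-c", 2}}
	fmt.Println(queueScorer(pods)) // pod-b scores highest and would be preferred
}
```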

## Releases

46 changes: 23 additions & 23 deletions docs/architecture/Components/06_kv-cache.md
@@ -6,24 +6,24 @@ sidecar_label: KV-Cache Manager

## Introduction

LLM inference can be computationally expensive due to the sequential nature of token generation.
KV-caching plays a critical role in optimizing this process. By storing previously computed key and value attention vectors,
KV-cache reuse avoids redundant computations during inference, significantly reducing latency and resource consumption.
This is particularly beneficial for long-context, multi-turn conversations or agentic (and RAG) applications where
previously computed information can be leveraged effectively.
Efficient KV-cache management and routing are essential for scaling LLM inference and delivering a responsive user experience.
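To make the reuse mechanism concrete, here is a minimal sketch of chained block hashing — the kind of scheme prefix caching relies on — written in Go with an illustrative block size and hash function; it is not the actual vLLM or llm-d implementation, only a demonstration that requests sharing a prefix map to identical cache keys:

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// blockSize is illustrative; real engines choose their own block granularity.
const blockSize = 16

// blockKeys derives one key per full block of token IDs, chaining each block's
// hash with its parent's, so two prompts that share a prefix produce the same
// leading key sequence and can reuse the same cached KV blocks.
func blockKeys(tokens []uint32) []string {
	var keys []string
	parent := []byte{}
	for start := 0; start+blockSize <= len(tokens); start += blockSize {
		h := sha256.New()
		h.Write(parent)
		for _, t := range tokens[start : start+blockSize] {
			var buf [4]byte
			binary.LittleEndian.PutUint32(buf[:], t)
			h.Write(buf[:])
		}
		parent = h.Sum(nil)
		keys = append(keys, fmt.Sprintf("%x", parent[:8]))
	}
	return keys
}

func main() {
	shared := make([]uint32, 48) // stand-in for a shared system prompt or document
	for i := range shared {
		shared[i] = uint32(i)
	}
	a := append(append([]uint32{}, shared...), 101, 102) // request A: shared prefix + its own turn
	b := append(append([]uint32{}, shared...), 201, 202) // request B: same prefix, different turn
	fmt.Println(blockKeys(a))
	fmt.Println(blockKeys(b)) // the three prefix-block keys match request A's
}
```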

`llm-d-kv-cache-manager` is a pluggable KV-cache manager for KV-cache-aware routing in LLM serving platforms.

This initial work will expand in scope and capability as development continues.

See the [docs folder in the repository](https://github.com/llm-d/llm-d-kv-cache-manager/blob/main/docs/README.md) for more information on goals, architecture and more.

## Goals

The KV-Cache-Manager is designed to connect high-level serving-stack goals with concrete system capabilities through a layered objective structure:

- **Improve user experience**
- By reducing Time-To-First-Token (TTFT)
- Enabled through higher KVCache hit rates and reduced tensor transfers
- Supported by smart routing and distributed cache availability
@@ -38,26 +38,26 @@ The KV-Cache-Manager is designed to connect high-level serving-stack goals with
- User session duplication/migration for true and seamless load balancing


## Vision

The goal structure above is shaped by our vision for emerging use cases like RAG and agentic workflows,
which involve heavy context-reuse across sessions and instances.
Shared documents, tool prompts, and workflow steps create overlapping token streams that benefit significantly from
cross-instance KVCache coordination.

To implement this vision, the KVCache-Manager incorporates proactive cache placement, session duplication,
and cluster-level cache APIs - bridging gaps in current serving stacks where KVCache management and utilization are
not yet treated as first-class concerns.

## Architecture Overview

The code defines a [kvcache.Indexer](https://github.com/llm-d/llm-d-kv-cache-manager/blob/main/pkg/kvcache/indexer.go) module that efficiently maintains a global view of KV-cache states and localities.
In the current state of vLLM, the only available information on KV-cache availability concerns the tensors offloaded to KV-cache engines via the Connector API.

The `kvcache.Indexer` module is a pluggable Go package designed for use by orchestrators to enable KV-cache-aware scheduling decisions.
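A rough sketch of how an orchestrator might consume this package is given below; the `Indexer` interface, `Score` signature, and pod names are illustrative assumptions rather than the module's actual API (see the repository for the real one), but the shape mirrors the flow in the diagram that follows: score candidate pods for a prompt, then route to the best-scoring pod.

```go
// Hypothetical sketch of KV-cache-aware routing on top of an indexer.
// The Indexer interface, Score signature, and pod names are illustrative
// assumptions; the real API lives in the llm-d-kv-cache-manager repository.
package main

import (
	"context"
	"fmt"
)

// Indexer models the role of kvcache.Indexer: score candidate pods by how much
// of the prompt's KV-cache they are believed to hold.
type Indexer interface {
	Score(ctx context.Context, prompt, modelName string, pods []string) (map[string]float64, error)
}

// pickPod routes to the highest-scoring pod, falling back to the first candidate.
func pickPod(ctx context.Context, idx Indexer, prompt, model string, pods []string) (string, error) {
	if len(pods) == 0 {
		return "", fmt.Errorf("no candidate pods")
	}
	scores, err := idx.Score(ctx, prompt, model, pods)
	if err != nil {
		return "", err
	}
	best, bestScore := pods[0], -1.0
	for _, p := range pods {
		if s, ok := scores[p]; ok && s > bestScore {
			best, bestScore = p, s
		}
	}
	return best, nil
}

// staticIndexer is a stub used only to make the example runnable.
type staticIndexer map[string]float64

func (s staticIndexer) Score(context.Context, string, string, []string) (map[string]float64, error) {
	return s, nil
}

func main() {
	pods := []string{"pod-a", "pod-b", "pod-c"}
	idx := staticIndexer{"pod-b": 0.9, "pod-c": 0.4} // pretend pod-b holds most of the prefix
	target, _ := pickPod(context.Background(), idx, "shared system prompt...", "example-model", pods)
	fmt.Println("route to:", target) // route to: pod-b
}
```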

```mermaid
graph
subgraph Cluster
Router
subgraph KVCacheManager[KV-cache Manager]
@@ -75,7 +75,7 @@ graph
Router -->|"Score(prompt, ModelName, relevantPods)"| kvcache.Indexer
kvcache.Indexer -->|"{Pod to Scores map}"| Router
Router -->|Route| vLLMNode

kvcache.Indexer -->|"FindLongestTokenizedPrefix(prompt, ModelName) -> tokens"| PrefixStore
PrefixStore -->|"DigestPromptAsync"| PrefixStore
kvcache.Indexer -->|"GetPodsForKeys(tokens) -> {KVBlock keys to Pods} availability map"| KVBlockToPodIndex
@@ -88,7 +88,7 @@ This overview greatly simplifies the actual architecture and combines steps acro



## Architecture

For a more detailed architecture, refer to the [architecture](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/docs/architecture.md) document.

@@ -98,7 +98,7 @@ The architecture is designed to efficiently maintain a global view of KV-cache s

```mermaid
sequenceDiagram
participant U as User
participant KVI as kvcache.Indexer
box
participant KVBS as KVBlockScorer
@@ -130,7 +130,7 @@ KVI->>PS: 2. FindLongestTokenizedPrefix(prompt, ModelName)
end
PS->>KVI: 2.2 Tokens of longest prefix

# get block keys
KVI->>TPR: 3 GetBlockKeys(tokens, ModelName)
TPR->>KVI: 3.1 BlockKeys

@@ -207,8 +207,8 @@ Future enhancements will enable the `llm-d-kv-cache-manager` component to proces

## Examples

- [KV-cache Indexer](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/examples/kv-cache-index/):
  - A reference implementation of using the `kvcache.Indexer` module.
- [KV-cache Aware Scorer](https://github.com/llm-d/llm-d-kv-cache-manager/tree/main/examples/kv-cache-aware-scorer/):
  - A reference implementation of integrating the `kvcache.Indexer` module in
[llm-d-inference-scheduler](https://github.com/llm-d/llm-d-inference-scheduler) in a KV-cache aware scorer.
9 changes: 4 additions & 5 deletions docs/guide/guide.md
@@ -10,16 +10,15 @@ The user guide is organized in sections to help you get started with llm-d and t

llm-d is an open source project providing distributed inferencing for GenAI runtimes on any Kubernetes cluster. Its highly performant, scalable architecture helps reduce costs through a spectrum of hardware efficiency improvements. The project prioritizes ease of deployment and use, as well as SRE needs and day-2 operations associated with running large GPU clusters.

[For more information, check out the Architecture Documentation](./architecture/architecture)

## Installation: Start here to minimize your frustration

This guide will walk you through the steps to install and deploy the llm-d quickstart demo on a Kubernetes cluster.

- [Prerequisites](./Installation/prerequisites.md) Make sure your compute resources and system configuration are ready
- [Quick Start](./Installation/quickstart.md) If your resources are ready, "kick the tires" with our Quick Start!




6 changes: 3 additions & 3 deletions docs/intro.md
@@ -6,7 +6,7 @@ sidebar_position: 1

## Fork and Clone the Repository

Fork the repository of the site on our [GitHub](https://github.com/RedHatOfficial/static-website-template-for-ospo).

Clone your fork to your own computer using this command, replacing the link with your own HTTPS clone link found underneath the **Code** button (see image below):

@@ -35,12 +35,12 @@ Run the server itself with this command:
npm start
```

The `cd` command changes the directory you're working with.

The `npm start` command builds your website locally and serves it through a development server, ready for you to view at http://localhost:3000/.

Open `docs/intro.md` (this page) and edit some lines: the site **reloads automatically** and displays your changes.

## Using Docusaurus

Docusaurus is a static-site generator that takes Markdown files and converts them into a documentation website. It is written in JavaScript; however, no prior knowledge of JavaScript is needed to edit the website.
6 changes: 3 additions & 3 deletions src/components/Install/index.js
@@ -22,7 +22,7 @@ export default function Install() {
alt="1. "
src={require('/docs/assets/counting-01.png').default}
></img>
<a className="link" href="docs/guide/installation/prerequisites#compute">
<a className="link" href="docs/guide/Installation/prerequisites#compute">
Check the Prerequisites
</a>
</h3>
@@ -34,7 +34,7 @@ export default function Install() {
alt="2. "
src={require('/docs/assets/counting-02.png').default}
></img>
<a className="link" href="docs/guide/installation/quickstart#install">
<a className="link" href="docs/guide/Installation/quickstart#install">
Run the Quickstart
</a>
</h3>
@@ -46,7 +46,7 @@ export default function Install() {
alt="3. "
src={require('/docs/assets/counting-03.png').default}
></img>
<a className="link" href="docs/guide/installation/quickstart#examples">Explore llm-d!</a></h3>
<a className="link" href="docs/guide/Installation/quickstart#examples">Explore llm-d!</a></h3>
{/* -------------------------------------------------------------------------- */}
<a className="static-button install-button button-link" href="docs/guide">
Complete install methods here
13 changes: 5 additions & 8 deletions src/components/Welcome/index.js
@@ -14,16 +14,13 @@ export default function Welcome() {
<h2 className="welcome-h2">
llm-d: a Kubernetes-native high-performance distributed LLM inference framework
</h2>


<div className="button-group">
<a className="static-button button-link" href="docs/architecture/architecture">
Architecture
</a>
<a className="static-button button-link" href="docs/guide/Installation/prerequisites" >
{/* Link to install page on the docs */}
Installation
</a>
@@ -35,13 +35,13 @@

<div className="hidden-for-mobile">
<p>
llm-d is a well-lit path for anyone to serve at scale,
with the fastest time-to-value and competitive performance per dollar,
for most models across a diverse and comprehensive set of hardware accelerators.
</p>

</div>

</div>
</div>
);