93 changes: 48 additions & 45 deletions blog/2025-05-20_News.md

Large diffs are not rendered by default.

82 changes: 42 additions & 40 deletions blog/2025-05-20_announce.md

Large diffs are not rendered by default.

7 changes: 5 additions & 2 deletions blog/2025-06-03_week_1_round_up.md
@@ -5,7 +5,7 @@ slug: llm-d-week-1-round-up

authors:
- petecheslock

tags: [news]

hide_table_of_contents: false
@@ -54,4 +54,7 @@ We use Google Groups to share architecture diagrams and other content. Please jo
* [LinkedIn](http://linkedin.com/company/llm-d)
* [@\_llm\_d\_](https://twitter.com/_llm_d_)
* [r/llm\_d](https://www.reddit.com/r/llm_d/)
* YouTube - coming soon

<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
async src="//llm-d-tracker.asgharlabs.io/count.js"></script>
3 changes: 3 additions & 0 deletions blog/2025-06-25_community_update.md
@@ -75,3 +75,6 @@ There are many ways to contribute to llm-d:
6. Check out our [Contributor Guidelines](https://llm-d.ai/docs/community/contribute) to start contributing code

We're looking forward to hearing from you and working together to make llm-d even better!

<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
async src="//llm-d-tracker.asgharlabs.io/count.js"></script>
63 changes: 33 additions & 30 deletions blog/2025-07-29_llm-d-v0.2-our-first-well-lit-paths.md
@@ -27,8 +27,8 @@ Our deployments have been tested and benchmarked on recent GPUs, such as H200 no

We’ve defined and improved three well-lit paths that form the foundation of this release:

* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/inference-scheduling): precise prefix-cache-aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system let teams see immediate latency wins while still customizing scheduling behavior for their workloads and infrastructure.
* [**P/D disaggregation**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/pd-disaggregation): support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.
* [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.
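
For a rough sense of how a well-lit path is selected and tuned, the fragment below sketches a Helm-values-style configuration for the inference-scheduling path. The field names are invented for illustration and do not reflect the actual chart schema; the linked quickstart contains the real values files.

```yaml
# Hypothetical values fragment for the inference-scheduling path.
# Field names are illustrative only; consult the quickstart's values files
# for the real schema.
modelService:
  model: meta-llama/Llama-3.1-8B-Instruct   # placeholder model
  replicas: 4
scheduler:
  profile: default            # a configurable scheduling profile
  scorers:
    - prefixCacheAware        # prefer pods likely to already hold the prefix in KV cache
    - loadAware               # spread load to protect tail latency
```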

All of these scenarios are reproducible: we provide reference hardware specs, workloads, and benchmarking harness support, so others can evaluate, reproduce, and extend these benchmarks easily. This also reflects improvements to our deployment tooling and benchmarking framework, a new "machinery" that allows users to set up, test, and analyze these scenarios consistently.
@@ -47,9 +47,9 @@ We've refactored the deployer into a Helm-first, modular structure, splitting ch

The path for Prefill/Decode (P/D) disaggregation and multi-node DP/EP MoE deployments is now more clearly defined and tested. This work integrates and optimizes key [vLLM 0.10.0](https://github.com/vllm-project/vllm/releases/tag/v0.10.0) kernel improvements, including DeepGEMM and CUTLASS for expert-parallel compute and the PPLX and DeepEP kernels, along with fixes and optimizations to intra- and inter-node communication in multi-node scenarios. We now include:

* Kubernetes-native deployment recipes now support API servers per DP rank for one-pod-per-rank placement, enhancing scalability and control
* Helm charts are updated to support LeaderWorkerSet (LWS) for multi-node setups and direct one-pod-per-DP-rank deployments
* Optimized intra-node communication by enabling DeepEP to use cuda\_ipc efficiently
* Enhanced NIXL+UCX performance, with fixes and optimizations that significantly reduce inter-node communication overhead, particularly for long context workloads
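
To make the one-pod-per-DP-rank pattern easier to picture, here is a minimal LeaderWorkerSet sketch. It is an illustration under stated assumptions, not our quickstart manifest: the resource layout follows the LeaderWorkerSet v1 API, while the image, model name, and vLLM flags are placeholders, and the per-rank coordination settings a real deployment needs are omitted.

```yaml
# Illustrative sketch of one-pod-per-DP-rank serving with LeaderWorkerSet.
# Image, model, and flags are placeholders; per-rank coordination flags
# (rank index, DP coordinator address, ports) are omitted. See the
# wide-ep-lws quickstart for the tested manifests.
apiVersion: leaderworkerset.x-k8s.io/v1
kind: LeaderWorkerSet
metadata:
  name: wide-ep-example
spec:
  replicas: 1                # one leader/worker group
  leaderWorkerTemplate:
    size: 8                  # 8 pods per group -> one pod per DP rank
    workerTemplate:          # a distinct leaderTemplate can also be set
      spec:
        containers:
          - name: vllm
            image: vllm/vllm-openai:latest          # placeholder image
            args:
              - --model=deepseek-ai/DeepSeek-R1     # placeholder model
              - --data-parallel-size=8
              - --enable-expert-parallel
            resources:
              limits:
                nvidia.com/gpu: "1"                 # one GPU per rank
```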

These validated scenarios are backed by benchmark baselines and example deployments via our quickstarts, offering clearer guidance on what works well today. As part of the "well-lit path," we have also identified limitations, including known edge cases around response sizes and failure modes where more work is required.
@@ -84,9 +84,9 @@ Multi-arch support, smaller images, and hardened configurations ensure a reliabl

Here are some key lessons we learned so far in our progress with llm-d:

* **Low-hanging fruit matters.** Targeted optimizations, like reducing KV‑cache transfer overhead between prefill and decode workers and refining prefix‑aware scheduling, delivered significant gains in throughput and tail latency. These quick wins required minimal change but paved the way for the deeper architectural improvements planned in upcoming releases.
* **Using bleeding-edge libraries is hard.** Many key libraries associated with distributed inference are immature. Through our applied experiments in our well-lit paths and in close collaboration with ecosystem partners, we have improved much of the key infrastructure the larger community relies on in real-world conditions.
* **Build on proven paths.** This validates why llm-d exists: to help users avoid discovering these problems themselves, offering reproducible deployments, performance baselines, and extensibility. llm-d focuses on building these paths so our users don’t need to troubleshoot these complex challenges in isolation.
* **Community matters.** Working closely with the NVIDIA Dynamo community, we've tackled NIXL/UCX performance overheads for long context workloads, leading to significant improvements and active upstream contributions.

### Our survey
@@ -99,10 +99,10 @@ Conversational AI (82.9%) and real-time applications (56.1%) stood out as the mo

Today, [llm-d 0.2](https://github.com/llm-d/llm-d/releases/tag/v0.2.0) offers:

* Modular Helm charts and clear deployment workflows.
* Verified support for P/D, DP/EP, pod-per-rank, and heterogeneous GPUs (H200, B200).
* Reproducible performance baselines, now with MoE support.
* New foundations for routing and scheduler extensibility.
* A developer- and researcher-friendly platform with tested examples, and detailed guides on the way.

## A growing community
@@ -111,31 +111,31 @@ The best part of llm-d has been watching the community grow around it. We're thr

Much of the work happens within our seven Special Interest Groups (SIGs), each focused on a key area:

* **Inference Scheduler** – Developing smarter routing and load‑balancing strategies, including KV‑cache‑aware scheduling.
* **P/D Disaggregation** – Advancing phase‑separation strategies to improve resource‑utilization efficiency.
* **KV Disaggregation** – Advancing and optimizing distributed KV‑cache management.
* **Installation** – Streamlining deployment on Kubernetes, from single‑node setups to large multi‑node clusters.
* **Benchmarking** – Building tools to automate performance validation and make scenarios easier to reproduce and extend.
* **Autoscaling** – Adapting resources dynamically based on workload demands.
* **Observability** – Providing deep visibility into system performance and health.

We're also collaborating with other great open-source communities like vLLM, Dynamo, and LMCache. Every one of these groups is open, and we’d love for you to join in. Whether you want to contribute code, share ideas, or just listen in, you are welcome. You can find details for each SIG, including their leaders and meeting times, on [our community page](https://llm-d.ai/docs/community/sigs).

## What's next:

Looking ahead, our community is focusing on these key areas:

* **Core optimizations**
  * TCP-based request dispatch upstream
  * Disaggregation protocol refinements, including possible sidecar removal
  * CPU cache offloading to expand memory capacity
  * KV event awareness baked directly into routing decisions
  * SLO-driven scheduling architecture for predictable performance
* **Benchmarking enhancements:**
  * Expanded reproducibility guides
  * Complete performance validation for core scenarios
* **Developer experience improvements:**
  * Expanded examples for inference gateway and scheduler extensibility
  * Central Helm charts and expanded documentation

See our [roadmap issue](https://github.com/llm-d/llm-d/issues/146) for what's coming next and make your voice heard!
@@ -149,3 +149,6 @@ Community engagement is key to our success:
* [**Join our community calls**](https://red.ht/llm-d-public-calendar) (Wed 12:30pm ET)

Contribute on [GitHub](https://github.com/llm-d), join our community calls and SIGs, and build with us!

<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
async src="//llm-d-tracker.asgharlabs.io/count.js"></script>
2 changes: 2 additions & 0 deletions docs/community/contact_us.md
Expand Up @@ -20,3 +20,5 @@ You can also find us on
- [**LinkedIn:** https://linkedin.com/company/llm-d ](https://linkedin.com/company/llm-d)
- [**X:** https://x.com/\_llm_d\_](https://x.com/_llm_d_)

<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
async src="//llm-d-tracker.asgharlabs.io/count.js"></script>
3 changes: 3 additions & 0 deletions src/components/Welcome/index.js
@@ -37,6 +37,9 @@ export default function Welcome() {
for most models across a diverse and comprehensive set of hardware accelerators.
</p>

<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
async src="//llm-d-tracker.asgharlabs.io/count.js"></script>

</div>

</div>