
Commit a1cca7a

Added "goat tracker"
Added a simple GDPR-compliant tracker so we can see hits and engagement on the main hosted sites. This is ideally a temporary solution until we find a more robust system.

Signed-off-by: JJ Asghar <[email protected]>
1 parent 3f483d5 commit a1cca7a

File tree

7 files changed: +136 −117 lines changed


blog/2025-05-20_News.md

Lines changed: 48 additions & 45 deletions
Large diffs are not rendered by default.

blog/2025-05-20_announce.md

Lines changed: 42 additions & 40 deletions
Large diffs are not rendered by default.

blog/2025-06-03_week_1_round_up.md

Lines changed: 5 additions & 2 deletions
@@ -5,7 +5,7 @@ slug: llm-d-week-1-round-up
 
 authors:
 - petecheslock
-
+
 tags: [news]
 
 hide_table_of_contents: false
@@ -54,4 +54,7 @@ We use Google Groups to share architecture diagrams and other content. Please jo
 * [LinkedIn](http://linkedin.com/company/llm-d)
 * [@\_llm\_d\_](https://twitter.com/_llm_d_)
 * [r/llm\_d](https://www.reddit.com/r/llm_d/)
-* YouTube - coming soon
+* YouTube - coming soon
+
+<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
+async src="//llm-d-tracker.asgharlabs.io/count.js"></script>

blog/2025-06-25_community_update.md

Lines changed: 3 additions & 0 deletions
@@ -75,3 +75,6 @@ There are many ways to contribute to llm-d:
 6. Check out our [Contributor Guidelines](https://llm-d.ai/docs/community/contribute) to start contributing code
 
 We're looking forward to hearing from you and working together to make llm-d even better!
+
+<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
+async src="//llm-d-tracker.asgharlabs.io/count.js"></script>

blog/2025-07-29_llm-d-v0.2-our-first-well-lit-paths.md

Lines changed: 33 additions & 30 deletions
@@ -27,8 +27,8 @@ Our deployments have been tested and benchmarked on recent GPUs, such as H200 no
 
 We’ve defined and improved three well-lit paths that form the foundation of this release:
 
-* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.
-* [**P/D disaggregation**:](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/pd-disaggregation) support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.
+* [**Intelligent inference scheduling over any vLLM deployment**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/inference-scheduling): support for precise prefix-cache aware routing with no additional infrastructure, out-of-the-box load-aware scheduling for better tail latency that “just works”, and a new configurable scheduling profile system enable teams to see immediate latency wins and still customize scheduling behavior for their workloads and infrastructure.
+* [**P/D disaggregation**:](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/pd-disaggregation) support for separating prefill and decode workloads to improve latency and GPU utilization for long-context scenarios.
 * [**Wide expert parallelism for DeepSeek R1 (EP/DP)**](https://github.com/llm-d-incubation/llm-d-infra/tree/main/quickstart/examples/wide-ep-lws): support for large-scale multi-node deployments using expert and data parallelism patterns for MoE models. This includes optimized deployments leveraging NIXL+UCX for inter-node communication, with fixes and improvements to reduce latency, and demonstrates the use of LeaderWorkerSet for Kubernetes-native inference orchestration.
 
 All of these scenarios are reproducible: we provide reference hardware specs, workloads, and benchmarking harness support, so others can evaluate, reproduce, and extend these benchmarks easily. This also reflects improvements to our deployment tooling and benchmarking framework, a new "machinery" that allows users to set up, test, and analyze these scenarios consistently.
@@ -47,9 +47,9 @@ We've refactored the deployer into a Helm-first, modular structure, splitting ch
 
 The path for Prefill/Decode (P/D) disaggregation and multi-node DP/EP MoE deployments is now more clearly defined and tested. This work integrates and optimizes key [vLLM 0.10.0](https://github.com/vllm-project/vllm/releases/tag/v0.10.0) kernel improvements, including DeepGEMM and CUTLASS for expert parallel compute, as well as PPLX and DeepEP kernels and intra- and inter-node communication fixes and optimizations and multi-node scenarios. We now include:
 
-* Kubernetes-native deployment recipes now support API servers per DP rank for one-pod-per-rank placement, enhancing scalability and control
-* Helm charts are updated to support LeaderWorkerSet (LWS) for multi-node setups and direct one-pod-per-DP-rank deployments
-* Optimized intra-node communication by enabling DeepEP to use cuda\_ipc efficiently
+* Kubernetes-native deployment recipes now support API servers per DP rank for one-pod-per-rank placement, enhancing scalability and control
+* Helm charts are updated to support LeaderWorkerSet (LWS) for multi-node setups and direct one-pod-per-DP-rank deployments
+* Optimized intra-node communication by enabling DeepEP to use cuda\_ipc efficiently
 * Enhanced NIXL+UCX performance, with fixes and optimizations that significantly reduce inter-node communication overhead, particularly for long context workloads
 
 These validated scenarios are backed by benchmark baselines and example deployments via our quickstarts, offering clearer guidance on what works well today. As part of the "well-lit path" we have also identified limitations including known edge cases around response sizes and failure modes where more work is required.
@@ -84,9 +84,9 @@ Multi-arch support, smaller images, and hardened configurations ensure a reliabl
 
 Here are some key lessons we learned so far in our progress with llm-d:
 
-* **Low-hanging fruit matters.** Targeted optimizations, like reducing KV‑cache transfer overhead between prefill and decode workers and refining prefix‑aware scheduling, delivered significant gains in throughput and tail latency. These quick wins required minimal change but paved the way for the deeper architectural improvements planned in upcoming releases.
-* **Using bleeding-edge libraries is hard.** Many key libraries associated with distributed inference are immature. Through our applied experiments in our well-lit paths and in close collaboration with ecosystem partners, we have improved much of the key infrastructure the larger community relies on in real-world conditions.
-* **Build on proven paths.** This validates why llm-d exists: to help users avoid discovering these problems themselves, offering reproducible deployments, performance baselines, and extensibility. llm-d focuses on building these paths so our users don’t need to troubleshoot these complex challenges in isolation.
+* **Low-hanging fruit matters.** Targeted optimizations, like reducing KV‑cache transfer overhead between prefill and decode workers and refining prefix‑aware scheduling, delivered significant gains in throughput and tail latency. These quick wins required minimal change but paved the way for the deeper architectural improvements planned in upcoming releases.
+* **Using bleeding-edge libraries is hard.** Many key libraries associated with distributed inference are immature. Through our applied experiments in our well-lit paths and in close collaboration with ecosystem partners, we have improved much of the key infrastructure the larger community relies on in real-world conditions.
+* **Build on proven paths.** This validates why llm-d exists: to help users avoid discovering these problems themselves, offering reproducible deployments, performance baselines, and extensibility. llm-d focuses on building these paths so our users don’t need to troubleshoot these complex challenges in isolation.
 * **Community matters.** Working closely with the NVIDIA Dynamo community, we've tackled NIXL/UCX performance overheads for long context workloads, leading to significant improvements and active upstream contributions.
 
 ### Our survey
@@ -99,10 +99,10 @@ Conversational AI (82.9%) and real-time applications (56.1%) stood out as the mo
 
 Today, [llm-d 0.2](https://github.com/llm-d/llm-d/releases/tag/v0.2.0) offers:
 
-* Modular Helm charts and clear deployment workflows.
-* Verified support for P/D, DP/EP, pod-per-rank, and heterogeneous GPUs (H200, B200).
-* Reproducible performance baselines, now with MoE support.
-* New foundations for routing and scheduler extensibility.
+* Modular Helm charts and clear deployment workflows.
+* Verified support for P/D, DP/EP, pod-per-rank, and heterogeneous GPUs (H200, B200).
+* Reproducible performance baselines, now with MoE support.
+* New foundations for routing and scheduler extensibility.
 * A developer, and researcher-friendly platform with tested examples, with detailed guides on the way.
 
 ## A growing community
@@ -111,31 +111,31 @@ The best part of llm-d has been watching the community grow around it. We're thr
 
 Much of the work happens within our seven Special Interest Groups (SIGs), each focused on a key area:
 
-* **Inference Scheduler** – Developing smarter routing and load‑balancing strategies, including KV‑cache‑aware scheduling.
-* **P/D Disaggregation** – Advancing phase‑separation strategies to improve resource‑utilization efficiency.
-* **KV Disaggregation** – Advancing and optimizing distributed KV‑cache management.
-* **Installation** – Streamlining deployment on Kubernetes, from single‑node setups to large multi‑node clusters.
-* **Benchmarking** – Building tools to automate performance validation and make scenarios easier to reproduce and extend.
-* **Autoscaling** – Adapting resources dynamically based on workload demands.
+* **Inference Scheduler** – Developing smarter routing and load‑balancing strategies, including KV‑cache‑aware scheduling.
+* **P/D Disaggregation** – Advancing phase‑separation strategies to improve resource‑utilization efficiency.
+* **KV Disaggregation** – Advancing and optimizing distributed KV‑cache management.
+* **Installation** – Streamlining deployment on Kubernetes, from single‑node setups to large multi‑node clusters.
+* **Benchmarking** – Building tools to automate performance validation and make scenarios easier to reproduce and extend.
+* **Autoscaling** – Adapting resources dynamically based on workload demands.
 * **Observability** – Providing deep visibility into system performance and health.
 
 We're also collaborating with other great open-source communities like vLLM, Dynamo, and LMCache. Every one of these groups is open, and we’d love for you to join in. Whether you want to contribute code, share ideas, or just listen in, you are welcome. You can find details for each SIG, including their leaders and meeting times, on [our community page](https://llm-d.ai/docs/community/sigs).
 
-## What's next:
+## What's next:
 
 Looking ahead, our community is focusing on these key areas:
 
-* **Core optimizations**
-* TCP-based request dispatch upstream
-* Disaggregation protocol refinements, including possible sidecar removal
-* CPU cache offloading to expand memory capacity
-* KV event awareness baked directly into routing decisions
-* SLO-driven scheduling architecture for predictable performance
-* **Benchmarking enhancements:**
-* Expanded reproducibility guides.
-* Complete performance validation for core scenarios.
-* **Developer experience improvements:**
-* Expanded examples for inference gateway and scheduler extensibility.
+* **Core optimizations**
+* TCP-based request dispatch upstream
+* Disaggregation protocol refinements, including possible sidecar removal
+* CPU cache offloading to expand memory capacity
+* KV event awareness baked directly into routing decisions
+* SLO-driven scheduling architecture for predictable performance
+* **Benchmarking enhancements:**
+* Expanded reproducibility guides.
+* Complete performance validation for core scenarios.
+* **Developer experience improvements:**
+* Expanded examples for inference gateway and scheduler extensibility.
 * Central Helm charts and expanded documentation.
 
 See our [roadmap issue](https://github.com/llm-d/llm-d/issues/146) to see what is coming next and make your voice heard\!
@@ -149,3 +149,6 @@ Community engagement is key to our success:
 * [**Join our community calls**](https://red.ht/llm-d-public-calendar) (Wed 12:30pm ET)
 
 Contribute on [GitHub](https://github.com/llm-d), join our community calls, join the SIGs and build with us\!
+
+<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
+async src="//llm-d-tracker.asgharlabs.io/count.js"></script>

docs/community/contact_us.md

Lines changed: 2 additions & 0 deletions
@@ -20,3 +20,5 @@ You can also find us on
 - [**LinkedIn:** https://linkedin.com/company/llm-d ](https://linkedin.com/company/llm-d)
 - [**X:** https://x.com/\_llm_d\_](https://x.com/_llm_d_)
 
+<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
+async src="//llm-d-tracker.asgharlabs.io/count.js"></script>

src/components/Welcome/index.js

Lines changed: 3 additions & 0 deletions
@@ -37,6 +37,9 @@ export default function Welcome() {
 for most models across a diverse and comprehensive set of hardware accelerators.
 </p>
 
+<script data-goatcounter="https://llm-d-tracker.asgharlabs.io/count"
+async src="//llm-d-tracker.asgharlabs.io/count.js"></script>
+
 </div>
 
 </div>
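
Because the same GoatCounter snippet is pasted into each page above, and the commit message describes this as a temporary approach, one possible follow-up is registering the script once for the whole site. The sketch below is only an illustration, not part of this commit: it assumes the site is a Docusaurus project (suggested by the blog front matter and the src/components layout) and uses Docusaurus's documented `scripts` config option; the file path and surrounding config entries are placeholders.

// docusaurus.config.js — illustrative sketch only, not part of this commit
module.exports = {
  // ...existing site configuration...
  scripts: [
    {
      // GoatCounter's counter script; `async` keeps it off the critical rendering path.
      src: '//llm-d-tracker.asgharlabs.io/count.js',
      async: true,
      // GoatCounter reads the reporting endpoint from this attribute.
      'data-goatcounter': 'https://llm-d-tracker.asgharlabs.io/count',
    },
  ],
};

Registered this way, the tag is injected into every generated page, so removing or replacing the tracker later means changing one config entry rather than editing each blog post and component.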
