
Feature Request: Network Topology Aware Scheduling Support in AIBrix #1730

@JesseStutler


🚀 Feature Description and Motivation

In LLM inference scenarios, model parallelism and distributed inference exchange large volumes of data between nodes, so network throughput becomes a critical bottleneck. Modern datacenters have diverse network architectures (e.g., InfiniBand (IB), RoCE, NVSwitch) with multiple levels of switches, each with different throughput and latency characteristics.

Network Topology Aware Scheduling places the pods of a workload within the network performance domain that offers the highest throughput and lowest latency, which speeds up data exchange during inference.

Therefore, I'm proposing adding Network Topology Aware Scheduling support to AIBrix by integrating with Volcano's network topology aware scheduling features: https://volcano.sh/en/docs/network_topology_aware_scheduling/. This would allow LLM inference workloads to be scheduled within optimal network performance domains, which should improve inference performance.

Use Case

Consider a datacenter with 8 GPU nodes under a 3-tier switch hierarchy:
[Image: 3-tier switch hierarchy connecting 8 GPU nodes through switches S0 to S6]
S0 to S6 are switches. If the scheduler is unaware of the underlying network topology, then in a 1P1D deployment (one Prefill pod and one Decode pod) the worst case is that one pod lands on node0 and the other on node7. Traffic between them must then traverse multiple layers of switches, which lengthens the data exchange path and degrades inference performance. The best case is that both pods are scheduled onto nodes under the same leaf switch, e.g., under S0.
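To make the best and worst cases concrete, here is a minimal Python sketch that computes the lowest switch tier two nodes share. The original figure is not reproduced here, so the exact layout is an assumption: four leaf switches S0 to S3 with two nodes each, two tier-2 switches S4 and S5, and a single tier-3 switch S6. The names `PARENT`, `path_to_root`, and `lowest_common_tier` are illustrative only.

```python
# Assumed layout (the original figure is not available): leaf switches S0-S3
# with two nodes each, tier-2 switches S4-S5, and one tier-3 switch S6.
PARENT = {
    "node0": "S0", "node1": "S0",
    "node2": "S1", "node3": "S1",
    "node4": "S2", "node5": "S2",
    "node6": "S3", "node7": "S3",
    "S0": "S4", "S1": "S4",
    "S2": "S5", "S3": "S5",
    "S4": "S6", "S5": "S6",
}


def path_to_root(name):
    """Return the switches above `name`, ordered from leaf switch to root."""
    path = []
    while name in PARENT:
        name = PARENT[name]
        path.append(name)
    return path


def lowest_common_tier(a, b):
    """Tier (1 = leaf switch) of the lowest switch shared by nodes a and b."""
    switches_above_b = set(path_to_root(b))
    for tier, switch in enumerate(path_to_root(a), start=1):
        if switch in switches_above_b:
            return tier
    raise ValueError("nodes do not share a switch")


print(lowest_common_tier("node0", "node1"))  # best case: same leaf switch
print(lowest_common_tier("node0", "node7"))  # worst case: only the root is shared
```

Under this assumed layout, a 1P1D pair on node0 and node1 meets at tier 1, while a pair on node0 and node7 meets only at tier 3.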

Proposed Solution

Volcano already supports network topology-aware scheduling, and the Kubeflow community also supports configuring training jobs with network topology-aware scheduling.

Therefore, I think StormService could also support network topology configuration, either through a new custom field or through annotations on the StormService that express its network topology requirements. The StormService Controller could then automatically create a Volcano PodGroup for the StormService and propagate the StormService's network topology constraints to that PodGroup, so that the Volcano scheduler schedules the PodGroup accordingly. This would allow Prefill and Decode pods to be placed in more efficient network performance domains.
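As a rough illustration of the annotation-based option, the sketch below translates StormService annotations into a PodGroup manifest, written in Python for brevity rather than as controller code. The annotation keys and the `build_podgroup` helper are hypothetical and not part of AIBrix. The `networkTopology` fields (`mode`, `highestTierAllowed`) follow my reading of the Volcano documentation linked above and should be checked against the Volcano version in use.

```python
# Hypothetical annotation keys; AIBrix does not define these today.
MODE_KEY = "example.com/network-topology-mode"
TIER_KEY = "example.com/network-topology-highest-tier"


def build_podgroup(name, namespace, annotations, min_member):
    """Build a Volcano PodGroup manifest from StormService annotations.

    Returns None when no network topology mode is requested.
    """
    mode = annotations.get(MODE_KEY)
    if mode is None:
        return None
    if mode not in ("hard", "soft"):
        raise ValueError(f"unsupported network topology mode: {mode!r}")

    # Field names assumed from the Volcano docs; verify before relying on them.
    network_topology = {"mode": mode}
    tier = annotations.get(TIER_KEY)
    if tier is not None:
        network_topology["highestTierAllowed"] = int(tier)

    return {
        "apiVersion": "scheduling.volcano.sh/v1beta1",
        "kind": "PodGroup",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "minMember": min_member,
            "networkTopology": network_topology,
        },
    }


# Example: a 1P1D StormService that must stay under a single leaf switch.
podgroup = build_podgroup(
    "demo-stormservice",
    "default",
    {MODE_KEY: "hard", TIER_KEY: "1"},
    min_member=2,
)
print(podgroup["spec"])
```

A dedicated field on the StormService spec would give schema validation that annotations lack, at the cost of an API change.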
