[VLM] Offline scenario, performance-only mode of the reference implementation #2381

wangshangsam · 2025-10-28T15:59:56Z

This is the first PR towards the VLM reference implementation for the v6.0 round.
This PR currently supports the Offline scenario in performance-only mode. The Server scenario and accuracy mode will be introduced in subsequent PRs.
The issue_query implementation adopts the purely asyncio-based design from the DSR1 reference implementation, but the code here is simpler, mostly because we only access the inference endpoint through OpenAI APIs.
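For readers unfamiliar with the pattern, here is a minimal sketch of an asyncio-based `issue_query`, assuming the event loop runs on a dedicated thread and the caller (LoadGen in the real code) must not be blocked. `SketchTask`, `_query_endpoint`, and `completed` are hypothetical names for illustration. The endpoint call is replaced with a short sleep, so this is not the PR's actual code.

```python
import asyncio
import threading
from concurrent.futures import Future


class SketchTask:
    """Hypothetical stand-in that owns the event loop and its thread."""

    def __init__(self) -> None:
        self.loop = asyncio.new_event_loop()
        # The loop runs forever on a background thread until close() is called.
        self.thread = threading.Thread(target=self.loop.run_forever, daemon=True)
        self.thread.start()
        self.completed: list[tuple[int, str]] = []

    async def _query_endpoint(self, sample_id: int) -> str:
        # Placeholder for an awaitable request to an OpenAI-compatible endpoint.
        await asyncio.sleep(0.01)
        return f"response-{sample_id}"

    async def _handle(self, sample_id: int) -> None:
        text = await self._query_endpoint(sample_id)
        # Each response is reported as soon as it is ready, one at a time.
        self.completed.append((sample_id, text))

    def issue_query(self, sample_ids: list[int]) -> list[Future]:
        # Called from the caller's thread: schedule the coroutines and return
        # immediately instead of waiting for the responses.
        return [
            asyncio.run_coroutine_threadsafe(self._handle(i), self.loop)
            for i in sample_ids
        ]

    def close(self) -> None:
        # Stop the loop from outside its thread, then release resources.
        self.loop.call_soon_threadsafe(self.loop.stop)
        self.thread.join()
        self.loop.close()
```

Keeping the loop, its thread, and the client in one object gives a single place to shut all three down, which is the clean-up concern the later commits in this PR address.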

github-actions · 2025-10-28T16:00:06Z

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

multimodal/vl2l/pyproject.toml

multimodal/vl2l/src/mlperf_inference_multimodal_vl2l/task.py

multimodal/vl2l/src/mlperf_inference_multimodal_vl2l/cli.py

johncalesp · 2025-11-05T20:54:27Z

multimodal/vl2l/src/mlperf_inference_multimodal_vl2l/cli.py

+
+    min_duration: Annotated[
+        timedelta,
+        Field(
+            description="The minimum testing duration.",
+        ),
+    ] = timedelta(seconds=5)
+


Can we change this to float? It is currently a timedelta. If I try to enter
--settings.min_duration 60 I get:

Invalid value for _pydantic_settings_min_duration: Input should be a valid timedelta, "day" identifier in duration not correctly formatted

Hmmm this is not supposed to happen. The string format that this flag can take is defined in https://docs.pydantic.dev/2.0/usage/types/datetime/

timedelta fields will accept values of type:

str; the following formats are accepted:
[-][DD ][HH:MM]SS[.ffffff]
[±]P[DD]DT[HH]H[MM]M[SS]S (ISO 8601 format for timedelta)

I see, so the input will be something like: --settings.min_duration 00:01:00

00:01:00 works, but what I was trying to say is that the flag is also supposed to accept a single integer (in which case it would mean "seconds").

When I entered 60, it threw the error above.
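One possible workaround, sketched under assumptions and not taken from the PR: normalize a bare number to seconds before the value reaches the timedelta parser. The helper name `seconds_or_passthrough` is hypothetical. In a pydantic v2 model it could presumably be attached to the field with `BeforeValidator`, but that wiring is not shown or tested here.

```python
from datetime import timedelta


def seconds_or_passthrough(value):
    """Turn a bare number (or numeric string) into a timedelta in seconds.

    Anything else is returned unchanged so that the downstream parser can
    still handle formats such as "00:01:00" or ISO 8601 durations.
    """
    if isinstance(value, bool):
        # bool is a subclass of int; do not treat True/False as seconds.
        return value
    if isinstance(value, (int, float)):
        return timedelta(seconds=value)
    if isinstance(value, str):
        try:
            return timedelta(seconds=float(value.strip()))
        except (ValueError, OverflowError):
            # Not a plain number (or not representable): leave it untouched.
            return value
    return value
```

With this in place, `60` and `"60"` both mean one minute, while `"00:01:00"` is passed through unchanged to whatever parser comes next.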

johncalesp · 2025-11-05T21:30:06Z

multimodal/vl2l/src/mlperf_inference_multimodal_vl2l/cli.py

+        settings.use_token_latencies = True
+        return settings


Here, when we set settings.use_token_latencies = True, the constraints become:

/// Token latency parameters
uint64_t server_ttft_latency = 100000000;
uint64_t server_tpot_latency = 100000000;

If we set it to False, the constraint is:

/// \brief The latency constraint for the Server scenario.
uint64_t server_target_latency_ns = 100000000;

We may need to add more flags to let the user choose the constraints and their values.
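As a rough illustration of what such flags could look like, here is a sketch that maps user-facing options in milliseconds onto the three fields quoted above. `LatencyOptions` and `apply_latency_options` are hypothetical names, the settings object is a plain stand-in and not LoadGen's `TestSettings`, and the sketch assumes the token-latency fields use nanoseconds like `server_target_latency_ns`.

```python
from dataclasses import dataclass
from types import SimpleNamespace

NS_PER_MS = 1_000_000


@dataclass
class LatencyOptions:
    """Hypothetical user-facing options; the names are illustrative only."""

    use_token_latencies: bool = True
    ttft_ms: float = 100.0  # time to first token
    tpot_ms: float = 100.0  # time per output token
    target_latency_ms: float = 100.0


def apply_latency_options(settings, opts: LatencyOptions):
    """Copy the chosen constraints onto a settings-like object."""
    settings.use_token_latencies = opts.use_token_latencies
    if opts.use_token_latencies:
        # Assumption: these fields are expressed in nanoseconds.
        settings.server_ttft_latency = int(opts.ttft_ms * NS_PER_MS)
        settings.server_tpot_latency = int(opts.tpot_ms * NS_PER_MS)
    else:
        settings.server_target_latency_ns = int(opts.target_latency_ms * NS_PER_MS)
    return settings


# Stand-in for the real settings object, used only to exercise the sketch.
token_settings = apply_latency_options(SimpleNamespace(), LatencyOptions())
single_settings = apply_latency_options(
    SimpleNamespace(), LatencyOptions(use_token_latencies=False, target_latency_ms=250)
)
```

The defaults reproduce the quoted value of 100000000, which is 100 ms if the unit is nanoseconds.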

wangshangsam and others added 10 commits October 7, 2025 03:25

Initial commit.

81b993d

WIP

37f1f14

[Automated Commit] Format Codebase

d032513

misc

41c94c4

Merge branch 'wangshangsam/vlm-sut-prototype' of github.com:CentML/mlperf-inference into wangshangsam/vlm-sut-prototype

40e62bc

adding pydantic_typer

a240d7c

offline WIP

990503c

[Automated Commit] Format Codebase

0bc8773

Merge branch 'master' into wangshangsam/vlm-sut-prototype

ed021c5

[Automated Commit] Format Codebase

7a7c1bc

wangshangsam and others added 4 commits October 28, 2025 12:46

rename the notebook

ab7eeee

Merge branch 'wangshangsam/vlm-sut-prototype' of github.com:CentML/mlperf-inference into wangshangsam/vlm-sut-prototype

e4f5a7e

clean-up

754207e

[Automated Commit] Format Codebase

83ccca4

wangshangsam marked this pull request as ready for review November 4, 2025 08:31

wangshangsam requested a review from a team as a code owner November 4, 2025 08:31

wangshangsam changed the title ~~VLM reference implementation~~ [VLM] Offline scenario, performance-only mode for the reference implementation Nov 4, 2025

wangshangsam changed the title ~~[VLM] Offline scenario, performance-only mode for the reference implementation~~ [VLM] Offline scenario, performance-only mode of the reference implementation Nov 4, 2025

johncalesp reviewed Nov 4, 2025

multimodal/vl2l/pyproject.toml (Outdated)

Downgrade from 3.13 to 3.12

36ba877

wangshangsam requested a review from johncalesp November 4, 2025 20:31

[Automated Commit] Format Codebase

a2176d2

johncalesp reviewed Nov 4, 2025

multimodal/vl2l/src/mlperf_inference_multimodal_vl2l/task.py (Outdated)

johncalesp reviewed Nov 4, 2025

multimodal/vl2l/src/mlperf_inference_multimodal_vl2l/cli.py

johncalesp reviewed Nov 5, 2025

multimodal/vl2l/src/mlperf_inference_multimodal_vl2l/cli.py (Outdated)

johncalesp reviewed Nov 5, 2025

send the response back to LoadGen one at a time

126f945

johncalesp reviewed Nov 5, 2025

Move the ownership of the AsyncOpenAI client into Task, and clean up the client, event loop and the event loop thread

b2400a0

wangshangsam and others added 5 commits November 5, 2025 17:29

Merge branch 'wangshangsam/vlm-sut-prototype' of github.com:CentML/mlperf-inference into wangshangsam/vlm-sut-prototype

cdd0a4a

[Automated Commit] Format Codebase

5ac23a5

fixing typo

0ff5f13

Merge branch 'wangshangsam/vlm-sut-prototype' of github.com:CentML/mlperf-inference into wangshangsam/vlm-sut-prototype

272e31d

[Automated Commit] Format Codebase

ecf95ed
