[OPIK-2512][SDK] Add Multimodal Judge Support to Typescript SDK #3849
Pull Request Overview
This PR adds multimodal judge support to the TypeScript SDK, enabling the evaluation of prompts that contain both text and image content. The changes extend the existing evaluation framework to handle image inputs in LLM-as-a-judge scenarios, including support for vision-capable models and fallback rendering for non-vision models.
Key changes implemented (a rough sketch of the resulting message shape follows this list):
- Added multimodal message support with image URL content parts
- Implemented vision capability detection for different model types
- Added comprehensive test coverage for image-based evaluations
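As a rough sketch of the message shape this enables (the type names, field names, and URL below are illustrative only, not the SDK's actual exports):

```typescript
// Illustrative multimodal message types; the real SDK type names may differ.
type TextPart = { type: "text"; text: string };
type ImageUrlPart = { type: "image_url"; imageUrl: { url: string } };
type MessageContent = string | Array<TextPart | ImageUrlPart>;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: MessageContent;
}

// A user message combining a text instruction with an image to be judged.
const judgeInput: ChatMessage = {
  role: "user",
  content: [
    { type: "text", text: "Does this screenshot match the described layout?" },
    {
      type: "image_url",
      imageUrl: { url: "https://example.com/screenshot.png" },
    },
  ],
};
```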
 
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| sdks/typescript/tests/integration/evaluation/evaluatePrompt.test.ts | Adds integration test for multimodal evaluation with image inputs using real image URLs and data URLs | 
| sdks/typescript/tests/evaluation/evaluatePrompt.test.ts | Updates unit test with multimodal message rendering test and enhances mock model to handle structured content | 
| sdks/typescript/src/opik/rest_api/core/fetcher/stream-wrappers/*.ts | Fixes TypeScript type casting for blob parts in stream handling | 
| sdks/typescript/src/opik/evaluation/utils/renderMessageContent.ts | New utility for rendering message content with image support and vision model detection | 
| sdks/typescript/src/opik/evaluation/utils/formatMessages.ts | Updates message formatting to handle structured content with image placeholders | 
| sdks/typescript/src/opik/evaluation/models/modelCapabilities.ts | New model capability detection system for vision support | 
| sdks/typescript/src/opik/evaluation/models/index.ts | Exports new multimodal types and capabilities | 
| sdks/typescript/src/opik/evaluation/models/VercelAIChatModel.ts | Adds message content conversion for multimodal support | 
| sdks/typescript/src/opik/evaluation/models/OpikBaseModel.ts | Extends message interfaces with multimodal content types | 
| sdks/typescript/src/opik/evaluation/evaluatePrompt.ts | Updates evaluation logic to use multimodal message rendering | 
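To illustrate the fallback rendering described for renderMessageContent.ts and formatMessages.ts, here is a minimal sketch of flattening structured content to plain text with an image placeholder for non-vision models; the function name and placeholder text are assumptions, not the actual implementation.

```typescript
// Sketch: flatten structured message content for a model without vision
// support, replacing image parts with a textual placeholder.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; imageUrl: { url: string } };

function flattenContentForTextOnlyModel(
  content: string | ContentPart[]
): string {
  if (typeof content === "string") {
    return content;
  }
  return content
    .map((part) =>
      part.type === "text" ? part.text : "[image omitted: no vision support]"
    )
    .join("\n");
}
```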
Caution
TypeScript E2E tests are not enabled in GitHub Actions, so I manually verified my changes (see below). A number of other E2E tests are still failing.
Caution
We should ensure parity with #3848: let's make sure the changes in that PR match up, including how we handle prompts, test cases, and model support.
Important
Avoided using a complex signature check to test whether a model supports images. Unlike the Python SDK, for the TypeScript SDK we can either use the model JSON (hard to map to Vercel AI SDK providers) or just do better error handling when images are not supported, i.e. warn, drop the image, then retry. We should also copy the `-vl` and `-vision` suffix test from the Python SDK, as well as a method to register custom model support.
Details
Following the FE/BE changes for multimodal support, this brings image processing support for LLM-as-a-judge to the TypeScript SDK. It is a refactor (cherry-pick) of #3488 and builds on the parallel work done on the Python SDK in #3848.
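The note above mentions copying the Python SDK's `-vl`/`-vision` suffix test plus a hook to register custom model support; the sketch below shows one way that could look. The function names, suffix list, and registration mechanism are assumptions for illustration, not the SDK's actual API.

```typescript
// Hypothetical sketch of suffix-based vision detection with a custom-model
// registration hook; names and suffixes are illustrative.
const VISION_MODEL_SUFFIXES = ["-vl", "-vision"];
const customVisionModels = new Set<string>();

// Allow callers to mark additional models as vision-capable.
export function registerVisionModel(modelName: string): void {
  customVisionModels.add(modelName.toLowerCase());
}

// A model is treated as vision-capable if it was registered explicitly or its
// name ends with a known vision suffix.
export function supportsVision(modelName: string): boolean {
  const name = modelName.toLowerCase();
  return (
    customVisionModels.has(name) ||
    VISION_MODEL_SUFFIXES.some((suffix) => name.endsWith(suffix))
  );
}
```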
Change checklist
Issues
OPIK-2512
Testing
Added a live e2e test for image-based evals and the relevant unit tests.
Documentation
Updated where relevant
Passing test results

npx vitest run tests/integration/evaluation/evaluatePrompt.test.ts: