

@vincentkoc (Member) commented Oct 29, 2025

Caution

TypeScript E2E tests are not enabled in GitHub Actions, so I verified my changes manually (see below). A number of other E2E tests are still failing.

Caution

We should ensure parity with #3848: let's make sure the changes in that PR match up, including how we handle prompts, test cases, and model support.

Important

I avoided using a complex capability signature to test whether a model supports images. Unlike the Python SDK, the TypeScript SDK can either use the model JSON (which is hard to map to Vercel AI SDK providers) or rely on better error handling when images are not supported, i.e. warn, drop the images, then retry. We should also port the Python SDK's suffix check for -vl and -vision, as well as its method for registering custom model support.
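A minimal sketch of the two strategies described above, assuming OpenAI-style content parts: a name-suffix check (-vl, -vision) with a registry for custom models, plus a "warn, drop images, retry" wrapper. All names here (registerVisionModel, supportsVision, withImageFallback) are hypothetical illustrations, not the SDK's actual API, and the suffix rule follows only the description in this PR.

```typescript
// Hypothetical sketch; not the Opik SDK's real API.
type TextPart = { type: "text"; text: string };
type ImageUrlPart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImageUrlPart;
type Message = {
  role: "system" | "user" | "assistant";
  content: string | ContentPart[];
};

// Suffix heuristic as described for the Python SDK: "-vl" and "-vision".
const VISION_SUFFIXES = ["-vl", "-vision"];

// User-registered vision models, keyed by lower-cased model name.
const customVisionModels = new Set<string>();

function registerVisionModel(modelName: string): void {
  customVisionModels.add(modelName.toLowerCase());
}

function supportsVision(modelName: string): boolean {
  const name = modelName.toLowerCase();
  if (customVisionModels.has(name)) return true;
  return VISION_SUFFIXES.some((suffix) => name.endsWith(suffix));
}

function hasImages(messages: Message[]): boolean {
  return messages.some(
    (m) => typeof m.content !== "string" && m.content.some((p) => p.type === "image_url"),
  );
}

// Collapse structured content to its text parts only (images are dropped).
function stripImages(messages: Message[]): Message[] {
  return messages.map((m) => {
    if (typeof m.content === "string") return m;
    const text = m.content
      .filter((p): p is TextPart => p.type === "text")
      .map((p) => p.text)
      .join("\n");
    return { ...m, content: text };
  });
}

// Try the call as-is; if it fails and images were present, warn and retry without them.
async function withImageFallback<T>(
  messages: Message[],
  call: (msgs: Message[]) => Promise<T>,
): Promise<T> {
  try {
    return await call(messages);
  } catch (err) {
    if (!hasImages(messages)) throw err;
    console.warn(`Model call failed with image content; retrying without images: ${String(err)}`);
    return call(stripImages(messages));
  }
}
```

This sketch retries on any error when images are present; a real implementation would likely check that the failure is actually image-related before dropping content.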

Details

Following the FE/BE changes for multimodal support, this PR brings image processing support for LLM-as-a-judge to the SDK. It is a refactor (cherry-pick) of #3488, based on the equivalent work done in the Python SDK (#3848).

Change checklist

  • User facing
  • Documentation update

Issues

OPIK-2512

Testing

Added a live E2E test for image-based evals, plus the relevant unit tests.

Documentation

Updated where relevant

Passing test results for npx vitest run tests/integration/evaluation/evaluatePrompt.test.ts:
[Screenshot 2025-10-28 at 19:18:35]

…scaping

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
@vincentkoc marked this pull request as ready for review October 29, 2025 01:16
@vincentkoc requested a review from a team as a code owner October 29, 2025 01:16
Copilot AI review requested due to automatic review settings October 29, 2025 01:16
Copilot AI (Contributor) left a comment


Pull Request Overview

This PR adds multimodal judge support to the TypeScript SDK, enabling the evaluation of prompts that contain both text and image content. The changes extend the existing evaluation framework to handle image inputs in LLM-as-a-judge scenarios, including support for vision-capable models and fallback rendering for non-vision models.

Key changes implemented:

  • Added multimodal message support with image URL content parts
  • Implemented vision capability detection for different model types
  • Added comprehensive test coverage for image-based evaluations
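To make the first two points concrete, here is a minimal sketch of structured message content with image URL parts, and of how it might be rendered for a judge model depending on vision support: vision-capable models receive the parts unchanged, while others get text with an image placeholder. The type names, the renderForModel function, and the "<image: ...>" placeholder format are hypothetical assumptions, not the PR's actual renderMessageContent.ts implementation.

```typescript
// Hypothetical OpenAI-style content parts; not the SDK's exact types.
type TextPart = { type: "text"; text: string };
type ImageUrlPart = {
  type: "image_url";
  image_url: { url: string; detail?: "auto" | "low" | "high" };
};
type ContentPart = TextPart | ImageUrlPart;
type MessageContent = string | ContentPart[];

// Hypothetical placeholder; data URLs are summarized so large base64
// payloads are not pasted into a text-only prompt.
function imagePlaceholder(url: string): string {
  const label = url.startsWith("data:") ? "data URL" : url;
  return `<image: ${label}>`;
}

function renderForModel(content: MessageContent, visionCapable: boolean): MessageContent {
  if (typeof content === "string") return content;
  // Vision models get the structured parts as-is.
  if (visionCapable) return content;
  // Non-vision models get plain text with images replaced by placeholders.
  return content
    .map((part) => (part.type === "text" ? part.text : imagePlaceholder(part.image_url.url)))
    .join("\n");
}
```

Keeping a textual placeholder rather than silently dropping the image lets a non-vision judge at least see that an image was part of the input.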

Reviewed Changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.

Summary per file:

  • sdks/typescript/tests/integration/evaluation/evaluatePrompt.test.ts: Adds an integration test for multimodal evaluation with image inputs, using real image URLs and data URLs
  • sdks/typescript/tests/evaluation/evaluatePrompt.test.ts: Updates the unit test with a multimodal message rendering test and enhances the mock model to handle structured content
  • sdks/typescript/src/opik/rest_api/core/fetcher/stream-wrappers/*.ts: Fixes TypeScript type casting for blob parts in stream handling
  • sdks/typescript/src/opik/evaluation/utils/renderMessageContent.ts: New utility for rendering message content with image support and vision model detection
  • sdks/typescript/src/opik/evaluation/utils/formatMessages.ts: Updates message formatting to handle structured content with image placeholders
  • sdks/typescript/src/opik/evaluation/models/modelCapabilities.ts: New model capability detection system for vision support
  • sdks/typescript/src/opik/evaluation/models/index.ts: Exports the new multimodal types and capabilities
  • sdks/typescript/src/opik/evaluation/models/VercelAIChatModel.ts: Adds message content conversion for multimodal support
  • sdks/typescript/src/opik/evaluation/models/OpikBaseModel.ts: Extends message interfaces with multimodal content types
  • sdks/typescript/src/opik/evaluation/evaluatePrompt.ts: Updates evaluation logic to use multimodal message rendering
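As an illustration of the kind of "message content conversion" listed for VercelAIChatModel.ts, the sketch below maps OpenAI-style image_url parts onto local stand-ins for the Vercel AI SDK's text and image parts, handling both remote URLs and base64 data URLs (the two input kinds the integration test covers). The Sdk* types are local assumptions modeled on the AI SDK's { type: "image", image } part, and field names such as mimeType vs mediaType should be checked against the installed ai package version; toSdkPart and toSdkContent are hypothetical names.

```typescript
// Input: OpenAI-style parts (assumed shape).
type TextPart = { type: "text"; text: string };
type ImageUrlPart = { type: "image_url"; image_url: { url: string } };
type ContentPart = TextPart | ImageUrlPart;

// Output: local stand-ins modeled on Vercel AI SDK content parts (assumption).
type SdkTextPart = { type: "text"; text: string };
type SdkImagePart = { type: "image"; image: URL | string; mimeType?: string };
type SdkPart = SdkTextPart | SdkImagePart;

// Matches "data:<mime>;base64,<payload>".
const DATA_URL_RE = /^data:([^;,]+);base64,(.*)$/;

function toSdkPart(part: ContentPart): SdkPart {
  if (part.type === "text") return { type: "text", text: part.text };
  const url = part.image_url.url;
  const match = DATA_URL_RE.exec(url);
  if (match) {
    // Data URL: pass the raw base64 payload together with its MIME type.
    return { type: "image", image: match[2], mimeType: match[1] };
  }
  // Remote image: hand over a URL object so the provider can fetch it.
  return { type: "image", image: new URL(url) };
}

function toSdkContent(content: string | ContentPart[]): string | SdkPart[] {
  return typeof content === "string" ? content : content.map(toSdkPart);
}
```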

@vincentkoc changed the title from [NA][SDK] Add Multimodal Judge Support to Typescript SDK to [OPIK-2512][SDK] Add Multimodal Judge Support to Typescript SDK Oct 30, 2025
@comet-ml deleted a comment from github-actions bot Nov 3, 2025