[OPIK-2512][SDK] Add Multimodal Judge Support to Typescript SDK #3849
Pull Request Overview
This PR adds multimodal judge support to the TypeScript SDK, enabling the evaluation of prompts that contain both text and image content. The changes extend the existing evaluation framework to handle image inputs in LLM-as-a-judge scenarios, including support for vision-capable models and fallback rendering for non-vision models.
Key changes implemented (a rough sketch of the resulting message shape follows this list):
- Added multimodal message support with image URL content parts
- Implemented vision capability detection for different model types
- Added comprehensive test coverage for image-based evaluations
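As a rough sketch of the message shape this enables (the type names, field names, and URL below are illustrative only, not the SDK's actual exports):

```typescript
// Illustrative multimodal message types; the real SDK type names may differ.
type TextPart = { type: "text"; text: string };
type ImageUrlPart = { type: "image_url"; imageUrl: { url: string } };
type MessageContent = string | Array<TextPart | ImageUrlPart>;

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: MessageContent;
}

// A user message combining a text instruction with an image to be judged.
const judgeInput: ChatMessage = {
  role: "user",
  content: [
    { type: "text", text: "Does this screenshot match the described layout?" },
    {
      type: "image_url",
      imageUrl: { url: "https://example.com/screenshot.png" },
    },
  ],
};
```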
 
Reviewed Changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 2 comments.
| File | Description | 
|---|---|
| sdks/typescript/tests/integration/evaluation/evaluatePrompt.test.ts | Adds integration test for multimodal evaluation with image inputs using real image URLs and data URLs | 
| sdks/typescript/tests/evaluation/evaluatePrompt.test.ts | Updates unit test with multimodal message rendering test and enhances mock model to handle structured content | 
| sdks/typescript/src/opik/rest_api/core/fetcher/stream-wrappers/*.ts | Fixes TypeScript type casting for blob parts in stream handling | 
| sdks/typescript/src/opik/evaluation/utils/renderMessageContent.ts | New utility for rendering message content with image support and vision model detection | 
| sdks/typescript/src/opik/evaluation/utils/formatMessages.ts | Updates message formatting to handle structured content with image placeholders | 
| sdks/typescript/src/opik/evaluation/models/modelCapabilities.ts | New model capability detection system for vision support | 
| sdks/typescript/src/opik/evaluation/models/index.ts | Exports new multimodal types and capabilities | 
| sdks/typescript/src/opik/evaluation/models/VercelAIChatModel.ts | Adds message content conversion for multimodal support | 
| sdks/typescript/src/opik/evaluation/models/OpikBaseModel.ts | Extends message interfaces with multimodal content types | 
| sdks/typescript/src/opik/evaluation/evaluatePrompt.ts | Updates evaluation logic to use multimodal message rendering | 
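To illustrate the fallback rendering described for renderMessageContent.ts and formatMessages.ts, here is a minimal sketch of flattening structured content to plain text with an image placeholder for non-vision models; the function name and placeholder text are assumptions, not the actual implementation.

```typescript
// Sketch: flatten structured message content for a model without vision
// support, replacing image parts with a textual placeholder.
type ContentPart =
  | { type: "text"; text: string }
  | { type: "image_url"; imageUrl: { url: string } };

function flattenContentForTextOnlyModel(
  content: string | ContentPart[]
): string {
  if (typeof content === "string") {
    return content;
  }
  return content
    .map((part) =>
      part.type === "text" ? part.text : "[image omitted: no vision support]"
    )
    .join("\n");
}
```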
Caution
TypeScript E2E tests are not enabled in GitHub Actions, so I manually verified my changes (see below). A number of other E2E tests are still failing.
Caution
We should ensure parity with #3848: let's make sure the changes in that PR match up, including how we handle prompts, test cases, and model support.
Important
Avoided using a complex signature check to test whether a model supports images. Unlike the Python SDK, for the TypeScript SDK we can either use the model JSON (hard to map to Vercel AI SDK providers) or just do better error handling when images are not supported, i.e. warn, drop the image, then retry. We should also copy the `-vl` and `-vision` suffix test from the Python SDK, as well as a method to register custom model support.
Details
Following the FE/BE changes for multimodal support, this brings image processing support for LLM-as-a-judge to the TypeScript SDK. It is a refactor (cherry-pick) of #3488 and builds on the parallel work done on the Python SDK in #3848.
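The note above mentions copying the Python SDK's `-vl`/`-vision` suffix test plus a hook to register custom model support; the sketch below shows one way that could look. The function names, suffix list, and registration mechanism are assumptions for illustration, not the SDK's actual API.

```typescript
// Hypothetical sketch of suffix-based vision detection with a custom-model
// registration hook; names and suffixes are illustrative.
const VISION_MODEL_SUFFIXES = ["-vl", "-vision"];
const customVisionModels = new Set<string>();

// Allow callers to mark additional models as vision-capable.
export function registerVisionModel(modelName: string): void {
  customVisionModels.add(modelName.toLowerCase());
}

// A model is treated as vision-capable if it was registered explicitly or its
// name ends with a known vision suffix.
export function supportsVision(modelName: string): boolean {
  const name = modelName.toLowerCase();
  return (
    customVisionModels.has(name) ||
    VISION_MODEL_SUFFIXES.some((suffix) => name.endsWith(suffix))
  );
}
```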
Change checklist
Issues
OPIK-2512
Testing
Added a live e2e test for image-based evals and the relevant unit tests.
Documentation
Updated where relevant
Passing test results

npx vitest run tests/integration/evaluation/evaluatePrompt.test.ts: