
Conversation


@ranadeepsingh ranadeepsingh commented Nov 6, 2025

Related Issues/PRs

#xxx

What changes are proposed in this pull request?

Extends OpenAIPrompt and OpenAIEmbedding, the core functions powering the PySpark AI functions, with a new boolean returnUsage parameter.
This parameter is False by default, in which case just the string or list response is returned.
When it is True, it returns a struct with the following members:

  1. "response": string for OpenAIPrompt() or DenseVector for OpenAIEmbedding()
  2. "usage": usage tokens as struct with the following fields:
  • input_tokens: Long
  • output_tokens: Long
  • total_tokens: Long
  • input_token_details: Map - This field is dynamic and depends on the model. It can contain details like "cached_tokens"
  • output_token_details: Map - This field is dynamic and depends on the model. It can contain details like "reasoning_tokens"
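As a rough illustration of the struct shape described above, here is a plain-Python sketch (not the actual Scala implementation; the raw payload shape and the helper name normalize_usage are assumptions based on typical OpenAI usage objects) that maps a raw usage payload onto the documented fields:

```python
# Hypothetical sketch only: maps an OpenAI-style usage payload onto the
# struct fields named in this PR description. The input field names
# (prompt_tokens, completion_tokens, ...) are assumed, not taken from
# the SynapseML code.

def normalize_usage(raw: dict) -> dict:
    """Normalize a raw usage payload into the documented struct fields."""
    return {
        "input_tokens": int(raw.get("prompt_tokens", 0)),
        "output_tokens": int(raw.get("completion_tokens", 0)),
        "total_tokens": int(raw.get("total_tokens", 0)),
        # Dynamic, model-dependent maps (e.g. cached_tokens, reasoning_tokens):
        "input_token_details": dict(raw.get("prompt_tokens_details") or {}),
        "output_token_details": dict(raw.get("completion_tokens_details") or {}),
    }

usage = normalize_usage({
    "prompt_tokens": 12,
    "completion_tokens": 34,
    "total_tokens": 46,
    "prompt_tokens_details": {"cached_tokens": 0},
    "completion_tokens_details": {"reasoning_tokens": 8},
})
```

The dynamic detail maps are kept as plain dicts because, as noted above, their keys vary by model.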

Screenshot of output types and usage:
This works across all three OpenAI APIs (Chat Completions, Responses, and Embeddings):

  • OpenAIPrompt
    (screenshot)
  • OpenAIEmbedding
    (screenshot)

How is this patch tested?

  • I have written tests (not required for typo or doc fix) and confirmed the proposed feature/bug-fix/change works.

Does this PR change any dependencies?

  • No. You can skip this section.
  • Yes. Make sure the dependencies are resolved correctly, and list changes here.

Does this PR add a new feature? If so, have you added samples on website?

  • No. You can skip this section.
  • Yes. Make sure you have added samples following below steps.
  1. Find the corresponding markdown file for your new feature in website/docs/documentation folder.
    Make sure you choose the correct class estimators/transformers and namespace.
  2. Follow the pattern in markdown file and add another section for your new API, including pyspark, scala (and .NET potentially) samples.
  3. Make sure the DocTable points to correct API link.
  4. Navigate to website folder, and run yarn run start to make sure the website renders correctly.
  5. Don't forget to add <!--pytest-codeblocks:cont--> before each Python code block to enable auto-tests for Python samples.
  6. Make sure the WebsiteSamplesTests job pass in the pipeline.

@ranadeepsingh
Collaborator Author

/azp run

@github-actions

github-actions bot commented Nov 6, 2025

Hey @ranadeepsingh 👋!
Thank you so much for contributing to our repository 🙌.
Someone from SynapseML Team will be reviewing this pull request soon.

We use semantic commit messages to streamline the release process.
Before your pull request can be merged, you should make sure your first commit and PR title start with a semantic prefix.
This helps us to create release messages and credit you for your hard work!

Examples of commit messages with semantic prefixes:

  • fix: Fix LightGBM crashes with empty partitions
  • feat: Make HTTP on Spark back-offs configurable
  • docs: Update Spark Serving usage
  • build: Add codecov support
  • perf: improve LightGBM memory usage
  • refactor: make python code generation rely on classes
  • style: Remove nulls from CNTKModel
  • test: Add test coverage for CNTKModel

To test your commit locally, please follow our guide on building from source.
Check out the developer guide for additional guidance on testing your change.

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Nov 6, 2025

Codecov Report

❌ Patch coverage is 94.91525% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 82.43%. Comparing base (df189c8) to head (f0a384b).
⚠️ Report is 1 commit behind head on master.

Files with missing lines Patch % Lines
...zure/synapse/ml/services/openai/OpenAIPrompt.scala 83.33% 5 Missing ⚠️
...azure/synapse/ml/services/openai/ReturnUsage.scala 98.38% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2444      +/-   ##
==========================================
- Coverage   84.48%   82.43%   -2.05%     
==========================================
  Files         334      335       +1     
  Lines       17591    17690      +99     
  Branches     1601     1619      +18     
==========================================
- Hits        14862    14583     -279     
- Misses       2729     3107     +378     


@ranadeepsingh
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ranadeepsingh
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@smamindl smamindl assigned smamindl and unassigned smamindl Nov 7, 2025
smamindl
smamindl previously approved these changes Nov 7, 2025
@ranadeepsingh
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ranadeepsingh
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@ranadeepsingh
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

Copilot AI left a comment

Pull Request Overview

This PR adds token usage tracking functionality to OpenAIPrompt and OpenAIEmbedding transformers. When enabled, these components return structured output containing both the response and usage statistics (input/output tokens, total tokens, and detailed token breakdowns).

  • Introduces a returnUsage boolean parameter (default: false) to control whether usage statistics are included
  • Adds a new HasReturnUsage trait and ReturnUsage.scala to standardize usage tracking across different OpenAI APIs
  • Updates response schemas to include optional usage fields for Chat Completions, Responses, and Embeddings APIs
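To make the review summary above concrete, the following is an illustrative Python sketch of how a caller might aggregate the per-row usage structs these transformers emit when returnUsage is enabled. The row data and the helper total_usage are made up for illustration; real structs would come from OpenAIPrompt or OpenAIEmbedding, not from this snippet:

```python
# Illustrative only: sums the token fields of the usage structs described
# in this PR across a batch of rows. The sample rows below are fabricated;
# total_usage is a hypothetical helper, not part of the SynapseML API.

def total_usage(rows):
    """Sum input/output/total token counts across a batch of usage structs."""
    totals = {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0}
    for row in rows:
        usage = row["usage"]
        for key in totals:
            totals[key] += usage[key]
    return totals

batch = [
    {"response": "hi", "usage": {"input_tokens": 10, "output_tokens": 5, "total_tokens": 15}},
    {"response": "ok", "usage": {"input_tokens": 7, "output_tokens": 3, "total_tokens": 10}},
]
totals = total_usage(batch)
# totals == {'input_tokens': 17, 'output_tokens': 8, 'total_tokens': 25}
```

This kind of aggregation is the main consumer-side benefit of returning usage as a struct column rather than discarding it.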

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.

Summary per file:
  • ReturnUsage.scala: New trait and utilities for standardizing usage statistics across different OpenAI API types
  • OpenAIPrompt.scala: Integrates HasReturnUsage trait; modifies transform and transformSchema to conditionally return usage data
  • OpenAIEmbedding.scala: Integrates HasReturnUsage trait; overrides transform and transformSchema to support usage tracking
  • OpenAISchemas.scala: Updates response case classes to include optional usage fields and token details
  • OpenAIPromptSuite.scala: Adds tests for default, false, and true returnUsage parameter behaviors
  • OpenAIEmbeddingsSuite.scala: Adds tests for default, false, and true returnUsage parameter behaviors


@ranadeepsingh
Collaborator Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@BrendanWalsh BrendanWalsh merged commit 38b53c6 into microsoft:master Nov 14, 2025
9 of 69 checks passed