feat: Add token usage to OpenAIPrompt and OpenAIEmbedding #2444
Conversation
/azp run
Hey @ranadeepsingh 👋! We use semantic commit messages to streamline the release process. Examples of commit messages with semantic prefixes:
To test your commit locally, please follow our guide on building from source.
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report: ❌ Patch coverage is
Additional details and impacted files:
@@ Coverage Diff @@
## master #2444 +/- ##
==========================================
- Coverage 84.48% 82.43% -2.05%
==========================================
Files 334 335 +1
Lines 17591 17690 +99
Branches 1601 1619 +18
==========================================
- Hits 14862 14583 -279
- Misses 2729 3107 +378
Resolved review threads:
- cognitive/src/main/scala/com/microsoft/azure/synapse/ml/services/openai/OpenAIEmbedding.scala (outdated)
- cognitive/src/main/scala/com/microsoft/azure/synapse/ml/services/openai/OpenAIPrompt.scala (outdated)
- cognitive/src/main/scala/com/microsoft/azure/synapse/ml/services/openai/OpenAIEmbedding.scala
Pull Request Overview
This PR adds token usage tracking functionality to OpenAIPrompt and OpenAIEmbedding transformers. When enabled, these components return structured output containing both the response and usage statistics (input/output tokens, total tokens, and detailed token breakdowns).
- Introduces a returnUsage boolean parameter (default: false) to control whether usage statistics are included
- Adds a new HasReturnUsage trait and ReturnUsage.scala to standardize usage tracking across different OpenAI APIs
- Updates response schemas to include optional usage fields for Chat Completions, Responses, and Embeddings APIs
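The behavior described above can be sketched in plain Python (the actual change is Scala transformer logic; `wrap_response` and its field names here are hypothetical illustrations, not the PR's API): by default the caller gets the bare response, and with usage enabled it gets a struct carrying both the response and the token counts.

```python
from typing import Any, Optional


def wrap_response(text: str, usage: Optional[dict], return_usage: bool) -> Any:
    """Illustrative only: mimic the described returnUsage semantics.

    return_usage=False -> the plain response, unchanged.
    return_usage=True  -> a struct of {response, usage} with token counts.
    """
    if not return_usage:
        return text
    usage = usage or {}
    return {
        "response": text,
        "usage": {
            # Chat Completions-style field names, remapped to a neutral schema.
            "input_tokens": usage.get("prompt_tokens"),
            "output_tokens": usage.get("completion_tokens"),
            "total_tokens": usage.get("total_tokens"),
        },
    }
```

With the flag off, existing callers see no change in output shape, which is why a default of false keeps the feature backward compatible.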
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| ReturnUsage.scala | New trait and utilities for standardizing usage statistics across different OpenAI API types |
| OpenAIPrompt.scala | Integrates HasReturnUsage trait; modifies transform and transformSchema to conditionally return usage data |
| OpenAIEmbedding.scala | Integrates HasReturnUsage trait; overrides transform and transformSchema to support usage tracking |
| OpenAISchemas.scala | Updates response case classes to include optional usage fields and token details |
| OpenAIPromptSuite.scala | Adds tests for default, false, and true returnUsage parameter behaviors |
| OpenAIEmbeddingsSuite.scala | Adds tests for default, false, and true returnUsage parameter behaviors |
Resolved review thread:
- cognitive/src/main/scala/com/microsoft/azure/synapse/ml/services/openai/ReturnUsage.scala (outdated)
Related Issues/PRs
#xxx
What changes are proposed in this pull request?
Extends OpenAIPrompt and OpenAIEmbedding, the core functions powering PySpark AI functions, with a new boolean returnUsage parameter. This parameter is false by default, in which case just the string or list response is returned.
When it is true, a struct with the following members is returned:
Screenshot of output types and usage:
Works across all three different OpenAI APIs.
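Supporting all three APIs means reconciling their usage payloads into one schema, since the field names differ (per the public OpenAI API docs, Chat Completions reports prompt_tokens/completion_tokens, the Responses API reports input_tokens/output_tokens, and Embeddings reports only prompt_tokens and total_tokens). A rough Python sketch of that normalization, with assumed defaults of zero for absent fields:

```python
def normalize_usage(usage: dict) -> dict:
    """Illustrative only: map usage payloads from the Chat Completions,
    Responses, and Embeddings APIs onto a single schema.

    Prefers Responses-API names (input_tokens/output_tokens), falls back
    to Chat Completions names (prompt_tokens/completion_tokens); a missing
    total is assumed to be input + output.
    """
    input_tokens = usage.get("input_tokens", usage.get("prompt_tokens", 0))
    output_tokens = usage.get("output_tokens", usage.get("completion_tokens", 0))
    total_tokens = usage.get("total_tokens", input_tokens + output_tokens)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
    }
```

For embeddings there is no completion, so output_tokens naturally falls back to zero and total_tokens equals the prompt count.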
How is this patch tested?
Does this PR change any dependencies?
Does this PR add a new feature? If so, have you added samples on website?
- Add the sample to the website/docs/documentation folder.
- Make sure you choose the correct class (estimators/transformers) and namespace.
- DocTable points to the correct API link.
- Run yarn run start to make sure the website renders correctly.
- Add <!--pytest-codeblocks:cont--> before each Python code block to enable auto-tests for Python samples.
- Make sure the WebsiteSamplesTests job passes in the pipeline.