-
Notifications
You must be signed in to change notification settings - Fork 325
Fix github issue 2441 selected call #2451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Co-authored-by: aaron <[email protected]>
|
Cursor Agent can help with this pull request. Just |
|
The latest updates on your projects. Learn more about Vercel for GitHub. 1 Skipped Deployment
|
|
🌿 Preview your docs: https://boundary-preview-196934ba-c8b2-4a28-bf01-3be0cd104a2d.docs.buildwithfern.com |
|
@cursoragent you should choose the last call in the stack using the timestamp, unless it failed then try to look for the one that passed. The http_request_id is ULID with a timestamp. see define_id!(..) |
|
I refined the LLM call selection logic in
Learn more about Cursor Agents |
Co-authored-by: aaron <[email protected]>
|
🌿 Preview your docs: https://boundary-preview-b5d000ae-3f8e-4b64-92bb-f6d93ed6254c.docs.buildwithfern.com |
…order (#2692) ### Issue Reference This PR fixes #2451 ### Summary Updated LLM call selection logic to correctly pick the earliest successful call by lexicographic `HttpRequestId` ordering. This ensures deterministic selection when multiple successful calls exist. ### Changes This PR addresses an issue where the code prioritized completion time instead of lexicographically earliest ulid uuid. 1. Filter to keep only candidates where `is_success = true` 2. Sort these successful candidates by http request id, ulid uuid, lexicographically) 3. Choose the first candidate from this sorted list ### Details - Filters candidates to only successful calls - Sorts by `request_id.to_string()` lexicographically (since `HttpRequestId` lacks `Ord`) - Picks the first entry in the sorted list - Verified via `cargo test -p baml-runtime -q` <!-- CURSOR_SUMMARY --> --- > [!NOTE] > Refactors LLM call selection to pick the earliest successful call by lexicographic HttpRequestId and marks only that call as selected, with new tests covering success-vs-failure and ordering. > > - **Tracing/FunctionLog (storage.rs)** > - Refactor LLM call assembly: introduce `CallCandidate` to gather data for each `request_id`. > - Selection logic: filter successful candidates, sort by `request_id.to_string()` lexicographically, select the first; mark only that call as `selected` across `Basic` and `Stream` kinds. > - **Tests** > - Add `test_selected_call_prefers_success_over_failure` and `test_selected_call_chooses_earlier_success_if_last_failed` to verify selection behavior. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit 9a3143e. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: Cursor Agent <[email protected]> Co-authored-by: aaron <[email protected]>
Pull Request Template
Thanks for taking the time to fill out this pull request!
Issue Reference
Please link to any related issues
Changes
Please describe the changes proposed in this pull request
This PR addresses an issue where
selected_callcould incorrectly mark a failed LLM response as selected. The logic for determining theselectedcall inengine/baml-runtime/src/tracingv2/storage/storage.rsnow explicitly checks that the LLM response does not contain anerror_message. A new unit test has been added to ensure that if multiple calls occur, a successful call is correctly prioritized as theselected_callover a failed one.Testing
Please describe how you tested these changes
test_selected_call_prefers_success_over_failure)Screenshots
If applicable, add screenshots to help explain your changes
PR Checklist
Please ensure you've completed these items
Additional Notes
Add any other context about the PR here
Tests were not executed in the environment due to missing
cargo.To run tests locally:
cargo test -p baml-runtime -qSlack Thread