Fix(llm): When LLM calls are retried, it can lead to inaccurate records. #949
Conversation
Signed-off-by: CLFutureX <[email protected]>
@xingyaoww PTAL
```python
)

def _one_attempt(**retry_kwargs) -> ModelResponse:
    assert self._telemetry is not None
    # Moved inside the per-attempt function so each retry starts a fresh timer.
    self._telemetry.on_request(log_ctx=log_ctx)
```
@OpenHands please help me /codereview and tell me what this change will do - how will it fix the issue, and also help me write a new test for it
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I've completed the code review of PR #949 and created comprehensive tests for the fix. Here's what I found:
Code Review: What the Change Does
The Problem:
When LLM calls were retried due to failures (e.g., API connection errors), telemetry recorded inaccurate latency metrics. The `on_request()` method was called once before the retry loop, so the recorded latency included all failed attempts plus the wait times between retries, not just the successful call.
The Fix:
The PR moves `self._telemetry.on_request(log_ctx=log_ctx)` from outside the `_one_attempt()` function (line 462) to inside it (line 476). This ensures (see the sketch after this list):
- ✅ on_request() is called at the START of each retry attempt
- ✅ Each attempt gets its own fresh timestamp
- ✅ Only the successful attempt's latency is recorded
- ✅ Failed attempts don't inflate the metrics
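To make the before/after concrete, here is a minimal, self-contained sketch of the pattern. It uses tenacity directly with a toy `Telemetry` class and a fake flaky call; these names and the retry arguments are illustrative stand-ins, not the SDK's actual implementation.

```python
# Minimal sketch of the fix. Telemetry, flaky_api_call, and the retry
# arguments are illustrative; the SDK's real code differs in detail.
import time

from tenacity import retry, stop_after_attempt, wait_fixed


class Telemetry:
    """Records wall-clock latency for a single request."""

    def __init__(self) -> None:
        self._start = 0.0
        self.latency = 0.0

    def on_request(self) -> None:
        self._start = time.monotonic()

    def on_response(self) -> None:
        self.latency = time.monotonic() - self._start


telemetry = Telemetry()
calls = {"n": 0}


def flaky_api_call() -> str:
    """Fails twice, then succeeds, mimicking transient connection errors."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Before the fix (buggy): the timestamp was taken once, outside the loop,
# so the recorded latency covered failed attempts and retry waits:
#
#     telemetry.on_request()
#     response = retrying_completion()  # retries happen inside here
#     telemetry.on_response()
#
# After the fix: on_request() runs at the start of every attempt, so the
# final latency measures only the attempt that succeeded.
@retry(stop=stop_after_attempt(5), wait=wait_fixed(0.1), reraise=True)
def one_attempt() -> str:
    telemetry.on_request()  # fresh timestamp per attempt
    return flaky_api_call()


response = one_attempt()
telemetry.on_response()
print(response, f"{telemetry.latency:.3f}s")  # ~0s, not ~0.2s of retry waits
```

Running this prints a latency of roughly zero seconds; with the old ordering it would also include the two failed calls and the 0.1 s waits between attempts.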
Assessment: ✅ Excellent Fix
This is a minimal, well-targeted change that:
- Fixes the accuracy issue without side effects
- Maintains backward compatibility
- Doesn't change the API surface
- Is easy to understand and maintain
New Test Suite Created
I've created `tests/sdk/llm/test_llm_retry_telemetry.py` with 4 comprehensive tests (a minimal sketch of the second test's idea follows the list) that verify:

- `test_telemetry_records_only_successful_attempt_latency` - Ensures recorded latency reflects only the successful attempt, not the cumulative time
- `test_telemetry_on_request_called_per_retry` - Verifies `on_request()` is called for each retry attempt
- `test_telemetry_metrics_accurate_with_retries` - Confirms all metrics (tokens, cost, latency) only reflect the successful attempt
- `test_telemetry_no_multiple_records_on_retry` - Ensures failed attempts don't create spurious records
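For flavor, here is a rough sketch of the idea behind the second test, reusing the toy tenacity shape from the sketch above. The mock-based setup is illustrative; the actual tests in the PR drive the SDK's `LLM` class directly.

```python
# Sketch of the idea behind test_telemetry_on_request_called_per_retry.
# The mock telemetry and tenacity wrapper are illustrative stand-ins.
from unittest.mock import MagicMock

from tenacity import retry, stop_after_attempt, wait_fixed


def test_on_request_called_once_per_attempt() -> None:
    telemetry = MagicMock()
    # Fail twice, then return a response on the third attempt.
    api = MagicMock(
        side_effect=[ConnectionError("boom"), ConnectionError("boom"), "ok"]
    )

    @retry(stop=stop_after_attempt(5), wait=wait_fixed(0), reraise=True)
    def one_attempt() -> str:
        telemetry.on_request()  # fresh timestamp per attempt
        return api()

    assert one_attempt() == "ok"
    # Three attempts ran, so the telemetry hook fired three times; before
    # the fix it would have fired exactly once.
    assert telemetry.on_request.call_count == 3
```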
✅ All 4 tests pass
✅ Pre-commit checks pass (Ruff, pycodestyle, pyright)
The test file is ready to be committed to the PR branch. Would you like me to commit and push it to update the PR?
Makes sense to me!
This test suite verifies that telemetry records are accurate when LLM calls are retried, ensuring only the successful attempt's latency is recorded and not the cumulative time including failed attempts. Tests include:

- Latency records only successful attempt
- on_request() called per retry attempt
- All metrics (tokens, cost, latency) are accurate with retries
- No multiple records created for failed attempts

Co-authored-by: openhands <[email protected]>
[Automatic Post]: I have assigned @xingyaoww as a reviewer based on git blame information. Thanks in advance for the help!
xingyaoww left a comment
Thank you!
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 100.0%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

📋 Detailed Results
- litellm_proxy_gpt_5_mini_2025_08_07
- litellm_proxy_deepseek_deepseek_chat
- litellm_proxy_claude_sonnet_4_5_20250929