
Conversation

@CLFutureX (Contributor)

When LLM calls are retried, telemetry can record inaccurate latency metrics.

@CLFutureX (Contributor, Author)

@xingyaoww PTAL

```python
)
def _one_attempt(**retry_kwargs) -> ModelResponse:
    assert self._telemetry is not None
    # New placement: runs at the start of every attempt, so each
    # retry gets a fresh request timestamp.
    self._telemetry.on_request(log_ctx=log_ctx)
```
@xingyaoww (Collaborator)

@OpenHands please help me /codereview and tell me what this change will do - how will it fix the issue, and also help me write a new test for it

@openhands-agent (bot)

I'm on it! xingyaoww can track my progress at all-hands.dev

@openhands-agent (bot)

Summary

I've completed the code review of PR #949 and created comprehensive tests for the fix. Here's what I found:

Code Review: What the Change Does

The Problem:
When LLM calls were retried due to failures (e.g., API connection errors), telemetry recorded inaccurate latency metrics. The on_request() method was called once before the retry loop, so the recorded latency included all failed attempts plus wait times between retries—not just the successful call.

The Fix:
The PR moves self._telemetry.on_request(log_ctx=log_ctx) from outside the _one_attempt() function (line 462) to inside it (line 476); a minimal sketch after the list below illustrates the effect. This ensures:

  • ✅ on_request() is called at the start of each retry attempt
  • ✅ Each attempt gets its own fresh timestamp
  • ✅ Only the successful attempt's latency is recorded
  • ✅ Failed attempts don't inflate the metrics
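
For concreteness, here is a minimal, self-contained sketch of the before/after behavior. Telemetry and call_with_retries below are toy stand-ins written for this illustration, not the SDK's actual classes or retry decorator:

```python
import time


class Telemetry:
    """Toy stand-in for the SDK's telemetry object (illustration only)."""

    def __init__(self) -> None:
        self.req_start: float | None = None
        self.latency: float | None = None

    def on_request(self) -> None:
        # Restarts the clock on every call; placing this inside the
        # per-attempt function is exactly what the PR does.
        self.req_start = time.monotonic()

    def on_response(self) -> None:
        assert self.req_start is not None
        self.latency = time.monotonic() - self.req_start


def call_with_retries(attempt_fn, telemetry: Telemetry, max_attempts: int = 3):
    """Toy retry loop; the real code uses a retry decorator instead."""
    for attempt in range(max_attempts):
        try:
            # The fix: start timing inside each attempt. Before the PR,
            # on_request() ran once ahead of the loop, so the recorded
            # latency included failed attempts and backoff waits.
            telemetry.on_request()
            result = attempt_fn()
            telemetry.on_response()
            return result
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(0.01)  # backoff between attempts
```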

Assessment: ✅ Excellent Fix

This is a minimal, well-targeted change that:

  • Fixes the accuracy issue without side effects
  • Maintains backward compatibility
  • Doesn't change the API surface
  • Is easy to understand and maintain

New Test Suite Created

I've created tests/sdk/llm/test_llm_retry_telemetry.py with 4 comprehensive tests that verify:

  1. test_telemetry_records_only_successful_attempt_latency - Ensures recorded latency reflects only the successful attempt, not the cumulative time
  2. test_telemetry_on_request_called_per_retry - Verifies on_request() is called for each retry attempt (a toy sketch of this idea follows the list)
  3. test_telemetry_metrics_accurate_with_retries - Confirms all metrics (tokens, cost, latency) only reflect the successful attempt
  4. test_telemetry_no_multiple_records_on_retry - Ensures failed attempts don't create spurious records
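
As a hedged illustration of the second test's idea, here is a toy version built on the Telemetry and call_with_retries stand-ins sketched above; the actual test file uses the SDK's real classes and fixtures:

```python
def test_on_request_called_per_attempt() -> None:
    """Two failures, then success: the surviving timestamp must belong
    to the successful attempt, proving on_request() ran per retry."""
    telemetry = Telemetry()
    attempt_times: list[float] = []

    def flaky() -> str:
        attempt_times.append(time.monotonic())
        if len(attempt_times) < 3:
            raise ConnectionError("transient failure")
        return "ok"

    assert call_with_retries(flaky, telemetry) == "ok"
    assert len(attempt_times) == 3
    # req_start was reset on every attempt, so its final value falls
    # between the second (failed) and third (successful) attempts.
    assert attempt_times[1] <= telemetry.req_start <= attempt_times[-1]
```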

✅ All 4 tests pass
✅ Pre-commit checks pass (Ruff, pycodestyle, pyright)

The test file is ready to be committed to the PR branch. Would you like me to commit and push it to update the PR?


@xingyaoww (Collaborator)

Makes sense to me!

openhands-agent and others added 2 commits October 31, 2025 17:50
This test suite verifies that telemetry records are accurate when LLM
calls are retried, ensuring only the successful attempt's latency is
recorded and not the cumulative time including failed attempts.

Tests include:
- Latency records only successful attempt
- on_request() called per retry attempt
- All metrics (tokens, cost, latency) are accurate with retries
- No multiple records created for failed attempts

Co-authored-by: openhands <[email protected]>
@blacksmith-sh blacksmith-sh bot requested a review from xingyaoww November 1, 2025 12:52
blacksmith-sh bot commented Nov 1, 2025

[Automatic Post]: I have assigned @xingyaoww as a reviewer based on git blame information. Thanks in advance for the help!

@xingyaoww (Collaborator) left a comment

Thank you!

@xingyaoww xingyaoww added the integration-test Runs the integration tests and comments the results label Nov 3, 2025
github-actions bot commented Nov 3, 2025

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions bot commented Nov 3, 2025

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.78
Models Tested: 3
Timestamp: 2025-11-03 18:27:12 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

| Model | Success Rate | Tests Passed | Total Tests | Cost |
| --- | --- | --- | --- | --- |
| litellm_proxy_gpt_5_mini_2025_08_07 | 100.0% | 7/7 | 7 | $0.04 |
| litellm_proxy_deepseek_deepseek_chat | 100.0% | 7/7 | 7 | $0.02 |
| litellm_proxy_claude_sonnet_4_5_20250929 | 100.0% | 7/7 | 7 | $0.72 |

📋 Detailed Results

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_7f7db88_gpt5_mini_run_N7_20251103_182412

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.02
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_7f7db88_deepseek_run_N7_20251103_182413

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.72
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_7f7db88_sonnet_run_N7_20251103_182410

@xingyaoww xingyaoww merged commit ed146c9 into OpenHands:main Nov 3, 2025
18 checks passed
