Fix(llm): When LLM calls are retried, it can lead to inaccurate records. #949
Conversation
Signed-off-by: CLFutureX <[email protected]>
@xingyaoww PTAL
```python
)

def _one_attempt(**retry_kwargs) -> ModelResponse:
    assert self._telemetry is not None
    # Moved inside the per-attempt function so each retry starts a fresh timer.
    self._telemetry.on_request(log_ctx=log_ctx)
```
@OpenHands please help me /codereview and tell me what this change will do - how will it fix the issue, and also help me write a new test for it
I'm on it! xingyaoww can track my progress at all-hands.dev
Summary
I've completed the code review of PR #949 and created comprehensive tests for the fix. Here's what I found:
Code Review: What the Change Does
The Problem:
When LLM calls were retried due to failures (e.g., API connection errors), telemetry recorded inaccurate latency metrics. The `on_request()` method was called once before the retry loop, so the recorded latency included all failed attempts plus the wait times between retries, not just the successful call.
The Fix:
The PR moves `self._telemetry.on_request(log_ctx=log_ctx)` from outside the `_one_attempt()` function (line 462) to inside it (line 476). This ensures (see the sketch after this list):
- ✅ on_request() is called at the START of each retry attempt
- ✅ Each attempt gets its own fresh timestamp
- ✅ Only the successful attempt's latency is recorded
- ✅ Failed attempts don't inflate the metrics
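To make the before/after concrete, here is a minimal, self-contained sketch of the pattern. It uses tenacity directly with a toy `Telemetry` class and a fake flaky call; these names and the retry arguments are illustrative stand-ins, not the SDK's actual implementation.

```python
# Minimal sketch of the fix. Telemetry, flaky_api_call, and the retry
# arguments are illustrative; the SDK's real code differs in detail.
import time

from tenacity import retry, stop_after_attempt, wait_fixed


class Telemetry:
    """Records wall-clock latency for a single request."""

    def __init__(self) -> None:
        self._start = 0.0
        self.latency = 0.0

    def on_request(self) -> None:
        self._start = time.monotonic()

    def on_response(self) -> None:
        self.latency = time.monotonic() - self._start


telemetry = Telemetry()
calls = {"n": 0}


def flaky_api_call() -> str:
    """Fails twice, then succeeds, mimicking transient connection errors."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


# Before the fix (buggy): the timestamp was taken once, outside the loop,
# so the recorded latency covered failed attempts and retry waits:
#
#     telemetry.on_request()
#     response = retrying_completion()  # retries happen inside here
#     telemetry.on_response()
#
# After the fix: on_request() runs at the start of every attempt, so the
# final latency measures only the attempt that succeeded.
@retry(stop=stop_after_attempt(5), wait=wait_fixed(0.1), reraise=True)
def one_attempt() -> str:
    telemetry.on_request()  # fresh timestamp per attempt
    return flaky_api_call()


response = one_attempt()
telemetry.on_response()
print(response, f"{telemetry.latency:.3f}s")  # ~0s, not ~0.2s of retry waits
```

Running this prints a latency of roughly zero seconds; with the old ordering it would also include the two failed calls and the 0.1 s waits between attempts.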
Assessment: ✅ Excellent Fix
This is a minimal, well-targeted change that:
- Fixes the accuracy issue without side effects
- Maintains backward compatibility
- Doesn't change the API surface
- Is easy to understand and maintain
New Test Suite Created
I've created `tests/sdk/llm/test_llm_retry_telemetry.py` with 4 comprehensive tests (a minimal sketch of the second test's idea follows the list) that verify:

- `test_telemetry_records_only_successful_attempt_latency` - Ensures recorded latency reflects only the successful attempt, not the cumulative time
- `test_telemetry_on_request_called_per_retry` - Verifies `on_request()` is called for each retry attempt
- `test_telemetry_metrics_accurate_with_retries` - Confirms all metrics (tokens, cost, latency) only reflect the successful attempt
- `test_telemetry_no_multiple_records_on_retry` - Ensures failed attempts don't create spurious records
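For flavor, here is a rough sketch of the idea behind the second test, reusing the toy tenacity shape from the sketch above. The mock-based setup is illustrative; the actual tests in the PR drive the SDK's `LLM` class directly.

```python
# Sketch of the idea behind test_telemetry_on_request_called_per_retry.
# The mock telemetry and tenacity wrapper are illustrative stand-ins.
from unittest.mock import MagicMock

from tenacity import retry, stop_after_attempt, wait_fixed


def test_on_request_called_once_per_attempt() -> None:
    telemetry = MagicMock()
    # Fail twice, then return a response on the third attempt.
    api = MagicMock(
        side_effect=[ConnectionError("boom"), ConnectionError("boom"), "ok"]
    )

    @retry(stop=stop_after_attempt(5), wait=wait_fixed(0), reraise=True)
    def one_attempt() -> str:
        telemetry.on_request()  # fresh timestamp per attempt
        return api()

    assert one_attempt() == "ok"
    # Three attempts ran, so the telemetry hook fired three times; before
    # the fix it would have fired exactly once.
    assert telemetry.on_request.call_count == 3
```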
✅ All 4 tests pass
✅ Pre-commit checks pass (Ruff, pycodestyle, pyright)
The test file is ready to be committed to the PR branch. Would you like me to commit and push it to update the PR?
Makes sense to me!
This test suite verifies that telemetry records are accurate when LLM calls are retried, ensuring only the successful attempt's latency is recorded and not the cumulative time including failed attempts. Tests include:

- Latency records only successful attempt
- on_request() called per retry attempt
- All metrics (tokens, cost, latency) are accurate with retries
- No multiple records created for failed attempts

Co-authored-by: openhands <[email protected]>
[Automatic Post]: I have assigned @xingyaoww as a reviewer based on git blame information. Thanks in advance for the help!
xingyaoww left a comment
Thank you!
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.
🧪 Integration Tests Results

Overall Success Rate: 100.0%

📁 Detailed Logs & Artifacts
Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

📋 Detailed Results
- litellm_proxy_gpt_5_mini_2025_08_07
- litellm_proxy_deepseek_deepseek_chat
- litellm_proxy_claude_sonnet_4_5_20250929