Conversation

Collaborator

@nherment nherment commented Oct 13, 2025

Ensures SSE messages are sent to the UI for tool results when tools are approved or denied. Otherwise the UI does not get updated.

Contributor

coderabbitai bot commented Oct 13, 2025

Walkthrough

Processing tool-approval decisions now produces both updated messages and TOOL_RESULT stream events. call_stream accepts tool_decisions, yields the pre-processed tool-result events, then continues normal streaming. The server forwards tool_decisions only into the streaming path. Tests were adapted to the new return shape.

Changes

Cohort / File(s) — Summary

Tool-calling LLM: decisions & streaming — holmes/core/tool_calling_llm.py
  • process_tool_decisions(...) -> tuple[list[dict], list[StreamMessage]]: returns (updated_messages, events) and emits TOOL_RESULT stream events for approved/denied tool calls.
  • call_stream(..., tool_decisions: List[ToolApprovalDecision] | None = None): accepts tool decisions and yields their pre-processed tool-result events before continuing normal streaming.
Server: endpoint routing — server.py
  • Removed non-stream pre-processing of tool decisions for chat requests.
  • The streaming branch passes tool_decisions into ai.call_stream(..., tool_decisions=...); the non-stream path is unchanged.
Tests: return-shape updates — tests/test_approval_workflow.py
  • Mocks updated: process_tool_decisions now returns (updated_messages, []) (messages plus an empty StreamMessage list) instead of a single list.
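The new return shape can be illustrated with a minimal, self-contained sketch. StreamMessage here is a stand-in for the real class in holmes/utils/stream.py, and the event name and message fields are assumptions for illustration:

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple

@dataclass
class StreamMessage:
    # Stand-in for holmes.utils.stream.StreamMessage; fields are assumed.
    event: str
    data: Dict[str, Any] = field(default_factory=dict)

def process_tool_decisions(
    messages: List[Dict[str, Any]], tool_decisions: List[Dict[str, Any]]
) -> Tuple[List[Dict[str, Any]], List[StreamMessage]]:
    """Sketch of the new (updated_messages, events) return shape."""
    events: List[StreamMessage] = []
    for decision in tool_decisions:
        tool_message = {
            "role": "tool",
            "tool_call_id": decision["tool_call_id"],
            "content": "approved" if decision["approved"] else "denied by user",
        }
        messages = [*messages, tool_message]  # iterable unpacking (RUF005)
        events.append(StreamMessage(event="tool_result", data=tool_message))
    return messages, events

msgs, events = process_tool_decisions([], [{"tool_call_id": "t1", "approved": True}])
```

Callers that previously consumed a single list now unpack the tuple, which is exactly the change the test mocks had to follow.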

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant C as Client
  participant S as Server (/chat)
  participant L as ToolCallingLLM
  participant T as Tool(s)

  C->>S: POST /chat (messages, tool_decisions, stream=true)
  S->>L: call_stream(messages, tool_decisions)
  rect rgb(239, 239, 239)
    note right of L: process_tool_decisions -> (messages, events)
    L-->>S: yield TOOL_RESULT events (denied/approved)
  end
  alt approved
    L->>T: invoke tool (user_approved=True)
    T-->>L: ToolCallResult(success/data)
    L-->>S: yield TOOL_RESULT (approved + result)
  else denied
    L-->>S: yield TOOL_RESULT (denied)
  end
  note over L: continue normal LLM streaming
  L-->>S: assistant token/events
  S-->>C: SSE stream (TOOL_RESULT + tokens)
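The flow in the diagram can be sketched as a generator that drains the pre-processed tool-result events before normal streaming resumes. Names and event shapes below are illustrative assumptions, not the real Holmes API:

```python
from typing import Any, Dict, Iterator, List, Optional

def process_tool_decisions(messages, decisions):
    # Simplified stand-in: one tool message + one TOOL_RESULT event per decision.
    tool_messages = [{"role": "tool", **d} for d in decisions]
    events = [{"event": "tool_result", "data": m} for m in tool_messages]
    return messages + tool_messages, events

def call_stream(
    messages: List[Dict[str, Any]],
    tool_decisions: Optional[List[Dict[str, Any]]] = None,
) -> Iterator[Dict[str, Any]]:
    if tool_decisions:
        messages, events = process_tool_decisions(messages, tool_decisions)
        yield from events  # TOOL_RESULT events reach the SSE route first
    # ...normal LLM streaming would continue here over the updated messages...
    yield {"event": "ai_answer", "data": {"content": "done"}}

chunks = list(call_stream([], tool_decisions=[{"tool_call_id": "t1"}]))
```

Because the events are yielded before the LLM loop starts, the UI sees the approval/denial outcome immediately instead of waiting for the next model turn.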

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Suggested reviewers

  • moshemorad

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
  • Title Check — ✅ Passed: The title clearly summarizes the primary change, stating that approved or denied tool calls now trigger an SSE event, which aligns directly with the updates to process_tool_decisions and call_stream for emitting tool result messages. It is concise, specific, and reflects the developer's focus without extraneous detail.
  • Description Check — ✅ Passed: The description accurately describes the goal of sending SSE messages to the UI for tool approval and denial outcomes, which matches the changes that ensure those events are emitted.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
holmes/core/tool_calling_llm.py (1)

337-379: Avoid 500s on stale/mismatched decisions; preserve tool result order on insert

  • Raising when no pending approvals found will 500 the SSE route. Prefer emitting an ERROR event and returning unchanged messages.
  • Inserting multiple tool results at the same index reverses their order. Track per-message insert offset to preserve tool_calls order.
```diff
-        if not pending_tool_calls:
-            error_message = f"Received {len(tool_decisions)} tool decisions but no pending approvals found"
-            logging.error(error_message)
-            raise Exception(error_message)
+        if not pending_tool_calls:
+            error_message = (
+                f"Received {len(tool_decisions)} tool decisions but no pending approvals found"
+            )
+            logging.warning(error_message)
+            events.append(
+                StreamMessage(
+                    event=StreamEvents.ERROR,
+                    data={"message": error_message},
+                )
+            )
+            return messages, events
@@
-        for tool_call_with_decision in pending_tool_calls:
+        # Preserve insertion order per assistant message
+        insert_offsets: dict[int, int] = {}
+        for tool_call_with_decision in pending_tool_calls:
@@
-            messages.insert(
-                tool_call_with_decision.message_index + 1, tool_call_message
-            )
+            idx = tool_call_with_decision.message_index
+            offset = insert_offsets.get(idx, 0)
+            messages.insert(idx + 1 + offset, tool_call_message)
+            insert_offsets[idx] = offset + 1
```
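The ordering bug the reviewer describes is easy to demonstrate in isolation: repeatedly calling list.insert at the same index stacks later items in front of earlier ones, while tracking a per-message offset preserves the original order. A minimal, self-contained illustration:

```python
# Two tool results inserted at the same index end up reversed:
messages = ["assistant_msg"]
naive = messages.copy()
for result in ["result_A", "result_B"]:
    naive.insert(1, result)
# naive is now ["assistant_msg", "result_B", "result_A"] -- order reversed

# Tracking a per-message offset preserves the original tool_calls order:
fixed = messages.copy()
insert_offsets: dict = {}
for result in ["result_A", "result_B"]:
    idx = 0  # message_index of the assistant message that made the tool calls
    offset = insert_offsets.get(idx, 0)
    fixed.insert(idx + 1 + offset, result)
    insert_offsets[idx] = offset + 1
# fixed is ["assistant_msg", "result_A", "result_B"]
```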
🧹 Nitpick comments (3)
tests/test_approval_workflow.py (2)

263-274: Fix Ruff ARG005 and RUF005 in test lambda

  • Rename unused arg to _tool_decisions.
  • Prefer iterable unpacking over concatenation.
```diff
-    ai.process_tool_decisions = MagicMock(
-        side_effect=lambda messages, tool_decisions: (
-            messages
-            + [
+    ai.process_tool_decisions = MagicMock(
+        side_effect=lambda messages, _tool_decisions: (
+            [
+                *messages,
                 {
                     "tool_call_id": "tool_call_123",
                     "role": "tool",
                     "name": "kubectl_delete",
                     "content": "pod 'dangerous-pod' deleted",
                 }
-            ],
+            ],
             [],  # Empty list for StreamMessages
         )
     )
```

409-420: Fix Ruff ARG005 and RUF005 in rejection test lambda

  • Rename unused arg to _tool_decisions.
  • Use iterable unpacking.
```diff
-    ai.process_tool_decisions = MagicMock(
-        side_effect=lambda messages, tool_decisions: (
-            messages
-            + [
+    ai.process_tool_decisions = MagicMock(
+        side_effect=lambda messages, _tool_decisions: (
+            [
+                *messages,
                 {
                     "tool_call_id": "tool_call_123",
                     "role": "tool",
                     "name": "kubectl_delete",
                     "content": "Tool execution was denied by the user.",
                 }
-            ],
+            ],
             [],  # Empty list for StreamMessages
         )
     )
```
holmes/core/tool_calling_llm.py (1)

293-306: Update docstring to reflect new return type

The function now returns (messages, events) but docstring says “Updated messages list”. Align it to avoid confusion and mypy hinting issues.

```diff
-        Returns:
-            Updated messages list with tool execution results
+        Returns:
+            A tuple of:
+            - Updated messages with tool execution results
+            - Stream events (TOOL_RESULT) emitted for each decision
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b65307d and 7c1dba5.

📒 Files selected for processing (3)
  • holmes/core/tool_calling_llm.py (6 hunks)
  • server.py (1 hunks)
  • tests/test_approval_workflow.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods

Files:

  • server.py
  • tests/test_approval_workflow.py
  • holmes/core/tool_calling_llm.py
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Only use pytest markers that are defined in pyproject.toml; never introduce undefined markers/tags

Files:

  • tests/test_approval_workflow.py
tests/**

📄 CodeRabbit inference engine (CLAUDE.md)

Test files should mirror the source structure under tests/

Files:

  • tests/test_approval_workflow.py
🧬 Code graph analysis (1)
holmes/core/tool_calling_llm.py (5)
holmes/utils/stream.py (2)
  • StreamMessage (25-27)
  • StreamEvents (15-22)
holmes/core/models.py (4)
  • ToolCallResult (23-60)
  • as_streaming_tool_result_response (50-60)
  • as_tool_call_message (30-36)
  • ToolApprovalDecision (177-181)
holmes/core/tracing.py (1)
  • DummySpan (101-122)
holmes/core/tools.py (2)
  • StructuredToolResult (79-103)
  • StructuredToolResultStatus (52-76)
holmes/core/safeguards.py (1)
  • prevent_overly_repeated_tool_call (78-120)
🪛 Ruff (0.13.3)
tests/test_approval_workflow.py

263-263: Unused lambda argument: tool_decisions

(ARG005)


264-272: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)


409-409: Unused lambda argument: tool_decisions

(ARG005)


410-418: Consider iterable unpacking instead of concatenation

Replace with iterable unpacking

(RUF005)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: llm_evals
  • GitHub Check: Pre-commit checks
  • GitHub Check: build
🔇 Additional comments (3)
holmes/core/tool_calling_llm.py (2)

684-691: Good: skip redundancy safeguard when user approved

Guarding prevent_overly_repeated_tool_call behind user_approved avoids blocking explicit approvals.
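The guard the reviewer is praising can be sketched in isolation. The function and parameter names below are illustrative, not the real signatures from holmes/core/safeguards.py:

```python
from typing import Any, Callable, Dict, Set

def maybe_run_tool(
    tool_name: str,
    invoke: Callable[[], Dict[str, Any]],
    previously_called: Set[str],
    user_approved: bool = False,
) -> Dict[str, Any]:
    # The repeated-call safeguard only applies when the user did NOT
    # explicitly approve this call; an explicit approval always runs.
    if not user_approved and tool_name in previously_called:
        return {"status": "blocked", "reason": "overly repeated tool call"}
    return invoke()

blocked = maybe_run_tool("kubectl_delete", lambda: {"status": "ok"}, {"kubectl_delete"})
allowed = maybe_run_tool(
    "kubectl_delete", lambda: {"status": "ok"}, {"kubectl_delete"}, user_approved=True
)
```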


899-912: Pre-processing and streaming decisions: looks correct

Processes decisions, yields TOOL_RESULT events, and carries the updated messages into the loop. Nice.

server.py (1)

367-368: No action needed: ChatRequest.tool_decisions is defined on a Pydantic BaseModel, so incoming lists of dicts are coerced into ToolApprovalDecision instances.
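The coercion behavior referenced here can be demonstrated with a tiny Pydantic sketch. The field names on ToolApprovalDecision are hypothetical; the real model lives in holmes/core/models.py:

```python
from typing import List, Optional
from pydantic import BaseModel

class ToolApprovalDecision(BaseModel):
    # Hypothetical fields for illustration only.
    tool_call_id: str
    approved: bool

class ChatRequest(BaseModel):
    tool_decisions: Optional[List[ToolApprovalDecision]] = None

# Raw dicts from the JSON request body are coerced into model instances:
req = ChatRequest(tool_decisions=[{"tool_call_id": "t1", "approved": True}])
```

This is why the server can pass req.tool_decisions straight into call_stream without any manual conversion.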

@nherment nherment requested a review from moshemorad October 13, 2025 09:58
moshemorad previously approved these changes Oct 13, 2025
```python
tool_calls=previous_tool_calls,
)
tool_response = None
if not user_approved:
```
Contributor


Why did we change this? If the user approved it but it already ran once, do we still want to run it?

Collaborator Author


I changed it because I noticed the issue. My thinking is that if a user actively approves a command, it should run regardless of any restriction, including the repeated-command check.

Contributor


Not sure I fully agree with that. Maybe worth discussing it with product to make sure?


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🧹 Nitpick comments (3)
holmes/core/tool_calling_llm.py (3)

292-292: Use consistent type annotation style.

The return type mixes lowercase built-in types (tuple, list) with uppercase typing module types (List, Dict). For consistency with the rest of the codebase which imports and uses List from the typing module (line 5), use Tuple[List[Dict[str, Any]], List[StreamMessage]].

Apply this diff:

```diff
-    ) -> tuple[List[Dict[str, Any]], list[StreamMessage]]:
+    ) -> Tuple[List[Dict[str, Any]], List[StreamMessage]]:
```

And add Tuple to the imports on line 5:

```diff
-from typing import Dict, List, Optional, Type, Union, Callable, Any
+from typing import Dict, List, Optional, Type, Union, Callable, Any, Tuple
```

303-303: Use List[StreamMessage] for consistency.

The variable declaration uses lowercase list[StreamMessage] while the codebase consistently uses List from the typing module.

Apply this diff:

```diff
-        events: list[StreamMessage] = []
+        events: List[StreamMessage] = []
```

899-910: Mixed union syntax.

Line 899 uses the newer union syntax List[ToolApprovalDecision] | None while the rest of the codebase uses Optional[List[ToolApprovalDecision]]. While both work in Python 3.10+, consistency with the existing codebase style is preferred.

The logic for processing tool decisions and yielding events is correct and aligns with the PR objectives.

Apply this diff for consistency:

```diff
-        tool_decisions: List[ToolApprovalDecision] | None = None,
+        tool_decisions: Optional[List[ToolApprovalDecision]] = None,
```
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7c1dba5 and e267b9a.

📒 Files selected for processing (1)
  • holmes/core/tool_calling_llm.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.py: Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
Type hints are required; code should pass mypy (configured in pyproject.toml)
ALWAYS place Python imports at the top of the file, not inside functions or methods

Files:

  • holmes/core/tool_calling_llm.py
🧬 Code graph analysis (1)
holmes/core/tool_calling_llm.py (5)
holmes/utils/stream.py (2)
  • StreamMessage (25-27)
  • StreamEvents (15-22)
holmes/core/models.py (4)
  • ToolCallResult (23-60)
  • as_streaming_tool_result_response (50-60)
  • as_tool_call_message (30-36)
  • ToolApprovalDecision (177-181)
holmes/core/tracing.py (1)
  • DummySpan (101-122)
holmes/core/tools.py (2)
  • StructuredToolResult (79-103)
  • StructuredToolResultStatus (52-76)
holmes/core/safeguards.py (1)
  • prevent_overly_repeated_tool_call (78-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Pre-commit checks
  • GitHub Check: llm_evals
  • GitHub Check: build
🔇 Additional comments (2)
holmes/core/tool_calling_llm.py (2)

342-378: LGTM! Tool approval logic correctly emits SSE events.

The implementation properly handles both approved and denied tools, generating TOOL_RESULT events in both cases and deriving the tool call message from the result. This aligns perfectly with the PR objective to ensure SSE messages are sent for tool results.


684-690: LGTM! Approved tools correctly bypass the repeated-call safeguard.

When user_approved=True, the safeguard is skipped, which makes sense—if a user explicitly approves a tool, it should run regardless of previous executions.


Results of HolmesGPT evals

  • ask_holmes: 33/35 test cases were successful, 0 regressions, 1 setup failure
Test suite Test case Status
ask 01_how_many_pods
ask 02_what_is_wrong_with_pod
ask 04_related_k8s_events
ask 05_image_version
ask 09_crashpod
ask 10_image_pull_backoff
ask 110_k8s_events_image_pull
ask 11_init_containers
ask 13a_pending_node_selector_basic
ask 14_pending_resources
ask 15_failed_readiness_probe
ask 17_oom_kill
ask 19_detect_missing_app_details
ask 20_long_log_file_search
ask 24_misconfigured_pvc
ask 24a_misconfigured_pvc_basic
ask 28_permissions_error 🚧
ask 39_failed_toolset
ask 41_setup_argo
ask 42_dns_issues_steps_new_tools ⚠️
ask 43_current_datetime_from_prompt
ask 45_fetch_deployment_logs_simple
ask 51_logs_summarize_errors
ask 53_logs_find_term
ask 54_not_truncated_when_getting_pods
ask 59_label_based_counting
ask 60_count_less_than
ask 61_exact_match_counting
ask 63_fetch_error_logs_no_errors
ask 79_configmap_mount_issue
ask 83_secret_not_found
ask 86_configmap_like_but_secret
ask 93_calling_datadog[0]
ask 93_calling_datadog[1]
ask 93_calling_datadog[2]

Legend

  • ✅ the test was successful
  • ➖ the test was skipped
  • ⚠️ the test failed but is known to be flaky or known to fail
  • 🚧 the test had a setup failure (not a code regression)
  • 🔧 the test failed due to mock data issues (not a code regression)
  • 🚫 the test was throttled by API rate limits/overload
  • ❌ the test failed and should be fixed before merging the PR

@nherment nherment requested a review from moshemorad October 14, 2025 05:38
@nherment nherment merged commit 3b0eee7 into master Oct 15, 2025
8 checks passed
@nherment nherment deleted the rob-1933_holmes_tool_approval_workflow_3 branch October 15, 2025 07:34