ROB-1933: make sure approved or denied tool call triggers a SSE event #1054
Conversation
Walkthrough
Processing tool-approval decisions now produces both updated messages and stream events, so approved or denied tool calls are surfaced over the SSE stream.
Changes
Sequence Diagram(s)
```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant S as Server (/chat)
    participant L as ToolCallingLLM
    participant T as Tool(s)
    C->>S: POST /chat (messages, tool_decisions, stream=true)
    S->>L: call_stream(messages, tool_decisions)
    rect #EFEFEF
        note right of L: process_tool_decisions -> (messages, events)
        L-->>S: yield TOOL_RESULT events (denied/approved)
    end
    alt approved
        L->>T: invoke tool (user_approved=True)
        T-->>L: ToolCallResult(success/data)
        L-->>S: yield TOOL_RESULT (approved + result)
    else denied
        L-->>S: yield TOOL_RESULT (denied)
    end
    note over L: continue normal LLM streaming
    L-->>S: assistant token/events
    S-->>C: SSE stream (TOOL_RESULT + tokens)
```
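For orientation, here is a minimal, self-contained sketch of the server-to-client leg of this diagram: forwarding streamed events as SSE from a FastAPI route. The event generator, event names, and payload shapes below are stand-ins assumed for illustration, not the repository's actual call_stream API.

```python
# Hedged sketch: forward streamed events as SSE. `fake_call_stream` stands in
# for ToolCallingLLM.call_stream; event names and payloads are assumptions.
import json
from dataclasses import dataclass
from typing import Any, Dict, Iterator

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()


@dataclass
class StreamMessageSketch:
    event: str  # e.g. "tool_calling_result" (assumed event name)
    data: Dict[str, Any]


def fake_call_stream() -> Iterator[StreamMessageSketch]:
    # First the TOOL_RESULT events for approved/denied decisions, then tokens.
    yield StreamMessageSketch(
        "tool_calling_result", {"tool_name": "kubectl_delete", "status": "approved"}
    )
    yield StreamMessageSketch("ai_message", {"content": "Pod deleted."})


@app.post("/chat")
def chat() -> StreamingResponse:
    def sse() -> Iterator[str]:
        for msg in fake_call_stream():
            yield f"event: {msg.event}\ndata: {json.dumps(msg.data)}\n\n"

    return StreamingResponse(sse(), media_type="text/event-stream")
```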
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Possibly related PRs
Suggested reviewers
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
holmes/core/tool_calling_llm.py (1)
Lines 337-379: Avoid 500s on stale/mismatched decisions; preserve tool result order on insert
- Raising when no pending approvals found will 500 the SSE route. Prefer emitting an ERROR event and returning unchanged messages.
- Inserting multiple tool results at the same index reverses their order. Track per-message insert offset to preserve tool_calls order.
```diff
-        if not pending_tool_calls:
-            error_message = f"Received {len(tool_decisions)} tool decisions but no pending approvals found"
-            logging.error(error_message)
-            raise Exception(error_message)
+        if not pending_tool_calls:
+            error_message = (
+                f"Received {len(tool_decisions)} tool decisions but no pending approvals found"
+            )
+            logging.warning(error_message)
+            events.append(
+                StreamMessage(
+                    event=StreamEvents.ERROR,
+                    data={"message": error_message},
+                )
+            )
+            return messages, events
@@
-        for tool_call_with_decision in pending_tool_calls:
+        # Preserve insertion order per assistant message
+        insert_offsets: dict[int, int] = {}
+        for tool_call_with_decision in pending_tool_calls:
@@
-            messages.insert(
-                tool_call_with_decision.message_index + 1, tool_call_message
-            )
+            idx = tool_call_with_decision.message_index
+            offset = insert_offsets.get(idx, 0)
+            messages.insert(idx + 1 + offset, tool_call_message)
+            insert_offsets[idx] = offset + 1
```
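To make the ordering point concrete, here is a small standalone illustration (toy data, not the repository's message objects) of why repeated inserts at a fixed index reverse the results, and how a per-index offset keeps them in order:

```python
# Toy illustration of the insertion-order issue: two tool results that belong
# to the same assistant message (at index 0 here).
messages = ["assistant_with_two_tool_calls", "next_user_message"]
results = ["result_for_tool_call_1", "result_for_tool_call_2"]

# Naive: always insert at message_index + 1 -> later results jump ahead.
naive = list(messages)
for result in results:
    naive.insert(1, result)
assert naive == [
    "assistant_with_two_tool_calls",
    "result_for_tool_call_2",
    "result_for_tool_call_1",
    "next_user_message",
]

# Offset-tracked: each result for the same assistant message lands after the
# previous one, preserving the original tool_calls order.
ordered = list(messages)
insert_offsets: dict[int, int] = {}
for result in results:
    idx = 0  # index of the assistant message that issued the tool calls
    offset = insert_offsets.get(idx, 0)
    ordered.insert(idx + 1 + offset, result)
    insert_offsets[idx] = offset + 1
assert ordered == [
    "assistant_with_two_tool_calls",
    "result_for_tool_call_1",
    "result_for_tool_call_2",
    "next_user_message",
]
```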
🧹 Nitpick comments (3)
tests/test_approval_workflow.py (2)
Lines 263-274: Fix Ruff ARG005 and RUF005 in test lambda
- Rename the unused arg to _tool_decisions.
- Prefer iterable unpacking over concatenation.
```diff
-    ai.process_tool_decisions = MagicMock(
-        side_effect=lambda messages, tool_decisions: (
-            messages
-            + [
+    ai.process_tool_decisions = MagicMock(
+        side_effect=lambda messages, _tool_decisions: (
+            [
+                *messages,
                 {
                     "tool_call_id": "tool_call_123",
                     "role": "tool",
                     "name": "kubectl_delete",
                     "content": "pod 'dangerous-pod' deleted",
                 }
-            ],
+            ],
             [],  # Empty list for StreamMessages
         )
     )
```
Lines 409-420: Fix Ruff ARG005 and RUF005 in rejection test lambda
- Rename the unused arg to _tool_decisions.
- Use iterable unpacking.
```diff
-    ai.process_tool_decisions = MagicMock(
-        side_effect=lambda messages, tool_decisions: (
-            messages
-            + [
+    ai.process_tool_decisions = MagicMock(
+        side_effect=lambda messages, _tool_decisions: (
+            [
+                *messages,
                 {
                     "tool_call_id": "tool_call_123",
                     "role": "tool",
                     "name": "kubectl_delete",
                     "content": "Tool execution was denied by the user.",
                 }
-            ],
+            ],
             [],  # Empty list for StreamMessages
         )
     )
```
holmes/core/tool_calling_llm.py (1)
Lines 293-306: Update docstring to reflect new return type
The function now returns (messages, events) but the docstring says "Updated messages list". Align it to avoid confusion and mypy hinting issues.
```diff
-        Returns:
-            Updated messages list with tool execution results
+        Returns:
+            A tuple of:
+            - Updated messages with tool execution results
+            - Stream events (TOOL_RESULT) emitted for each decision
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- holmes/core/tool_calling_llm.py (6 hunks)
- server.py (1 hunks)
- tests/test_approval_workflow.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py:
- Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
- Type hints are required; code should pass mypy (configured in pyproject.toml)
- ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
server.py
tests/test_approval_workflow.py
holmes/core/tool_calling_llm.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Only use pytest markers that are defined in pyproject.toml; never introduce undefined markers/tags
Files:
tests/test_approval_workflow.py
tests/**
📄 CodeRabbit inference engine (CLAUDE.md)
Test files should mirror the source structure under tests/
Files:
tests/test_approval_workflow.py
🧬 Code graph analysis (1)
holmes/core/tool_calling_llm.py (5)
holmes/utils/stream.py (2)
- StreamMessage (25-27)
- StreamEvents (15-22)
holmes/core/models.py (4)
- ToolCallResult (23-60)
- as_streaming_tool_result_response (50-60)
- as_tool_call_message (30-36)
- ToolApprovalDecision (177-181)
holmes/core/tracing.py (1)
- DummySpan (101-122)
holmes/core/tools.py (2)
- StructuredToolResult (79-103)
- StructuredToolResultStatus (52-76)
holmes/core/safeguards.py (1)
- prevent_overly_repeated_tool_call (78-120)
🪛 Ruff (0.13.3)
tests/test_approval_workflow.py
263-263: Unused lambda argument: tool_decisions
(ARG005)
264-272: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
409-409: Unused lambda argument: tool_decisions
(ARG005)
410-418: Consider iterable unpacking instead of concatenation
Replace with iterable unpacking
(RUF005)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: llm_evals
- GitHub Check: Pre-commit checks
- GitHub Check: build
🔇 Additional comments (3)
holmes/core/tool_calling_llm.py (2)
Lines 684-691: Good: skip redundancy safeguard when user approved. Guarding prevent_overly_repeated_tool_call behind user_approved avoids blocking explicit approvals.
Lines 899-912: Pre-processing and streaming decisions look correct. Processes decisions, yields TOOL_RESULT events, and carries updated messages into the loop. Nice.
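As a rough, hedged sketch of the flow this comment describes (the names mirror this review, not the exact source; the stand-in bodies are assumptions):

```python
# Hypothetical shape of the streaming entry point: process approval decisions
# first, yield their TOOL_RESULT events, then continue the normal LLM loop
# with the updated message list. All names and payloads here are illustrative.
from typing import Iterator, List, Optional, Tuple


class ToolCallingLLMSketch:
    def process_tool_decisions(
        self, messages: List[dict], decisions: List[dict]
    ) -> Tuple[List[dict], List[dict]]:
        # Stand-in: one tool message and one TOOL_RESULT event per decision.
        events = [{"event": "tool_calling_result", "data": d} for d in decisions]
        updated = messages + [
            {"role": "tool", "tool_call_id": d["tool_call_id"], "content": "..."}
            for d in decisions
        ]
        return updated, events

    def call_stream(
        self, messages: List[dict], tool_decisions: Optional[List[dict]] = None
    ) -> Iterator[dict]:
        if tool_decisions:
            messages, events = self.process_tool_decisions(messages, tool_decisions)
            yield from events  # TOOL_RESULT reaches the SSE stream before any tokens
        # ...then the normal LLM streaming loop continues with `messages`
        yield {"event": "ai_message", "data": {"content": "(assistant answer here)"}}
```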
server.py (1)
Lines 367-368: No action needed: ChatRequest.tool_decisions is defined on a Pydantic BaseModel, so incoming lists of dicts are coerced into ToolApprovalDecision instances.
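A minimal, self-contained illustration of the Pydantic coercion being relied on here (the models below are simplified stand-ins, not the actual ChatRequest or ToolApprovalDecision definitions):

```python
# Sketch of Pydantic coercing a list of dicts into nested model instances.
from typing import List, Optional

from pydantic import BaseModel


class ToolApprovalDecisionSketch(BaseModel):  # simplified stand-in
    tool_call_id: str
    approved: bool


class ChatRequestSketch(BaseModel):  # not the real ChatRequest
    ask: str
    tool_decisions: Optional[List[ToolApprovalDecisionSketch]] = None


req = ChatRequestSketch(
    ask="delete the stuck pod",
    tool_decisions=[{"tool_call_id": "tool_call_123", "approved": True}],
)
# The dict payload arrives as a validated model instance.
assert isinstance(req.tool_decisions[0], ToolApprovalDecisionSketch)
```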
```python
        tool_calls=previous_tool_calls,
    )
    tool_response = None
    if not user_approved:
```
Why did we change this? If the user approved it but it already ran once, do we still want to run it?
I changed it because I noticed it. My thinking is that if a user actively approves a command, then it should run regardless of any restriction, including the repeated-command check.
Not sure I fully agree with that. Maybe it's worth discussing with product to make sure?
Actionable comments posted: 0
🧹 Nitpick comments (3)
holmes/core/tool_calling_llm.py (3)
Lines 292-292: Use consistent type annotation style.
The return type mixes lowercase built-in types (tuple, list) with uppercase typing module types (List, Dict). For consistency with the rest of the codebase, which imports and uses List from the typing module (line 5), use Tuple[List[Dict[str, Any]], List[StreamMessage]].
Apply this diff:
```diff
-    ) -> tuple[List[Dict[str, Any]], list[StreamMessage]]:
+    ) -> Tuple[List[Dict[str, Any]], List[StreamMessage]]:
```
And add Tuple to the imports on line 5:
```diff
-from typing import Dict, List, Optional, Type, Union, Callable, Any
+from typing import Dict, List, Optional, Type, Union, Callable, Any, Tuple
```
Lines 303-303: Use List[StreamMessage] for consistency.
The variable declaration uses lowercase list[StreamMessage] while the codebase consistently uses List from the typing module.
Apply this diff:
```diff
-    events: list[StreamMessage] = []
+    events: List[StreamMessage] = []
```
Lines 899-910: Mixed union syntax.
Line 899 uses the newer union syntax List[ToolApprovalDecision] | None while the rest of the codebase uses Optional[List[ToolApprovalDecision]]. While both work in Python 3.10+, consistency with the existing codebase style is preferred. The logic for processing tool decisions and yielding events is correct and aligns with the PR objectives.
Apply this diff for consistency:
```diff
-        tool_decisions: List[ToolApprovalDecision] | None = None,
+        tool_decisions: Optional[List[ToolApprovalDecision]] = None,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
- holmes/core/tool_calling_llm.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
**/*.py:
- Use Ruff for formatting and linting (configured in pyproject.toml) for all Python code
- Type hints are required; code should pass mypy (configured in pyproject.toml)
- ALWAYS place Python imports at the top of the file, not inside functions or methods
Files:
holmes/core/tool_calling_llm.py
🧬 Code graph analysis (1)
holmes/core/tool_calling_llm.py (5)
holmes/utils/stream.py (2)
- StreamMessage (25-27)
- StreamEvents (15-22)
holmes/core/models.py (4)
- ToolCallResult (23-60)
- as_streaming_tool_result_response (50-60)
- as_tool_call_message (30-36)
- ToolApprovalDecision (177-181)
holmes/core/tracing.py (1)
- DummySpan (101-122)
holmes/core/tools.py (2)
- StructuredToolResult (79-103)
- StructuredToolResultStatus (52-76)
holmes/core/safeguards.py (1)
- prevent_overly_repeated_tool_call (78-120)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Pre-commit checks
- GitHub Check: llm_evals
- GitHub Check: build
🔇 Additional comments (2)
holmes/core/tool_calling_llm.py (2)
Lines 342-378: LGTM! Tool approval logic correctly emits SSE events. The implementation properly handles both approved and denied tools, generating TOOL_RESULT events in both cases and deriving the tool call message from the result. This aligns perfectly with the PR objective to ensure SSE messages are sent for tool results.
Lines 684-690: LGTM! Approved tools correctly bypass the repeated-call safeguard. When user_approved=True, the safeguard is skipped, which makes sense: if a user explicitly approves a tool, it should run regardless of previous executions.
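A condensed, self-contained sketch of the guard pattern being praised here (the safeguard logic, variable names, and helper are assumptions for illustration, not the literal source):

```python
# Hedged sketch: skip the repeated-call safeguard for explicit user approvals.
from typing import Any, Dict, List, Optional


def run_tool_with_safeguard(
    tool_name: str,
    tool_params: Dict[str, Any],
    messages: List[dict],
    user_approved: bool,
) -> str:
    tool_response: Optional[str] = None
    if not user_approved:
        # Stand-in for prevent_overly_repeated_tool_call: object if this tool
        # already appears "too often" in the conversation history.
        repeats = sum(1 for m in messages if m.get("name") == tool_name)
        if repeats >= 3:
            tool_response = f"Refusing to repeat {tool_name} again (already ran {repeats}x)"
    if tool_response is None:
        # Either the user approved explicitly, or the safeguard did not object.
        tool_response = f"ran {tool_name} with {tool_params}"
    return tool_response
```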
Ensures SSE messages are sent to the UI for tool results when tools are approved or denied. Otherwise the UI does not get updated.
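For completeness, a hedged sketch of how a UI-side client could consume such a stream and react to tool-result events (the endpoint, event name, and payload fields are assumptions for illustration):

```python
# Hypothetical client: read the SSE stream line by line and update the UI
# whenever a tool-result event arrives. Endpoint, event name, and payload
# fields are assumed, not taken from the repository.
import json

import requests

with requests.post(
    "http://localhost:8000/api/chat",
    json={"ask": "delete the stuck pod", "stream": True},
    stream=True,
) as resp:
    event_name = None
    for raw in resp.iter_lines(decode_unicode=True):
        if not raw:
            continue  # blank line marks the end of one SSE event
        if raw.startswith("event:"):
            event_name = raw.split(":", 1)[1].strip()
        elif raw.startswith("data:"):
            payload = json.loads(raw.split(":", 1)[1].strip())
            if event_name == "tool_calling_result":
                print("update UI with tool result:", payload)
            else:
                print("other event:", event_name, payload)
```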