-
Notifications
You must be signed in to change notification settings - Fork 42
refactor: standardize Observation base class #929
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…atus helpers - Add ObservationStatus enum and result_status property - Add standardized output and error fields with default to_llm_content - Keep subclasses backward compatible by using bool(error) in has_error Co-authored-by: openhands <[email protected]>
…mpat - FinishObservation: rename message->output, add message alias; rely on base to_llm_content/visualize - ThinkObservation: rename content->output, add content alias; rely on base defaults - MCPToolObservation: adopt base error semantics while preserving is_error; unify has_error and formatting - TaskTrackerObservation: rename content->output; use base to_llm_content; keep rich visualize - FileEditorObservation: standardize error formatting prefix as 'Error: ' This aligns all observation classes with the unified Observation base (output: str, error: str|None, has_error, result_status) and removes duplicated logic where possible. Co-authored-by: openhands <[email protected]>
Coverage Report •
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
- Fix remaining assertion in test_registry to use .output - Ensure subclass output fields provide defaults to satisfy pyright - Adjust ExecuteBashObservation output default Co-authored-by: openhands <[email protected]>
…ed Observation Co-authored-by: openhands <[email protected]>
… base to_llm_content; update tests Co-authored-by: openhands <[email protected]>
|
@OpenHands Let's add a small modification: Also let's be more consistent with how observations sub-class the base observation observations should not override 'to_llm_content': to_llm_content is flexible enough so that any error message or any content can be sent back to the llm |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
- Update _format_error to use 'Tool Execution Error:' prefix - Remove command field from DelegateObservation and TaskTrackerObservation - Remove unnecessary to_llm_content override in DelegateObservation - Update delegate/impl.py to use error field for errors (not output) - Prefix success messages with command name (spawn:/delegate:/plan:/view:) - Update TaskTrackerExecutor to handle errors via error field - Update tests to match new observation structure - Ensure output is not populated when error is present These changes make observations more consistent by: 1. Eliminating the redundant command field 2. Ensuring errors and outputs are mutually exclusive 3. Relying on base class to_llm_content when possible 4. Keeping command context in the message text when needed Co-authored-by: openhands <[email protected]>
SummaryI've successfully improved observation consistency across the codebase as requested. The changes have been committed and pushed to PR #929 ( Changes Made✅ All requested modifications completed:
Files Modified
Test Results✅ All 77 SDK tool tests pass The code is now more consistent, with clear separation between errors and outputs, and observations that follow a standardized pattern. |
- Remove custom to_llm_content override from MCPToolObservation - Rename convrted_content to converted_content (typo fix) - Populate error field with full content when MCP result has error - Populate output field with formatted content for successful results - Update tests to match new behavior (single TextContent vs list) - Error messages now include: header + error notice + content - Success messages now include: header + content This makes MCPToolObservation consistent with other observations by relying on the base class to_llm_content and properly separating error vs output fields. Co-authored-by: openhands <[email protected]>
- Remove output field from multiple commands error case - Move full error message to error field instead - Update test to check error field instead of output field - Fix line length by extracting commands_list variable This makes ExecuteBashObservation consistent with the pattern that errors should use the error field, not the output field. Co-authored-by: openhands <[email protected]>
The FileEditorObservation.to_llm_content was simply returning error or output, which is exactly what the base Observation class now provides by default. Removing this override allows the observation to use the standardized base implementation. Co-authored-by: openhands <[email protected]>
…interface - Restored 'command' field to TaskTrackerObservation - Changed 'output' back to 'content' field - Restored original visualize method that uses command field to determine styling - Added custom to_llm_content implementation - Updated executor to populate command field in observations This preserves the original interface while maintaining consistency with the base observation standardization changes. Co-authored-by: openhands <[email protected]>
…o_llm_content
- Added command: str | None field to base Observation class
- Updated to_llm_content to prepend 'Command: {command}' when command is present
- Removed to_llm_content overrides from simple observations (Glob, Grep, TaskTracker)
- Updated executors to populate output field instead of defining custom to_llm_content
- Kept complex to_llm_content overrides (Browser, ExecuteBash) that handle images and metadata
Co-authored-by: openhands <[email protected]>
- Remove 'output', 'error', 'command' field redeclarations from observation subclasses where possible - Update MCPToolObservation to use output field instead of _content field - Store text content in output, images in separate images field - Update MCP tests to match new structure - Keep FileEditorObservation.command field (required for internal logic) - When error occurs, populate only error field, not output Co-authored-by: openhands <[email protected]>
- Update ThinkExecutor to set output='Your thought has been logged.' - Update FinishExecutor to set output=action.message - Fix test_view_batch_atomicity.py to use output instead of content - Fix test_mcp_tool_kind_field.py import issues - Add Sequence import to glob and grep definitions These changes ensure observations consistently use the base output field rather than custom fields, aligning with the standardization effort. Co-authored-by: openhands <[email protected]>
- Changed base Observation._format_error to use 'Tool Execution Error:' prefix - Ensured observations don't populate output when there is an error - Removed command parameter from ExecuteBashObservation instantiations - Updated ExecuteBashObservation to override output field as str - Updated ExecuteBashObservation.to_llm_content to return error without output - Fixed terminal_session.py to only populate error field in error cases - Removed 'ERROR:' prefix from error messages (base class adds prefix) - Updated test_bash_session.py to check error field instead of output - Updated all observation tests to work with new error handling Co-authored-by: openhands <[email protected]>
This comment has been minimized.
This comment has been minimized.
- Removed custom content field, use base class output field instead - Converted content items are now stored directly in output field - Error messages include all content combined into error string - Removed unnecessary complexity from from_call_tool_result - Updated tests to match new structure (header + content items) Co-authored-by: openhands <[email protected]>
This comment has been minimized.
This comment has been minimized.
|
@OpenHands MCP For openhands-sdk/openhands/sdk/tool/builtins/finish.py For openhands-tools/openhands/tools/delegate/definition.py openhands-tools/openhands/tools/delegate/impl.py For openhands-tools/openhands/tools/execute_bash/terminal/terminal_session.py For file Editor openhands-tools/openhands/tools/file_editor/definition.py For task tracker Make sure all tests and pre-commit hooks are passing |
|
I'm on it! simonrosenberg can track my progress at all-hands.dev |
…fields - Updated all Observation subclasses to use standardized output and error fields - Added command property to all observations with type narrowing where needed - MCP observations: removed redundant to_llm_content overrides, removed error header duplication - FinishObservation/ThinkObservation: empty output list as specified - DelegateObservation: added cmd field and command property with Literal type - ExecuteBashObservation: added cmd field and command property, use raw_output for command output - FileEditorObservation: added cmd field with Literal type narrowing - TaskTrackerObservation: added cmd field with Literal type narrowing - BrowserObservation: wrapped output strings in TextContent - GrepObservation/GlobObservation: wrapped output strings in TextContent - Updated all tests to use new field names (cmd, raw_output) and TextContent wrapping Co-authored-by: openhands <[email protected]>
The base Observation class now always has content as a list of TextContent/ImageContent. Updated test_real_mcp_tool_execution_without_kind_field to properly extract text from the content list instead of treating it as a string. Co-authored-by: openhands <[email protected]>
SummaryI've successfully fixed the failing actions on PR #929 ( What Was FixedRoot Cause: The test Change Made: Updated the test to consistently treat Verification✅ Pre-commit checks: All passing (Format YAML, Ruff format, Ruff lint, PEP8 style check, Type check with pyright) Changes Committed and Pushed
The PR should now pass all CI checks. The failing actions were caused by this single test not being adapted to the refactored Observation base class structure. |
|
Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly. |
🧪 Integration Tests ResultsOverall Success Rate: 100.0% 📁 Detailed Logs & ArtifactsClick the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.
📊 Summary
📋 Detailed Resultslitellm_proxy_deepseek_deepseek_chat
litellm_proxy_gpt_5_mini_2025_08_07
litellm_proxy_claude_sonnet_4_5_20250929
|
🔄 Running Examples with
|
| Example | Status | Duration | Cost |
|---|---|---|---|
| 01_standalone_sdk/01_hello_world.py | ✅ PASS | 98s | $0.03 |
| 01_standalone_sdk/02_custom_tools.py | ✅ PASS | 64s | $0.02 |
| 01_standalone_sdk/03_activate_skill.py | ✅ PASS | 58s | $0.01 |
| 01_standalone_sdk/05_use_llm_registry.py | ✅ PASS | 9s | $0.01 |
| 01_standalone_sdk/07_mcp_integration.py | ✅ PASS | 55s | $0.03 |
| 01_standalone_sdk/09_pause_example.py | ✅ PASS | 13s | $0.01 |
| 01_standalone_sdk/10_persistence.py | ✅ PASS | 110s | $0.03 |
| 01_standalone_sdk/11_async.py | ✅ PASS | 61s | $0.02 |
| 01_standalone_sdk/12_custom_secrets.py | ✅ PASS | 18s | $0.01 |
| 01_standalone_sdk/13_get_llm_metrics.py | ✅ PASS | 34s | $0.02 |
| 01_standalone_sdk/14_context_condenser.py | ✅ PASS | 696s | $0.57 |
| 01_standalone_sdk/17_image_input.py | ✅ PASS | 20s | $0.02 |
| 01_standalone_sdk/18_send_message_while_processing.py | ✅ PASS | 21s | $0.02 |
| 01_standalone_sdk/19_llm_routing.py | ✅ PASS | 17s | $0.01 |
| 01_standalone_sdk/20_stuck_detector.py | ✅ PASS | 18s | $0.01 |
| 01_standalone_sdk/21_generate_extraneous_conversation_costs.py | ✅ PASS | 14s | $0.01 |
| 01_standalone_sdk/22_anthropic_thinking.py | ✅ PASS | 74s | $0.02 |
| 01_standalone_sdk/23_responses_reasoning.py | ✅ PASS | 47s | $0.01 |
| 01_standalone_sdk/24_planning_agent_workflow.py | ✅ PASS | 419s | $0.38 |
| 02_remote_agent_server/01_convo_with_local_agent_server.py | ✅ PASS | 131s | $0.07 |
| 02_remote_agent_server/02_convo_with_docker_sandboxed_server.py | ✅ PASS | 220s | $0.04 |
| 02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py | ✅ PASS | 82s | $0.04 |
✅ All tests passed!
Total: 22 | Passed: 22 | Failed: 0 | Total Cost: $1.38
|
Looks like there are a few issues preventing this PR from being merged!
If you'd like me to help, just leave a comment, like Feel free to include any additional details that might help me get this PR into a better state. You can manage your notification settings |
xingyaoww
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot!! This really help cleans up the codebase a bunch
Summary
Motivation and Context
There were 9 different observation classes with minor variations in how they handled errors and outputs. This refactor centralizes those patterns in the base class to reduce duplication and ensure consistent agent-visible content.
What changed
Backward Compatibility
Tests
Related Issues
Checklist
Co-authored-by: openhands [email protected]
@simonrosenberg can click here to continue refining the PR
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
eclipse-temurin:17-jdknikolaik/python-nodejs:python3.12-nodejs22golang:1.21-bookwormPull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64 docker pull ghcr.io/openhands/agent-server:9909597-pythonRun
All tags pushed for this build
About Multi-Architecture Support
9909597-python) is a multi-arch manifest supporting both amd64 and arm649909597-python-amd64) are also available if needed