@simonrosenberg simonrosenberg commented Oct 27, 2025

Summary

  • Standardized the Observation base class to unify error handling and output formatting across tools.
  • Introduced ObservationStatus enum and result_status property to compute success/error state.
  • Added standardized output: str and error: str | None fields in the base.
  • Implemented a default to_llm_content that returns a formatted error when present, otherwise the output.
  • Kept backward compatibility by having has_error use bool(self.error) to support both string and boolean-style error flags in existing subclasses.
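The base-class behavior listed above can be sketched roughly as follows. This is a minimal illustration, not the code in schema.py: it uses plain dataclasses, the `TextContent` class is a stand-in for the SDK type, the enum member names are assumptions, and returning a one-element list from `to_llm_content` is also an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ObservationStatus(str, Enum):
    # Member names are assumptions; the PR only names the enum itself.
    SUCCESS = "success"
    ERROR = "error"


@dataclass(frozen=True)
class TextContent:
    # Stand-in for the SDK's text content type.
    text: str


@dataclass(frozen=True)
class Observation:
    output: str = ""
    error: Optional[str] = None

    @property
    def has_error(self) -> bool:
        # bool() treats None and "" as "no error".
        return bool(self.error)

    @property
    def result_status(self) -> ObservationStatus:
        if self.has_error:
            return ObservationStatus.ERROR
        return ObservationStatus.SUCCESS

    def _format_error(self) -> TextContent:
        return TextContent(text=f"Error: {self.error}")

    def to_llm_content(self) -> list[TextContent]:
        # A formatted error when present, otherwise the output.
        if self.has_error:
            return [self._format_error()]
        return [TextContent(text=self.output)]
```

With this sketch, `Observation(output="ok")` reports a success status and yields the output text, while `Observation(error="boom")` reports an error status and yields only the formatted error.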

Motivation and Context
There were 9 different observation classes with minor variations in how they handled errors and outputs. This refactor centralizes those patterns in the base class to reduce duplication and ensure consistent agent-visible content.

What changed

  • openhands-sdk/openhands/sdk/tool/schema.py
    • Added ObservationStatus enum
    • Added standardized output and error fields
    • Added has_error property and result_status computation
    • Added _format_error helper and default to_llm_content implementation
    • Kept visualize behavior consistent with to_llm_content

Backward Compatibility

  • Subclasses can still define their own fields and override to_llm_content/visualize as needed
  • has_error uses bool(self.error) to accommodate legacy subclasses that used error: bool or is_error: bool in their own models without breaking existing behavior
  • No changes were required to existing tools or tests; existing custom to_llm_content implementations (e.g., FileEditor, ExecuteBash, Browser, MCP) continue to work as before and take precedence over the default
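To illustrate the two compatibility points above, here is a small sketch under stated assumptions: `LegacyObservation` is a hypothetical subclass that uses a boolean error flag and its own `to_llm_content`, and the content is simplified to a plain string. It is not taken from any existing tool.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Observation:
    output: str = ""
    error: Union[str, bool, None] = None

    @property
    def has_error(self) -> bool:
        # Works for both string-style and boolean-style error values.
        return bool(self.error)

    def to_llm_content(self) -> str:
        # Simplified default: formatted error if present, else the output.
        return f"Error: {self.error}" if self.has_error else self.output


@dataclass
class LegacyObservation(Observation):
    # Hypothetical legacy subclass that models the error as a flag.
    error: bool = False

    def to_llm_content(self) -> str:
        # A subclass override takes precedence over the base default.
        status = "failed" if self.has_error else "ok"
        return f"[{status}] {self.output}"
```

Because `has_error` only relies on truthiness, an empty string, `None`, and `False` all count as "no error", and the subclass's own formatting is what the agent sees.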

Tests

  • Ran pre-commit and the full test suite locally. Only unrelated e2e/environment-dependent tests failed. Schema immutability tests pass with the new base properties.

Related Issues

Checklist

  • Pre-commit hooks pass for modified files
  • Types checked with pyright
  • Minimal, focused change touching only the Observation base to standardize behavior

Co-authored-by: openhands [email protected]


Agent Server images for this PR

GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant Architectures Base Image Docs / Tags
java amd64, arm64 eclipse-temurin:17-jdk Link
python amd64, arm64 nikolaik/python-nodejs:python3.12-nodejs22 Link
golang amd64, arm64 golang:1.21-bookworm Link

Pull (multi-arch manifest)

# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:9909597-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-9909597-python \
  ghcr.io/openhands/agent-server:9909597-python

All tags pushed for this build

ghcr.io/openhands/agent-server:9909597-golang-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-amd64
ghcr.io/openhands/agent-server:9909597-golang-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary-arm64
ghcr.io/openhands/agent-server:9909597-java-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-amd64
ghcr.io/openhands/agent-server:9909597-java-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary-arm64
ghcr.io/openhands/agent-server:9909597-python-amd64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-amd64
ghcr.io/openhands/agent-server:9909597-python-arm64
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary-arm64
ghcr.io/openhands/agent-server:9909597-golang
ghcr.io/openhands/agent-server:9909597-java
ghcr.io/openhands/agent-server:9909597-python

About Multi-Architecture Support

  • Each variant tag (e.g., 9909597-python) is a multi-arch manifest supporting both amd64 and arm64
  • Docker automatically pulls the correct architecture for your platform
  • Individual architecture tags (e.g., 9909597-python-amd64) are also available if needed

…atus helpers

- Add ObservationStatus enum and result_status property
- Add standardized output and error fields with default to_llm_content
- Keep subclasses backward compatible by using bool(error) in has_error

Co-authored-by: openhands <[email protected]>
…mpat

- FinishObservation: rename message->output, add message alias; rely on base to_llm_content/visualize
- ThinkObservation: rename content->output, add content alias; rely on base defaults
- MCPToolObservation: adopt base error semantics while preserving is_error; unify has_error and formatting
- TaskTrackerObservation: rename content->output; use base to_llm_content; keep rich visualize
- FileEditorObservation: standardize error formatting prefix as 'Error: '

This aligns all observation classes with the unified Observation base (output: str, error: str|None, has_error, result_status) and removes duplicated logic where possible.

Co-authored-by: openhands <[email protected]>
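
One way to rename a field while keeping the old name readable, as described for FinishObservation above, is a read-only property. The thread does not show how the alias was actually implemented, so the following is only a sketch of the idea using a plain dataclass.

```python
from dataclasses import dataclass


@dataclass
class FinishObservation:
    # New standardized field name.
    output: str = ""

    @property
    def message(self) -> str:
        # Read-only alias for callers that still use the old field name.
        return self.output
```

Existing code that reads `observation.message` keeps working, while new code can rely on `observation.output`.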
github-actions bot commented Oct 27, 2025

Coverage

Coverage Report

File Stmts Miss Cover Missing
openhands-sdk/openhands/sdk/mcp
   definition.py 46 26 43% 47, 61–62, 65–69, 75, 79, 88, 90–92, 94–96, 98–106
   tool.py 88 54 38% 39–40, 50–51, 55–58, 62, 65, 68–71, 83, 107–109, 111, 114–115, 126, 144–145, 148–152, 154–156, 162, 183, 185–186, 190–192, 200–201, 209, 219–221, 226, 233–234, 236, 261–262, 266–268
openhands-sdk/openhands/sdk/tool
   schema.py 124 39 68% 24–31, 37–39, 121, 134–135, 137–138, 140–143, 146, 150–152, 154–155, 157, 164, 250, 264, 266–268, 270–273, 275–276
openhands-sdk/openhands/sdk/tool/builtins
   finish.py 27 7 74% 27–30, 43, 66, 91
   think.py 33 13 60% 29, 32–33, 36, 38–42, 44, 57, 78, 103
openhands-tools/openhands/tools/browser_use
   definition.py 116 19 83% 37, 40–41, 44–46, 50–59, 61–62, 64
   impl.py 177 118 33% 25, 39, 43–46, 48, 58–61, 66, 68–70, 72–73, 80–82, 84–88, 102, 114, 154–155, 166–167, 181, 187, 201–202, 204–213, 216–225, 227–228, 230–234, 238, 240–241, 246–247, 251–252, 257–258, 262–263, 267–268, 272, 274–275, 277–280, 283–284, 288, 290, 292, 297–298, 302–303, 307–308, 313–314, 320–324, 328–333, 337–340, 342–344, 347, 351–354
openhands-tools/openhands/tools/delegate
   definition.py 19 4 78% 90, 93, 99, 102
   impl.py 97 81 16% 29, 31–32, 41–42, 46, 52–53, 56–59, 61, 80–81, 87–88, 98–102, 104, 106, 112, 119–120, 122–124, 129–131, 148–149, 156–158, 167, 169–171, 173, 175–178, 181–184, 186–187, 191–194, 197–199, 204–205, 208–209, 212, 214–218, 220, 223–225, 227–228, 231, 233, 238–240
openhands-tools/openhands/tools/execute_bash
   definition.py 103 64 37% 54, 57, 60–61, 63, 66–68, 70–72, 74–76, 78, 102, 106, 109–110, 113, 115–122, 124, 129, 131–133, 136, 138, 140–142, 144, 148–149, 152–154, 156–157, 160–163, 167–169, 174, 178–183, 185–186, 188, 253
   impl.py 73 25 65% 59, 62, 70–71, 98–100, 102–103, 109, 111, 115, 131, 134, 137–138, 143–144, 147–148, 150, 160, 176–177, 179
openhands-tools/openhands/tools/execute_bash/terminal
   terminal_session.py 182 63 65% 90, 96, 100–102, 124–125, 152, 167–168, 202–204, 209, 212–213, 217, 223, 237–239, 244, 247–248, 252, 258, 277, 279, 282, 284, 300, 315, 321, 330, 333, 367, 371, 374, 377–378, 384–385, 391, 397–398, 404–405, 464–466, 470, 475, 480–481, 485–486, 489–492, 498–499, 502
openhands-tools/openhands/tools/file_editor
   definition.py 61 10 83% 101–102, 105, 127, 130, 133, 140, 142, 144, 146
   editor.py 238 170 28% 88, 105, 108, 118–124, 130–138, 140, 158–159, 179–180, 184, 188–189, 198, 202–205, 213–214, 218–220, 226, 229, 234, 237, 240–241, 244, 247–248, 252, 256, 271–273, 282, 285, 291, 296–297, 305–309, 311, 313, 317–318, 322–323, 331–332, 334–337, 339, 346–347, 353–355, 363–367, 371, 373–374, 381, 384, 389–390, 392, 416–417, 439–440, 442–443, 449, 452, 456–462, 465–466, 469–474, 477, 480–481, 485, 488–489, 492, 494–495, 501, 505, 524, 529–532, 534, 542, 549, 556, 567–570, 572, 574, 601–604, 613–614, 643–645, 647–656, 661–664, 677–678, 683, 688, 694, 700
   impl.py 33 14 57% 44–46, 68–69, 88–89, 91–93, 102–103, 106–107
openhands-tools/openhands/tools/glob
   definition.py 25 7 72% 80, 82–84, 87, 90, 97
   impl.py 92 77 16% 37–40, 55–56, 58–60, 62, 65, 69–70, 78–79, 81, 84–85, 90–91, 95–96, 101, 109, 111–113, 115–117, 119, 141, 151, 156–160, 162–163, 165, 167, 183–185, 189, 192, 195, 199–200, 202–204, 207–208, 210, 212, 214, 246–247, 250, 253, 255, 258–259, 262–266, 268, 270–271, 273, 275–276, 278
openhands-tools/openhands/tools/grep
   definition.py 27 7 74% 82, 84–86, 89, 92, 99
   impl.py 80 67 16% 35–38, 53, 55–58, 67, 70–73, 82–83, 85, 87, 89–91, 93–95, 97, 115–116, 119, 124–126, 130–131, 135, 142, 152–153, 156, 161–165, 167–168, 170, 172, 180, 194, 207–208, 211, 216–219, 222–223, 225–227, 229–231, 233, 235, 243
openhands-tools/openhands/tools/task_tracker
   definition.py 132 86 34% 54, 57–59, 61–62, 65–66, 68, 84, 86–88, 90, 92–93, 96, 99–101, 103–104, 107–113, 115–117, 120, 122–125, 127, 130, 133–134, 136–137, 139–140, 142, 171, 173, 175–177, 184, 186–187, 192–193, 199, 211–212, 214–216, 220–221, 223–226, 228, 233, 239–243, 247, 251–252, 254–255, 257, 259–263
TOTAL 11766 5474 53%

openhands-agent and others added 4 commits October 27, 2025 22:58
- Fix remaining assertion in test_registry to use .output
- Ensure subclass output fields provide defaults to satisfy pyright
- Adjust ExecuteBashObservation output default

Co-authored-by: openhands <[email protected]>
… base to_llm_content; update tests

Co-authored-by: openhands <[email protected]>
@simonrosenberg
Collaborator Author

@OpenHands Let's add a small modification:

    def _format_error(self) -> TextContent:
        return TextContent(text=f"Tool Execution Error: {self.error}")

Also, let's be more consistent in how observations subclass the base Observation:

  • Observations should not override to_llm_content: the base to_llm_content is flexible enough that any error message or any content can be sent back to the LLM.
  • When there is an error, output must not be populated in the Observation.
  • In delegate/definition.py, return super().to_llm_content is useless.
  • For delegation, output=f"Unsupported command: {action.command}" should be an error, not an output. Same for all other errors.
  • Remove 'command' from the observation. Simply add it to the message if needed. This PR should not change much of what is sent to the LLM via to_llm_content.

openhands-ai bot commented Oct 28, 2025

I'm on it! simonrosenberg can track my progress at all-hands.dev

- Update _format_error to use 'Tool Execution Error:' prefix
- Remove command field from DelegateObservation and TaskTrackerObservation
- Remove unnecessary to_llm_content override in DelegateObservation
- Update delegate/impl.py to use error field for errors (not output)
- Prefix success messages with command name (spawn:/delegate:/plan:/view:)
- Update TaskTrackerExecutor to handle errors via error field
- Update tests to match new observation structure
- Ensure output is not populated when error is present

These changes make observations more consistent by:
1. Eliminating the redundant command field
2. Ensuring errors and outputs are mutually exclusive
3. Relying on base class to_llm_content when possible
4. Keeping command context in the message text when needed

Co-authored-by: openhands <[email protected]>
openhands-ai bot commented Oct 28, 2025

Summary

I've successfully improved observation consistency across the codebase as requested. The changes have been committed and pushed to PR #929 (openhands/standardize-observation-base).

Changes Made

✅ All requested modifications completed:

  1. Updated error format in _format_error():

    • Changed from "Error: {self.error}" to "Tool Execution Error: {self.error}"
  2. Removed unnecessary to_llm_content overrides:

    • Removed useless super().to_llm_content call from DelegateObservation
    • Left MCPToolObservation's override intact (necessary for structured content)
  3. Ensured errors and outputs are mutually exclusive:

    • When error is set, output is not populated (and vice versa)
    • All error cases now use the error field
    • All success cases now use the output field
  4. Fixed delegation error handling:

    • Moved all error messages from output to error field
    • Unsupported commands, spawn errors, and delegate errors all use error field
  5. Removed command field from observations:

    • Removed from DelegateObservation
    • Removed from TaskTrackerObservation
    • Added command context as message prefix (e.g., "spawn:", "delegate:", "plan:", "view:")
  6. Minimized LLM content changes:

    • Only removed the unnecessary override
    • Base class to_llm_content handles most cases
    • Custom implementations remain where needed (structured content)
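
The rules summarized above (errors and outputs are mutually exclusive, errors get the "Tool Execution Error:" prefix, and the command name moves into the message text) can be sketched as follows. The `run` function and its success message are hypothetical, and the `__post_init__` check is an illustrative way to enforce the rule rather than something the PR is known to do.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DelegateObservation:
    output: str = ""
    error: Optional[str] = None

    def __post_init__(self) -> None:
        # Illustrative guard: output must stay empty when error is set.
        if self.error and self.output:
            raise ValueError("output must be empty when error is set")

    def to_llm_content(self) -> str:
        # Simplified to a plain string for this sketch.
        if self.error:
            return f"Tool Execution Error: {self.error}"
        return self.output


def run(command: str) -> DelegateObservation:
    # Hypothetical executor: success text carries the command as a prefix,
    # failures go to the error field only.
    if command == "spawn":
        return DelegateObservation(output="spawn: created sub-agent")
    return DelegateObservation(error=f"Unsupported command: {command}")
```

Here `run("spawn")` produces only output, and any other command produces only an error, so the base formatting decides what the LLM sees without a per-tool override.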

Files Modified

  • openhands-sdk/openhands/sdk/tool/schema.py - Updated error format
  • openhands-tools/openhands/tools/delegate/definition.py - Removed command field and unnecessary override
  • openhands-tools/openhands/tools/delegate/impl.py - Fixed error handling, added command prefixes
  • openhands-tools/openhands/tools/task_tracker/definition.py - Removed command field, updated executor
  • tests/tools/delegation/test_delegation.py - Updated tests to match new structure

Test Results

✅ All 77 SDK tool tests pass
✅ All 189+ tool tests pass
✅ All pre-commit checks pass (format, lint, type check)
✅ Net reduction of 21 lines of code

The code is now more consistent, with clear separation between errors and outputs, and observations that follow a standardized pattern.

openhands-agent and others added 9 commits October 28, 2025 11:20
- Remove custom to_llm_content override from MCPToolObservation
- Rename convrted_content to converted_content (typo fix)
- Populate error field with full content when MCP result has error
- Populate output field with formatted content for successful results
- Update tests to match new behavior (single TextContent vs list)
- Error messages now include: header + error notice + content
- Success messages now include: header + content

This makes MCPToolObservation consistent with other observations by
relying on the base class to_llm_content and properly separating
error vs output fields.

Co-authored-by: openhands <[email protected]>
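
As a rough picture of the MCP change in the commit above, the sketch below routes a tool result into either the error field or the output field. The function name `from_result`, its parameters, and the header wording are hypothetical; only the "header + error notice + content" and "header + content" shapes come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MCPToolObservation:
    tool_name: str
    output: str = ""
    error: Optional[str] = None


def from_result(tool_name: str, texts: list[str], is_error: bool) -> MCPToolObservation:
    # Hypothetical header wording, for illustration only.
    header = f"[Tool '{tool_name}' executed.]"
    body = "\n".join(texts)
    if is_error:
        # Error case: header + error notice + content, stored in error only.
        notice = "[An error occurred during execution.]"
        return MCPToolObservation(tool_name, error=f"{header}\n{notice}\n{body}")
    # Success case: header + content, stored in output only.
    return MCPToolObservation(tool_name, output=f"{header}\n{body}")
```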
- Remove output field from multiple commands error case
- Move full error message to error field instead
- Update test to check error field instead of output field
- Fix line length by extracting commands_list variable

This makes ExecuteBashObservation consistent with the pattern that
errors should use the error field, not the output field.

Co-authored-by: openhands <[email protected]>
The FileEditorObservation.to_llm_content was simply returning error or output,
which is exactly what the base Observation class now provides by default.
Removing this override allows the observation to use the standardized base
implementation.

Co-authored-by: openhands <[email protected]>
…interface

- Restored 'command' field to TaskTrackerObservation
- Changed 'output' back to 'content' field
- Restored original visualize method that uses command field to determine styling
- Added custom to_llm_content implementation
- Updated executor to populate command field in observations

This preserves the original interface while maintaining consistency with the
base observation standardization changes.

Co-authored-by: openhands <[email protected]>
…o_llm_content

- Added command: str | None field to base Observation class
- Updated to_llm_content to prepend 'Command: {command}' when command is present
- Removed to_llm_content overrides from simple observations (Glob, Grep, TaskTracker)
- Updated executors to populate output field instead of defining custom to_llm_content
- Kept complex to_llm_content overrides (Browser, ExecuteBash) that handle images and metadata

Co-authored-by: openhands <[email protected]>
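
A minimal sketch of the command-aware default described in the commit above, assuming the content is a plain string and that the command line and the body are joined with a newline (the separator is not stated in the thread):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Observation:
    output: str = ""
    error: Optional[str] = None
    command: Optional[str] = None

    def to_llm_content(self) -> str:
        parts = []
        if self.command:
            # Prepend the command only when one is present.
            parts.append(f"Command: {self.command}")
        if self.error:
            parts.append(f"Tool Execution Error: {self.error}")
        else:
            parts.append(self.output)
        return "\n".join(parts)
```

Simple observations such as Glob or Grep would then only need their executors to fill in `output` (and optionally `command`), instead of each defining a custom `to_llm_content`.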
- Remove 'output', 'error', 'command' field redeclarations from observation subclasses where possible
- Update MCPToolObservation to use output field instead of _content field
- Store text content in output, images in separate images field
- Update MCP tests to match new structure
- Keep FileEditorObservation.command field (required for internal logic)
- When error occurs, populate only error field, not output

Co-authored-by: openhands <[email protected]>
- Update ThinkExecutor to set output='Your thought has been logged.'
- Update FinishExecutor to set output=action.message
- Fix test_view_batch_atomicity.py to use output instead of content
- Fix test_mcp_tool_kind_field.py import issues
- Add Sequence import to glob and grep definitions

These changes ensure observations consistently use the base output field
rather than custom fields, aligning with the standardization effort.

Co-authored-by: openhands <[email protected]>
- Changed base Observation._format_error to use 'Tool Execution Error:' prefix
- Ensured observations don't populate output when there is an error
- Removed command parameter from ExecuteBashObservation instantiations
- Updated ExecuteBashObservation to override output field as str
- Updated ExecuteBashObservation.to_llm_content to return error without output
- Fixed terminal_session.py to only populate error field in error cases
- Removed 'ERROR:' prefix from error messages (base class adds prefix)
- Updated test_bash_session.py to check error field instead of output
- Updated all observation tests to work with new error handling

Co-authored-by: openhands <[email protected]>
openhands-agent and others added 3 commits October 28, 2025 14:36
- Removed custom content field, use base class output field instead
- Converted content items are now stored directly in output field
- Error messages include all content combined into error string
- Removed unnecessary complexity from from_call_tool_result
- Updated tests to match new structure (header + content items)

Co-authored-by: openhands <[email protected]>
@simonrosenberg
Collaborator Author

@OpenHands
I updated the schema of the tool Observation: openhands-sdk/openhands/sdk/tool/schema.py
I added a property 'command: str'. This property can be narrowed in some subtypes to Literal[ …]
I also updated to_llm_content to serialize command, error and output.
Your job is to edit all subclasses of Observation and make sure they all adhere to that base behavior.
Some subclasses had a "command" field on the main branch. Make sure to add it back as a command property. It's now a property because we need type narrowing: some classes had command: str while others had command: Literal[…]

MCP
openhands-sdk/openhands/sdk/mcp/definition.py
openhands-sdk/openhands/sdk/mcp/tool.py
There is no need for initial_message += "\n[An error occurred during execution.]",
because _format_error already adds an error message header.

For openhands-sdk/openhands/sdk/tool/builtins/finish.py
And openhands-sdk/openhands/sdk/tool/builtins/think.py
Do NOT populate any field in FinishObservation. The output must be an empty list.

For openhands-tools/openhands/tools/delegate/definition.py
Add command: Literal["spawn", "delegate"]

openhands-tools/openhands/tools/delegate/impl.py
When creating DelegateObservation, put back the command=action.command

For openhands-tools/openhands/tools/execute_bash/terminal/terminal_session.py
Reinsert command in every ExecuteBashObservation instantiation:

    return ExecuteBashObservation(
        output=command_output,
        metadata=metadata,
    )

For file Editor openhands-tools/openhands/tools/file_editor/definition.py
Add command as Literal and make sure implementation works

For task tracker
openhands-tools/openhands/tools/task_tracker/definition.py
Make sure command works well

Make sure all tests and pre-commit hooks are passing

openhands-ai bot commented Oct 31, 2025

I'm on it! simonrosenberg can track my progress at all-hands.dev

…fields

- Updated all Observation subclasses to use standardized output and error fields
- Added command property to all observations with type narrowing where needed
- MCP observations: removed redundant to_llm_content overrides, removed error header duplication
- FinishObservation/ThinkObservation: empty output list as specified
- DelegateObservation: added cmd field and command property with Literal type
- ExecuteBashObservation: added cmd field and command property, use raw_output for command output
- FileEditorObservation: added cmd field with Literal type narrowing
- TaskTrackerObservation: added cmd field with Literal type narrowing
- BrowserObservation: wrapped output strings in TextContent
- GrepObservation/GlobObservation: wrapped output strings in TextContent
- Updated all tests to use new field names (cmd, raw_output) and TextContent wrapping

Co-authored-by: openhands <[email protected]>
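
The "cmd field plus command property" pattern from the commit above can be sketched like this. It is an illustration with plain dataclasses: the base property returning an empty string and the default value of `cmd` are assumptions, not details confirmed in the thread.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass
class Observation:
    @property
    def command(self) -> str:
        # Assumed base default: no command associated with the observation.
        return ""


@dataclass
class DelegateObservation(Observation):
    # Backing field with the narrowed type; the default is an assumption.
    cmd: Literal["spawn", "delegate"] = "spawn"

    @property
    def command(self) -> Literal["spawn", "delegate"]:
        # Narrower return type than the base property, still a valid str.
        return self.cmd
```

Overriding a property with a narrower return type is accepted by type checkers, whereas redeclaring a mutable field with a narrower type is generally flagged, which is presumably why the request asks for a property.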
The base Observation class now always has content as a list of TextContent/ImageContent.
Updated test_real_mcp_tool_execution_without_kind_field to properly extract text from
the content list instead of treating it as a string.

Co-authored-by: openhands <[email protected]>
openhands-ai bot commented Nov 5, 2025

Summary

I've successfully fixed the failing actions on PR #929 (openhands/standardize-observation-base).

What Was Fixed

Root Cause: The test test_real_mcp_tool_execution_without_kind_field in tests/sdk/mcp/test_mcp_tool_kind_field.py wasn't updated to handle the new Observation base class structure where content is always a list of TextContent | ImageContent objects.

Change Made: Updated the test to consistently treat observation.content as a list and extract text from it properly, removing the conditional logic that tried to handle it as a string in error cases.
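
For readers unfamiliar with the list-shaped content, the kind of extraction the updated test performs might look like the sketch below. The two content classes are simplified stand-ins (the `image_urls` field name is an assumption), and `extract_text` is a hypothetical helper, not a function from the SDK or the test file.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class TextContent:
    # Stand-in for the SDK's text content type.
    text: str


@dataclass
class ImageContent:
    # Stand-in for the SDK's image content type.
    image_urls: list[str]


def extract_text(content: list[Union[TextContent, ImageContent]]) -> str:
    # Always treat content as a list and keep only the text items.
    return "\n".join(
        item.text for item in content if isinstance(item, TextContent)
    )
```

Treating the content uniformly as a list removes the need for a separate string branch in error cases.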

Verification

Pre-commit checks: All passing (Format YAML, Ruff format, Ruff lint, PEP8 style check, Type check with pyright)
MCP tests: All 4 tests in test_mcp_tool_kind_field.py passing
Bash observation tests: Tests mentioned in PR comments passing (test_cmd_output_observation_properties, test_bash_reset_basic, etc.)
Comprehensive test suite: 1312+ tests passing across tools and SDK

Changes Committed and Pushed

  • Commit: e62771b8 - "fix: update test to handle observation.content as list"
  • Status: Already pushed to origin/openhands/standardize-observation-base

The PR should now pass all CI checks. The failing actions were caused by this single test not being adapted to the refactored Observation base class structure.

@simonrosenberg added the integration-test label (Runs the integration tests and comments the results) and removed the integration-test, test-examples (Run all applicable "examples/" files. Expensive operation.), and review-this (This label triggers a PR review by OpenHands) labels on Nov 5, 2025
github-actions bot commented Nov 5, 2025

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions bot commented Nov 5, 2025

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.35
Models Tested: 3
Timestamp: 2025-11-05 19:14:26 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

📊 Summary

Model Success Rate Tests Passed Total Tests Cost
litellm_proxy_deepseek_deepseek_chat 100.0% 7/7 7 $0.02
litellm_proxy_gpt_5_mini_2025_08_07 100.0% 7/7 7 $0.04
litellm_proxy_claude_sonnet_4_5_20250929 100.0% 7/7 7 $0.29

📋 Detailed Results

litellm_proxy_deepseek_deepseek_chat

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.02
  • Run Suffix: litellm_proxy_deepseek_deepseek_chat_4293eb1_deepseek_run_N7_20251105_191143

litellm_proxy_gpt_5_mini_2025_08_07

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.04
  • Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_4293eb1_gpt5_mini_run_N7_20251105_191141

litellm_proxy_claude_sonnet_4_5_20250929

  • Success Rate: 100.0% (7/7)
  • Total Cost: $0.29
  • Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_4293eb1_sonnet_run_N7_20251105_191144

@xingyaoww added the test-examples label (Run all applicable "examples/" files. Expensive operation.) on Nov 5, 2025
github-actions bot commented Nov 5, 2025

🔄 Running Examples with openhands/claude-haiku-4-5-20251001

Last updated: 2025-11-05 20:23:18 UTC

Example Status Duration Cost
01_standalone_sdk/01_hello_world.py ✅ PASS 98s $0.03
01_standalone_sdk/02_custom_tools.py ✅ PASS 64s $0.02
01_standalone_sdk/03_activate_skill.py ✅ PASS 58s $0.01
01_standalone_sdk/05_use_llm_registry.py ✅ PASS 9s $0.01
01_standalone_sdk/07_mcp_integration.py ✅ PASS 55s $0.03
01_standalone_sdk/09_pause_example.py ✅ PASS 13s $0.01
01_standalone_sdk/10_persistence.py ✅ PASS 110s $0.03
01_standalone_sdk/11_async.py ✅ PASS 61s $0.02
01_standalone_sdk/12_custom_secrets.py ✅ PASS 18s $0.01
01_standalone_sdk/13_get_llm_metrics.py ✅ PASS 34s $0.02
01_standalone_sdk/14_context_condenser.py ✅ PASS 696s $0.57
01_standalone_sdk/17_image_input.py ✅ PASS 20s $0.02
01_standalone_sdk/18_send_message_while_processing.py ✅ PASS 21s $0.02
01_standalone_sdk/19_llm_routing.py ✅ PASS 17s $0.01
01_standalone_sdk/20_stuck_detector.py ✅ PASS 18s $0.01
01_standalone_sdk/21_generate_extraneous_conversation_costs.py ✅ PASS 14s $0.01
01_standalone_sdk/22_anthropic_thinking.py ✅ PASS 74s $0.02
01_standalone_sdk/23_responses_reasoning.py ✅ PASS 47s $0.01
01_standalone_sdk/24_planning_agent_workflow.py ✅ PASS 419s $0.38
02_remote_agent_server/01_convo_with_local_agent_server.py ✅ PASS 131s $0.07
02_remote_agent_server/02_convo_with_docker_sandboxed_server.py ✅ PASS 220s $0.04
02_remote_agent_server/03_browser_use_with_docker_sandboxed_server.py ✅ PASS 82s $0.04

✅ All tests passed!

Total: 22 | Passed: 22 | Failed: 0 | Total Cost: $1.38

openhands-ai bot commented Nov 5, 2025

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • Run tests

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #929 at branch `openhands/standardize-observation-base`

Feel free to include any additional details that might help me get this PR into a better state.

@xingyaoww xingyaoww left a comment

Thanks a lot!! This really helps clean up the codebase a bunch

@xingyaoww xingyaoww enabled auto-merge (squash) November 5, 2025 20:10
@xingyaoww xingyaoww merged commit 41d8d80 into main Nov 5, 2025
16 checks passed
@xingyaoww xingyaoww deleted the openhands/standardize-observation-base branch November 5, 2025 20:11
Labels

integration-test Runs the integration tests and comments the results test-examples Run all applicable "examples/" files. Expensive operation.

Development

Successfully merging this pull request may close these issues.

refactor: standardize Observation class

5 participants