
GPT-5 codex ends conversation early #78

@juanmichelini

Description


Observed on: 57a49ef

Happens with GPT-5 codex; does not happen with other LLMs at the same commit.

Steps to reproduce:

Run

SKIP_BUILD=0 uv run swebench-infer .llm_config/gpt-5-codex.json --num-workers 10 --n-limit 10 --max-iterations 500 --note 20251105-codex-error --max-attempts 1 --critic finish_with_patch --max-retries 3

Notice that conversations end after one message and that output.jsonl contains an empty patch.
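A quick way to confirm the empty patches is to scan output.jsonl. A minimal sketch, assuming each line is a JSON object with a `patch`-like field (the exact field name is an assumption; check your output.jsonl schema):

```python
import json

def count_empty_patches(path, patch_key="patch"):
    """Count result rows whose patch field is missing or empty.

    `patch_key` is an assumption -- adjust to the actual output.jsonl schema.
    """
    total = empty = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            total += 1
            row = json.loads(line)
            if not row.get(patch_key, "").strip():
                empty += 1
    return total, empty

# Usage: total, empty = count_empty_patches("output.jsonl")
```

In the failing runs described above, every row would count as empty.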

Output logs:

╭───────────────────────────────────────────────────────────────────────────────────────────── Message from Agent ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
Phase 1 – READING
│ │
│ 1. Rewording the Problem
│ The scikit-learn Pipeline class supports slicing thanks to a recent feature addition, but it still lacks a __len__ method. This omission causes code such as len(pipe)—where pipe is a Pipeline
│ instance—to raise an error. We need to implement __len__ for Pipeline, thereby making it consistent with other iterable-like behaviors (indexing and slicing). │
│ │
│ 2. Conventions and Details from the Description
│ - Example usage shows constructing a pipeline via Pipeline([('anova', anova_filter), ('svc', clf)]). │
│ - Attempting len(pipe) on such an instance currently fails, even though slicing like pipe[:] is supported. │
│ - Environment details from the original report confirm the issue appears in scikit-learn 0.21.dev0 with Python 3.6.7 on macOS. │
│ │
│ 3. Clear Explanation of the Issue
│ The Pipeline object behaves like a sequence (supports iteration/indexing/slicing) but does not implement __len__. Python’s len() function depends on __len__; without it, calling len(pipe)
│ raises TypeError. Implementing __len__ should return the number of steps in the pipeline. │
│ │
│ 4. Steps to Reproduce the Problem
│ - Import scikit-learn components (Pipeline, SelectKBest, SVC, etc.). │
│ - Build a pipeline with one or more steps. │
│ - Call len on the pipeline (len(pipe)). │
│ - Observe TypeError: object of type 'Pipeline' has no len(). │
│ │
│ 5. Best Practices for Testing/Fixing
│ - Ensure the new __len__ method coexists with existing pipeline behaviors (iteration, indexing, slicing). │
│ - Return the count of steps (len(self.steps) or equivalent). │
│ - Handle edge cases—pipelines with zero steps should return 0. │
│ - Keep the implementation minimal and consistent with scikit-learn’s coding style (no redundant comments, clear concise code). │
│ │
│ Reasoning: │
│ │
│ │
╰────────────────────────────────────────────────────────────────────────── Tokens: ↑ input 0 • cache hit N/A • ↓ output 0 • $ 0.00 ───────────────────────────────────────────────────────────────────────────╯
[11/06/25 20:05:08] INFO run() triggered successfully: <Response [200 OK]> remote_conversation.py:589
[DOCKER] {"asctime": "2025-11-06 20:05:08,606", "levelname": "INFO", "name": "uvicorn.access", "client_addr": null, "request_line": null, "status_code": null}

╭──────────────────────────────────────────────────────────────────────────────── UNKNOWN Event: ConversationStateUpdateEvent ─────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Unknown event type: ConversationStateUpdateEvent │
│ {'kind': 'ConversationStateUpdateEvent', 'id': '9107f0a8-83c5-416f-8732-c0248682438c', 'timestamp': '2025-11-06T20:05:08.606179', 'source': 'environment', 'key': 'full_state', 'value': {'id': │
│ '1a6f38b8-3ee1-4db4-9ef8-7209cb12daa8', 'agent': {'kind': 'Agent', 'llm': {'model': 'litellm_proxy/openai/gpt-5-codex', 'api_key': '**********', 'base_url': 'https://llm-proxy.eval.all-hands.dev', │
│ 'openrouter_site_url': 'https://docs.all-hands.dev/', 'openrouter_app_name': 'OpenHands', 'num_retries': 5, 'retry_multiplier': 8.0, 'retry_min_wait': 8, 'retry_max_wait': 64, 'max_message_chars': 30000, │
│ 'temperature': 1.0, 'max_input_tokens': 272000, 'max_output_tokens': 128000, 'drop_params': True, 'modify_params': True, 'disable_stop_word': False, 'caching_prompt': True, 'log_completions': False, │
│ 'log_completions_folder': 'logs/completions', 'native_tool_calling': True, 'reasoning_effort': 'high', 'enable_encrypted_reasoning': False, 'extended_thinking_budget': 200000, 'usage_id': 'default', │
│ 'litellm_extra_body': {}, 'OVERRIDE_ON_SERIALIZE': ['api_key', 'aws_access_key_id', 'aws_secret_access_key']}, 'tools': [{'name': 'terminal', 'params': {}}, {'name': 'file_editor', 'params': {}}, {'name': │
│ 'task_tracker', 'params': {}}], 'mcp_config': {}, 'system_prompt_filename': 'system_prompt.j2', 'system_prompt_kwargs': {'cli_mode': True}}, 'workspace': {'kind': 'LocalWorkspace', 'working_dir': │
│ '/workspace'}, 'persistence_dir': 'workspace/conversations/1a6f38b83ee14db49ef87209cb12daa8', 'max_iterations': 500, 'stuck_detection': True, 'execution_status': 'finished', 'confirmation_policy': │
│ {'kind': 'NeverConfirm'}, 'activated_knowledge_skills': [], 'stats': {'usage_to_metrics': {'default': {'model_name': 'litellm_proxy/openai/gpt-5-codex', 'accumulated_cost': 0.014467500000000001, │
│ 'accumulated_token_usage': {'model': 'litellm_proxy/openai/gpt-5-codex', 'prompt_tokens': 6342, 'completion_tokens': 654, 'cache_read_tokens': 0, 'cache_write_tokens': 0, 'reasoning_tokens': 192, │
│ 'context_window': 0, 'per_turn_token': 6996, 'response_id': ''}, 'costs': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'cost': 0.014467500000000001, 'timestamp': 1762459508.6032553}], │
│ 'response_latencies': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'latency': 16.847777605056763, 'response_id': │
│ 'resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOmxpdGVsbG1fcHJveHk7bW9kZWxfaWQ6Tm9uZTtyZXNwb25zZV9pZDpyZXNwX2JHbDBaV3hzYlRwamRYTjBiMjFmYkd4dFgzQnliM1pwWkdWeU9tOXdaVzVoYVR0dGIyUmxiRjlwWkRveVptTXdaR1ZoWkdGallUWm │
│ pabVkxTVdFeVpHVXlZV0ZpWTJRek1XSmhNalkwWWprMVlUZzFOelF6T1dVd01EUmxZakEyTVdSak5XRXpZV1l6Wmpoak8zSmxjM0J2Ym5ObFgybGtPbkpsYzNCZk1HVm1aVFEzTldSbE9HRmtNbU5qTWpBeE5qa3dZMlptTmpObFptVTRPREU1TW1FeU1qTmhNVFF4TUdSbE │
│ 56UTJZbVk9'}], 'token_usages': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'prompt_tokens': 6342, 'completion_tokens': 654, 'cache_read_tokens': 0, 'cache_write_tokens': 0, 'reasoning_tokens': 192, │
│ 'context_window': 0, 'per_turn_token': 6996, 'response_id': │
│ 'resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOmxpdGVsbG1fcHJveHk7bW9kZWxfaWQ6Tm9uZTtyZXNwb25zZV9pZDpyZXNwX2JHbDBaV3hzYlRwamRYTjBiMjFmYkd4dFgzQnliM1pwWkdWeU9tOXdaVzVoYVR0dGIyUmxiRjlwWkRveVptTXdaR1ZoWkdGallUWm │
│ pabVkxTVdFeVpHVXlZV0ZpWTJRek1XSmhNalkwWWprMVlUZzFOelF6T1dVd01EUmxZakEyTVdSak5XRXpZV1l6Wmpoak8zSmxjM0J2Ym5ObFgybGtPbkpsYzNCZk1HVm1aVFEzTldSbE9HRmtNbU5qTWpBeE5qa3dZMlptTmpObFptVTRPREU1TW1FeU1qTmhNVFF4TUdSbE │
│ 56UTJZbVk9'}]}}}, 'secret_registry': {'secret_sources': {}}}} │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────── (environment) ────────────────────────────────────────────────────────────────────────────────────────────────╯
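For context, the fix the agent had started reasoning about before the conversation ended (the missing `Pipeline.__len__`) is a one-line method returning the step count. A minimal sketch on a toy stand-in class, not the actual scikit-learn source:

```python
class Pipeline:
    """Toy stand-in for sklearn.pipeline.Pipeline, illustrating the fix."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples

    def __getitem__(self, ind):
        # Existing behavior: slicing returns a sub-pipeline, indexing an estimator.
        if isinstance(ind, slice):
            return Pipeline(self.steps[ind])
        return self.steps[ind][1]

    def __len__(self):
        # The fix: len(pipe) returns the number of steps (0 for an empty pipeline).
        return len(self.steps)
```

Without `__len__`, `len(pipe)` raises `TypeError: object of type 'Pipeline' has no len()` even though slicing works, which is the inconsistency the agent's analysis describes.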
