
GPT-5 codex ends conversation early #78

@juanmichelini

Description


Observed on: 57a49ef

Happens with GPT-5 codex; does not happen with other LLMs at the same commit.

Steps to reproduce:

Run

SKIP_BUILD=0 uv run swebench-infer .llm_config/gpt-5-codex.json --num-workers 10 --n-limit 10 --max-iterations 500 --note 20251105-codex-error --max-attempts 1 --critic finish_with_patch --max-retries 3

Notice that conversations end after one message and that output.jsonl contains an empty patch.
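A quick way to confirm the empty patches is to scan output.jsonl. A minimal sketch, assuming each line is a JSON object with a `patch`-like field (the exact field name is an assumption; check your output.jsonl schema):

```python
import json

def count_empty_patches(path, patch_key="patch"):
    """Count result rows whose patch field is missing or empty.

    `patch_key` is an assumption -- adjust to the actual output.jsonl schema.
    """
    total = empty = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            total += 1
            row = json.loads(line)
            if not row.get(patch_key, "").strip():
                empty += 1
    return total, empty

# Usage: total, empty = count_empty_patches("output.jsonl")
```

In the failing runs described above, every row would count as empty.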

Output logs:

╭───────────────────────────────────────────────────────────────────────────────────────────── Message from Agent ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
Phase 1 – READING
│ │
│ 1. Rewording the Problem
│ The scikit-learn Pipeline class supports slicing thanks to a recent feature addition, but it still lacks a __len__ method. This omission causes code such as len(pipe)—where pipe is a Pipeline
│ instance—to raise an error. We need to implement __len__ for Pipeline, thereby making it consistent with other iterable-like behaviors (indexing and slicing). │
│ │
│ 2. Conventions and Details from the Description
│ - Example usage shows constructing a pipeline via Pipeline([('anova', anova_filter), ('svc', clf)]). │
│ - Attempting len(pipe) on such an instance currently fails, even though slicing like pipe[:] is supported. │
│ - Environment details from the original report confirm the issue appears in scikit-learn 0.21.dev0 with Python 3.6.7 on macOS. │
│ │
│ 3. Clear Explanation of the Issue
│ The Pipeline object behaves like a sequence (supports iteration/indexing/slicing) but does not implement __len__. Python’s len() function depends on __len__; without it, calling len(pipe)
│ raises TypeError. Implementing __len__ should return the number of steps in the pipeline. │
│ │
│ 4. Steps to Reproduce the Problem
│ - Import scikit-learn components (Pipeline, SelectKBest, SVC, etc.). │
│ - Build a pipeline with one or more steps. │
│ - Call len on the pipeline (len(pipe)). │
│ - Observe TypeError: object of type 'Pipeline' has no len(). │
│ │
│ 5. Best Practices for Testing/Fixing
│ - Ensure the new __len__ method coexists with existing pipeline behaviors (iteration, indexing, slicing). │
│ - Return the count of steps (len(self.steps) or equivalent). │
│ - Handle edge cases—pipelines with zero steps should return 0. │
│ - Keep the implementation minimal and consistent with scikit-learn’s coding style (no redundant comments, clear concise code). │
│ │
│ Reasoning: │
│ │
│ │
╰────────────────────────────────────────────────────────────────────────── Tokens: ↑ input 0 • cache hit N/A • ↓ output 0 • $ 0.00 ───────────────────────────────────────────────────────────────────────────╯
[11/06/25 20:05:08] INFO run() triggered successfully: <Response [200 OK]> remote_conversation.py:589
[DOCKER] {"asctime": "2025-11-06 20:05:08,606", "levelname": "INFO", "name": "uvicorn.access", "client_addr": null, "request_line": null, "status_code": null}

╭──────────────────────────────────────────────────────────────────────────────── UNKNOWN Event: ConversationStateUpdateEvent ─────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Unknown event type: ConversationStateUpdateEvent │
│ {'kind': 'ConversationStateUpdateEvent', 'id': '9107f0a8-83c5-416f-8732-c0248682438c', 'timestamp': '2025-11-06T20:05:08.606179', 'source': 'environment', 'key': 'full_state', 'value': {'id': │
│ '1a6f38b8-3ee1-4db4-9ef8-7209cb12daa8', 'agent': {'kind': 'Agent', 'llm': {'model': 'litellm_proxy/openai/gpt-5-codex', 'api_key': '**********', 'base_url': 'https://llm-proxy.eval.all-hands.dev', │
│ 'openrouter_site_url': 'https://docs.all-hands.dev/', 'openrouter_app_name': 'OpenHands', 'num_retries': 5, 'retry_multiplier': 8.0, 'retry_min_wait': 8, 'retry_max_wait': 64, 'max_message_chars': 30000, │
│ 'temperature': 1.0, 'max_input_tokens': 272000, 'max_output_tokens': 128000, 'drop_params': True, 'modify_params': True, 'disable_stop_word': False, 'caching_prompt': True, 'log_completions': False, │
│ 'log_completions_folder': 'logs/completions', 'native_tool_calling': True, 'reasoning_effort': 'high', 'enable_encrypted_reasoning': False, 'extended_thinking_budget': 200000, 'usage_id': 'default', │
│ 'litellm_extra_body': {}, 'OVERRIDE_ON_SERIALIZE': ['api_key', 'aws_access_key_id', 'aws_secret_access_key']}, 'tools': [{'name': 'terminal', 'params': {}}, {'name': 'file_editor', 'params': {}}, {'name': │
│ 'task_tracker', 'params': {}}], 'mcp_config': {}, 'system_prompt_filename': 'system_prompt.j2', 'system_prompt_kwargs': {'cli_mode': True}}, 'workspace': {'kind': 'LocalWorkspace', 'working_dir': │
│ '/workspace'}, 'persistence_dir': 'workspace/conversations/1a6f38b83ee14db49ef87209cb12daa8', 'max_iterations': 500, 'stuck_detection': True, 'execution_status': 'finished', 'confirmation_policy': │
│ {'kind': 'NeverConfirm'}, 'activated_knowledge_skills': [], 'stats': {'usage_to_metrics': {'default': {'model_name': 'litellm_proxy/openai/gpt-5-codex', 'accumulated_cost': 0.014467500000000001, │
│ 'accumulated_token_usage': {'model': 'litellm_proxy/openai/gpt-5-codex', 'prompt_tokens': 6342, 'completion_tokens': 654, 'cache_read_tokens': 0, 'cache_write_tokens': 0, 'reasoning_tokens': 192, │
│ 'context_window': 0, 'per_turn_token': 6996, 'response_id': ''}, 'costs': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'cost': 0.014467500000000001, 'timestamp': 1762459508.6032553}], │
│ 'response_latencies': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'latency': 16.847777605056763, 'response_id': │
│ 'resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOmxpdGVsbG1fcHJveHk7bW9kZWxfaWQ6Tm9uZTtyZXNwb25zZV9pZDpyZXNwX2JHbDBaV3hzYlRwamRYTjBiMjFmYkd4dFgzQnliM1pwWkdWeU9tOXdaVzVoYVR0dGIyUmxiRjlwWkRveVptTXdaR1ZoWkdGallUWm │
│ pabVkxTVdFeVpHVXlZV0ZpWTJRek1XSmhNalkwWWprMVlUZzFOelF6T1dVd01EUmxZakEyTVdSak5XRXpZV1l6Wmpoak8zSmxjM0J2Ym5ObFgybGtPbkpsYzNCZk1HVm1aVFEzTldSbE9HRmtNbU5qTWpBeE5qa3dZMlptTmpObFptVTRPREU1TW1FeU1qTmhNVFF4TUdSbE │
│ 56UTJZbVk9'}], 'token_usages': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'prompt_tokens': 6342, 'completion_tokens': 654, 'cache_read_tokens': 0, 'cache_write_tokens': 0, 'reasoning_tokens': 192, │
│ 'context_window': 0, 'per_turn_token': 6996, 'response_id': │
│ 'resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOmxpdGVsbG1fcHJveHk7bW9kZWxfaWQ6Tm9uZTtyZXNwb25zZV9pZDpyZXNwX2JHbDBaV3hzYlRwamRYTjBiMjFmYkd4dFgzQnliM1pwWkdWeU9tOXdaVzVoYVR0dGIyUmxiRjlwWkRveVptTXdaR1ZoWkdGallUWm │
│ pabVkxTVdFeVpHVXlZV0ZpWTJRek1XSmhNalkwWWprMVlUZzFOelF6T1dVd01EUmxZakEyTVdSak5XRXpZV1l6Wmpoak8zSmxjM0J2Ym5ObFgybGtPbkpsYzNCZk1HVm1aVFEzTldSbE9HRmtNbU5qTWpBeE5qa3dZMlptTmpObFptVTRPREU1TW1FeU1qTmhNVFF4TUdSbE │
│ 56UTJZbVk9'}]}}}, 'secret_registry': {'secret_sources': {}}}} │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────── (environment) ────────────────────────────────────────────────────────────────────────────────────────────────╯
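For context, the fix the agent had started reasoning about before the conversation ended (the missing `Pipeline.__len__`) is a one-line method returning the step count. A minimal sketch on a toy stand-in class, not the actual scikit-learn source:

```python
class Pipeline:
    """Toy stand-in for sklearn.pipeline.Pipeline, illustrating the fix."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) tuples

    def __getitem__(self, ind):
        # Existing behavior: slicing returns a sub-pipeline, indexing an estimator.
        if isinstance(ind, slice):
            return Pipeline(self.steps[ind])
        return self.steps[ind][1]

    def __len__(self):
        # The fix: len(pipe) returns the number of steps (0 for an empty pipeline).
        return len(self.steps)
```

Without `__len__`, `len(pipe)` raises `TypeError: object of type 'Pipeline' has no len()` even though slicing works, which is the inconsistency the agent's analysis describes.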
