Description
Observed on: 57a49ef
Happens with GPT-5 Codex; does not happen with other LLMs on the same commit.
Steps to reproduce:
Run
SKIP_BUILD=0 uv run swebench-infer .llm_config/gpt-5-codex.json --num-workers 10 --n-limit 10 --max-iterations 500 --note 20251105-codex-error --max-attempts 1 --critic finish_with_patch --max-retries 3
Notice that conversations end after one message and that output.jsonl contains an empty patch.
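To confirm the symptom across all ten instances, the output file can be scanned for empty patches. A minimal sketch; the field names (`instance_id`, `test_result.git_patch`) are assumptions about the swebench-infer output schema and may differ between versions:

```python
import json
import tempfile
from pathlib import Path

def find_empty_patches(path):
    """Return instance ids from a JSONL results file whose patch is empty.

    Assumes each line is a JSON object with 'instance_id' and a nested
    'test_result' -> 'git_patch' field (assumed schema, not verified).
    """
    empty = []
    for line in Path(path).read_text().splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        patch = record.get("test_result", {}).get("git_patch", "")
        if not patch.strip():
            empty.append(record.get("instance_id"))
    return empty

# Demo against a synthetic two-line output.jsonl.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as f:
    f.write(json.dumps({"instance_id": "a", "test_result": {"git_patch": ""}}) + "\n")
    f.write(json.dumps({"instance_id": "b", "test_result": {"git_patch": "diff --git a b"}}) + "\n")
    sample = f.name

print(find_empty_patches(sample))  # → ['a']
```

In the failing runs described above, every instance would be reported, since all conversations end after a single message without producing a patch.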
Output logs:
╭───────────────────────────────────────────────────────────────────────────────────────────── Message from Agent ─────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Phase 1 – READING │
│ │
│ 1. Rewording the Problem │
│ The scikit-learn Pipeline class supports slicing thanks to a recent feature addition, but it still lacks a __len__ method. This omission causes code such as len(pipe)—where pipe is a Pipeline │
│ instance—to raise an error. We need to implement __len__ for Pipeline, thereby making it consistent with other iterable-like behaviors (indexing and slicing). │
│ │
│ 2. Conventions and Details from the Description │
│ - Example usage shows constructing a pipeline via Pipeline([('anova', anova_filter), ('svc', clf)]). │
│ - Attempting len(pipe) on such an instance currently fails, even though slicing like pipe[:] is supported. │
│ - Environment details from the original report confirm the issue appears in scikit-learn 0.21.dev0 with Python 3.6.7 on macOS. │
│ │
│ 3. Clear Explanation of the Issue │
│ The Pipeline object behaves like a sequence (supports iteration/indexing/slicing) but does not implement __len__. Python’s len() function depends on __len__; without it, calling len(pipe) │
│ raises TypeError. Implementing __len__ should return the number of steps in the pipeline. │
│ │
│ 4. Steps to Reproduce the Problem │
│ - Import scikit-learn components (Pipeline, SelectKBest, SVC, etc.). │
│ - Build a pipeline with one or more steps. │
│ - Call len on the pipeline (len(pipe)). │
│ - Observe TypeError: object of type 'Pipeline' has no len(). │
│ │
│ 5. Best Practices for Testing/Fixing │
│ - Ensure the new __len__ method coexists with existing pipeline behaviors (iteration, indexing, slicing). │
│ - Return the count of steps (len(self.steps) or equivalent). │
│ - Handle edge cases—pipelines with zero steps should return 0. │
│ - Keep the implementation minimal and consistent with scikit-learn’s coding style (no redundant comments, clear concise code). │
│ │
│ Reasoning: │
│ │
│ │
╰────────────────────────────────────────────────────────────────────────── Tokens: ↑ input 0 • cache hit N/A • ↓ output 0 • $ 0.00 ───────────────────────────────────────────────────────────────────────────╯
[11/06/25 20:05:08] INFO run() triggered successfully: <Response [200 OK]> remote_conversation.py:589
[DOCKER] {"asctime": "2025-11-06 20:05:08,606", "levelname": "INFO", "name": "uvicorn.access", "client_addr": null, "request_line": null, "status_code": null}
╭──────────────────────────────────────────────────────────────────────────────── UNKNOWN Event: ConversationStateUpdateEvent ─────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Unknown event type: ConversationStateUpdateEvent │
│ {'kind': 'ConversationStateUpdateEvent', 'id': '9107f0a8-83c5-416f-8732-c0248682438c', 'timestamp': '2025-11-06T20:05:08.606179', 'source': 'environment', 'key': 'full_state', 'value': {'id': │
│ '1a6f38b8-3ee1-4db4-9ef8-7209cb12daa8', 'agent': {'kind': 'Agent', 'llm': {'model': 'litellm_proxy/openai/gpt-5-codex', 'api_key': '**********', 'base_url': 'https://llm-proxy.eval.all-hands.dev', │
│ 'openrouter_site_url': 'https://docs.all-hands.dev/', 'openrouter_app_name': 'OpenHands', 'num_retries': 5, 'retry_multiplier': 8.0, 'retry_min_wait': 8, 'retry_max_wait': 64, 'max_message_chars': 30000, │
│ 'temperature': 1.0, 'max_input_tokens': 272000, 'max_output_tokens': 128000, 'drop_params': True, 'modify_params': True, 'disable_stop_word': False, 'caching_prompt': True, 'log_completions': False, │
│ 'log_completions_folder': 'logs/completions', 'native_tool_calling': True, 'reasoning_effort': 'high', 'enable_encrypted_reasoning': False, 'extended_thinking_budget': 200000, 'usage_id': 'default', │
│ 'litellm_extra_body': {}, 'OVERRIDE_ON_SERIALIZE': ['api_key', 'aws_access_key_id', 'aws_secret_access_key']}, 'tools': [{'name': 'terminal', 'params': {}}, {'name': 'file_editor', 'params': {}}, {'name': │
│ 'task_tracker', 'params': {}}], 'mcp_config': {}, 'system_prompt_filename': 'system_prompt.j2', 'system_prompt_kwargs': {'cli_mode': True}}, 'workspace': {'kind': 'LocalWorkspace', 'working_dir': │
│ '/workspace'}, 'persistence_dir': 'workspace/conversations/1a6f38b83ee14db49ef87209cb12daa8', 'max_iterations': 500, 'stuck_detection': True, 'execution_status': 'finished', 'confirmation_policy': │
│ {'kind': 'NeverConfirm'}, 'activated_knowledge_skills': [], 'stats': {'usage_to_metrics': {'default': {'model_name': 'litellm_proxy/openai/gpt-5-codex', 'accumulated_cost': 0.014467500000000001, │
│ 'accumulated_token_usage': {'model': 'litellm_proxy/openai/gpt-5-codex', 'prompt_tokens': 6342, 'completion_tokens': 654, 'cache_read_tokens': 0, 'cache_write_tokens': 0, 'reasoning_tokens': 192, │
│ 'context_window': 0, 'per_turn_token': 6996, 'response_id': ''}, 'costs': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'cost': 0.014467500000000001, 'timestamp': 1762459508.6032553}], │
│ 'response_latencies': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'latency': 16.847777605056763, 'response_id': │
│ 'resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOmxpdGVsbG1fcHJveHk7bW9kZWxfaWQ6Tm9uZTtyZXNwb25zZV9pZDpyZXNwX2JHbDBaV3hzYlRwamRYTjBiMjFmYkd4dFgzQnliM1pwWkdWeU9tOXdaVzVoYVR0dGIyUmxiRjlwWkRveVptTXdaR1ZoWkdGallUWm │
│ pabVkxTVdFeVpHVXlZV0ZpWTJRek1XSmhNalkwWWprMVlUZzFOelF6T1dVd01EUmxZakEyTVdSak5XRXpZV1l6Wmpoak8zSmxjM0J2Ym5ObFgybGtPbkpsYzNCZk1HVm1aVFEzTldSbE9HRmtNbU5qTWpBeE5qa3dZMlptTmpObFptVTRPREU1TW1FeU1qTmhNVFF4TUdSbE │
│ 56UTJZbVk9'}], 'token_usages': [{'model': 'litellm_proxy/openai/gpt-5-codex', 'prompt_tokens': 6342, 'completion_tokens': 654, 'cache_read_tokens': 0, 'cache_write_tokens': 0, 'reasoning_tokens': 192, │
│ 'context_window': 0, 'per_turn_token': 6996, 'response_id': │
│ 'resp_bGl0ZWxsbTpjdXN0b21fbGxtX3Byb3ZpZGVyOmxpdGVsbG1fcHJveHk7bW9kZWxfaWQ6Tm9uZTtyZXNwb25zZV9pZDpyZXNwX2JHbDBaV3hzYlRwamRYTjBiMjFmYkd4dFgzQnliM1pwWkdWeU9tOXdaVzVoYVR0dGIyUmxiRjlwWkRveVptTXdaR1ZoWkdGallUWm │
│ pabVkxTVdFeVpHVXlZV0ZpWTJRek1XSmhNalkwWWprMVlUZzFOelF6T1dVd01EUmxZakEyTVdSak5XRXpZV1l6Wmpoak8zSmxjM0J2Ym5ObFgybGtPbkpsYzNCZk1HVm1aVFEzTldSbE9HRmtNbU5qTWpBeE5qa3dZMlptTmpObFptVTRPREU1TW1FeU1qTmhNVFF4TUdSbE │
│ 56UTJZbVk9'}]}}}, 'secret_registry': {'secret_sources': {}}}} │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────── (environment) ────────────────────────────────────────────────────────────────────────────────────────────────╯
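For context, the fix the agent outlines in its message (but never emits a patch for) amounts to returning `len(self.steps)`. A minimal sketch on a stand-in class, since the real scikit-learn `Pipeline` source is not part of this report:

```python
class Pipeline:
    """Stand-in for sklearn.pipeline.Pipeline: holds (name, estimator) steps."""

    def __init__(self, steps):
        self.steps = steps

    def __getitem__(self, index):
        # Slicing returns a sub-pipeline; integer indexing returns one estimator.
        if isinstance(index, slice):
            return Pipeline(self.steps[index])
        return self.steps[index][1]

    def __len__(self):
        # The missing method: number of steps, so len(pipe) works
        # alongside the existing indexing/slicing behavior.
        return len(self.steps)

pipe = Pipeline([("anova", "anova_filter"), ("svc", "clf")])
print(len(pipe))          # → 2
print(len(Pipeline([])))  # edge case from the agent's notes: zero steps → 0
```

This matches the behavior the agent's "Best Practices" list calls for (count of steps, zero-step edge case), illustrating that the empty patch is an infrastructure failure rather than a hard task.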