WIP: 1.0 #1421

yuyutaotao · 2025-11-04T07:24:21Z

No description provided.

* Initial plan
* fix(cli): allow duplicate YAML files in config.yaml
  Co-authored-by: quanru <[email protected]>
* fix(cli): deep clone YAML script to prevent mutation issues
* fix(yaml): prevent mutation of flowItem by creating a new object for processing
---------
Co-authored-by: copilot-swe-agent[bot] <[email protected]>
Co-authored-by: quanru <[email protected]>
Co-authored-by: quanruzhuoxiu <[email protected]>
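The two mutation fixes above can be illustrated with a minimal sketch. The `YamlScript` and `FlowItem` shapes and both helper names are hypothetical, since the real types are not shown in this log; the sketch only assumes that a parsed YAML script is plain JSON-like data.

```typescript
// Hypothetical shapes: the real YAML script types are not shown in this log.
type FlowItem = Record<string, unknown>;
interface YamlScript {
  tasks: { name: string; flow: FlowItem[] }[];
}

// Deep-clone plain, YAML-derived data so that each run owns its copy.
// A JSON round trip is enough here because parsed YAML holds no functions or class instances.
function cloneScript(script: YamlScript): YamlScript {
  return JSON.parse(JSON.stringify(script)) as YamlScript;
}

// Build a new object for processing instead of writing into the caller's flow item.
function markProcessed(item: FlowItem): FlowItem {
  return { ...item, processed: true };
}

// The same YAML file listed twice in config.yaml yields two independent scripts.
const shared: YamlScript = {
  tasks: [{ name: "search", flow: [{ aiAct: "type 'hello' in the search box" }] }],
};
const firstRun = cloneScript(shared);
const secondRun = cloneScript(shared);
firstRun.tasks[0].flow[0] = markProcessed(firstRun.tasks[0].flow[0]);
```

With this arrangement, changes made while executing `firstRun` are not visible in `secondRun` or in `shared`.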

….x (#1325) * refactor(core): remove non-OpenAI SDK support and upgrade to OpenAI 6.x This commit removes support for Anthropic SDK and Azure OpenAI, simplifying the codebase to use only the standard OpenAI SDK with OpenAI-style APIs. Changes: - Remove Anthropic SDK (@anthropic-ai/sdk) dependency - Remove Azure OpenAI specific code and @azure/identity dependency - Remove langsmith wrapper support - Remove proxy agent support (https-proxy-agent, socks-proxy-agent) - Upgrade OpenAI SDK from 4.81.0 to 6.3.0 - Simplify createChatClient function to only create standard OpenAI clients - Remove 'style' parameter from createChatClient return type - Remove all Anthropic-specific message handling code - Add openai 6.3.0 as devDependency to @midscene/shared Benefits: - Cleaner, more maintainable codebase - Reduced dependencies (removed 5 packages) - All AI providers can now be accessed through OpenAI-compatible APIs Breaking Changes: - Anthropic SDK mode no longer supported - Azure OpenAI specific configuration removed - MIDSCENE_LANGSMITH_DEBUG no longer supported - httpAgent/socksProxy removed from createChatClient 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(core): model provider documentation and remove Azure and Anthropic configurations * Apply suggestion from @Copilot Co-authored-by: Copilot <[email protected]> * feat(core): add proxy support for OpenAI client with HTTP and SOCKS configurations * feat(core): add qwen-vl specific configuration for high resolution images --------- Co-authored-by: Claude <[email protected]> Co-authored-by: yuyutaotao <[email protected]> Co-authored-by: Copilot <[email protected]>

This change ensures that Planning functionality only supports vision language models (VL mode) and removes DOM-based planning support.
Changes:
- Add validation in ModelConfigManager.getModelConfig() to require VL mode for Planning intent
- Remove DOM mode logic from llm-planning.ts (describeUserPage, markupImageForLLM)
- Simplify image processing to only support VL mode paths
- Add comprehensive JSDoc documentation for Planning VL mode requirement
- Add 6 new unit tests covering Planning VL mode validation in both isolated and normal modes
- Fix existing tests to provide VL mode for Planning intent
Breaking Change:
- Planning without VL mode configured will now throw an error with clear instructions
- Error message includes all supported VL modes and configuration examples
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude <[email protected]>
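A rough sketch of the kind of check described above follows. The function name, the `ModelConfig` shape, and the two placeholder mode names are assumptions made for illustration; the log does not list the supported VL modes or show the real validation code.

```typescript
// Intent names follow the list mentioned elsewhere in this log (planning, grounding, VQA, default).
type Intent = "planning" | "grounding" | "VQA" | "default";

// Placeholder mode names: the real list of supported VL modes is not given in this log.
const SUPPORTED_VL_MODES = ["vl-mode-a", "vl-mode-b"] as const;
type VlMode = (typeof SUPPORTED_VL_MODES)[number];

// Hypothetical config shape.
interface ModelConfig {
  modelName: string;
  vlMode?: VlMode;
}

// Throw a descriptive error when the Planning intent has no VL mode configured.
// Other intents pass through unchanged.
function assertPlanningUsesVlMode(intent: Intent, config: ModelConfig): void {
  if (intent !== "planning" || config.vlMode !== undefined) {
    return;
  }
  throw new Error(
    `Planning requires a vision language model for "${config.modelName}". ` +
      `Configure one of the supported VL modes: ${SUPPORTED_VL_MODES.join(", ")}.`,
  );
}
```

Failing at configuration time rather than at the first planning call gives users the instructions before any task starts.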

* chore(core): remove warning msg for gpt-4
* chore(core): remove dom-based locator

* chore(core): refine recorder loop
* feat(core): update implementation of recorder

* refactor(core,web-integration,docs): rename API methods for clarity BREAKING CHANGE: Renamed aiAction() to aiAct() and logScreenshot() to recordToReport() for improved naming consistency. The aiAction() method is kept as deprecated for backward compatibility. Changes: - Renamed aiAction() to aiAct() across core and web-integration - Renamed logScreenshot() to recordToReport() - Updated all English and Chinese documentation - Updated code examples in README files - Updated Playwright fixture to support new method names - Added deprecation warning for aiAction() method - Updated all test files and examples This improves API consistency and clarity while maintaining backward compatibility through deprecated methods. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * feat(yaml): add backward compatibility for aiAction method in YAML flow * fix(core): conditionally add httpAgent to OpenAI client options Fix TypeScript compilation error where httpAgent property doesn't exist in OpenAI 6.x ClientOptions type. Only include httpAgent when a proxy is configured, and use type assertion to bypass the strict type check. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>
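The rename with a deprecated alias described above can be sketched as follows. `SketchAgent` is a stand-in class written for this example, not the real agent; the warning text and the recorded arrays are assumptions that make the behavior easy to observe.

```typescript
// Stand-in agent that only records what was called.
class SketchAgent {
  readonly calls: string[] = [];
  readonly warnings: string[] = [];

  // New name introduced by the rename.
  async aiAct(prompt: string): Promise<string> {
    this.calls.push(prompt);
    return `acted: ${prompt}`;
  }

  /** @deprecated Use aiAct() instead. Kept so existing scripts keep working. */
  async aiAction(prompt: string): Promise<string> {
    this.warnings.push("aiAction() is deprecated, use aiAct() instead");
    return this.aiAct(prompt);
  }
}

// Old call sites keep working and are routed to the new method.
const agent = new SketchAgent();
void agent.aiAction("click the login button");
```

Delegating from the old name to the new one keeps a single implementation, so the two methods cannot drift apart while the alias exists.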

* chore(core): update implementation of insight
* chore(core): refine error plan
* chore(core): refine error plan
* chore(core): split tasks into multiple parts
* fix(core): fix ci

* chore(release): upgrade all packages to v1.0.0
  - Bump version from 0.30.4 to 1.0.0 for all packages
  - Update Chrome extension manifest version to 0.136
  - Update internal package dependencies to 1.0.0
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <[email protected]>
* feat(release): add validation to prevent 1.x stable releases
  - Block publishing of 1.x versions with 'latest' tag
  - Allow publishing 1.x beta versions (prepatch)
  - Allow publishing stable versions for other major versions (0.x, 2.x, etc.)
  This ensures that 1.x releases can only be published as beta versions, preventing accidental stable releases while still allowing testing and pre-release distributions.
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <[email protected]>
---------
Co-authored-by: Claude <[email protected]>
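The three release rules listed above fit in a few lines. This is a sketch under the assumption that the check receives the version string and the npm dist-tag; the function name and the error wording are hypothetical, and the real release script is not shown in this log.

```typescript
// Returns an error message when the publish should be blocked, otherwise null.
function checkRelease(version: string, distTag: string): string | null {
  // parseInt stops at the first ".", so "1.0.0" yields 1 and "0.30.6" yields 0.
  const major = Number.parseInt(version, 10);
  if (major === 1 && distTag === "latest") {
    return `Refusing to publish ${version} with the "latest" tag: 1.x may only be published as a beta.`;
  }
  return null;
}
```

For example, `checkRelease("1.0.0", "latest")` returns a message, while `checkRelease("1.0.1-beta.0", "beta")` and `checkRelease("0.30.6", "latest")` both return `null`.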

* refactor(core): remove unused getXpathsById method
  This method was not being used in the codebase.
  Removed:
  - Core implementation in shared/src/extractor/locator.ts
  - Export from shared/src/extractor/index.ts
  - Implementations in puppeteer/base-page.ts, chrome-extension/page.ts, and static/static-page.ts
  - All related unit tests
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <[email protected]>
* refactor(types): rename AndroidPullParam and AndroidLongPressParam to PullParam and LongPressParam
---------
Co-authored-by: Claude <[email protected]>

…1341) * feat(core): support custom OpenAI client instances for observability Enable users to provide custom OpenAI client factory function through AgentOpt.createOpenAIClient, allowing integration with observability tools like langsmith and langfuse. Key changes: - Add CreateOpenAIClientFn type in @midscene/shared/env for creating custom OpenAI clients - Extend AgentOpt interface with optional createOpenAIClient callback - Pass callback through Agent -> ModelConfigManager -> IModelConfig - Inject createOpenAIClient during config initialization for better performance - Update createChatClient to use custom client factory when provided Benefits: - Users can wrap OpenAI clients with langsmith's wrapOpenAI() for tracing - Users can wrap with langfuse's observeOpenAI() for logging - Support different clients for different intents (planning, grounding, VQA, default) - Zero runtime overhead - injection happens during config initialization 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * test(core): add unit tests for custom OpenAI client integration in ModelConfigManager and service-caller * Update packages/shared/tests/unit-test/env/modle-config-manager.test.ts Co-authored-by: Copilot <[email protected]> * refactor(core): remove unused MIDSCENE_API_TYPE constant from service-caller and types --------- Co-authored-by: Claude <[email protected]> Co-authored-by: Copilot <[email protected]>
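The factory-injection idea above can be sketched without the OpenAI SDK. Everything here is a stand-in: `ChatClientLike`, `CreateClientFn`, and `createSketchClient` are hypothetical, and the real `CreateOpenAIClientFn` signature is not shown in this log, so the arguments it receives may differ.

```typescript
type Intent = "planning" | "grounding" | "VQA" | "default";

// Stand-in for an SDK client; the real code uses an OpenAI client instance.
interface ChatClientLike {
  chat(prompt: string): string;
}

// Hypothetical factory type: receives a base client and the intent, returns the client to use.
type CreateClientFn = (base: ChatClientLike, intent: Intent) => ChatClientLike;

// Use the caller's factory when one is provided, otherwise fall back to the plain client.
function createSketchClient(intent: Intent, createClient?: CreateClientFn): ChatClientLike {
  const base: ChatClientLike = { chat: (prompt) => `reply to ${prompt}` };
  return createClient ? createClient(base, intent) : base;
}

// A tracing wrapper, playing the role that wrapOpenAI() or observeOpenAI() would play.
const traced: string[] = [];
const withTracing: CreateClientFn = (base, intent) => ({
  chat(prompt) {
    traced.push(`${intent}:${prompt}`);
    return base.chat(prompt);
  },
});
```

Because the wrapper sees the intent, a caller could return a differently instrumented client for planning than for grounding.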

* chore(ci): enable workflows for PRs targeting 1.0 branch
  Add 1.0 branch to pull_request triggers in CI and lint workflows to ensure PRs targeting the 1.0 branch run the same checks as PRs to main.
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <[email protected]>
* tests(shared, web-integration): update tests to use runner instead of executor and improve environment setup
---------
Co-authored-by: Claude <[email protected]>

* docs(awesome): add midscene java sdk (#1324) * fix(core): support number type for aiInput value field (#1339) * fix(core): support number type for aiInput value field This change allows aiInput.value to accept both string and number types, addressing scenarios where: 1. AI models return numeric values instead of strings 2. YAML files contain unquoted numbers that parse as number type Changes: - Updated type definitions to accept string | number - Added Zod schema transformation to convert numbers to strings - Updated runtime validation to accept both types - Added explicit conversion in YAML player as fallback All conversions happen internally and are transparent to users. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(core): update aiInput type signatures to accept number values Update the TypeScript method signatures for aiInput to accept string | number for the value parameter, matching the runtime implementation. Changes: - New signature: opt parameter now accepts { value: string | number } - Legacy signature: first parameter now accepts string | number - Implementation signature: locatePromptOrValue now accepts TUserPrompt | string | number - Type assertion updated from `as string` to `as string | number` This ensures type safety and allows users to pass number values directly without TypeScript errors, while maintaining backward compatibility with existing string-based usage. Fixes type errors in test cases that use number values. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]> * fix(report): prevent sidebar jitter when expanding case selector (#1344) Fixed sidebar shifting 1-2 pixels when clicking to expand the playwright case selector. The issue was caused by adding a border only in the expanded state, causing a sudden height change. 
Solution: Added transparent border to the collapsed state, ensuring consistent height across both states. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * refactor(core): unify cache config parameters (#1346) Simplified `processCacheConfig` function signature from 3 to 2 parameters. Unified `fallbackId` and `cacheId` into single `cacheId` parameter. BREAKING CHANGE: processCacheConfig signature changed Changed from: processCacheConfig(cache, fallbackId, cacheId?) To: processCacheConfig(cache, cacheId) The cacheId parameter now serves dual purpose: 1. Fallback ID when cache is true or cache object lacks ID 2. Legacy cacheId when cache is undefined (requires MIDSCENE_CACHE env) Updated call sites: - packages/core/src/agent/agent.ts - packages/web-integration/src/playwright/ai-fixture.ts - packages/cli/src/create-yaml-player.ts (4 locations) Added comprehensive test coverage for legacy compatibility mode: - process-cache-config.test.ts: 18 tests passing - create-yaml-player.test.ts: 13 tests passing (6 new) - playwright-ai-fixture-cache.test.ts: 8 tests passing (3 new) Benefits: - Simpler API with fewer parameters - Unified semantics for new and legacy use cases - Full backward compatibility maintained - Better test coverage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * fix(core,web-integration): fix unit tests after merging main branch This commit fixes unit test failures that occurred after merging the main branch into the 1.0 branch. The issues were caused by temporal conflicts between commits that added new features and subsequent refactoring. Root Cause: - Commit 13b4f1d added aiInput number support with tests using 'executor' - Commit c9b385b refactored Executor → TaskRunner in the 1.0 branch - When main was merged, tests still referenced 'executor' but code used 'runner' Changes: 1. 
Fix YAML player aiInput number conversion (packages/core/src/yaml/player.ts): - Extract 'value' field separately to prevent spread override - Ensure number values are converted to strings via String(value) - Maintain backward compatibility for empty string handling 2. Fix test mock structure (packages/web-integration/tests/unit-test/ai-input-number-value.test.ts): - Update all mock objects from 'executor' to 'runner' - Aligns with TaskRunner API refactoring 3. Fix cache config test (packages/web-integration/tests/unit-test/playwright-ai-fixture-cache.test.ts): - Move vi.mock() before imports to ensure proper module hoisting - Fixes legacy mode environment variable checks 4. Add value conversion in agent.ts (optional improvement): - Explicitly convert number to string in aiInput method - Improves code clarity and test stability All tests now pass (195 passed, 1 skipped). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: yuyutaotao <[email protected]> Co-authored-by: Claude <[email protected]>
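The YAML player fix described above (extract `value` separately, then convert with `String(value)`) can be sketched like this. The `InputStep` shape and the `locate` field are assumptions for illustration; only the extract-then-convert order comes from the commit message.

```typescript
// Hypothetical step shape: an unquoted YAML number such as `value: 12345` parses as a number.
interface InputStep {
  value: string | number;
  locate?: string;
}

interface NormalizedInputStep {
  value: string;
  locate?: string;
}

function normalizeInputStep(step: InputStep): NormalizedInputStep {
  // Pull `value` out first so that spreading the remaining fields cannot override the converted value.
  const { value, ...rest } = step;
  return { ...rest, value: String(value) };
}
```

Empty strings pass through unchanged, since `String("")` is still `""`, which matches the backward-compatibility note in the commit.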

* chore(core): update types of task executor
* chore(core): update sleep tasks
* chore(core): update types for planning
* feat(core): update subTask flag

* chore(lint): fix linting and formatting issues
  - Fix useless switch case in modle-config-manager.test.ts
  - Format package.json files for consistency
  - Apply code formatting across core, agent, and related files
  🤖 Generated with [Claude Code](https://claude.com/claude-code)
  Co-Authored-By: Claude <[email protected]>
* chore(deps): update openai package to version 6.3.0
---------
Co-authored-by: Claude <[email protected]>

* feat(chrome-extension): enable hot reload for development This commit adds hot reload support for chrome-extension development, significantly improving the development experience. Main changes: - Add web-ext integration for automatic extension reloading - Add wait-for-build.js script to ensure build completes first - Update dev script to use concurrently for build watch + web-ext - Add web-ext-config.cjs for web-ext configuration To fix build stability during hot reload: - Replace npm-watch with rslib native watch mode in visualizer - Standardize dev/build:watch script relationship across packages - This prevents dist directory deletion during rebuilds The rslib native watch mode performs incremental builds without deleting the dist directory, preventing "Module not found" errors when chrome-extension references @midscene/visualizer. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(chrome-extension): wait for JS bundles before starting web-ext The previous implementation only checked for static files (manifest.json, index.html) which are copied early in the build process. This caused web-ext to start before the JavaScript bundles were built, resulting in errors. Now we check for the actual build outputs: - dist/static/js/index.js - dist/static/js/popup.js This ensures web-ext only starts after Rsbuild has completed the full build. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * chore(deps): align Rsbuild plugin versions across workspace Update all Rsbuild plugins to use consistent versions: - @rsbuild/plugin-less: 1.5.0 - @rsbuild/plugin-node-polyfill: 1.4.2 - @rsbuild/plugin-react: 1.4.1 - @rsbuild/plugin-svgr: 1.2.2 - @rsbuild/plugin-type-check: 1.2.4 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>
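The readiness check described in the second commit above might look roughly like this. The two bundle paths come from the commit message; the function name and the injected `exists` predicate are assumptions, and the actual wait-for-build.js script is not shown in this log.

```typescript
// Build outputs that must exist before web-ext starts (paths taken from the commit message).
const REQUIRED_OUTPUTS = ["dist/static/js/index.js", "dist/static/js/popup.js"];

// Return the outputs that are still missing. `exists` is injected so the check can be tested;
// a real script would likely pass something like fs.existsSync.
function missingOutputs(
  exists: (path: string) => boolean,
  required: readonly string[] = REQUIRED_OUTPUTS,
): string[] {
  return required.filter((path) => !exists(path));
}

// Early in the build only static files exist, so both bundles are reported as missing.
const earlyFiles = new Set(["dist/manifest.json", "dist/index.html"]);
const stillMissing = missingOutputs((path) => earlyFiles.has(path));
```

A polling loop would call `missingOutputs` until it returns an empty array and only then launch web-ext.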

* docs(awesome): add midscene java sdk (#1324) * fix(core): support number type for aiInput value field (#1339) * fix(core): support number type for aiInput value field This change allows aiInput.value to accept both string and number types, addressing scenarios where: 1. AI models return numeric values instead of strings 2. YAML files contain unquoted numbers that parse as number type Changes: - Updated type definitions to accept string | number - Added Zod schema transformation to convert numbers to strings - Updated runtime validation to accept both types - Added explicit conversion in YAML player as fallback All conversions happen internally and are transparent to users. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(core): update aiInput type signatures to accept number values Update the TypeScript method signatures for aiInput to accept string | number for the value parameter, matching the runtime implementation. Changes: - New signature: opt parameter now accepts { value: string | number } - Legacy signature: first parameter now accepts string | number - Implementation signature: locatePromptOrValue now accepts TUserPrompt | string | number - Type assertion updated from `as string` to `as string | number` This ensures type safety and allows users to pass number values directly without TypeScript errors, while maintaining backward compatibility with existing string-based usage. Fixes type errors in test cases that use number values. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]> * fix(report): prevent sidebar jitter when expanding case selector (#1344) Fixed sidebar shifting 1-2 pixels when clicking to expand the playwright case selector. The issue was caused by adding a border only in the expanded state, causing a sudden height change. 
Solution: Added transparent border to the collapsed state, ensuring consistent height across both states. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * refactor(core): unify cache config parameters (#1346) Simplified `processCacheConfig` function signature from 3 to 2 parameters. Unified `fallbackId` and `cacheId` into single `cacheId` parameter. BREAKING CHANGE: processCacheConfig signature changed Changed from: processCacheConfig(cache, fallbackId, cacheId?) To: processCacheConfig(cache, cacheId) The cacheId parameter now serves dual purpose: 1. Fallback ID when cache is true or cache object lacks ID 2. Legacy cacheId when cache is undefined (requires MIDSCENE_CACHE env) Updated call sites: - packages/core/src/agent/agent.ts - packages/web-integration/src/playwright/ai-fixture.ts - packages/cli/src/create-yaml-player.ts (4 locations) Added comprehensive test coverage for legacy compatibility mode: - process-cache-config.test.ts: 18 tests passing - create-yaml-player.test.ts: 13 tests passing (6 new) - playwright-ai-fixture-cache.test.ts: 8 tests passing (3 new) Benefits: - Simpler API with fewer parameters - Unified semantics for new and legacy use cases - Full backward compatibility maintained - Better test coverage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * release: v0.30.5 * docs(site): optimize v0.30 changelog with user-focused improvements (#1352) Improved the v0.30 changelog to be more user-centric and less promotional: - Reduced hyperbolic language ("comprehensive upgrade" → "improved", etc.) 
- Reorganized content structure with clearer user value sections - Added specific usage scenarios and examples for cache strategies - Enhanced mobile platform sections with iOS and Android subsections - Simplified technical descriptions to be more objective - Added cross-platform consistency section for ClearInput feature - Translated optimized content to English version These changes make the changelog more professional and easier for users to understand the actual benefits of the update. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]> * fix(ios): correct horizontal scroll direction and improve swipe implementation (#1358) * fix(ios): correct horizontal scroll direction and improve swipe implementation Fixed two issues with iOS horizontal scrolling: 1. **Corrected scroll direction semantics** - scrollLeft now swipes right (brings left content into view) - scrollRight now swipes left (brings right content into view) - This aligns with Android and Web scroll behavior where the direction indicates which content enters the viewport 2. **Improved swipe implementation** - Implemented W3C Actions API for better scroll support - Falls back to dragfromtoforduration if Actions API fails - Increased scroll distance from width/3 to width*0.7 (70%) to prevent bounce-back 3. 
**Fixed scrollUntilBoundary directions** - Corrected left/right swipe directions in boundary detection 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(ios): remove fallback from swipe method, use W3C Actions API only --------- Co-authored-by: Claude <[email protected]> * feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice (#1363) * fix(docs): add alwaysFetchScreenInfo parameter to AndroidDevice constructor documentation * feat(android-playground): enable alwaysFetchScreenInfo for AndroidDevice Configure AndroidDevice instance with alwaysFetchScreenInfo option set to true to ensure screen information is always fetched during device operations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(android): rename alwaysFetchScreenInfo to alwaysRefreshScreenInfo for consistency --------- Co-authored-by: Claude <[email protected]> * fix(core): handle ZodEffects and ZodUnion in schema parsing (#1359) * fix(core): handle ZodEffects and ZodUnion in schema parsing - Add support for ZodEffects (transformations) in getTypeName and getDescription - Add support for ZodUnion types with proper type display (type1 | type2) - Fixes "failed to parse Zod type" warning on first execution with caching 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * test(core): add tests for descriptionForAction with ZodEffects and ZodUnion * chore(core): update test cases --------- Co-authored-by: Claude <[email protected]> Co-authored-by: yutao <[email protected]> * feat(playground): implement task cancellation for Android/iOS playgrounds (#1355) * feat(playground): implement task cancellation for Android/iOS playgrounds This PR implements task cancellation functionality for Android and iOS playgrounds using a singleton + recreation pattern. 
When users clicked the "Stop" button in Android/iOS playground, the task continued to execute and control the device via ADB commands. This was because: - Agent instances were global singletons created at server startup - The /cancel endpoint only deleted progress tips without stopping execution - There was no mechanism to interrupt ongoing tasks Implemented a singleton + recreation pattern: - PlaygroundServer now accepts factory functions instead of instances - Added task locking mechanism (currentTaskId) to prevent concurrent tasks - When cancel is triggered, the agent is destroyed and recreated - Device operations stop immediately as destroyed agents reject new commands 1. **PlaygroundServer** (packages/playground/src/server.ts) - Added factory function support for page and agent creation - Added `recreateAgent()` method to destroy and recreate agent - Added `currentTaskId` to track running tasks - Enhanced `/execute` endpoint with task conflict detection - Enhanced `/cancel` endpoint to recreate agent on cancellation - Backward compatible with existing instance-based usage 2. **Android Playground** (packages/android-playground/src/bin.ts) - Updated to use factory pattern for server creation - Each recreation creates fresh AndroidDevice and AndroidAgent instances 3. 
**iOS Playground** (packages/ios/src/bin.ts) - Updated to use factory pattern for server creation - Each recreation creates fresh IOSDevice and IOSAgent instances - Added test script `test-cancel-android.sh` for automated testing - Manual testing confirmed device operations stop when cancel is triggered ``` User clicks Stop ↓ Frontend calls /cancel/:requestId ↓ Server checks if current running task ↓ Call recreateAgent() ├─ Destroy old agent (agent.destroy()) ├─ Destroy old device (device.destroy()) ├─ Create new device (pageFactory()) └─ Create new agent (agentFactory(device)) ↓ Clear task lock and progress tips ↓ Device stops operations ✅ ``` - ✅ Simple implementation (minimal code changes) - ✅ Effective cancellation (destroy() immediately sets destroyed flag) - ✅ Backward compatible (still accepts instances) - ✅ Natural serialization (one task at a time per device) ```bash pnpm run android:playground ./test-cancel-android.sh ``` 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(page): ensure keyboard actions return promises for better async handling * refactor(playground): update PlaygroundServer to use agent factories and simplify server creation * fix(ios): round coordinates for tap and swipe actions to improve accuracy * fix(android): round coordinates in scrolling and gesture methods for improved accuracy * refactor(playground): simplify PlaygroundServer instantiation and improve code readability --------- Co-authored-by: Claude <[email protected]> * fix(yaml): skip environment variable interpolation in YAML comments (#1361) * Initial plan * fix(yaml): skip environment variable interpolation in YAML comments * style(yaml): apply biome linting fixes Co-authored-by: quanru <[email protected]> --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> Co-authored-by: quanru <[email protected]> * fix(core): handle null data in WaitFor and support array keyName in KeyboardPress (#1354) * 
fix(core): handle null data in WaitFor and support array keyName in KeyboardPress This commit fixes two critical bugs: 1. **Fix null data handling in task execution** - Fixed TypeError when AI extract() returns null for WaitFor operations - Added null/undefined check before accessing data properties - WaitFor operations now return false when data is null (condition not met) - Other operations (Assert, Query, String, Number) return null when data is null - Location: src/agent/tasks.ts:936-938 2. **Add array support for keyName in KeyboardPress** - Updated actionKeyboardPressParamSchema to accept string | string[] - Allows key combinations like ['Control', 'A'] for keyboard shortcuts - Maintains backward compatibility with string format - Updated type definitions in aiKeyboardPress method - Locations: - src/device/index.ts:197-199 - src/agent/agent.ts:575-622 **Test Coverage:** - Added comprehensive unit tests for null data handling (8 test cases) - Added unit tests for keyName array validation (7 test cases) - All tests verify edge cases and expected behavior Fixes issue where executor crashed with: "TypeError: Cannot read properties of null (reading 'StatementIsTruthy')" And fixes parameter validation error: "Invalid parameters for action KeyboardPress: Expected string, received array" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(ios,android): handle array keyName in KeyboardPress action - Updated iOS and Android device implementations to handle keyName as string | string[] - For mobile devices, array keys are joined with '+' (e.g., ['Control', 'A'] becomes 'Control+A') - This fixes TypeScript compilation errors in iOS and Android packages - Maintains backward compatibility with string format Related to the KeyboardPress array support added in the previous commit. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(ios,android): improve KeyboardPress array handling - Remove incorrect join('+') approach that doesn't work on mobile devices - Use last key from array instead (e.g., ['Control', 'A'] → 'A') - Add clear warning messages when array input is used on mobile platforms - Mobile devices don't support keyboard combinations, this is a graceful degradation This makes the behavior more predictable and provides better feedback to developers. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * test(core): fix TaskExecutor constructor arguments in null data tests - Fixed TaskExecutor constructor call to match actual signature - Constructor requires (interface, insight, options) instead of (insight, interface) - All 8 tests now passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(ios,android): improve logging for unsupported key combinations in device input * fix(core): handle null data in WaitFor and improve keyName parameter description This commit fixes the null data handling bug and improves the KeyboardPress parameter description. ## Changes: ### 1. Fix null data handling in task execution - Fixed TypeError when AI extract() returns null for WaitFor operations - Added null/undefined check before accessing data properties (tasks.ts:936-938) - WaitFor operations now return false when data is null (condition not met) - Other operations (Assert, Query, String, Number) return null when data is null ### 2. 
Improve KeyboardPress parameter description - Reverted keyName to only accept string type (not array) - Added clear description: "Use '+' for key combinations, e.g., 'Control+A', 'Shift+Enter'" - This provides better guidance to AI for generating key combinations - Simplified iOS/Android implementations (no special array handling needed) ### 3. Test coverage - Added 8 unit tests for null data handling - Updated KeyboardPress tests to validate string-only format - Added test for key combination strings (e.g., 'Control+A') - Added test to verify arrays are rejected - Fixed unused variable warning in test file ## Fixed Issues: **Issue 1:** Executor crashes with null data ``` TypeError: Cannot read properties of null (reading 'StatementIsTruthy') ``` **Issue 2:** Unclear how to specify key combinations - Now clearly documented in parameter description with examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * docs(core): align KeyboardPress action description with parameter schema Updated the KeyboardPress action description to explicitly mention support for key combinations (e.g., "Control+A", "Shift+Enter"), making it consistent with the keyName parameter description that already documented this functionality. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(core): handle null and undefined data in WaitFor output processing --------- Co-authored-by: Claude <[email protected]> * perf(android): optimize clearInput performance by batching keyevents (#1366) * perf(android): optimize clearInput performance by batching keyevents Replace serial keyevent(67) calls with clearTextField() method from appium-adb library, which batches all keyevents into a single shell command. 
Performance improvement: - Before: ~50 seconds (100 sequential shell calls, ~500ms each) - After: ~1-2 seconds (single batched shell command) - Speedup: 25-50x Changes: - Use adb.clearTextField(100) instead of repeat(() => adb.keyevent(67)) - Add clearTextField mock to unit tests for compatibility All 75 unit tests passing, build successful. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(android): include device pixel ratio in size calculation for AndroidDevice --------- Co-authored-by: Claude <[email protected]> * release: v0.30.6 * fix(tests): enhance null data handling tests by adding uiContext parameter --------- Co-authored-by: yuyutaotao <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: yutao <[email protected]> Co-authored-by: Copilot <[email protected]> Co-authored-by: quanru <[email protected]>
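The singleton + recreation pattern used for playground task cancellation above can be sketched as follows. `SketchAgent` and `SketchServer` are stand-ins written for this example; the real PlaygroundServer works through HTTP endpoints and also recreates the device, which is omitted here.

```typescript
// Stand-in agent: once destroyed, it rejects every further command.
class SketchAgent {
  destroyed = false;

  destroy(): void {
    this.destroyed = true;
  }

  run(command: string): string {
    if (this.destroyed) {
      throw new Error("agent destroyed");
    }
    return `ran ${command}`;
  }
}

class SketchServer {
  agent: SketchAgent;
  currentTaskId: string | null = null;
  private readonly agentFactory: () => SketchAgent;

  // The server receives a factory rather than an instance, so it can rebuild the agent later.
  constructor(agentFactory: () => SketchAgent) {
    this.agentFactory = agentFactory;
    this.agent = agentFactory();
  }

  // Take the task lock; a second task is rejected while one is running.
  begin(taskId: string): SketchAgent {
    if (this.currentTaskId !== null) {
      throw new Error(`task ${this.currentTaskId} is still running`);
    }
    this.currentTaskId = taskId;
    return this.agent;
  }

  // Cancel the running task by destroying the agent and creating a fresh one.
  cancel(taskId: string): boolean {
    if (this.currentTaskId !== taskId) {
      return false;
    }
    this.agent.destroy();
    this.agent = this.agentFactory();
    this.currentTaskId = null;
    return true;
  }
}
```

The task that was running still holds the old agent, so its next command throws, while a new task started after `cancel()` gets the fresh instance.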

…ication (#1365) * feat(bridge-mode): add remote access support for cross-machine communication This commit implements remote access capability for Bridge Mode, enabling communication between server and client on different machines. ## Changes ### Core Features - Server side: Added `allowRemoteAccess` option to bind server to 0.0.0.0 - Server side: Added `host` and `port` options for custom configuration - Client side: Added server URL configuration UI in Chrome extension - Configuration priority: host > allowRemoteAccess > default (127.0.0.1) ### Modified Files - packages/web-integration/src/bridge-mode/: - common.ts: Added getBridgeServerHost() helper function - io-server.ts: Modified to support custom host binding - agent-cli-side.ts: Added remote access options to constructor - page-browser-side.ts: Added server endpoint parameter support - apps/chrome-extension/src/: - extension/bridge/index.tsx: Added server URL configuration UI - extension/bridge/index.less: Added styles for configuration section - utils/bridgeConnector.ts: Support custom server endpoint - packages/web-integration/tests/: - ai/bridge/remote-access.test.ts: Added comprehensive tests - unit-test/bridge/io.test.ts: Updated tests for new API ### Documentation - Updated docs in apps/site/docs/{en,zh}/bridge-mode-by-chrome-extension.mdx - Added remote access configuration section with examples - Added security warnings for remote access usage ## API Changes New constructor options: - allowRemoteAccess: Enable remote access - host: Custom host (optional) - port: Custom port (optional) ## Backward Compatibility - All existing code works without modification - Default behavior unchanged (localhost only) - All unit tests passing ## Security - Default remains secure (127.0.0.1 only) - Remote access requires explicit opt-in - Documentation includes security warnings 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(bridge): resolve race 
condition in server initialization Fix the 'xhr poll error' by ensuring all Socket.IO middleware and event handlers are set up BEFORE calling httpServer.listen(). This eliminates the race condition where clients could attempt to connect before the server was fully ready. Changes: - Moved Socket.IO middleware setup before httpServer.listen() - Moved Socket.IO connection handlers before httpServer.listen() - Moved httpServer.listen() to the end of initialization sequence Fixes failing unit tests in packages/web-integration/tests/unit-test/ bridge/io.test.ts (all 15 tests now passing) * fix(web-integration): add delay to ensure Socket.IO is fully ready in server initialization * fix(bridge-server): improve HTTP server setup and event handling order * fix(bridge): improve server URL handling and localStorage management * feat(bridge): enhance server configuration UI with expandable section and improved styling * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> * Update packages/web-integration/tests/ai/bridge/remote-access.test.ts Co-authored-by: Copilot <[email protected]> --------- Co-authored-by: Claude <[email protected]> Co-authored-by: Copilot <[email protected]>
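The configuration priority stated in the remote-access commit (host > allowRemoteAccess > default 127.0.0.1) might be resolved roughly as follows. This is a sketch under assumptions: `resolveBridgeHost` and `BridgeHostOptions` are hypothetical names, and the real `getBridgeServerHost()` helper in common.ts may have a different signature.

```typescript
// Hypothetical option shape; only the two fields named in the commit message.
interface BridgeHostOptions {
  host?: string;
  allowRemoteAccess?: boolean;
}

// Resolves the bind address using the documented priority:
// explicit host > allowRemoteAccess (0.0.0.0) > default (127.0.0.1).
function resolveBridgeHost(options: BridgeHostOptions = {}): string {
  if (options.host) {
    return options.host;
  }
  if (options.allowRemoteAccess) {
    return '0.0.0.0';
  }
  return '127.0.0.1';
}

console.log(resolveBridgeHost({ allowRemoteAccess: true }));
```

Keeping the default at 127.0.0.1 means existing callers that pass no options stay local-only, which matches the backward-compatibility note in the commit.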

#1377)

## Problem

The previous nano-staged configuration had two issues:

1. Used `biome check .` which checked the entire project instead of only staged files
2. nano-staged doesn't automatically re-stage fixed files, causing commits to fail

## Solution

Switched to lint-staged which:

- Automatically passes only staged files to biome
- Re-stages files after fixes are applied
- More mature and widely adopted

## Changes

- Replaced nano-staged with lint-staged in pre-commit hook
- Updated biome command to remove project-wide checks
- Added lint-staged as dev dependency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* feat(yaml): support all device options in YAML configuration This PR enables YAML scripts to use all Android and iOS device options by centralizing device option types and ensuring runtime configuration propagation. Changes: - Created packages/core/src/device/device-options.ts to centralize all device option type definitions (AndroidDeviceOpt, IOSDeviceOpt) - Updated MidsceneYamlScriptAndroidEnv and MidsceneYamlScriptIOSEnv to extend device options using Omit<> to exclude programmatic fields - Fixed runtime configuration passing in create-yaml-player.ts to forward all YAML config options to device constructors - Simplified agent creation functions to pass entire options object instead of manually listing each parameter YAML scripts can now configure: Android: - androidAdbPath, remoteAdbHost, remoteAdbPort - imeStrategy, displayId, usePhysicalDisplayIdForScreenshot - screenshotResizeScale, alwaysFetchScreenInfo - autoDismissKeyboard, keyboardDismissStrategy iOS: - deviceId, useWDA, wdaPort, wdaHost - autoDismissKeyboard 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * test(yaml): add unit tests for device options propagation Add comprehensive unit tests to verify that all device options are correctly passed from YAML configuration to device constructors. Tests include: - Android device options propagation from YAML to agentFromAdbDevice - iOS device options propagation from YAML to agentFromWebDriverAgent - Type definitions for AndroidDeviceOpt and IOSDeviceOpt - YAML environment types (MidsceneYamlScriptAndroidEnv, MidsceneYamlScriptIOSEnv) - Validation that customActions is excluded from YAML types - IME strategy and keyboard dismiss strategy type validations - Minimal and full configuration scenarios All 31 tests passing (17 in CLI, 14 in Core). 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(android): ensure empty object is passed when opts is undefined Fix failing unit tests by ensuring an empty object is passed to AndroidDevice and IOSDevice constructors when opts is undefined, maintaining backward compatibility with existing tests. Changes: - Updated agentFromAdbDevice to pass opts || {} to AndroidDevice - Updated agentFromWebDriverAgent to pass opts || {} to IOSDevice This ensures the constructors always receive an object instead of undefined, which is what the existing tests expect. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(device-options): rename alwaysFetchScreenInfo to alwaysRefreshScreenInfo for clarity * docs(site): update Android and iOS sections to include all configuration options from their respective constructors --------- Co-authored-by: Claude <[email protected]>

Update the task type display names in report sidebar and detail views:

- Change "Insight / Query" and "Insight / Assert" to "Insight"
- Change "Action / {subType}" to "Action Space / {subType}"
- Show "Planning / Plan" instead of just "Planning"
- Keep other task types unchanged (e.g., "Planning / Locate")

This provides clearer and more consistent naming for different task types in the report UI, making it easier to understand the task hierarchy and categorization.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

…1381) This change improves code consistency by using clonedYamlScript.agent instead of mixing yamlScript.agent and clonedYamlScript for other properties throughout the agent initialization code. Changes: - Use clonedYamlScript.agent consistently across all agent types (puppeteer, bridge mode, Android, iOS, and interface) - This ensures all configuration comes from the same cloned instance, preventing potential mutation issues when the same YAML file is executed multiple times - Added comprehensive unit tests to verify aiActionContext is properly passed to Android, iOS, and bridge mode agents This is a code quality improvement that makes the codebase more maintainable and aligns with the original design intent of using structuredClone to isolate each ScriptPlayer instance. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]>

…1375) * refactor(env): modernize model configuration environment variables This PR refactors the model configuration system with improved naming conventions and better type safety while maintaining backward compatibility. Key Changes: 1. Environment Variable Naming Convention Updates: - Renamed OPENAI_* → MODEL_* for public API variables * OPENAI_API_KEY → MODEL_API_KEY (deprecated, backward compatible) * OPENAI_BASE_URL → MODEL_BASE_URL (deprecated, backward compatible) - Renamed MIDSCENE_*_VL_MODE → MIDSCENE_*_LOCATOR_MODE across all intents * MIDSCENE_VL_MODE → MIDSCENE_LOCATOR_MODE * MIDSCENE_VQA_VL_MODE → MIDSCENE_VQA_LOCATOR_MODE * MIDSCENE_PLANNING_VL_MODE → MIDSCENE_PLANNING_LOCATOR_MODE * MIDSCENE_GROUNDING_VL_MODE → MIDSCENE_GROUNDING_LOCATOR_MODE - Updated all internal MIDSCENE_*_OPENAI_* → MIDSCENE_*_MODEL_* * MIDSCENE_VQA_OPENAI_API_KEY → MIDSCENE_VQA_MODEL_API_KEY * MIDSCENE_PLANNING_OPENAI_API_KEY → MIDSCENE_PLANNING_MODEL_API_KEY * MIDSCENE_GROUNDING_OPENAI_API_KEY → MIDSCENE_GROUNDING_MODEL_API_KEY * (and corresponding BASE_URL variables) 2. Type System Improvements: - Split TModelConfigFn into public and internal types - Public API (TModelConfigFn) no longer exposes 'intent' parameter - Internal type (TModelConfigFnInternal) maintains intent parameter - Users can still optionally use intent parameter via type casting 3. Backward Compatibility: - Maintained compatibility for documented public variables (OPENAI_API_KEY, OPENAI_BASE_URL) - New variables take precedence, fallback to legacy names if not set - Only public documented variables are deprecated, internal variables renamed directly 4. 
Updated Files: - packages/shared/src/env/types.ts - Type definitions and constants - packages/shared/src/env/constants.ts - Config key mappings - packages/shared/src/env/decide-model-config.ts - Compatibility logic - packages/shared/src/env/model-config-manager.ts - Type casting implementation - packages/shared/src/env/init-debug.ts - Debug variable updates - All test files updated to use new variable names Testing: - All 24 model-config-manager tests passing - Overall test suite: 241 tests passing 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * Update packages/shared/src/env/constants.ts Co-authored-by: Copilot <[email protected]> * test(env): add comprehensive backward compatibility tests for OPENAI_* variables - Added test suite to verify MODEL_API_KEY/MODEL_BASE_URL take precedence - Added test to ensure OPENAI_API_KEY/OPENAI_BASE_URL still work as fallback - Fixed compatibility logic to prioritize new variables over legacy ones - All 13 tests passing, including 5 new backward compatibility tests Test coverage: ✓ Using only legacy variables (OPENAI_API_KEY) ✓ Using only new variables (MODEL_API_KEY) ✓ Mixing new and legacy variables (new takes precedence) ✓ Individual precedence for API_KEY and BASE_URL 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(test): reset MIDSCENE_CACHE in beforeEach to avoid .env interference The test 'should return the correct value from override' was failing because .env file sets MIDSCENE_CACHE=1. This was polluting the test environment and causing the test to expect false but receive true. Fixed by explicitly resetting MIDSCENE_CACHE to empty string in beforeEach. 
🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * docs(site): update environment variable names and add advanced configuration examples for agents --------- Co-authored-by: Claude <[email protected]> Co-authored-by: Copilot <[email protected]>
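The precedence rule described in the environment-variable refactor above (new `MODEL_*` names win, legacy `OPENAI_*` names are the fallback) can be sketched like this. `readWithLegacyFallback` is a hypothetical helper, and treating an empty string as "not set" is an assumption of this sketch rather than something the commit states.

```typescript
type EnvMap = Record<string, string | undefined>;

// Hypothetical helper: prefer the new variable name, fall back to the legacy one.
// Assumption: an empty string counts as unset.
function readWithLegacyFallback(
  env: EnvMap,
  newKey: string,
  legacyKey: string,
): string | undefined {
  const preferred = env[newKey];
  if (preferred !== undefined && preferred !== '') {
    return preferred;
  }
  return env[legacyKey];
}

const exampleEnv: EnvMap = {
  MODEL_API_KEY: 'new-key',
  OPENAI_API_KEY: 'legacy-key',
  OPENAI_BASE_URL: 'https://legacy.example.com/v1',
};

// API key comes from the new name; base URL falls back to the legacy name.
console.log(readWithLegacyFallback(exampleEnv, 'MODEL_API_KEY', 'OPENAI_API_KEY'));
console.log(readWithLegacyFallback(exampleEnv, 'MODEL_BASE_URL', 'OPENAI_BASE_URL'));
```

Resolving each variable independently is what allows the "mixing new and legacy variables" case from the test list to work.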

* refactor(core): remove tree info in uiContext * chore(core): fix lint * chore(core): remove dom-based locator * fix(core): test cases * chore(core): fix lint * fix(core): test cases

* feat(core): update signature of warp-openai * docs(site): update createOpenAIClient API documentation Update the documentation for createOpenAIClient to reflect the new signature: - Changed from factory function to wrapper function - Now receives base OpenAI instance and options - Returns Promise<OpenAI | undefined> - Updated examples to show async wrapper pattern - Removed unnecessary OpenAI import from examples 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: quanruzhuoxiu <[email protected]> Co-authored-by: Claude <[email protected]>

@deprecated

…nt variables (#1388)

Add backward compatibility support for legacy MIDSCENE_OPENAI_* environment variables:

- MIDSCENE_OPENAI_INIT_CONFIG_JSON (now MIDSCENE_MODEL_INIT_CONFIG_JSON)
- MIDSCENE_OPENAI_HTTP_PROXY (now MIDSCENE_MODEL_HTTP_PROXY)
- MIDSCENE_OPENAI_SOCKS_PROXY (now MIDSCENE_MODEL_SOCKS_PROXY)

Changes:

- Add deprecated constants to types.ts with @deprecated tags
- Add legacy variables to MODEL_ENV_KEYS for overrideAIConfig support
- Update DEFAULT_MODEL_CONFIG_KEYS_LEGACY to use legacy variable names
- Implement priority fallback logic in decide-model-config.ts (new variables take precedence)
- Update documentation (zh/en model-provider.mdx) with deprecation notices

All 139 tests pass, confirming backward compatibility works correctly.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

) * feat(android): add screenshot polling fallback for remote devices Implement automatic fallback to screenshot polling mode when connecting to remote Android devices (IP:Port format), since scrcpy cannot connect to remote adb devices. Changes: - Refactor ScreenshotViewer to shared component in @midscene/visualizer with function-based props - Add /api/screenshot endpoint in ScrcpyServer using adb screencap - Add device type detection to distinguish local vs remote devices - Conditionally render ScrcpyPlayer (real-time) for local devices or ScreenshotViewer (polling) for remote devices - Update playground app to use new shared ScreenshotViewer component 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(visualizer): import ExecutionTaskInsightLocate from types module Fix TypeScript build error by importing ExecutionTaskInsightLocate directly from @midscene/core/types instead of the main export. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(visualizer): define local ExecutionTaskInsightLocate interface Define ExecutionTaskInsightLocate as a local interface instead of importing from @midscene/core to resolve TypeScript build errors. This type is not properly exported from the core package's type declarations. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor(android): use PlaygroundServer screenshot API instead of duplicating in ScrcpyServer Remove duplicate screenshot implementation from ScrcpyServer and use the existing PlaygroundServer /screenshot endpoint which already calls AndroidDevice.screenshotBase64(). This eliminates code duplication and leverages the existing infrastructure. 
Changes: - Remove /api/screenshot endpoint from ScrcpyServer - Update App.tsx to call PlaygroundServer's /screenshot endpoint (port 9412) - Also use PlaygroundServer's /interface-info endpoint for consistency 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>

…nents (#1392) This change consolidates all PlaygroundSDK creation logic for report components into a single shared utility module. Changes: - Created `apps/report/src/utils/report-playground-utils.ts` with `getReportPlaygroundSDK(serviceMode, agent?)` function - Removed duplicate `getPlaygroundSDK` implementations from playground.tsx and playground/index.tsx - Updated open-in-playground/index.tsx to use the shared function - Removed unnecessary `createReportPlaygroundSDK` wrapper function - All report components now use `PLAYGROUND_SERVER_PORT` constant from shared package Benefits: - Single source of truth for PlaygroundSDK creation in report components - Static report files always connect to localhost:5800 - Reduced code duplication and improved maintainability 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-authored-by: Claude <[email protected]>

* refactor(core): rename Insight class to Service This is a comprehensive refactoring that renames the Insight class and all related types to Service for better semantic clarity. Changes: - Renamed directories: insight/ -> service/ - Renamed test files: insight.test.ts -> service.test.ts - Updated 50+ type definitions - Modified 18+ source files - Synchronized all test files - Updated external package dependencies Core updates: - Class: Insight -> Service - Interface: InsightOptions -> ServiceOptions - All InsightX types -> ServiceX types - String literal 'Insight' -> 'Service' Affected files: - src/index.ts, src/yaml.ts, src/task-runner.ts - src/agent/*.ts (agent, tasks, task-builder, ui-utils) - tests/utils.ts and all test files - External: chrome-extension, evaluation, report Verification: - TypeScript: 0 errors - Lint: 530 files passed - Build: successful (341.1 kB) 🤖 Generated with Claude Code Co-Authored-By: Claude <[email protected]> * fix(visualizer): update Insight references to Service - Updated ExecutionTaskInsightLocate to ExecutionTaskServiceLocate - Changed task.type check from 'Insight' to 'Service' - Renamed insightTask variable to serviceTask for consistency * fix(report): update Insight references to Service - Updated ExecutionTaskInsightLocate to ExecutionTaskServiceLocate in sidebar, detail-side, and detail-panel components - Changed task.type checks from 'Insight' to 'Service' - Updated ExecutionTaskInsightAssertion to ExecutionTaskServiceAssertion - Ensures report UI displays Service tasks correctly * chore(tests): update comments from Insight to Service * fix(tests): change task type from 'Insight' to 'Service' in tests - Updated aiaction-cacheable.test.ts - Updated page-task-executor-waitFor.test.ts - Completes the Insight to Service refactoring * fix(tests): update test expectations from 'Insight' to 'Service' - Updated task-builder.test.ts expectations - Updated page-task-executor-rightclick.test.ts expectations - Fixes CI test failures * 
refactor(core): use 'Insight' for ExecutionTask types Keep Service class name but restore ExecutionTask type to 'Insight' for consistency with UI display requirements. Changes: - ExecutionTaskType: 'Service' → 'Insight' - All ExecutionTaskService* types → ExecutionTaskInsight* - Runtime checks: task.type === 'Service' → task.type === 'Insight' - ui-utils.ts: Removed special handling for Query/Assert subtypes to display "Insight / Query" and "Insight / Assert" correctly Type display now follows the expected pattern: - Planning / Plan - Planning / Locate - Action Space / {interface} - Insight / Query - Insight / Assert - Insight / Locate Files modified: - packages/core/src/types.ts - packages/core/src/agent/*.ts - packages/core/src/task-runner.ts - packages/visualizer/src/utils/replay-scripts.ts - apps/report/src/components/**/*.tsx - All test files 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>

Fixed ambiguous descriptions about sequential vs parallel execution:

- Updated --files parameter description to clearly state that files execute sequentially by default (when --concurrent=1) and can run concurrently with --concurrent parameter
- Removed misleading "run in parallel" text from example that doesn't use --concurrent parameter

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

Add explicit error throwing for failed Assert tasks with detailed assertion failure messages including the AI's thought process.

This change brings the 1.0 branch in line with the main branch commit 4761a6c, ensuring that Assert tasks fail explicitly when the AI cannot verify the condition, rather than silently returning null values.

Changes:

- Add error throwing for failed Assert tasks in tasks.ts
- Update test to expect error instead of null output

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>

* feat(core): make aiActionContext as a param for planning * feat(core): make aiActionContext as a param for planning

* Initial plan * feat(core): add runAdbShell support to YAML schema - Added MidsceneYamlFlowItemRunAdbShell type definition in yaml.ts - Added runAdbShell flow item handler in player.ts - Added comprehensive unit tests for runAdbShell functionality - Added sample YAML fixture demonstrating runAdbShell usage - All tests pass, no breaking changes introduced Fixes issue where runAdbShell was not supported in YAML mode * refactor(core): replace hardcoded YAML actions with ActionSpace mechanism - Modified DeviceAction to support return values (Promise<any> instead of Promise<void>) - Enhanced Player to automatically detect locate-based vs non-locate actions - Moved runAdbShell and launch to AndroidDevice.actionSpace as platform-specific actions - Removed hardcoded runAdbShell handling from Player - Removed MidsceneYamlFlowItemRunAdbShell type definition - Updated tests to use ActionSpace approach with proper paramSchema Benefits: - No need to modify core code when adding new platform-specific methods - Type-safe through Zod schema validation - Automatic action discovery via ActionSpace - YAML syntax remains backward compatible 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * feat(ios): add platform-specific actions for WebDriverAgent API requests * feat(core): enhance runWdaRequest support in ScriptPlayer with parameter handling * feat(docs): add platform-specific actions for Android and iOS in YAML scripts * test(core): update YAML player test snapshots for ActionSpace Updated test snapshots to reflect the new parameter structure after ActionSpace refactoring. The changes align with how locate parameters are now built and passed to actions. 
* refactor(tests): move platform-specific YAML tests to respective packages - Move yaml-runAdbShell.test.ts from core to packages/android - Move yaml-runWdaRequest.test.ts from core to packages/ios - Update imports to use @midscene/core and @midscene/core/yaml - Add zod as devDependency to android and ios packages - Remove unused runAdbShell-test.yaml fixture file - All tests passing (79 tests in android, 82 tests in ios) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(core): add delayAfterRunner property to DeviceAction interface Add missing delayAfterRunner optional property to DeviceAction interface to fix TypeScript compilation error in task-builder.ts 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * feat(tests): add device info checks for Android and iOS in YAML scripts * feat(docs): update Android and iOS integration guides to include YAML script usage and new agent methods * feat(android, ios): enhance type safety for action parameters and responses in agents and devices * feat(tests): add environment variable setup for IOSAgent model configuration * feat(android, ios): refactor platform-specific action definitions to use createPlatformActions for better maintainability * refactor(android, ios): consolidate platform-specific action definitions for improved maintainability * feat(core): update design pattern for action wrapper * feat(core): update design pattern for action wrapper * feat(core): update design pattern for action wrapper * feat(core): update design pattern for action wrapper * feat(ios, android): refactor action methods to use WrappedAction type for improved type safety * feat(tests): enhance locate parameter structure for RightClick and Tap actions in player tests * feat(core): refine Zod schema handling in descriptionForAction for improved type extraction --------- Co-authored-by: copilot-swe-agent[bot] <[email protected]> 
Co-authored-by: quanruzhuoxiu <[email protected]> Co-authored-by: Claude <[email protected]> Co-authored-by: yutao <[email protected]>

* refactor(core): change Locate task from Insight to Planning type This change reclassifies the Locate task from Insight type to Planning type for better semantic alignment. The UI will now display "Planning/Locate" instead of "Insight/Locate". Changes: - Added new ExecutionTaskPlanningLocate types in core/types.ts - Updated task-builder.ts to create Planning type Locate tasks - Modified task-runner.ts to handle Locate in Planning branch - Updated ui-utils.ts to process Planning Locate tasks - Fixed all type references in visualizer and report components - Updated unit tests to reflect the new type structure 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * feat(core): update types of action * refactor(core): simplify ExecutionTaskType definition * refactor(tests): update task type from Insight to Planning in task assertions --------- Co-authored-by: Claude <[email protected]> Co-authored-by: yutao <[email protected]>

* chore(core): refine error processing of agent * fix(core): error processing of aiWaitFor * chore(core): update error processing * chore(core): update error processing * chore(core): fix lint * feat(core): update executor callback * chore(core): clean up imports and simplify method signature in TaskRunner * fix(core): test cases * fix(core): ci * fix(core): report style --------- Co-authored-by: quanruzhuoxiu <[email protected]>

packages/visualizer/src/utils/index.ts

@@ -4,6 +4,10 @@
  return timestamp ? dayjs(timestamp).format('YYYY-MM-DD HH:mm:ss') : '-';
 }

+export function fullTimeStrWithMilliseconds(timestamp?: number) {
+  return timestamp ? dayjs(timestamp).format('YYYY-MM-DD HH:mm:ss.SSS') : '-';


netlify · 2025-11-04T07:24:28Z

✅ Deploy Preview for midscene ready!

Name	Link
🔨 Latest commit	`43e3316`
🔍 Latest deploy log	https://app.netlify.com/projects/midscene/deploys/69131ced54e14a0008890e41
😎 Deploy Preview	https://deploy-preview-1421--midscene.netlify.app


CLAassistant · 2025-11-04T07:24:29Z

Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you all sign our Contributor License Agreement before we can accept your contribution.
3 out of 4 committers have signed the CLA.

✅ quanru
✅ yuyutaotao
✅ EAGzzyCSL
❌ Copilot

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

chatgpt-codex-connector · 2025-11-04T07:30:15Z

packages/core/src/agent/tasks.ts

        (!result && `No result from assertion: ${textPrompt}`) ||
        `unknown error when waiting for assertion: ${textPrompt}`;
      const now = Date.now();
-      if (now - startTime < checkIntervalMs) {
-        const timeRemaining = checkIntervalMs - (now - startTime);
-        const sleepTask = await this.taskForSleep(timeRemaining, modelConfig);
-        await taskExecutor.append(sleepTask);
+      if (now - currentCheckStart < checkIntervalMs) {
+        const timeRemaining = checkIntervalMs - (now - currentCheckStart);
+        const sleepTask = this.taskBuilder.createSleepTask({
+          timeMs: timeRemaining,
+        });
+        await session.append(sleepTask);
      }
    }


waitFor may overshoot timeout by one polling cycle

The new waitFor loop breaks based on lastCheckStart - overallStartTime <= timeoutMs, where lastCheckStart is updated only at the beginning of each iteration. If a check plus its sleep already runs past timeoutMs, the while condition still passes because it compares the previous iteration’s start time, so another full sleep+query cycle executes before timing out. The previous implementation used Date.now() - overallStartTime directly and exited immediately once the wall‑clock exceeded timeoutMs. This regression can delay raising the timeout by up to checkIntervalMs plus query latency. Consider computing the loop condition against the current time instead of the previous start time.
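The suggestion above (evaluate the loop condition against the current time rather than the previous iteration's start time) can be sketched in isolation. This is not the code in tasks.ts: `waitForCondition`, its options, and the injectable `now`/`sleep` hooks are hypothetical and exist only so the timing behaviour can be exercised without real delays.

```typescript
interface WaitForOptions {
  timeoutMs: number;
  checkIntervalMs: number;
  now?: () => number; // injectable clock (assumption, for testing)
  sleep?: (ms: number) => Promise<void>; // injectable sleep (assumption)
}

// Polls `check` until it returns true or the timeout elapses.
// The while condition reads the clock on every pass, so a check plus its
// sleep that already ran past timeoutMs cannot trigger one more cycle.
async function waitForCondition(
  check: () => Promise<boolean>,
  options: WaitForOptions,
): Promise<boolean> {
  const now = options.now ?? Date.now;
  const sleep =
    options.sleep ??
    ((ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms)));

  const overallStartTime = now();
  while (now() - overallStartTime <= options.timeoutMs) {
    const currentCheckStart = now();
    if (await check()) {
      return true;
    }
    // Sleep only for the part of the interval the check did not consume.
    const elapsed = now() - currentCheckStart;
    if (elapsed < options.checkIntervalMs) {
      await sleep(options.checkIntervalMs - elapsed);
    }
  }
  return false;
}
```

With a 100 ms timeout and a 60 ms interval, this version runs two checks (at 0 ms and 60 ms) and stops once the clock reads 120 ms; a condition based on the previous check's start time (60 ms) would still pass and run a third cycle.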


* feat(android,ios): expose mobile system navigation actions * chore(core): fix lint

…1416) * fix(android): correct orientation handling for displayId screenshots When `displayId` is set with `alwaysRefreshScreenInfo: true`, screenshots captured in landscape are misreported as portrait, causing dimension mismatches between the actual screenshot and reported size. Root cause: Screen dimensions from `dumpsys display` (used with `displayId`) are already in current orientation, but the code was unconditionally swapping them as if they were in native orientation like `wm size` output. Changes: - Add `isCurrentOrientation` flag to `getScreenSize()` return type to distinguish dimension orientation context - Set flag `true` for `dumpsys display` paths (displayId lookups), `false` for `wm size` fallback - Conditionally swap dimensions in `size()` only when `isCurrentOrientation !== true` and device is landscape This ensures screenshot dimensions match reported size regardless of device rotation and whether displayId is configured. Based on PR #1416 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix(visualizer): handle expected animation cancellation errors in Player component * feat(tests): enhance IOSDevice tests with additional mock methods for app termination and URL handling --------- Co-authored-by: quanruzhuoxiu <[email protected]> Co-authored-by: Claude <[email protected]>
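The orientation fix above comes down to one condition: swap width and height only when the device is in landscape and the reported dimensions are not already in the current orientation. A minimal sketch, assuming a hypothetical `resolveReportedSize` function and a `RawScreenSize` shape modelled on the `isCurrentOrientation` flag named in the commit:

```typescript
// Shape modelled on the commit description; field names beyond
// isCurrentOrientation are assumptions of this sketch.
interface RawScreenSize {
  width: number;
  height: number;
  isCurrentOrientation?: boolean;
}

// Returns the size to report for screenshots.
// `dumpsys display` values (isCurrentOrientation === true) are used as-is;
// `wm size` values are in native orientation and are swapped in landscape.
function resolveReportedSize(
  raw: RawScreenSize,
  isLandscape: boolean,
): { width: number; height: number } {
  const shouldSwap = isLandscape && raw.isCurrentOrientation !== true;
  return shouldSwap
    ? { width: raw.height, height: raw.width }
    : { width: raw.width, height: raw.height };
}

// Native-orientation input on a landscape device gets swapped.
console.log(resolveReportedSize({ width: 1080, height: 2400 }, true));
```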

* docs(site): remove unreleased model env names * chore(core): fix lint

Implemented fallback logic to support WebDriverAgent 5.x through 7.x:

- tap(): Tries new endpoint (WDA 6.0+) first, falls back to legacy endpoint (WDA 5.x)
- getScreenScale(): Tries /wda/screen endpoint first, calculates from screenshot if unavailable

This implementation follows Python facebook-wda's compatibility approach with try-catch fallback strategy.

Changes:

- Enhanced tap() with dual-endpoint support (new: /wda/tap, legacy: /wda/tap/0)
- Enhanced getScreenScale() with calculation fallback using screenshot dimensions
- Added comprehensive unit tests covering all fallback scenarios
- All comments in English for consistency

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
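The try-catch fallback for tap() described above might look roughly like this. The two paths are the ones named in the commit message; everything else (the `WdaPost` transport type, `tapWithFallback`, and returning the path that succeeded) is a hypothetical shape chosen for this sketch, and a real client would also need session-scoped URLs and error classification.

```typescript
// Hypothetical transport: posts a JSON body to a WDA path and rejects on failure.
type WdaPost = (path: string, body: { x: number; y: number }) => Promise<unknown>;

// Tries the newer endpoint first and falls back to the legacy one if it fails.
// Returns the path that succeeded so callers or tests can see which was used.
async function tapWithFallback(post: WdaPost, x: number, y: number): Promise<string> {
  const newEndpoint = '/wda/tap'; // WDA 6.0+ per the commit message
  const legacyEndpoint = '/wda/tap/0'; // WDA 5.x per the commit message
  try {
    await post(newEndpoint, { x, y });
    return newEndpoint;
  } catch (_newEndpointError) {
    // Assumption: any failure of the new endpoint triggers the fallback.
    await post(legacyEndpoint, { x, y });
    return legacyEndpoint;
  }
}
```

If the legacy endpoint also fails, its error propagates to the caller, so a genuine tap failure is not swallowed by the compatibility layer.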

* docs(core): add docs for system buttons * docs(core): merge mcp docs * docs(core): update toc * docs(core): update docs for cli * docs(core): update docs * docs(core): add docs * docs(core): fix all dead links * chore(core): merge latest code

…1430)

Remove z.void() from Android and iOS navigation actions and make paramSchema optional in defineAction. This fixes a breaking change where parseActionParam converted undefined to {}, causing z.void() validation to fail.

Changes:
- Updated defineAction to allow optional paramSchema
- Removed paramSchema from Android navigation actions (AndroidBackButton, AndroidHomeButton, AndroidRecentAppsButton)
- Removed paramSchema from iOS navigation actions (IOSHomeButton, IOSAppSwitcher)
- Updated parseActionParam to return undefined when no schema provided
- Added tests for actions without paramSchema

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
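A minimal sketch of the behaviour described above, assuming a simplified stand-in for the schema type so the example stays dependency-free (the real code uses zod). The `Schema` and `ActionDef` interfaces are hypothetical; only the rule "return undefined when no schema is provided" and the undefined-to-`{}` conversion come from the commit message.

```typescript
// Minimal stand-in for a validation schema (hypothetical; the real code uses zod).
interface Schema<T> {
  parse(input: unknown): T;
}

// Hypothetical action definition with an optional paramSchema.
interface ActionDef<T> {
  name: string;
  paramSchema?: Schema<T>;
}

function parseActionParam<T>(raw: unknown, schema?: Schema<T>): T | undefined {
  // No schema: the action takes no parameters, so there is nothing to validate.
  if (!schema) return undefined;
  // With a schema, a missing param is normalised to {} before validation.
  return schema.parse(raw ?? {});
}

// A navigation action without parameters simply omits paramSchema.
const backButton: ActionDef<never> = { name: "AndroidBackButton" };

// An action with parameters keeps its schema (illustrative schema only).
const pointSchema: Schema<{ x: number }> = {
  parse(input) {
    const x = (input as { x?: unknown }).x;
    if (typeof x !== "number") throw new Error("x must be a number");
    return { x };
  },
};

console.log(parseActionParam(undefined, backButton.paramSchema));
console.log(parseActionParam({ x: 3 }, pointSchema));
```

Omitting the schema avoids the original failure mode, where a void-only schema rejected the `{}` that the parser had substituted for `undefined`.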

* chore(core): refine error processing of agent * chore(core): fix lint * chore(core): fix lint

…cts (#1433) * fix(blackboard): update highlight effect with glow filter and pulsing animation * feat(detail-side): enhance element rendering with detailed info and nested structure support * feat(detail-side): add YAML content rendering with special handling for cache context

* feat(shared): unify VQA and grounding models into insight model

  Unified MIDSCENE_VQA_MODEL_* and MIDSCENE_GROUNDING_MODEL_* environment variables into a single MIDSCENE_INSIGHT_MODEL_* configuration.

  Changes:
  - Updated type definitions to use 'insight' intent instead of 'VQA' and 'grounding'
  - Unified 12 environment variables into 6 INSIGHT variables
  - Updated all agent code to use 'insight' intent
  - Fixed all test cases (140/140 passing)
  - Added comprehensive documentation for intent-based model configuration
  - Fixed duplicate case clause warnings in test files

  Breaking changes:
  - Replaced TIntent type: 'VQA' | 'grounding' -> 'insight'
  - Environment variables MIDSCENE_VQA_MODEL_* and MIDSCENE_GROUNDING_MODEL_* are no longer supported

  Documentation updates:
  - Added detailed intent-based configuration guide in model-provider.mdx (EN/ZH)
  - Updated API documentation with modelConfig examples (EN/ZH)
  - Updated choose-a-model.mdx with task type configuration section (EN/ZH)

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* Update packages/shared/tests/unit-test/env/modle-config-manager.test.ts

  Co-authored-by: Copilot <[email protected]>

* Update packages/shared/tests/unit-test/env/modle-config-manager.test.ts

  Co-authored-by: Copilot <[email protected]>

* fix(tests): remove unnecessary blank line in ModelConfigManager test
* fix(docs): update advanced configuration parameters in API reference

---------

Co-authored-by: Claude <[email protected]>
Co-authored-by: Copilot <[email protected]>
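Because the legacy prefixes are no longer supported, a project upgrading to this change may want to detect leftover variables. The helper below is a hypothetical sketch, not part of Midscene: it assumes the suffix after the prefix carries over unchanged to the `MIDSCENE_INSIGHT_MODEL_` prefix, which the commit message does not confirm, and the `EXAMPLE` suffix in the usage lines is a placeholder rather than a real variable name.

```typescript
// Prefixes named in the commit message.
const LEGACY_PREFIXES = ["MIDSCENE_VQA_MODEL_", "MIDSCENE_GROUNDING_MODEL_"];
const INSIGHT_PREFIX = "MIDSCENE_INSIGHT_MODEL_";

interface LegacyHit {
  legacy: string;
  suggested: string; // assumption: same suffix under the new prefix
}

// Hypothetical helper: list environment variables that still use a legacy prefix.
function findLegacyModelEnv(env: Record<string, string | undefined>): LegacyHit[] {
  const hits: LegacyHit[] = [];
  for (const key of Object.keys(env)) {
    const prefix = LEGACY_PREFIXES.find((p) => key.startsWith(p));
    if (prefix) {
      hits.push({ legacy: key, suggested: INSIGHT_PREFIX + key.slice(prefix.length) });
    }
  }
  return hits;
}

// Placeholder variable names for illustration only.
const sample = { MIDSCENE_VQA_MODEL_EXAMPLE: "a", MIDSCENE_GROUNDING_MODEL_EXAMPLE: "b", PATH: "/usr/bin" };
for (const hit of findLegacyModelEnv(sample)) {
  console.warn(`${hit.legacy} is no longer supported; consider ${hit.suggested}`);
}
```

Note that both legacy prefixes map to the same suggested name here, so a project that set different VQA and grounding values would have to pick one.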

* feat(report): add comprehensive dark mode support

  - Add dark mode toggle with URL query parameter support (?darkMode=true/false)
  - Implement theme switching with localStorage persistence
  - Update all components with dark mode color schemes:
    * Report UI components (sidebar, detail panels, timeline)
    * Visualizer components (player, blackboard, screenshot viewer)
  - Fix all SVG icons to use currentColor for theme compatibility
  - Add dynamic logo switching between light/dark variants
  - Add custom theme toggle button with sun/moon icons
  - Ensure PIXI.js canvas adapts to theme changes in timeline
  - Fix checkbox visibility in dark mode
  - Add dark mode styles for all text, backgrounds, and borders

  The dark mode can be controlled via:
  1. Toggle button in the navigation bar
  2. URL parameter: ?darkMode=true or ?darkMode=false
  3. Preference is saved in localStorage

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* fix(visualizer): improve theme detection code formatting

---------

Co-authored-by: Claude <[email protected]>
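One way the URL parameter and the stored preference described above could interact is sketched below. The storage key, the `KeyValueStore` interface, and the precedence (URL parameter wins and is then persisted) are assumptions for illustration; the commit message only states that both mechanisms exist.

```typescript
// Hypothetical storage key; the key used by the report is not stated in the commit.
const STORAGE_KEY = "midscene-dark-mode";

// Subset of the Web Storage API, so localStorage or an in-memory stub both fit.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumed precedence: an explicit ?darkMode=true/false overrides and updates
// the stored preference; otherwise the stored preference is used.
function resolveDarkMode(search: string, store: KeyValueStore): boolean {
  const param = new URLSearchParams(search).get("darkMode");
  if (param === "true" || param === "false") {
    store.setItem(STORAGE_KEY, param);
    return param === "true";
  }
  return store.getItem(STORAGE_KEY) === "true";
}
```

In a browser this would be called as `resolveDarkMode(window.location.search, window.localStorage)`; passing the store in keeps the function testable outside the browser.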

* feat(report): improve dark mode UI styling

  - Remove table header background in both light and dark modes for cleaner appearance
  - Add proper dark mode text color for screenshot item titles
  - Adjust detail side panel padding for better spacing
  - Ensure consistent transparent backgrounds across table headers

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* feat(report): add dark mode text color for detail-content

  - Set detail-content text color to rgba(255, 255, 255, 0.85) in dark mode for better readability

  🤖 Generated with [Claude Code](https://claude.com/claude-code)

  Co-Authored-By: Claude <[email protected]>

* fix(visualizer): prompt input
* refactor(visualizer): useTheme hook

---------

Co-authored-by: Claude <[email protected]>

* feat(core): update style for planning param * chore(core): fix report * fix(core): screenshot panel

This commit introduces automatic agent destruction after each MCP tool execution to prevent connection leaks and ensure fresh agent instances for each call.

Key changes:
- Added `toolWithAutoDestroy` wrapper method that automatically destroys agents after tool execution completes
- Refactored all 14 agent-using tools to use the new wrapper:
  * Android tools: connect, launch, back, home
  * Browser tools: navigate, get_tabs, set_active_tab, aiHover
  * Common tools: aiWaitFor, aiAssert, aiKeyboardPress, screenshot, aiTap, aiScroll, aiInput
- Handler code remains unchanged - only registration method differs
- Tools that don't use agents (list_devices, get_console_logs, get_screenshot) remain unchanged

Benefits:
- Prevents Chrome Extension bridge disconnection issues
- Ensures clean state for each tool call
- No code changes needed in tool handlers
- Elegant and maintainable solution

Fixes the issue where bridge connections would disconnect after first use, requiring manual agent destruction between calls.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude <[email protected]>
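The auto-destroy idea described above can be illustrated with a small wrapper built on `try`/`finally`. This is a sketch under assumptions, not the actual `toolWithAutoDestroy` method: the `Agent` interface, the `withAutoDestroy` name, and the choice to pass the agent into the handler are all hypothetical.

```typescript
// Hypothetical minimal agent contract.
interface Agent {
  destroy(): Promise<void>;
}

// Wrap a tool handler so a fresh agent is created per call and always destroyed,
// whether the handler resolves or throws.
function withAutoDestroy<A extends Agent, Args extends unknown[], R>(
  createAgent: () => Promise<A>,
  handler: (agent: A, ...args: Args) => Promise<R>,
): (...args: Args) => Promise<R> {
  return async (...args) => {
    const agent = await createAgent();
    try {
      return await handler(agent, ...args);
    } finally {
      // Runs on success and on failure, so no connection is left open.
      await agent.destroy();
    }
  };
}
```

Placing the cleanup in `finally` is what makes the guarantee hold on the error path as well, which matters for the bridge disconnection problem the commit describes.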

Copilot AI and others added 30 commits October 17, 2025 17:12

chore(core): remove warning msg for gpt-4 (#1331)

2a98471

* chore(core): remove warning msg for gpt-4 * chore(core): remove dom-based locator

feat(core): update recorder (#1330)

23c49d3

* chore(core): refine recorder loop * feat(core): update implementation of recorder

chore(core): update tasks implementation (#1338)

c9b385b

* chore(core): update implementation of insight * chore(core): refine error plan * chore(core): refine error plan * chore(core): split tasks into multiple parts * fix(core): fix ci

refine(core): use 'subTask' flag to reuse context (#1350)

dc60bc3

* chore(core): update types of task executor * chore(core): update sleep tasks * chore(core): update types for planning * feat(core): update subTask flag

refactor(core): remove tree in context (#1376)

57cd24a

* refactor(core): remove tree info in uiContext * chore(core): fix lint * chore(core): remove dom-based locator * fix(core): test cases * chore(core): fix lint * fix(core): test cases

yuyutaotao and others added 6 commits November 3, 2025 12:57

fix(core): action context as param (#1415)

9bffc83

* feat(core): make aiActionContext as a param for planning * feat(core): make aiActionContext as a param for planning

feat(core): show intent in report (#1407)

44589ae

feat(core): update timeout strategy of aiWaitFor (#1419)

706f70c

github-advanced-security bot found potential problems Nov 4, 2025

yuyutaotao marked this pull request as draft November 4, 2025 07:24

chatgpt-codex-connector bot reviewed Nov 4, 2025

yuyutaotao and others added 19 commits November 4, 2025 17:54

feat(android,ios): expose mobile system navigation actions (#1420)

94f4f74

* feat(android,ios): expose mobile system navigation actions * chore(core): fix lint

docs(site): remove unreleased model env names (#1427)

b386a5e

* docs(site): remove unreleased model env names * chore(core): fix lint

fix(visualizer): cursor not move in player (#1429)

03485d6

docs(core): docs for 1.0 (#1423)

8378dae

* docs(core): add docs for system buttons * docs(core): merge mcp docs * docs(core): update toc * docs(core): update docs for cli * docs(core): update docs * docs(core): add docs * docs(core): fix all dead links * chore(core): merge latest code

fix: workflow of planning (#1431)

341bdb8

* chore(core): refine error processing of agent * chore(core): fix lint * chore(core): fix lint

feat(report): replace sidebar grid layout with antd Table (#1436)

aa30ba8

feat(core): update sidebar ui (#1437)

1228a7d

fix(core): replay scripts (#1440)

d1104a0

feat(core): show markup in screenshot panel (#1444)

2792369

* feat(core): update style for planning param * chore(core): fix report * fix(core): screenshot panel

feat(core): redefine scroll param (#1441)

6fcc788

feat(core): redefine the ai shortcut (#1445)

43e3316

Conversation

yuyutaotao commented Nov 4, 2025

Check failure

netlify bot commented Nov 4, 2025 • edited

✅ Deploy Preview for midscene ready!

CLAassistant commented Nov 4, 2025 • edited

chatgpt-codex-connector bot left a comment

💡 Codex Review

chatgpt-codex-connector bot Nov 4, 2025

5 participants
