fix #1563 (cdp): resolve page leaks and race conditions in concurrent… #1592
#1563 Fix memory leaks and race conditions in CDP managed browser crawling
Fix memory leaks and race conditions when using arun_many() with managed CDP browsers. Each crawl now gets proper page isolation with automatic cleanup while maintaining shared browser context.
Key fixes:
This ensures stable parallel crawling without memory growth or browser instability.
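As a rough illustration of the pattern described above (one isolated page per crawl, automatic cleanup, one shared context), here is a minimal sketch. It is not the code from this PR: FakeContext, FakePage, isolated_page and crawl_many are hypothetical stand-ins, and the asyncio.Lock around page creation is an assumption about how the creation step could be serialized.

```python
import asyncio
from contextlib import asynccontextmanager


class FakePage:
    """Hypothetical stand-in for a browser tab (not a Playwright object)."""

    def __init__(self, context, page_id):
        self._context = context
        self.page_id = page_id

    async def close(self):
        # Closing a tab removes it from the shared context.
        self._context.pages.remove(self)


class FakeContext:
    """Hypothetical stand-in for a shared browser context that owns tabs."""

    def __init__(self):
        self.pages = []
        self._next_id = 0

    async def new_page(self):
        await asyncio.sleep(0)  # yield control, as a real CDP round trip would
        self._next_id += 1
        page = FakePage(self, self._next_id)
        self.pages.append(page)
        return page


@asynccontextmanager
async def isolated_page(context, lock):
    # Assumption: tab creation is serialized so concurrent crawls do not race.
    async with lock:
        page = await context.new_page()
    try:
        yield page
    finally:
        # The tab is closed even if the crawl raised, so nothing leaks.
        await page.close()


async def crawl(context, lock, url):
    async with isolated_page(context, lock) as page:
        await asyncio.sleep(0)  # placeholder for navigation and extraction
        return url, page.page_id


async def crawl_many(urls):
    # One shared context and one lock for all concurrent crawls.
    context, lock = FakeContext(), asyncio.Lock()
    results = await asyncio.gather(*(crawl(context, lock, u) for u in urls))
    return results, len(context.pages)
```

Calling asyncio.run(crawl_many([...])) returns the per-URL results together with the number of tabs still open in the shared context, which should be zero once every crawl has finished.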
Summary
Fixes #1563
This PR resolves critical memory leaks and race conditions that occurred when using arun_many() with managed CDP browsers. The main issues were:
The fix ensures that:
arun_many() gets its own isolated page/tab
List of files changed and why
crawl4ai/async_crawler_strategy.py - Updated page cleanup logic to properly close pages after crawling when using non-managed browsers, while preserving session pages for authentication persistence
crawl4ai/browser_manager.py - Added thread-safe page creation with locks to prevent race conditions, and improved page lifecycle management to distinguish between managed and non-managed browser contexts
docs/md_v2/advanced/cdp-browser-crawling.md - Added comprehensive documentation for CDP browser crawling, including setup instructions, usage examples, and best practices for managed browser workflows
tests/test_arun_many_cdp.py - Created new test suite with both parallel and sequential test cases to verify proper page isolation and cleanup in arun_many() operations with managed CDP browsers
How Has This Been Tested?
The changes have been tested with:
Unit Tests: Created tests/test_arun_many_cdp.py with two test scenarios:
test_arun_many_with_cdp(): Tests parallel crawling of 3 URLs to verify proper page isolation
test_arun_many_with_cdp_sequential(): Tests sequential crawling to isolate potential issues
Manual Testing:
localhost:9222
arun_many() operations to confirm tabs are created and cleaned up properly
Test Requirements: Tests require a running CDP browser instance (can be started with crwl cdp -d 9222)
All tests pass successfully, confirming that memory leaks and race conditions are resolved.
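The parallel and sequential scenarios described above could take roughly the following shape. This is only a sketch and not the contents of tests/test_arun_many_cdp.py: StubCrawler, its methods, the URLS list and the tab counters are hypothetical, and the stub replaces the running CDP browser so the example runs on its own.

```python
import asyncio


class StubCrawler:
    """Hypothetical stand-in for a crawler attached to a CDP browser."""

    def __init__(self):
        self.open_tabs = 0
        self.peak_tabs = 0

    async def crawl(self, url):
        self.open_tabs += 1  # a tab is opened for this URL
        self.peak_tabs = max(self.peak_tabs, self.open_tabs)
        try:
            await asyncio.sleep(0)  # placeholder for the real page load
            return {"url": url, "success": True}
        finally:
            self.open_tabs -= 1  # the tab is closed when the crawl ends

    async def crawl_many(self, urls):
        # Run all crawls concurrently, one tab each.
        return await asyncio.gather(*(self.crawl(u) for u in urls))


URLS = [
    "https://example.com/1",
    "https://example.com/2",
    "https://example.com/3",
]


async def check_parallel():
    crawler = StubCrawler()
    results = await crawler.crawl_many(URLS)
    assert all(r["success"] for r in results)
    assert crawler.open_tabs == 0  # no tab left behind after the batch
    return crawler.peak_tabs


async def check_sequential():
    crawler = StubCrawler()
    for url in URLS:
        result = await crawler.crawl(url)
        assert result["success"]
        assert crawler.open_tabs == 0  # each tab is closed before the next URL
    return crawler.peak_tabs
```

Against a real browser, the equivalent check would compare the number of open tabs before and after the batch; the sequential variant helps separate cleanup problems from concurrency problems.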
Checklist: