-
Notifications
You must be signed in to change notification settings - Fork 3.2k
fix: fixing exclude region inclusion #43602
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes a bug where customer-provided excluded regions were not being honored during transient failures when the SDK calls the global endpoint, which could point to an excluded region. It also adds logging enhancements to provide proper context when regions are marked unavailable.
Key changes:
- Modified region selection logic to properly respect both user-excluded and circuit-breaker-excluded locations
- Added context parameters to endpoint unavailability tracking methods for better diagnostics
- Updated retry policy to calculate retries based on applicable (non-excluded) regions
Reviewed Changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| test_service_request_retry_policy_async.py | New async test verifying write failover respects excluded regions during service request errors |
| test_service_request_retry_policy.py | New sync test verifying write failover respects excluded regions during service request errors |
| test_location_cache.py | Added tests for excluded region handling, regional fallback behavior, and global endpoint fallback scenarios |
| _global_endpoint_manager_async.py | Added context parameter to endpoint unavailability tracking methods and their call sites |
| _service_request_retry_policy.py | Updated retry count calculation to use applicable (non-excluded) regional routing contexts and added context to mark_endpoint_unavailable calls |
| _location_cache.py | Refactored region filtering logic to properly separate user-excluded from circuit-breaker-excluded locations and respect exclusions in global endpoint resolution |
| _global_endpoint_manager.py | Added context parameter to endpoint unavailability tracking methods and their call sites |
| _endpoint_discovery_retry_policy.py | Added context parameter when marking endpoints unavailable |
sdk/cosmos/azure-cosmos/tests/test_service_request_retry_policy_async.py
Outdated
Show resolved
Hide resolved
sdk/cosmos/azure-cosmos/tests/test_service_request_retry_policy.py
Outdated
Show resolved
Hide resolved
| return write_regional_routing_context.get_primary() | ||
| if self.connection_policy.EnableEndpointDiscovery: | ||
| # Get the list of applicable write locations, which respects excluded locations. | ||
| applicable_contexts = self._get_applicable_write_regional_routing_contexts(request) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will break 403.3 scenarios where we want to ignore preferred regions. I think would be better to add the needed routing logic in session retry policy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tomas, I refactored the code. Do you think it will break scenarios where want to ignore preferred regions?
sdk/cosmos/azure-cosmos/tests/test_service_request_retry_policy_async.py
Outdated
Show resolved
Hide resolved
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
|
/azp run python - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
This PR addresses scenarios where customer provided excluded region was not being honored during certain transient failures as the SDK ends up calling global endpoint which can point to a region that is part of excluded region list.
It also adds logging enhancements to ensure when a region is marked unavailable we have the proper context.