-
Notifications
You must be signed in to change notification settings - Fork 21
Request level retry handler #991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: retry-unification
Are you sure you want to change the base?
Request level retry handler #991
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add more details PR description.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed till databrickshttpclient.
THRIFT_OPEN_SESSION, | ||
THRIFT_CLOSE_SESSION, | ||
THRIFT_METADATA, | ||
THRIFT_CLOSE_OPERATION, | ||
THRIFT_CANCEL_OPERATION, | ||
THRIFT_EXECUTE_STATEMENT, | ||
THRIFT_FETCH_RESULTS, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need the THRIFT_
prefix here?
src/main/java/com/databricks/jdbc/common/safe/DatabricksDriverFeatureFlagsContext.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/IDatabricksHttpClient.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add logging for retries. follow this section from design doc
https://docs.google.com/document/d/1WI-FGrcBdv34_r4LNH--BneN1HB1AKyXsHkhFV70AHo/edit?tab=t.0#heading=h.2mcjtearjzw7
src/main/java/com/databricks/jdbc/dbclient/impl/http/DatabricksHttpClient.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/DatabricksHttpClient.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/IdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/IdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/NonIdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/RetryHandlingHelperFunctions.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/RetryHandlingHelperFunctions.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/IdempotentRetryStrategy.java
Show resolved
Hide resolved
if (accumulatedTimeTempUnavailable <= 0 || accumulatedTimeRateLimit <= 0) { | ||
return response; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will still recommend keeping timestamps for request start and current timestamp and using that for checking timeout. Current approach is also correct. will leave this decision to you
src/main/java/com/databricks/jdbc/common/DatabricksJdbcUrlParams.java
Outdated
Show resolved
Hide resolved
THRIFT_FETCH_RESULTS(RequestRetryability.NON_IDEMPOTENT), | ||
CLOUD_FETCH(RequestRetryability.IDEMPOTENT), | ||
VOLUME_LIST(RequestRetryability.IDEMPOTENT), | ||
VOLUME_SHOW_VOLUMES(RequestRetryability.IDEMPOTENT), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VOLUME_SHOW_VOLUMES -> what does this mean? how does it differ from LIST?
VOLUME_LIST(RequestRetryability.IDEMPOTENT), | ||
VOLUME_SHOW_VOLUMES(RequestRetryability.IDEMPOTENT), | ||
VOLUME_GET(RequestRetryability.IDEMPOTENT), | ||
VOLUME_PUT(RequestRetryability.NON_IDEMPOTENT), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens if we retry a PUT volume request? I would assume we simply overwrite the existing file, is that not the case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we do overwrite the existing file.
src/main/java/com/databricks/jdbc/common/RequestRetryability.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/NonIdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/NonIdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/RetryHandlingHelperFunctions.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/RetryHandlingHelperFunctions.java
Show resolved
Hide resolved
while (cause != null) { | ||
if (cause instanceof DatabricksRetryHandlerException) { | ||
throw new DatabricksHttpException( | ||
cause.getMessage(), cause, DatabricksDriverErrorCode.INVALID_STATE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why INVALID_STATE?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its the current behaviour of the driver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these are severe bugs which will break retry behaviour, also signals that the testing in this PR is not adequate, please take a deeper look
src/main/java/com/databricks/jdbc/common/RequestRetryability.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
int statusCode = response.getStatusLine().getStatusCode(); | ||
|
||
// Get retry delay from strategy | ||
retryDelayMillis = strategy.retryRequestAfter(response, attempt, connectionContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is all very confusing, we calculate exponential backoff first then always overwrite it here. why did we even calculate the exponential backoff at line 47?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is because if httpclient.execute(request) throws an exception, exponential backoff is followed rather than retrying it again immediately.
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/RetryHandlingHelperFunctions.java
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR introduces request-level retry handling to the Databricks JDBC driver by implementing a strategy pattern for HTTP request retries based on request type. The retry logic differentiates between idempotent and non-idempotent requests, with configurable timeout and retry limits.
- Replaced direct
execute
calls withexecuteWithRetry
throughout the codebase - Added request type classification with appropriate retry strategies
- Introduced comprehensive unit tests for retry strategies and handlers
Reviewed Changes
Copilot reviewed 33 out of 33 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
Test files | Updated all test mocks to use executeWithRetry instead of execute |
New test classes | Added comprehensive tests for retry strategies and handlers |
HttpRequestTypeBasedRetryHandler |
Main retry handler implementing strategy pattern |
IRetryStrategy |
Interface for retry strategies |
IdempotentRetryStrategy |
Strategy for idempotent requests with exponential backoff |
NonIdempotentRetryStrategy |
Strategy for non-idempotent requests respecting Retry-After headers |
HTTPRequestType enum |
Classification of request types with retryability |
HTTP client implementations | Added executeWithRetry methods and integration |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/HttpRequestTypeBasedRetryHandler.java
Outdated
Show resolved
Hide resolved
src/test/java/com/databricks/jdbc/dbclient/impl/http/IdempotentRetryStrategyTest.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/IdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/DatabricksHttpClient.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/IdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/NonIdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
* @return a jittered delay value between value and value * 1.2 | ||
*/ | ||
public static int addJitter(int value) { | ||
return (int) (value * (1.0 + (RANDOM.nextDouble() * 0.2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not have a separate function for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the code looks more readable with this being a seperate function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adds unnecessary complexity and multiple function definitions to go through, also define these values as constant.
src/main/java/com/databricks/jdbc/dbclient/impl/http/DatabricksHttpClient.java
Outdated
Show resolved
Hide resolved
prepareRequestHeaders(request, supportGzipEncoding); | ||
|
||
IRetryStrategy strategy = RetryUtils.getRetryStrategy(requestType); | ||
LOGGER.debug( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice. @samikshya-db can you help nishkarsh with adding retry logs to telemetry as well
src/main/java/com/databricks/jdbc/dbclient/impl/http/DatabricksHttpClient.java
Outdated
Show resolved
Hide resolved
requestType, | ||
strategy.getClass().getSimpleName()); | ||
|
||
RetryTimeoutManager retryTimeoutManager = new RetryTimeoutManager(connectionContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Its not efficient to initialize for every request. You are just passing connection context here, so you can move it to class constructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we need the retry timeout values to be reset for each request. And storing it in the httpclient could be dangerous in multi-threaded scenarios.
src/main/java/com/databricks/jdbc/dbclient/impl/http/DatabricksHttpClient.java
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/IdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/NonIdempotentRetryStrategy.java
Outdated
Show resolved
Hide resolved
src/main/java/com/databricks/jdbc/dbclient/impl/http/DatabricksHttpClient.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! please wait for @vikrantpuppala's approval
public CloseableHttpResponse executeWithRetry( | ||
HttpUriRequest request, RequestType requestType, boolean supportGzipEncoding) | ||
throws DatabricksHttpException { | ||
prepareRequestHeaders(request, supportGzipEncoding); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is already happening in execute, don't think we need this here
int retryAttempt = 0; | ||
|
||
while (true) { | ||
// follow exponential backoff if executing the request throws IOException |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove comment
CloseableHttpResponse response = httpClient.execute(request); | ||
int statusCode = response.getStatusLine().getStatusCode(); | ||
Optional<Integer> retryAfterHeader = RetryUtils.extractRetryAfterHeader(response); | ||
// Get retry delay from strategy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove obvious comments
// Get retry delay from strategy | ||
Optional<Integer> retryDelay = | ||
strategy.retryRequestAfter( | ||
statusCode, retryAfterHeader, retryAttempt, connectionContext); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
change this method to shouldRetry() which should only return a boolean
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should take in your timeoutmanager as well and check if your timeouts have already exceeded limits
retryAttempt, | ||
requestType, | ||
e.getMessage()); | ||
if (!retryTimeoutManager.evaluateRetryDecisionForException(strategy, e, retryDelayMillis)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
replace this with strategy.shouldRetryException which takes in your timeoutManager
boolean isStatusCodeRetriable(int statusCode, IDatabricksConnectionContext connectionContext); | ||
|
||
/* Returns the delay in milliseconds after which a request should be retried, or empty if it shouldn't be retried */ | ||
Optional<Integer> retryRequestAfter( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this completely
int statusCode, Optional<Integer> retryDelayMillis) { | ||
if (retryDelayMillis.isEmpty()) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should not happen here, it should only check for timeouts
* @return a jittered delay value between value and value * 1.2 | ||
*/ | ||
public static int addJitter(int value) { | ||
return (int) (value * (1.0 + (RANDOM.nextDouble() * 0.2))); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
adds unnecessary complexity and multiple function definitions to go through, also define these values as constant.
Description
Added a executeWithRetry function in DatabricksHTTPClient and replaced all execute calls by executeWithRetry calls (except for thrift, UC volume client).
Testing
Tested through existing unit tests (but changed some mocks). Also added new unit tests for Retry Handling and Retry Strategies.
Additional Notes to the Reviewer