
Conversation

nishkarsh-db
Collaborator

Description

Added an executeWithRetry function in DatabricksHTTPClient and replaced all execute calls with executeWithRetry calls (except for the Thrift and UC Volume clients).
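
As a rough illustration of the shape of this change, here is a minimal sketch assuming Apache HttpClient 4.x; the class name, retried status codes, and limits below are hypothetical and not the driver's actual values:

import java.io.IOException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpUriRequest;
import org.apache.http.impl.client.CloseableHttpClient;

class RetryingHttpClient {
  private static final int MAX_ATTEMPTS = 5;          // hypothetical attempt limit
  private static final long BASE_DELAY_MILLIS = 500L; // hypothetical base backoff

  private final CloseableHttpClient httpClient;

  RetryingHttpClient(CloseableHttpClient httpClient) {
    this.httpClient = httpClient;
  }

  CloseableHttpResponse executeWithRetry(HttpUriRequest request) throws IOException {
    IOException lastFailure = null;
    for (int attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
      try {
        CloseableHttpResponse response = httpClient.execute(request);
        int status = response.getStatusLine().getStatusCode();
        // Only transient statuses are retried; everything else is returned to the caller.
        if (status != 429 && status != 503) {
          return response;
        }
        response.close();
      } catch (IOException e) {
        lastFailure = e;
      }
      try {
        // Exponential backoff between attempts.
        Thread.sleep(BASE_DELAY_MILLIS * (1L << attempt));
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        throw new IOException("Interrupted while waiting to retry", ie);
      }
    }
    throw lastFailure != null ? lastFailure : new IOException("Retries exhausted");
  }
}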

Testing

Tested through existing unit tests (some mocks were updated). Also added new unit tests for retry handling and retry strategies.

Additional Notes to the Reviewer

Collaborator

@shivam2680 left a comment

Add more details to the PR description.

Collaborator

@shivam2680 left a comment

Reviewed up to DatabricksHTTPClient.

Comment on lines 5 to 11
THRIFT_OPEN_SESSION,
THRIFT_CLOSE_SESSION,
THRIFT_METADATA,
THRIFT_CLOSE_OPERATION,
THRIFT_CANCEL_OPERATION,
THRIFT_EXECUTE_STATEMENT,
THRIFT_FETCH_RESULTS,
Collaborator

do we need the THRIFT_ prefix here?

@shivam2680 changed the base branch from main to retry-unification on September 12, 2025 at 06:44
Collaborator

@shivam2680 left a comment

Comment on lines 63 to 65
if (accumulatedTimeTempUnavailable <= 0 || accumulatedTimeRateLimit <= 0) {
  return response;
}
Collaborator

I would still recommend keeping a timestamp for the request start and comparing it against the current timestamp to check the timeout. The current approach is also correct; I will leave this decision to you.
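
A minimal sketch of the timestamp-based check being suggested, assuming a single wall-clock retry budget per request; the names and the budget value are hypothetical:

class RetryDeadline {
  private static final long RETRY_TIMEOUT_MILLIS = 15 * 60 * 1000L; // hypothetical overall budget
  private final long requestStartMillis = System.currentTimeMillis();

  // Compare elapsed time since the request started against the budget,
  // instead of decrementing separate accumulators per status code.
  boolean hasTimedOut() {
    return System.currentTimeMillis() - requestStartMillis >= RETRY_TIMEOUT_MILLIS;
  }
}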

THRIFT_FETCH_RESULTS(RequestRetryability.NON_IDEMPOTENT),
CLOUD_FETCH(RequestRetryability.IDEMPOTENT),
VOLUME_LIST(RequestRetryability.IDEMPOTENT),
VOLUME_SHOW_VOLUMES(RequestRetryability.IDEMPOTENT),
Collaborator

VOLUME_SHOW_VOLUMES -> what does this mean? how does it differ from LIST?

VOLUME_LIST(RequestRetryability.IDEMPOTENT),
VOLUME_SHOW_VOLUMES(RequestRetryability.IDEMPOTENT),
VOLUME_GET(RequestRetryability.IDEMPOTENT),
VOLUME_PUT(RequestRetryability.NON_IDEMPOTENT),
Collaborator

what happens if we retry a PUT volume request? I would assume we simply overwrite the existing file, is that not the case?

Collaborator Author

Yes, we do overwrite the existing file.

while (cause != null) {
  if (cause instanceof DatabricksRetryHandlerException) {
    throw new DatabricksHttpException(
        cause.getMessage(), cause, DatabricksDriverErrorCode.INVALID_STATE);
Collaborator

why INVALID_STATE?

Collaborator Author

It's the current behaviour of the driver.

Collaborator

@vikrantpuppala left a comment

These are severe bugs which will break retry behaviour; this also signals that the testing in this PR is not adequate. Please take a deeper look.

int statusCode = response.getStatusLine().getStatusCode();

// Get retry delay from strategy
retryDelayMillis = strategy.retryRequestAfter(response, attempt, connectionContext);
Collaborator

This is all very confusing: we calculate the exponential backoff first and then always overwrite it here. Why did we even calculate the exponential backoff at line 47?

Collaborator Author

This is because if httpClient.execute(request) throws an exception, exponential backoff is followed rather than retrying again immediately.

@Copilot left a comment

Pull Request Overview

This PR introduces request-level retry handling to the Databricks JDBC driver by implementing a strategy pattern for HTTP request retries based on request type. The retry logic differentiates between idempotent and non-idempotent requests, with configurable timeout and retry limits.

  • Replaced direct execute calls with executeWithRetry throughout the codebase
  • Added request type classification with appropriate retry strategies
  • Introduced comprehensive unit tests for retry strategies and handlers
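
A rough sketch of the strategy shape this overview describes; the interface and method names below are illustrative, not the PR's actual signatures:

import java.util.Optional;

interface RetryStrategy {
  // Delay before the next attempt, or empty if the request should not be retried.
  Optional<Long> nextDelayMillis(int statusCode, Optional<Long> retryAfterMillis, int attempt);
}

// Idempotent requests may be retried on transient failures with exponential backoff.
final class IdempotentStrategy implements RetryStrategy {
  @Override
  public Optional<Long> nextDelayMillis(int statusCode, Optional<Long> retryAfterMillis, int attempt) {
    if (statusCode == 429 || statusCode >= 500) {
      // Honour Retry-After when present, otherwise back off exponentially.
      return Optional.of(retryAfterMillis.orElse(500L * (1L << attempt)));
    }
    return Optional.empty();
  }
}

// Non-idempotent requests are retried only when the server explicitly asks via Retry-After.
final class NonIdempotentStrategy implements RetryStrategy {
  @Override
  public Optional<Long> nextDelayMillis(int statusCode, Optional<Long> retryAfterMillis, int attempt) {
    if ((statusCode == 429 || statusCode == 503) && retryAfterMillis.isPresent()) {
      return retryAfterMillis;
    }
    return Optional.empty();
  }
}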

Reviewed Changes

Copilot reviewed 33 out of 33 changed files in this pull request and generated 4 comments.

Summary per file:
Test files: Updated all test mocks to use executeWithRetry instead of execute
New test classes: Added comprehensive tests for retry strategies and handlers
HttpRequestTypeBasedRetryHandler: Main retry handler implementing the strategy pattern
IRetryStrategy: Interface for retry strategies
IdempotentRetryStrategy: Strategy for idempotent requests with exponential backoff
NonIdempotentRetryStrategy: Strategy for non-idempotent requests respecting Retry-After headers
HTTPRequestType enum: Classification of request types with retryability
HTTP client implementations: Added executeWithRetry methods and integration


 * @return a jittered delay value between value and value * 1.2
 */
public static int addJitter(int value) {
  return (int) (value * (1.0 + (RANDOM.nextDouble() * 0.2)));
Collaborator

can we not have a separate function for this?

Collaborator Author

I think the code looks more readable with this being a separate function.

Collaborator

It adds unnecessary complexity and multiple function definitions to go through. Also, define these values as constants.
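
For illustration, one way the constants could be named (a sketch only; the constant names are hypothetical, not the PR's):

import java.util.Random;

final class JitterUtil {
  private static final Random RANDOM = new Random();
  private static final double JITTER_BASE_FACTOR = 1.0;
  private static final double JITTER_RANGE = 0.2; // adds up to +20% of the input value

  // Returns a value in [value, value * 1.2).
  static int addJitter(int value) {
    return (int) (value * (JITTER_BASE_FACTOR + RANDOM.nextDouble() * JITTER_RANGE));
  }
}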

prepareRequestHeaders(request, supportGzipEncoding);

IRetryStrategy strategy = RetryUtils.getRetryStrategy(requestType);
LOGGER.debug(
Collaborator

Nice. @samikshya-db, can you help nishkarsh with adding retry logs to telemetry as well?

requestType,
strategy.getClass().getSimpleName());

RetryTimeoutManager retryTimeoutManager = new RetryTimeoutManager(connectionContext);
Collaborator

It's not efficient to initialize this for every request. You are only passing the connection context here, so you can move it to the class constructor.

Collaborator Author

But we need the retry timeout values to be reset for each request, and storing them in the HTTP client could be dangerous in multi-threaded scenarios.

Collaborator

@shivam2680 left a comment

LGTM! Please wait for @vikrantpuppala's approval.

public CloseableHttpResponse executeWithRetry(
    HttpUriRequest request, RequestType requestType, boolean supportGzipEncoding)
    throws DatabricksHttpException {
  prepareRequestHeaders(request, supportGzipEncoding);
Collaborator

This is already happening in execute; I don't think we need this here.

int retryAttempt = 0;

while (true) {
  // follow exponential backoff if executing the request throws IOException
Collaborator

remove comment

CloseableHttpResponse response = httpClient.execute(request);
int statusCode = response.getStatusLine().getStatusCode();
Optional<Integer> retryAfterHeader = RetryUtils.extractRetryAfterHeader(response);
// Get retry delay from strategy
Collaborator

remove obvious comments

// Get retry delay from strategy
Optional<Integer> retryDelay =
    strategy.retryRequestAfter(
        statusCode, retryAfterHeader, retryAttempt, connectionContext);
Collaborator

Change this method to shouldRetry(), which should only return a boolean.

Collaborator

This should take in your timeout manager as well and check whether your timeouts have already exceeded their limits.
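
A sketch of the boolean shouldRetry() shape suggested in these two comments; the RetryBudget type and its isExhausted() method are hypothetical stand-ins for the PR's timeout manager:

import java.util.Optional;

interface RetryBudget {
  boolean isExhausted();
}

interface RetryDecision {
  // True only when the status code is retriable for this request type and the
  // overall retry budget has not yet been spent.
  boolean shouldRetry(int statusCode, Optional<Integer> retryAfterSeconds,
                      int attempt, RetryBudget budget);
}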

retryAttempt,
requestType,
e.getMessage());
if (!retryTimeoutManager.evaluateRetryDecisionForException(strategy, e, retryDelayMillis)) {
Collaborator

Replace this with strategy.shouldRetryException, which takes in your timeoutManager.

boolean isStatusCodeRetriable(int statusCode, IDatabricksConnectionContext connectionContext);

/* Returns the delay in milliseconds after which a request should be retried, or empty if it shouldn't be retried */
Optional<Integer> retryRequestAfter(
Collaborator

remove this completely

int statusCode, Optional<Integer> retryDelayMillis) {
  if (retryDelayMillis.isEmpty()) {
    return false;
  }
Collaborator

This should not happen here; it should only check for timeouts.

