Add TestConnectionClosureRace #709

rschmitt · 2025-08-18T20:57:39Z

No description provided.

ok2c · 2025-08-19T17:10:46Z

@rschmitt Does this really need to be a JUnit class? Does it need to be executed with each and every CI run? Can still just be a stand-alone executable with a main method?

rschmitt · 2025-08-20T17:12:36Z

@ok2c Currently, this class seems to be exposing a serious bug in the HTTP/2 client. testRapidSequentialRequests tends to hang on h2 and possibly h2c; the stack traces seem to indicate that there's a race condition that causes continuations to be dropped. My current intention is to submit this as a reproducer of this bug that runs with all the other tests. Once we find and fix that bug, I can mark this test @Disabled, add a @Tag, and/or convert it to a pure example class.

This class's tests run in about seven seconds on my machine, but I'm not planning on leaving it enabled beyond the short term, because it burns through a lot of sockets and ephemeral ports which can cause problems.

ok2c · 2025-08-21T09:19:37Z

Currently, this class seems to be exposing a serious bug in the HTTP/2 client.

@rschmitt I have started a major revision of the H2 protocol handling code and have been making minor corrections / improvements to the H2 stream termination logic.

I am going to use your code as a reproducer for the defect you have been seeing. However I still do think there is little value in running this test with every CI run. This is a type of integration test one may want to execute prior to a GA release or after having done a major code refactoring.

I would really appreciate if you could convert these tests to a pure example class or mark them as @Disabled

rschmitt · 2025-08-23T16:22:25Z

@ok2c OK

ok2c · 2025-08-24T10:00:35Z

Currently, this class seems to be exposing a serious bug in the HTTP/2 client.

@rschmitt Curiously enough I am unable to reproduce the problem. I have run the test suite with the #testRapidSequentialRequests enabled against the latest core master and client master and all tests pass

h2: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 2,499 succeeded; 1 failed (99.96% success rate, 0.04% retriable)
h2: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 30.00% retriable)
https: Validation enabled:  Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
https: Validation disabled: Sequential requests (rapid): 2,392 succeeded; 108 failed (95.68% success rate, 4.28% retriable)
https: Validation enabled:  Sequential requests (rapid): 2,457 succeeded; 43 failed (98.28% success rate, 1.72% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 46.67% retriable)
https: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
https: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
https: Validation enabled:  Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 2,066 succeeded; 434 failed (82.64% success rate, 17.24% retriable)
http: Validation enabled:  Sequential requests (rapid): 2,304 succeeded; 196 failed (92.16% success rate, 7.72% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

I was also unable to reproduce the problem with the latest core 5.3.x and the latest client 5.5.x.

h2: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 2,487 succeeded; 13 failed (99.48% success rate, 0.52% retriable)
h2c: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 33.33% retriable)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 9 succeeded; 1 failed (90.00% success rate, 10.00% retriable)
https: Validation enabled:  Sequential requests (slow): 9 succeeded; 1 failed (90.00% success rate, 0.00% retriable)
https: Validation disabled: Sequential requests (rapid): 1,432 succeeded; 1,068 failed (57.28% success rate, 42.36% retriable)
https: Validation enabled:  Sequential requests (rapid): 1,447 succeeded; 1,053 failed (57.88% success rate, 42.04% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 10.00% retriable)
https: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
https: Validation disabled: Multiple small batches: 9 succeeded; 6 failed (60.00% success rate, 13.33% retriable)
https: Validation enabled:  Multiple small batches: 9 succeeded; 6 failed (60.00% success rate, 0.00% retriable)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 2,286 succeeded; 214 failed (91.44% success rate, 8.52% retriable)
http: Validation enabled:  Sequential requests (rapid): 2,488 succeeded; 12 failed (99.52% success rate, 0.48% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

It is entirely possible that apache/httpcomponents-core@a202086 has fixed the cause of the problem but I am a bit doubtful. Could you please re-run the tests in your environment with the latest core master and let me know if the problem persists.

In the meantime I will be working on improvement of the stale check.

ok2c · 2025-08-24T12:07:22Z

@rschmitt I am unable to reproduce the problem neither with core 5.3.4 (stable), nor 5.3.5-SNAPSHOT not 5.4-alpha1-SNAPSHOT. I need your help with the reproducer. I am traveling right now and only have my Ubuntu Linux box at my disposal. The problem appears to be OS / JRE specific

ok2c · 2025-08-25T11:22:22Z

@rschmitt I believe I have found the underlying cause of the problem (or misunderstanding if you agree to see it that way)

Here is the thing. The HTTP/1.1 protocol handler by default aggressively pipelines request messages. What that basically means the client can write out n requests before the server gets a chance to process the first request and to drop the connection. By that time all n request messages may have already been committed to the execution pipeline

exchange 1: POST /ping
exchange 1: stale-check (success)
exchange 2: POST /ping
exchange 2: stale-check (success)
exchange 3: POST /ping
exchange 3: stale-check (success)
exchange 1: 200 OK
exchange 1: server closes connection
exchange 2: boom
exchange 3: boom

The stale connection check cannot help when pipelining is being employed because the connection is perfectly valid at the time of request initiation.

The H2 tests do not exhibit this problem because the server limits the number of concurrent streams to 1

        out.write(ff.createSettings(
...
            new H2Setting(H2Param.MAX_CONCURRENT_STREAMS, 1),
...
        ), outputStream);

I do not know if this is intentional or not but with the message pipelining not used, I get 100% reliability of message exchanges with both HTTP/1.1 and H2 using code with minor modifications to ensure request messages get executed sequentially

h2: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 199 succeeded; 1 failed (99.50% success rate, 0.50% retriable)
h2: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Sequential requests (rapid): 182 succeeded; 18 failed (91.00% success rate, 9.00% retriable)
https: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Single large batch: 29 succeeded; 1 failed (96.67% success rate, 3.33% retriable)
https: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
https: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 191 succeeded; 9 failed (95.50% success rate, 4.00% retriable)
http: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Single large batch: 29 succeeded; 1 failed (96.67% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

rschmitt · 2025-08-25T17:26:29Z

The HTTP/1.1 protocol handler by default aggressively pipelines request messages.

That's an important detail. How does this work? Does pipelining only kick in when the max connection pool size is reached? How does this interact with the connectionRequestTimeout? How can pipelining be disabled? Which version of the client enabled pipelining by default?

rschmitt · 2025-08-25T18:09:53Z

The problem appears to be OS / JRE specific

Testing against apache/httpcomponents-core@be71c3a, I can reproduce the HTTP/2 hang on Linux (m5.2xlarge instance running Linux 5.10). I noticed that on macOS, it's much harder to reproduce the bug by running the tests on JDK21, so make sure you're running them on JDK8.

ok2c · 2025-08-25T19:17:44Z

The HTTP/1.1 protocol handler by default aggressively pipelines request messages.

That's an important detail. How does this work? Does pipelining only kick in when the max connection pool size is reached? How does this interact with the connectionRequestTimeout? How can pipelining be disabled? Which version of the client enabled pipelining by default?

@rschmitt HTTP/1.1 protocol handler has been working like that since 5.0-alpha1. Requests have always been immediately written out to the connection as soon as submitted by the caller for execution. I do not see a reason why it should be different. It is fully protocol conformant while dropping connection without signalling it is not.

If you insist we can add a config knob to disable it.

ok2c · 2025-08-25T19:21:55Z

The problem appears to be OS / JRE specific

Testing against apache/httpcomponents-core@be71c3a, I can reproduce the HTTP/2 hang on Linux (m5.2xlarge instance running Linux 5.10). I noticed that on macOS, it's much harder to reproduce the bug by running the tests on JDK21, so make sure you're running them on JDK8.

@rschmitt I am on vacation at the moment and will not be able to lay my hands on a different hardware until mid September. My Linux box is all I have. I always use Java 8 by default. I have never been able to reproduce it. Not once.

I will see if I can reproduce the lock-up on GitHub CI.

rschmitt · 2025-08-25T19:30:58Z

I've amended the PR to rebase atop the latest origin/master and to move this test to a separate profile through the power of JUnit 5 tags. I figured that this is more likely to be useful in the future than special-casing this test, moving it to an example, or just marking it as @Disabled, especially if you need to reproduce it in GitHub CI or something.

rschmitt · 2025-08-25T19:43:02Z

FYI, I can also reproduce the h2 hang against httpcore 5.3.4, in addition to apache/httpcomponents-core@be71c3a. It's definitely timing-sensitive. Newer and faster JDKs are less likely to reproduce the problem, and h2c is less likely to hang than TLS 1.3, which is less likely to hang than TLS 1.2. I can also reproduce it on Windows.

ok2c · 2025-08-26T08:44:32Z

@rschmitt The testSlowSequentialRequests, testOneLargeBatchOfRequests, and testSpacedOutBatchesOfRequests tests still submit requests in a tight loop. The testRapidSequentialRequests test behaves correctly.

ok2c · 2025-08-26T09:35:54Z

@rschmitt I honestly do not know how to fix a problem I am unable to reproduce

h2: Validation disabled: Sequential requests (rapid): 4,998 succeeded; 2 failed (99.96% success rate, 0.04% retriable)

Linux ok2c 6.8.0-78-generic #78-Ubuntu SMP PREEMPT_DYNAMIC Tue Aug 12 11:34:18 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux

openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.4 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.4 (build 25.342-b07, mixed mode)

rschmitt · 2025-08-27T17:03:00Z

The testSlowSequentialRequests, testOneLargeBatchOfRequests, and testSpacedOutBatchesOfRequests tests still submit requests in a tight loop. The testRapidSequentialRequests test behaves correctly.

I'm not sure what you mean. The tests are all doing what I intend them to do.

openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.4 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.4 (build 25.342-b07, mixed mode)

I'm really glad you posted this. I'm running with Corretto-8.462.08.1 (build 1.8.0_462-b08). Just for grins, I tried the reproducer with corretto-8.342.07.3, installed via https://github.com/jdx/mise (handy dandy!), and it's not reproducing the problem. I bet if you grab the latest version you'll be able to reproduce the problem. If you install mise, you can do this by running:

mise shell [email protected]
./mvnw -Pslow-tests,-use-toolchains

rschmitt · 2025-08-27T17:23:31Z

I bisected the Corretto 8 releases available from mise. It looks like corretto-8.392.08.1 is the most recent release that does not reproduce the problem. The next available release, corretto-8.402.07.1, exhibits the issue.

rschmitt · 2025-08-27T18:00:36Z

No obvious smoking gun in the release notes: https://builds.shipilev.net/backports-monitor/release-notes-openjdk8u402.html

ok2c · 2025-08-27T18:43:32Z

I'm not sure what you mean.

@rschmitt Okay. I will have to be blunt, then

The tests are all doing what I intend them to do.

Then, in their present form the tests are meaningless / useless for HTTP/1.1 due to the request pipelining. There is no point committing them in their present shape

Some do

                final Future<SimpleHttpResponse> f = sendPing(client);
                try {
                    f.get();
                } catch (final ExecutionException ignore) {
                }
                futures.add(f);

which is correct.

Others do

                futures.add(sendPing(client));

which is wrong in the given context due to the use of request pipleining with the HTTP/1.1 transport.

I bet if you grab the latest version you'll be able to reproduce the problem.

I will get the latest Java 8 JRE and will try out to reproduce the defect with it.

rschmitt · 2025-08-27T21:27:58Z

Oh, duh, I see your point now. You're saying that the HTTP/1.1 tests think they're testing a race between connection leasing and half-close, but they're actually testing a race within the duplexer, between the write path (send the next pipelined request) and the read path (read and process connection closure). And that's not a meaningful race, except maybe for determining which exception ends up getting thrown.

What I don't understand, then, is why the use of StaleCheckCommand showed such an improvement when combined with greedy reads in the HTTP/1.1 duplexer (i.e. read the input buffer until it returns 0 or -1). If futures.add(sendPing(client)) causes the request to immediately be pipelined on an assigned connection, then StaleCheckCommand shouldn't make a difference, but in fact it makes a dramatic difference.

When the connection pool is full, does the async client ever just wait for a connection to become available, instead of pipelining the request onto an existing connection? Alternatively, is there logic within the client (independent of any RetryStrategy) that can take unsent pipelined requests and reassign them to another connection?

I'll also check and see whether I can replicate my earlier experiment.

ok2c · 2025-08-28T11:05:41Z

When the connection pool is full, does the async client ever just wait for a connection to become available, instead of pipelining the request onto an existing connection?

@rschmitt I made a mistake. I was looking at the problem at the core protocol level but I forgot about the connection pool preventing request pipelining at the client level by leasing connections to a single request at a time.

My point still stands, though. The test cases behave inconsistently. This needs to be fixed.

ok2c · 2025-08-28T11:53:01Z

@rschmitt And no luck with the reproducer

h2: Validation disabled: Sequential requests (rapid): 4,998 succeeded; 2 failed (99.96% success rate, 0.04% retriable)
h2: Validation enabled:  Sequential requests (rapid): 5,000 succeeded; 0 failed (100.00% success rate)

openjdk version "1.8.0_462"
OpenJDK Runtime Environment Corretto-8.462.08.1 (build 1.8.0_462-b08)
OpenJDK 64-Bit Server VM Corretto-8.462.08.1 (build 25.462-b08, mixed mode)

core: 0e22b407c741f6f1af73bd199abf4e7c329c9df7
client: f176d922df466bf06924f2b19474352141bafa7c

ok2c · 2025-09-14T20:00:50Z

@rschmitt I can reliably reproduce the lock-up on one of my client's Windows machines. I will be looking into it.

ok2c · 2025-09-15T10:31:46Z

@rschmitt This apache/httpcomponents-core@ac42dec should fix the immediate problem with a race between connection status and command submission. This apache/httpcomponents-core@5de2967 should make sure that #disconnected method of the protocol handler gets executed as late as possible after the underlying i/o session has been fully closed out

Both master and 5.3.x no longer get stuck for me due to a command left in a limbo by a race with connection shutdown.

h2: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 2,464 succeeded; 36 failed (98.56% success rate, 1.44% retriable)
h2: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 30.00% retriable)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 20.00% retriable)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 2,180 succeeded; 320 failed (87.20% success rate, 12.76% retriable)
h2c: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 26.67% retriable)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Sequential requests (rapid): 1,305 succeeded; 1,195 failed (52.20% success rate, 47.60% retriable)
https: Validation enabled:  Sequential requests (rapid): 1,316 succeeded; 1,184 failed (52.64% success rate, 47.32% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
https: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
https: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 20.00% retriable)
https: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 6.67% retriable)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 1,313 succeeded; 1,187 failed (52.52% success rate, 38.68% retriable)
http: Validation enabled:  Sequential requests (rapid): 1,332 succeeded; 1,168 failed (53.28% success rate, 38.56% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

ok2c · 2025-09-15T12:08:32Z

...nt5-testing/src/test/java/org/apache/hc/client5/testing/async/TestConnectionClosureRace.java

+        try (final CloseableHttpAsyncClient client = asyncClient(validateConnections)) {
+            final List<Future<SimpleHttpResponse>> futures = new ArrayList<>();
+            for (int i = 0; i < 2500; i++) {
+                final Future<SimpleHttpResponse> f = sendPing(client);


@rschmitt If you want my OK to merge this change-set please help me understand why do

final Future<SimpleHttpResponse> f = sendPing(client); try { f.get(); } catch (final ExecutionException ignore) { } futures.add(f);

in this instance and do

futures.add(sendPing(client));

in other cases?

The common idea in these tests is to submit batches of concurrent requests and then wait for them to finish:

Test case Batch size # of batches Between batches

testSlowSequentialRequests 1 10 Sleep

testRapidSequentialRequests 1 2500 Block

testOneLargeBatchOfRequests 30 1 N/A

testSpacedOutBatchesOfRequests 3 5 Sleep

testRapidSequentialRequests could Thread.sleep(25) instead, but blocking on the future is faster. Similarly, the other requests could block on all the futures in a batch, instead of simply sleeping in order to allow the client to quiesce.

testRapidSequentialRequests could Thread.sleep(25) instead, but blocking on the future is faster.

@rschmitt I still do not understand why all tests could not behave consistently, and why a faster future is better than Thread.sleep(25) but so be it.

This isn't a test case so much as a reproducer that tries to force a race between client-side connection reuse and server-initiated connection closure on various protocols. Since it's fairly slow and resource-intensive, I have marked it with `@Tag("slow")`. The test can be run directly from an IDE, or from the command line using the `slow-tests` profile: ./mvnw -Pslow-tests

rschmitt · 2025-09-16T16:34:08Z

I verified that apache/httpcomponents-core@ac42dec and apache/httpcomponents-core@5de2967 fix the lock-up on my machine. Additionally, I tested b16d6d032d7e2b7e2f2b0f94c5d01375cb1ab21b, using StaleCheckCommand to implement setValidateAfterInactivity for HTTP/1.1 connections in PoolingAsyncClientConnectionManager. It significantly improves stale connection reuse for HTTPS but has no effect on HTTP. I'm satisfied with this, since most production usage these days is HTTPS. I think it would be ideal if the HTTP/1.1 duplexer implemented greedy reads, but I accept that there are other reasons why you don't want to do that.

https: Validation disabled: Sequential requests (rapid): 2,450 succeeded; 50 failed (98.00% success rate, 1.96% retriable)
https: Validation enabled:  Sequential requests (rapid): 2,499 succeeded; 1 failed (99.96% success rate, 0.00% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
https: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 26.67% retriable)
https: Validation enabled:  Multiple small batches: 14 succeeded; 1 failed (93.33% success rate, 0.00% retriable)

http: Validation disabled: Sequential requests (rapid): 2,475 succeeded; 25 failed (99.00% success rate, 0.76% retriable)
http: Validation enabled:  Sequential requests (rapid): 2,493 succeeded; 7 failed (99.72% success rate, 0.00% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

ok2c · 2025-09-16T17:04:12Z

I think it would be ideal if the HTTP/1.1 duplexer implemented greedy reads, but I accept that there are other reasons why you don't want to do that.

@rschmitt Wait a minute. I did make the HTTP/1.1 duplexer read incoming data greedily with apache/httpcomponents-core@be71c3a. What I did not want to do is to add an extra read at the end of the message only to make things marginally better for endpoints with broken HTTP/1.1 connection persistence while adding an overhead of an unnecessary i/o operation to well-behaving ones.

ok2c

@rschmitt I leave it up to you if you want to merge the change-set in its present state or not.

rschmitt · 2025-09-22T17:33:20Z

I leave it up to you if you want to merge the change-set in its present state or not.

I'll take another look at it. The test cases can probably be made more consistent (and therefore less confusing) without changing their behavior.

rschmitt force-pushed the race branch from 46b1641 to d51ab11 Compare August 18, 2025 21:02

rschmitt force-pushed the race branch from d51ab11 to 9fd4336 Compare August 25, 2025 19:28

ok2c requested changes Sep 15, 2025

View reviewed changes

rschmitt force-pushed the race branch from 9fd4336 to e3d5fea Compare September 16, 2025 16:20

ok2c reviewed Sep 16, 2025

View reviewed changes

Test case	Batch size	# of batches	Between batches
`testSlowSequentialRequests`	1	10	Sleep
`testRapidSequentialRequests`	1	2500	Block
`testOneLargeBatchOfRequests`	30	1	N/A
`testSpacedOutBatchesOfRequests`	3	5	Sleep

Add TestConnectionClosureRace #709

Are you sure you want to change the base?

Add TestConnectionClosureRace #709

Uh oh!

Conversation

rschmitt commented Aug 18, 2025

Uh oh!

ok2c commented Aug 19, 2025

Uh oh!

rschmitt commented Aug 20, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ok2c commented Aug 21, 2025

Uh oh!

rschmitt commented Aug 23, 2025

Uh oh!

ok2c commented Aug 24, 2025

Uh oh!

ok2c commented Aug 24, 2025

Uh oh!

ok2c commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rschmitt commented Aug 25, 2025

Uh oh!

rschmitt commented Aug 25, 2025

Uh oh!

ok2c commented Aug 25, 2025

Uh oh!

ok2c commented Aug 25, 2025

Uh oh!

rschmitt commented Aug 25, 2025

Uh oh!

rschmitt commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ok2c commented Aug 26, 2025

Uh oh!

ok2c commented Aug 26, 2025

Uh oh!

rschmitt commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

rschmitt commented Aug 27, 2025

Uh oh!

rschmitt commented Aug 27, 2025

Uh oh!

ok2c commented Aug 27, 2025

Uh oh!

rschmitt commented Aug 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ok2c commented Aug 28, 2025

Uh oh!

ok2c commented Aug 28, 2025

Uh oh!

ok2c commented Sep 14, 2025

Uh oh!

ok2c commented Sep 15, 2025

Uh oh!

ok2c Sep 15, 2025

Choose a reason for hiding this comment

Uh oh!

rschmitt Sep 16, 2025

Choose a reason for hiding this comment

Uh oh!

ok2c Sep 16, 2025

Choose a reason for hiding this comment

Uh oh!

rschmitt commented Sep 16, 2025

Uh oh!

ok2c commented Sep 16, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ok2c left a comment

Choose a reason for hiding this comment

Uh oh!

rschmitt commented Sep 22, 2025

Uh oh!

Uh oh!

rschmitt commented Aug 20, 2025 •

edited

Loading

ok2c commented Aug 25, 2025 •

edited

Loading

rschmitt commented Aug 25, 2025 •

edited

Loading

rschmitt commented Aug 27, 2025 •

edited

Loading

rschmitt commented Aug 27, 2025 •

edited

Loading

ok2c commented Sep 16, 2025 •

edited

Loading