
Conversation

rschmitt
Contributor

No description provided.

@ok2c
Member

ok2c commented Aug 19, 2025

@rschmitt Does this really need to be a JUnit class? Does it need to be executed with each and every CI run? Can it still just be a stand-alone executable with a main method?

@rschmitt
Contributor Author

rschmitt commented Aug 20, 2025

@ok2c Currently, this class seems to be exposing a serious bug in the HTTP/2 client. testRapidSequentialRequests tends to hang on h2 and possibly h2c; the stack traces seem to indicate that there's a race condition that causes continuations to be dropped. My current intention is to submit this as a reproducer of this bug that runs with all the other tests. Once we find and fix that bug, I can mark this test @Disabled, add a @Tag, and/or convert it to a pure example class.

This class's tests run in about seven seconds on my machine, but I'm not planning on leaving it enabled beyond the short term, because it burns through a lot of sockets and ephemeral ports which can cause problems.

@ok2c
Member

ok2c commented Aug 21, 2025

Currently, this class seems to be exposing a serious bug in the HTTP/2 client.

@rschmitt I have started a major revision of the H2 protocol handling code and have been making minor corrections / improvements to the H2 stream termination logic.

I am going to use your code as a reproducer for the defect you have been seeing. However, I still think there is little value in running this test with every CI run. This is the type of integration test one may want to execute prior to a GA release or after a major code refactoring.

I would really appreciate it if you could convert these tests to a pure example class or mark them as @Disabled.

@rschmitt
Contributor Author

@ok2c OK

@ok2c
Member

ok2c commented Aug 24, 2025

Currently, this class seems to be exposing a serious bug in the HTTP/2 client.

@rschmitt Curiously enough, I am unable to reproduce the problem. I have run the test suite with #testRapidSequentialRequests enabled against the latest core master and client master, and all tests pass:

h2: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 2,499 succeeded; 1 failed (99.96% success rate, 0.04% retriable)
h2: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 30.00% retriable)
https: Validation enabled:  Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
https: Validation disabled: Sequential requests (rapid): 2,392 succeeded; 108 failed (95.68% success rate, 4.28% retriable)
https: Validation enabled:  Sequential requests (rapid): 2,457 succeeded; 43 failed (98.28% success rate, 1.72% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 46.67% retriable)
https: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
https: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
https: Validation enabled:  Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 2,066 succeeded; 434 failed (82.64% success rate, 17.24% retriable)
http: Validation enabled:  Sequential requests (rapid): 2,304 succeeded; 196 failed (92.16% success rate, 7.72% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

I was also unable to reproduce the problem with the latest core 5.3.x and the latest client 5.5.x.

h2: Validation disabled: Sequential requests (slow): 5 succeeded; 5 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 8 succeeded; 7 failed (53.33% success rate, 46.67% retriable)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 2,487 succeeded; 13 failed (99.48% success rate, 0.52% retriable)
h2c: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 33.33% retriable)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 9 succeeded; 1 failed (90.00% success rate, 10.00% retriable)
https: Validation enabled:  Sequential requests (slow): 9 succeeded; 1 failed (90.00% success rate, 0.00% retriable)
https: Validation disabled: Sequential requests (rapid): 1,432 succeeded; 1,068 failed (57.28% success rate, 42.36% retriable)
https: Validation enabled:  Sequential requests (rapid): 1,447 succeeded; 1,053 failed (57.88% success rate, 42.04% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 10.00% retriable)
https: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
https: Validation disabled: Multiple small batches: 9 succeeded; 6 failed (60.00% success rate, 13.33% retriable)
https: Validation enabled:  Multiple small batches: 9 succeeded; 6 failed (60.00% success rate, 0.00% retriable)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 2,286 succeeded; 214 failed (91.44% success rate, 8.52% retriable)
http: Validation enabled:  Sequential requests (rapid): 2,488 succeeded; 12 failed (99.52% success rate, 0.48% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

It is entirely possible that apache/httpcomponents-core@a202086 has fixed the cause of the problem, but I am a bit doubtful. Could you please re-run the tests in your environment with the latest core master and let me know if the problem persists?

In the meantime I will be working on improving the stale check.

@ok2c
Member

ok2c commented Aug 24, 2025

@rschmitt I am unable to reproduce the problem with core 5.3.4 (stable), 5.3.5-SNAPSHOT, or 5.4-alpha1-SNAPSHOT. I need your help with the reproducer. I am traveling right now and only have my Ubuntu Linux box at my disposal. The problem appears to be OS / JRE specific.

@ok2c
Member

ok2c commented Aug 25, 2025

@rschmitt I believe I have found the underlying cause of the problem (or misunderstanding, if you agree to see it that way).

Here is the thing. The HTTP/1.1 protocol handler by default aggressively pipelines request messages. What that basically means is that the client can write out n requests before the server gets a chance to process the first request and drop the connection. By that time, all n request messages may have already been committed to the execution pipeline:

exchange 1: POST /ping
exchange 1: stale-check (success)
exchange 2: POST /ping
exchange 2: stale-check (success)
exchange 3: POST /ping
exchange 3: stale-check (success)
exchange 1: 200 OK
exchange 1: server closes connection
exchange 2: boom
exchange 3: boom

The stale connection check cannot help when pipelining is being employed because the connection is perfectly valid at the time of request initiation.
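The failure mode above can be modelled as a tiny self-contained sketch (hypothetical class and method names, not the actual HttpCore API): the stale check passes for every submission because the connection is still open at that moment, yet only the first pipelined exchange completes.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the pipelining race: every request passes the stale check at
// submission time, but the server closes the connection after answering the
// first exchange, so the requests already committed to the pipeline fail.
public class PipeliningRaceSketch {

    static class PipelinedConnection {
        boolean open = true;
        final List<String> pipeline = new ArrayList<>();

        boolean staleCheck() { return open; }        // succeeds while the socket is open
        void submit(final String request) { pipeline.add(request); }

        List<String> serverProcessesFirstExchangeThenCloses() {
            final List<String> outcomes = new ArrayList<>();
            for (int i = 0; i < pipeline.size(); i++) {
                if (i == 0) {
                    outcomes.add("200 OK");
                    open = false;                    // server drops the connection
                } else {
                    outcomes.add("boom");            // already pipelined, now orphaned
                }
            }
            return outcomes;
        }
    }

    static List<String> run() {
        final PipelinedConnection conn = new PipelinedConnection();
        for (int i = 1; i <= 3; i++) {
            if (conn.staleCheck()) {                 // passes for all three exchanges
                conn.submit("POST /ping");
            }
        }
        return conn.serverProcessesFirstExchangeThenCloses();
    }

    public static void main(String[] args) {
        System.out.println(run()); // [200 OK, boom, boom]
    }
}
```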

The H2 tests do not exhibit this problem because the server limits the number of concurrent streams to 1:

        out.write(ff.createSettings(
...
            new H2Setting(H2Param.MAX_CONCURRENT_STREAMS, 1),
...
        ), outputStream);

I do not know if this is intentional or not, but with message pipelining not used, I get 100% reliability of message exchanges with both HTTP/1.1 and H2, using the code with minor modifications to ensure request messages get executed sequentially:

h2: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 199 succeeded; 1 failed (99.50% success rate, 0.50% retriable)
h2: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Sequential requests (rapid): 182 succeeded; 18 failed (91.00% success rate, 9.00% retriable)
https: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Single large batch: 29 succeeded; 1 failed (96.67% success rate, 3.33% retriable)
https: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
https: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 191 succeeded; 9 failed (95.50% success rate, 4.00% retriable)
http: Validation enabled:  Sequential requests (rapid): 200 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Single large batch: 29 succeeded; 1 failed (96.67% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

@rschmitt
Contributor Author

The HTTP/1.1 protocol handler by default aggressively pipelines request messages.

That's an important detail. How does this work? Does pipelining only kick in when the max connection pool size is reached? How does this interact with the connectionRequestTimeout? How can pipelining be disabled? Which version of the client enabled pipelining by default?

@rschmitt
Contributor Author

The problem appears to be OS / JRE specific

Testing against apache/httpcomponents-core@be71c3a, I can reproduce the HTTP/2 hang on Linux (m5.2xlarge instance running Linux 5.10). I noticed that on macOS, it's much harder to reproduce the bug by running the tests on JDK21, so make sure you're running them on JDK8.

@ok2c
Member

ok2c commented Aug 25, 2025

The HTTP/1.1 protocol handler by default aggressively pipelines request messages.

That's an important detail. How does this work? Does pipelining only kick in when the max connection pool size is reached? How does this interact with the connectionRequestTimeout? How can pipelining be disabled? Which version of the client enabled pipelining by default?

@rschmitt The HTTP/1.1 protocol handler has been working like that since 5.0-alpha1. Requests have always been written out to the connection immediately, as soon as they are submitted by the caller for execution. I do not see a reason why it should be different: pipelining is fully protocol-conformant, while dropping a connection without signalling it is not.

If you insist we can add a config knob to disable it.

@ok2c
Member

ok2c commented Aug 25, 2025

The problem appears to be OS / JRE specific

Testing against apache/httpcomponents-core@be71c3a, I can reproduce the HTTP/2 hang on Linux (m5.2xlarge instance running Linux 5.10). I noticed that on macOS, it's much harder to reproduce the bug by running the tests on JDK21, so make sure you're running them on JDK8.

@rschmitt I am on vacation at the moment and will not be able to lay my hands on different hardware until mid-September. My Linux box is all I have. I always use Java 8 by default. I have never been able to reproduce it. Not once.

I will see if I can reproduce the lock-up on GitHub CI.

@rschmitt
Contributor Author

I've amended the PR to rebase atop the latest origin/master and to move this test to a separate profile through the power of JUnit 5 tags. I figured that this is more likely to be useful in the future than special-casing this test, moving it to an example, or just marking it as @Disabled, especially if you need to reproduce it in GitHub CI or something.
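For readers unfamiliar with tag-based profiles: such a setup is typically wired through the Surefire plugin's JUnit 5 group support, roughly as below. This is a sketch under assumptions; the `slow-tests` profile name comes from this PR, but the exact Surefire configuration shown here is hypothetical, not necessarily what the PR contains.

```xml
<!-- Default build: skip tests tagged @Tag("slow") -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-surefire-plugin</artifactId>
      <configuration>
        <excludedGroups>slow</excludedGroups>
      </configuration>
    </plugin>
  </plugins>
</build>

<!-- ./mvnw -Pslow-tests: run the tagged tests -->
<profiles>
  <profile>
    <id>slow-tests</id>
    <build>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-surefire-plugin</artifactId>
          <configuration>
            <groups>slow</groups>
            <excludedGroups combine.self="override"/>
          </configuration>
        </plugin>
      </plugins>
    </build>
  </profile>
</profiles>
```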

@rschmitt
Contributor Author

rschmitt commented Aug 25, 2025

FYI, I can also reproduce the h2 hang against httpcore 5.3.4, in addition to apache/httpcomponents-core@be71c3a. It's definitely timing-sensitive. Newer and faster JDKs are less likely to reproduce the problem, and h2c is less likely to hang than TLS 1.3, which is less likely to hang than TLS 1.2. I can also reproduce it on Windows.

@ok2c
Member

ok2c commented Aug 26, 2025

@rschmitt The testSlowSequentialRequests, testOneLargeBatchOfRequests, and testSpacedOutBatchesOfRequests tests still submit requests in a tight loop. The testRapidSequentialRequests test behaves correctly.

@ok2c
Member

ok2c commented Aug 26, 2025

@rschmitt I honestly do not know how to fix a problem I am unable to reproduce.

h2: Validation disabled: Sequential requests (rapid): 4,998 succeeded; 2 failed (99.96% success rate, 0.04% retriable)
Linux ok2c 6.8.0-78-generic #78-Ubuntu SMP PREEMPT_DYNAMIC Tue Aug 12 11:34:18 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.4 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.4 (build 25.342-b07, mixed mode)

@rschmitt
Contributor Author

rschmitt commented Aug 27, 2025

The testSlowSequentialRequests, testOneLargeBatchOfRequests, and testSpacedOutBatchesOfRequests tests still submit requests in a tight loop. The testRapidSequentialRequests test behaves correctly.

I'm not sure what you mean. The tests are all doing what I intend them to do.

openjdk version "1.8.0_342"
OpenJDK Runtime Environment Corretto-8.342.07.4 (build 1.8.0_342-b07)
OpenJDK 64-Bit Server VM Corretto-8.342.07.4 (build 25.342-b07, mixed mode)

I'm really glad you posted this. I'm running with Corretto-8.462.08.1 (build 1.8.0_462-b08). Just for grins, I tried the reproducer with corretto-8.342.07.3, installed via https://github.com/jdx/mise (handy dandy!), and it's not reproducing the problem. I bet if you grab the latest version you'll be able to reproduce the problem. If you install mise, you can do this by running:

mise shell java@corretto-8.342.07.3
./mvnw -Pslow-tests,-use-toolchains

@rschmitt
Contributor Author

I bisected the Corretto 8 releases available from mise. It looks like corretto-8.392.08.1 is the most recent release that does not reproduce the problem. The next available release, corretto-8.402.07.1, exhibits the issue.

@rschmitt
Contributor Author

No obvious smoking gun in the release notes: https://builds.shipilev.net/backports-monitor/release-notes-openjdk8u402.html

@ok2c
Member

ok2c commented Aug 27, 2025

I'm not sure what you mean.

@rschmitt Okay. I will have to be blunt, then.

The tests are all doing what I intend them to do.

Then, in their present form the tests are meaningless for HTTP/1.1 due to request pipelining. There is no point in committing them in their present shape.

Some do:

                final Future<SimpleHttpResponse> f = sendPing(client);
                try {
                    f.get();
                } catch (final ExecutionException ignore) {
                }
                futures.add(f);

which is correct.

Others do:

                futures.add(sendPing(client));

which is wrong in the given context due to the use of request pipelining with the HTTP/1.1 transport.

I bet if you grab the latest version you'll be able to reproduce the problem.

I will get the latest Java 8 JRE and will try to reproduce the defect with it.

@rschmitt
Contributor Author

rschmitt commented Aug 27, 2025

Oh, duh, I see your point now. You're saying that the HTTP/1.1 tests think they're testing a race between connection leasing and half-close, but they're actually testing a race within the duplexer, between the write path (send the next pipelined request) and the read path (read and process connection closure). And that's not a meaningful race, except maybe for determining which exception ends up getting thrown.

What I don't understand, then, is why the use of StaleCheckCommand showed such an improvement when combined with greedy reads in the HTTP/1.1 duplexer (i.e. read the input buffer until it returns 0 or -1). If futures.add(sendPing(client)) causes the request to immediately be pipelined on an assigned connection, then StaleCheckCommand shouldn't make a difference, but in fact it makes a dramatic difference.

When the connection pool is full, does the async client ever just wait for a connection to become available, instead of pipelining the request onto an existing connection? Alternatively, is there logic within the client (independent of any RetryStrategy) that can take unsent pipelined requests and reassign them to another connection?

I'll also check and see whether I can replicate my earlier experiment.

@ok2c
Member

ok2c commented Aug 28, 2025

When the connection pool is full, does the async client ever just wait for a connection to become available, instead of pipelining the request onto an existing connection?

@rschmitt I made a mistake. I was looking at the problem at the core protocol level but I forgot about the connection pool preventing request pipelining at the client level by leasing connections to a single request at a time.

My point still stands, though. The test cases behave inconsistently. This needs to be fixed.

@ok2c
Member

ok2c commented Aug 28, 2025

@rschmitt And no luck with the reproducer:

h2: Validation disabled: Sequential requests (rapid): 4,998 succeeded; 2 failed (99.96% success rate, 0.04% retriable)
h2: Validation enabled:  Sequential requests (rapid): 5,000 succeeded; 0 failed (100.00% success rate)
openjdk version "1.8.0_462"
OpenJDK Runtime Environment Corretto-8.462.08.1 (build 1.8.0_462-b08)
OpenJDK 64-Bit Server VM Corretto-8.462.08.1 (build 25.462-b08, mixed mode)
core: 0e22b407c741f6f1af73bd199abf4e7c329c9df7
client: f176d922df466bf06924f2b19474352141bafa7c

@ok2c
Member

ok2c commented Sep 14, 2025

@rschmitt I can reliably reproduce the lock-up on one of my client's Windows machines. I will be looking into it.

@ok2c
Member

ok2c commented Sep 15, 2025

@rschmitt apache/httpcomponents-core@ac42dec should fix the immediate problem with a race between connection status and command submission. apache/httpcomponents-core@5de2967 should make sure that the #disconnected method of the protocol handler gets executed as late as possible, after the underlying i/o session has been fully closed out.

Both master and 5.3.x no longer get stuck for me due to a command left in limbo by a race with connection shutdown.

h2: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Sequential requests (rapid): 2,464 succeeded; 36 failed (98.56% success rate, 1.44% retriable)
h2: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 30.00% retriable)
h2: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 20.00% retriable)
h2: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

h2c: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Sequential requests (rapid): 2,180 succeeded; 320 failed (87.20% success rate, 12.76% retriable)
h2c: Validation enabled:  Sequential requests (rapid): 2,500 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
h2c: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
h2c: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 26.67% retriable)
h2c: Validation enabled:  Multiple small batches: 15 succeeded; 0 failed (100.00% success rate)

https: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Sequential requests (rapid): 1,305 succeeded; 1,195 failed (52.20% success rate, 47.60% retriable)
https: Validation enabled:  Sequential requests (rapid): 1,316 succeeded; 1,184 failed (52.64% success rate, 47.32% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
https: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
https: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 20.00% retriable)
https: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 6.67% retriable)

http: Validation disabled: Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation enabled:  Sequential requests (slow): 10 succeeded; 0 failed (100.00% success rate)
http: Validation disabled: Sequential requests (rapid): 1,313 succeeded; 1,187 failed (52.52% success rate, 38.68% retriable)
http: Validation enabled:  Sequential requests (rapid): 1,332 succeeded; 1,168 failed (53.28% success rate, 38.56% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

try (final CloseableHttpAsyncClient client = asyncClient(validateConnections)) {
    final List<Future<SimpleHttpResponse>> futures = new ArrayList<>();
    for (int i = 0; i < 2500; i++) {
        final Future<SimpleHttpResponse> f = sendPing(client);
Member

@rschmitt If you want my OK to merge this change-set, please help me understand why you do

                final Future<SimpleHttpResponse> f = sendPing(client);
                try {
                    f.get();
                } catch (final ExecutionException ignore) {
                }
                futures.add(f);

in this instance and do

                futures.add(sendPing(client));

in other cases?

Contributor Author

The common idea in these tests is to submit batches of concurrent requests and then wait for them to finish:

| Test case | Batch size | # of batches | Between batches |
| --- | --- | --- | --- |
| testSlowSequentialRequests | 1 | 10 | Sleep |
| testRapidSequentialRequests | 1 | 2500 | Block |
| testOneLargeBatchOfRequests | 30 | 1 | N/A |
| testSpacedOutBatchesOfRequests | 3 | 5 | Sleep |

testRapidSequentialRequests could Thread.sleep(25) instead, but blocking on the future is faster. Similarly, the other requests could block on all the futures in a batch, instead of simply sleeping in order to allow the client to quiesce.
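The submit-then-drain pattern described above can be sketched with plain JDK futures. This is an illustration only; `sendPing()` here is a stand-in stub, not the real test helper from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the batching idea: submit a batch of requests concurrently,
// then block until the whole batch completes before starting the next one.
public class BatchSketch {

    // Stand-in for the PR's sendPing(client) helper.
    static CompletableFuture<String> sendPing() {
        return CompletableFuture.supplyAsync(() -> "pong");
    }

    static List<String> runBatches(final int batches, final int batchSize) {
        final List<String> results = new ArrayList<>();
        for (int b = 0; b < batches; b++) {
            final List<CompletableFuture<String>> batch = new ArrayList<>();
            for (int i = 0; i < batchSize; i++) {
                batch.add(sendPing());               // submit the whole batch concurrently
            }
            for (final CompletableFuture<String> f : batch) {
                results.add(f.join());               // wait for the batch to drain
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // 5 batches of 3 requests each, i.e. 15 exchanges in total.
        System.out.println(runBatches(5, 3).size()); // 15
    }
}
```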

Member

testRapidSequentialRequests could Thread.sleep(25) instead, but blocking on the future is faster.

@rschmitt I still do not understand why all the tests could not behave consistently, or why a faster future is better than Thread.sleep(25), but so be it.

This isn't a test case so much as a reproducer that tries to force a
race between client-side connection reuse and server-initiated
connection closure on various protocols. Since it's fairly slow and
resource-intensive, I have marked it with `@Tag("slow")`. The test can
be run directly from an IDE, or from the command line using the
`slow-tests` profile:

  ./mvnw -Pslow-tests
@rschmitt
Contributor Author

I verified that apache/httpcomponents-core@ac42dec and apache/httpcomponents-core@5de2967 fix the lock-up on my machine. Additionally, I tested b16d6d032d7e2b7e2f2b0f94c5d01375cb1ab21b, using StaleCheckCommand to implement setValidateAfterInactivity for HTTP/1.1 connections in PoolingAsyncClientConnectionManager. It significantly improves stale connection reuse for HTTPS but has no effect on HTTP. I'm satisfied with this, since most production usage these days is HTTPS. I think it would be ideal if the HTTP/1.1 duplexer implemented greedy reads, but I accept that there are other reasons why you don't want to do that.

https: Validation disabled: Sequential requests (rapid): 2,450 succeeded; 50 failed (98.00% success rate, 1.96% retriable)
https: Validation enabled:  Sequential requests (rapid): 2,499 succeeded; 1 failed (99.96% success rate, 0.00% retriable)
https: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 50.00% retriable)
https: Validation enabled:  Single large batch: 30 succeeded; 0 failed (100.00% success rate)
https: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 26.67% retriable)
https: Validation enabled:  Multiple small batches: 14 succeeded; 1 failed (93.33% success rate, 0.00% retriable)

http: Validation disabled: Sequential requests (rapid): 2,475 succeeded; 25 failed (99.00% success rate, 0.76% retriable)
http: Validation enabled:  Sequential requests (rapid): 2,493 succeeded; 7 failed (99.72% success rate, 0.00% retriable)
http: Validation disabled: Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation enabled:  Single large batch: 15 succeeded; 15 failed (50.00% success rate, 0.00% retriable)
http: Validation disabled: Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)
http: Validation enabled:  Multiple small batches: 10 succeeded; 5 failed (66.67% success rate, 0.00% retriable)

@ok2c
Member

ok2c commented Sep 16, 2025

I think it would be ideal if the HTTP/1.1 duplexer implemented greedy reads, but I accept that there are other reasons why you don't want to do that.

@rschmitt Wait a minute. I did make the HTTP/1.1 duplexer read incoming data greedily with apache/httpcomponents-core@be71c3a. What I did not want to do is add an extra read at the end of the message only to make things marginally better for endpoints with broken HTTP/1.1 connection persistence, while adding the overhead of an unnecessary i/o operation for well-behaved ones.
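For reference, the "greedy read" idea under discussion (read the input until it returns 0 or -1) looks roughly like the plain-JDK loop below. This is a sketch of the concept over an in-memory channel, not the actual duplexer code.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.Channels;
import java.nio.channels.ReadableByteChannel;

// Greedy read loop: keep draining the channel until read() reports no data
// (0) or end-of-stream (-1), instead of stopping after a single read.
public class GreedyReadSketch {

    static int drainGreedily(final ReadableByteChannel channel) throws IOException {
        final ByteBuffer buf = ByteBuffer.allocate(8);
        int total = 0;
        int n;
        while ((n = channel.read(buf)) > 0) {  // greedy: loop until 0 or -1
            total += n;
            buf.clear();                       // reuse the buffer for the next read
        }
        return total;
    }

    // Convenience wrapper over an in-memory stand-in for a socket.
    static int drainAll(final byte[] data) {
        try (ReadableByteChannel ch = Channels.newChannel(new ByteArrayInputStream(data))) {
            return drainGreedily(ch);
        } catch (final IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(drainAll(new byte[100])); // 100
    }
}
```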

Member

@ok2c ok2c left a comment

@rschmitt I leave it up to you if you want to merge the change-set in its present state or not.

@rschmitt
Contributor Author

I leave it up to you if you want to merge the change-set in its present state or not.

I'll take another look at it. The test cases can probably be made more consistent (and therefore less confusing) without changing their behavior.
