Add drainTimeout property to Http2Protocol/Http2UpgradeHandler #917
+28
−1
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is to delay sending the second GOAWAY (with last seen stream id) after sending the first GOAWAY (with max stream id), in order to mitigate a race between the client already having buffered frames for new streams after the server has sent the first graceful GOAWAY frame and the second GOAWAY.
Once the second GOAWAY frame reaches the client, its own HTTP/2 client implementation may then itself signal a stream reset (rejected stream) error condition to the client application or might send the new forbidden stream frames to the server, where it then would be rejected.
Currently, Tomcat's Http2UpgradeHandler uses the last computed/determined Round-Trip-Time (RTT) and delays only by that amount (which might only be a few microseconds), following the suggestion of RFC 9113 of using "at least one round-trip time". However, due to network conditions and server load and imprecise timers due to OS thread scheduling delays or hypervisor-limited CPU credits (think Google Cloud Platform E2 machine types with scheduler throttling) one RTT might race with the client already having buffered up writes for new streams/requests before it had the chance to see either the first or the last GOAWAY sent by Tomcat.
Therefore, we implement the same behaviour as provided by the Envoy proxy's HTTP Connection Manager drain_timeout: Delay sending the second final GOAWAY after sending the first graceful GOAWAY to give the client enough time to react to the first GOAWAY and stop sending new streams and close the connection on its side first.
Testing this with an Envoy Proxy 1.36.2 client, when using the patched Tomcat server, all 503 errors generated by Envoy due to HTTP/2 stream reset errors completely vanished when doing a Kubernetes rolling update of a deployment. Before, we saw occasional 503 responses due to stream resets when Tomcat would return the final GOAWAY frame too fast.
Bugzilla Enhancement Ticket: https://bz.apache.org/bugzilla/show_bug.cgi?id=69870