@kaiburjack kaiburjack commented Nov 5, 2025

This change delays sending the second GOAWAY (carrying the last seen stream ID) after the first GOAWAY (carrying the maximum stream ID). The delay mitigates a race in which the client has already buffered frames for new streams in the window between the server's first, graceful GOAWAY frame and its second, final GOAWAY.

When the second GOAWAY frame reaches the client, the client's HTTP/2 implementation may either signal a stream reset (rejected stream) error condition to the client application, or still send the frames for the now-forbidden new streams to the server, where they are then rejected.
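
To make the two frames concrete, here is a minimal sketch of how they differ on the wire, following the GOAWAY layout in RFC 9113 (9-byte frame header, then last stream ID and error code). The class and method names are illustrative and are not Tomcat's API; the stream ID 41 is an arbitrary example value.

```java
import java.nio.ByteBuffer;

// Illustrative only: not Tomcat's implementation.
final class GoAwayFrames {
    static final int TYPE_GOAWAY = 0x7;
    static final int NO_ERROR = 0x0;
    static final int MAX_STREAM_ID = 0x7FFFFFFF; // 2^31 - 1

    /** Encodes a GOAWAY frame without debug data: 9-byte header + 8-byte payload. */
    static byte[] encodeGoAway(int lastStreamId, int errorCode) {
        ByteBuffer buf = ByteBuffer.allocate(9 + 8); // big-endian by default
        buf.put((byte) 0).put((byte) 0).put((byte) 8); // 24-bit payload length
        buf.put((byte) TYPE_GOAWAY);                   // frame type
        buf.put((byte) 0);                             // flags (none defined)
        buf.putInt(0);                                 // GOAWAY is sent on stream 0
        buf.putInt(lastStreamId & 0x7FFFFFFF);         // reserved bit cleared
        buf.putInt(errorCode);
        return buf.array();
    }

    public static void main(String[] args) {
        // First, graceful GOAWAY: announces shutdown but rejects nothing yet.
        byte[] first = encodeGoAway(MAX_STREAM_ID, NO_ERROR);
        // Second, final GOAWAY: streams above the given ID will not be processed.
        byte[] second = encodeGoAway(41, NO_ERROR);
        System.out.println(first.length + " bytes, " + second.length + " bytes");
    }
}
```

Any stream the client opens with an ID above the value in the second frame is not processed, which is why the timing of that frame matters.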

Currently, Tomcat's Http2UpgradeHandler delays the second GOAWAY only by the most recently measured round-trip time (RTT), which may be just a few microseconds. This follows the suggestion in RFC 9113 to allow "at least one round-trip time". In practice, one RTT is often not enough: network conditions, server load, and imprecise timers (caused by OS thread scheduling delays or hypervisor-limited CPU credits, for example Google Cloud Platform E2 machine types with scheduler throttling) mean that the client may already have buffered writes for new streams/requests before it had a chance to see either the first or the final GOAWAY sent by Tomcat.
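
As a rough illustration of why a pure RTT-based delay can be too short, the sketch below treats the measured RTT as a lower bound and applies a configured minimum on top of it. This is only one plausible policy; how the patch actually combines the two values is not shown here, and the names `finalGoAwayDelay` and `minimumDrainDelay` are hypothetical.

```java
import java.time.Duration;

// Illustrative only: one possible way to pick the wait before the final GOAWAY.
final class GoAwayDelay {
    /**
     * Returns the measured RTT, but never less than the configured minimum.
     * Assumption: both arguments are non-null and non-negative.
     */
    static Duration finalGoAwayDelay(Duration measuredRtt, Duration minimumDrainDelay) {
        return measuredRtt.compareTo(minimumDrainDelay) >= 0 ? measuredRtt : minimumDrainDelay;
    }

    public static void main(String[] args) {
        // An RTT of 200 microseconds on a fast local network is far shorter
        // than typical scheduler jitter, so the configured minimum wins.
        Duration rtt = Duration.ofNanos(200_000);
        Duration floor = Duration.ofSeconds(5);
        System.out.println(finalGoAwayDelay(rtt, floor));
    }
}
```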

Therefore, we implement the same behaviour as the drain_timeout setting of the Envoy proxy's HTTP Connection Manager: after sending the first, graceful GOAWAY, delay the second, final GOAWAY long enough for the client to react to the first one, stop opening new streams, and close the connection on its side first.
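
The drain sequence described above can be sketched as follows. This is a simplified model, not the code in this patch: `GracefulDrain`, `goAwaySender` and `lastSeenStreamId` are hypothetical names, and the sender is a plain callback standing in for whatever writes the frame to the connection.

```java
import java.time.Duration;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.IntConsumer;
import java.util.function.IntSupplier;

// Illustrative only: models the two-step GOAWAY with a configurable drain delay.
final class GracefulDrain {
    static final int MAX_STREAM_ID = 0x7FFFFFFF; // 2^31 - 1

    private final ScheduledExecutorService scheduler;
    private final IntConsumer goAwaySender;     // hypothetical: sends GOAWAY with this last stream ID
    private final IntSupplier lastSeenStreamId; // hypothetical: highest stream ID accepted so far

    GracefulDrain(ScheduledExecutorService scheduler, IntConsumer goAwaySender,
                  IntSupplier lastSeenStreamId) {
        this.scheduler = scheduler;
        this.goAwaySender = goAwaySender;
        this.lastSeenStreamId = lastSeenStreamId;
    }

    /** Sends the graceful GOAWAY now and schedules the final one after the drain delay. */
    ScheduledFuture<?> begin(Duration drainDelay) {
        goAwaySender.accept(MAX_STREAM_ID);
        // The last seen stream ID is read when the timer fires, not when it is
        // scheduled, so streams accepted during the drain window are still covered.
        return scheduler.schedule(
                () -> goAwaySender.accept(lastSeenStreamId.getAsInt()),
                drainDelay.toMillis(), TimeUnit.MILLISECONDS);
    }

    /** Small demo: records the stream IDs of the GOAWAY frames in sending order. */
    static List<Integer> demo(long drainMillis) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        List<Integer> sent = new CopyOnWriteArrayList<>();
        try {
            GracefulDrain drain = new GracefulDrain(scheduler, id -> sent.add(id), () -> 7);
            drain.begin(Duration.ofMillis(drainMillis)).get(); // wait for the final GOAWAY
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            scheduler.shutdown();
        }
        return sent;
    }

    public static void main(String[] args) {
        System.out.println(demo(50));
    }
}
```

A real implementation would also need to cancel the scheduled task if the client closes the connection during the drain window, which is the outcome the delay is meant to allow.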

In tests with an Envoy Proxy 1.36.2 client against the patched Tomcat server, all 503 errors that Envoy generated due to HTTP/2 stream reset errors vanished completely during a Kubernetes rolling update of a deployment. Before the patch, we saw occasional 503 responses caused by stream resets when Tomcat sent the final GOAWAY frame too quickly.

Bugzilla Enhancement Ticket: https://bz.apache.org/bugzilla/show_bug.cgi?id=69870
