Do not reuse the same refresh token multiple times #1309
Conversation
- The "add a UAA token to HTTP request headers" flow works like this:
- The user makes a request through AbstractReactorOperations -> Operator.
- This adds an "Authorization" header using the Provider#getToken Mono;
if a value is cached it uses that, otherwise it uses whatever flow
is available to request a token.
- If the response is unauthorized, the access token has expired: the
Operator calls Provider#invalidate and then retries the request,
which triggers another #getToken call.
- There was a race condition: when an access_token was cached and
multiple requests used it concurrently, they would all call
AbstractUaaTokenProvider#invalidate and all reuse the same refresh
token. This is an issue when the UAA is configured with non-reusable
refresh tokens (revocable + rotating or unique): only the first
refresh token request succeeds, and all subsequent refresh token
requests fail.
- This PR addresses this by ensuring that the cached refresh token is
removed from the cache right before being used. Any other call to
#invalidate will be a no-op.
- This is NOT a perfect fix, and some smaller-scale race conditions
remain. For example, #invalidate calls refreshTokens.remove and
accessTokens.put sequentially. A concurrent request could call
invalidate, find the refreshTokens cache empty, and then populate
accessTokens through #getToken; in that case there could be a race
and two tokens fetched.
- Re-architecting the whole token logic is too big of a lift for the
project, so we accept that this solution is not perfect - as long as
the issues are recoverable.
- Fixes #1146
Signed-off-by: Daniel Garnier-Moiroux <[email protected]>
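The core of the fix described above can be sketched with plain JDK concurrency. This is a hypothetical illustration, not the actual AbstractUaaTokenProvider code: it assumes an AtomicReference-backed cache, where getAndSet(null) makes "remove right before use" atomic, so that of N concurrent invalidate() calls exactly one observes the refresh token and all others become no-ops.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of "remove the refresh token right before using it".
// Names are invented; this is not the actual AbstractUaaTokenProvider code.
class SingleUseRefreshTokenCache {
    private final AtomicReference<String> refreshToken = new AtomicReference<>();
    final AtomicInteger refreshGrantCalls = new AtomicInteger();

    void store(String token) {
        refreshToken.set(token);
    }

    // Called on a 401: atomically take the cached refresh token. Only the
    // first concurrent caller sees a non-null value; all others no-op
    // instead of reusing the same (single-use) refresh token.
    void invalidate() {
        String taken = refreshToken.getAndSet(null);
        if (taken != null) {
            refreshGrantCalls.incrementAndGet(); // simulate one grant_type=refresh_token request
        }
    }
}

public class InvalidateDemo {
    public static void main(String[] args) throws InterruptedException {
        SingleUseRefreshTokenCache cache = new SingleUseRefreshTokenCache();
        cache.store("rotating-refresh-token");

        // Eight "requests" hit a 401 on the same expired access token at once.
        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(cache::invalidate);
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        // Exactly one refresh grant is sent, no matter how many threads raced.
        System.out.println("refresh grant calls: " + cache.refreshGrantCalls.get()); // prints 1
    }
}
```

This mirrors why only one grant_type=refresh_token request would reach a UAA configured with rotating refresh tokens; the losing callers simply fall through to #getToken and obtain a fresh access token.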
@eaglerainbow here's a different take on #1300, one that leverages the […]. If you could test this code with your own scenarios, that'd be great; I've tested it against your […]. I've spent some time discussing #1146 with a Project Reactor contributor. Note that this is NOT perfect, but achieving perfection here would require a significant rework of the entire […].
Affirm.
@eaglerainbow please take your time, it took me over a month to get to this issue 😅
I just built a first new version of the main project where […]. Having built a second version that no longer contains my workaround, I observe even slightly higher CPU loads (all other environmental conditions are exactly the same). #1146 will not surface right away in my environment. I'll keep the second version running for now; it will take at least a week of runtime before we can be confident that we have caught the bug. @Kehrlann I'll keep you posted on the results.
If I understand correctly: […]
Could you please take a look at the CPU consumption with a build using 5.14, but without our fixes?
That's correct - at least that was my initial impression. For clarity: item 1 of your list was with a […].
Will do, but only towards the end of this week. The background is that I don't want to interrupt the ongoing experiment by stopping the application (and for comparable values, I would have to). Intermediate report: no major outage yet. It first appeared a little more than 24hrs after the start, and the next instance came another 24hrs later (which is the one I have copied here).
Thanks for the detailed report! I'm surprised at the CPU usage: the only change I've added triggers when we invalidate a token, so, worst case, it should only impact the app every X hours on token refresh. I'll think about it. The log error is probably because the refresh token does not rotate in the UAA and expires after 1 day.
It's been a whole week since the installation. I did not have a single hang of the test-promregator during that time frame. That makes me confident that this PR really caught the issue.
If you have already started, don't continue with this.
The yellow bar splits the values at the point in time when I installed the test version (i.e. to the left is the "old version", to the right is the version under test). What I might have seen before is within the blue box in the diagram above. Zooming in there, you can observe quite a series of unusual peaks (note that the way the diagram is calculated implies a certain moving-average effect, so individual readings may even be significantly higher). In my scenario these peaks could have all kinds of different root causes. I would not bet a dollar to a doughnut that this even originates from […].
Yep, sounds like it. As you can auto-recover from there using the password grant again, the error message in the log is a little ugly, but essentially harmless. Given all this, I suggest merging this PR. I am looking forward to a new release, so that I can also officially bring this into promregator and all its users can benefit from it. Thanks for your support!
If merged, this will close #1300
Thank you very much for running a real, live experiment, and for thoroughly reporting the results and observations 🙇 I'm fixing some pipeline issues on my side, and then I'll merge this. I have a couple of PRs to get merged as well before I cut a release, but it shouldn't take too long.
Take your time. The issue has already celebrated its third anniversary, so a couple of days or weeks more or less doesn't make a big difference anymore 😝 |


Description
Fix for #1146. Tested with the "load test" application below, using
logging.level.cloudfoundry-client.token=DEBUG to observe which tokens are being requested. Details in the commit message.
Commit message
Appendix: load test app
This is not optimal and does too many things, but it showcases the bug. It does the following:
- […] uaa identity zone config
- […] test-load- / test-load-* orgs, and a space in each

Run with logging.level.cloudfoundry-client.token=DEBUG to observe when tokens are requested.
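The behavior that load test exercises can also be reproduced with a small, self-contained Java program. All names here are hypothetical and there are no Cloud Foundry or Reactor dependencies: many concurrent workers receive a 401 for the same expired access token and race to use the single cached refresh token; with an atomic take-before-use, only one refresh_token grant is sent per expiry event.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical, dependency-free analogue of the load test: 100 concurrent
// "requests" all receive a 401 for the same expired access token and race
// to use the single cached refresh token.
public class RefreshRaceLoadTest {
    public static void main(String[] args) throws InterruptedException {
        AtomicReference<String> cachedRefreshToken = new AtomicReference<>("rt-1");
        AtomicInteger refreshGrants = new AtomicInteger();
        CountDownLatch start = new CountDownLatch(1);

        ExecutorService pool = Executors.newFixedThreadPool(32);
        for (int i = 0; i < 100; i++) {
            pool.submit(() -> {
                try {
                    start.await(); // release all workers at once to maximize contention
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                // invalidate(): atomically take the token; losers no-op.
                if (cachedRefreshToken.getAndSet(null) != null) {
                    refreshGrants.incrementAndGet(); // would POST grant_type=refresh_token
                }
            });
        }
        start.countDown();
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);

        // Without the fix, many workers would reuse "rt-1" and, against a UAA
        // with rotating refresh tokens, every grant after the first would fail.
        System.out.println("refresh grants sent: " + refreshGrants.get()); // prints 1
    }
}
```

Watching the count stay at 1 under contention is the dependency-free analogue of watching logging.level.cloudfoundry-client.token=DEBUG report a single token request.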