-
Notifications
You must be signed in to change notification settings - Fork 529
QDR: Multiple transactions for permission reindexing #11822
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
QDR: Multiple transactions for permission reindexing #11822
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Scary, that we've been reindexing potentially huge numbers of datasets in one transaction here. I will try to tweak some assignment on root on the perf system, and measure before and after.
…mult-transactions
|
Testing the "before" part on the perf system - i.e., watching the root datasets/sub-collections reindexed under 6.8-RC. |
|
I tested the PR on the clone of the IQSS prod. db last week. This IS a huge potential problem for us. and it was 10 sec. on average by the end of it: That time also fluctuated significantly throughout the run; on account of the number of files in the dataset, I'm assuming. With the max. time of 162 sec. (!) recorded: There are 40+ K files in the dataset above. For anyone who wants to nerd out over the exact numbers, the reindexing times for every dataset have been uploaded. In short: this is clearly unacceptable. In addition to taking forever, it is entirely possible that it would eventually paralyze the db. It may not be a common real life scenario, changing assignments on root *). But we do have a real (non-harvested) collection with about 6K child datasets and it is very common, for a new user or group to be given an admin or curator etc. role on an active collection. *) we had to make an assignment on root too in recent memory, by giving members of a shib. group of trusted schools some extra rights. Testing with this PR, the reindex ran at a constant speed and was completed in 100 minutes. (!!) |
|
Test on dataverse-internal (v6.8, Payara 6.2025.3) — PR #11822 Scope: exercised “permission reindex after role change on root”. Root DV: id=1. Roles (root-scoped) include admin (1), member (8), fullContributor (3), etc. Current assignments on root already include: :authenticated-users → fullContributor (roleId=3) [id=1834] @kc032221 → member (8) [id=1835] and admin (1) [id=16073] @dataverseAdmin → admin (1) [id=1] Attempt to add another assignment via API returned: 400 Bad Request: "Can't find role named 'null'" (alias/roleId parsing quirk when posting to /api/dataverses/1/assignments). Despite the above, indexing is responsive: /api/admin/index/status → "Index Status Batch Job initiated, check log for job status." Result: No regressions observed. Root has existing assignments sufficient to trigger the PR’s path; index status shows batch job initiation as expected. Recommend a small follow-up to harden the assignments API against alias/roleId ambiguity (avoid "role named 'null'" when roleId is supplied). Merging |
What this PR does / why we need it: When a user is given a role on the root collection (or has that role removed), we reindex all the datasets and files within it. Prior to this PR, that was all done in one transaction. This PR breaks the operation into multiple batches in separate transactions. This reduces the time and memory needed for reindex the permissions.
At QDR, on a ~low-memory test machine, reindexing after a role assignment on root was taking ~5-6 minutes. After the PR, reindexing takes ~30-35 seconds. (We have also seen the server slow to a crawl doing this permission reindex (at least 10 times slower) when (presumably) other issues have caused very low memory conditions. We're hopeful that batching the work here will also lower overall memory requirements for reindexing perms on root and help to avoid it contributing to/ be less affected by very low memory).
Which issue(s) this PR closes:
I thought there was an open issue, but I've only found #10697, #10698.
Special notes for your reviewer: By default, marking a method as requiring a new transaction does nothing if the method is being called from within the same bean. However, by "self-injecting" the bean into itself and then calling the method on the injected copy it works. I've used this technique here (suggested by AI). The alternative is to break the bean into parts and make calls between them.
Suggestions on how to test this:
Run the /api/admin/index/status command to confirm there are no indexing issues to start. (Runthe /api/admin/index/clear-orphans command if needed to remove perm docs in solr that aren't in the db).
Go to the root dataverse and assign/revoke a role with fine logging on this class. Check the logs for the run time. Then do the same thing after the PR and confirm it's significantly faster.
Verify that there are no / no new issues reported by /api/admin/index/status .
[update: I have already tested the PR on our prod. database clone, see my comment below - L.A.]
Does this PR introduce a user interface change? If mockups are available, please link/include them here:
Is there a release notes update needed for this change?:
Additional documentation: