
Conversation

Member

@qqmyers qqmyers commented Sep 16, 2025

What this PR does / why we need it: When a user is given a role on the root collection (or has that role removed), we reindex all the datasets and files within it. Prior to this PR, that was all done in one transaction. This PR breaks the operation into multiple batches in separate transactions, which reduces the time and memory needed to reindex the permissions.

At QDR, on a ~low-memory test machine, reindexing after a role assignment on root was taking ~5-6 minutes. After the PR, reindexing takes ~30-35 seconds. (We have also seen the server slow to a crawl doing this permission reindex (at least 10 times slower) when, presumably, other issues have caused very low memory conditions. We're hopeful that batching the work here will also lower the overall memory requirements for reindexing perms on root, so that it contributes less to, and is less affected by, very low memory conditions.)

Which issue(s) this PR closes:

I thought there was an open issue, but I've only found #10697, #10698.

  • Closes #

Special notes for your reviewer: By default, marking a method as requiring a new transaction does nothing if the method is called from within the same bean. However, by "self-injecting" the bean into itself and then calling the method on the injected proxy, the transaction attribute is honored. I've used this technique here (suggested by AI). The alternative is to break the bean into parts and make calls between them.
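The proxy mechanics behind this can be illustrated outside a container. The sketch below is illustrative only (not Dataverse code; all class and method names are hypothetical): a `java.lang.reflect.Proxy` plays the role of the EJB container's interceptor. A call made through `this` bypasses the interceptor entirely, while a call made through the self-injected proxy reference triggers it, which is why `REQUIRES_NEW` only takes effect on proxied calls.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// Hypothetical interface standing in for the indexing bean's business interface.
interface IndexService {
    void reindexAll();
    void reindexBatch(int batch);
}

class IndexServiceImpl implements IndexService {
    // In a real bean this field would be @EJB self-injected by the container;
    // here we wire the proxy in manually after constructing it.
    IndexService self;

    public void reindexAll() {
        for (int b = 0; b < 3; b++) {
            // this.reindexBatch(b);  // plain call: bypasses the proxy, no new "transaction"
            self.reindexBatch(b);     // proxied call: the interceptor below runs
        }
    }

    public void reindexBatch(int batch) {
        System.out.println("reindexing batch " + batch);
    }
}

public class SelfInjectionDemo {
    public static void main(String[] args) {
        IndexServiceImpl impl = new IndexServiceImpl();
        // The dynamic proxy mimics the container wrapping each reindexBatch
        // call in its own transaction.
        IndexService proxy = (IndexService) Proxy.newProxyInstance(
                IndexService.class.getClassLoader(),
                new Class<?>[]{IndexService.class},
                (Object p, Method m, Object[] a) -> {
                    boolean isBatch = m.getName().equals("reindexBatch");
                    if (isBatch) {
                        System.out.println("[container] begin new transaction");
                    }
                    Object result = m.invoke(impl, a);
                    if (isBatch) {
                        System.out.println("[container] commit transaction");
                    }
                    return result;
                });
        impl.self = proxy;
        // Each of the three batches is bracketed by begin/commit; switching
        // the loop body to this.reindexBatch(b) would drop the brackets.
        proxy.reindexAll();
    }
}
```

This is the same reason the spec-compliant alternative is splitting the bean: only calls that cross a container-managed proxy boundary get interceptor (and transaction) semantics applied.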

Suggestions on how to test this:
Run the /api/admin/index/status command to confirm there are no indexing issues to start. (Run the /api/admin/index/clear-orphans command if needed to remove permission docs in Solr that aren't in the database.)

Go to the root dataverse and assign/revoke a role with FINE logging enabled on this class. Check the logs for the run time. Then do the same thing after the PR and confirm it's significantly faster.

Verify that there are no issues (or no new issues) reported by /api/admin/index/status.

[update: I have already tested the PR on our prod. database clone, see my comment below - L.A.]

Does this PR introduce a user interface change? If mockups are available, please link/include them here:

Is there a release notes update needed for this change?:

Additional documentation:

@qqmyers qqmyers moved this to Ready for Triage in IQSS Dataverse Project Sep 16, 2025
@qqmyers qqmyers added the Size: 3 A percentage of a sprint. 2.1 hours. label Sep 16, 2025
@qqmyers qqmyers added this to the 6.9 milestone Sep 16, 2025
@landreev landreev self-requested a review September 16, 2025 20:50
@landreev landreev moved this from Ready for Triage to In Review 🔎 in IQSS Dataverse Project Sep 17, 2025
@scolapasta scolapasta moved this from In Review 🔎 to Reviewed but Frozen ❄️ in IQSS Dataverse Project Sep 18, 2025
Contributor

@landreev landreev left a comment

Scary that we've been reindexing potentially huge numbers of datasets in one transaction here. I will try to tweak a role assignment on root on the perf system, and measure before and after.

@github-project-automation github-project-automation bot moved this from Reviewed but Frozen ❄️ to Ready for QA ⏩ in IQSS Dataverse Project Sep 18, 2025
@scolapasta scolapasta moved this from Ready for QA ⏩ to Reviewed but Frozen ❄️ in IQSS Dataverse Project Sep 22, 2025
@cmbz cmbz added FY26 Sprint 6 FY26 Sprint 6 (2025-09-10 - 2025-09-24) FY26 Sprint 7 FY26 Sprint 7 (2025-09-24 - 2025-10-08) labels Sep 24, 2025
@coveralls

Coverage Status

coverage: 23.541% (+0.002%) from 23.539%
when pulling ef5065c on QualitativeDataRepository:QDR-perm-reindex-in-mult-transactions
into f79a02b on IQSS:develop.

@landreev
Contributor

Testing the "before" part on the perf system - i.e., watching the root datasets/sub-collections being reindexed under 6.8-RC.
The rate it's going at is quite abysmal.

@landreev
Contributor

I tested the PR on the clone of the IQSS prod. db last week. This IS a huge potential problem for us.
I tried assigning a role on the root collection (30+K datasets to reindex).
With the develop build (6.8 release candidate), I had to give up on completing the reindexing. It went through ~6.1 K datasets in 24 hrs (!), and it was only getting slower (transaction context getting larger?). It started with ~1 sec./dataset:

  Progress : dataset 10650261 permissions reindexed in 1073 ms]]
  Progress : dataset 11704968 permissions reindexed in 1056 ms]]
  Progress : dataset 10650466 permissions reindexed in 1044 ms]]

and it was 10 sec. on average by the end of it:

  Progress : dataset 7437697 permissions reindexed in 9936 ms]]
  Progress : dataset 7438415 permissions reindexed in 10170 ms]]
  Progress : dataset 7438455 permissions reindexed in 9939 ms]]

That time also fluctuated significantly throughout the run (on account of the number of files in each dataset, I'm assuming), with a max. time of 162 sec. (!) recorded:

  Progress : dataset 6646655 permissions reindexed in 161889 ms]]

There are 40+ K files in the dataset above. For anyone who wants to nerd out over the exact numbers, the reindexing times for every dataset have been uploaded.

In short: this is clearly unacceptable. In addition to taking forever, it is entirely possible that it would eventually paralyze the db. Changing assignments on root may not be a common real-life scenario*. But we do have a real (non-harvested) collection with about 6K child datasets, and it is very common for a new user or group to be given an admin, curator, etc. role on an active collection. (* We did have to make an assignment on root in recent memory, giving members of a Shibboleth group of trusted schools some extra rights.)

Testing with this PR, the reindex ran at a constant speed and was completed in 100 minutes. (!!)
Thank you, Jim.
reindex_after_roleassignment_6.8-RC.txt

@pdurbin pdurbin moved this from Reviewed but behind develop ⬅️ to Ready for QA ⏩ in IQSS Dataverse Project Sep 29, 2025
@cmbz cmbz added the FY26 Sprint 8 FY26 Sprint 8 (2025-10-08 - 2025-10-22) label Oct 8, 2025
@ofahimIQSS ofahimIQSS self-assigned this Oct 14, 2025
@ofahimIQSS ofahimIQSS moved this from Ready for QA ⏩ to QA ✅ in IQSS Dataverse Project Oct 14, 2025
@ofahimIQSS
Contributor

Test on dataverse-internal (v6.8, Payara 6.2025.3) — PR #11822

Scope: exercised “permission reindex after role change on root”.

Root DV: id=1. Roles (root-scoped) include admin (1), member (8), fullContributor (3), etc.

Current assignments on root already include:

:authenticated-users → fullContributor (roleId=3) [id=1834]

@kc032221 → member (8) [id=1835] and admin (1) [id=16073]

@dataverseAdmin → admin (1) [id=1]

Attempt to add another assignment via API returned:

400 Bad Request: "Can't find role named 'null'" (alias/roleId parsing quirk when posting to /api/dataverses/1/assignments).

Despite the above, indexing is responsive:

/api/admin/index/status → "Index Status Batch Job initiated, check log for job status."
No errors surfaced; instance remained responsive.

Result: No regressions observed. Root has existing assignments sufficient to trigger the PR’s path; index status shows batch job initiation as expected. Recommend a small follow-up to harden the assignments API against alias/roleId ambiguity (avoid "role named 'null'" when roleId is supplied).

Merging

@ofahimIQSS ofahimIQSS merged commit 7c476e2 into IQSS:develop Oct 16, 2025
14 checks passed
@github-project-automation github-project-automation bot moved this from QA ✅ to Merged 🚀 in IQSS Dataverse Project Oct 16, 2025
@ofahimIQSS ofahimIQSS removed their assignment Oct 16, 2025
@scolapasta scolapasta moved this from Merged 🚀 to Done 🧹 in IQSS Dataverse Project Oct 16, 2025


Projects

Status: Done 🧹

Development


5 participants