Skip to content

Conversation

dweiss
Copy link
Contributor

@dweiss dweiss commented Sep 17, 2025

This moves more gradle script plugins to plain Java, also updating some of them to not use deprecated APIs.

It also removes apache rat dependency (rewritten to a custom, incremental task, #15185). I've removed owasp support entirely (#15114).

I've removed the code that checks that the license precedes opening tag in XMLs. I don't see the benefit of doing this here. Maybe we can do this on the CI with ast-grep (it doesn't support xml out of the box though).

Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.


private record LicenseFamily(String code, String name, Predicate<String> matcherPredicate) {}

static final List<LicenseFamily> LUCENE_ACCEPTED_LICENSES =
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the list of allowed licenses and their fixed-string detection patterns.


private LicenseFamily detectLicense(File file, StringBuilder buffer, char[] scratch)
throws IOException {
// I assume all files are in UTF8... This is verified elsewhere (eclint).
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume all files are in UTF8... This is verified elsewhere (eclint)

Copy link
Contributor

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@dweiss
Copy link
Contributor Author

dweiss commented Sep 17, 2025

Before:
image
After:
image

It's also incremental so once it runs, the cost of subsequent checks is close to zero.

project.fileTree(
".",
tree -> {
// Exclude build outputs, ide files, .git.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the future could we instead please consider using git ls-files for this?

I am afraid of when these long long lists of hardcoded excludes fall out of sync with .gitignore files, then checks become slow, or have false failures, or both.

I am only referring to .git/build/etc exclusions which should not be necessary. In 2025 all checkers should respect .gitignore :) We should only have the "real" excludes here particular to the specific check (things in source control)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the thing i am highly suspicious of: is this project.fileTree. Potential forbidden-apis candidate.

Some old build processes got slower over months/years in exactly this way. This can go unnoticed, for example if they only descend into a python venv, because there are often thousands and thousands of directories files in there. They don't have to actually match any of the files to slow everything down, that's a big part of the trap.

Maybe there is a gradle or java plugin we could use that behaves like https://docs.rs/ignore/latest/ignore/ or we can just use git ls-files.

Sorry for the rant, I see this as the number one cause of slow checkers these days, and it is easy to avoid.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are two downsides I see - one is that it'd no longer work on a source-only package (without .git). The other is that when you run a check but haven't added your files to git yet, it won't see them. I understand what you're after but it's a double edged sword - you then need to remember to git add (or commit) before you run the checks.

Definitely worth a thought though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First problem is solved by passing some additional parameters. For example, try git ls-files -c -o --exclude-standard.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second problem is solved by running git init before running the checks. It hurts nothing if there is already a repo initialized (and documents this fact)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense - thanks! I think it's doable but I'll keep it for another patch. Makes sense to me though - reuse .gitignore as the source of truth about versioned files.

@dweiss dweiss merged commit 79a98b4 into apache:main Sep 17, 2025
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants