
Conversation

@sgup432 (Contributor) commented Aug 26, 2025

Description

Related issue - #15097

Instead of using the default 1024 bytes for query size, we use RamUsageEstimator to calculate an approximate size. As seen in the issue above, using 1024 as a default can cause problems like heap exhaustion when it underestimates, or fewer queries getting cached when it overestimates.

RamUsageEstimator.sizeOf() is usually pretty fast: I made a sample BooleanQuery with 15 clauses and a size of around 3 KB, and it took around 14,959 ns to calculate its size via RamUsageEstimator.sizeOf().
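
To make the description concrete, here is a rough sketch of what the cache's size accounting looks like with this change. This is my illustration, not the PR's exact diff: the standalone class and the HASHTABLE_RAM_BYTES_PER_ENTRY constant are stand-ins for the real logic inside LRUQueryCache.getRamBytesUsed() (the actual changed lines appear in the review excerpt further down).

import org.apache.lucene.search.Query;
import org.apache.lucene.util.Accountable;
import org.apache.lucene.util.RamUsageEstimator;

// Illustrative stand-in for the cache's per-entry accounting.
final class QuerySizeEstimate {
  // Hypothetical per-entry overhead constant, standing in for
  // LRUQueryCache's LINKED_HASHTABLE_RAM_BYTES_PER_ENTRY.
  static final long HASHTABLE_RAM_BYTES_PER_ENTRY = 4L * RamUsageEstimator.NUM_BYTES_OBJECT_REF;

  static long getRamBytesUsed(Query query) {
    if (query instanceof Accountable) {
      // Queries that already track their own memory usage report it directly.
      return HASHTABLE_RAM_BYTES_PER_ENTRY + ((Accountable) query).ramBytesUsed();
    }
    // Before this PR: a flat 1024-byte default for non-accountable queries.
    // With this PR: walk the query and estimate its actual footprint.
    return HASHTABLE_RAM_BYTES_PER_ENTRY + RamUsageEstimator.sizeOf(query, 0);
  }
}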

github-actions bot commented:

This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog label to it and you will stop receiving this reminder on future updates to the PR.

@sgup432 sgup432 changed the title Use RamUsageEstimator to estimate query size instead of defaulting to 1024 bytes for non-accountable queries. [QueryCache] Use RamUsageEstimator instead of defaulting to 1024 bytes for non-accountable query size Aug 26, 2025
@sgup432 (Contributor Author) commented Aug 26, 2025

@msfroh Would you mind taking a look at this?


Comment on lines 428 to 429
long queryRamBytesUsed = RamUsageEstimator.sizeOf(query, 0);
return LINKED_HASHTABLE_RAM_BYTES_PER_ENTRY + queryRamBytesUsed;
Member commented:

I realize you did a small benchmark, but real-world performance is still a concern of mine here.

I am not sure how to benchmark this, but my concern is that non-accountable queries just got more expensive; we calculate the memory used for a query on both caching and eviction.

@sgup432 (Contributor Author) commented Aug 27, 2025

Yeah, I just timed one 3 KB boolean query.

I am not sure how to benchmark this, but my concern is that non-accountable queries just got more expensive

The only way I see is to micro-benchmark RamUsageEstimator with different query types and sizes; that might give a better idea (a rough sketch of such a benchmark is below). I had tried to do the same, but only with term and boolean queries.

but my concern is that non-accountable queries just got more expensive

I am not sure whether it would be that expensive (at least in absolute terms). For TermQuery and similarly cheap queries, RamUsageEstimator should be pretty fast, considering we already cache ramBytesUsed for the underlying terms etc. In my opinion, complex BooleanQueries with many nested clauses might be more expensive, as we need to visit all the underlying children for the size calculation. To handle that, we already have a limit of 16 clauses beyond which we cannot cache, and it can be refined further if needed.

I don't see any other good way of calculating query size until we implement Accountable for all queries, but that would be too intrusive for just the caching use case.
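
For illustration, a micro-benchmark along those lines could look roughly like this. This is a JMH sketch added for clarity, not code from the PR; the class and method names are made up.

import java.util.concurrent.TimeUnit;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.RamUsageEstimator;
import org.openjdk.jmh.annotations.*;

// Hypothetical JMH benchmark comparing RamUsageEstimator.sizeOf() cost per query shape.
@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.NANOSECONDS)
public class QuerySizeOfBenchmark {
  Query termQuery;
  Query booleanQuery;

  @Setup
  public void setup() {
    termQuery = new TermQuery(new Term("foo", "bar"));
    BooleanQuery.Builder bq = new BooleanQuery.Builder();
    for (int i = 0; i < 15; i++) {
      bq.add(new TermQuery(new Term("foo", "bar" + i)), BooleanClause.Occur.FILTER);
    }
    booleanQuery = bq.build();
  }

  @Benchmark
  public long sizeOfTermQuery() {
    return RamUsageEstimator.sizeOf(termQuery, 0);
  }

  @Benchmark
  public long sizeOfBooleanQuery() {
    return RamUsageEstimator.sizeOf(booleanQuery, 0);
  }
}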

@sgup432 (Contributor Author) commented:

@benwtrent So I wrote a simple code to micro-benchmark putIfAbsent method which is the one which calls getRamBytesUsed(Query query) during caching and eviction.
Created a cache with MAX_SIZE = 10000 and MAX_SIZE_IN_BYTES = 1048576, and created N sample boolean queries and the logic looks like something below

for (int i = 0; i < MAX_SIZE; i++) {
  TermQuery must = new TermQuery(new Term("foo", "bar" + i));
  TermQuery filter = new TermQuery(new Term("foo", "quux" + i));
  TermQuery mustNot = new TermQuery(new Term("foo", "foo" + i));
  BooleanQuery.Builder bq = new BooleanQuery.Builder();
  bq.add(must, BooleanClause.Occur.FILTER);
  bq.add(filter, BooleanClause.Occur.FILTER);
  bq.add(mustNot, BooleanClause.Occur.MUST_NOT);
  queries[i] = bq.build();
}

JMH method to test a 100% write workload with 10 threads:

@Benchmark
@Group("concurrentPutOnly")
@GroupThreads(10)
public void testConcurrentPuts() {
  int random = ThreadLocalRandom.current().nextInt(MAX_SIZE);
  queryCache.putIfAbsent(
      queries[random], this.sampleCacheAndCount, cacheHelpers[random & (SEGMENTS - 1)]);
}

Baseline numbers

Benchmark                               Mode  Cnt        Score       Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt   15  4102080.220 ± 80816.546  ops/s

My changes

Benchmark                               Mode  Cnt       Score       Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt   15  901925.345 ± 38059.230  ops/s

So it became ~4.5x slower, and there were a lot of evictions as well. This is probably one of the worst-case scenarios, though: a write-only workload with boolean queries. For a mixed read/write workload or simple filter queries, the impact might be much smaller.
And this is the method profile (taken from JFR). Kind of expected.

[Screenshot: JFR method profile, 2025-09-17]

I don't know if there is a way around this, or whether we can further improve the query-visitor logic itself.

@sgup432 (Contributor Author) commented Sep 18, 2025

Seems like I was doing

long queryRamBytesUsed = RamUsageEstimator.sizeOf(query, 0);

The signature is long sizeOf(Query q, long defSize), where defSize represents the shallow size of the query. If 0 is passed, the shallow size is computed at runtime, which was doing a lot of object allocations (driving up CPU%) and therefore adding to latency.

I changed it to RamUsageEstimator.sizeOf(query, 32); here 32 is a rough shallow size in bytes for a term/boolean query, which I calculated manually by calling that method.

After running with the above change, performance improved but is still ~2.5x slower than the baseline.

Benchmark                               Mode  Cnt        Score        Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt   15  1608680.127 ± 229218.026  ops/s
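
As an aside, a constant like the hard-coded 32 could presumably be derived once rather than measured by hand. The sketch below is my illustration of that idea, not part of the PR; the class and field names are made up.

import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.util.RamUsageEstimator;

// Illustrative: compute rough shallow sizes once and reuse them as the defSize
// argument, instead of hard-coding 32 or passing 0 (which recomputes per call).
final class ShallowQuerySizes {
  static final long TERM_QUERY_SHALLOW =
      RamUsageEstimator.shallowSizeOfInstance(TermQuery.class);
  static final long BOOLEAN_QUERY_SHALLOW =
      RamUsageEstimator.shallowSizeOfInstance(BooleanQuery.class);
  // A rough default for when the concrete query type isn't known.
  static final long DEFAULT_QUERY_SHALLOW =
      Math.max(TERM_QUERY_SHALLOW, BOOLEAN_QUERY_SHALLOW);
}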

Member commented:

I wonder if the query cache should "cache" the query sizes? It seems we do this more expensive calculation twice for no reason. Why not keep track of the query size in the cache as well, only calculating it when adding the entry?
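
Roughly, that idea could look like the sketch below. This is my illustration with hypothetical names and a standalone map; the PR ends up storing the size in the cache's existing map instead, and the real cache is guarded by its own lock rather than a ConcurrentHashMap.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.lucene.search.Query;
import org.apache.lucene.util.RamUsageEstimator;

// Hypothetical sketch: remember each cached query's estimated size so that
// eviction can reuse it instead of re-running RamUsageEstimator.
final class QuerySizeMemo {
  private final Map<Query, Long> sizes = new ConcurrentHashMap<>();

  // Called when a query is added to the cache: estimate once and remember it.
  long onCache(Query query) {
    return sizes.computeIfAbsent(query, q -> RamUsageEstimator.sizeOf(q, 0));
  }

  // Called when a query is evicted: reuse the remembered size, then forget it.
  long onEvict(Query query) {
    Long size = sizes.remove(query);
    return size != null ? size : RamUsageEstimator.sizeOf(query, 0);
  }
}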

@sgup432 (Contributor Author) commented Sep 18, 2025

I wonder if the query cache should "cache" the query sizes?

Yeah, good point. Let me try that approach as well and update here.
I was able to further improve performance earlier by making changes in BooleanQuery.visit() so that we store clause-level queries explicitly in a list beforehand, which made it 62% faster (overall ~1.5x slower than baseline).

Member commented:

1.5x is REALLY good!

@sgup432 (Contributor Author) commented Sep 18, 2025

Okay, I made the change to cache the query size as well in the existing map.

Baseline (100% write)

Benchmark                               Mode  Cnt        Score        Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt    9  4057602.444 ± 380153.561  ops/s

My changes (100% write) - 25% slower than baseline

Benchmark                               Mode  Cnt        Score        Error  Units
QueryCacheBenchmark.concurrentPutOnly  thrpt    9  3028840.412 ± 483817.248  ops/s

Baseline - 60% read, 40% write (6 read threads, 4 write threads)

Benchmark                                                           Mode  Cnt        Score         Error  Units
QueryCacheBenchmark.concurrentGetAndPuts                           thrpt    9  8074623.026 ± 1229792.902  ops/s

My changes - 60% read, 40% write

Benchmark                                                           Mode  Cnt        Score         Error  Units
QueryCacheBenchmark.concurrentGetAndPuts                           thrpt    9  8262227.277 ± 1224929.417  ops/s

So with 100% writes, which is the worst case for any cache, my changes (adding RamUsageEstimator plus the BooleanQuery.visit() changes) perform 25% worse than the baseline. But this isn't an ideal workload for any cache.

So I ran another benchmark with 60% reads and 40% writes, which is a more realistic distribution for the cache.

Here both are almost equivalent, and no degradation is seen.

Member commented:

so great! I really like this.

@sgup432 (Contributor Author) commented Sep 18, 2025

@benwtrent I have pushed my final changes, let me know how it looks.
On a side note, I am also going to raise a PR soon (based on my idea here) to refactor the query cache to improve performance by at least 2-3x :)

github-actions bot commented:

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the [email protected] list. Thank you for your contribution!

@github-actions github-actions bot added Stale and removed Stale labels Sep 11, 2025
…mpact of repeated visit() calls during RamUsageEstimator.sizeOf()

@github-actions github-actions bot added this to the 11.0.0 milestone Sep 18, 2025