[KYUUBI #7192] Fix filestatus not cached #7191

flaming-archer · 2025-09-04T09:13:57Z

Previously, the file status was effectively never cached: its source Hive table is created on every query, and the file status is created along with it, so the key of the cached object gets a different client ID each time and the cache never hits.

This causes two problems:

1. Cache misses, which greatly hurt query performance because fetching file statuses from HDFS is slow.
2. A memory leak: new objects are constantly added to the cache, and because the default cache time is -1, they are never expired, so memory usage keeps growing.
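
To make the failure mode concrete, here is a minimal sketch of how a per-instance client ID in the cache key defeats caching. It is an illustration only, not the connector's code: `ClientId`, `UnstableKey`, `StableKey`, and `simulate` are hypothetical names, and a plain mutable map stands in for the real cache.

```scala
import scala.collection.mutable

// Hypothetical stand-ins, not the connector's real classes.
// ClientId has identity-based equality, like an object created per table instance.
final class ClientId
final case class UnstableKey(client: ClientId, path: String)
final case class StableKey(qualifiedTable: String, path: String)

object CacheKeyDemo {
  /** Runs `queries` lookups and returns (cache entries, simulated HDFS listings). */
  def simulate[K](queries: Int)(newKey: String => K): (Int, Int) = {
    val cache = mutable.Map.empty[K, Seq[String]]
    var listings = 0
    val path = "/warehouse/db/t"
    for (_ <- 1 to queries) {
      // The second argument is evaluated only on a cache miss.
      cache.getOrElseUpdate(newKey(path), { listings += 1; Seq(s"$path/part-00000") })
    }
    (cache.size, listings)
  }

  def main(args: Array[String]): Unit = {
    // A fresh ClientId per query: every lookup misses and the map keeps growing.
    println(simulate(3)(p => UnstableKey(new ClientId, p))) // prints (3,3)
    // A key derived from stable table identity: one miss, then hits.
    println(simulate(3)(p => StableKey("db.t", p)))         // prints (1,1)
  }
}
```

With the unstable key, the entry count grows with the number of queries, which is the leak described above when entries never expire.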

Why are the changes needed?

Improve performance.

How was this patch tested?

Unit tests and Spark SQL queries.

Was this patch authored or co-authored using generative AI tooling?

No.

cache version 2 cache ut change ut

codecov-commenter · 2025-09-09T10:59:34Z

Codecov Report

❌ Patch coverage is 0% with 48 lines in your changes missing coverage. Please review.
✅ Project coverage is 0.00%. Comparing base (ea75fa8) to head (9353f0c).
⚠️ Report is 3 commits behind head on master.

Files with missing lines	Patch %	Lines
...park/connector/hive/read/HiveFileStatusCache.scala	0.00%	42 Missing ⚠️
...kyuubi/spark/connector/hive/HiveTableCatalog.scala	0.00%	4 Missing ⚠️
...uubi/spark/connector/hive/read/HiveFileIndex.scala	0.00%	2 Missing ⚠️

Additional details and impacted files

@@          Coverage Diff           @@
##           master   #7191   +/-   ##
======================================
  Coverage    0.00%   0.00%           
======================================
  Files         695     696    +1     
  Lines       43433   43479   +46     
  Branches     5887    5902   +15     
======================================
- Misses      43433   43479   +46

flaming-archer · 2025-09-10T01:37:48Z

@pan3793 could you please take a look at this?

pan3793 · 2025-09-11T07:34:05Z

...or-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveFileStatusCache.scala

+}
+
+/**
+ * An implementation that caches partition file statuses in memory.


If this code is forked from Spark, please state where it comes from and which version, and briefly explain your modifications and what you expect from them.

pan3793 · 2025-09-11T07:41:59Z

...or-hive/src/main/scala/org/apache/kyuubi/spark/connector/hive/read/HiveFileStatusCache.scala

+import org.apache.spark.util.SizeEstimator
+
+/**
+ * Use [[HiveFileStatusCache.getOrCreate()]] to construct a globally shared file status cache.


TBH, the "globally shared" concept does not match Spark's multi-session architecture. In Kyuubi use cases especially, multiple users may share one Spark application.

I know that many Hive-related instances are globally shared in Spark. Since we are improving this part, let's make it possible for the cache to be session-shared, with a config that allows it to be globally shared.
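
One possible shape for this suggestion, as a rough sketch only: a registry that hands out one cache per session unless a flag asks for a single shared instance. The names `ScopedCacheDemo`, `cacheFor`, `dropSession`, and the `globalShared` flag are hypothetical; no such config key exists in the PR as quoted here, and a `TrieMap` stands in for the real file status cache.

```scala
import scala.collection.concurrent.TrieMap

object ScopedCacheDemo {
  // Hypothetical stand-in for a file status cache: path -> listed files.
  type FileStatusCache = TrieMap[String, Seq[String]]

  private val globalCache: FileStatusCache = TrieMap.empty[String, Seq[String]]
  private val sessionCaches = TrieMap.empty[String, FileStatusCache]

  /** Returns the shared cache when `globalShared` is set, else one cache per session. */
  def cacheFor(sessionId: String, globalShared: Boolean): FileStatusCache =
    if (globalShared) globalCache
    else sessionCaches.getOrElseUpdate(sessionId, TrieMap.empty[String, Seq[String]])

  /** Releases a session's cache so it cannot outlive the session. */
  def dropSession(sessionId: String): Unit = sessionCaches.remove(sessionId)
}
```

In this sketch, two users in the same Spark application get separate caches by default, and dropping a session releases its entries.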

cached fileindex

1da81f2

cache version 2 cache ut change ut

github-actions bot added module:spark module:extensions labels Sep 4, 2025

flaming-archer changed the title ~~fix filestatus not cached~~ [KYUUBI #7192]fix filestatus not cached Sep 4, 2025

flaming-archer changed the title ~~[KYUUBI #7192]fix filestatus not cached~~ [KYUUBI #7192] Fix filestatus not cached Sep 4, 2025

fix failed tests

9353f0c

pan3793 reviewed Sep 11, 2025
