Skip to content

without_files=True doesn't apply as soon as version conflicts happen, leading to performance hit #3528

@itamarst

Description

@itamarst

Environment

Delta-rs version: Git, June 1, 2025


Bug

Let's say you're using without_files=True / require_files=false with the goal of not loading files into memory.

In particular, you have two processes appending at same time. If they don't stomp on each other, all is well, the files don't get loaded into memory, as EagerSnapshot::try_new_with_visitor() will see that require_files is false, and not load them.

However, if they conflict in version numbers, EagerSnapshot.update() is called... and this loads files unconditionally, ignoring the require_files=false the user originally passed in which is respected by EagerSnapshot::try_new_with_visitor():

Compare:

let files = match config.require_files {

to

So now you have much larger memory usage, and also more CPU impact.

At scale, if you have many processes appending, this can have a significant impact on performance. With make develop here's the performance I got with git, compared to a prototype of options 2/3 mentioned below:

Version of code # of processes Ops / sec
Git 8 10
Git 16 7
Git w/prototype fix 8 14
Git w/prototype fix 16 10

(My computer has 12 CPU cores).

How to fix?

Options:

  1. It seems a little weird to have EagerSnapshot that isn't eager. And a bunch of its methods just lie if require_files is false, essentially claiming no files. So one option is to ... have a snapshot trait, and use the appropriate snapshot type depending on whether it's append operation not? This may be major refactoring though.
  2. Change EagerSnapshot.update() to respect require_files: false. I am not sure if this is semantically correct, though.
  3. Change EagerSnapshot.update() to respect require_files: false, but only when appending.

As you can see in table above, options 2/3 definitely speed things up.

Metadata

Metadata

Assignees

Labels

binding/pythonIssues for the Python packagebinding/rustIssues for the Rust cratebugSomething isn't working

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions