Skip to content

Conversation

samwillis
Copy link
Contributor

Also converts almost all operators to use them wherever possible.

We should benchmark this against main to see how what it does with the memory usage and allocations.

cursoragent and others added 19 commits July 10, 2025 15:43
@samwillis samwillis changed the title Anable the use of generators to reduce memory usage via a LazyMultiSet Use of generators to reduce memory usage via a LazyMultiSet Jul 13, 2025
@samwillis samwillis requested a review from kevin-dp July 13, 2025 14:33
Copy link
Contributor

@kevin-dp kevin-dp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM implementation-wise. However, i do think we should carefully benchmark performance and memory usage before merging as it does complicate the code quite a bit so i want to be 100% sure this is the way to go.

return new MultiSet(result)
}

*lazyJoin<V2>(other: Index<K, V2>): Generator<[[K, [V, V2]], number], void, unknown> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the duplication between both branches like i did here: ac24ff0

const consolidated = new MultiSet(values).consolidate()
const sortedValues = consolidated
.getInner()
const consolidated = LazyMultiSet.fromArray(values).consolidate()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why creating a lazy multiset to then consolidate it (which materializes it) to then immediately copy it into an array. Seems like a lot of overhead for something we could do by reducing once over the original array (i.e. loop over values and consolidate it into an array of consolidated values).

})
})

it('should consolidate correctly', () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's also check the corner case where the multiplicities for a certain value cancel out and that the result does not contain the value.

@samwillis
Copy link
Contributor Author

@kevin-dp

However, i do think we should carefully benchmark performance and memory usage before merging as it does complicate the code quite a bit so i want to be 100% sure this is the way to go.

Absolutely agree, stack traces and and performance traces of individual operators will likely get more complex.

the key things I want is to check:

  • Peak mem usage during a run (likely initial run)
  • Execution time
  • Any GC time after / at the end of processing

The latter will be hard to measure as it not guarantied to happen during execution. We need to do manual testing with performance traces and inspect what happens. Likely need to do this in the browser. Peak mem is a good indication of what GC needs to be done relative to the two versions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants