Skip to content

Conversation

srh
Copy link
Contributor

@srh srh commented Aug 13, 2025

This makes pushing projections down into MemorySourceConfig tables not discard their sort ordering.

This lacks tests. Please tell me where I should put tests. I will add them. (Should they be in memory.rs? In physical planning code?) (As far as I can tell, we can't create a sort order on in-memory tables via SQL, thus SQL test case files wouldn't be the place.) I think a test will want to specifically check that optimizing a plan requiring a sort order consistent with a sorted MemorySourceConfig (and a simple projection on top) produces the correct plan without unnecessary sort nodes, no sanity check failures, or whatnot.

Which issue does this PR close?

Rationale for this change

Without this change, it is impossible in some circumstances for query plans to take advantage of known sort order for in-memory tables. I forget how this bug manifests, but it might be that discarding sort order breaks optimization sanity checks in some way too, or produces an incorrect plan.

Are these changes tested?

Not in the DataFusion repository. (See comment above asking for advice on where to add tests.)

Are there any user-facing changes?

No? (Or yes? I mean, this fixes planning behavior for some users.)

@github-actions github-actions bot added the datasource Changes to the datasource crate label Aug 13, 2025
@srh
Copy link
Contributor Author

srh commented Aug 14, 2025

I guess this got fixed differently, a couple days ago, by #17129. That change does leave MemorySourceConfig stripped of its sort information, while the DataSourceExec above it remembers it from the cache.

I am not familiar with the whole story of equivalence properties maintenance there (and I guess the DataSourceExec change works with every input DataSource). So, I will leave this PR open so that you see it, and you can decide what you want to do (regarding the question of whether you are okay with underlying DataSources losing the sort information that gets updated in the DataSourceExec's "cache"), but as far as I know, you could just close this PR.

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR @srh

I think it needs some tests to make sure we don't break the functionality accidentally in the future

Can you please add some tests, ideally in sqllogicteset format?

Here are the instructions: https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest

Ideally you should be able to extend one of the existing test files in https://github.com/apache/datafusion/tree/main/datafusion/sqllogictest/test_files

@srh
Copy link
Contributor Author

srh commented Aug 24, 2025

@alamb Well sqllogictests wouldn't work for reasons I have stated. I am going to close this PR instead. Things are architecturally icky at this point, but I think if somebody encounters a problem later on, they'll be more equipped to decide how eq_properties information should be maintained between DataSourceExec and DataSource.

@srh srh closed this Aug 24, 2025
@srh srh deleted the memory-source-maintain-ordering branch August 24, 2025 19:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
datasource Changes to the datasource crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

MemorySourceConfig::try_swapping_with_projection discards sort information
2 participants