Which issue does this PR close?
Rationale for this change
There are many use cases where a column contains arrays and you want to transform every element of each array. The current workaround is to unnest the column and then aggregate the results back into arrays, which is poor for both ergonomics and performance. This PR adds a function, `array_transform`, that takes a scalar function and applies it to every element in an array.

This PR is narrowly scoped as a first proof of concept. It does not address aggregation as #15882 requests, and it is limited to cases where every other argument passed to the inner function is a scalar value.
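To make the contrast concrete, here is a minimal sketch in plain Python of the two approaches described above. It is not the implementation in this PR: the helper names `transform_via_unnest` and `transform_elementwise` are hypothetical, and nested Python lists stand in for an array column. The extra positional arguments model the restriction that all other inputs to the inner function are scalar values.

```python
from collections import defaultdict


def transform_via_unnest(rows, func, *scalar_args):
    """Illustrative workaround: unnest, apply the function, then aggregate."""
    # Unnest: one (row_id, element) pair per array element.
    flat = [(row_id, elem) for row_id, arr in enumerate(rows) for elem in arr]
    # Apply the scalar function to each unnested element.
    mapped = [(row_id, func(elem, *scalar_args)) for row_id, elem in flat]
    # Aggregate: regroup the results by the row they came from.
    grouped = defaultdict(list)
    for row_id, value in mapped:
        grouped[row_id].append(value)
    # Rows with empty arrays never appear in `grouped`, so default to [].
    return [grouped.get(row_id, []) for row_id in range(len(rows))]


def transform_elementwise(rows, func, *scalar_args):
    """Illustrative direct approach: map the function over each array."""
    return [[func(elem, *scalar_args) for elem in arr] for arr in rows]


# Hypothetical sample data: square every element, with 2 as the scalar argument.
data = [[1, 2, 3], [], [4]]
assert transform_elementwise(data, pow, 2) == transform_via_unnest(data, pow, 2)
```

Both helpers return the same result, but the unnest route needs a row identifier and a regrouping pass to rebuild the arrays, which is the ergonomic and performance cost this change is meant to avoid.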
What changes are included in this PR?
Adds `array_transform` and unit tests.
Are these changes tested?
Unit tests are provided that cover both low-level invocation of the function and a full end-to-end test that runs it against a dataframe.
Here is an example taken from the test that is included in the PR:
This produces the following dataframe, which shows the original data alongside the transformed values:
Are there any user-facing changes?
No
Still to do before ready to merge
array_aggregate