docs: Explain aggregation & sorting of lists #25260
Codecov Report
✅ All modified and coverable lines are covered by tests.

@@            Coverage Diff             @@
##             main   #25260      +/-   ##
==========================================
+ Coverage   81.92%   82.08%   +0.15%
==========================================
  Files        1712     1711       -1
  Lines      237225   237160      -65
  Branches     3011     3013       +2
==========================================
+ Hits       194358   194683     +325
+ Misses      42094    41704     -390
  Partials      773      773
03f79d1 to 7626982
Update: Link errors should be fixed now that PR #25314 has been merged.
Correct formatting and line breaks in the documentation.
7626982 to d739879
coastalwhite left a comment
Thank you for the contribution. Small nit.
    --8<-- "python/user-guide/expressions/lists.py:children"
    ```

    Using `eval`, we can sort the list elements or compute some aggregations:
If possible, this should reference list.agg as that makes more sense for aggregations.
Good point, I didn't even know about list.agg. OTOH, what's even the point of having both list.agg and list.eval if the latter can do everything the former can? Dataframes also only have DataFrame.select, which is used for both use cases.
I'm mainly asking so I can explain the differences (if any) here.
Update: I've created a separate issue for this discussion:
@coastalwhite Thanks for the suggestion (and the clarification in #25336). I've now modified the explanation by first showing the result of using eval for aggregations, and then showing why it makes sense to use agg instead in those cases.
I've incorporated a slightly modified version of your statement from #25336, and hope that's okay for you:
The `list.agg` and `list.eval` expressions are exactly the same, except one difference. If the evaluation expression is statically determined to return only one value, it will automatically explode the `list` into the inner values. This matches what `.group_by(...).agg(...)` does, hence the name.
Extend the documentation about lists to show how to use `.list.eval` in combination with aggregation functions and `sort_by` to process the list elements.

Also fixes the objectively false statement "[...] we can also use pl.all() to refer to all of the elements of the list."