Skip to content

Conversation

benrobby
Copy link

@benrobby benrobby commented Aug 22, 2025

What changes were proposed in this pull request?

Screenshot 2025-08-23 at 02 33 43

Why are the changes needed?

  • it's too easy to miss inadvertent type coercion changes, and some of the existing tables in the codebase have become inconsistent.

Does this PR introduce any user-facing change?

No

How was this patch tested?

  • this is a unit test

Was this patch authored or co-authored using generative AI tooling?

Yes, used claude to generate the table printing utils. Generated-by: claude-sonnet-4

@benrobby benrobby changed the title [SPARK-53355][PYTHON] document python udf type behavior [SPARK-53355][PYTHON][SQL] document python udf type behavior Aug 23, 2025
@benrobby
Copy link
Author

@HyukjinKwon and @asl could you take a look?

Copy link
Contributor

@asl3 asl3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


return [
("byte_values", ByteType(), df([(-128,), (127,), (0,)])),
("byte_null", ByteType(), df([(None,), (42,)])),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add a comment with context about the null handling importance for input type vs. return type?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would you have an example, what comment are you looking for exactly?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

optional nit, but i meant we could clarify how the test cases for the return_types and input_types were decided

@HyukjinKwon HyukjinKwon changed the title [SPARK-53355][PYTHON][SQL] document python udf type behavior [SPARK-53355][PYTHON][SQL] Document python udf type behavior Aug 24, 2025
@@ -0,0 +1,5 @@
These tests capture input/output type interfaces between python udfs and the engine.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to run this in CI? If so, we should add __init__.py ,and add this modul into dev/sparktestsupport/modules.py

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, let me try to add it. Do we need any other reviewer for the infra changes?

@HyukjinKwon
Copy link
Member

My only concern is that I don't think we should say this is the standard behaviours ... as some of behaviours are weird. I am fine with this change as long as we're all on the same page that some behaviours here might change in the future.

@benrobby
Copy link
Author

My only concern is that I don't think we should say this is the standard behaviours ... as some of behaviours are weird. I am fine with this change as long as we're all on the same page that some behaviours here might change in the future.

Yes, agreed. I've also updated the readme to reflect that these tests are purely internal and not an API documentation

@benrobby benrobby changed the title [SPARK-53355][PYTHON][SQL] Document python udf type behavior [SPARK-53355][PYTHON][SQL][INFRA] Document python udf type behavior Aug 25, 2025
@benrobby benrobby changed the title [SPARK-53355][PYTHON][SQL][INFRA] Document python udf type behavior [SPARK-53355][PYTHON][SQL][INFRA] test python udf type behavior Aug 25, 2025
@benrobby benrobby requested review from asl3 and HyukjinKwon August 25, 2025 09:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants