Skip to content

Conversation

timvermeulen
Copy link
Contributor

Summary

Closes #264

Turns COLUMN_NAME and COLUMN_COUNT into non-const functions, and adds support for the #[serde(flatten)] attribute. LMK if this approach is roughly what you had in mind for this feature 🙂

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided so that we can include it in CHANGELOG later
  • For significant changes, documentation in README and https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

@slvrtrn
Copy link
Contributor

slvrtrn commented Aug 25, 2025

Thanks for your contribution.
Making these two fields non-const, we need to address a few things:

@timvermeulen
Copy link
Contributor Author

  • needs checking how does it affect performance with validation on/off

None of the benchmarks perform noticeably different for me locally between this branch and main 👍

  • we need to have more integration tests that verify the correct behavior with maps

Adding integration tests makes sense, though could you clarify which behavior with maps you'd like to see tested?

@slvrtrn
Copy link
Contributor

slvrtrn commented Aug 25, 2025

Adding integration tests makes sense, though could you clarify which behavior with maps you'd like to see tested?

Some basic scenarios like https://serde.rs/attr-flatten.html or #99 (comment). Makes sense to test it with 3rd party maps, too.

@timvermeulen timvermeulen marked this pull request as draft August 25, 2025 16:49
@timvermeulen timvermeulen force-pushed the serde-flatten branch 2 times, most recently from 0208651 to 3b55479 Compare August 27, 2025 09:36
@timvermeulen
Copy link
Contributor Author

This has proven to be a bit more complex than I thought because of my misunderstanding of how #[serde(flatten)] works exactly. Some things that make this difficult:

  • In the presence of the #[serde(flatten)] attribute, the derived Deserialize implementation of a struct calls deserialize_map and not deserialize_struct. This is obviously bad for performance and it also complicates RowBinaryDeserializer which already uses deserialize_map to deserialize ClickHouse Map values, not entire rows.
  • Furthermore, after calling deserialize_map it repeatedly calls deserialize_any to deserialize the column values, essentially ruling out non-self-describing formats. We can sort of work around this by having deserialize_any forward to the appropriate deserialize_* method based on the type of the next column according to the validator, but it's messy and probably not very robust.
  • #[serde(flatten)] also causes serde to use their internal FlatMapDeserializer type which for whatever reason doesn't preserve the is_human_readable value, defaulting to true. Some types rely on is_human_readable to be false for correct deserialization from a ClickHouse input such as Ipv6Addr and Uuid, and since we have no control over Ipv6Addr's deserialization logic, this seems like a non-starter.

I've updated this branch with (hacky) workaround for the first two points but the last point seems impossible to solve without ditching serde entirely, so I'm not sure this is still the right direction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support serde(flatten) attribute

2 participants