Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
- Part of [EPIC] [Parquet] Implement Variant type support in Parquet #6736
- Requires [Variant] Support creating Variants with pre-existing Metadata #8152
- Requires [Variant] Support Shredded Objects in variant_get: typed path access (STEP 1) #8150
Note: this is likely one of the most complex parts of implementing Shredded Variants, so it is not a good first task.
Please see more commentary on
We are trying to support the general case of the variant_get function, which allows dynamic access at runtime to Variants (either shredded or unshredded).
- We found in [Variant] Support Shredded Objects in variant_get #8083 that supporting variant_get is quite complicated (see here), so we are proposing to break it down into multiple pieces.
This ticket tracks support for variant_get with Some(DataType::Struct) (nested shredding).
The idea is that the user would specify a "shredding schema" (similar to what @friendlymatthew is sketching out in #7921), and the variant_get kernel would produce a VariantArray with that schema, extracting fields as necessary.
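To make the idea of a "shredding schema" more concrete, here is a minimal, std-only Rust sketch of the nesting it describes, assuming the layout from the shredding proposal: every shredded position has an optional binary value column for residual data plus an optional typed_value, and a shredded object's typed_value holds one such group per field. The ShreddedGroup and TypedValue types and the describe method are hypothetical illustrations, not the arrow-rs API, and type names are plain strings rather than real DataType values.

```rust
// Hypothetical, simplified model of a shredding schema; NOT the arrow-rs API.
// Type names are plain strings instead of arrow_schema::DataType values.

/// What a shredded position's `typed_value` column holds.
#[derive(Debug, Clone, PartialEq)]
enum TypedValue {
    /// A primitive leaf type such as "Int64" or "Utf8".
    Primitive(&'static str),
    /// A shredded object: one shredded group per field name.
    Object(Vec<(&'static str, ShreddedGroup)>),
}

/// A shredded position: optional residual `value` plus optional `typed_value`.
#[derive(Debug, Clone, PartialEq)]
struct ShreddedGroup {
    has_value: bool,
    typed_value: Option<TypedValue>,
}

impl ShreddedGroup {
    /// Render this group as an Arrow-like struct type string.
    fn describe(&self) -> String {
        let mut parts = Vec::new();
        if self.has_value {
            parts.push("value: Binary".to_string());
        }
        if let Some(tv) = &self.typed_value {
            parts.push(format!("typed_value: {}", describe_typed(tv)));
        }
        format!("Struct<{}>", parts.join(", "))
    }
}

fn describe_typed(tv: &TypedValue) -> String {
    match tv {
        TypedValue::Primitive(name) => name.to_string(),
        TypedValue::Object(fields) => {
            let inner: Vec<String> = fields
                .iter()
                .map(|(name, group)| format!("{}: {}", name, group.describe()))
                .collect();
            format!("Struct<{}>", inner.join(", "))
        }
    }
}

/// Example: shred field "a" as Int64; all other fields stay in the top-level `value`.
fn example_schema() -> ShreddedGroup {
    ShreddedGroup {
        has_value: true,
        typed_value: Some(TypedValue::Object(vec![(
            "a",
            ShreddedGroup {
                has_value: true,
                typed_value: Some(TypedValue::Primitive("Int64")),
            },
        )])),
    }
}

fn main() {
    println!("{}", example_schema().describe());
}
```

Running it prints `Struct<value: Binary, typed_value: Struct<a: Struct<value: Binary, typed_value: Int64>>>`, which shows how each shredded field repeats the value/typed_value pair one level down.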
Implementing this functionality will likely require the basic representation for shredded Variant arrays, along with path traversal in variant_get. However, it does NOT cover the following, which are (or will be) broken out into separate tickets:
- Support for retrieving a specific non-Struct data type (e.g. Some(DataType::Utf8))
- Retrieving any arbitrary path and returning whatever is there (no type specified)
- Retrieving any arbitrary path as a Variant (aka "unshredding")
Describe the solution you'd like
@scovich sketched out a high-level design for Shredded Objects (see Representing Variant In Arrow Proposal: "Shredding an Object" and Variant Shredding::Objects) in this PR.
So, roughly, that means supporting:

    // get the named field of a variant object as a typed field
    variant_get(array, "$.field_name", DataType::Struct <....>)

where $.field_name represents an arbitrary VariantPath, such as a for field "a", or a.b for field "b" of field "a", and DataType::Struct is a "shredding schema" that reflects both the value and typed_value columns.
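The path argument can be illustrated with a small std-only Rust sketch that splits a dotted path such as "$.a.b" into the field names ["a", "b"] that variant_get would then walk. parse_field_path is a hypothetical helper, not the arrow-rs VariantPath API, and it deliberately handles only dotted field access (no array indices, quoting, or escapes).

```rust
/// Hypothetical helper, NOT the arrow-rs `VariantPath` API: split a simple
/// dotted path ("$.a.b" or "a.b") into object field names. Array indices,
/// quoted names, and escapes are intentionally unsupported in this sketch.
fn parse_field_path(path: &str) -> Result<Vec<String>, String> {
    // "$" alone (or an empty string) refers to the whole variant: no fields to walk.
    if path == "$" || path.is_empty() {
        return Ok(Vec::new());
    }
    // Accept an optional leading "$." root marker.
    let body = path.strip_prefix("$.").unwrap_or(path);
    body.split('.')
        .map(|segment| {
            if segment.is_empty() {
                Err(format!("empty field name in path {path:?}"))
            } else {
                Ok(segment.to_string())
            }
        })
        .collect()
}

fn main() {
    // "a.b" means field "b" of field "a".
    println!("{:?}", parse_field_path("$.a.b"));
}
```

Returning a list of plain field names keeps the traversal step independent of how the path was spelled; a real implementation would likely reuse the crate's own path type instead.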
This should work for:
- Variants where field_name is shredded into typed_value
- Variants where field_name is not in typed_value (so it must be read from the residual value)
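The two cases above can be sketched for a single, non-nested field lookup. In this std-only Rust model, a field named in the shredding schema is read from its own typed_value, falling back to that field's residual value when the row's data did not fit the shredded type, while a field outside the schema can only be found in the top-level residual value object. Val, ShreddedField, Row, and get_field are hypothetical simplifications: real Variant values are binary-encoded with separate metadata, and this sketch assumes every shredded field is Int64.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded Variant value (real values are binary-encoded).
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Str(String),
}

/// One row of one shredded field. In this sketch every shredded field is Int64.
struct ShreddedField {
    typed_value: Option<i64>,
    value: Option<Val>, // residual data that did not fit the shredded type
}

/// One row of a hypothetical shredded object.
struct Row {
    shredded: BTreeMap<String, ShreddedField>, // fields named in the shredding schema
    residual: BTreeMap<String, Val>,           // top-level `value`: all other fields
}

/// Look up one top-level field of a shredded object row.
fn get_field(row: &Row, name: &str) -> Option<Val> {
    match row.shredded.get(name) {
        // Case 1: the field is in typed_value (it is part of the shredding schema).
        Some(field) => match (&field.typed_value, &field.value) {
            (Some(v), _) => Some(Val::Int(*v)), // this sketch prefers typed_value when present
            (None, Some(residual)) => Some(residual.clone()),
            (None, None) => None, // both null: the field is absent in this row
        },
        // Case 2: the field is not shredded, so only the residual object can hold it.
        None => row.residual.get(name).cloned(),
    }
}

fn example_row() -> Row {
    let mut shredded = BTreeMap::new();
    // "a" is shredded as Int64, but this row holds a string, so it lands in `value`.
    shredded.insert(
        "a".to_string(),
        ShreddedField { typed_value: None, value: Some(Val::Str("hello".to_string())) },
    );
    shredded.insert("b".to_string(), ShreddedField { typed_value: Some(42), value: None });
    shredded.insert("e".to_string(), ShreddedField { typed_value: None, value: None });
    let mut residual = BTreeMap::new();
    residual.insert("c".to_string(), Val::Int(7));
    Row { shredded, residual }
}

fn main() {
    let row = example_row();
    for name in ["a", "b", "c", "d", "e"] {
        println!("{name}: {:?}", get_field(&row, name));
    }
}
```

A test that hand-builds rows like this one exercises both branches: "b" comes from typed_value, "a" from the shredded field's residual, "c" from the top-level residual, and "d"/"e" are missing.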
Describe alternatives you've considered
- Add a test that manually constructs a shredded variant array (follow the example in the Arrow proposal)
- Add a test that calls variant_get appropriately
- Implement the code
I suggest getting this working for non-nested objects first, and then working on nesting / pathing in a second PR.
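Once the non-nested case works, nesting mostly changes how the path is walked. One plausible approach, sketched in std-only Rust under simplifying assumptions, is to descend through shredded typed_value children while the path matches the shredding schema and, at the first segment that is not shredded, continue the walk inside that level's residual (unshredded) value. Node, Val, and get_path are hypothetical simplifications rather than arrow-rs types, and the sketch does not reassemble partially shredded objects at the end of the path.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a decoded Variant value (real values are binary-encoded).
#[derive(Debug, Clone, PartialEq)]
enum Val {
    Int(i64),
    Object(BTreeMap<String, Val>),
}

/// Hypothetical per-row shredded node: an Int64 typed leaf, shredded object
/// children, and/or the residual `value` holding whatever was not shredded.
#[derive(Debug, Default)]
struct Node {
    typed: Option<i64>,
    typed_fields: BTreeMap<String, Node>,
    value: Option<Val>,
}

/// Walk `path` through shredded children first; at the first segment that is
/// not shredded, finish the walk inside the residual value at that level.
fn get_path(node: &Node, path: &[&str]) -> Option<Val> {
    let (first, rest) = match path.split_first() {
        Some(split) => split,
        // End of path: prefer the typed leaf, else the residual value.
        // (A full implementation would also merge shredded object fields here.)
        None => return node.typed.map(Val::Int).or_else(|| node.value.clone()),
    };
    if let Some(child) = node.typed_fields.get(*first) {
        return get_path(child, rest);
    }
    // Not shredded at this level: walk the remaining path in the residual object.
    let mut current = node.value.as_ref()?;
    for segment in path {
        match current {
            Val::Object(fields) => current = fields.get(*segment)?,
            Val::Int(_) => return None, // cannot index into a non-object
        }
    }
    Some(current.clone())
}

fn obj(entries: Vec<(&str, Val)>) -> Val {
    Val::Object(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Row: a.b is shredded (Int64 = 1); a.c.d and x live only in residual values.
fn example_row() -> Node {
    let b = Node { typed: Some(1), ..Default::default() };
    let a = Node {
        typed_fields: BTreeMap::from([("b".to_string(), b)]),
        value: Some(obj(vec![("c", obj(vec![("d", Val::Int(5))]))])),
        ..Default::default()
    };
    Node {
        typed_fields: BTreeMap::from([("a".to_string(), a)]),
        value: Some(obj(vec![("x", Val::Int(9))])),
        ..Default::default()
    }
}

fn main() {
    let row = example_row();
    println!("{:?}", get_path(&row, &["a", "c", "d"]));
}
```

The design point this illustrates is that a path can start in shredded columns and finish in unshredded bytes, so the nested case needs both a typed traversal and a fallback into the residual value at every level.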
Additional context
Reference