[Variant] Support Shredded Objects in variant_get: access as Some(DataType::Struct) (nested shredding) #8153

@alamb

Description

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

Note: this is likely one of the most complex parts of implementing Shredded Variants, so it is not a good first task.

Please see more commentary on

We are trying to support the general case of the variant_get function, which allows runtime dynamic access to Variants (either shredded or unshredded).

This ticket tracks
Support variant_get for Some(DataType::Struct) (nested shredding)

The idea here is that the user would specify a "shredding schema" (similar to what @friendlymatthew is sketching out in #7921), and the variant_get kernel would produce a VariantArray with the defined schema, extracting fields as necessary.

Implementing this functionality will likely require the basic representation for shredded Variant arrays along with path traversal in variant_get. However, it does NOT cover the following (which are, or will be, broken out into separate tickets):

  • Support for retrieving as a specific non-Struct data type (e.g. Some(DataType::Utf8))
  • Retrieving any arbitrary path and returning what is there (no type specified)
  • Retrieving any arbitrary path as a Variant (aka "unshredding")

Describe the solution you'd like
@scovich sketched out a high-level design for Shredded Objects (see Representing Variant In Arrow Proposal: "Shredding an Object" and Variant Shredding::Objects) in this PR

So roughly that means supporting

// get the named field of variant object as a typed field 
variant_get(array, "$.field_name", DataType::Struct <....>)

Here $.field_name stands for an arbitrary VariantPath, such as "a" for field "a", or "a.b" for field "b" nested inside field "a".

DataType::Struct is a "shredding schema" that reflects both value and typed_value.

This should work for:

  1. Variants where the field_name is in a typed_value
  2. Variants where the field_name is not in the typed_value
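
To make the two cases above concrete, here is a minimal, std-only sketch of per-row field lookup on a shredded object. Everything in it (ToyVariant, ToyShreddedField, ToyShreddedObject, toy_get_field, toy_example_row) is a hypothetical stand-in invented for illustration, not the arrow-rs or parquet-variant API. It assumes each shredded field is an Int64 and models only a single row rather than an Arrow array.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for an unshredded Variant value (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
enum ToyVariant {
    Int(i64),
    Str(String),
    Object(BTreeMap<String, ToyVariant>),
}

/// One shredded field: a (value, typed_value) pair.
/// Assumption: every shredded field is shredded as Int64 in this sketch.
#[derive(Debug, Clone, PartialEq)]
struct ToyShreddedField {
    value: Option<ToyVariant>, // fallback when the row's value is not an Int64
    typed_value: Option<i64>,  // populated when the row's value matched the type
}

/// One row of a shredded object.
#[derive(Debug, Clone, PartialEq)]
struct ToyShreddedObject {
    value: Option<ToyVariant>, // residual object holding the unshredded fields
    typed_value: BTreeMap<String, ToyShreddedField>,
}

/// Look up `name`, covering both cases from the list above.
fn toy_get_field(row: &ToyShreddedObject, name: &str) -> Option<ToyVariant> {
    // Case 1: the field is part of typed_value (it was shredded).
    if let Some(field) = row.typed_value.get(name) {
        return match field.typed_value {
            Some(i) => Some(ToyVariant::Int(i)),
            // Type mismatch for this row: use the field's own `value`.
            // If that is also None, the field is missing in this row.
            None => field.value.clone(),
        };
    }
    // Case 2: the field is not in typed_value, so look in the residual object.
    match &row.value {
        Some(ToyVariant::Object(fields)) => fields.get(name).cloned(),
        _ => None,
    }
}

/// Example row: "a" shredded as 42, "b" shredded but holding a string,
/// "c" left unshredded in the residual `value`.
fn toy_example_row() -> ToyShreddedObject {
    let mut typed_value = BTreeMap::new();
    typed_value.insert(
        "a".to_string(),
        ToyShreddedField { value: None, typed_value: Some(42) },
    );
    typed_value.insert(
        "b".to_string(),
        ToyShreddedField { value: Some(ToyVariant::Str("oops".to_string())), typed_value: None },
    );
    let mut residual = BTreeMap::new();
    residual.insert("c".to_string(), ToyVariant::Str("hello".to_string()));
    ToyShreddedObject { value: Some(ToyVariant::Object(residual)), typed_value }
}
```

A real kernel would operate on whole columns and produce a VariantArray with the requested schema; this sketch shows only the order in which typed_value and value are consulted for one row.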

Describe alternatives you've considered

  1. Add a test that manually constructs a shredded variant array (follow the example in the arrow proposal)
  2. Add a test that calls variant_get appropriately
  3. Implement the code

I suggest getting this working for non-nested objects first, and then working on nesting / pathing as a second PR.
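
For the nesting / pathing step, a rough std-only sketch of walking a dotted path such as "a.b" through nested objects might look like the following. PathValue, split_path, get_path, and nested_example are hypothetical names made up for this illustration and are not the VariantPath API. The sketch handles field access only (no list indexes or escaping) and ignores shredding entirely.

```rust
use std::collections::BTreeMap;

/// Toy nested value (hypothetical; stands in for a Variant object tree).
#[derive(Debug, Clone, PartialEq)]
enum PathValue {
    Int(i64),
    Object(BTreeMap<String, PathValue>),
}

/// Split a dotted path such as "a.b" into field names.
/// Assumption: field access only, with no list indexes or escaping.
fn split_path(path: &str) -> Vec<&str> {
    path.split('.').filter(|s| !s.is_empty()).collect()
}

/// Follow each path element in turn; return None if any step is missing
/// or if a non-object is reached before the path is exhausted.
fn get_path<'a>(root: &'a PathValue, path: &str) -> Option<&'a PathValue> {
    let mut current = root;
    for name in split_path(path) {
        match current {
            PathValue::Object(fields) => current = fields.get(name)?,
            _ => return None,
        }
    }
    Some(current)
}

/// Builds the toy value {"a": {"b": 7}}.
fn nested_example() -> PathValue {
    let mut inner = BTreeMap::new();
    inner.insert("b".to_string(), PathValue::Int(7));
    let mut outer = BTreeMap::new();
    outer.insert("a".to_string(), PathValue::Object(inner));
    PathValue::Object(outer)
}
```

With the non-nested case working first, a traversal like this could then be layered on top, consulting typed_value and value at each level instead of a plain map.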

Additional context

Reference
