Skip to content

Conversation

alamb
Copy link

@alamb alamb commented Jun 24, 2025

If you merge this PR it will update apache#7670

It:

  1. Merges up from main to get bring the PR up to date
  2. Fixes a logical conflict (see commit fe94db2)
  3. has a minor fix to the test logic (see 4b18f7f)

friendlymatthew and others added 7 commits June 24, 2025 05:51
….fields` to maintain invariants upon entry writes (apache#7720)

# Which issue does this PR close?

- It doesn't directly close the issue, but it's related to
apache#7698

# Rationale for this change

This commit changes the `dict` field in `VariantBuilder` + the `fields`
field in `ObjectBuilder` to be `BTreeMap`s, and checks for existing
field names in a object before appending a new field.

These collections are often used in places where having an already
sorted structure would be more performant. Inside of
`ObjectBuilder::finish()`, we sort the fields by `field_name` and we can
use the fact that `VariantBuilder`'s `dict` maintains a sorted mapping
to `field_id` by `field_name`.

To check whether an existing field name exists in a object, it is simply
two lookups: 1) to find the `field_name: &str`'s unique `field_name_id`,
and 2) check if the `ObjectBuilder` `fields` already has a key with that
`field_name_id`.

We make `ObjectBuilder` `fields` a `BTreeMap` sorted by `field_id`.
Since `field_id`s correlate to insertion order, we now have some notion
of which fields were inserted first. This also improves the time to look
up the max field id, as it changes the linear scan over the entire
`fields` collection to a logarithmic call using `fields.keys().last()`.
# Which issue does this PR close?

- part of apache#6736

# Rationale for this change

- While reviewing @friendlymatthew 's PR
apache#7740 I found that the code to
get the Variant object was awkward


I think that an accessor is similar to the existing `as_null`, `as_i32,`
etc APIs.

# What changes are included in this PR?

1. Add Variant::as_object and Variant::as_list

# Are there any user-facing changes?
New API (and docs with tests)
# Which issue does this PR close?

N/A

# Rationale for this change

It is critical and generally required to add tests for any changes to
arrow-rs and it something we look for in our PR reviews. It would be
nice to remind people of this explicitly in the PR

# What changes are included in this PR?

1. Update the PR template to include a section on testing
2. Add a list marker (`-`) to the closes section which causes github to
render the name of the PR in markdown not just the number (see rationale
on apache/datafusion#14507)

I copied the wording from DataFusion:
https://github.com/apache/datafusion/blob/b6c8cc57760686fffe4878e69c1be27e4d9f5e68/.github/pull_request_template.md?plain=1#L22


# Are there any user-facing changes?

A new section on PR descriptions
# Which issue does this PR close?

- Closes apache#7697

# Rationale for this change

# What changes are included in this PR?

- Introduced new types: VariantDecimal4, VariantDecimal8, and
VariantDecimal16
- These types encapsulate decimal values and ensure proper validation
and wrapping

# Are there any user-facing changes?
@mprammer mprammer merged commit 6e65287 into cmu-db:variant-to-json Jun 24, 2025
30 of 31 checks passed
@mprammer
Copy link
Collaborator

Merged. Thanks @alamb and @carpecodeum !

@alamb alamb deleted the variant-to-json branch June 24, 2025 22:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants