Add row index support to the default parquet reader #919

@scovich

Description

Please describe why this is necessary.

Delta features such as deletion vectors and row tracking rely on file-local row index information. Today, we implicitly rely on the row positions of returned data to match those rows' positions in the parquet file they were read from. That assumption precludes optimizations such as row group skipping that would break the relationship between row ordinals in the returned data and row indexes in the underlying file.
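To illustrate the problem, here is a minimal sketch in plain Rust. It does not use the kernel's or the parquet crate's actual API: the helper names `row_indexes_after_skipping` and `selection_mask`, and the representation of a deletion vector as a slice of file-local row indexes, are assumptions made purely for illustration. It shows that once a row group is skipped, a row's ordinal in the returned data no longer equals its row index in the file, so a deletion vector can only be applied correctly if explicit row indexes are available.

```rust
/// Hypothetical helper: returns the file-local row index of every row that
/// survives row group skipping, in the order the reader would return them.
/// `row_group_sizes[i]` is the number of rows in row group `i`, and
/// `keep[i]` says whether that row group is read (true) or skipped (false).
fn row_indexes_after_skipping(row_group_sizes: &[u64], keep: &[bool]) -> Vec<u64> {
    assert_eq!(row_group_sizes.len(), keep.len());
    let mut indexes = Vec::new();
    // File-local index of the first row of the current row group.
    let mut first_row = 0u64;
    for (&num_rows, &kept) in row_group_sizes.iter().zip(keep) {
        if kept {
            indexes.extend(first_row..first_row + num_rows);
        }
        // Skipped row groups still advance the file-local position.
        first_row += num_rows;
    }
    indexes
}

/// Hypothetical helper: builds a selection mask (true = keep the row) from
/// explicit file-local row indexes and a deletion vector, modeled here as a
/// plain slice of deleted row indexes.
fn selection_mask(row_indexes: &[u64], deleted: &[u64]) -> Vec<bool> {
    row_indexes.iter().map(|i| !deleted.contains(i)).collect()
}
```

For example, with row groups of 3, 2, and 4 rows and the middle group skipped, the returned rows carry file row indexes 0, 1, 2, 5, 6, 7, 8. A deletion vector that marks row index 5 must drop the returned row at ordinal 3. Treating ordinals as row indexes would instead drop the row at ordinal 5, which is file row 7.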

Describe the functionality you are proposing.

Add row index support to the default parquet reader as a first step to propagating row indexes through the kernel (deletion vectors, in particular, since they're already implemented).

Additional context

No response
