Cache file listing results and Parquet metadata

Currently, the plan resolver lists all files and fetches metadata for each individual Parquet file when planning each query, even if the dataset has already been registered as a temporary view. This adds overhead, especially when the data is remote (e.g. in an object store such as AWS S3) and when the query involves multiple datasets with a large number of partitioned files.

We may want to cache file listing results and Parquet metadata in the plan resolver. The downside is that there is no way to detect staleness of the cache. This is acceptable though, since we usually assume the files do not change. If the dataset does change (for example, after being overwritten), the user can restart the session as a workaround.
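To make the proposal concrete, here is a minimal sketch of what a session-scoped cache could look like. It is not the resolver's actual code: `PlanResolverCache`, `list_files`, `read_metadata`, and the fake backends are all hypothetical names invented for illustration. The sketch deliberately has no staleness detection, and `clear()` stands in for restarting the session.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PlanResolverCache:
    """Hypothetical session-scoped cache; every name here is illustrative."""

    # Backends are injected so the sketch does not assume any real storage API.
    list_files: Callable[[str], List[str]]
    read_metadata: Callable[[str], dict]
    _listings: Dict[str, List[str]] = field(default_factory=dict)
    _metadata: Dict[str, dict] = field(default_factory=dict)

    def files(self, dataset_path: str) -> List[str]:
        # List the dataset only on the first request; later calls reuse the result.
        if dataset_path not in self._listings:
            self._listings[dataset_path] = list(self.list_files(dataset_path))
        return self._listings[dataset_path]

    def metadata(self, file_path: str) -> dict:
        # Fetch Parquet metadata once per file; no staleness check is performed.
        if file_path not in self._metadata:
            self._metadata[file_path] = self.read_metadata(file_path)
        return self._metadata[file_path]

    def clear(self) -> None:
        # Equivalent to the "restart the session" workaround.
        self._listings.clear()
        self._metadata.clear()


# Fake backends that count how often the (expensive) remote calls happen.
calls = {"list": 0, "meta": 0}


def fake_list_files(path: str) -> List[str]:
    calls["list"] += 1
    return [f"{path}/part-0.parquet", f"{path}/part-1.parquet"]


def fake_read_metadata(path: str) -> dict:
    calls["meta"] += 1
    return {"path": path, "num_rows": 100}


cache = PlanResolverCache(fake_list_files, fake_read_metadata)

# Simulate planning three queries against the same dataset.
for _ in range(3):
    for file_path in cache.files("s3://bucket/events"):
        cache.metadata(file_path)
```

With the fake backends above, planning three queries triggers one listing call and one metadata call per file instead of one per query. Keying the cache by path alone is the simplest choice, and it is also what makes an overwritten dataset invisible until the cache is cleared.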

Cache file listing results and Parquet metadata #516