Labels: non-trivial (Likely not a quick addition and may require design discussions)
Description
Currently, the plan resolver lists all files and fetches the metadata of every individual Parquet file each time it plans a query, even if the dataset has already been registered as a temporary view. This adds overhead, especially when the data is remote (e.g., in an object store such as AWS S3) and when the query involves multiple datasets with a large number of partitioned files.
We may want to cache file listing results and Parquet metadata in the plan resolver. The downside is that there is no way to detect when the cache becomes stale. This is acceptable, though, since we usually assume the files do not change. If a dataset is overwritten, the user can restart the session as a workaround.
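A minimal, language-agnostic sketch of the proposed behavior, written in Python for brevity. Everything here is hypothetical: `FileMetadataCache`, `ParquetFileInfo`, and the `list_files` / `read_metadata` callbacks are illustrative names, not part of the plan resolver's actual API. It assumes the cache lives for the lifetime of a session, never checks for staleness, and is only refreshed by discarding it (the "restart the session" workaround).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ParquetFileInfo:
    # Hypothetical stand-in for the per-file Parquet metadata the resolver needs.
    path: str
    num_rows: int


class FileMetadataCache:
    """Hypothetical session-scoped cache of file listings and Parquet metadata.

    No staleness detection: once a dataset path is cached, later lookups
    return the cached result even if the files were overwritten.
    """

    def __init__(
        self,
        list_files: Callable[[str], List[str]],
        read_metadata: Callable[[str], ParquetFileInfo],
    ) -> None:
        self._list_files = list_files
        self._read_metadata = read_metadata
        self._listings: Dict[str, Tuple[str, ...]] = {}
        self._metadata: Dict[str, ParquetFileInfo] = {}

    def files(self, dataset_path: str) -> Tuple[str, ...]:
        # List the (possibly remote) dataset directory only the first time it is seen.
        if dataset_path not in self._listings:
            self._listings[dataset_path] = tuple(self._list_files(dataset_path))
        return self._listings[dataset_path]

    def metadata(self, dataset_path: str) -> List[ParquetFileInfo]:
        result = []
        for file_path in self.files(dataset_path):
            # Fetch each file's metadata at most once while the cache is alive.
            if file_path not in self._metadata:
                self._metadata[file_path] = self._read_metadata(file_path)
            result.append(self._metadata[file_path])
        return result

    def clear(self) -> None:
        # Equivalent of restarting the session: forget all cached results.
        self._listings.clear()
        self._metadata.clear()


# Usage with fake callbacks that count how often the "remote" store is hit.
calls = {"list": 0, "meta": 0}


def fake_list(path: str) -> List[str]:
    calls["list"] += 1
    return [f"{path}/part-0.parquet", f"{path}/part-1.parquet"]


def fake_meta(path: str) -> ParquetFileInfo:
    calls["meta"] += 1
    return ParquetFileInfo(path, 100)


cache = FileMetadataCache(fake_list, fake_meta)
cache.metadata("s3://bucket/table")  # first query: 1 listing + 2 metadata reads
cache.metadata("s3://bucket/table")  # second query: served entirely from the cache
```

Keying metadata by file path rather than by dataset means two datasets that share files would also share cached metadata; whether that is desirable is one of the design questions the issue leaves open.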