v0.11.0

github-actions released this 10 Oct 17:04
Add local staging to Zarr setup in xarray_beam.

Fixes https://github.com/google/xarray-beam/issues/122

This change introduces a `stage_locally` parameter to `setup_zarr`, `ChunksToZarr` and `Dataset.to_zarr`. When enabled, Zarr metadata is first written to a local temporary directory and then copied to the final destination in parallel using `fsspec`. This can significantly speed up setup on high-latency filesystems. In one example, I found it sped up Zarr setup by 25x, from 100 seconds to 4 seconds.
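
To illustrate the stage-then-copy idea, here is a minimal sketch, not the Xarray-Beam implementation. It assumes a hypothetical `write_metadata` helper that stands in for the many small metadata files Zarr setup creates, and it copies with the standard library's `shutil` and a thread pool in place of `fsspec` so that it runs without extra dependencies. The names `write_metadata` and `stage_and_copy` are made up for this example.

```python
import concurrent.futures
import os
import pathlib
import shutil
import tempfile


def write_metadata(root: pathlib.Path) -> None:
    """Hypothetical stand-in for the small metadata files Zarr setup writes."""
    for name in ("temperature", "pressure"):
        array_dir = root / name
        array_dir.mkdir(parents=True, exist_ok=True)
        (array_dir / ".zarray").write_text('{"zarr_format": 2}')
    (root / ".zgroup").write_text('{"zarr_format": 2}')


def stage_and_copy(destination: pathlib.Path, max_workers: int = 8) -> list:
    """Write metadata to a local temp dir, then copy it out in parallel."""
    with tempfile.TemporaryDirectory() as staging:
        staging_root = pathlib.Path(staging)
        # Step 1: all the small writes hit fast local disk only.
        write_metadata(staging_root)

        # Step 2: list every staged file (dotfiles included).
        files = []
        for dirpath, _, filenames in os.walk(staging_root):
            for filename in filenames:
                files.append(pathlib.Path(dirpath) / filename)

        def copy_one(src: pathlib.Path) -> str:
            rel = src.relative_to(staging_root)
            target = destination / rel
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copyfile(src, target)
            return rel.as_posix()

        # Step 3: copy files concurrently, so per-file latency on the
        # destination overlaps instead of adding up.
        with concurrent.futures.ThreadPoolExecutor(max_workers) as pool:
            return list(pool.map(copy_one, files))
```

Calling `stage_and_copy(pathlib.Path("out.zarr"))` returns the relative paths that were copied. The benefit comes from step 3: on a high-latency destination, each file copy is dominated by round-trip time, so issuing the copies concurrently is where the savings come from.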

This adds a hard dependency on fsspec in Xarray-Beam.

Hopefully Xarray will eventually have concurrent writing to stores built in (see https://github.com/pydata/xarray/issues/10622), which would eliminate the primary need for this.

Alternatively, we may eventually be able to use Zarr's built-in stores for this copying instead of fsspec. Zarr has all the necessary functionality (including atomic writes, which would be nice) but does not expose the public APIs required to copy store objects from a synchronous function.

PiperOrigin-RevId: 817684876