Scipy function not executed lazily #10636

@yum-yab

What is your issue?

I recently tried to resample a large GeoTIFF file by taking the majority vote within each block, following this Stack Overflow question: https://stackoverflow.com/questions/75041095/how-to-apply-a-custom-function-to-xarray-dataarray-coarsen-reduce
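
To make "majority vote within each block" concrete, here is a minimal NumPy-only sketch on a tiny in-memory array. The helper name `block_majority` and the sample values are made up for illustration, and breaking ties in favor of the smallest value is an assumption, not something the linked question prescribes.

```python
import numpy as np

def block_majority(arr, factor):
    """Downsample a 2D array by taking the most frequent value
    in each (factor x factor) block. Hypothetical helper."""
    ny, nx = arr.shape
    if ny % factor or nx % factor:
        raise ValueError("array shape must be divisible by factor")
    # Split each axis into (block index, position within block),
    # then flatten every block into one trailing axis.
    blocks = (
        arr.reshape(ny // factor, factor, nx // factor, factor)
        .swapaxes(1, 2)
        .reshape(ny // factor, nx // factor, factor * factor)
    )
    out = np.empty(blocks.shape[:2], dtype=arr.dtype)
    for i, j in np.ndindex(*out.shape):
        values, counts = np.unique(blocks[i, j], return_counts=True)
        # argmax picks the first maximum, so ties go to the smallest value
        out[i, j] = values[np.argmax(counts)]
    return out

classes = np.array([
    [1, 1, 2, 2],
    [1, 3, 2, 2],
    [4, 4, 5, 6],
    [4, 7, 5, 5],
])
coarse = block_majority(classes, 2)
print(coarse)
```

This only works for data that fits in memory, which is exactly what I cannot assume for the real file.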

I wanted to use dask for larger-than-memory computation. While some functions such as np.sum execute lazily, scipy.stats.mode seems to trigger compute() immediately. This exceeds my computer's capacity and defeats the purpose of using dask for larger-than-memory data.

Here is an example that reproduces the issue:

import xarray as xr
import dask.array as da
import numpy as np
from scipy import stats

# Define dimensions
nx, ny, nt = 3000, 300, 100  # size of each dimension
chunks = (300, 30, 10)     # chunk sizes for Dask

# Create Dask arrays
data1 = da.random.random((nx, ny, nt), chunks=chunks)
data2 = da.random.random((nx, ny, nt), chunks=chunks)

# Create coordinates
x = np.linspace(0, 10, nx)
y = np.linspace(0, 5, ny)
time = np.arange(nt)

# Build the xarray Dataset
ds = xr.Dataset(
    {
        "temperature": (("x", "y", "time"), data1),
        "precipitation": (("x", "y", "time"), data2),
    },
    coords={
        "x": x,
        "y": y,
        "time": time,
    }
)

# custom function for accessing the mode
def find_mode(arr, axis):
    m, _ = stats.mode(arr, axis=axis)
    return m

# This stays lazy.
coarse_sum_ds = ds.coarsen(x=3, y=3, boundary='pad').reduce(np.sum)

# This computes on the spot!
maj_vote_coarse = ds.coarsen(x=3, y=3, boundary='pad').reduce(find_mode)
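
For reference, this is roughly how one can tell whether a result is still lazy. It is only a sketch: `is_lazy` is a hypothetical helper, the tiny `small` dataset stands in for the real data, and it assumes that a dask-backed variable is what "lazy" means here.

```python
import dask
import dask.array as da
import numpy as np
import xarray as xr

def is_lazy(ds):
    """Return True if every data variable is still backed by a dask array.
    Hypothetical helper, not part of xarray."""
    return all(dask.is_dask_collection(var.data) for var in ds.data_vars.values())

# Small stand-in dataset with dask-backed data
small = xr.Dataset({"a": (("x", "y"), da.ones((6, 6), chunks=(3, 3)))})

# Coarsening with np.sum should keep the dask graph unevaluated
lazy_result = small.coarsen(x=3, y=3).reduce(np.sum)

# An explicit compute() replaces the dask arrays with NumPy arrays
eager_result = small.compute()

print(is_lazy(lazy_result))
print(is_lazy(eager_result))
```

Running the same check on `maj_vote_coarse` from the example above would show which side of this distinction the scipy-based reduction ends up on.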

Could you please point me to a way of applying a function like this so that it is computed lazily with dask?

PS: This is the output of xr.show_versions() for my environment:


INSTALLED VERSIONS
------------------
commit: None
python: 3.12.11 (main, Jul 23 2025, 00:34:44) [Clang 20.1.4 ]
python-bits: 64
OS: Linux
OS-release: 6.12.41-1-MANJARO
machine: x86_64
processor: 
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: 1.14.2
libnetcdf: 4.9.4-development

xarray: 2025.7.1
pandas: 2.3.1
numpy: 2.2.6
scipy: 1.16.0
netCDF4: 1.7.2
pydap: None
h5netcdf: None
h5py: None
zarr: 3.1.0
cftime: 1.6.4.post1
nc_time_axis: None
iris: 3.12.2
bottleneck: None
dask: 2025.7.0
distributed: 2025.7.0
matplotlib: 3.10.3
cartopy: 0.24.1
seaborn: 0.13.2
numbagg: None
fsspec: 2025.5.1
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: None
pip: None
conda: None
pytest: None
mypy: None
IPython: 9.4.0
sphinx: None
