
Conversation

@devmotion
Member

@devmotion devmotion commented Sep 25, 2024

One of my take-aways from issues such as #1252, #1902, #894, #1783, #1041, and #1071 is that eltype is not only inconsistently implemented and unclearly documented at the moment, but that using it to pre-allocate containers of samples may be a bad design decision in general: setting it to a fixed type (historically Float64 for continuous and Int for discrete distributions) is too limiting, while inferring it from the parameters is challenging and doomed to fail in scenarios less clear-cut than e.g. Normal.

This PR tries to decouple rand and eltype, to make it easier to possibly eventually deprecate and remove eltype.
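To illustrate the design direction (a hypothetical sketch with a toy `ScaledUniform` sampler, not the actual Distributions.jl internals): the sample type can simply fall out of the parameter types, with no separate `eltype` contract needed for pre-allocation.

```julia
using Random

struct ScaledUniform{T<:Real}
    scale::T
end

# Sample type follows the parameter type instead of a hardcoded eltype:
sample(rng::AbstractRNG, d::ScaledUniform{T}) where {T<:AbstractFloat} =
    rand(rng, T) * d.scale

x32 = sample(MersenneTwister(1), ScaledUniform(2.0f0))  # Float32 sample
x64 = sample(MersenneTwister(1), ScaledUniform(2.0))    # Float64 sample
```

Containers for multiple samples can then be allocated from the type of a first draw (or via `Base.promote_op`-style machinery) rather than from a separately declared `eltype`.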

Fixes #1041.
Fixes #1071.
Fixes #1082.
Fixes #1252.
Fixes #1783.
Fixes #1884.
Fixes #1902.
Fixes #1907.

@devmotion devmotion force-pushed the dw/rand_multiple_consistent branch from 747a191 to 0ea5502 Compare September 25, 2024 23:49
@codecov-commenter

codecov-commenter commented Sep 26, 2024

Codecov Report

❌ Patch coverage is 83.25243% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 85.56%. Comparing base (0421b18) to head (b813560).

Files with missing lines Patch % Lines
src/mixtures/unigmm.jl 27.77% 13 Missing ⚠️
src/product.jl 69.44% 11 Missing ⚠️
src/multivariate/product.jl 30.00% 7 Missing ⚠️
src/multivariate/mvlogitnormal.jl 68.75% 5 Missing ⚠️
src/multivariate/mvtdist.jl 79.16% 5 Missing ⚠️
src/common.jl 55.55% 4 Missing ⚠️
src/genericrand.jl 71.42% 4 Missing ⚠️
src/multivariate/mvlognormal.jl 60.00% 4 Missing ⚠️
src/univariate/continuous/chisq.jl 40.00% 3 Missing ⚠️
src/matrix/lkj.jl 95.12% 2 Missing ⚠️
... and 7 more
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1905      +/-   ##
==========================================
- Coverage   86.36%   85.56%   -0.81%     
==========================================
  Files         146      146              
  Lines        8786     8851      +65     
==========================================
- Hits         7588     7573      -15     
- Misses       1198     1278      +80     


@andreasnoack
Member

Given that, according to the docstring

"""
eltype(::Type{Sampleable})
The default element type of a sample. This is the type of elements of the samples generated
by the `rand` method. However, one can provide an array of different element types to
store the samples using `rand!`.
"""

the sole purpose of eltype is to return the element type of rand's output, which I agree isn't really feasible: should we also deprecate the methods here as part of this PR?

@quildtide
Contributor

#1907 is related.

@devmotion
Member Author

#1907 is related.

17154a2 fixes it.

@devmotion devmotion force-pushed the dw/rand_multiple_consistent branch from 5644af6 to 3e66d3e Compare October 2, 2024 09:09
@quildtide
Contributor

My personal opinion:
rand(d::Distribution) should remain a valid method (I don't think anyone is trying to change this), and its return type should be predictable by knowing what d is. It would also be useful to have a method that tells you what type rand(d::Distribution) returns without running rand(d::Distribution). eltype(d::Distribution) is logical for this.

This is orthogonal to the fact that Distributions should probably support rand(d::Distribution, T::Type).

In an ideal situation, I think we would define rand(d::Distribution, T::Type) for distributions (to allow for potential type-specific optimizations and evade unnecessary typecasts) and then dispatch rand(d::Distribution) = rand(d, eltype(d)).
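The dispatch structure proposed above could be sketched like this (a toy `MyUniform` type and a `myrand` function are used to avoid committing to the real `Base.rand` API; this is an illustration of the proposal, not what the PR implements):

```julia
using Random

struct MyUniform end  # toy stand-in for a univariate distribution

Base.eltype(::Type{MyUniform}) = Float64

# Typed primitive: sample directly at the requested precision,
# allowing type-specific optimizations and avoiding typecasts.
myrand(rng::AbstractRNG, ::MyUniform, ::Type{T}) where {T<:AbstractFloat} = rand(rng, T)

# Default method delegates to the typed one via eltype.
myrand(rng::AbstractRNG, d::MyUniform) = myrand(rng, d, eltype(typeof(d)))

a = myrand(MersenneTwister(0), MyUniform())           # Float64 draw
b = myrand(MersenneTwister(0), MyUniform(), Float32)  # Float32 draw
```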

@devmotion
Member Author

its return type should be predictable by knowing what d is

I became convinced that in general this is not possible. It basically means manually re-implementing type inference, a task that even the Julia compiler can fail at. Of course, for some number types and distributions it can be done (and the compiler will also succeed in many instances), but in general it is far from trivial. As in the initial stages of Distributions, one could be restrictive and limit samples to Int and Float64, but IMO that is not desirable either. Therefore I think it is best to remove eltype (which IMO has also always been a weird name, since I don't view distributions as containers or collections) from the API. It is just too brittle and currently too inconsistent. Of course, that doesn't prevent users from trying to figure out the (element) type of samples, e.g. with the help of the compiler; it's just not an official feature anymore.
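A minimal illustration of why "manual type inference" is brittle (a made-up `Shifted` distribution and `draw`/`predicted_eltype` helpers, purely for demonstration): a single innocuous Float64 literal inside a sampler silently widens the result and invalidates any prediction based on the parameter types.

```julia
using Random

struct Shifted{T<:Real}
    θ::T
end

# Naive "manual inference": predict the sample type from the parameter type.
predicted_eltype(::Shifted{T}) where {T} = T

# But the actual sampler contains a hardcoded Float64 constant:
draw(rng::AbstractRNG, d::Shifted) = d.θ + 0.5 * randn(rng)

d = Shifted(1.0f0)
p = predicted_eltype(d)                  # Float32
t = typeof(draw(MersenneTwister(0), d))  # Float64: the prediction is wrong
```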

@devmotion devmotion force-pushed the dw/rand_multiple_consistent branch from d824cf8 to 2e7752e Compare October 10, 2024 22:41
singularitti added a commit to singularitti/ToyHamiltonians.jl that referenced this pull request Oct 30, 2024
@singularitti

singularitti commented Nov 3, 2024

A few distributions still give me Float64 after this PR, while others work fine:

julia> dist = Semicircle(50.0f0)
Semicircle{Float32}(r=50.0f0)

julia> rand(dist, 5)
5-element Array{Float64, 1}:
  36.80671953487414
 -18.355635129900335
 -11.701855436648922
 -21.444118928985656
  -5.80120463505302

julia> dist = JohnsonSU(0.0f0, 1.0f0, 0.0f0, 1.0f0)
JohnsonSU{Float32}(ξ=0.0f0, λ=1.0f0, γ=0.0f0, δ=1.0f0)

julia> rand(dist, 5)
5-element Array{Float64, 1}:
  0.5311696707298299
  1.632313034117999
  0.04951771555318912
  0.4721610259428258
 -3.052321854866766

julia> dist = Chisq(5.0f0)
Chisq{Float32}(ν=5.0f0)

julia> rand(dist, 5)
5-element Array{Float32, 1}:
 15.465032
 1.888659
 7.013455
 4.258529
 3.9611576

Can this be fixed?

@devmotion
Member Author

The reason is that, for quite a few of the older, probably less used, and definitely less frequently updated distributions, the rand implementation contains hardcoded Float64 values, either as literals or as calls of rand(rng) or randn(rng). I fixed `Semicircle` and `JohnsonSU`, but I'm not sure it is possible (or even desirable) to address all of these in a single PR (it's already a quite massive undertaking). Maybe it would be best to open issues for the others and address them in follow-up PRs (most (all?) other such changes in this PR are included to fix open issues).
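The kind of fix involved can be sketched as follows (a toy `Toy` distribution with hypothetical `draw_old`/`draw_new` functions, not the actual Semicircle or JohnsonSU code): replace hardcoded-Float64 draws with draws at the distribution's parameter type.

```julia
using Random

struct Toy{T<:Real}
    r::T
end

# Before: randn(rng) always yields Float64, widening the Float32 parameter.
draw_old(rng::AbstractRNG, d::Toy) = d.r * randn(rng)

# After: draw at the parameter's float type so the sample type matches it.
draw_new(rng::AbstractRNG, d::Toy{T}) where {T} = d.r * randn(rng, float(T))

d = Toy(2.0f0)
told = typeof(draw_old(MersenneTwister(0), d))  # Float64
tnew = typeof(draw_new(MersenneTwister(0), d))  # Float32
```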

@dom-linkevicius

The eltype issues described here are also quite painful when using parameters of Dual type in a distribution d: rand(d, 10) throws due to pre-allocation based on eltype (which is often Float64), and eltype is plain incorrect for rand(d), since Duals actually get returned.
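A minimal reproduction of this failure mode, using a toy `MyDual` type (ForwardDiff.Dual fails analogously): pre-allocating a container from a wrongly-reported Float64 eltype and then storing Dual samples throws, because no conversion to Float64 exists.

```julia
# Toy stand-in for a dual number; no conversion to Float64 is defined.
struct MyDual <: Real
    val::Float64
    der::Float64
end

# Container pre-allocated from an (incorrect) Float64 eltype:
samples = Vector{Float64}(undef, 3)

# Storing a Dual sample throws a MethodError on the implicit convert:
threw = try
    samples[1] = MyDual(0.5, 1.0)
    false
catch err
    err isa MethodError
end
```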

@mschauer
Member

Sorry, late to the party, but with iterators we have a "HasEltype" trait to address exactly this issue. How is this design related to it?

Contributor

@sethaxen sethaxen left a comment


Here's a review of censored, Cholesky-, and matrix-variate distributions.

Co-authored-by: Seth Axen <[email protected]>
@nilsbecker
Copy link

I also believe that it is reasonable that quantile(d, x) should return eltype(d); it is analogous to getindex to some extent. When would it make sense for rand and quantile to return different types?

Sometimes quantiles of integer-valued distributions are taken to interpolate between the integers flanking the 50% point of the cumulative distribution; e.g., in this convention the median of an unbiased Bernoulli distribution would be 0.5. Just saying.

Contributor

@sethaxen sethaxen left a comment


Here's a review of the mixture models and multivariate distributions.

randn!(rng, x)
for i in eachindex(x)
k = rand(rng, psampler)
x[i] = muladd(x[i], stds[k], means[k])
Contributor


This code seems to assume that stds and means have 1-based indexing. Is this enforced anywhere?

Member Author


No, good point. Basically most of the mixture-model code suffers from this issue (it's not as bad anymore now that we removed all @inbounds, but it's still bad). Let's leave the other functions for a separate PR, but I'll at least fix the rand/rand! calls.
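One possible shape of the fix (a sketch mirroring the snippet above, with `rand(rng, 1:length(means))` standing in for the component sampler; the guard shown is one option, not necessarily what the PR does): validate the indexing assumption up front with `Base.require_one_based_indexing`.

```julia
using Random

function mixture_rand!(rng::AbstractRNG, x::AbstractVector, means, stds)
    # Throws for arrays with non-1-based axes, e.g. OffsetArrays:
    Base.require_one_based_indexing(means, stds)
    randn!(rng, x)
    for i in eachindex(x)
        k = rand(rng, 1:length(means))  # stand-in for rand(rng, psampler)
        x[i] = muladd(x[i], stds[k], means[k])
    end
    return x
end

x = mixture_rand!(MersenneTwister(0), zeros(4), [0.0, 5.0], [1.0, 2.0])
```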

end
function rand(rng::AbstractRNG, d::MvLogNormal, n::Int)
xs = rand(rng, d.normal, n)
map!(exp, xs, xs)
Contributor


According to the docs for map!:

Behavior can be unexpected when any mutated argument shares memory with any other argument.

Maybe we could broadcast here instead?

Suggested change
map!(exp, xs, xs)
xs .= exp.(xs)

Member Author


I'm not sure. Generally IMO it's better to avoid broadcasting unless it's necessary (i.e. unless you're actually broadcasting objects with different dimensions). Broadcasting is slow, both at compile time and at runtime.
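For what it's worth, in the identical-input-and-output case both spellings produce the same in-place result: each element is read once before being overwritten, so the aliasing caveat in the map! docs does not bite here.

```julia
# Elementwise in-place exp via map! and via broadcast give identical results
# when input and output are the same array:
x1 = [0.0, 1.0, 2.0]
x2 = copy(x1)
map!(exp, x1, x1)  # overwrites x1 with exp.(x1)
x2 .= exp.(x2)     # same effect via broadcast assignment
```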

_rand!(rng, d.normal, x)
function rand(rng::AbstractRNG, d::MvLogNormal)
x = rand(rng, d.normal)
map!(exp, x, x)
Contributor


Suggested change
map!(exp, x, x)
x .= exp.(x)

@eval begin
Base.@propagate_inbounds function rand!(rng::AbstractRNG, d::MvLogNormal, x::AbstractArray{<:Real,$N})
rand!(rng, d.normal, x)
map!(exp, x, x)
Contributor


Suggested change
map!(exp, x, x)
x .= exp.(x)

Comment on lines +187 to 190
y = similar(x, (1, cols))
unwhiten!(d.Σ, randn!(rng, x))
rand!(rng, chisqd, y)
x .= x ./ sqrt.(y ./ d.df) .+ d.μ
Contributor


Why not broadcast the rand call to avoid allocating an intermediate container?

Suggested change
y = similar(x, (1, cols))
unwhiten!(d.Σ, randn!(rng, x))
rand!(rng, chisqd, y)
x .= x ./ sqrt.(y ./ d.df) .+ d.μ
unwhiten!(d.Σ, randn!(rng, x))
x .= x ./ sqrt.(rand.(rng, chisqd) ./ d.df) .+ d.μ

Member Author


We could, but I chose this version on purpose: Generally, vectorized sampling tends to be much faster.
