-
Notifications
You must be signed in to change notification settings - Fork 431
Decouple rand and eltype
#1905
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Decouple rand and eltype
#1905
Conversation
747a191 to
0ea5502
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #1905 +/- ##
==========================================
- Coverage 86.36% 85.56% -0.81%
==========================================
Files 146 146
Lines 8786 8851 +65
==========================================
- Hits 7588 7573 -15
- Misses 1198 1278 +80 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
Given that, according to the docstring Distributions.jl/src/common.jl Lines 96 to 102 in a1010e4
the sole purpose of eltype is to return the type of rand, which I agree isn't really feasible, should we also deprecate the methods here as part of this PR?
|
|
#1907 is related. |
5644af6 to
3e66d3e
Compare
|
My personal opinion: This is orthogonal to the fact that Distributions should probably support In an ideal situation, I think we would define |
I became convinced that in general this is not possible. It basically means trying to manually figure out type inference, a task that even the Julia compiler can fail at. Of course, for some number types and distributions it can be done (as also the compiler will succeed in many instances). But in general it's far from trivial. As in the initial stages of Distributions, one could be restrictive and limit samples to |
4db3858 to
d824cf8
Compare
d824cf8 to
2e7752e
Compare
|
A few distributions still give me julia> dist = Semicircle(50.0f0)
Semicircle{Float32}(r=50.0f0)
julia> rand(dist, 5)
5-element Array{Float64, 1}:
36.80671953487414
-18.355635129900335
-11.701855436648922
-21.444118928985656
-5.80120463505302
julia> dist = JohnsonSU(0.0f0, 1.0f0, 0.0f0, 1.0f0)
JohnsonSU{Float32}(ξ=0.0f0, λ=1.0f0, γ=0.0f0, δ=1.0f0)
julia> rand(dist, 5)
5-element Array{Float64, 1}:
0.5311696707298299
1.632313034117999
0.04951771555318912
0.4721610259428258
-3.052321854866766
julia> dist = Chisq(5.0f0)
Chisq{Float32}(ν=5.0f0)
julia> rand(dist, 5)
5-element Array{Float32, 1}:
15.465032
1.888659
7.013455
4.258529
3.9611576Can this be fixed? |
|
The reason is that for quite a few of the older, probably less used and definitely less frequently updated distributions the |
|
|
|
Sorry, late to the party, but with iterators we have a "HasEltype" attribute to address exactly this issue, how is this design related? |
sethaxen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a review of censored, Cholesky-, and matrix-variate distributions.
Co-authored-by: Seth Axen <[email protected]>
Sometimes quantiles of integer-valued distributions are taken to interpolate between the integers flanking the 50% value of the cumulative. E.g. the median of an unbiased Bernoulli distribution would be 0.5 in this convention. Just saying. |
sethaxen
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's a review of the mixture models and multivariate distributions.
src/mixtures/unigmm.jl
Outdated
| randn!(rng, x) | ||
| for i in eachindex(x) | ||
| k = rand(rng, psampler) | ||
| x[i] = muladd(x[i], stds[k], means[k]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code seems to assume that stds and means have 1-based indexing. Is this enforced anywhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, good point. Basically most of the mixture model code suffers from this issue (it's not as bad anymore now that we removed all @inbounds, but still bad). Let's leave other functions for a separate PR but I'll at least fix the rand/rand! calls.
| end | ||
| function rand(rng::AbstractRNG, d::MvLogNormal, n::Int) | ||
| xs = rand(rng, d.normal, n) | ||
| map!(exp, xs, xs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the docs for map!:
Behavior can be unexpected when any mutated argument shares memory with any other argument.
Maybe we could broadcast here instead?
| map!(exp, xs, xs) | |
| xs .= exp.(xs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure. Generally IMO it's better to avoid broadcasting unless it's necessary (ie unless you're actually broadcasting objects with different dimensions). Broadcasting is slow, both at compile and runtime.
| _rand!(rng, d.normal, x) | ||
| function rand(rng::AbstractRNG, d::MvLogNormal) | ||
| x = rand(rng, d.normal) | ||
| map!(exp, x, x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| map!(exp, x, x) | |
| x .= exp.(x) |
| @eval begin | ||
| Base.@propagate_inbounds function rand!(rng::AbstractRNG, d::MvLogNormal, x::AbstractArray{<:Real,$N}) | ||
| rand!(rng, d.normal, x) | ||
| map!(exp, x, x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| map!(exp, x, x) | |
| x .= exp.(x) |
| y = similar(x, (1, cols)) | ||
| unwhiten!(d.Σ, randn!(rng, x)) | ||
| rand!(rng, chisqd, y) | ||
| x .= x ./ sqrt.(y ./ d.df) .+ d.μ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not broadcast the rand call to avoid allocating an intermediate container?
| y = similar(x, (1, cols)) | |
| unwhiten!(d.Σ, randn!(rng, x)) | |
| rand!(rng, chisqd, y) | |
| x .= x ./ sqrt.(y ./ d.df) .+ d.μ | |
| unwhiten!(d.Σ, randn!(rng, x)) | |
| rand!(rng, chisqd, y) | |
| x .= x ./ sqrt.(rand.(rng, chisqd) ./ d.df) .+ d.μ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could, but I chose this version on purpose: Generally, vectorized sampling tends to be much faster.
Co-authored-by: Seth Axen <[email protected]>
One of my take-aways from issues such as #1252, #1902, #894, #1783, #1041, and #1071 is that
eltypeis not only quite inconsistently implemented and unclearly documented currently but more generally it might be a bad design decision to use it for pre-allocating containers of samples: Setting it to a fixed type (historicallyFloat64for continuous andIntfor discrete distributions) is too limiting, but trying to infer it from parameters is challenging and doomed to fail for more challenging scenarios that are not as clear as e.g.Normal.This PR tries to decouple
randandeltype, to make it easier to possibly eventually deprecate and removeeltype.Fixes #1041.
Fixes #1071.
Fixes #1082.
Fixes #1252.
Fixes #1783.
Fixes #1884.
Fixes #1902.
Fixes #1907.