Skip to content

Conversation

oscardssmith
Copy link
Member

I believe this frees up 16 bytes for approximately free. idxfloor seems like a good idea, but it adds very little performance because with uniformly distributed hashes, on average, the idxfloor will be very low. The other change is to shrink age and maxprobe by 32 bits because there never should be 4 billion collisions next to each other, and age doesn't need to be 64 bits either.

@oscardssmith oscardssmith added the performance Must go faster label Jan 22, 2023
@petvana
Copy link
Member

petvana commented Jan 23, 2023

The only major benefit of idxfloor is that we do not iterate over empty dict (#44411). We can add a fast path for count == 0 without idxfloor to get the same performance.

@oscardssmith
Copy link
Member Author

that makes sense to me.

oscarddssmith added 4 commits January 28, 2023 20:52
…s full idxfloor will usually be <4 so the compute and memory time of tracking this number aren't worth much
…ance of missing a concurrent write due to overflow in age seems acceptable to me (these changes together shrink by 8 bytes)
Co-authored-by: Kristoffer Carlsson <[email protected]>
@simeonschaub simeonschaub added the needs nanosoldier run This PR should have benchmarks run on it label Jan 30, 2023
@simeonschaub
Copy link
Member

I believe we have some basic Dict benchmarks in BaseBenchmarks, so would be good to verify with those before merging

@gbaraldi
Copy link
Member

@nanosoldier runbenchmarks("!scalar", vs=":master")

@nanosoldier
Copy link
Collaborator

Your job failed.

@vtjnash
Copy link
Member

vtjnash commented Jan 30, 2023

maxprobe could be arbitrarily large, if the hash function is poorly chosen (it just degrades the dict into an array)

@oscardssmith
Copy link
Member Author

if the hash function is poorly chosen everything else about our Dict's performance breaks anyway. I don't see why iteration should be special.

@petvana
Copy link
Member

petvana commented Jan 30, 2023

maxprobe could be arbitrarily large, if the hash function is poorly chosen (it just degrades the dict into an array)

For maxprobe to reach 32 bits, you first need to execute ~ 2^62 comparisons just to insert the elements that seems quite intractable for any non-quantum computer. Short example

julia> struct X; x::Int64; end

julia> Base.hash(x::X) = UInt64(1)

julia> for n = 5000:5000:20000; print("n=",n); 
         @time (d = Dict(); for i=1:n; d[X(i)] = i; end)
       end
n=5000  0.458749 seconds (12.52 M allocations: 213.679 MiB, 30.10% gc time, 0.72% compilation time)
n=10000  1.442519 seconds (50.03 M allocations: 785.996 MiB, 7.78% gc time)
n=15000  3.760717 seconds (112.53 M allocations: 1.699 GiB, 10.08% gc time)
n=20000  9.156206 seconds (200.07 M allocations: 3.070 GiB, 7.67% gc time)

@petvana petvana added the collections Data structures holding multiple items, e.g. sets label Mar 18, 2023
Comment on lines 77 to +78
new(copy(d.slots), copy(d.keys), copy(d.vals), d.ndel, d.count, d.age,
d.idxfloor, d.maxprobe)
d.maxprobe)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it can be moved to the previous line.

@petvana
Copy link
Member

petvana commented Mar 30, 2023

Now, the error from docs is that skip_deleted_floor! is missing in base/set.jl and base/weakkeydict.jl

@oscardssmith
Copy link
Member Author

closing in favor of #59769

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

collections Data structures holding multiple items, e.g. sets needs nanosoldier run This PR should have benchmarks run on it performance Must go faster

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants