Optimize Hash#transform_{keys,values}
#14502
base: master
Conversation
```crystal
each do |key, value|
  hash[yield(key, value)] = value
end
hash
```
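The diff above shows the core loop: each entry's key is passed through the block and the value carried over. A Ruby sketch approximating that Crystal logic (as a free-standing `transform_keys` helper rather than a `Hash` method):

```ruby
# Build a new hash whose keys come from the block; values are carried over.
# Mirrors the loop in the diff above, minus Crystal's Hash(K2, V) typing.
def transform_keys(hash)
  result = {}
  hash.each do |key, value|
    result[yield(key, value)] = value
  end
  result
end

transform_keys({"a" => 1, "b" => 2}) { |k, _v| k.upcase }
# => {"A" => 1, "B" => 2}
```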
Could we merge the branches? Like:
```crystal
copy = size > 4 ? Hash(K2, V).new(capacity) : Hash(K2, V).new
each_with_object(copy) { ... }
```
Maybe 4 could be a constant instead of a magic number? For example `Hash::INITIAL_CAPACITY` or `Hash::MINIMAL_CAPACITY`.
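A minimal Ruby sketch of the merged branch suggested above. Ruby's `Hash` has no public capacity argument, so `new_hash_with_capacity` is a hypothetical stand-in for Crystal's `Hash(K2, V).new(capacity)`, and `SMALL_HASH_THRESHOLD` is a hypothetical name for the magic number 4:

```ruby
SMALL_HASH_THRESHOLD = 4  # hypothetical constant for the magic number

def new_hash_with_capacity(capacity)
  # In Crystal this would preallocate `capacity` entry slots up front;
  # plain Ruby has no such API, so this just returns an empty hash.
  {}
end

def transform_values(source)
  # Pick the preallocated hash only when the source is big enough,
  # then fill it in a single pass, as the suggested each_with_object does.
  copy = source.size > SMALL_HASH_THRESHOLD ? new_hash_with_capacity(source.size) : {}
  source.each_with_object(copy) { |(k, v), acc| acc[k] = yield(v) }
end

transform_values({1 => 1, 2 => 2}) { |v| v * 10 }
# => {1 => 10, 2 => 20}
```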
It can be even simpler: `Hash(K2, V).new(size < 4 ? 0 : size)`
Initializing the transformed hash with the current size seems like a good idea. We could also consider using the current capacity in order to make it a copy with identical properties, just transformed values/keys. But I'm not sure about optimizing the capacity for small hashes. The default minimum capacity for a hash has its reasons. It's not highly optimized for the use case of very small hashes, but that's not that common and the overhead is minimal. So my suggestion would be to drop the special handling for small hashes and just allocate the transformed hash with an initial capacity.
Let's do some archeology: in #8017 … But #8182 patched … Said differently: magic numbers 😮‍💨 All that to say that magic numbers really are as bad as we say they are. I agree with @straight-shoota's statement above: let's just always use the capacity, as other calls to … do.
Preallocating the correct-size hash is faster for hash sizes > 4, but the empty hash is faster for sizes <= 4. It would be a lot closer, but `Hash#upsert` allocates an initial capacity of 4 when the hash is empty, while `Hash#initialize` sets the floor for `initial_capacity` at 8 elements. This PR gives the best of both worlds by using the current approach when the source hash has <= 4 keys in it, or a fully preallocated hash for > 4.
Benchmark code