[Mono-devel-list] [IDEA] Caching String Hashcodes

Paolo Molaro lupus at ximian.com
Mon May 3 07:58:24 EDT 2004


On 05/02/04 Ben Maurer wrote:
> > I thought about a pretty simple thing. As strings are immutable we are
> > able to cache their hashcodes. For (long) strings that make several
> > calls to GetHashCode this would give us an incredible perf boost.
> > However it will cost us 4 bytes of additional data per string even if
> > the hashcode isn't used at all.

Eventually we'll probably put the hashcode in the synchronization
area, but that requires quite a bit more changes in the Monitor
implementation.
Adding 4 bytes to all the strings means a 10-20% space increase for the
most commons strings: the total overhead of this is hard to measure, but
we should try to avoid it.
One idea is to allocate extra room for the hash code only for large
strings (at the end of the string data), say, for 50+ char strings
the overhead is capped at 4 % and is usually much less. This would be
easy to implement: we already need to allocate one more char than
required anyway. And for short strings it should not be a big issue to
always calculate the hash code.
We could also try with just a ushort for the hash, just don't quote me
in the future as the one who said: who needs hashtables with many
more string entries than 64K:-)
Anyway: we need people to try out these ideas and experiment and post
numbers that we can look at.

> You might see a huge boost for a large string, but i would only be
> impressed if you were able to show a metric such as MCS bootstrap time
> reduced by 3%.

Considering that a profile run puts the cost of String.GetHashCode()
below 1 % with mcs compiling itself, this looks like an extremely silly
expectation.

> memory for everyone. However, this might prevent us from doing it in
> managed, which would be a very bad thing.

This is completely untrue, it's just that you don't know how things work.

> Also, It would still take some
> sort of benchmark to convince me this is worthwhile.

Ben, he doesn't need to convince _you_: you're free to voice your
opinion, but that doesn't mean it has any value wrt a change to
how the runtime works.

> I gave you a list before of optimizations that would be truly helpful.

Ben: it's none of your business to tell people what to do and whine on
the list if they don't do it. The mono contributors may do whatever
pleases them the most: deal with it.

> Also, last night I checked in some benchmarks that the JIT could do
> better on. They are things that show up in real code. You might want to
> try them.

I only checked readonly-vt.cs and, frankly, the only interesting thing
about that test is that the comment shows you don't understand the
difference between readonly and immutable.

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better



More information about the Mono-devel-list mailing list