[Mono-devel-list] [IDEA] Caching String Hashcodes

Mon May 3 12:10:07 EDT 2004

Hello,

The number for MCS was honestly something that was off the top of my
head (my build was broken at the time). Regardless, the point is the
same. The cost of extra memory should be negated by perf gains. Be is
.5%, .25% or 3% the concept is the same.

Anyways, I do not mean to discourge anyone from investigating further
into the issue, I just hope that an investigation would entail
real-world uses of the code rather than a strict microbenchmark. This is
one area where such a measurement would be very dangerous.

And, of course, I do not have final say over the approval of such a
patch, though i will surely voice my opinion :-).

WRT to readonly-vt.cs, you may have forgotten the readonly static fields
are value types. For a object type, the item that is readonly is the
pointer -- thus, only readonly fields of the object could be resolved at
jit time. however, for a valuetype, all the fields of the valuetype are
readonly (a trivial example -- for int, the m_value or whatever field is
readonly). So, the example is correct.

bdm

>>> Paolo Molaro <lupus at ximian.com> 05/03/04 08:03 AM >>>
On 05/02/04 Ben Maurer wrote:
> > I thought about a pretty simple thing. As strings are immutable we
are
> > able to cache their hashcodes. For (long) strings that make several
> > calls to GetHashCode this would give us an incredible perf boost.
> > However it will cost us 4 bytes of additional data per string even
if
> > the hashcode isn't used at all.

Eventually we'll probably put the hashcode in the synchronization
area, but that requires quite a bit more changes in the Monitor
implementation.
Adding 4 bytes to all the strings means a 10-20% space increase for the
most commons strings: the total overhead of this is hard to measure, but
we should try to avoid it.
One idea is to allocate extra room for the hash code only for large
strings (at the end of the string data), say, for 50+ char strings
the overhead is capped at 4 % and is usually much less. This would be
easy to implement: we already need to allocate one more char than
required anyway. And for short strings it should not be a big issue to
always calculate the hash code.
We could also try with just a ushort for the hash, just don't quote me
in the future as the one who said: who needs hashtables with many
more string entries than 64K:-)
Anyway: we need people to try out these ideas and experiment and post
numbers that we can look at.

> You might see a huge boost for a large string, but i would only be
> impressed if you were able to show a metric such as MCS bootstrap time
> reduced by 3%.

Considering that a profile run puts the cost of String.GetHashCode()
below 1 % with mcs compiling itself, this looks like an extremely silly
expectation.

> memory for everyone. However, this might prevent us from doing it in
> managed, which would be a very bad thing.

This is completely untrue, it's just that you don't know how things
work.

> Also, It would still take some
> sort of benchmark to convince me this is worthwhile.

Ben, he doesn't need to convince _you_: you're free to voice your
opinion, but that doesn't mean it has any value wrt a change to
how the runtime works.

> I gave you a list before of optimizations that would be truly helpful.

Ben: it's none of your business to tell people what to do and whine on
the list if they don't do it. The mono contributors may do whatever
pleases them the most: deal with it.

> Also, last night I checked in some benchmarks that the JIT could do
> better on. They are things that show up in real code. You might want
to
> try them.

I only checked readonly-vt.cs and, frankly, the only interesting thing
about that test is that the comment shows you don't understand the
difference between readonly and immutable.

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better
_______________________________________________
Mono-devel-list mailing list
Mono-devel-list at lists.ximian.com
http://lists.ximian.com/mailman/listinfo/mono-devel-list