[Mono-devel-list] [IDEA] Caching String Hashcodes

Ben Maurer bmaurer at users.sourceforge.net
Mon May 3 16:48:52 EDT 2004


On Mon, 2004-05-03 at 07:58, Paolo Molaro wrote:
> One idea is to allocate extra room for the hash code only for large
> strings (at the end of the string data), say, for 50+ char strings
> the overhead is capped at 4 % and is usually much less. This would be
> easy to implement: we already need to allocate one more char than
> required anyway. And for short strings it should not be a big issue to
> always calculate the hash code.
This is an interesting idea. But, I wonder how frequently GetHashCode is
called on such large strings. I think a useful measurement to make
before investigating a change wrt GetHashCode is to capture data on what
size strings are most frequently used in GetHashCode. MCS is a good
application, but monodoc and any asp.net app probably also make good
examples.

> > You might see a huge boost for a large string, but i would only be
> > impressed if you were able to show a metric such as MCS bootstrap time
> > reduced by 3%.
> 
> Considering that a profile run puts the cost of String.GetHashCode()
> below 1 % with mcs compiling itself, this looks like an extremely silly
> expectation.
Well, note the `such as'. My point is that it is probably not too hard
to make a benchmark that reads:

for (int i = 0; i < 1000000000; i ++)
     "akdsjfalsdkjfa;sldkfja;dlkfjasdlfkjasd;lfkjsad;lfksajdf;lksadjf;".GetHashCode ();

faster. And I hope that experimentation wrt to such a patch would be
more in-depth than the above example.

> > Also, last night I checked in some benchmarks that the JIT could do
> > better on. They are things that show up in real code. You might want to
> > try them.
> 
> I only checked readonly-vt.cs and, frankly, the only interesting thing
> about that test is that the comment shows you don't understand the
> difference between readonly and immutable.

Actually, I do understand the difference :-).

For an Object field, `mutated' would mean the pointer to the object is
changed. Thus, the fields would obviously not be subject to JIT time
evaluation.

However, for a value type, the field values *can* be resolved at jit
time because `mutating' a value type means mutating one of its fields.
The thing that prevents this is that you would need to execute a
`ldsflda' on the initonly field. However, partition III states:
        Note: Using ldsflda to compute the address of a static,
        init-only field and then using the resulting pointer to modify
        that value outside the body of the class initializer may lead to
        unpredictable behavior.

This warning is the same warning appended to the `stsfld' instruction:
        Note: Using stsfld to change the value of a static, init-only
        field outside the body of the class initializer may lead to
        unpredictable behavior.

So, the comment in the test case stands.




More information about the Mono-devel-list mailing list