[Mono-dev] String.GetHashCode()

Alan McGovern alan.mcgovern at gmail.com
Sat Dec 1 13:01:02 EST 2007


Well, 'N' could be computed by asking what percent bloat is 'OK' or it could
be computed by asking 'What % speedup do we want to make this a worthwhile
tradeoff'.

Currently a string has these two fields, an int and a char. Add in the
object overhead of (i think) 12 bytes per object, you're up to 18 bytes
straight off for a zero length string. Lets assume that you're only going to
precalculate the hashcode for strings greater than length 4.

So, take the case of a string of length 5. This puts the size of the string
at 18 + 5*2 = 28 bytes. Adding 4 bytes to this is 15% extra memory usage. As
string length goes up, this % goes down.

I'm unsure what % speedup precalculating the hashcode would result in, but
i'm sure someone could run a few benchmarks if they have the time free. You
could say that you want at least a 5x or maybe 10x improvement before it
becomes worthwhile, this would then put a lower bound on the string length
to make it worthwhile.

Alan.

On Dec 1, 2007 5:29 PM, Robert Jordan <robertj at gmx.net> wrote:

> Hi Alan,
>
> Alan McGovern wrote:
> > Also, just looking at the string source a bit more closely, it has a
> > GetCaseInsensitiveHashcode method too, so i'd assume that would need to
> be
> > cached too which would mean 8 bytes would be needed. This wouldn't scale
> > well.
> >
> > Fair enough. Twas just an idea.
>
> nah, don't give up too early! :) If the string is longer than "n",
> you could allocate 4 or even 8 additional bytes at the end of the
> array for lazily computed hash codes.
>
> Open questions:
>
> - What's the optimal "n" for all common application classes
>   mono is used for? :) It looks like it should be dynamically
>   computed & adjusted based on profiling data collected at
>   runtime.
>
> - What's the real gain of this optimization? If "n" tends to
>   be really large (say 1-4KB), what's the probability of
>   such large strings being used as a hash key?
>   This question can be reduced to "how frequently is
>   s.GetHashCode called, where s.Length > n".
>
>   In mono's class libs sources there is only one place (I know
>   of) where large strings are potentially hashed: Sys.Web's
>   cache management.
>
> Robert
>
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20071201/80c2247b/attachment.html 


More information about the Mono-devel-list mailing list