[Mono-dev] String.GetHashCode()

Alan McGovern alan.mcgovern at gmail.com
Sat Dec 1 13:02:40 EST 2007


Also, worst case scenario for a zero length string would mean a 22%
increase, not a 100% increase as was said before.

Alan.

On Dec 1, 2007 6:01 PM, Alan McGovern <alan.mcgovern at gmail.com> wrote:

> Well, 'N' could be computed by asking what percent bloat is 'OK' or it
> could be computed by asking 'What % speedup do we want to make this a
> worthwhile tradeoff'.
>
> Currently a string has these two fields, an int and a char. Add in the
> object overhead of (i think) 12 bytes per object, you're up to 18 bytes
> straight off for a zero length string. Lets assume that you're only going to
> precalculate the hashcode for strings greater than length 4.
>
> So, take the case of a string of length 5. This puts the size of the
> string at 18 + 5*2 = 28 bytes. Adding 4 bytes to this is 15% extra memory
> usage. As string length goes up, this % goes down.
>
> I'm unsure what % speedup precalculating the hashcode would result in, but
> i'm sure someone could run a few benchmarks if they have the time free. You
> could say that you want at least a 5x or maybe 10x improvement before it
> becomes worthwhile, this would then put a lower bound on the string length
> to make it worthwhile.
>
> Alan.
>
>
> On Dec 1, 2007 5:29 PM, Robert Jordan <robertj at gmx.net> wrote:
>
> > Hi Alan,
> >
> > Alan McGovern wrote:
> > > Also, just looking at the string source a bit more closely, it has a
> > > GetCaseInsensitiveHashcode method too, so i'd assume that would need
> > to be
> > > cached too which would mean 8 bytes would be needed. This wouldn't
> > scale
> > > well.
> > >
> > > Fair enough. Twas just an idea.
> >
> > nah, don't give up too early! :) If the string is longer than "n",
> > you could allocate 4 or even 8 additional bytes at the end of the
> > array for lazily computed hash codes.
> >
> > Open questions:
> >
> > - What's the optimal "n" for all common application classes
> >   mono is used for? :) It looks like it should be dynamically
> >   computed & adjusted based on profiling data collected at
> >   runtime.
> >
> > - What's the real gain of this optimization? If "n" tends to
> >   be really large (say 1-4KB), what's the probability of
> >   such large strings being used as a hash key?
> >   This question can be reduced to "how frequently is
> >   s.GetHashCode called, where s.Length > n".
> >
> >   In mono's class libs sources there is only one place (I know
> >   of) where large strings are potentially hashed: Sys.Web's
> >   cache management.
> >
> > Robert
> >
> > _______________________________________________
> > Mono-devel-list mailing list
> > Mono-devel-list at lists.ximian.com
> > http://lists.ximian.com/mailman/listinfo/mono-devel-list
> >
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20071201/0cb71eda/attachment.html 


More information about the Mono-devel-list mailing list