[Mono-dev] String.GetHashCode()
Alan McGovern
alan.mcgovern at gmail.com
Sat Dec 1 13:01:02 EST 2007
Well, 'N' could be computed by asking what percent bloat is 'OK' or it could
be computed by asking 'What % speedup do we want to make this a worthwhile
tradeoff'.
Currently a string has these two fields, an int and a char. Add in the
object overhead of (i think) 12 bytes per object, you're up to 18 bytes
straight off for a zero length string. Lets assume that you're only going to
precalculate the hashcode for strings greater than length 4.
So, take the case of a string of length 5. This puts the size of the string
at 18 + 5*2 = 28 bytes. Adding 4 bytes to this is 15% extra memory usage. As
string length goes up, this % goes down.
I'm unsure what % speedup precalculating the hashcode would result in, but
i'm sure someone could run a few benchmarks if they have the time free. You
could say that you want at least a 5x or maybe 10x improvement before it
becomes worthwhile, this would then put a lower bound on the string length
to make it worthwhile.
Alan.
On Dec 1, 2007 5:29 PM, Robert Jordan <robertj at gmx.net> wrote:
> Hi Alan,
>
> Alan McGovern wrote:
> > Also, just looking at the string source a bit more closely, it has a
> > GetCaseInsensitiveHashcode method too, so i'd assume that would need to
> be
> > cached too which would mean 8 bytes would be needed. This wouldn't scale
> > well.
> >
> > Fair enough. Twas just an idea.
>
> nah, don't give up too early! :) If the string is longer than "n",
> you could allocate 4 or even 8 additional bytes at the end of the
> array for lazily computed hash codes.
>
> Open questions:
>
> - What's the optimal "n" for all common application classes
> mono is used for? :) It looks like it should be dynamically
> computed & adjusted based on profiling data collected at
> runtime.
>
> - What's the real gain of this optimization? If "n" tends to
> be really large (say 1-4KB), what's the probability of
> such large strings being used as a hash key?
> This question can be reduced to "how frequently is
> s.GetHashCode called, where s.Length > n".
>
> In mono's class libs sources there is only one place (I know
> of) where large strings are potentially hashed: Sys.Web's
> cache management.
>
> Robert
>
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20071201/80c2247b/attachment.html
More information about the Mono-devel-list
mailing list