[Mono-dev] String.GetHashCode()

Steve Bjorg steveb at mindtouch.com
Sat Dec 1 13:16:23 EST 2007

It doesn't make much sense to pre-compute the hashcode for a string  
whose length is not greater than the cacheline size.  Anything that  
fits into the cacheline and is not memory bound is plenty fast enough.

For the same reason, the size and hashcode of a string would need to  
be juxtaposed in memory to load both on first hit.  Otherwise, the  
benefits are again wiped out due to the high cost of loading data  
from memory into the cache.  If optimization at that level isn't  
interesting (b/c we talking nanoseconds in this realm), then the  
bound for N will go up quite quickly before a cached hashcode makes  
sense.  And as Robert so insightfully pointed out, strings for which  
a pre-computed hashcode might make sense may never be used as keys.   
Hence, the complexity increase in th string code -- and therefore  
quality control headache -- introduced by it might not be worth the  

Just some assembly hacker thoughts on the subject... :)

- Steve

Steve G. Bjorg

On Dec 1, 2007, at 10:02 AM, Alan McGovern wrote:

> Also, worst case scenario for a zero length string would mean a 22%  
> increase, not a 100% increase as was said before.
> Alan.
> On Dec 1, 2007 6:01 PM, Alan McGovern < alan.mcgovern at gmail.com>  
> wrote:
> Well, 'N' could be computed by asking what percent bloat is 'OK' or  
> it could be computed by asking 'What % speedup do we want to make  
> this a worthwhile tradeoff'.
> Currently a string has these two fields, an int and a char. Add in  
> the object overhead of (i think) 12 bytes per object, you're up to  
> 18 bytes straight off for a zero length string. Lets assume that  
> you're only going to precalculate the hashcode for strings greater  
> than length 4.
> So, take the case of a string of length 5. This puts the size of  
> the string at 18 + 5*2 = 28 bytes. Adding 4 bytes to this is 15%  
> extra memory usage. As string length goes up, this % goes down.
> I'm unsure what % speedup precalculating the hashcode would result  
> in, but i'm sure someone could run a few benchmarks if they have  
> the time free. You could say that you want at least a 5x or maybe  
> 10x improvement before it becomes worthwhile, this would then put a  
> lower bound on the string length to make it worthwhile.
> Alan.
> On Dec 1, 2007 5:29 PM, Robert Jordan <robertj at gmx.net > wrote:
> Hi Alan,
> Alan McGovern wrote:
> > Also, just looking at the string source a bit more closely, it has a
> > GetCaseInsensitiveHashcode method too, so i'd assume that would  
> need to be
> > cached too which would mean 8 bytes would be needed. This  
> wouldn't scale
> > well.
> >
> > Fair enough. Twas just an idea.
> nah, don't give up too early! :) If the string is longer than "n",
> you could allocate 4 or even 8 additional bytes at the end of the
> array for lazily computed hash codes.
> Open questions:
> - What's the optimal "n" for all common application classes
>   mono is used for? :) It looks like it should be dynamically
>   computed & adjusted based on profiling data collected at
>   runtime.
> - What's the real gain of this optimization? If "n" tends to
>   be really large (say 1-4KB), what's the probability of
>   such large strings being used as a hash key?
>   This question can be reduced to "how frequently is
>   s.GetHashCode called, where s.Length > n".
>   In mono's class libs sources there is only one place (I know
>   of) where large strings are potentially hashed: Sys.Web's
>   cache management.
> Robert
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20071201/1c181fe0/attachment.html 

More information about the Mono-devel-list mailing list