[Mono-dev] String.GetHashCode Discussion.

Sat Jul 19 05:29:37 EDT 2008

Imho we should revert the change, especially as MS seems to also have an
implementation with low collisions (and in fact this will be what people are
expecting from a hashcode without any further explanation) we should do the
same. Otherwise this might drive some implementations into huge perf
problems.

Happy Hacking
Andreas

> -----Ursprüngliche Nachricht-----
> Von: mono-devel-list-bounces at lists.ximian.com [mailto:mono-devel-list-
> bounces at lists.ximian.com] Im Auftrag von Marek Safar
> Gesendet: Donnerstag, 17. Juli 2008 01:13
> An: Bill Holmes
> Cc: mono-devel-list
> Betreff: Re: [Mono-dev] String.GetHashCode Discussion.
> 
> Hello,
> 
> Here are some data which hopefully bring some light to this topic. I
> didn't measure uniqueness of hashcodes as I consider it less important
> as range distribution. Enclosed you can find the result for you data
> with old and new string.GetHashCode implementation.
> 
> Some numbers to look at:
> 
> ** Old implementation
> 
> Highest contentions: 2985, 1155, 1137
> 
> ** Current implementation
> 
> Highest contentions: 236, 217, 210
> 
> Marek
> 
> 
> > I have been asked to move this discussion to the e-mail list from
> IRC.
> >  Basically we (my company and I) have new unit tests errors in 2.0
> > that did not occur at 1.9.  The errors were traced to the
> > String.GetHashCode change.  I had one of our interns (Mike) research
> > the change and I wanted to share his findings with you.  His final
> > document s attached.  I can also send the test case Mike used to
> > generate his data.  I believe it is too large to send to all of your
> > inboxes.
> >
> > http://docs.google.com/Doc?id=dhnnms3p_3fpmfq8dm
> >
> > (1:28:15 PM) holmes: Have there been any discussions about r106887
> > which was a change to String.GetHashCode?
> > (1:31:54 PM) holmes: This change is causing some of our tests to
> fail.
> >  I will admit that our tests are relying on the sting hash code a bit
> > too much.  However after some investigation I am wondering if the new
> > algorithm is generating a unique enough result.  (I say that as
> > respectfully as I can.)
> > (1:34:33 PM) holmes: We conducted a short test where 58000 English
> > string were tested and the new code only results in 60% unique
> results
> > compared to the previous code's performance of 100%
> > (1:38:36 PM) holmes: We have found the current algorithm only
> > considers the last ~6 characters of a string, which may be the
> > problem.  That is the last thing I will say here I promise.
> > (1:42:21 PM) kumpera: marek: how have you tested the distribution of
> > your new string hashing algorithm?
> > (1:42:54 PM) marek: kumpera: I used english dictionary
> > (1:43:57 PM) marek: holmes: how big is your dictionary?
> > (1:45:05 PM) holmes: marek: 58110
> > (1:46:12 PM) holmes: I think the real problem is not that a word
> > hashes unique enough as a sentence producing a unique result.
> > (1:46:56 PM) holmes: the last 6 characters define the whole
> algorithm.
> >  (or at least I think so)
> > (1:56:50 PM) lupus: it's likely better to revert the change, string
> > hashing is also supposed to be synced with the implementation in C in
> > the runtime anyway
> >
> > -bill
> > _______________________________________________
> > Mono-devel-list mailing list
> > Mono-devel-list at lists.ximian.com
> > http://lists.ximian.com/mailman/listinfo/mono-devel-list
> >
> >