[Mono-dev] String.GetHashCode Discussion.

Zoltan Varga vargaz at gmail.com
Wed Jul 16 19:19:34 EDT 2008


  Where is this hashcode implementation taken from ? I don't think we
should invent new ones.


2008/7/17 Marek Safar <marek.safar at seznam.cz>:
> Hello,
> Here are some data which hopefully bring some light to this topic. I didn't
> measure uniqueness of hashcodes as I consider it less important as range
> distribution. Enclosed you can find the result for you data with old and new
> string.GetHashCode implementation.
> Some numbers to look at:
> ** Old implementation
> Highest contentions: 2985, 1155, 1137
> ** Current implementation
> Highest contentions: 236, 217, 210
> Marek
>> I have been asked to move this discussion to the e-mail list from IRC.
>>  Basically we (my company and I) have new unit tests errors in 2.0
>> that did not occur at 1.9.  The errors were traced to the
>> String.GetHashCode change.  I had one of our interns (Mike) research
>> the change and I wanted to share his findings with you.  His final
>> document s attached.  I can also send the test case Mike used to
>> generate his data.  I believe it is too large to send to all of your
>> inboxes.
>> http://docs.google.com/Doc?id=dhnnms3p_3fpmfq8dm
>> (1:28:15 PM) holmes: Have there been any discussions about r106887
>> which was a change to String.GetHashCode?
>> (1:31:54 PM) holmes: This change is causing some of our tests to fail.
>>  I will admit that our tests are relying on the sting hash code a bit
>> too much.  However after some investigation I am wondering if the new
>> algorithm is generating a unique enough result.  (I say that as
>> respectfully as I can.)
>> (1:34:33 PM) holmes: We conducted a short test where 58000 English
>> string were tested and the new code only results in 60% unique results
>> compared to the previous code's performance of 100%
>> (1:38:36 PM) holmes: We have found the current algorithm only
>> considers the last ~6 characters of a string, which may be the
>> problem.  That is the last thing I will say here I promise.
>> (1:42:21 PM) kumpera: marek: how have you tested the distribution of
>> your new string hashing algorithm?
>> (1:42:54 PM) marek: kumpera: I used english dictionary
>> (1:43:57 PM) marek: holmes: how big is your dictionary?
>> (1:45:05 PM) holmes: marek: 58110
>> (1:46:12 PM) holmes: I think the real problem is not that a word
>> hashes unique enough as a sentence producing a unique result.
>> (1:46:56 PM) holmes: the last 6 characters define the whole algorithm.
>>  (or at least I think so)
>> (1:56:50 PM) lupus: it's likely better to revert the change, string
>> hashing is also supposed to be synced with the implementation in C in
>> the runtime anyway
>> -bill
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
> _______________________________________________
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-devel-list

More information about the Mono-devel-list mailing list