[Mono-dev] String.GetHashCode Discussion.
marek.safar at seznam.cz
Wed Jul 16 19:13:04 EDT 2008
Here are some data which hopefully bring some light to this topic. I
didn't measure uniqueness of hashcodes as I consider it less important
as range distribution. Enclosed you can find the result for you data
with old and new string.GetHashCode implementation.
Some numbers to look at:
** Old implementation
Highest contentions: 2985, 1155, 1137
** Current implementation
Highest contentions: 236, 217, 210
> I have been asked to move this discussion to the e-mail list from IRC.
> Basically we (my company and I) have new unit tests errors in 2.0
> that did not occur at 1.9. The errors were traced to the
> String.GetHashCode change. I had one of our interns (Mike) research
> the change and I wanted to share his findings with you. His final
> document s attached. I can also send the test case Mike used to
> generate his data. I believe it is too large to send to all of your
> (1:28:15 PM) holmes: Have there been any discussions about r106887
> which was a change to String.GetHashCode?
> (1:31:54 PM) holmes: This change is causing some of our tests to fail.
> I will admit that our tests are relying on the sting hash code a bit
> too much. However after some investigation I am wondering if the new
> algorithm is generating a unique enough result. (I say that as
> respectfully as I can.)
> (1:34:33 PM) holmes: We conducted a short test where 58000 English
> string were tested and the new code only results in 60% unique results
> compared to the previous code's performance of 100%
> (1:38:36 PM) holmes: We have found the current algorithm only
> considers the last ~6 characters of a string, which may be the
> problem. That is the last thing I will say here I promise.
> (1:42:21 PM) kumpera: marek: how have you tested the distribution of
> your new string hashing algorithm?
> (1:42:54 PM) marek: kumpera: I used english dictionary
> (1:43:57 PM) marek: holmes: how big is your dictionary?
> (1:45:05 PM) holmes: marek: 58110
> (1:46:12 PM) holmes: I think the real problem is not that a word
> hashes unique enough as a sentence producing a unique result.
> (1:46:56 PM) holmes: the last 6 characters define the whole algorithm.
> (or at least I think so)
> (1:56:50 PM) lupus: it's likely better to revert the change, string
> hashing is also supposed to be synced with the implementation in C in
> the runtime anyway
> Mono-devel-list mailing list
> Mono-devel-list at lists.ximian.com
-------------- next part --------------
A non-text attachment was scrubbed...
Size: 28813 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20080717/83376fd5/attachment-0001.bin
More information about the Mono-devel-list