[Mono-dev] Dictionary`2: optimized and serialization-compatible with MS.net

Paolo Molaro lupus at ximian.com
Mon Jun 11 10:51:33 EDT 2007


On 06/11/07 Juraj Skripsky wrote:
> // walk linked list until right slot is found or end is reached 
> while (cur != NO_SLOT) {
>     if (linkSlots [cur].HashCode == hashCode &&
>         hcp.Equals (dataSlots [cur].Key, key))
>         return dataSlots [cur].Value;
>     cur = linkSlots [cur].Next;
> }
> ------
> 
> "linkSlots [cur].HashCode == hashCode" would have to be replaced with
> "linkSlots [cur].Key.GetHashCode() == hashCode"...
> 
> Or am I missing something here?

You're missing that the hashcode check can be completely removed
since it's not needed. Having it may have some performance benefit
with many collisions or for presence checks that fail and happen
to hit a bucket with values that also have different hashcodes.
But this benefit (I think it is small, would love to see numbers for a
few test cases) needs to be balanced against a memory usage overhead
that in most cases is at about 20%.

> I can't wait for the optimized implementation to be committed to svn. As
> soon as that is done, it will be interesting to replace private/internal
> hashtables in the class libs with Dictionaries (e.g. QuickSearch.shift
> in System.Text.RegularExpressions) for the NET_2_0 profile and measure
> the speedup.

Well, my opionion is that using a hashtable there is completely overkill
and the wrong thing to do (even if it is a super-optimized Dictionary<T>)
performance-wise, in fact I think I removed that sometime in the past
because it gave an enourmous performance vantage to not do it in a wide
range of cases. Just limiting the chars that are handled with the skip
table to latin1 would be enough to have so much better performance in
95% of the cases that the remaining cases would become uninteresting.
But if we want to pay some service to people not using only the latin1
subset of unicode, we could still use a 256-entry array for latin1
and a hash/dictionary that is created and used only for the chars > 255.
GetShiftDistance would become:

	if (c < 256)
		return latin1_skip_table [(byte)c];
	...current code...

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better



More information about the Mono-devel-list mailing list