[Mono-dev] String class speed improvements

Paolo Molaro lupus at ximian.com
Thu Nov 23 05:06:48 EST 2006


On 11/23/06 marek safar wrote:
> > > +						// We have always at least 2 characters
> > > +						if (*s1_ptr == *any_ptr || *s1_ptr == any_ptr[1])
> > > +							return i;
> > > +
> > > +						int ii = anyOfLength;
> > > +						while (ii > 2) {
> > > +							if (*s1_ptr == any_ptr[ii] || *s1_ptr == any_ptr[ii - 1])
> > > +								return i;
> > > +
> > > +							ii -= 2;
> > > +						}
> > > +
> > > +						if (ii > 1 && *s1_ptr == any_ptr[2])
> > > +							return i;
> > 
> > Did you experiment also with a simple loop here (using the usual end
> > pointer as well instead of ii)? It should be a good balance between
> > the speed with small and big anyof arrays.
> 
> Yes, I did and for was usually slower. IIRC was definitely slower with mono JIT.

If you coded it like the rest of the method with the additional index
var I can believe it. What about the optimized code I suggested?
Something like:
	char *anyc = any_ptr;
	do {
		if (*s1_ptr == *anyc)
			return i;
		anyc++;
	while (anyc < anyc_endp);

> > > +					if (s1_ptr[6] == value)
> > > +						return startIndex + 6;
> > > +					if (s1_ptr[7] == value)
> > > +						return startIndex + 7;
> > > +
> > > +					s1_ptr += 8;
> > > +					startIndex += 8;
> > > +					count -= 8;
> > > +				}
> > 
> > Use an end pointer here, too and you can drop one hot var.
> 
> I tried this but it is definitely slower on x86. If I replace
> 
> if (s1_ptr[1] == value)
> 
> with
> 
> if (*++s1_ptr == value)

This is obviously not what I suggested. See the GetHashCode I mentioned.
The code would look like:

	while (s1_ptr < endp) {
		if (s1_ptr [0] == value)
			return s1_ptr - start;
		if (s1_ptr [1] == value)
			return s1_ptr - start + 1;
		if (s1_ptr [2] == value)
			return s1_ptr - start + 2;
		...
		s1_ptr += 8;
	}
As you see there is one less hot variable that needs to be accessed in
the loop and one is only read and never written to (endp as opposed to
count in your code). There can be oly an issue if the jit thinks start
should go into a register, but in that case the code would not be worse
in any case: we'd still avoid two adds per loop.

> > > +				if (count >= 2) {
> > > +					if (*s1_ptr == value)
> > > +						return startIndex;
> > > +					if (s1_ptr[1] == value)
> > > +						return startIndex + 1;
> > > +
> > > +					s1_ptr += 2;
> > > +					startIndex += 2;
> > > +					count -= 2;
> > > +				}
> > > +
> > > +				return count != 0 && *s1_ptr == value ? startIndex : -1;
> > 
> > A simple loop for the last 3 chars would likely be better here.
> > Also check if unrolling 8 times really has big perf differences:
> > if we could unroll just four times it would be better for smaller
> > strings.
> 
> Unrolling is always faster when I remove 8 times unrolling I lose
> 10-15%.

Do it with the optimized code and also with a test bench that doesn't
only exercise the path where big unrolling is faster.

Thanks.

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better



More information about the Mono-devel-list mailing list