[Mono-dev] String class speed improvements

marek safar marek.safar at seznam.cz
Wed Nov 22 18:10:20 EST 2006


Hello Paulo,

 I added some immediate comments.


> > +		unsafe int InternalIndexOfAny (char[] anyOf, int startIndex, int count)
> > +		{
> > +			if (anyOf.Length == 0)
> > +				return -1;
> > +
> > +			if (anyOf.Length == 1)
> > +				return IndexOfImpl (anyOf [0], startIndex, count);
> > +
> > +			int anyOfLength = anyOf.Length;
> > +			fixed (char* any_ptr = anyOf) {
> > +				char highest = *any_ptr;
> > +				char lowest = *any_ptr;
> > +
> > +				for (int i = 1; i != anyOfLength; ++i) {
> > +					if (any_ptr[i] > highest) {
> 
> When I did similar optimizations I used a slightly different pattern.
> Instead of using an index var (i in this case), use a pointer to the end
> of the data and change the condition in the loop to p < endp, with p
> being the pointer to the char that is increased at each iteration
> instead of increasing i and then indexing.
> Anyway, I think we should have a fast path here for small strings: if
> the string is small and anyOf is large, we waste a lot of time.
> This is way I always suggested, when optimizing string methods, to
> create benchmarks that exercise many different situations, so that a
> proper balance can be had. Your test bench should have different lengths
> of strings, typically of 0, 3-4, 6-8, 10-20, 30-40, 100 and 1000 chars.
> The anyof array could be of 0, 1, 3, 5, 10, 20 chars.

I had this on the mind I will try to add some fast paths.

> 
> > +						if (*s1_ptr > highest || *s1_ptr < lowest) {
> > +							s1_ptr++;
> > +							continue;
> > +						}
> > +
> > +						// We have always at least 2 characters
> > +						if (*s1_ptr == *any_ptr || *s1_ptr == any_ptr[1])
> > +							return i;
> > +
> > +						int ii = anyOfLength;
> > +						while (ii > 2) {
> > +							if (*s1_ptr == any_ptr[ii] || *s1_ptr == any_ptr[ii - 1])
> > +								return i;
> > +
> > +							ii -= 2;
> > +						}
> > +
> > +						if (ii > 1 && *s1_ptr == any_ptr[2])
> > +							return i;
> 
> Did you experiment also with a simple loop here (using the usual end
> pointer as well instead of ii)? It should be a good balance between
> the speed with small and big anyof arrays.

Yes, I did and for was usually slower. IIRC was definitely slower with mono JIT.

> 
> > +		{
> > +			fixed (char* start = &start_char) {
> > +				char* s1_ptr = start + startIndex;
> > +
> > +				while (count >= 8) {
> > +					if (*s1_ptr == value)
> > +						return startIndex;
> > +					if (s1_ptr[1] == value)
> > +						return startIndex + 1;
> > +					if (s1_ptr[2] == value)
> > +						return startIndex + 2;
> > +					if (s1_ptr[3] == value)
> > +						return startIndex + 3;
> > +					if (s1_ptr[4] == value)
> > +						return startIndex + 4;
> > +					if (s1_ptr[5] == value)
> > +						return startIndex + 5;
> > +					if (s1_ptr[6] == value)
> > +						return startIndex + 6;
> > +					if (s1_ptr[7] == value)
> > +						return startIndex + 7;
> > +
> > +					s1_ptr += 8;
> > +					startIndex += 8;
> > +					count -= 8;
> > +				}
> 
> Use an end pointer here, too and you can drop one hot var.

I tried this but it is definitely slower on x86. If I replace

if (s1_ptr[1] == value)

with

if (*++s1_ptr == value)

the JIT code is slower (10-15%) because it has to to 
movzx eax,word ptr [edi] anyway plus it has to do 2 times
inc edi.
When I use indexer movzx is still there with extra constant
but no more inc or adds

> > +
> > +				if (count >= 2) {
> > +					if (*s1_ptr == value)
> > +						return startIndex;
> > +					if (s1_ptr[1] == value)
> > +						return startIndex + 1;
> > +
> > +					s1_ptr += 2;
> > +					startIndex += 2;
> > +					count -= 2;
> > +				}
> > +
> > +				return count != 0 && *s1_ptr == value ? startIndex : -1;
> 
> A simple loop for the last 3 chars would likely be better here.
> Also check if unrolling 8 times really has big perf differences:
> if we could unroll just four times it would be better for smaller
> strings.

Unrolling is always faster when I remove 8 times unrolling I lose
10-15%.

> 
> > +		/* This method is culture-insensitive */
> > +		unsafe int LastIndexOfImpl (char value, int startIndex, int count)
> > +		{
> 
> Same comments as for the test above, additionally, it might be better to
> avoid negative indexes.

Surprisingly I found it the fastest way.

> 
> > @@ -1100,7 +1270,14 @@
> >  		/* This method is culture insensitive */
> >  		public String Replace (char oldChar, char newChar)
> >  		{
> > -			return InternalReplace (oldChar, newChar);
> > +			if (this.length == 0 || oldChar == newChar)
> > +				return this;
> > +
> > +			int start_pos = IndexOfImpl (oldChar, 0, this.length);
> > +			if (start_pos == -1)
> > +				return this;
> > +			
> > +			return InternalReplace (oldChar, newChar, start_pos);
> 
> Instead of changing the icall, just implement all the method in managed
> code.

OK, I will change it. It is easier for me. I just thought we are trying to do an allocation
in unmanaged code primary.


Thank you for the review.

Marek



More information about the Mono-devel-list mailing list