[Mono-dev] String class speed improvements

Paolo Molaro lupus at ximian.com
Wed Nov 22 10:36:17 EST 2006


On 11/21/06 Marek Safar wrote:
> Index: System/String.cs
> ===================================================================
> --- System/String.cs	(revision 68265)
> +++ System/String.cs	(working copy)
> @@ -692,6 +692,64 @@
>  			return InternalIndexOfAny (anyOf, startIndex, count);
>  		}
>  
> +		unsafe int InternalIndexOfAny (char[] anyOf, int startIndex, int count)
> +		{
> +			if (anyOf.Length == 0)
> +				return -1;
> +
> +			if (anyOf.Length == 1)
> +				return IndexOfImpl (anyOf [0], startIndex, count);
> +
> +			int anyOfLength = anyOf.Length;
> +			fixed (char* any_ptr = anyOf) {
> +				char highest = *any_ptr;
> +				char lowest = *any_ptr;
> +
> +				for (int i = 1; i != anyOfLength; ++i) {
> +					if (any_ptr[i] > highest) {

When I did similar optimizations I used a slightly different pattern.
Instead of using an index var (i in this case), use a pointer to the end
of the data and change the condition in the loop to p < endp, with p
being the pointer to the char that is increased at each iteration
instead of increasing i and then indexing.
Anyway, I think we should have a fast path here for small strings: if
the string is small and anyOf is large, we waste a lot of time.
This is way I always suggested, when optimizing string methods, to
create benchmarks that exercise many different situations, so that a
proper balance can be had. Your test bench should have different lengths
of strings, typically of 0, 3-4, 6-8, 10-20, 30-40, 100 and 1000 chars.
The anyof array could be of 0, 1, 3, 5, 10, 20 chars.

> +						highest = any_ptr[i];
> +						continue;
> +					}
> +
> +					if (any_ptr[i] < lowest)
> +						lowest = any_ptr[i];
> +				}
> +
> +				anyOfLength--;
> +
> +				fixed (char* start = &start_char) {
> +					char* s1_ptr = start + startIndex;
> +
> +					for (int i = startIndex; i < startIndex + count; ++i) {

Use a end pointer here, too: the index can be retrieved with a simple
sub + shift at the end: reducing the number of variables that needs to
go into a global register is almost always worth it, especially on x86.

> +						if (*s1_ptr > highest || *s1_ptr < lowest) {
> +							s1_ptr++;
> +							continue;
> +						}
> +
> +						// We have always at least 2 characters
> +						if (*s1_ptr == *any_ptr || *s1_ptr == any_ptr[1])
> +							return i;
> +
> +						int ii = anyOfLength;
> +						while (ii > 2) {
> +							if (*s1_ptr == any_ptr[ii] || *s1_ptr == any_ptr[ii - 1])
> +								return i;
> +
> +							ii -= 2;
> +						}
> +
> +						if (ii > 1 && *s1_ptr == any_ptr[2])
> +							return i;

Did you experiment also with a simple loop here (using the usual end
pointer as well instead of ii)? It should be a good balance between
the speed with small and big anyof arrays.

>  		public int IndexOf (String value)
> @@ -786,11 +847,65 @@
>  			if ((startIndex == 0 && this.length == 0) || (startIndex == this.length) || (count == 0))
>  				return -1;
>  
> -			for (int pos = startIndex; pos < startIndex + count; pos++) {
> -				if (this[pos] == value)
> -					return(pos);
> +			return IndexOfImpl (value, startIndex, count);
> +		}
> +
> +		unsafe int IndexOfImpl(char value, int startIndex, int count)
> +		{
> +			fixed (char* start = &start_char) {
> +				char* s1_ptr = start + startIndex;
> +
> +				while (count >= 8) {
> +					if (*s1_ptr == value)
> +						return startIndex;
> +					if (s1_ptr[1] == value)
> +						return startIndex + 1;
> +					if (s1_ptr[2] == value)
> +						return startIndex + 2;
> +					if (s1_ptr[3] == value)
> +						return startIndex + 3;
> +					if (s1_ptr[4] == value)
> +						return startIndex + 4;
> +					if (s1_ptr[5] == value)
> +						return startIndex + 5;
> +					if (s1_ptr[6] == value)
> +						return startIndex + 6;
> +					if (s1_ptr[7] == value)
> +						return startIndex + 7;
> +
> +					s1_ptr += 8;
> +					startIndex += 8;
> +					count -= 8;
> +				}

Use an end pointer here, too and you can drop one hot var.

> +
> +				if (count >= 4) {
> +					if (*s1_ptr == value)
> +						return startIndex;
> +					if (s1_ptr[1] == value)
> +						return startIndex + 1;
> +					if (s1_ptr[2] == value)
> +						return startIndex + 2;
> +					if (s1_ptr[3] == value)
> +						return startIndex + 3;
> +
> +					s1_ptr += 4;
> +					startIndex += 4;
> +					count -= 4;
> +				}
> +
> +				if (count >= 2) {
> +					if (*s1_ptr == value)
> +						return startIndex;
> +					if (s1_ptr[1] == value)
> +						return startIndex + 1;
> +
> +					s1_ptr += 2;
> +					startIndex += 2;
> +					count -= 2;
> +				}
> +
> +				return count != 0 && *s1_ptr == value ? startIndex : -1;

A simple loop for the last 3 chars would likely be better here.
Also check if unrolling 8 times really has big perf differences:
if we could unroll just four times it would be better for smaller
strings.

> +		/* This method is culture-insensitive */
> +		unsafe int LastIndexOfImpl (char value, int startIndex, int count)
> +		{

Same comments as for the test above, additionally, it might be better to
avoid negative indexes.

> @@ -1100,7 +1270,14 @@
>  		/* This method is culture insensitive */
>  		public String Replace (char oldChar, char newChar)
>  		{
> -			return InternalReplace (oldChar, newChar);
> +			if (this.length == 0 || oldChar == newChar)
> +				return this;
> +
> +			int start_pos = IndexOfImpl (oldChar, 0, this.length);
> +			if (start_pos == -1)
> +				return this;
> +			
> +			return InternalReplace (oldChar, newChar, start_pos);

Instead of changing the icall, just implement all the method in managed
code.
Thanks!

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better



More information about the Mono-devel-list mailing list