[Mono-dev] String class speed improvements
Paolo Molaro
lupus at ximian.com
Wed Nov 22 10:36:17 EST 2006
On 11/21/06 Marek Safar wrote:
> Index: System/String.cs
> ===================================================================
> --- System/String.cs (revision 68265)
> +++ System/String.cs (working copy)
> @@ -692,6 +692,64 @@
> return InternalIndexOfAny (anyOf, startIndex, count);
> }
>
> + unsafe int InternalIndexOfAny (char[] anyOf, int startIndex, int count)
> + {
> + if (anyOf.Length == 0)
> + return -1;
> +
> + if (anyOf.Length == 1)
> + return IndexOfImpl (anyOf [0], startIndex, count);
> +
> + int anyOfLength = anyOf.Length;
> + fixed (char* any_ptr = anyOf) {
> + char highest = *any_ptr;
> + char lowest = *any_ptr;
> +
> + for (int i = 1; i != anyOfLength; ++i) {
> + if (any_ptr[i] > highest) {
When I did similar optimizations I used a slightly different pattern.
Instead of using an index var (i in this case), use a pointer to the end
of the data and change the condition in the loop to p < endp, with p
being the pointer to the char that is increased at each iteration
instead of increasing i and then indexing.
Anyway, I think we should have a fast path here for small strings: if
the string is small and anyOf is large, we waste a lot of time.
This is way I always suggested, when optimizing string methods, to
create benchmarks that exercise many different situations, so that a
proper balance can be had. Your test bench should have different lengths
of strings, typically of 0, 3-4, 6-8, 10-20, 30-40, 100 and 1000 chars.
The anyof array could be of 0, 1, 3, 5, 10, 20 chars.
> + highest = any_ptr[i];
> + continue;
> + }
> +
> + if (any_ptr[i] < lowest)
> + lowest = any_ptr[i];
> + }
> +
> + anyOfLength--;
> +
> + fixed (char* start = &start_char) {
> + char* s1_ptr = start + startIndex;
> +
> + for (int i = startIndex; i < startIndex + count; ++i) {
Use a end pointer here, too: the index can be retrieved with a simple
sub + shift at the end: reducing the number of variables that needs to
go into a global register is almost always worth it, especially on x86.
> + if (*s1_ptr > highest || *s1_ptr < lowest) {
> + s1_ptr++;
> + continue;
> + }
> +
> + // We have always at least 2 characters
> + if (*s1_ptr == *any_ptr || *s1_ptr == any_ptr[1])
> + return i;
> +
> + int ii = anyOfLength;
> + while (ii > 2) {
> + if (*s1_ptr == any_ptr[ii] || *s1_ptr == any_ptr[ii - 1])
> + return i;
> +
> + ii -= 2;
> + }
> +
> + if (ii > 1 && *s1_ptr == any_ptr[2])
> + return i;
Did you experiment also with a simple loop here (using the usual end
pointer as well instead of ii)? It should be a good balance between
the speed with small and big anyof arrays.
> public int IndexOf (String value)
> @@ -786,11 +847,65 @@
> if ((startIndex == 0 && this.length == 0) || (startIndex == this.length) || (count == 0))
> return -1;
>
> - for (int pos = startIndex; pos < startIndex + count; pos++) {
> - if (this[pos] == value)
> - return(pos);
> + return IndexOfImpl (value, startIndex, count);
> + }
> +
> + unsafe int IndexOfImpl(char value, int startIndex, int count)
> + {
> + fixed (char* start = &start_char) {
> + char* s1_ptr = start + startIndex;
> +
> + while (count >= 8) {
> + if (*s1_ptr == value)
> + return startIndex;
> + if (s1_ptr[1] == value)
> + return startIndex + 1;
> + if (s1_ptr[2] == value)
> + return startIndex + 2;
> + if (s1_ptr[3] == value)
> + return startIndex + 3;
> + if (s1_ptr[4] == value)
> + return startIndex + 4;
> + if (s1_ptr[5] == value)
> + return startIndex + 5;
> + if (s1_ptr[6] == value)
> + return startIndex + 6;
> + if (s1_ptr[7] == value)
> + return startIndex + 7;
> +
> + s1_ptr += 8;
> + startIndex += 8;
> + count -= 8;
> + }
Use an end pointer here, too and you can drop one hot var.
> +
> + if (count >= 4) {
> + if (*s1_ptr == value)
> + return startIndex;
> + if (s1_ptr[1] == value)
> + return startIndex + 1;
> + if (s1_ptr[2] == value)
> + return startIndex + 2;
> + if (s1_ptr[3] == value)
> + return startIndex + 3;
> +
> + s1_ptr += 4;
> + startIndex += 4;
> + count -= 4;
> + }
> +
> + if (count >= 2) {
> + if (*s1_ptr == value)
> + return startIndex;
> + if (s1_ptr[1] == value)
> + return startIndex + 1;
> +
> + s1_ptr += 2;
> + startIndex += 2;
> + count -= 2;
> + }
> +
> + return count != 0 && *s1_ptr == value ? startIndex : -1;
A simple loop for the last 3 chars would likely be better here.
Also check if unrolling 8 times really has big perf differences:
if we could unroll just four times it would be better for smaller
strings.
> + /* This method is culture-insensitive */
> + unsafe int LastIndexOfImpl (char value, int startIndex, int count)
> + {
Same comments as for the test above, additionally, it might be better to
avoid negative indexes.
> @@ -1100,7 +1270,14 @@
> /* This method is culture insensitive */
> public String Replace (char oldChar, char newChar)
> {
> - return InternalReplace (oldChar, newChar);
> + if (this.length == 0 || oldChar == newChar)
> + return this;
> +
> + int start_pos = IndexOfImpl (oldChar, 0, this.length);
> + if (start_pos == -1)
> + return this;
> +
> + return InternalReplace (oldChar, newChar, start_pos);
Instead of changing the icall, just implement all the method in managed
code.
Thanks!
lupus
--
-----------------------------------------------------------------
lupus at debian.org debian/rules
lupus at ximian.com Monkeys do it better
More information about the Mono-devel-list
mailing list