[Mono-dev] String class speed improvements
marek safar
marek.safar at seznam.cz
Wed Nov 22 18:10:20 EST 2006
Hello Paulo,
I added some immediate comments.
> > + unsafe int InternalIndexOfAny (char[] anyOf, int startIndex, int count)
> > + {
> > + if (anyOf.Length == 0)
> > + return -1;
> > +
> > + if (anyOf.Length == 1)
> > + return IndexOfImpl (anyOf [0], startIndex, count);
> > +
> > + int anyOfLength = anyOf.Length;
> > + fixed (char* any_ptr = anyOf) {
> > + char highest = *any_ptr;
> > + char lowest = *any_ptr;
> > +
> > + for (int i = 1; i != anyOfLength; ++i) {
> > + if (any_ptr[i] > highest) {
>
> When I did similar optimizations I used a slightly different pattern.
> Instead of using an index var (i in this case), use a pointer to the end
> of the data and change the condition in the loop to p < endp, with p
> being the pointer to the char that is increased at each iteration
> instead of increasing i and then indexing.
> Anyway, I think we should have a fast path here for small strings: if
> the string is small and anyOf is large, we waste a lot of time.
> This is way I always suggested, when optimizing string methods, to
> create benchmarks that exercise many different situations, so that a
> proper balance can be had. Your test bench should have different lengths
> of strings, typically of 0, 3-4, 6-8, 10-20, 30-40, 100 and 1000 chars.
> The anyof array could be of 0, 1, 3, 5, 10, 20 chars.
I had this on the mind I will try to add some fast paths.
>
> > + if (*s1_ptr > highest || *s1_ptr < lowest) {
> > + s1_ptr++;
> > + continue;
> > + }
> > +
> > + // We have always at least 2 characters
> > + if (*s1_ptr == *any_ptr || *s1_ptr == any_ptr[1])
> > + return i;
> > +
> > + int ii = anyOfLength;
> > + while (ii > 2) {
> > + if (*s1_ptr == any_ptr[ii] || *s1_ptr == any_ptr[ii - 1])
> > + return i;
> > +
> > + ii -= 2;
> > + }
> > +
> > + if (ii > 1 && *s1_ptr == any_ptr[2])
> > + return i;
>
> Did you experiment also with a simple loop here (using the usual end
> pointer as well instead of ii)? It should be a good balance between
> the speed with small and big anyof arrays.
Yes, I did and for was usually slower. IIRC was definitely slower with mono JIT.
>
> > + {
> > + fixed (char* start = &start_char) {
> > + char* s1_ptr = start + startIndex;
> > +
> > + while (count >= 8) {
> > + if (*s1_ptr == value)
> > + return startIndex;
> > + if (s1_ptr[1] == value)
> > + return startIndex + 1;
> > + if (s1_ptr[2] == value)
> > + return startIndex + 2;
> > + if (s1_ptr[3] == value)
> > + return startIndex + 3;
> > + if (s1_ptr[4] == value)
> > + return startIndex + 4;
> > + if (s1_ptr[5] == value)
> > + return startIndex + 5;
> > + if (s1_ptr[6] == value)
> > + return startIndex + 6;
> > + if (s1_ptr[7] == value)
> > + return startIndex + 7;
> > +
> > + s1_ptr += 8;
> > + startIndex += 8;
> > + count -= 8;
> > + }
>
> Use an end pointer here, too and you can drop one hot var.
I tried this but it is definitely slower on x86. If I replace
if (s1_ptr[1] == value)
with
if (*++s1_ptr == value)
the JIT code is slower (10-15%) because it has to to
movzx eax,word ptr [edi] anyway plus it has to do 2 times
inc edi.
When I use indexer movzx is still there with extra constant
but no more inc or adds
> > +
> > + if (count >= 2) {
> > + if (*s1_ptr == value)
> > + return startIndex;
> > + if (s1_ptr[1] == value)
> > + return startIndex + 1;
> > +
> > + s1_ptr += 2;
> > + startIndex += 2;
> > + count -= 2;
> > + }
> > +
> > + return count != 0 && *s1_ptr == value ? startIndex : -1;
>
> A simple loop for the last 3 chars would likely be better here.
> Also check if unrolling 8 times really has big perf differences:
> if we could unroll just four times it would be better for smaller
> strings.
Unrolling is always faster when I remove 8 times unrolling I lose
10-15%.
>
> > + /* This method is culture-insensitive */
> > + unsafe int LastIndexOfImpl (char value, int startIndex, int count)
> > + {
>
> Same comments as for the test above, additionally, it might be better to
> avoid negative indexes.
Surprisingly I found it the fastest way.
>
> > @@ -1100,7 +1270,14 @@
> > /* This method is culture insensitive */
> > public String Replace (char oldChar, char newChar)
> > {
> > - return InternalReplace (oldChar, newChar);
> > + if (this.length == 0 || oldChar == newChar)
> > + return this;
> > +
> > + int start_pos = IndexOfImpl (oldChar, 0, this.length);
> > + if (start_pos == -1)
> > + return this;
> > +
> > + return InternalReplace (oldChar, newChar, start_pos);
>
> Instead of changing the icall, just implement all the method in managed
> code.
OK, I will change it. It is easier for me. I just thought we are trying to do an allocation
in unmanaged code primary.
Thank you for the review.
Marek
More information about the Mono-devel-list
mailing list