String performance boost (Re: [Mono-dev] Patch to boost speed ofUnicodeEncoding)

Fri Mar 17 06:11:54 EST 2006

Wow!!!

I dispeared for a few days to work on this and came back to find I 
wasn't alone. :-P

I was testing some similar concepts myself. I've also been testing 
moving things back and forth to unamanged internal calls to see if I 
could speed it up and so far I've got mixed results. Some things 
faster; some things slower. Mostly slower for most of the functions, 
but it really depended on the size of the string I was working with 
which was the deciding factor if it was faster to be unmanged or not.

One thing I found is that NUnit itself can't be trusted 100% with 
testing some of this :-P It calls corelib use these same functions we 
are tweaking are slowing it down or speeding it up in some cases. It's 
fine because the test suite is a test in of itself in a way. Just 
something interesting I noticed when I was testing, when I managed to 
slow down strings so much in one test (lovely printf on each char) that 
it took NUnit about 30 seconds to initialize before my test even ran :-)

I'll post all my benchmarks and patches in the morning when I can 
assembly them in some readable format.

-- 
Zac Bowling
http://zacbowling.com/

----- Message from atsushi at ximian.com ---------
    Date: Fri, 17 Mar 2006 19:02:50 +0900
    From: Atsushi Eno <atsushi at ximian.com>
Reply-To: Atsushi Eno <atsushi at ximian.com>
Subject: Re: String performance boost (Re: [Mono-dev] Patch to boost 
speed ofUnicodeEncoding)
      To: Andreas Nahr <ClassDevelopment at A-SoftTech.com>

> Hi,
>
> Your patch is based on pretty old code(!) ;-) I tried it for
> each functionality:
>
> - CopyTo():
>   The patch makes it faster. But I guess CopyChar() functionality
>   overlaps Paolo's memcpy() implementation in the latest code,
>   though it was not simply replaceable.
> - ToCharArray():
>   ditto. The improvement rate is like you showed, about 5%.
> - Trim():
>   The patch resulted much better, as you showed. I think it is
>   the best improvement.
> - GetHashCode(), Insert(), Remove(),
>   ToUpperInvariant(), ToLowerInvariant():
>   We already have managed implementation like yours.
> - ToUpper(), ToLower():
>   Based on your patch, I rather made TextInfo.ToUpper() and
>   ToLower() based on pointer implementation, so the mind is already
>   checked in svn :-)
> - Replace():
>   There was a one-liner bug in the implementation. Even after
>   fixing it, the result became much worse than existing one, about
>   1.5x slower :-(
> - IndexOf():
>   It didn't make improvement - I think it is because we already
>   have managed implementation by now.
> - IndexOfAny():
>   The patch made results much worse :-( It doubled execution time.
> - Join():
>   The patch was a bit worse, about 7-8% loss.
> - PadLeft(), PadRight():
>   Almost no difference - and seems like it still has bugs (some
>   NUnit tests fail).
> - Split():
>   It looks pretty fast but it is too buggy to fix right now :(
>
> I'm attaching a patch based on the original source, so anyone can
> try them handy. Those which is buggy or makes performance worse
> are commented out.
>
> I think Trim() can be checked in, with a few changes
> (changing CharCopy() to InternalStrcpy()). For CharCopy(),
> I'd wait for further review. ToCharArray() is also nicer
> to have, but it depends on CharCopy().
>
> Thanks a bunch, Andreas.
>
> Atsushi Eno
>
> Andreas Nahr wrote:
>> It's just bad naming ;)
>> The String.cs.old contains the managed implementations. Please note 
>> that there are usually multiple implementation-tries for each method 
>> and all but one is just commented out. Its extremely unlikely that 
>> the ones currently "active" are the best ones.
>> Also I can remember that at least one of the stats is completely 
>> wrong because there was a bug in the implemenation that copied only 
>> half the data.
>> If looking at the numbers I guess it is Replace because the 
>> difference is really huge
>>
>> I'll also attach the file I used for the microbenchmarks. But it is 
>> a complete mess and I doubt anyone could use anything from it.
>>
>> Andreas
>>
>>> Wow, the numbers are quite impressive. Can you please attach your
>>> String.cs(.new) ? :-)
>>>
>>> Atsushi Eno
>>>
>>> Andreas Nahr wrote:
>>>> Hi Zac, Hi Kornél,
>>>>
>>>> I've been working quite some time on improving the existing String class a
>>>> long time ago (about 2-3 years), but as I got a new job back then 
>>>> never had
>>>> the time to finish anything. My findings back then were that 
>>>> purely managed
>>>> implementations basically always outperformed the internalcalls 
>>>> (And I guess
>>>> the JIT is now even more evolved than it was 3 years ago).
>>>> However as I sayed it was never finished and contains bugs. Moreover it
>>>> doesn't care at all for alignment issues.
>>>> If anyone want's to look at it - I attach my String-Testing class. You'll
>>>> find lots of different tries to optimize the Methods. But beware, the code
>>>> is in horrible shape, far from being usable.
>>>> Some optimizations use specific string-domain knowlege (like "equals"
>>>> testing first for the first char and after that from the end of 
>>>> the string)
>>>>
>>>> My conclusion was: We should have a few managed functions to do the work
>>>> (MemoryCopy, MemoryCompare, possibly for Byte* and Char*). They should be
>>>> managed so that optimizers and optimizing compilers are able to do
>>>> optimizations even on IL level. Whenever possible the JIT should replace
>>>> these at runtime (provided they aren't optimized away) with
>>>> architecture-specific assembly versions with the managed version 
>>>> as fallback
>>>> mechanism.
>>>>
>>>>
>>>> Here are some findings from microbenchmarks made back then (the 
>>>> first number
>>>> is always the time in milliseconds for the existing unmanaged 
>>>> implementation
>>>> (internalcall), the second for a tested managed implementation, the Number
>>>> in () is the length of the string tested):
>>>>
>>>> Some Methods still need internalcalls to create new Strings, but 
>>>> were still
>>>> faster than the native implementations (The optimum would be to 
>>>> internalize
>>>> InternalAllocateStr calls).
>>>>
>>>> CopyTo (000): 810 -> 381
>>>> CopyTo (010): 832 -> 441
>>>> CopyTo (100): 1352 -> 881
>>>> CopyTo (512): 3395 -> 3014
>>>>
>>>> ToCharArray (000): 5067 -> 4466
>>>> ToCharArray (002): 5317 -> 4857
>>>> ToCharArray (015): 8041 -> 7691
>>>> ToCharArray (960): 2915 -> 2894
>>>>
>>>> ToCharArray (with parameters): Similar to above
>>>>
>>>> Trim (): 6930 -> 6760
>>>> Trim (custom search Chars): 10596 -> 9413
>>>> Trim (default search Chars): 10455 -> 7210
>>>> Trim (no trimmed chars, long string): 1893 -> 661
>>>> Trim (no trimmed chars, short string): 1893 -> 631
>>>>
>>>> Replace (004 - one replace): 37264 -> 3135
>>>> Replace (004 - nothing to replace): 3735 -> 310
>>>> Replace (961 - everything replaced): 2584 -> 501
>>>> Replace (961 - only last char replaced): 2463 -> 481
>>>>
>>>> Split (default split Chars, long string, lots splitted): 42421 -> 8523
>>>> Split (custom split Chars, long string, non found): 2944 -> 2263
>>>> Split (custom split Chars, long string, lots found): 22062 -> 7330
>>>> Split (default split Chars, short string, non found): 2093 -> 761
>>>> Split (default split Chars, short string, nearly only splitChars): 8002 ->
>>>> 5067
>>>>
>>>> IndexOf (17): 1132 -> 791
>>>> IndexOf (2162): 10576 -> 7862
>>>>
>>>> LastIndexOf (similar to above)
>>>>
>>>> IndexOfAny (long string, nothing found): 25867 -> 2984 (Break even bei ca.
>>>> 75chars)
>>>>
>>>> LastIndexOfAny  (similar to above)
>>>>
>>>> PadLeft/ PadRight: slightly slower that current (Should get faster once
>>>> optimized CharCopy is available): 1012 -> 1031
>>>>
>>>> Remove: slightly slower that current (Should get faster once optimized
>>>> CharCopy is available): 2153 -> 2283
>>>>
>>>>
>>>> If somebody is interesting in picking this up I might be able to help a
>>>> litte.
>>>> Andreas
>>>
>>> _______________________________________________
>>> Mono-devel-list mailing list
>>> Mono-devel-list at lists.ximian.com
>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>>
>

----- End message from atsushi at ximian.com -----