String performance boost (Re: [Mono-dev] Patch to boost speed ofUnicodeEncoding)
zac at zacbowling.com
Fri Mar 17 06:11:54 EST 2006
I dispeared for a few days to work on this and came back to find I
wasn't alone. :-P
I was testing some similar concepts myself. I've also been testing
moving things back and forth to unamanged internal calls to see if I
could speed it up and so far I've got mixed results. Some things
faster; some things slower. Mostly slower for most of the functions,
but it really depended on the size of the string I was working with
which was the deciding factor if it was faster to be unmanged or not.
One thing I found is that NUnit itself can't be trusted 100% with
testing some of this :-P It calls corelib use these same functions we
are tweaking are slowing it down or speeding it up in some cases. It's
fine because the test suite is a test in of itself in a way. Just
something interesting I noticed when I was testing, when I managed to
slow down strings so much in one test (lovely printf on each char) that
it took NUnit about 30 seconds to initialize before my test even ran :-)
I'll post all my benchmarks and patches in the morning when I can
assembly them in some readable format.
----- Message from atsushi at ximian.com ---------
Date: Fri, 17 Mar 2006 19:02:50 +0900
From: Atsushi Eno <atsushi at ximian.com>
Reply-To: Atsushi Eno <atsushi at ximian.com>
Subject: Re: String performance boost (Re: [Mono-dev] Patch to boost
To: Andreas Nahr <ClassDevelopment at A-SoftTech.com>
> Your patch is based on pretty old code(!) ;-) I tried it for
> each functionality:
> - CopyTo():
> The patch makes it faster. But I guess CopyChar() functionality
> overlaps Paolo's memcpy() implementation in the latest code,
> though it was not simply replaceable.
> - ToCharArray():
> ditto. The improvement rate is like you showed, about 5%.
> - Trim():
> The patch resulted much better, as you showed. I think it is
> the best improvement.
> - GetHashCode(), Insert(), Remove(),
> ToUpperInvariant(), ToLowerInvariant():
> We already have managed implementation like yours.
> - ToUpper(), ToLower():
> Based on your patch, I rather made TextInfo.ToUpper() and
> ToLower() based on pointer implementation, so the mind is already
> checked in svn :-)
> - Replace():
> There was a one-liner bug in the implementation. Even after
> fixing it, the result became much worse than existing one, about
> 1.5x slower :-(
> - IndexOf():
> It didn't make improvement - I think it is because we already
> have managed implementation by now.
> - IndexOfAny():
> The patch made results much worse :-( It doubled execution time.
> - Join():
> The patch was a bit worse, about 7-8% loss.
> - PadLeft(), PadRight():
> Almost no difference - and seems like it still has bugs (some
> NUnit tests fail).
> - Split():
> It looks pretty fast but it is too buggy to fix right now :(
> I'm attaching a patch based on the original source, so anyone can
> try them handy. Those which is buggy or makes performance worse
> are commented out.
> I think Trim() can be checked in, with a few changes
> (changing CharCopy() to InternalStrcpy()). For CharCopy(),
> I'd wait for further review. ToCharArray() is also nicer
> to have, but it depends on CharCopy().
> Thanks a bunch, Andreas.
> Atsushi Eno
> Andreas Nahr wrote:
>> It's just bad naming ;)
>> The String.cs.old contains the managed implementations. Please note
>> that there are usually multiple implementation-tries for each method
>> and all but one is just commented out. Its extremely unlikely that
>> the ones currently "active" are the best ones.
>> Also I can remember that at least one of the stats is completely
>> wrong because there was a bug in the implemenation that copied only
>> half the data.
>> If looking at the numbers I guess it is Replace because the
>> difference is really huge
>> I'll also attach the file I used for the microbenchmarks. But it is
>> a complete mess and I doubt anyone could use anything from it.
>>> Wow, the numbers are quite impressive. Can you please attach your
>>> String.cs(.new) ? :-)
>>> Atsushi Eno
>>> Andreas Nahr wrote:
>>>> Hi Zac, Hi Kornél,
>>>> I've been working quite some time on improving the existing String class a
>>>> long time ago (about 2-3 years), but as I got a new job back then
>>>> never had
>>>> the time to finish anything. My findings back then were that
>>>> purely managed
>>>> implementations basically always outperformed the internalcalls
>>>> (And I guess
>>>> the JIT is now even more evolved than it was 3 years ago).
>>>> However as I sayed it was never finished and contains bugs. Moreover it
>>>> doesn't care at all for alignment issues.
>>>> If anyone want's to look at it - I attach my String-Testing class. You'll
>>>> find lots of different tries to optimize the Methods. But beware, the code
>>>> is in horrible shape, far from being usable.
>>>> Some optimizations use specific string-domain knowlege (like "equals"
>>>> testing first for the first char and after that from the end of
>>>> the string)
>>>> My conclusion was: We should have a few managed functions to do the work
>>>> (MemoryCopy, MemoryCompare, possibly for Byte* and Char*). They should be
>>>> managed so that optimizers and optimizing compilers are able to do
>>>> optimizations even on IL level. Whenever possible the JIT should replace
>>>> these at runtime (provided they aren't optimized away) with
>>>> architecture-specific assembly versions with the managed version
>>>> as fallback
>>>> Here are some findings from microbenchmarks made back then (the
>>>> first number
>>>> is always the time in milliseconds for the existing unmanaged
>>>> (internalcall), the second for a tested managed implementation, the Number
>>>> in () is the length of the string tested):
>>>> Some Methods still need internalcalls to create new Strings, but
>>>> were still
>>>> faster than the native implementations (The optimum would be to
>>>> InternalAllocateStr calls).
>>>> CopyTo (000): 810 -> 381
>>>> CopyTo (010): 832 -> 441
>>>> CopyTo (100): 1352 -> 881
>>>> CopyTo (512): 3395 -> 3014
>>>> ToCharArray (000): 5067 -> 4466
>>>> ToCharArray (002): 5317 -> 4857
>>>> ToCharArray (015): 8041 -> 7691
>>>> ToCharArray (960): 2915 -> 2894
>>>> ToCharArray (with parameters): Similar to above
>>>> Trim (): 6930 -> 6760
>>>> Trim (custom search Chars): 10596 -> 9413
>>>> Trim (default search Chars): 10455 -> 7210
>>>> Trim (no trimmed chars, long string): 1893 -> 661
>>>> Trim (no trimmed chars, short string): 1893 -> 631
>>>> Replace (004 - one replace): 37264 -> 3135
>>>> Replace (004 - nothing to replace): 3735 -> 310
>>>> Replace (961 - everything replaced): 2584 -> 501
>>>> Replace (961 - only last char replaced): 2463 -> 481
>>>> Split (default split Chars, long string, lots splitted): 42421 -> 8523
>>>> Split (custom split Chars, long string, non found): 2944 -> 2263
>>>> Split (custom split Chars, long string, lots found): 22062 -> 7330
>>>> Split (default split Chars, short string, non found): 2093 -> 761
>>>> Split (default split Chars, short string, nearly only splitChars): 8002 ->
>>>> IndexOf (17): 1132 -> 791
>>>> IndexOf (2162): 10576 -> 7862
>>>> LastIndexOf (similar to above)
>>>> IndexOfAny (long string, nothing found): 25867 -> 2984 (Break even bei ca.
>>>> LastIndexOfAny (similar to above)
>>>> PadLeft/ PadRight: slightly slower that current (Should get faster once
>>>> optimized CharCopy is available): 1012 -> 1031
>>>> Remove: slightly slower that current (Should get faster once optimized
>>>> CharCopy is available): 2153 -> 2283
>>>> If somebody is interesting in picking this up I might be able to help a
>>> Mono-devel-list mailing list
>>> Mono-devel-list at lists.ximian.com
----- End message from atsushi at ximian.com -----
More information about the Mono-devel-list