String performance boost (Re: [Mono-dev] Patch to boost speed ofUnicodeEncoding)

Fri Mar 17 05:02:50 EST 2006

Hi,

Your patch is based on pretty old code(!) ;-) I tried it for
each functionality:

- CopyTo():
   The patch makes it faster. But I guess CopyChar() functionality
   overlaps Paolo's memcpy() implementation in the latest code,
   though it was not simply replaceable.
- ToCharArray():
   ditto. The improvement rate is like you showed, about 5%.
- Trim():
   The patch resulted much better, as you showed. I think it is
   the best improvement.
- GetHashCode(), Insert(), Remove(),
   ToUpperInvariant(), ToLowerInvariant():
   We already have managed implementation like yours.
- ToUpper(), ToLower():
   Based on your patch, I rather made TextInfo.ToUpper() and
   ToLower() based on pointer implementation, so the mind is already
   checked in svn :-)
- Replace():
   There was a one-liner bug in the implementation. Even after
   fixing it, the result became much worse than existing one, about
   1.5x slower :-(
- IndexOf():
   It didn't make improvement - I think it is because we already
   have managed implementation by now.
- IndexOfAny():
   The patch made results much worse :-( It doubled execution time.
- Join():
   The patch was a bit worse, about 7-8% loss.
- PadLeft(), PadRight():
   Almost no difference - and seems like it still has bugs (some
   NUnit tests fail).
- Split():
   It looks pretty fast but it is too buggy to fix right now :(

I'm attaching a patch based on the original source, so anyone can
try them handy. Those which is buggy or makes performance worse
are commented out.

I think Trim() can be checked in, with a few changes
(changing CharCopy() to InternalStrcpy()). For CharCopy(),
I'd wait for further review. ToCharArray() is also nicer
to have, but it depends on CharCopy().

Thanks a bunch, Andreas.

Atsushi Eno

Andreas Nahr wrote:
> It's just bad naming ;)
> The String.cs.old contains the managed implementations. Please note that 
> there are usually multiple implementation-tries for each method and all 
> but one is just commented out. Its extremely unlikely that the ones 
> currently "active" are the best ones.
> Also I can remember that at least one of the stats is completely wrong 
> because there was a bug in the implemenation that copied only half the 
> data.
> If looking at the numbers I guess it is Replace because the difference 
> is really huge
> 
> I'll also attach the file I used for the microbenchmarks. But it is a 
> complete mess and I doubt anyone could use anything from it.
> 
> Andreas
> 
>> Wow, the numbers are quite impressive. Can you please attach your
>> String.cs(.new) ? :-)
>>
>> Atsushi Eno
>>
>> Andreas Nahr wrote:
>>> Hi Zac, Hi Kornél,
>>>
>>> I've been working quite some time on improving the existing String 
>>> class a
>>> long time ago (about 2-3 years), but as I got a new job back then 
>>> never had
>>> the time to finish anything. My findings back then were that purely 
>>> managed
>>> implementations basically always outperformed the internalcalls (And 
>>> I guess
>>> the JIT is now even more evolved than it was 3 years ago).
>>> However as I sayed it was never finished and contains bugs. Moreover it
>>> doesn't care at all for alignment issues.
>>> If anyone want's to look at it - I attach my String-Testing class. 
>>> You'll
>>> find lots of different tries to optimize the Methods. But beware, the 
>>> code
>>> is in horrible shape, far from being usable.
>>> Some optimizations use specific string-domain knowlege (like "equals"
>>> testing first for the first char and after that from the end of the 
>>> string)
>>>
>>> My conclusion was: We should have a few managed functions to do the work
>>> (MemoryCopy, MemoryCompare, possibly for Byte* and Char*). They 
>>> should be
>>> managed so that optimizers and optimizing compilers are able to do
>>> optimizations even on IL level. Whenever possible the JIT should replace
>>> these at runtime (provided they aren't optimized away) with
>>> architecture-specific assembly versions with the managed version as 
>>> fallback
>>> mechanism.
>>>
>>>
>>> Here are some findings from microbenchmarks made back then (the first 
>>> number
>>> is always the time in milliseconds for the existing unmanaged 
>>> implementation
>>> (internalcall), the second for a tested managed implementation, the 
>>> Number
>>> in () is the length of the string tested):
>>>
>>> Some Methods still need internalcalls to create new Strings, but were 
>>> still
>>> faster than the native implementations (The optimum would be to 
>>> internalize
>>> InternalAllocateStr calls).
>>>
>>> CopyTo (000): 810 -> 381
>>> CopyTo (010): 832 -> 441
>>> CopyTo (100): 1352 -> 881
>>> CopyTo (512): 3395 -> 3014
>>>
>>> ToCharArray (000): 5067 -> 4466
>>> ToCharArray (002): 5317 -> 4857
>>> ToCharArray (015): 8041 -> 7691
>>> ToCharArray (960): 2915 -> 2894
>>>
>>> ToCharArray (with parameters): Similar to above
>>>
>>> Trim (): 6930 -> 6760
>>> Trim (custom search Chars): 10596 -> 9413
>>> Trim (default search Chars): 10455 -> 7210
>>> Trim (no trimmed chars, long string): 1893 -> 661
>>> Trim (no trimmed chars, short string): 1893 -> 631
>>>
>>> Replace (004 - one replace): 37264 -> 3135
>>> Replace (004 - nothing to replace): 3735 -> 310
>>> Replace (961 - everything replaced): 2584 -> 501
>>> Replace (961 - only last char replaced): 2463 -> 481
>>>
>>> Split (default split Chars, long string, lots splitted): 42421 -> 8523
>>> Split (custom split Chars, long string, non found): 2944 -> 2263
>>> Split (custom split Chars, long string, lots found): 22062 -> 7330
>>> Split (default split Chars, short string, non found): 2093 -> 761
>>> Split (default split Chars, short string, nearly only splitChars): 
>>> 8002 ->
>>> 5067
>>>
>>> IndexOf (17): 1132 -> 791
>>> IndexOf (2162): 10576 -> 7862
>>>
>>> LastIndexOf (similar to above)
>>>
>>> IndexOfAny (long string, nothing found): 25867 -> 2984 (Break even 
>>> bei ca.
>>> 75chars)
>>>
>>> LastIndexOfAny  (similar to above)
>>>
>>> PadLeft/ PadRight: slightly slower that current (Should get faster once
>>> optimized CharCopy is available): 1012 -> 1031
>>>
>>> Remove: slightly slower that current (Should get faster once optimized
>>> CharCopy is available): 2153 -> 2283
>>>
>>>
>>> If somebody is interesting in picking this up I might be able to help a
>>> litte.
>>> Andreas
>>
>> _______________________________________________
>> Mono-devel-list mailing list
>> Mono-devel-list at lists.ximian.com
>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: string-andreas-modified.patch
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060317/21e3753c/attachment.pl