[Mono-devel-list] [Patch] Managed code is fast!

Fri May 21 17:42:07 EDT 2004

Hi,

I looked at your code and retested a few things. It seems that with your
changes the [] notation is now really faster (at least in most cases).
I made the necessary changes to my patch.

Here are some microbenchmark results, updated with the new figures (the
last value on each line):
CopyTo (002): 7190 -> 3595 -> 3645
CopyTo (015): 7611 -> 4446 -> 4236
CopyTo (016): 8982 -> 4486 -> 4186
CopyTo (512): 3174 -> 2995 -> 2784

This also means that it is now *ALWAYS* faster than managed (tested with
Strings of up to 512KB).
I also found what seems to be a bug (in the old original implementation)
with Strings of 4MB and larger.

I've also looked at your hashcode benchmark and added a version which
resembles the patch that I had ready for this. Of the two timing columns,
the first is your Hash2 impl and the second is my Hash3 impl ;)
Windows precompiled, X86 Athlon 2000XP:
C:\Uni\NBen>mono hash-code-bench.exe
0       21      20
1       30      20
2       30      30
3       40      20
4       50      30
5       40      30
10      60      50
15      80      51
20      110     70
25      110     90
30      130     100
35      161     110
40      180     140
45      191     140
50      220     170
61      251     200
72      300     221
83      340     251
94      370     321
105     420     311
116     460     341
127     500     361
138     541     400
149     621     421
160     621     470
181     691     511
202     781     571
223     871     631
244     922     701
265     1001    741
286     1072    821
307     1161    872
328     1241    912
349     1342    981
370     1382    1032
391     1462    1111
412     1543    1141
433     1592    1242
454     1693    1292
475     1762    1312
496     1833    1402
1000    3675    2784
10000   35651   27340

Andreas

----- Original Message ----- 
From: "Paolo Molaro" <lupus at ximian.com>
To: "Mono Development" <mono-devel-list at lists.ximian.com>
Sent: Friday, May 21, 2004 2:29 PM
Subject: Re: [Mono-devel-list] [Patch] Managed code is fast!

> On 05/21/04 Andreas Nahr wrote:
> > > > private unsafe static void CharCopy (char* source, char* destination, int count)
> > >
> > > What is the perf here if things are not dword aligned?
> >
> > I think for me things were always dword aligned. We should ensure that
> > Strings always get the right alignment in the JIT.
>
> We can guarantee the character data in a string will be aligned to a
> 4 byte boundary, but CharCopy can be called on data aligned to just 2.
>
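One way to cope with the 2-byte case raised above is to copy a single char
first so the destination reaches a 4-byte boundary, and to use 32-bit moves
only if the source is then aligned too. The following is a minimal sketch
under that assumption, not the code from the patch: the names
CharCopyAligned and Substring are made up for illustration, and it needs
to be compiled with unsafe code enabled.

```csharp
using System;

static class AlignSketch
{
    // Hypothetical variant of CharCopy; assumes both pointers are at least
    // 2-byte aligned, as char data always is.
    public static unsafe void CharCopyAligned (char* source, char* destination, int count)
    {
        // Destination at address % 4 == 2: copy one char to reach a 4-byte boundary.
        if (count > 0 && ((long) destination & 2) != 0) {
            *destination++ = *source++;
            count--;
        }
        // Use 32-bit moves only when the source is now 4-byte aligned as well.
        if (((long) source & 2) == 0) {
            while (count >= 2) {
                *((int*) destination) = *((int*) source);
                destination += 2;
                source += 2;
                count -= 2;
            }
        }
        // Remaining chars, or the whole run if the two pointers can never
        // be 4-byte aligned at the same time.
        while (count > 0) {
            *destination++ = *source++;
            count--;
        }
    }

    // Small wrapper so the copy can be exercised from safe code.
    // The caller must ensure start + length <= s.Length.
    public static unsafe string Substring (string s, int start, int length)
    {
        char[] buffer = new char [length];
        fixed (char* src = s)
        fixed (char* dst = buffer) {
            CharCopyAligned (src + start, dst, length);
        }
        return new string (buffer);
    }
}
```

Calling AlignSketch.Substring with an odd start value exercises the path
where source and destination differ in alignment, which falls back to the
char-by-char loop.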
> > > > + while (count >= 16) {
> > > > + *((int*) destination) = *((int*) source);
> > > > + destination += 2;
> > > > + source += 2;
> > > > + *((int*) destination) = *((int*) source);
> > > > + destination += 2;
> > > > + source += 2;
> > >
> > > It is probably better to do something like:
> > >
> > > *((int*) dest + x) = ...
> >
> > Did you really test this or are you just guessing?
>
> What? It's much easier to talk than to test! Why should he test? :-)
>
> > For me the above solution (although more source code) always produced
> > superior speed.
> > However I used the notation *((int*) dest[x]) =...
> > But this seems to be compiled into the same IL.
>
> When you posted about the low performance and I changed the JIT to
> produce faster code I also investigated a few methods in String and
> methods to do copies. The basic thing to note is to keep the variables
> used in the inner loop to 3 and to do clever unrolling. When unrolling
> a copy, for example, you should not do:
> copy 1
> increase pointers by 1
> copy 1
> increase pointers by 1
> ...
>
> but the more efficient:
> copy 1
> copy 1
> copy 1
> copy 1
> increase pointers by 4
>
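As an illustration of the second pattern, here is a minimal sketch that
copies four 32-bit words per iteration using the [] notation and bumps the
pointers once at the end. It is not the attached code: the names
CharCopyUnrolled and CopyString are hypothetical, the unroll factor of four
words is an arbitrary choice, alignment is ignored, and it needs to be
compiled with unsafe code enabled. The inner loop touches only three
variables (source, destination, count).

```csharp
using System;

static class CopySketch
{
    // Hypothetical helper: copies count chars from source to destination.
    public static unsafe void CharCopyUnrolled (char* source, char* destination, int count)
    {
        // Four int-sized copies (8 chars), then one pointer increment each.
        while (count >= 8) {
            ((int*) destination) [0] = ((int*) source) [0];
            ((int*) destination) [1] = ((int*) source) [1];
            ((int*) destination) [2] = ((int*) source) [2];
            ((int*) destination) [3] = ((int*) source) [3];
            destination += 8;
            source += 8;
            count -= 8;
        }
        // Fewer than 8 chars left: finish one char at a time.
        while (count > 0) {
            *destination++ = *source++;
            count--;
        }
    }

    // Small wrapper so the copy can be exercised from safe code.
    public static unsafe string CopyString (string s)
    {
        char[] buffer = new char [s.Length];
        fixed (char* src = s)
        fixed (char* dst = buffer) {
            CharCopyUnrolled (src, dst, s.Length);
        }
        return new string (buffer);
    }
}
```

Whether this form actually beats the increment-after-every-copy form
depends on the JIT and the cpu, so it is worth measuring both.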
> See the attached benchmarks for ideas: GetHashCode() is always faster
> than the C version (on x86, on ppc it's faster until 200 chars and 20%
> slower at 1000, but I didn't optimize that yet). It's twice as fast
> as the current code so I'll get it in cvs in the next few days.
> As for copies: I'd like to have something like the attached memcpy in
> System.String and use it whenever a copy is required (it will eventually
> be used also for the cpblk IL opcode). The memcpy is always faster than
> the C version for me (except when the data is misaligned): I didn't have
> the time to properly test if this is because of bugs in the code:-)
> If someone would write a set of extensive tests for memcpy it would be
> appreciated.
> Results from both benchmarks on different cpus are also appreciated:
> please provide cpu type and speed and run with -O=all with mono from
> cvs (-O=loop is enough to get most of the speed: I'll enable it by
> default shortly since it has low impact on JIT time).
> A memmove method is also needed for some of the string methods.
> Thanks.
>
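A test set for memcpy along the lines requested above could sweep every
combination of length, source offset and destination offset, and check
both the copied bytes and the guard bytes around them. The sketch below is
one possible shape for that, not a finished suite. The attached memcpy is
not reproduced here, so the CopyFn delegate is a hypothetical stand-in for
whatever signature the routine under test ends up with.

```csharp
using System;

static class MemcpyTests
{
    // Hypothetical signature standing in for the routine under test.
    public delegate void CopyFn (byte[] src, int srcOffset, byte[] dst, int dstOffset, int count);

    // Returns the number of failing (length, srcOffset, dstOffset) combinations.
    public static int Run (CopyFn copy, int maxLength)
    {
        const int Pad = 8;        // offsets 0..7, plus room for guard bytes
        const byte Guard = 0xEE;  // fill value that must survive outside the copied range
        int failures = 0;

        for (int len = 0; len <= maxLength; len++)
        for (int so = 0; so < Pad; so++)
        for (int d = 0; d < Pad; d++) {
            byte[] src = new byte [len + 2 * Pad];
            byte[] dst = new byte [len + 2 * Pad];
            for (int i = 0; i < src.Length; i++) {
                src [i] = unchecked ((byte) (i * 31 + 7));
                dst [i] = Guard;
            }

            copy (src, so, dst, d, len);

            // Inside [d, d + len) expect the source bytes, elsewhere the guard.
            bool ok = true;
            for (int i = 0; i < dst.Length; i++) {
                byte expected = (i >= d && i < d + len) ? src [so + i - d] : Guard;
                if (dst [i] != expected)
                    ok = false;
            }
            if (!ok)
                failures++;
        }
        return failures;
    }
}
```

As a sanity check of the harness itself, passing a wrapper around
Buffer.BlockCopy should report zero failures, and a deliberately broken
copy (one that drops the last byte, say) should report some.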
> lupus
>
> -- 
> -----------------------------------------------------------------
> lupus at debian.org                                     debian/rules
> lupus at ximian.com                             Monkeys do it better
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: PatchStringTo.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20040521/3a5a1257/attachment.txt 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hash-code-bench.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20040521/3a5a1257/attachment.pl