[Mono-devel-list] [Patch] Managed code is fast!
Andreas Nahr
ClassDevelopment at A-SoftTech.com
Fri May 21 17:42:07 EDT 2004
Hi,
I looked at your code and retested some stuff. It seems that with your changes
the [] notation is now really faster (at least in most cases).
I made the necessary changes to my patch.
Here are some microbenchmark results updated with the new figures (the last
column):
CopyTo (002): 7190 -> 3595 -> 3645
CopyTo (015): 7611 -> 4446 -> 4236
CopyTo (016): 8982 -> 4486 -> 4186
CopyTo (512): 3174 -> 2995 -> 2784
This also means that it is now *ALWAYS* faster than managed (tested with
strings up to 512KB).
I also found what seems to be a bug (in the old original
implementation) with strings of 4MB and larger.
I've also looked at your hashcode benchmark and added a version which
resembles the patch that I had ready for this. After the leading column,
the first figure is your Hash2 impl, the second is my Hash3 impl ;)
Windows precompiled, X86 Athlon 2000XP:
C:\Uni\NBen>mono hash-code-bench.exe
0 21 20
1 30 20
2 30 30
3 40 20
4 50 30
5 40 30
10 60 50
15 80 51
20 110 70
25 110 90
30 130 100
35 161 110
40 180 140
45 191 140
50 220 170
61 251 200
72 300 221
83 340 251
94 370 321
105 420 311
116 460 341
127 500 361
138 541 400
149 621 421
160 621 470
181 691 511
202 781 571
223 871 631
244 922 701
265 1001 741
286 1072 821
307 1161 872
328 1241 912
349 1342 981
370 1382 1032
391 1462 1111
412 1543 1141
433 1592 1242
454 1693 1292
475 1762 1312
496 1833 1402
1000 3675 2784
10000 35651 27340
Andreas
----- Original Message -----
From: "Paolo Molaro" <lupus at ximian.com>
To: "Mono Development" <mono-devel-list at lists.ximian.com>
Sent: Friday, May 21, 2004 2:29 PM
Subject: Re: [Mono-devel-list] [Patch] Managed code is fast!
> On 05/21/04 Andreas Nahr wrote:
> > > > private unsafe static void CharCopy (char* source, char* destination, int count)
> > >
> > > What is the perf here if things are not dword aligned?
> >
> > I think for me things were always dword aligned. We should ensure that
> > Strings always get the right alignment in the JIT.
>
> We can guarantee the character data in a string will be aligned to a 4-byte
> boundary, but CharCopy can be called on data aligned to just 2.
>
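To illustrate the alignment point above, here is a minimal sketch in C (chosen because the pointer arithmetic carries over directly from the unsafe C# being discussed). It is not the code from the patch: `char_copy_aligned` and `is_dword_aligned` are hypothetical names, and the sketch assumes both pointers are at least 2-byte aligned. It takes the dword path only when both pointers sit on a 4-byte boundary and otherwise falls back to copying one character at a time.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero when p sits on a 4-byte boundary. */
static int is_dword_aligned(const void *p)
{
    return ((uintptr_t)p & 3u) == 0;
}

/* Hypothetical sketch, not the patch itself: copies `count` 16-bit chars.
   Assumes both pointers are at least 2-byte aligned. */
static void char_copy_aligned(const uint16_t *source, uint16_t *destination,
                              size_t count)
{
    /* If both pointers are misaligned, one single-char copy moves
       both of them onto a 4-byte boundary. */
    if (count > 0 && !is_dword_aligned(source) && !is_dword_aligned(destination)) {
        *destination++ = *source++;
        count--;
    }

    /* Dword path: only taken when both pointers are 4-byte aligned. */
    if (is_dword_aligned(source) && is_dword_aligned(destination)) {
        while (count >= 2) {
            uint32_t two_chars;
            /* Fixed-size memcpy keeps the 32-bit move well defined in C. */
            memcpy(&two_chars, source, sizeof two_chars);
            memcpy(destination, &two_chars, sizeof two_chars);
            source += 2;
            destination += 2;
            count -= 2;
        }
    }

    /* Tail, and the mixed-alignment case: one char at a time. */
    while (count > 0) {
        *destination++ = *source++;
        count--;
    }
}
```

When only one of the two pointers is misaligned, no single-char copy can fix both, so this sketch simply stays on the slow path for the whole copy.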
> > > > + while (count >= 16) {
> > > > + *((int*) destination) = *((int*) source);
> > > > + destination += 2;
> > > > + source += 2;
> > > > + *((int*) destination) = *((int*) source);
> > > > + destination += 2;
> > > > + source += 2;
> > >
> > > It is probably better to do something like:
> > >
> > > *((int*) dest + x) = ...
> >
> > Did you really test this or are you just guessing?
>
> What? It's much easier to talk than to test! Why should he test? :-)
>
> > For me the above solution (although more source code) always produced
> > superior speed.
> > However I used the notation *((int*) dest[x]) =...
> > But this seems to be compiled into the same IL.
>
> When you posted about the low performance and I changed the JIT to
> produce faster code I also investigated a few methods in String and
> methods to do copies. The basic thing to note is to keep the variables
> used in the inner loop to 3 and to do clever unrolling. When unrolling
> in a copy, for example you should not do:
> copy 1
> increase pointers by 1
> copy 1
> increase pointers by 1
> ...
>
> but the more efficient:
> copy 1
> copy 1
> copy 1
> copy 1
> increase pointers by 4
>
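The unrolling pattern described above can be sketched in C as follows. This is only an illustration, not the attached benchmark code: `char_copy_unrolled` is a hypothetical name and the unroll factor of four is taken from the pseudocode. The inner loop touches just three variables (`source`, `destination`, `count`) and bumps the pointers once per iteration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: copies `count` 16-bit chars, four per iteration. */
static void char_copy_unrolled(const uint16_t *source, uint16_t *destination,
                               size_t count)
{
    /* Unrolled body: four copies at fixed offsets, then a single
       pointer increase instead of one increase per copy. */
    while (count >= 4) {
        destination[0] = source[0];
        destination[1] = source[1];
        destination[2] = source[2];
        destination[3] = source[3];
        destination += 4;
        source += 4;
        count -= 4;
    }

    /* Tail: the remaining 0..3 chars, one at a time. */
    while (count > 0) {
        *destination++ = *source++;
        count--;
    }
}
```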
> See the attached benchmarks for ideas: GetHashCode() is always faster
> than the C version (on x86, on ppc it's faster until 200 chars and 20%
> slower at 1000, but I didn't optimize that yet). It's twice as fast
> as the current code so I'll get it in cvs in the next few days.
> As for copies: I'd like to have something like the attached memcpy in
> System.String and use it whenever a copy is required (it will eventually
> be used also for the cpblk IL opcode). The memcpy is always faster than
> the C version for me (except when the data is misaligned): I didn't have
> the time to properly test if this is because of bugs in the code:-)
> If someone would write a set of extensive tests for memcpy it'll be
> appreciated.
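As a starting point for such tests, a harness along these lines could compare a candidate copy routine with the C library `memcpy` over many lengths and source/destination offsets, including misaligned ones. This is a sketch in C under stated assumptions: `check_copy`, `copy_fn` and `drop_last_byte_copy` are hypothetical names, the candidate is assumed to have a memcpy-style signature, and regions never overlap.

```c
#include <stddef.h>
#include <string.h>

/* Assumed signature for the routine under test (same shape as memcpy). */
typedef void *(*copy_fn)(void *dest, const void *src, size_t n);

/* Hypothetical harness: returns how many (length, source offset,
   destination offset) cases differ from what memcpy produces. */
static int check_copy(copy_fn candidate, size_t max_len)
{
    enum { PAD = 8, BUF = 1024 };
    static unsigned char src[BUF], expected[BUF], actual[BUF];
    int failures = 0;

    /* Keep every case inside the buffers. */
    if (max_len > (size_t)(BUF - 2 * PAD))
        max_len = (size_t)(BUF - 2 * PAD);

    for (size_t i = 0; i < BUF; i++)
        src[i] = (unsigned char)(i * 31u + 7u);

    for (size_t len = 0; len <= max_len; len++) {
        for (size_t s_off = 0; s_off < PAD; s_off++) {
            for (size_t d_off = 0; d_off < PAD; d_off++) {
                /* Fill both outputs with a guard pattern so writes
                   outside the requested range show up as well. */
                memset(expected, 0xAA, BUF);
                memset(actual, 0xAA, BUF);
                memcpy(expected + d_off, src + s_off, len);
                candidate(actual + d_off, src + s_off, len);
                if (memcmp(expected, actual, BUF) != 0)
                    failures++;
            }
        }
    }
    return failures;
}

/* Deliberately broken copy, only here to show that the harness notices. */
static void *drop_last_byte_copy(void *dest, const void *src, size_t n)
{
    return memcpy(dest, src, n > 0 ? n - 1 : 0);
}
```

`check_copy(memcpy, 64)` returns 0 because the reference is compared with itself, while `check_copy(drop_last_byte_copy, 64)` reports failures.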
> Results from both benchmarks on different cpus are also appreciated:
> please provide cpu type and speed and run with -O=all with mono from
> cvs (-O=loop is enough to get most of the speed: I'll enable it by
> default shortly since it has low impact on JIT time).
> A memmove method is also needed for some of the string methods.
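What separates a memmove from a memcpy is that the regions may overlap, so the copy direction has to depend on where the destination lies relative to the source. A minimal C sketch of that idea follows; `char_move` is a hypothetical name, and the loops are left un-unrolled to keep the direction logic visible.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch: moves `count` 16-bit chars between regions
   that are allowed to overlap. */
static void char_move(const uint16_t *source, uint16_t *destination,
                      size_t count)
{
    uintptr_t s = (uintptr_t)source;
    uintptr_t d = (uintptr_t)destination;

    if (d <= s || d - s >= count * sizeof *source) {
        /* Destination starts before the source, or beyond its end:
           a forward copy never overwrites chars not yet read. */
        for (size_t i = 0; i < count; i++)
            destination[i] = source[i];
    } else {
        /* Destination starts inside the source region: copy from
           the end so unread chars are not clobbered. */
        for (size_t i = count; i > 0; i--)
            destination[i - 1] = source[i - 1];
    }
}
```

For example, moving five chars of the buffer 1 2 3 4 5 6 7 8 from index 0 to index 2 yields 1 2 1 2 3 4 5 8, which a plain forward copy would get wrong.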
> Thanks.
>
> lupus
>
> --
> -----------------------------------------------------------------
> lupus at debian.org debian/rules
> lupus at ximian.com Monkeys do it better
>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: PatchStringTo.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20040521/3a5a1257/attachment.txt
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: hash-code-bench.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20040521/3a5a1257/attachment.pl