[Mono-list] Performance / array access
Tom Fransen
t.fransen@mailned.nl
Sun, 19 Jan 2003 17:02:19 +0100
Hello all,
Mitch, to come back on the issue of bound checking. I looked further into
the
problem and my first conclusion was wrong. Piers already stated that he did
not expect that the MS compiler would be able to optimize all these check
out of the loop.
I again started looking into the assembly code and although I'm not a
assembly programmer (it has been too long since I done some 6502/Z80 :-)
I think the performance difference (>2) comes from the optimization
performered by the MS JIT compiler (I am using version 1.0 of the SDK).
The MS JIT seems to be better at allocating processor registers to certain
variables resulting in much faster code. The execution speed of the Mono
code
is in the same order as the unoptimized code generated by the MS JIT
compiler
(for tight loops accessing arrays). In case of my bubble sorting
algorithm this results in a speed difference of a factor 3. I have
made other tests like applying matrix multiplications on a 2-dimensional
array (to simulate an image processing operation) and also here Mono
lacks behind. I haven't sorted this out but
my first guess is that its again the register optimization that makes the
difference.
For the code snippets below both the MS and Mono code is given:
private void TestMethod(int size)
{
array = new int[size];
for(int i=0; i< size; i++)
{
array[i] = size - i;
}
Console.WriteLine("Done" );
string s = Console.ReadLine();
}
The MS JIT generates the following code. Note that
I first run the program and latter when the program reads
from the console attach the debugger. If I run the program from
the MS development environment the JIT seems to generate unoptimized
code so I need this trick.
MS Code generated by the JIT unoptimized
=========================================
private void TestMethod(int size)
{
array = new int[size];
00000000 push ebp
00000001 mov ebp,esp
00000003 sub esp,14h
00000006 push edi
00000007 push esi
00000008 push ebx
00000009 mov ebx,ecx
0000000b mov esi,edx
0000000d xor edi,edi
0000000f mov dword ptr [ebp-10h],0
00000016 test esi,80000000h
0000001c je 00000025
0000001e xor ecx,ecx
00000020 call 763A2CFF
00000025 mov edx,esi
00000027 mov ecx,2DE6B3Ah
0000002c call FD4A2040
00000031 lea edx,[ebx+4]
00000034 call 76479768
for(int i=0; i< size; i++)
00000039 xor edi,edi
0000003b nop
0000003c jmp 00000056
array[i] = size - i;
0000003e mov eax,dword ptr [ebx+4] <==== Start loop, fetch the
size of the array
00000041 cmp edi,dword ptr [eax+4] <==== Check if within bounds
00000044 jb 0000004D
00000046 xor ecx,ecx
00000048 call 762EED87 <==== Trigger exception
0000004d mov edx,esi <==== esi contain variable size
0000004f sub edx,edi
00000051 mov dword ptr [eax+edi*4+8],edx
for(int i=0; i< size; i++)
00000055 inc edi <==== edi contain variable i
00000056 cmp edi,esi
00000058 jl 0000003E <==== Jump to start of for
loop
Console.WriteLine("Done" );
0000005a mov ecx,dword ptr ds:[01C40178h]
00000060 call 76AA3250
string s = Console.ReadLine();
00000065 call 76AA2F10
0000006a mov dword ptr [ebp-14h],eax
0000006d mov eax,dword ptr [ebp-14h]
00000070 mov dword ptr [ebp-10h],eax
}
00000073 nop
00000074 pop ebx
00000075 pop esi
00000076 pop edi
00000077 mov esp,ebp
00000079 pop ebp
0000007a ret
MS Code generated by the JIT optimized
======================================
private void TestMethod(int size)
{
array = new int[size];
00000000 push edi
00000001 push esi
00000002 mov edi,ecx
00000004 mov esi,edx
00000006 test esi,80000000h
0000000c jne 0000005C
0000000e mov edx,esi
00000010 mov ecx,2DE6A22h
00000015 call FD5B2098
0000001a lea edx,[edi+4]
0000001d call 765897C0
for(int i=0; i< size; i++)
00000022 xor ecx,ecx
00000024 cmp esi,0
00000027 jle 00000040
00000029 mov edx,dword ptr [edi+4]
0000002c mov edi,dword ptr [edx+4]
array[i] = size - i;
0000002f cmp ecx,edi <=== start of loop
00000031 jae 00000064 <=== index out of bounds
00000033 mov eax,esi <=== esi contain
variable size
00000035 sub eax,ecx
00000037 mov dword ptr [edx+ecx*4+8],eax
for(int i=0; i< size; i++)
0000003b inc ecx <==== ecx contain variable i
0000003c cmp ecx,esi
0000003e jl 0000002F <==== jump to start of for loop
Console.WriteLine("Done" );
00000040 mov ecx,dword ptr ds:[01C4086Ch]
00000046 mov edx,dword ptr ds:[01C40110h]
0000004c mov eax,dword ptr [ecx]
0000004e call dword ptr [eax+000000D8h]
string s = Console.ReadLine();
00000054 call 76BB2F68
00000059 pop esi
}
0000005a pop edi
0000005b ret
0000005c xor ecx,ecx
0000005e call 764B2D57
00000063 int 3
00000064 xor ecx,ecx <==== Exception
00000066 call 763FEDDF
0000006b int 3
The Mono (0.18) JIT generates the following code (--dump-asm)
=============================================================
Disassembly of section .text:
00000000 <SpeedBenchmarks.BubbleSort_TestMethod>:
0: 55 push %ebp
1: 8b ec mov %esp,%ebp
3: 53 push %ebx
4: 56 push %esi
5: 83 ec 28 sub $0x28,%esp
8: c7 45 d8 00 00 00 00 movl $0x0,0xffffffd8(%ebp)
f: 8b 45 08 mov 0x8(%ebp),%eax
12: 05 08 00 00 00 add $0x8,%eax
17: 83 38 00 cmpl $0x0,(%eax)
1a: 8b 4d 0c mov 0xc(%ebp),%ecx
1d: 50 push %eax
1e: 51 push %ecx
1f: 52 push %edx
20: 51 push %ecx
21: 68 54 7e 18 08 push $0x8187e54
26: e8 b5 c2 e4 ff call ffe4c2e0
<SpeedBenchmarks.BubbleSort_TestMethod+0xffe4c2e0>
2b: 83 c4 08 add $0x8,%esp
2e: 5a pop %edx
2f: 59 pop %ecx
30: 8b c8 mov %eax,%ecx
32: 58 pop %eax
33: 89 08 mov %ecx,(%eax)
35: be 00 00 00 00 mov $0x0,%esi
<=== esi contain variable i
3a: e9 29 00 00 00 jmp 68
<SpeedBenchmarks.BubbleSort_TestMethod+0x68>
3f: 8b 45 08 mov 0x8(%ebp),%eax
<=== start loop
42: 8b 40 08 mov 0x8(%eax),%eax
\
45: 8b ce mov %esi,%ecx
|
47: 3b 48 0c cmp 0xc(%eax),%ecx
| Bounds check
4a: 72 0a jb 56
<SpeedBenchmarks.BubbleSort_TestMethod+0x56>/
4c: 68 e1 26 11 08 push $0x81126e1
51: e8 ce 56 ed ff call ffed5724
<SpeedBenchmarks.BubbleSort_TestMethod+0xffed5724> <=== Generate exception
56: 8d 44 88 10 lea 0x10(%eax,%ecx,4),%eax
5a: 8b 4d 0c mov 0xc(%ebp),%ecx
<== retrieve size
5d: 2b ce sub %esi,%ecx
5f: 89 08 mov %ecx,(%eax)
61: 8b c6 mov %esi,%eax
<=== move i to eax
63: 40 inc %eax <=== i++
64: 8b d8 mov %eax,%ebx
<=== (detour??)
66: 8b f3 mov %ebx,%esi
<=== move i back to esi
68: 8b 45 0c mov 0xc(%ebp),%eax
<=== retrieve size
6b: 3b f0 cmp %eax,%esi
<=== check if i < size
6d: 0f 8c cc ff ff ff jl 3f
<SpeedBenchmarks.BubbleSort_TestMethod+0x3f> <=== end loop
73: 68 38 da 16 08 push $0x816da38
78: 8b 05 78 df 16 08 mov 0x816df78,%eax
7e: 50 push %eax
7f: 83 38 00 cmpl $0x0,(%eax)
82: 8b 00 mov (%eax),%eax
84: ff 90 a0 00 00 00 call *0xa0(%eax)
8a: 83 c4 08 add $0x8,%esp
8d: 8b 05 78 df 16 08 mov 0x816df78,%eax
93: 50 push %eax
94: 83 38 00 cmpl $0x0,(%eax)
97: 8b 00 mov (%eax),%eax
99: ff 90 c0 00 00 00 call *0xc0(%eax)
9f: 83 c4 04 add $0x4,%esp
a2: 8b 05 80 df 16 08 mov 0x816df80,%eax
a8: 50 push %eax
a9: 83 38 00 cmpl $0x0,(%eax)
ac: 8b 00 mov (%eax),%eax
ae: ff 90 7c 00 00 00 call *0x7c(%eax)
b4: 83 c4 04 add $0x4,%esp
b7: 8b f0 mov %eax,%esi
b9: 8b de mov %esi,%ebx
bb: 8d 65 f8 lea 0xfffffff8(%ebp),%esp
be: 5e pop %esi
bf: 5b pop %ebx
c0: c9 leave
c1: c3 ret
-----Original Message-----
From: Mitchell Skinner [mailto:mitchskin@attbi.com]
Sent: Wednesday, January 15, 2003 12:42 AM
To: Tom Fransen
Cc: Mono-list@ximian.com
Subject: RE: [Mono-list] Performance / array access
Hello,
The runtime inserts array bounds checks to prevent things like buffer
overflows. So the following code:
int len = array.Length;
for (int i = 0; i < len; ++i)
sum += array [i];
becomes something like:
int len = array.Length;
for (int i = 0; i < len; ++i) {
if (i >= array.Length) throw new Exception; //forgot which
sum += array [i];
}
In the fast case, the MS JIT is smart enough to analyze the following
code and figure out that "i" will not go out of the bounds of the array
(because i starts at zero and stays below array.Length):
for (int i = 0; i < array.Length; ++i)
sum += array [i];
If the for loop isn't set up exactly like that, I don't think the MS JIT is
able to eliminate the bounds check (unless perhaps there are constants
involved). Is the code you have below (comparing i with a "size" variable)
the exact code you tested? If "size" is not a constant, I would be suprised
to find that there was a major difference between MS and mono.
Are you using the MS 1.0 release or the 1.1 release?
Mitch
On Tue, 2003-01-14 at 13:38, Tom Fransen wrote:
> Piers,
>
> can you spend a few more words on this. Why is the second case slower?
>
> Furthermore if I use a simple loop to fill an array the
> speed difference is a factor 3. I have a bubble sort method
> that I execute a larger number of times. On Mono it takes 19 seconds
> on the MS stuff ~6 seconds. This is a big difference. So why I am
> losing a factor 3 on the following loop.
>
> // Fill aray worst case, all elements need to be swapped
> for(int i=0; i< size; i++)
> {
> array[i] = size - i;
> }
>
> I am testing with some small benchmarks, but real applications (e.g. an
MP3
> decoder)
> often use tables stored in arrays to do certain calculations.
> So although the benchmark maybe syntetic certain application will
> suffer from these penalties.
>
> regards,
> Tom
>
> -----Original Message-----
> From: Miguel de Icaza [mailto:miguel@ximian.com]
> Sent: Monday, January 13, 2003 3:22 AM
> To: Piers Haken
> Cc: Tom Fransen; Mono-list@ximian.com
> Subject: RE: [Mono-list] Performance / array access
>
>
> Hello,
>
> > Yeah, Microsoft's JIT lifts invariant bounds-checks. But I believe
> > it's pretty limited.
> >
> > For example, the check is removed in the following case:
> >
> > for (int i = 0; i < array.Length; ++i)
> > sum += array [i];
> >
> > But not here:
> >
> > int len = array.Length;
> > for (int i = 0; i < len; ++i)
> > sum += array [i];
> >
> > So the first case is (counter-intuitively) faster than the second.
> >
> > I don't believe Mono's JIT makes this optimization. Maybe the new JIT
> > will ;-)
>
> This is a very good observation. Because it seems that this particular
> kind of loop is detected by the JIT engine.
>
> Array-bounds-check elimination is something we want to do with the new
> JIT, but it is not implemented at this point. The new JIT features a
> new intermediate representation that simplifies implementing this sort
> of thing, but it is still on its bootstrapping phases of life.
>
> Miguel
>
>
> _______________________________________________
> Mono-list maillist - Mono-list@ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-list
--
Mitchell Skinner <mitchskin@attbi.com>