[Mono-list] Performance / array access

Tom Fransen t.fransen@mailned.nl
Sun, 19 Jan 2003 17:02:19 +0100


Hello all,

Mitch, to come back on the issue of bound checking. I looked further into
the
problem and my first conclusion was wrong. Piers already stated that he did
not expect that the MS compiler would be able to optimize all these check
out of the loop.

I again started looking into the assembly code and although I'm not a
assembly programmer (it has been too long since I done some 6502/Z80 :-)
I think the performance difference (>2) comes from the optimization
performered by the MS JIT compiler (I am using version 1.0 of the SDK).

The MS JIT seems to be better at allocating processor registers to certain
variables resulting in much faster code. The execution speed of the Mono
code
is in the same order as the unoptimized code generated by the MS JIT
compiler
(for tight loops accessing arrays). In case of my bubble sorting
algorithm this results in a speed difference of a factor 3. I have
made other tests like applying matrix multiplications on a 2-dimensional
array (to simulate an image processing operation) and also here Mono
lacks behind. I haven't sorted this out but
my first guess is that its again the register optimization that makes the
difference.

For the code snippets below both the MS and Mono code is given:

       private void TestMethod(int size)
        {
            array = new int[size];
            for(int i=0; i< size; i++)
            {
                array[i] = size - i;
            }
            Console.WriteLine("Done" );
            string s = Console.ReadLine();
        }

 The MS JIT generates the following code. Note that
 I first run the program and latter when the program reads
 from the console attach the debugger. If I run the program from
 the MS development environment the JIT seems to generate unoptimized
 code so I need this trick.



 MS Code generated by the JIT unoptimized
=========================================


        private void TestMethod(int size)
        {
            array = new int[size];
00000000  push        ebp
00000001  mov         ebp,esp
00000003  sub         esp,14h
00000006  push        edi
00000007  push        esi
00000008  push        ebx
00000009  mov         ebx,ecx
0000000b  mov         esi,edx
0000000d  xor         edi,edi
0000000f  mov         dword ptr [ebp-10h],0
00000016  test        esi,80000000h
0000001c  je          00000025
0000001e  xor         ecx,ecx
00000020  call        763A2CFF
00000025  mov         edx,esi
00000027  mov         ecx,2DE6B3Ah
0000002c  call        FD4A2040
00000031  lea         edx,[ebx+4]
00000034  call        76479768
            for(int i=0; i< size; i++)
00000039  xor         edi,edi
0000003b  nop
0000003c  jmp         00000056
                array[i] = size - i;
0000003e  mov         eax,dword ptr [ebx+4]    <==== Start loop, fetch the
size of the array
00000041  cmp         edi,dword ptr [eax+4]    <==== Check if within bounds
00000044  jb          0000004D
00000046  xor         ecx,ecx
00000048  call        762EED87 		       <==== Trigger exception
0000004d  mov         edx,esi 	   	       <==== esi contain variable size
0000004f  sub         edx,edi
00000051  mov         dword ptr [eax+edi*4+8],edx
            for(int i=0; i< size; i++)
00000055  inc         edi  		       <==== edi contain variable i
00000056  cmp         edi,esi
00000058  jl          0000003E                 <==== Jump to start of for
loop
            Console.WriteLine("Done" );
0000005a  mov         ecx,dword ptr ds:[01C40178h]
00000060  call        76AA3250
            string s = Console.ReadLine();
00000065  call        76AA2F10
0000006a  mov         dword ptr [ebp-14h],eax
0000006d  mov         eax,dword ptr [ebp-14h]
00000070  mov         dword ptr [ebp-10h],eax
        }
00000073  nop
00000074  pop         ebx
00000075  pop         esi
00000076  pop         edi
00000077  mov         esp,ebp
00000079  pop         ebp
0000007a  ret


MS Code generated by the JIT optimized
======================================


        private void TestMethod(int size)
        {
            array = new int[size];
00000000  push        edi
00000001  push        esi
00000002  mov         edi,ecx
00000004  mov         esi,edx
00000006  test        esi,80000000h
0000000c  jne         0000005C
0000000e  mov         edx,esi
00000010  mov         ecx,2DE6A22h
00000015  call        FD5B2098
0000001a  lea         edx,[edi+4]
0000001d  call        765897C0
            for(int i=0; i< size; i++)
00000022  xor         ecx,ecx
00000024  cmp         esi,0
00000027  jle         00000040
00000029  mov         edx,dword ptr [edi+4]
0000002c  mov         edi,dword ptr [edx+4]
                array[i] = size - i;
0000002f  cmp         ecx,edi				<=== start of loop
00000031  jae         00000064 				<=== index out of bounds
00000033  mov         eax,esi                           <=== esi contain
variable size
00000035  sub         eax,ecx
00000037  mov         dword ptr [edx+ecx*4+8],eax
            for(int i=0; i< size; i++)
0000003b  inc         ecx  				<==== ecx contain variable i
0000003c  cmp         ecx,esi
0000003e  jl          0000002F 				<==== jump to start of for loop
            Console.WriteLine("Done" );
00000040  mov         ecx,dword ptr ds:[01C4086Ch]
00000046  mov         edx,dword ptr ds:[01C40110h]
0000004c  mov         eax,dword ptr [ecx]
0000004e  call        dword ptr [eax+000000D8h]
            string s = Console.ReadLine();
00000054  call        76BB2F68
00000059  pop         esi
        }
0000005a  pop         edi
0000005b  ret
0000005c  xor         ecx,ecx
0000005e  call        764B2D57
00000063  int         3
00000064  xor         ecx,ecx 				<==== Exception
00000066  call        763FEDDF
0000006b  int         3


The Mono (0.18) JIT generates the following code (--dump-asm)
=============================================================

Disassembly of section .text:

00000000 <SpeedBenchmarks.BubbleSort_TestMethod>:
   0:	55                   	push   %ebp
   1:	8b ec                	mov    %esp,%ebp
   3:	53                   	push   %ebx
   4:	56                   	push   %esi
   5:	83 ec 28             	sub    $0x28,%esp
   8:	c7 45 d8 00 00 00 00 	movl   $0x0,0xffffffd8(%ebp)
   f:	8b 45 08             	mov    0x8(%ebp),%eax
  12:	05 08 00 00 00       	add    $0x8,%eax
  17:	83 38 00             	cmpl   $0x0,(%eax)
  1a:	8b 4d 0c             	mov    0xc(%ebp),%ecx
  1d:	50                   	push   %eax
  1e:	51                   	push   %ecx
  1f:	52                   	push   %edx
  20:	51                   	push   %ecx
  21:	68 54 7e 18 08       	push   $0x8187e54
  26:	e8 b5 c2 e4 ff       	call   ffe4c2e0
<SpeedBenchmarks.BubbleSort_TestMethod+0xffe4c2e0>
  2b:	83 c4 08             	add    $0x8,%esp
  2e:	5a                   	pop    %edx
  2f:	59                   	pop    %ecx
  30:	8b c8                	mov    %eax,%ecx
  32:	58                   	pop    %eax
  33:	89 08                	mov    %ecx,(%eax)
  35:	be 00 00 00 00       	mov    $0x0,%esi
<=== esi contain variable i
  3a:	e9 29 00 00 00       	jmp    68
<SpeedBenchmarks.BubbleSort_TestMethod+0x68>
  3f:	8b 45 08             	mov    0x8(%ebp),%eax
<=== start loop
  42:	8b 40 08             	mov    0x8(%eax),%eax
\
  45:	8b ce                	mov    %esi,%ecx
|
  47:	3b 48 0c             	cmp    0xc(%eax),%ecx
| Bounds check
  4a:	72 0a                	jb     56
<SpeedBenchmarks.BubbleSort_TestMethod+0x56>/
  4c:	68 e1 26 11 08       	push   $0x81126e1
  51:	e8 ce 56 ed ff       	call   ffed5724
<SpeedBenchmarks.BubbleSort_TestMethod+0xffed5724> <=== Generate exception
  56:	8d 44 88 10          	lea    0x10(%eax,%ecx,4),%eax
  5a:	8b 4d 0c             	mov    0xc(%ebp),%ecx
<== retrieve size
  5d:	2b ce                	sub    %esi,%ecx
  5f:	89 08                	mov    %ecx,(%eax)
  61:	8b c6                	mov    %esi,%eax
<=== move i to eax
  63:	40                   	inc    %eax						<=== i++
  64:	8b d8                	mov    %eax,%ebx
<=== (detour??)
  66:	8b f3                	mov    %ebx,%esi
<=== move i back to esi
  68:	8b 45 0c             	mov    0xc(%ebp),%eax
<=== retrieve size
  6b:	3b f0                	cmp    %eax,%esi
<=== check if i < size
  6d:	0f 8c cc ff ff ff    	jl     3f
<SpeedBenchmarks.BubbleSort_TestMethod+0x3f>  <=== end loop
  73:	68 38 da 16 08       	push   $0x816da38
  78:	8b 05 78 df 16 08    	mov    0x816df78,%eax
  7e:	50                   	push   %eax
  7f:	83 38 00             	cmpl   $0x0,(%eax)
  82:	8b 00                	mov    (%eax),%eax
  84:	ff 90 a0 00 00 00    	call   *0xa0(%eax)
  8a:	83 c4 08             	add    $0x8,%esp
  8d:	8b 05 78 df 16 08    	mov    0x816df78,%eax
  93:	50                   	push   %eax
  94:	83 38 00             	cmpl   $0x0,(%eax)
  97:	8b 00                	mov    (%eax),%eax
  99:	ff 90 c0 00 00 00    	call   *0xc0(%eax)
  9f:	83 c4 04             	add    $0x4,%esp
  a2:	8b 05 80 df 16 08    	mov    0x816df80,%eax
  a8:	50                   	push   %eax
  a9:	83 38 00             	cmpl   $0x0,(%eax)
  ac:	8b 00                	mov    (%eax),%eax
  ae:	ff 90 7c 00 00 00    	call   *0x7c(%eax)
  b4:	83 c4 04             	add    $0x4,%esp
  b7:	8b f0                	mov    %eax,%esi
  b9:	8b de                	mov    %esi,%ebx
  bb:	8d 65 f8             	lea    0xfffffff8(%ebp),%esp
  be:	5e                   	pop    %esi
  bf:	5b                   	pop    %ebx
  c0:	c9                   	leave
  c1:	c3                   	ret


-----Original Message-----
From: Mitchell Skinner [mailto:mitchskin@attbi.com]
Sent: Wednesday, January 15, 2003 12:42 AM
To: Tom Fransen
Cc: Mono-list@ximian.com
Subject: RE: [Mono-list] Performance / array access


Hello,

The runtime inserts array bounds checks to prevent things like buffer
overflows.  So the following code:
   int len = array.Length;
   for (int i = 0; i < len; ++i)
      sum += array [i];
becomes something like:
   int len = array.Length;
   for (int i = 0; i < len; ++i) {
      if (i >= array.Length) throw new Exception; //forgot which
      sum += array [i];
   }

In the fast case, the MS JIT is smart enough to analyze the following
code and figure out that "i" will not go out of the bounds of the array
(because i starts at zero and stays below array.Length):
   for (int i = 0; i < array.Length; ++i)
      sum += array [i];

If the for loop isn't set up exactly like that, I don't think the MS JIT is
able to eliminate the bounds check (unless perhaps there are constants
involved).  Is the code you have below (comparing i with a "size" variable)
the exact code you tested?  If "size" is not a constant, I would be suprised
to find that there was a major difference between MS and mono.

Are you using the MS 1.0 release or the 1.1 release?

Mitch


On Tue, 2003-01-14 at 13:38, Tom Fransen wrote:
> Piers,
>
> can you spend a few more words on this. Why is the second case slower?
>
> Furthermore if I use a simple loop to fill an array the
> speed difference is a factor 3. I have a bubble sort method
> that I execute a larger number of times. On Mono it takes 19 seconds
> on the MS stuff ~6 seconds. This is a big difference. So why I am
> losing a factor 3 on the following loop.
>
>                 // Fill aray worst case, all elements need to be swapped
>                 for(int i=0; i< size; i++)
>                 {
>                     array[i] = size - i;
>                 }
>
> I am testing with some small benchmarks, but real applications (e.g. an
MP3
> decoder)
> often use tables stored in arrays to do certain calculations.
> So although the benchmark maybe syntetic certain application will
> suffer from these penalties.
>
> regards,
> Tom
>
> -----Original Message-----
> From: Miguel de Icaza [mailto:miguel@ximian.com]
> Sent: Monday, January 13, 2003 3:22 AM
> To: Piers Haken
> Cc: Tom Fransen; Mono-list@ximian.com
> Subject: RE: [Mono-list] Performance / array access
>
>
> Hello,
>
> > Yeah, Microsoft's JIT lifts invariant bounds-checks. But I believe
> > it's pretty limited.
> >
> > For example, the check is removed in the following case:
> >
> >   for (int i = 0; i < array.Length; ++i)
> >     sum += array [i];
> >
> > But not here:
> >
> >   int len = array.Length;
> >   for (int i = 0; i < len; ++i)
> >     sum += array [i];
> >
> > So the first case is (counter-intuitively) faster than the second.
> >
> > I don't believe Mono's JIT makes this optimization. Maybe the new JIT
> > will ;-)
>
> This is a very good observation.  Because it seems that this particular
> kind of loop is detected by the JIT engine.
>
> Array-bounds-check elimination is something we want to do with the new
> JIT, but it is not implemented at this point.  The new JIT features a
> new intermediate representation that simplifies implementing this sort
> of thing, but it is still on its bootstrapping phases of life.
>
> Miguel
>
>
> _______________________________________________
> Mono-list maillist  -  Mono-list@ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-list
--
Mitchell Skinner <mitchskin@attbi.com>