[Mono-bugs] [Bug 76545][Wis] Changed - [SSAPRE] Performance Regression

Fri Oct 28 05:09:40 EDT 2005

Please do not reply to this email- if you want to comment on the bug, go to the
URL shown below and enter your comments there.

Changed by massi at ximian.com.

http://bugzilla.ximian.com/show_bug.cgi?id=76545

--- shadow/76545	2005-10-25 03:36:25.000000000 -0400
+++ shadow/76545.tmp.19359	2005-10-28 05:09:40.000000000 -0400
@@ -1,12 +1,12 @@
 Bug#: 76545
 Product: Mono: Runtime
 Version: 1.1
 OS: unknown
 OS Details: Fedora Core 4, Mono 1.1.9.2
-Status: NEW   
+Status: ASSIGNED   
 Resolution: 
 Severity: Unknown
 Priority: Wishlist
 Component: JIT
 AssignedTo: massi at ximian.com                            
 ReportedBy: Alexander.Markhonko at gmail.com               
@@ -74,6 +74,128 @@
 
 On my Duron 1.3 GHz system app running about 13 seconds without
 optimization and 40 seconds with 'ssapre' optimization
 
 ------- Additional Comments From bmaurer at users.sf.net  2005-10-25 03:36 -------
 Since ssapre is not enabled by default, this is not Major.
+
+------- Additional Comments From massi at ximian.com  2005-10-28 05:09 -------
+
+I cannot reproduce the described behavior.
+On my system, the performance of the program with and without SSAPRE is
+roughly the same; on average SSAPRE may lose about 0.5%, but that is really
+difficult to measure (sometimes it even appears to gain something).
+
+Anyway, I inspected the logs to see what SSAPRE does (pass the '-v'
+option five times on the command line to see the logs).
+
+In the FloatTest method (which is the only interesting one) SSAPRE makes
+just one modification to the code: it factors out the "(double)i"
+expression inside the loop.
+In fact, this is the only redundancy in the whole method.
+
+So, the original code is like this:
+
+for (int i = 1; i < iInitVal; i++) {
+	f0 = (f1 / (double)i) - f2 + (f3 * (double)i);
+}
+
+and the modified code is like this:
+
+for (int i = 1; i < iInitVal; i++) {
+	double d = (double)i;
+	f0 = (f1 / d) - f2 + (f3 * d);
+}
+
+In terms of internal representation in the JIT:
+
+Before SSAPRE:
+ (stind.r8 regoffset[-0x20(%ebp)] (float_add (float_sub (float_div
+(ldind.r8 regoffset[-0x28(%ebp)]) (conv.r8 (ldind.i4 regvar[%esi])))
+(ldind.r8 regoffset[-0x30(%ebp)])) (float_mul (ldind.r8
+regoffset[-0x38(%ebp)]) (conv.r8 (ldind.i4 regvar[%esi])))))
+ (stind.i4 regvar[%esi] (add (ldind.i4 regvar[%esi]) iconst[1]))
+
+After SSAPRE:
+ (stind.r8 regoffset[-0x40(%ebp)] (conv.r8 (ldind.i4 regvar[%esi])))
+ (stind.r8 regoffset[-0x20(%ebp)] (float_add (float_sub (float_div
+(ldind.r8 regoffset[-0x28(%ebp)]) (ldind.r8 regoffset[-0x40(%ebp)]))
+(ldind.r8 regoffset[-0x30(%ebp)])) (float_mul (ldind.r8
+regoffset[-0x38(%ebp)]) (ldind.r8 regoffset[-0x40(%ebp)]))))
+ (stind.i4 regvar[%esi] (add (ldind.i4 regvar[%esi]) iconst[1]))
+
+
+Here it is evident that "regoffset[-0x40(%ebp)]" is used to store the
+result of "conv.r8 (ldind.i4 regvar[%esi])".
+This is the only modification that SSAPRE makes to the code (in fact
+there are no other redundancies).
+
+
+Looking at the generated code on x86, we have the following:
+
+Before SSAPRE:
+  50:	dd 45 d8             	fldl   0xffffffd8(%ebp)
+  53:	56                   	push   %esi
+  54:	db 04 24             	fildl  (%esp)
+  57:	83 c4 04             	add    $0x4,%esp
+  5a:	de f9                	fdivrp %st,%st(1)
+  5c:	dd 45 d0             	fldl   0xffffffd0(%ebp)
+  5f:	de e9                	fsubrp %st,%st(1)
+  61:	dd 45 c8             	fldl   0xffffffc8(%ebp)
+  64:	56                   	push   %esi
+  65:	db 04 24             	fildl  (%esp)
+  68:	83 c4 04             	add    $0x4,%esp
+  6b:	de c9                	fmulp  %st,%st(1)
+  6d:	de c1                	faddp  %st,%st(1)
+  6f:	dd 5d e0             	fstpl  0xffffffe0(%ebp)
+  72:	46                   	inc    %esi
+
+
+After SSAPRE:
+  50:	56                   	push   %esi
+  51:	db 04 24             	fildl  (%esp)
+  54:	83 c4 04             	add    $0x4,%esp
+  57:	dd 5d c0             	fstpl  0xffffffc0(%ebp)
+  5a:	dd 45 d8             	fldl   0xffffffd8(%ebp)
+  5d:	dd 45 c0             	fldl   0xffffffc0(%ebp)
+  60:	de f9                	fdivrp %st,%st(1)
+  62:	dd 45 d0             	fldl   0xffffffd0(%ebp)
+  65:	de e9                	fsubrp %st,%st(1)
+  67:	dd 45 c8             	fldl   0xffffffc8(%ebp)
+  6a:	dd 45 c0             	fldl   0xffffffc0(%ebp)
+  6d:	de c9                	fmulp  %st,%st(1)
+  6f:	de c1                	faddp  %st,%st(1)
+  71:	dd 5d e0             	fstpl  0xffffffe0(%ebp)
+  74:	46                   	inc    %esi
+
+
+To make sense of this, consider that variable "i" got allocated to
+register "esi", therefore "(double)i" (or "conv.r8 (ldind.i4
+regvar[%esi])") is now the sequence
+"push %esi; fildl (%esp); add $0x4,%esp;"
+
+And indeed that sequence occurs twice without SSAPRE, and just once when
+SSAPRE is used.
+
+The "problem" is that the value is stored back to the stack afterwards,
+with "fstpl 0xffffffc0(%ebp)", and then loaded back two times (see the two
+occurrences of "fldl 0xffffffc0(%ebp)").
+I say "problem" because ideally we should find a way to keep that
+value in a floating point register all the time.
+
+This is one of the typical bad interactions of SSAPRE with the regalloc.
+The work SSAPRE does is correct in principle (one redundancy is
+eliminated), but the additional pressure on the regalloc is not handled
+optimally.
+
+Anyway, performance does not visibly drop here (or at least not by much).
+
+If you really see that performance drop, please produce the logs, look
+at the generated code, and follow up here with a comment.
+
+Or just send the two logs to me by mail.
+
+And by the way, which version of Mono are you using?
+
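For readers who want to experiment with the transformation discussed in
the comment above, the two loop shapes can be sketched in C. This is only
an illustration under stated assumptions: the original test program is C#,
the function names and parameters below are hypothetical, and a C compiler
may perform (or undo) this factoring on its own, so this does not reproduce
what the Mono JIT emits.

```c
/*
 * Sketch of the redundancy SSAPRE removes in FloatTest.
 * Function names and parameters are hypothetical; the original
 * program is C# and its actual inputs are not shown in the report.
 */

/* Original shape: the int-to-double conversion appears twice per iteration. */
double float_test_original(int init_val, double f1, double f2, double f3)
{
    double f0 = 0.0;
    for (int i = 1; i < init_val; i++) {
        f0 = (f1 / (double)i) - f2 + (f3 * (double)i);
    }
    return f0;
}

/* Factored shape: the conversion is computed once and reused through "d". */
double float_test_factored(int init_val, double f1, double f2, double f3)
{
    double f0 = 0.0;
    for (int i = 1; i < init_val; i++) {
        double d = (double)i;   /* the temporary introduced by the factoring */
        f0 = (f1 / d) - f2 + (f3 * d);
    }
    return f0;
}
```

Both functions should return the same value for the same arguments; the
difference is only in whether the temporary ends up in a register or, as
in the x86 listing above, in a stack slot that is written once and read
twice.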