[Mono-bugs] [Bug 76545][Wis] Changed - [SSAPRE] Performance Regression
bugzilla-daemon at bugzilla.ximian.com
Fri Oct 28 05:09:40 EDT 2005
Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.
Changed by massi at ximian.com.
http://bugzilla.ximian.com/show_bug.cgi?id=76545
--- shadow/76545 2005-10-25 03:36:25.000000000 -0400
+++ shadow/76545.tmp.19359 2005-10-28 05:09:40.000000000 -0400
@@ -1,12 +1,12 @@
Bug#: 76545
Product: Mono: Runtime
Version: 1.1
OS: unknown
OS Details: Fedora Core 4, Mono 1.1.9.2
-Status: NEW
+Status: ASSIGNED
Resolution:
Severity: Unknown
Priority: Wishlist
Component: JIT
AssignedTo: massi at ximian.com
ReportedBy: Alexander.Markhonko at gmail.com
@@ -74,6 +74,128 @@
On my Duron 1.3 GHz system, the app runs for about 13 seconds without
optimization and 40 seconds with the 'ssapre' optimization.
------- Additional Comments From bmaurer at users.sf.net 2005-10-25 03:36 -------
Since ssapre is not enabled by default, this is not Major.
+
+------- Additional Comments From massi at ximian.com 2005-10-28 05:09 -------
+
+I cannot reproduce the described behavior.
+On my system, the performance of the program with and without SSAPRE is
+roughly the same. On average SSAPRE may lose about 0.5%, but this is really
+difficult to measure (sometimes it also appears to gain something).
+
+Anyway, I inspected the logs to see what SSAPRE does (pass the '-v' option
+five times on the command line to see the logs).
+
+In the FloatTest method (which is the only interesting one) SSAPRE makes
+just one modification to the code: it factors out the "(double)i" expression
+inside the loop.
+In fact, this is the only redundancy in the whole method.
+
+So, the original code is like this:
+
+for (int i = 1; i < iInitVal; i++) {
+ f0 = (f1 / (double)i) - f2 + (f3 * (double)i);
+}
+
+and the modified code is like this:
+
+for (int i = 1; i < iInitVal; i++) {
+ double d = (double)i;
+ f0 = (f1 / d) - f2 + (f3 * d);
+}
+
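As a rough cross-check of the two loop shapes above, here is a minimal sketch
written in C rather than C# (the loop body is the same in both languages) so
that it can be compiled standalone. The function names and parameters are
hypothetical and do not come from the original test program; the sketch only
illustrates that factoring out "(double)i" leaves the computed value unchanged.

```c
/* Original loop shape: the int-to-double conversion appears twice. */
double float_test_original(int n, double f1, double f2, double f3)
{
    double f0 = 0.0;
    for (int i = 1; i < n; i++) {
        f0 = (f1 / (double)i) - f2 + (f3 * (double)i);
    }
    return f0;
}

/* Loop shape after the redundant conversion is factored into a temporary. */
double float_test_factored(int n, double f1, double f2, double f3)
{
    double f0 = 0.0;
    for (int i = 1; i < n; i++) {
        double d = (double)i; /* computed once per iteration */
        f0 = (f1 / d) - f2 + (f3 * d);
    }
    return f0;
}
```

Calling both functions with the same arguments and comparing the results is
enough to see that the transformation is value-preserving; whether it is also
faster depends on where the compiler keeps "d", which is the point discussed
further down.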
+In terms of internal representation in the JIT:
+
+Before SSAPRE:
+ (stind.r8 regoffset[-0x20(%ebp)] (float_add (float_sub (float_div
+(ldind.r8 regoffset[-0x28(%ebp)]) (conv.r8 (ldind.i4 regvar[%esi])))
+(ldind.r8 regoffset[-0x30(%ebp)])) (float_mul (ldind.r8
+regoffset[-0x38(%ebp)]) (conv.r8 (ldind.i4 regvar[%esi])))))
+ (stind.i4 regvar[%esi] (add (ldind.i4 regvar[%esi]) iconst[1]))
+
+After SSAPRE:
+ (stind.r8 regoffset[-0x40(%ebp)] (conv.r8 (ldind.i4 regvar[%esi])))
+ (stind.r8 regoffset[-0x20(%ebp)] (float_add (float_sub (float_div
+(ldind.r8 regoffset[-0x28(%ebp)]) (ldind.r8 regoffset[-0x40(%ebp)]))
+(ldind.r8 regoffset[-0x30(%ebp)])) (float_mul (ldind.r8
+regoffset[-0x38(%ebp)]) (ldind.r8 regoffset[-0x40(%ebp)]))))
+ (stind.i4 regvar[%esi] (add (ldind.i4 regvar[%esi]) iconst[1]))
+
+
+Here it is evident that "regoffset[-0x40(%ebp)]" is used to store the
+result of "conv.r8 (ldind.i4 regvar[%esi])".
+This is the only modification that SSAPRE does to the code (in fact
+there are no other redundancies).
+
+
+Looking at the generated code on x86, we have the following:
+
+Before SSAPRE:
+ 50: dd 45 d8 fldl 0xffffffd8(%ebp)
+ 53: 56 push %esi
+ 54: db 04 24 fildl (%esp)
+ 57: 83 c4 04 add $0x4,%esp
+ 5a: de f9 fdivrp %st,%st(1)
+ 5c: dd 45 d0 fldl 0xffffffd0(%ebp)
+ 5f: de e9 fsubrp %st,%st(1)
+ 61: dd 45 c8 fldl 0xffffffc8(%ebp)
+ 64: 56 push %esi
+ 65: db 04 24 fildl (%esp)
+ 68: 83 c4 04 add $0x4,%esp
+ 6b: de c9 fmulp %st,%st(1)
+ 6d: de c1 faddp %st,%st(1)
+ 6f: dd 5d e0 fstpl 0xffffffe0(%ebp)
+ 72: 46 inc %esi
+
+
+After SSAPRE:
+ 50: 56 push %esi
+ 51: db 04 24 fildl (%esp)
+ 54: 83 c4 04 add $0x4,%esp
+ 57: dd 5d c0 fstpl 0xffffffc0(%ebp)
+ 5a: dd 45 d8 fldl 0xffffffd8(%ebp)
+ 5d: dd 45 c0 fldl 0xffffffc0(%ebp)
+ 60: de f9 fdivrp %st,%st(1)
+ 62: dd 45 d0 fldl 0xffffffd0(%ebp)
+ 65: de e9 fsubrp %st,%st(1)
+ 67: dd 45 c8 fldl 0xffffffc8(%ebp)
+ 6a: dd 45 c0 fldl 0xffffffc0(%ebp)
+ 6d: de c9 fmulp %st,%st(1)
+ 6f: de c1 faddp %st,%st(1)
+ 71: dd 5d e0 fstpl 0xffffffe0(%ebp)
+ 74: 46 inc %esi
+
+
+To make sense of this, consider that variable "i" got allocated to
+register "esi", therefore "(double)i" (or "conv.r8 (ldind.i4 regvar[%esi])")
+becomes the sequence "push %esi; fildl (%esp); add $0x4,%esp;".
+
+And indeed that sequence occurs twice without SSAPRE, and just once when
+SSAPRE is used.
+
+The "problem" is that the value is stored back into the stack afterwards,
+with "fstpl 0xffffffc0(%ebp)", and then loaded back two times (see the two
+occurrences of "fldl 0xffffffc0(%ebp)").
+I say "problem" because ideally we should find a way to keep that
+value in a floating point register all the time.
+
+This is one of the typical bad interactions of SSAPRE with the regalloc.
+The work SSAPRE does is correct in principle (one redundancy is
+eliminated), but the additional pressure on the regalloc is not handled
+optimally.
+
+Anyway, performance does not drop visibly here, and certainly not by that much.
+
+If you really see that performance drop, please produce the logs, look
+at the generated code, and follow up here with a comment.
+
+Or just send the two logs to me by mail.
+
+And by the way, which version of Mono are you using?
+