[Mono-dev] mono numerical performance

Jonathan Shore jonathan.shore at gmail.com
Sun Nov 20 08:01:44 EST 2011


Here is a link to an entry in bugzilla with the code attached.  I could not send it to the list:

http://bugzilla.xamarin.com/show_bug.cgi?id=2098

On Nov 20, 2011, at 7:41 AM, Jonathan Shore wrote:

> Did the code I attached get filtered?  I'll post the tar.gz into bugzilla and send the link.
> 
> Below are code snippets to calculate Ordinary Least Squares, a simpler example.   I found this to be 4x slower than C++ / Java.
> 
> Here are the "safe" and "unsafe" versions of OLS, which I ran 10,000 times on an array of 50,000 elements:
> 
> public class SafeOLS
> {
> 	public static double OLS (double[] x, double[] y)
> 	{
> 		var eXY = 0.0;
> 		var eXX = 0.0;
> 		var eX = 0.0;
> 		var eY = 0.0;
> 			
> 		var len = x.Length;
> 		
> 		for (int i = 0 ; i < len ; i++)
> 		{
> 			var vx = x[i];
> 			var vy = y[i];
> 		
> 			eXY += vx*vy;
> 			eXX += vx*vx;
> 			eX += vx;
> 			eY += vy;
> 		}
> 		
> 		var n = (double)len;
> 		return (eXY - eX * eY / n) / (eXX - eX * eX / n);
> 	}
> }
> 
> 
> public class UnSafeOLS
> {
> 	unsafe public static double OLS (double[] x, double[] y)
> 	{
> 		var eXY = 0.0;
> 		var eXX = 0.0;
> 		var eX = 0.0;
> 		var eY = 0.0;
> 		
> 		var len = x.Length;
> 		
> 		fixed (double* px = x)
> 		fixed (double* py = y)
> 		{
> 			double* vpx = px;
> 			double* vpy = py;
> 			
> 			for (int i = 0 ; i < len ; i++)
> 			{
> 				var vx = *vpx++;
> 				var vy = *vpy++;
> 			
> 				eXY += vx*vy;
> 				eXX += vx*vx;
> 				eX += vx;
> 				eY += vy;
> 			}
> 		}
> 			
> 		var n = (double)len;
> 		return (eXY - eX * eY / n) / (eXX - eX * eX / n);
> 	}
> }
> 
> 
> One can use the following as a driver, parameterized with 50000, 10000 or something like that:
> 
> private static void TestUnSafeOLS (int dim, int iterations)
> {
> 	double[] x = new double[dim];
> 	double[] y = new double[dim];
> 
> 	for (int i = 0 ; i < x.Length ; i++)
> 	{
> 		x[i] = i;
> 		y[i] = i*i / 1000.0;
> 	}
> 
> 	Stopwatch watch = new Stopwatch ();
> 	watch.Start();
> 			
> 	double sum = 0;
> 	for (int i = 0 ; i < iterations ; i++)
> 	{
> 		sum += UnSafeOLS.OLS (x,y);
> 		x[100] = sum;
> 	}
> 			
> 	watch.Stop();
> 	Console.WriteLine ("unsafe ols: " + sum + ", elapsed: " + watch.Elapsed);
> }
> 
> 
> Here is the C++ version of OLS:
> 
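> #include <stdio.h>
> 
> // CurrentTimeMilli() is a millisecond timer helper that is not shown in this post
> 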
> static double OLS (double* x, double* y, int len)
> {
> 	double eXY = 0.0;
> 	double eXX = 0.0;
> 	double eX = 0.0;
> 	double eY = 0.0;
> 	
> 	for (int i = 0 ; i < len ; i++)
> 	{
> 		double vx = x[i];
> 		double vy = y[i];
> 	
> 		eXY += vx*vy;
> 		eXX += vx*vx;
> 		eX += vx;
> 		eY += vy;
> 	}
> 	
> 	double n = (double)len;
> 	return (eXY - eX * eY / n) / (eXX - eX * eX / n);
> }
> 
> static void TestOLS (int dim, int iterations)
> {
> 	double* x = new double[dim];
> 	double* y = new double[dim];
> 
> 	for (int i = 0 ; i < dim ; i++)
> 	{
> 		x[i] = i;
> 		y[i] = i*i / 1000.0;
> 	}
> 
> 	long Tstart = CurrentTimeMilli();
> 	
> 	double sum = 0;
> 	for (int i = 0 ; i < iterations ; i++)
> 	{
> 		sum += OLS (x,y, dim);
> 		x[100] = sum;
> 	}
> 	
> 	long Tend = CurrentTimeMilli();
> 	long Telapsed = (Tend-Tstart);
> 	
> 	printf ("OLS: %lf, elapsed: %02d:%02d:%03d\n", sum, (int)(Telapsed / 60000), (int)(Telapsed % 60000) / 1000, (int)(Telapsed % 1000));
> 
> 	// release the benchmark arrays allocated with new[]
> 	delete[] x;
> 	delete[] y;
> }
> 
> int main (int argc, char *argv[])
> {
> 	TestOLS (50000, 100000);
> 	return 0;
> }
> 
>  
> Thanks in advance for any pointers and analysis.
> 
> I will send another post with the link in a bit.
> Jonathan
> 
> 
> On Nov 20, 2011, at 3:28 AM, Stefanos A. wrote:
> 
>> 2011/11/20 Jonathan Shore <jonathan.shore at gmail.com>
>> Slide, not really.  If mono SIMD had a more general mapping to the GPU, or could operate on very large vectors or matrices, possibly.   Linear algebra maps easily onto that.   However, I do more complicated work around time series, so it does not really fit into linear algebra.
>> 
>> I guess what I'm really after is to understand why the unsafe implementation is hardly faster than the "safe" version, whereas on the .NET CLR it is 2x as fast and nearly as fast as the C++ implementation.    There is no GC or object creation involved here, just arrays and computations.
>> 
>> Without sharing some code, it's almost impossible to tell what might be the cause of the discrepancy or any ways to improve performance. Have you measured performance with the regular JITter rather than LLVM?
> 
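
The quoted C++ listing calls CurrentTimeMilli(), but the post does not include its definition. Below is a minimal sketch of what such a helper might look like, assuming it only needs to return elapsed milliseconds from a monotonic clock. The std::chrono implementation, the long long return type, and the CheckOLS function are assumptions for illustration and are not taken from the original code. The OLS routine repeats the arithmetic of the quoted version so that the result can be checked on data with a known slope.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical stand-in for the undefined CurrentTimeMilli() helper:
// milliseconds read from a monotonic clock (an assumption, not from the post).
long long CurrentTimeMilli ()
{
	using namespace std::chrono;
	return duration_cast<milliseconds> (steady_clock::now ().time_since_epoch ()).count ();
}

// Same arithmetic as the quoted OLS routine: the slope of y regressed on x.
double OLS (const double* x, const double* y, int len)
{
	double eXY = 0.0;
	double eXX = 0.0;
	double eX = 0.0;
	double eY = 0.0;

	for (int i = 0 ; i < len ; i++)
	{
		double vx = x[i];
		double vy = y[i];

		eXY += vx*vy;
		eXX += vx*vx;
		eX += vx;
		eY += vy;
	}

	double n = (double)len;
	return (eXY - eX * eY / n) / (eXX - eX * eX / n);
}

// Sanity check on exactly linear data: y = 2x + 1 should give a slope of 2.
void CheckOLS ()
{
	const double x[] = { 0.0, 1.0, 2.0, 3.0 };
	const double y[] = { 1.0, 3.0, 5.0, 7.0 };
	assert (std::fabs (OLS (x, y, 4) - 2.0) < 1e-12);
}
```

Calling CheckOLS() once before the timed loop confirms that the routine computes the expected slope, and taking the difference of two CurrentTimeMilli() readings around the loop gives the elapsed time in milliseconds. A monotonic clock is used here so that wall-clock adjustments cannot skew the measurement.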
