[Mono-dev] [PATCH] Update AuxV memcpy and icbi optimizations

Steven Munroe munroesj at us.ibm.com
Thu Mar 5 22:03:16 EST 2009


This patch address Paolo's concerns

We need this committed in some form so I can move on to other
optimizations.

Paolo Molaro wrote:

> > <snip>
> >
> > Headers included without proper checks, besides, pthread is not needed
> > here, elf.h can be simply avoided (there was already a structure to
> > represent the auxv data that you deleted).
> >
> >   
>   
cleaned up the use of header to be the minimum.

>> >> +volatile static ElfW(auxv_t) *linux_auxv_buf = NULL;
>> >>     
>>     
> >
> > We don't want to keep this data duplicated in memory: there is no use
> > except using more memory.
> >
> >   
>   
I Now free the buffer at the end of mono_arch_cpu_init.


>> >> +		/* Older kernels did not support /proc/<PID>/auxv. But
>> >> +		   the auvx table does exist in the process address
>> >> +		   space following the env table. So try scanning over
>> >> +		   the environment table to find the auxv. */
>> >> +			if (errno == ENOENT) {
>> >> +				auxv_temp = (ElfW(auxv_t)*) linux_find_auxv();
>> >> +		/* If someone has done a setenv() the __environ pointer
>> >> +		   may have been moved and the assumption that the auxv
>> >> +		   follows is not true. So look at the first entry and
>> >> +		   verify that it is an auxv entry. */
>> >>     
>>     
> >
> > If this detection is not reliable we should simply not do it. We can
> > document that users should have a 2.6 kernel.
> >
> >   
>   
This the method we support in libauxv.so so I don't why mono should be
different.

>> >> +			} else {
>> >> +				perror("Error opening file for reading");
>> >>     
>>     
> >
> > We don't want to bother users with an error message that is both
> > non-informative and that they can't do anything about.
> >
> >   
>   
Ok, removed these and the associated headers.


>> >> +static pthread_once_t auxv_once_control = PTHREAD_ONCE_INIT;
>> >>     
>>     
> >
> > There is no need to use pthread_once_t: just doing the setup in the init
> > call is ok.
> >
> >   
>   
Ok, removed

> > <snip>
>   
>> >> +/* Number of independent load store pipes in each core. */
>> >> +static int		linux_ppc_LSUs = 0;
>> >> +/* Number of independent fixed point pipes in each core. */
>> >> +static int		linux_ppc_FXUs = 0;
>> >> +/* Number of independent floating point pipes in each core. */
>> >> +static int		linux_ppc_FPUs = 0;
>> >>     
>>     
> >
> > A single uint bitmask can replace all of the above for the jit usage.
> > Besides, having one or more load/store units is not a linux property,
> > so the names of the variables is incorrect.
> >
> >   
>   
Actually with current and future processors in the pipeline a bitmask is
not enough.


>> >> +#ifdef PPC_FEATURE_ICACHE_SNOOP
>> >> +#define HAS_ICACHE_SNOOP (linux_ppc_hwcap & PPC_FEATURE_ICACHE_SNOOP)
>> >> +#else
>> >> +#define HAS_ICACHE_SNOOP 0
>> >> +#endif
>> >> +
>> >> +static int
>> >> +linux_init_ppc_SMP(void)
>> >> +{
>> >> +	struct utsname u;
>> >> +	/* FIXME For 2.6.26 kernels we can try to use sysfs
>> >> +	   /sys/devices/system/cpu/possible,  but we would have to fall
>> >> +	   back to uname for early kernels anyway. */
>> >> +
>> >> +	if (uname(&u) != 0) {
>> >> +		perror("Error uname syscall failer\n");
>> >> +		return 1;
>> >> +	}
>> >> +
>> >> +	if (strstr(u.version, "SMP"))
>> >> +		return 1;
>> >> +	else {
>> >> +		if (strstr(u.version, "smp"))
>> >> +			return 1;
>> >> +		else
>> >> +			return 0;
>> >>     
>>     
> >
> > This doesn't look like a reliable way. We have already utility calls to
> > get the number of processors and that should be enough.
> >
> >   
>   
I have not found any evidence of a framework for detecting SMP in the
mono source. Also in the server world, where dynamic CPU add is common,
a SMP capable kernel is the best evidence.


> > <snip>
>   
>> >> +#else
>> >> +	if ((linux_ppc_LSUs > 1) && (dreg != ppc_r12) && (sreg != ppc_r12)) { 
>> >> +		while (size >= 8) {
>> >> +			ppc_load_reg (code, ppc_r0, soffset, sreg);
>> >> +			ppc_load_reg (code, ppc_r12, soffset+4, sreg);
>> >> +			ppc_store_reg (code, ppc_r0, doffset, dreg);
>> >> +			ppc_store_reg (code, ppc_r12, doffset+4, dreg);
>> >> +			size -= 8;
>> >> +			soffset += 8;
>> >> +			doffset += 8; 
>> >> +		}
>> >> +	}
>> >>     
>>     
> >
> > This optimization should be in its own separate patch (together with
> > numbers that show it's worthwhile).
> >
> >   
>   
This was the way processor designers told us to implement memcpy. That
is why we used this used this in both GCC and GLIBC memcpy. The GLIBC
memcpy improvement where on the order of 20% for micro benchmarks.

>> >> @@ -664,21 +890,29 @@
>> >>  		isync
>> >>  	}
>> >>  #else
>> >> -	if (1) {
>> >> -		for (p = start; p < endp; p += cachelineinc) {
>> >> -			asm ("dcbf 0,%0;" : : "r"(p) : "memory");
>> >> +	/* For POWER5/6 with ICACHE_SNOOP the dcbst/icbi is not required.  */
>> >> +	if (!HAS_ICACHE_SNOOP) {
>> >> +		if (linux_ppc_SMP) {
>> >> +			for (p = start; p < endp; p += cachelineinc) {
>> >> +				asm ("dcbf 0,%0;" : : "r"(p) : "memory");
>> >> +			}
>> >> +		} else {
>> >> +			for (p = start; p < endp; p += cachelineinc) {
>> >> +				asm ("dcbst 0,%0;" : : "r"(p) : "memory");
>> >> +			}
>> >>     
>>     
> >
> > The changes to icache flush should be a separate patch. As I said already
> > on irc, though, unless there is a bug in the code, changes to it should
> > be done only if there are hard numbers showing that the change is a
> > performance benefit in some real-world case. The code as it is written
> > fixed some random crashes happening a few years ago, so there must be
> > very strong reasons to change it even a single instruction.
> >
> >   
>   
I provided results in a separate email.





-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: ppc-part4-opt-20090303.txt
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20090305/830280ac/attachment-0001.txt 


More information about the Mono-devel-list mailing list