[Mono-devel-list] [PATCH] Move of Interlocked.Increment/Decrement/Exchange I4 to op codes

Torstensson, Patrik patrik.torstensson at intel.com
Tue Nov 23 16:08:46 EST 2004

> This patch moves the implementation of the Interlocked functions to x86
> op codes. The patch detects uniprocessor machines so that the bus lock
> prefix can be skipped (2.5x speed difference).

>This is looking very nice.
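
For illustration only, here is a minimal C sketch of the idea of emitting the lock prefix conditionally. It is not the code from the patch: the helper names, the use of sysconf(_SC_NPROCESSORS_ONLN) for detection, and the fixed eax/ecx operands are all assumptions made for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: nonzero when more than one CPU appears to be online.
 * If the query fails we assume MP, which is the safe choice. */
static int machine_is_mp(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n <= 0 || n > 1;
}

/* Hypothetical emitter: writes `[lock] xadd %eax, (%ecx)` into buf and
 * returns the number of bytes written (3 without the prefix, 4 with it). */
static size_t emit_xadd_eax_at_ecx(uint8_t *buf, int use_lock)
{
    size_t n = 0;
    if (use_lock)
        buf[n++] = 0xF0;  /* LOCK prefix */
    buf[n++] = 0x0F;      /* two-byte opcode escape */
    buf[n++] = 0xC1;      /* XADD r/m32, r32 */
    buf[n++] = 0x01;      /* ModRM: mod=00, reg=eax, rm=ecx */
    return n;
}
```

A JIT following this approach would call something like emit_xadd_eax_at_ecx(code, machine_is_mp()) once per emitted operation.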


>> this patch; it should ignore the mp check when doing AOT, haven't had
>> time to fix that in this patch (simple: just check for AOT in
>> detection)

>No, I don't think so. AOT code should be considered to be specific to
>the box it is generated on. For example, if we are on a p4, we can use
>cmov and sse.

Fine, not my call.

>For `OP_ATOMIC_EXCHANGE_I4', I never understood why we can't use xchg
>here. Is there any specific reason?

Yes, there are several reasons. First, xchg with a memory operand always
takes the bus lock, whether or not a lock prefix is present. Second, we
can't be sure that we exchange against the value we read, because another
cpu may have changed it before the xchg op executes. Read
http://msdn.microsoft.com/msdnmag/issues/0700/Win32/ for more info.
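
As a rough sketch of the compare-and-swap alternative to xchg, the loop below implements an exchange on top of C11 atomics. The function name exchange_i4 is made up for this example, and the patch itself emits x86 cmpxchg directly rather than going through a C compiler, so this only shows the shape of the loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically stores `value` into *location and
 * returns the value that was there immediately before the store. */
static int32_t exchange_i4(_Atomic int32_t *location, int32_t value)
{
    int32_t old = atomic_load(location);

    /* Retry until the swap succeeds. On failure the call refreshes
     * `old` with the current contents, so each retry compares against
     * whatever another cpu wrote in the meantime. */
    while (!atomic_compare_exchange_weak(location, &old, value))
        ;

    return old;
}
```

The returned value is always the one that was actually replaced, which is the guarantee Interlocked.Exchange has to provide.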

>Also, if you want something in eax or another register, you should just
>specify that in the regalloc spec, and not use push/pop.

Again, there is a reason for the code. Our current reg allocator doesn't
support the functionality needed to fully support cmpxchg ops, so I
thought it was best to leave that for now and get this patch in. It's
easy to fix once the reg allocator supports it; maybe file a bug about
it. We need to support forcing the dest reg (not dest as we do today for
div etc.) to eax while keeping sreg2 out of eax.

>I think that the inc/dec should use a common path OP_ATOMIC_ADD. In 2.0
>there is Interlocked.Add, which adds any number. 

Not sure about that. I would add a new OP_ATOMIC_ADD for it, but it
doesn't matter.
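
To make the "common path" suggestion concrete, here is a small C11 sketch in which increment and decrement are thin wrappers over a single atomic add. The function names are invented for this example and say nothing about how the JIT opcodes would actually be laid out.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical common path: atomically adds delta and returns the NEW
 * value, matching what Interlocked.Add returns. atomic_fetch_add
 * returns the OLD value, so delta is added once more; the addition is
 * done in unsigned arithmetic so wrap-around is well defined. */
static int32_t atomic_add_i4(_Atomic int32_t *location, int32_t delta)
{
    int32_t old = atomic_fetch_add(location, delta);
    return (int32_t)((uint32_t)old + (uint32_t)delta);
}

/* Increment and decrement become special cases of the add. */
static int32_t increment_i4(_Atomic int32_t *location)
{
    return atomic_add_i4(location, 1);
}

static int32_t decrement_i4(_Atomic int32_t *location)
{
    return atomic_add_i4(location, -1);
}
```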

>Also, we should add optimizations so that:

>Interlocked.Increment (ref foo)
>gets turned into
>[lock; ] inc [foo]
>rather than an xadd type thing.

Fine, let's get the patch in first, then optimize. This patch is 2.5x
faster on mp machines as it is.
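
One detail worth spelling out about the proposed optimization: Interlocked.Increment returns the incremented value, so a plain [lock] inc is only enough when the caller ignores the result. The C11 sketch below shows the two cases side by side. The names are hypothetical, and which instruction a given compiler picks for each function is not guaranteed.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Result unused: nothing has to be read back, so an in-place
 * `[lock] inc` (or `[lock] add $1`) on the memory operand suffices. */
static void increment_discard_i4(_Atomic int32_t *location)
{
    (void)atomic_fetch_add(location, 1);
}

/* Result used: the previous value must be fetched atomically along
 * with the update, which is what an xadd-style instruction provides.
 * The sum is computed in unsigned arithmetic to keep wrap-around
 * well defined. */
static int32_t increment_fetch_i4(_Atomic int32_t *location)
{
    int32_t old = atomic_fetch_add(location, 1);
    return (int32_t)((uint32_t)old + 1u);
}
```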

-- Patrik
