Miguel de Icaza
25 May 2002 10:17:12 -0400
> > memcpy already takes care of copying in the fastest way possible.
> That's right, but we still have a call, a ret, and a conditional or two ;-)
I was going to say exactly that ;-)
> By inlining we can get rid of these things (especially if size is known up-front).
> Moreover, due to JIT's dynamic nature it's possible to generate faster code at run-time.
> For example, the following (generic) memcpy is faster on pre-Pentium x86s (Intel syntax):
> mov esi, $src     ; source pointer
> mov ecx, $size    ; byte count
> mov edi, $dest    ; destination pointer
> shr ecx, 1        ; word count; CF = 1 if size was odd
> rep movsw         ; copy word-sized quanta
> adc cl, cl        ; cl = CF (rep movsw leaves ecx = 0 and preserves CF)
> rep movsb         ; copy the trailing byte, if any
> For const size==1 we could just mov al, [src]; mov [dest],al
> BTW, MS JIT uses similar optimizations for cpblk/initblk.
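For illustration, the constant-size specialization described above could be sketched in C as follows. This is only a sketch of the idea, not Mono's (or Microsoft's) actual JIT code, and the helper name `cpblk_const` is invented; each case stands in for the one or two machine instructions a JIT would emit directly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of the cpblk specialization: when the size is a
 * compile-time constant, a JIT can emit one or two direct moves instead
 * of a call/ret into memcpy.  The cases below model the machine code a
 * JIT would generate for each known size. */
static void cpblk_const(void *dest, const void *src, size_t size)
{
    switch (size) {
    case 1: /* mov al, [src]; mov [dest], al */
        *(uint8_t *)dest = *(const uint8_t *)src;
        break;
    case 2: /* a single 16-bit move */
        memcpy(dest, src, 2);
        break;
    case 4: /* a single 32-bit move */
        memcpy(dest, src, 4);
        break;
    default: /* unknown or large size: fall back to the library routine */
        memcpy(dest, src, size);
        break;
    }
}
```

A compiler will lower the small fixed-size memcpy calls to single loads and stores, which is exactly the effect the JIT would get by emitting them directly.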
Exactly. The same size-quantum logic that lives in memmove() can be
inlined trivially by the JIT engine.
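That size-quantum logic looks roughly like the following C sketch. This is an illustration of the technique, not Mono's actual memmove(); it assumes word-aligned pointers and non-overlapping buffers for simplicity.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch of the size-quantum copy: move machine words while enough
 * bytes remain, then finish byte-by-byte -- the C analogue of the
 * shr / rep movsw / rep movsb sequence quoted earlier.  Assumes
 * word-aligned, non-overlapping buffers; not Mono's actual code. */
static void copy_quantum(void *dest, const void *src, size_t size)
{
    uintptr_t *d = dest;
    const uintptr_t *s = src;

    while (size >= sizeof(uintptr_t)) {   /* word-sized quanta */
        *d++ = *s++;
        size -= sizeof(uintptr_t);
    }

    unsigned char *db = (unsigned char *)d;        /* leftover bytes */
    const unsigned char *sb = (const unsigned char *)s;
    while (size--)
        *db++ = *sb++;
}
```

With the size known up-front, a JIT can drop whichever loop is dead and unroll the rest, which is where the win over a call into the library comes from.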
However, how often does this happen? Until a couple of days ago we did
not have cpblk at all, so my guess is that the performance impact would
not be immediately noticeable in measurements.
I would very much like to see this at some point.