[Mono-devel-list] mono AES performance woes (was: poor PPC JIT output)

Wed Jul 20 10:55:20 EDT 2005

Hello Allan,

Sorry it took me some time to reply, but I'm out of town and didn't have
access to the mailing list from here (well, from this email account).

First, if your application's performance is directly tied to the
performance of AES, then I strongly suggest using a native library
(via P/Invoke). As your numbers show (MS managed versus native), this is
where you'll get the maximum performance.

The .NET crypto framework makes it easy, using CryptoConfig, to extend
the crypto classes. So you could have a RijndaelNative class, inheriting
from System.Security.Cryptography.Rijndael, that could be used
transparently by any existing .NET application* (at least if it was
coded correctly). This approach is even more attractive if you want your
application to support optional hardware acceleration in the future.

*Note that your native implementation should support all .NET
cipher/padding modes before you change the machine.config file (at least
before changing the "default" AES implementation); otherwise some apps
may fail, as this is a global setting.
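To illustrate the mechanism (in Python, since the point is the design rather than the .NET API), here is a rough sketch of a name-based factory loosely modeled on CryptoConfig.CreateFromName: applications ask for an algorithm by name, and a global table decides which class they get. The registry, `register`, `create_from_name`, `application_code` and all the class bodies are hypothetical; only the Rijndael/RijndaelManaged/RijndaelNative naming follows the text above.

```python
from typing import Callable, Dict


class Rijndael:
    """Hypothetical abstract base shared by every AES implementation."""

    def kind(self) -> str:
        raise NotImplementedError


class RijndaelManaged(Rijndael):
    """Stand-in for the fully managed implementation."""

    def kind(self) -> str:
        return "managed"


class RijndaelNative(Rijndael):
    """Hypothetical native-backed implementation (would call a C library)."""

    def kind(self) -> str:
        return "native"


# Name -> factory table, playing the role of the machine.config mappings.
_registry: Dict[str, Callable[[], Rijndael]] = {"Rijndael": RijndaelManaged}


def register(name: str, factory: Callable[[], Rijndael]) -> None:
    """Remap a name globally: every later lookup of that name is affected."""
    _registry[name] = factory


def create_from_name(name: str) -> Rijndael:
    """Return a new instance of whatever class is currently mapped to name."""
    return _registry[name]()


def application_code() -> str:
    # Existing code asks for "Rijndael" by name and never names a class.
    return create_from_name("Rijndael").kind()
```

After `register("Rijndael", RijndaelNative)`, `application_code()` returns "native" without being modified, which is exactly why the remapping is only safe if the native class handles every mode and padding its callers rely on.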

> Here's some times for 1000 encrypts/decrypts of 32768 byte chunks  
> from some machines we have here in the office, ordered by speed:
> 57.7 seconds under mono 1.1.8.1, OS X 10.4.2 (1.67 Ghz G4 1.2)
> 55.0 seconds under mono 1.1.8.1, Linux 2.6.9 (1.8 Ghz Athlon XP 2500+)
> 45.8 seconds under mono 1.1.8.1, Linux 2.6.9 (2.2 Ghz Athlon 64 3200+)
> 42.4 seconds under mono 1.1.8.1, OS X 10.4.2 (2.0 Ghz G5 3.0)
> 9.01 seconds under Microsoft .NET 1.1.4322, Windows XP Pro SP2 (2.0 Ghz Athlon 64 3200+)
> 
> If you look at the benchmark code, it uses RijndaelManaged to do  
> encrypt/decrypt. This class is supposedly 100% managed code in the  
> Microsoft implementation.

AFAIK this is 100% managed, but it's the _only_ (symmetric) crypto class
that is managed. Being the only one can have some advantages, like
supporting the cipher modes (ECB, CBC...) and paddings (None, PKCS5/7...)
efficiently.

OTOH Mono has all its symmetric crypto in managed code, and all the
algorithms share the same basic, generic SymmetricTransform class to deal
with cipher and padding modes. This makes it easy to add new algorithms,
but it can't be optimal for any of them.
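A minimal Python sketch of that general shape (not Mono's actual SymmetricTransform code): the chaining mode (CBC here) and the padding (PKCS#7) are written once against an abstract block-cipher interface, so any algorithm plugs in, but every block goes through a generic call and extra temporary copies. `BlockCipher`, `ToyXorCipher` and `GenericCbcTransform` are hypothetical names, and the XOR "cipher" is deliberately insecure; it is only there so the sketch runs.

```python
class BlockCipher:
    """Hypothetical interface: one fixed-size block in, one block out."""

    block_size = 16

    def encrypt_block(self, block: bytes) -> bytes:
        raise NotImplementedError

    def decrypt_block(self, block: bytes) -> bytes:
        raise NotImplementedError


class ToyXorCipher(BlockCipher):
    """NOT a real cipher: XOR with the key, only so the example runs."""

    def __init__(self, key: bytes):
        if len(key) != self.block_size:
            raise ValueError("toy cipher needs a 16-byte key")
        self.key = key

    def encrypt_block(self, block: bytes) -> bytes:
        return bytes(b ^ k for b, k in zip(block, self.key))

    decrypt_block = encrypt_block  # XOR is its own inverse


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


class GenericCbcTransform:
    """Mode and padding logic written once, for any BlockCipher."""

    def __init__(self, cipher: BlockCipher, iv: bytes):
        self.cipher = cipher
        self.iv = iv
        self.n = cipher.block_size

    def encrypt(self, data: bytes) -> bytes:
        # PKCS#7: always append 1..n bytes, each equal to the pad length.
        pad = self.n - len(data) % self.n
        data = data + bytes([pad]) * pad
        out, prev = bytearray(), self.iv
        for i in range(0, len(data), self.n):
            # CBC: XOR with the previous ciphertext block, then encrypt.
            # Each block costs a generic method call plus temporary objects.
            prev = self.cipher.encrypt_block(_xor(data[i:i + self.n], prev))
            out += prev
        return bytes(out)

    def decrypt(self, data: bytes) -> bytes:
        if not data or len(data) % self.n:
            raise ValueError("ciphertext is not a whole number of blocks")
        out, prev = bytearray(), self.iv
        for i in range(0, len(data), self.n):
            block = data[i:i + self.n]
            out += _xor(self.cipher.decrypt_block(block), prev)
            prev = block
        pad = out[-1]
        if not 1 <= pad <= self.n or out[-pad:] != bytes([pad]) * pad:
            raise ValueError("bad PKCS#7 padding")
        return bytes(out[:-pad])
```

Adding an algorithm only means writing another `BlockCipher`; the price is that nothing in `GenericCbcTransform` can be tuned for a particular cipher's block size or internal layout.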

Anyway, that only explains part of the performance difference...

> Included in the tarball is some native code that links against  
> OpenSSL to do the same thing. This is what native performance for the  
> same sized chunks looks like:
> 
> 1.67 seconds under OpenSSL 0.9.7a, Linux 2.6.9 (1.8 Ghz Athlon XP 2500+)
> 1.44 seconds under OpenSSL 0.9.7, OS X 10.4.2 (1.67 Ghz G4 1.2)
> 1.05 seconds under OpenSSL 0.9.7, OS X 10.4.2 (2.0 Ghz G5 3.0)
> .67 seconds under OpenSSL 0.9.7a, Linux 2.6.9 (2.2 Ghz Athlon 64 3200+)
> 
> To be fair, the native implementation is able to take advantage of
> 64-bit processors when available, while all mono builds in the above
> benchmarks are 32-bit. The Windows XP machine is the standard 32-bit  
> install, even though the processor is 64-bit. This is a pretty  
> informal benchmark, but all I'm interested in showing here is how bad  
> the AES performance under mono is.
> 
> It was suggested in #mono that I try compiling the mono AES  
> implementation under VS.NET and run it under the Microsoft VM to  
> compare performance..
> The resulting project is available here:
> http://strangecargo.org/~allan/mono/AESSpeedTest.zip
> 
> The same operation benchmarks thusly:
> 22.76 seconds under Microsoft .NET 1.1.4322, Windows XP Pro SP2 (2.0 Ghz Athlon 64 3200+)
> 
> The AES code is taken from mono svn, so it may be different from the  
> code used in the mono 1.1.8.1 benchmarks above.
> 
> While switching to the Microsoft VM boosts speed significantly, it  
> looks like significant gains could be made by optimizing the mono  
> RijndaelManaged code.

Yes, like many other crypto and non-crypto parts, it could gain a lot
from being optimized. Actually, I'd like that very much but don't have
much time to do it.

The biggest performance boost for crypto generally comes from unrolling
all the loops and inlining all the methods. However, this creates _very_
big methods (and a bigger JIT cost). The SHA1 code already uses this
trick (and not everyone likes that ;-).
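To make the unrolling trick concrete, here is one small AES step (AddRoundKey, the XOR of the 16-byte state with a round key) written both ways. It's in Python purely to show the shape of the transformation; the speed benefit applies to JIT-compiled code such as C#, not to CPython, and both function names are hypothetical. Fully unrolling AES means doing this to every step of every round, which is where the very big methods come from.

```python
def add_round_key_loop(state: bytes, round_key: bytes) -> bytes:
    # Compact form: a loop counter, an index check and a branch per byte.
    out = bytearray(16)
    for i in range(16):
        out[i] = state[i] ^ round_key[i]
    return bytes(out)


def add_round_key_unrolled(state: bytes, round_key: bytes) -> bytes:
    # Unrolled form: the block is handled as four 32-bit words and each
    # XOR is written out explicitly, with no loop at all.
    w0 = int.from_bytes(state[0:4], "big") ^ int.from_bytes(round_key[0:4], "big")
    w1 = int.from_bytes(state[4:8], "big") ^ int.from_bytes(round_key[4:8], "big")
    w2 = int.from_bytes(state[8:12], "big") ^ int.from_bytes(round_key[8:12], "big")
    w3 = int.from_bytes(state[12:16], "big") ^ int.from_bytes(round_key[12:16], "big")
    return (w0.to_bytes(4, "big") + w1.to_bytes(4, "big")
            + w2.to_bytes(4, "big") + w3.to_bytes(4, "big"))
```

Both functions compute the same result; the unrolled one trades code size for the absence of loop overhead.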

Another performance enhancement, more of a .NET-level one, would be to
return specialized transforms for each AES configuration (e.g. different
key sizes, different block sizes), each optimized for its specific
settings. However, 2.0 has made the RijndaelTransform public, which kind
of defeats/limits this option. Note that this also creates bigger code:
at least more IL code, and possibly more JITted code as well (if the
previous optimization is also applied).
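In Rijndael the number of rounds follows from the key and block sizes counted in 32-bit words, Nr = max(Nk, Nb) + 6, so AES-128/192/256 use 10/12/14 rounds. A specialized transform can treat such values as constants. The Python sketch below shows only the dispatch part: a factory that returns a dedicated class for common configurations and falls back to a generic one. All class and function names are hypothetical, and the "specialized" classes here merely fix their sizes; a real version would also hard-code (and likely unroll) the rounds.

```python
class GenericRijndaelTransform:
    """One class for every configuration: sizes are checked at run time."""

    def __init__(self, key_bits: int, block_bits: int):
        for bits in (key_bits, block_bits):
            if bits not in (128, 192, 256):
                raise ValueError(f"unsupported size: {bits} bits")
        self.key_bits = key_bits
        self.block_bits = block_bits
        # Nr = max(Nk, Nb) + 6, where Nk and Nb are sizes in 32-bit words.
        self.rounds = max(key_bits, block_bits) // 32 + 6


class Aes128Transform(GenericRijndaelTransform):
    """Specialized for a 128-bit key and 128-bit block (10 rounds)."""

    def __init__(self):
        super().__init__(128, 128)


class Aes256Transform(GenericRijndaelTransform):
    """Specialized for a 256-bit key and 128-bit block (14 rounds)."""

    def __init__(self):
        super().__init__(256, 128)


def create_transform(key_bits: int, block_bits: int = 128) -> GenericRijndaelTransform:
    """Return a specialized class when one exists, else the generic one."""
    specialized = {(128, 128): Aes128Transform, (256, 128): Aes256Transform}
    cls = specialized.get((key_bits, block_bits))
    return cls() if cls is not None else GenericRijndaelTransform(key_bits, block_bits)
```

Every extra specialized class is more IL (and, if unrolled, more JITted code), which is the size cost mentioned above.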

There's also a paper on the Internet about generating AES IL code that is
specialized for your key and other data. This can get very good
performance but, because you can't easily unload generated code, it can't
be used in corlib (e.g. someone re-keying frequently would find this
worse than the current code).
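The idea of per-key code generation can be imitated in Python by building source text with the round-key bytes baked in as literals and compiling it with `exec`. This is only an analogy to emitting IL, not the technique from the paper, and `make_add_round_key` is a hypothetical name. It also shows the trade-off mentioned above: generating and compiling the function costs far more than one encryption, so it only pays off when a key is used for many blocks.

```python
def make_add_round_key(round_key: bytes):
    """Generate a function with round_key's 16 bytes baked in as literals."""
    if len(round_key) != 16:
        raise ValueError("expected a 16-byte round key")
    lines = ["def add_round_key(state):", "    return bytes(("]
    # One line per byte: the key becomes a constant in the generated code,
    # so the resulting function never reads a key array at run time.
    for i, k in enumerate(round_key):
        lines.append(f"        state[{i}] ^ {k},")
    lines.append("    ))")
    namespace = {}
    exec("\n".join(lines), namespace)
    return namespace["add_round_key"]
```

Each new key produces a new compiled function, which is harmless here but is exactly what becomes a problem when generated code can't be unloaded and the application re-keys often.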

Finally, as Paolo suggested, using unsafe code would help, but I really
dislike it. The point of doing all the crypto in managed code was to gain
the advantages of managed code, and we would be losing some of them that
way. Anyway, if you really need more than the two previous suggestions
can offer, I'd really recommend using a native approach.