[Mono-dev] [PATCH] Set hash algorithms block size

Thu Nov 29 08:04:22 EST 2007

Hey,

On Thu, 2007-11-29 at 13:10 +0100, Javier Mart� wrote:
> In all Mono implementations (up to tonight's SVN) of the hash algorithms in
> [corlib]System.Security.Cryptography, all of them fail to override the
> InputBlockSize and OutputBlockSize properties, leaving them at 1 byte.

Actually they don't *fail* it's the *right*, even if not obvious, value.
The meaning isn't all that clear, the [Input|Output]BlockSize is
different than the algorithm block size.

The former is about what the implementation accept as a valid input,
i.e. it's valid to hash a single byte (which doesn't really answer the
output block size ;-)

> This should be mainly harmless, 

yes it is ;-)

> but, combined another bug in CryptoStream which
> makes it ignore the CanTransformMultipleBlocks property of its ICryptoTransform
> when reading (though not when writing!, brings about this doomsday scenario:
> reading a 3MB file through a CryptoStream of, say, SHA1, issues about 3 million
> calls to Buffer.BlockCopy and the same number to HashAlgorithm.TransformBlock.
> That is, CryptoStream and HashAlgorithm are having fun passing millions of
> one-byte arrays between them (and copying them). The speed penalty is extreme,
> as stated below.

Please fill a bug into bugzilla.novell.com for this one.

> This fix is twofold: to begin with, I've modified the abstract base classes of
> all hash algorithms in Sys.Sec.Crypto to include a protected const field called
> SPEC_BLOKCSIZE and overriden InputBlockSize and OutputBlockSize to return this
> value. 

Ok, even if the [Input|Output]BlockSize was bad (and it's not) you still
cannot add new protected methods inside mscorlib (or other most of Mono
assemblies). This would make code compiled with Mono incompatible with
MS.

We have some tools that can help you track if Mono is matching, or not,
MS implementation of the framework (check the web site for code
completion status pages).

> In principle, it should be cleaner to just add them to implementations
> (SHA1Managed instead of SHA1), but the default provided is the one given by the
> spec defining the algorithm, so most (sane) implementations are going to use it
> and those who don't because they compute the hash through some other equivalent
> method can just override the properties again.
> 
> The results are self-evident: even with the bug in CryptoStream (which I'll try
> to patch shortly), the 3M calls turn into about 50K (3M / 64) in SHA1 or 25K
> (3M / 128) in SHA2-512. In terms of time, hashing that 3MB file through CS went
> down from 2.5s to about 0.5s. Thus, the speedup achieved was about 500%!

well you shouldn't be calling any Hash byte-by-byte as they don't do
much buffering themselves. 

Also CryptoStream design isn't very helpful(*) for buffering since it
must deal with other stuff (like padding, modes...). In a perfect world
(or maybe in Crimson) a real buffering class would be helpful in front
of hash algorithms.

(*) I personally discourage it's use for hashing alone.

> 
> The (upcoming) second patch should reduce this number even more in algorithms
> supporting CanTransformMultipleBlocks, so the number of calls to TransformBlock
> (and thus buffer allocs & copies) would be divided by the size passed to Read.
> 
> PD: I'm the "stalker" studend from yesterday Summit session. Sorry I won't be
>     able to attend the rest of the sessions: Boo looked interesting.

Oh, sorry I missed the chance to talk to you :(

> PD2: It's the first time I send a formal diff patch to a project, so sorry if

For some reason part of your patch doesn't apply cleanly to SVN (only
RIPEMD160 applied, all others were rejected).

>      I missed some format rule - though I tried to replicate the formatting of
>      code around my additions. Also, I've compiled the code (no Ws/Es) and
>      lightly tested it hashing some files. It _seems_ I didn't break anything,

While doing your own tests is a good thing to do you should *always* run
the unit tests before sending a patch. Mono's crypto code is well tested
(so any error will likely get caught there) and you'll also have seen
that the [Input|Ouput]BlockSize were meant to be 1 (and not the
algorithm block size).

Sebastien