[Mono-dev] poor compression with mono 2.6 (Re: mono 2.6+ : Garbage added to BinaryWriter/GZipStream output?)

Hin-Tak Leung hintak_leung at yahoo.co.uk
Wed Jan 27 21:49:58 EST 2010


Hi,

I found the source of the poor GZipStream compression I see with mono 2.6: it is r138254, which rewrites the GZipStream implementation:

------------------
Author: gonzalo <gonzalo at e3ebcda4-bce8-0310-ba0a-eca2169e7518>
Date:   Tue Jul 21 02:25:00 2009 +0000

    2009-07-20 Gonzalo Paniagua Javier <gonzalo at novell.com>
    
        * Makefile.am: replaced zlib_macros.c with zlib-helper.c
        * zlib_macros.c: Removed file.
        * zlib-helper.c: new interface for DeflateStream. Flush() actually
        does something.
------------------

The problem is that this change makes the zlib code compress on every write (before the change, it buffered by the zlib default, which tries to compress roughly 32k of input at a time). Most of my little program calculates and writes 1 byte at a time, so the output is mostly 5-byte zlib header overhead + 1 byte of data per write, and the file size increases about 6 times.
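To illustrate the effect described above, here is a minimal sketch using Python's zlib module rather than Mono's code. It assumes that "compress per each write" behaves like a Z_SYNC_FLUSH after every write; that is an assumption about the rewritten helper, not something taken from r138254 itself. The function names and the test data are made up for the illustration.

```python
import zlib

def compress_buffered(chunks):
    """Feed every chunk to zlib and let it decide when to emit output."""
    comp = zlib.compressobj()
    out = bytearray()
    for chunk in chunks:
        out += comp.compress(chunk)
    out += comp.flush()  # Z_FINISH: emit whatever is still buffered
    return bytes(out)

def compress_flush_each_write(chunks):
    """Force a flush after every write (assumed model of the 2.6 behavior)."""
    comp = zlib.compressobj()
    out = bytearray()
    for chunk in chunks:
        out += comp.compress(chunk)
        out += comp.flush(zlib.Z_SYNC_FLUSH)  # ends the block after each write
    out += comp.flush()
    return bytes(out)

# Hypothetical input: 20000 bytes of repetitive data, written 1 byte at a time.
data = bytes((i * 7) % 251 for i in range(20000))
chunks = [data[i:i + 1] for i in range(len(data))]

buffered = compress_buffered(chunks)
flushed = compress_flush_each_write(chunks)

print("input bytes:         ", len(data))
print("buffered output:     ", len(buffered))
print("flush-per-write size:", len(flushed))
```

Both outputs decompress to the same data; only the size differs, because each forced flush closes the current block and adds a few bytes of framing regardless of how little data was written.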

I have tried and tested the attached patch, which restores the zlib default so that the compression code buffers data before compressing. With this patch, I can just replace libMonoPosixHelper.so and get compression behavior similar to mono 2.4.

So with mono 2.4, and with mono 2.6 plus this patch, 1.2MB of data is written out as 280k; without this patch, mono 2.6 writes out 8.1MB (about x6). Curiously, Microsoft .NET's GZipStream implementation seems to do something in between: it generates a file of 2MB, which possibly means it buffers and compresses in 4-byte chunks, not 1 byte and not 32k (5-byte overhead + 2-3 bytes after compression).
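As a rough check of these figures, the arithmetic can be sketched as below, using the exact byte counts from the original message quoted further down. The per-chunk cost model (a fixed overhead plus a small payload for each chunk) is only the guess made above, not a measured property of any runtime, and the function names are made up for the illustration.

```python
UNCOMPRESSED = 1234127  # bytes of input, from the original report
MONO_26 = 8126815       # output size with mono 2.6.x
MONO_24 = 282363        # output size with mono 2.4.x

def expansion(output_size, input_size=UNCOMPRESSED):
    """Output size as a multiple of the input size."""
    return output_size / input_size

def per_chunk_estimate(input_size, chunk, overhead, payload):
    """Estimated output if every `chunk` input bytes cost `overhead + payload` bytes."""
    return input_size / chunk * (overhead + payload)

ratio_26 = expansion(MONO_26)
ratio_24 = expansion(MONO_24)

# Guess for the 1-byte-per-write case: 5 bytes of overhead + 1 byte of data.
one_byte = per_chunk_estimate(UNCOMPRESSED, 1, 5, 1)

# Guess for the .NET case: 4-byte chunks, 5 bytes of overhead + 2 to 3 bytes of data.
dotnet_low = per_chunk_estimate(UNCOMPRESSED, 4, 5, 2)
dotnet_high = per_chunk_estimate(UNCOMPRESSED, 4, 5, 3)

print(f"mono 2.6 expansion: x{ratio_26:.2f}")
print(f"mono 2.4 ratio:     x{ratio_24:.2f}")
print(f"1-byte estimate:    {one_byte:.0f} bytes")
print(f".NET estimate:      {dotnet_low:.0f} to {dotnet_high:.0f} bytes")
```

The 1-byte estimate comes out somewhat below the observed mono 2.6 size, so the real per-write cost is probably a little more than 6 bytes; the 4-byte estimate is in the neighbourhood of the 2MB seen with .NET.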

So a question for Gonzalo Paniagua Javier: do I need to file a bug report properly to get the attached patch committed? It should be obvious why it does what it does.

Cheers,
Hin-Tak

--- On Wed, 27/1/10, Hin-Tak Leung <hintak_leung at yahoo.co.uk> wrote:

> (I am not on the list - please CC)
> 
> I have a small application which writes gzip'ed data like
> this:
> 
> sw = new BinaryWriter(new GZipStream(
>       new FileStream(filename, FileMode.Create,
>                      FileAccess.Write, FileShare.None),
>       CompressionMode.Compress, true));
> sw.Write(...);
> sw.Write(...);
> sw.Flush();
> sw.Close();
> 
> It used to work fine with mono 2.4, and still does in a way
> with mono 2.6.
> What happens is that now it seems to append a lot of extra
> garbage to the end of the output.
> 
> The uncompressed data is 1234127 bytes, and is still
> recoverable in an identical manner from the output with gzip
> -dc; but the output file is now 8126815 bytes with mono
> 2.6.x instead of 282363 bytes under mono 2.4.x, so almost
> 8MB of garbage is added somehow. I tried truncating the
> file, but that seems to truncate the gz stream as well.
> 
> So does anybody else observe similar problems and/or know
> of any reasons why mono 2.6 behaves differently in this
> regard compared to mono 2.4?
> 
> 
> 
> 
>


      
-------------- next part --------------
A non-text attachment was scrubbed...
Name: gzipstream-patch
Type: application/octet-stream
Size: 423 bytes
Desc: not available
Url : http://lists.ximian.com/pipermail/mono-devel-list/attachments/20100128/9ef66638/attachment-0001.obj 

