[Mono-dev] Should we replace MemoryStream?

Tue Nov 10 08:11:56 EST 2009

After taking a closer look at the constructors for MemoryStream, I  
need to amend my earlier response.  In all cases where a byte array is  
passed into the constructor, the MemoryStream has fixed capacity.  So  
the case described below where a MemoryStream would start off with a  
byte array and then be appended to cannot happen.
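
To make the fixed-capacity point concrete, here is a minimal C# sketch.
The class and method names (FixedCapacityDemo, AppendIsRejected,
DefaultStreamGrows) are made up for illustration. It assumes the
documented Microsoft .NET behavior, where a write that would have to
grow a MemoryStream wrapping a caller-supplied array throws
NotSupportedException.

```csharp
using System;
using System.IO;

// Hypothetical demo class; the names are illustrative only.
static class FixedCapacityDemo
{
    // Returns true if writing past the end of a MemoryStream that wraps
    // an existing byte array is rejected.
    public static bool AppendIsRejected()
    {
        var backing = new byte[4];
        using (var stream = new MemoryStream(backing))
        {
            stream.Position = stream.Length;   // move to the end of the array
            try
            {
                stream.WriteByte(42);          // would require growing the buffer
                return false;
            }
            catch (NotSupportedException)
            {
                return true;                   // the stream is not expandable
            }
        }
    }

    // Returns true if a MemoryStream created with the parameterless
    // constructor grows on demand.
    public static bool DefaultStreamGrows()
    {
        using (var stream = new MemoryStream())
        {
            stream.Write(new byte[1000], 0, 1000);
            return stream.Length == 1000 && stream.Capacity >= 1000;
        }
    }
}
```

Both methods should return true if the constructors behave as described
above.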

In short, MemoryStream can operate in two modes: fixed and variable.
For fixed mode, there is nothing to be gained by using chunks, and the
existing implementation is already optimal there.  This is the
situation that Andreas is referring to.

For variable mode, a chunked implementation will work better once the
stream exceeds the maximum size for the first chunk.  Additional
considerations for this case are: (a) should the first chunk have a
smaller size initially to be more efficient for short streams, (b)
should chunks be reusable and thus bypass the alloc/free cycle, (c)
should a call to GetBuffer() automatically replace the first chunk with
the newly created byte array?
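
A rough C# sketch of the variable case, covering (a) and (c) only.
Everything here is hypothetical: ChunkedBuffer, its members, and the
4096-byte chunk size are assumptions for illustration, not Mono's
MemoryStream or any proposed patch.  It is append-only and does not
derive from Stream, and it does not attempt the chunk pooling from (b).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical append-only chunked buffer; not an actual Stream subclass.
sealed class ChunkedBuffer
{
    const int ChunkSize = 4096;             // assumed size of every chunk after the first
    readonly List<byte[]> chunks = new List<byte[]>();
    int length;                             // total number of bytes written
    int tailUsed;                           // bytes used in the last chunk

    public int Length { get { return length; } }
    public int ChunkCount { get { return chunks.Count; } }

    public ChunkedBuffer(int firstChunkSize)
    {
        chunks.Add(new byte[firstChunkSize]);   // (a) caller picks a small first chunk
    }

    public void Write(byte[] src, int offset, int count)
    {
        while (count > 0)
        {
            byte[] tail = chunks[chunks.Count - 1];
            if (tailUsed == tail.Length)
            {
                // Last chunk is full: add a new one instead of reallocating.
                tail = new byte[ChunkSize];
                chunks.Add(tail);
                tailUsed = 0;
            }
            int n = Math.Min(count, tail.Length - tailUsed);
            Buffer.BlockCopy(src, offset, tail, tailUsed, n);
            tailUsed += n; length += n; offset += n; count -= n;
        }
    }

    // (c) Merge all chunks into one array and keep it as the new first
    // chunk, so repeated calls do not copy again.
    public byte[] GetBuffer()
    {
        if (chunks.Count > 1)
        {
            var merged = new byte[length];
            int pos = 0;
            for (int i = 0; i < chunks.Count; i++)
            {
                // Every chunk except the last is completely full.
                int n = (i == chunks.Count - 1) ? tailUsed : chunks[i].Length;
                Buffer.BlockCopy(chunks[i], 0, merged, pos, n);
                pos += n;
            }
            chunks.Clear();
            chunks.Add(merged);
            tailUsed = length;
        }
        return chunks[0];
    }
}
```

With a 16-byte first chunk, writing 10000 bytes allocates four chunks
and never copies existing data.  The first GetBuffer() call does a
single O(n) copy, and later calls return the same array until more data
is written.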

Am I missing anything?

On Nov 10, 2009, at 4:47 AM, Steve Bjorg wrote:

> Allowing the first chunk to be variable sized doesn't make the code  
> that much more complex.  This would mean in read-only cases, all  
> operations would remain O(1) since the original byte array would be  
> preserved.  For write operations, new chunks would be allocated as  
> needed.  Determining which chunk to read from or write to would need  
> to take into account the first chunk size, but that's it.
>
> For the case where someone initializes the ChunkedMemoryStream with
> an existing byte array, then appends to it, and then calls
> GetBuffer(), we would end up with the same overhead as before since
> the MemoryStream would have needed to reallocate the byte array when
> the first append operation occurred, whereas the ChunkedMemoryStream
> does it on GetBuffer().  However, if the array needed to be extended
> multiple times due to many append operations, then the
> ChunkedMemoryStream will come out ahead again since it only
> reallocated the buffer once.  At that point, the reallocated buffer
> could replace the first chunk so we don't do this again for repeated
> calls to GetBuffer().
>
>
> On Nov 10, 2009, at 4:21 AM, Leszek Ciesielski wrote:
>
>> Choice is not always good, and I think this is one of the cases when
>> the default (i.e. the MemoryStream implementation) should make the
>> choices instead of presenting them to the user. Though I agree that the
>> case of constructing a MemoryStream from an existing byte[] would
>> require a special path in the code, as this is a stream that most
>> likely won't be resized and in this case users are expecting the
>> constructor to have a complexity of O(1) and GetBuffer to also be
>> O(1). The same expectation is probably also true with a fixed size
>> MemoryStream.
>>
>> On Tue, Nov 10, 2009 at 1:09 PM, pablosantosluac at terra.es
>> <pablosantosluac at terra.es> wrote:
>>> I agree (especially thinking about the chunk-pool I mentioned)  
>>> having
>>> separate classes can be better, so that everyone can choose.
>>>
>>> Andreas Nahr wrote:
>>>> I'm still not sure this is a good idea. A lot of this depends on  
>>>> the
>>>> use-case for MemoryStream.
>>>> If
>>>> 1) A MemoryStream is created with a parameterless constructor and
>>>> then a lot
>>>> of data is written to it multiple times, the ChunkedStream will
>>>> always be
>>>> better.
>>>> 2) If a MemoryStream is created with a parameterless constructor  
>>>> and only
>>>> gets a few bytes long ChunkedStream might bring considerable  
>>>> overhead.
>>>> 3) If MemoryStream is created with a fixed size then  
>>>> ChunkedStream will be
>>>> somewhat, but acceptably slower and have a higher overhead. But  
>>>> it will be
>>>> totally abysmal once GetBuffer comes into play.
>>>> 4) If MemoryStream is constructed from a (large) byte array (in the
>>>> scientific field I'm coming from this is by far the most common  
>>>> usage I've
>>>> seen; that is basically using MemoryStream as a (read-only)  
>>>> Stream-Wrapper
>>>> around a byte array) then performance will be abysmal when  
>>>> constructing (if
>>>> you chunkify e.g. a 500MB byte array) AND again with GetBuffer  
>>>> (recreate the
>>>> array). So it would be O(n) or even O(2*n) instead of O(1).
>>>>
>>>> It might be possible to create an implementation that can deal  
>>>> with all this
>>>> (would need to have variable sized buffers, keep things it gets  
>>>> passed in
>>>> the constructor alive with small overhead, etc.), but it will be  
>>>> quite
>>>> complex and come with a large base overhead. And even then the  
>>>> GetBuffer
>>>> O(n) problem remains in a few scenarios.
>>>>
>>>> Maybe it would be better to just leave the class as is and  
>>>> document that for
>>>> certain scenarios alternative implementations are available that  
>>>> do a MUCH
>>>> better job. Everybody can easily replace the use of MemoryStream  
>>>> with an
>>>> alternative implementation if needed. But nobody expects this  
>>>> class to
>>>> behave completely different from how it originally did (and seems  
>>>> to do in
>>>> MS.Net).
>>>>
>>>> Andreas
>>>>
>>>>
>>> _______________________________________________
>>> Mono-devel-list mailing list
>>> Mono-devel-list at lists.ximian.com
>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>>
>
>
> - Steve
>
> --------------
> Steve G. Bjorg
> http://mindtouch.com
> http://twitter.com/bjorg
> irc.freenode.net #mindtouch
>
>