[Mono-dev] Should we replace MemoryStream?

Tue Nov 10 08:20:44 EST 2009

Hi,

> For variable-size streams, a chunked implementation will work better once
> the stream exceeds the maximum size for the first chunk.  Additional
> considerations for this case are: (a) should the first chunk have a
> smaller size initially to be more efficient for short streams, (b) should
> chunks be reusable and thus bypass the alloc/free cycle, (c) should a call
> to GetBuffer() automatically replace the first chunk with the newly
> created byte array?

I think (b) could be great, but it could always be provided as a separate
PooledChunkedMemoryStream class, or through a different constructor.
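
Below is a rough sketch of the variable-sized-first-chunk idea from this thread. It is written in Python purely for illustration: it is not Mono's C# code. The class name ChunkedMemoryStream, the chunk_size parameter and the seek/read/write/get_buffer methods are all hypothetical.

The sketch works as follows:
- The caller's bytearray is wrapped as the first chunk without copying.
- Later chunks have a fixed size.
- Chunk lookup subtracts the first chunk's length.
- get_buffer() concatenates the chunks once and installs the result as the new first chunk, so repeated calls are O(1).

Pooling, option (b), is not shown.

```python
class ChunkedMemoryStream:
    """Hypothetical sketch: variable-sized first chunk + fixed-size later chunks."""

    def __init__(self, initial=None, chunk_size=4096):
        self.chunk_size = chunk_size
        if initial is None:
            self.chunks = [bytearray(chunk_size)]
            self.length = 0
        else:
            # Wrap the caller's (mutable) bytearray as the first chunk: O(1), no copy.
            self.chunks = [initial]
            self.length = len(initial)
        self.position = 0

    def _locate(self, pos):
        """Map an absolute position to (chunk index, offset within that chunk)."""
        first_len = len(self.chunks[0])
        if pos < first_len:
            return 0, pos
        rel = pos - first_len
        return 1 + rel // self.chunk_size, rel % self.chunk_size

    def seek(self, pos):
        self.position = pos

    def write(self, data):
        view = memoryview(data)
        while len(view) > 0:
            idx, off = self._locate(self.position)
            # Grow by appending fixed-size chunks; existing bytes never move.
            while idx >= len(self.chunks):
                self.chunks.append(bytearray(self.chunk_size))
            chunk = self.chunks[idx]
            n = min(len(chunk) - off, len(view))
            chunk[off:off + n] = view[:n]
            view = view[n:]
            self.position += n
        self.length = max(self.length, self.position)

    def read(self, count):
        count = min(count, self.length - self.position)
        out = bytearray()
        while count > 0:
            idx, off = self._locate(self.position)
            chunk = self.chunks[idx]
            n = min(len(chunk) - off, count)
            out += chunk[off:off + n]
            self.position += n
            count -= n
        return bytes(out)

    def get_buffer(self):
        if len(self.chunks) > 1:
            # Reallocate once and make the merged array the new first chunk,
            # so later calls take the O(1) path instead of copying again.
            self.chunks = [bytearray().join(self.chunks)]
        return self.chunks[0]


# Usage: wrap an existing array, append past its end, then consolidate.
s = ChunkedMemoryStream(bytearray(b"hello"), chunk_size=4)
s.seek(5)
s.write(b" world")
s.seek(0)
print(s.read(100))            # b'hello world'
buf = s.get_buffer()          # may be longer than s.length, like GetBuffer()
print(bytes(buf[:s.length]))  # b'hello world'
```

As with MemoryStream.GetBuffer(), the returned array may be longer than the stream's length, so callers should only use the first s.length bytes. A stream that only wraps an array and is never extended returns that same array from get_buffer(), with no copy.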

> On Nov 10, 2009, at 4:47 AM, Steve Bjorg wrote:
> 
>> Allowing the first chunk to be variable sized doesn't make the code
>> that much more complex.  This would mean that in read-only cases, all
>> operations would remain O(1) since the original byte array would be
>> preserved.  For write operations, new chunks would be allocated as
>> needed.  Determining which chunk to read from or write to would need
>> to take into account the first chunk size, but that's it.
>>
>> For the case where someone initializes the ChunkedMemoryStream with an
>> existing byte array, then appends to it, and then calls GetBuffer(),
>> we would end up with the same overhead as before since the
>> MemoryStream would have needed to reallocate the byte array when the
>> first append operation occurred, whereas the ChunkedMemoryStream does
>> it on GetBuffer().  However, if the array needed to be extended
>> multiple times due to many append operations, then the
>> ChunkedMemoryStream will come out ahead again, since it only
>> reallocates the buffer once.  At that point, the reallocated buffer
>> could replace the first chunk so we don't do this again for repeated
>> calls to GetBuffer().
>>
>>
>> On Nov 10, 2009, at 4:21 AM, Leszek Ciesielski wrote:
>>
>>> Choice is not always good, and I think this is one of the cases when
>>> the default (i.e. the MemoryStream implementation) should make the
>>> choices instead of presenting them to the user. Though I agree that the
>>> case of constructing a MemoryStream from an existing byte[] would
>>> require a special path in the code, as this is a stream that most
>>> likely won't be resized and in this case users are expecting the
>>> constructor to have a complexity of O(1) and GetBuffer to also be
>>> O(1). The same expectation is probably also true with a fixed size
>>> MemoryStream.
>>>
>>> On Tue, Nov 10, 2009 at 1:09 PM, pablosantosluac at terra.es
>>> <pablosantosluac at terra.es> wrote:
>>>> I agree (especially considering the chunk pool I mentioned) that having
>>>> separate classes can be better, so that everyone can choose.
>>>>
>>>> Andreas Nahr wrote:
>>>>> I'm still not sure this is a good idea. A lot of this depends on the
>>>>> use-case for MemoryStream.
>>>>> 1) If a MemoryStream is created with a parameterless constructor and
>>>>> then a lot of data is written to it over multiple writes, the
>>>>> ChunkedStream will always be better.
>>>>> 2) If a MemoryStream is created with a parameterless constructor and
>>>>> only grows to a few bytes, ChunkedStream might bring considerable
>>>>> overhead.
>>>>> 3) If a MemoryStream is created with a fixed size, ChunkedStream will
>>>>> be somewhat, but acceptably, slower and have a higher overhead. But it
>>>>> will be totally abysmal once GetBuffer comes into play.
>>>>> 4) If a MemoryStream is constructed from a (large) byte array (in the
>>>>> scientific field I'm coming from, this is by far the most common usage
>>>>> I've seen; that is, basically using MemoryStream as a (read-only)
>>>>> Stream wrapper around a byte array), then performance will be abysmal
>>>>> when constructing (if you chunkify, e.g., a 500MB byte array) AND again
>>>>> with GetBuffer (recreating the array). So it would be O(n) or even
>>>>> O(2n) instead of O(1).
>>>>>
>>>>> It might be possible to create an implementation that can deal with
>>>>> all this (it would need variable-sized buffers, keep whatever is passed
>>>>> to the constructor alive with small overhead, etc.), but it will be
>>>>> quite complex and come with a large base overhead. And even then, the
>>>>> O(n) GetBuffer problem remains in a few scenarios.
>>>>>
>>>>> Maybe it would be better to just leave the class as is and document
>>>>> that for certain scenarios alternative implementations are available
>>>>> that do a MUCH better job. Everybody can easily replace their use of
>>>>> MemoryStream with an alternative implementation if needed. But nobody
>>>>> expects this class to behave completely differently from how it
>>>>> originally did (and seems to in MS.Net).
>>>>>
>>>>> Andreas
>>>>>
>>>>>
>>>> _______________________________________________
>>>> Mono-devel-list mailing list
>>>> Mono-devel-list at lists.ximian.com
>>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>>>
>>
>>
>> - Steve
>>
>> --------------
>> Steve G. Bjorg
>> http://mindtouch.com
>> http://twitter.com/bjorg
>> irc.freenode.net #mindtouch