[Mono-devel-list] StringBuilder patch

Paolo Molaro lupus at ximian.com
Wed Jan 14 11:02:10 EST 2004


On 01/13/04 Ben Maurer wrote:
> > The new code already creates a new string if the string buffer is unused
> > for the most part. Maybe the code can be changed in the future to use
> > some different criteria, like, don't waste more than x bytes etc.
> > I don't see any reason to keep the old code around: it's much more a
> > waste to keep it maintained and have two copies of mostly the same code
> > jitted in the same application.
> 
> The real issue is when you create long-lasting strings. Let's say you
> create 10k strings, each with a length of 17 bytes. If the buffer is 32
> bytes, you end up wasting 150 KB.

Yes, there are some worst cases, hence my suggestion to experiment with
some mechanism other than the 'half the buffer' rule. A Mono built for a
low-mem embedded system most likely wants to use a 'don't waste more
than 4 bytes' rule or something like that.
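
To make the numbers in this exchange concrete, here is a rough Python
sketch (not Mono's actual code) that compares the two policies on the
10k-strings example. The function names, the reading of 'half the buffer'
as "copy when less than half of the buffer is used", and the 4-byte limit
passed as a parameter are all assumptions made for illustration; sizes are
counted in the same units the thread uses.

```python
def wasted(capacity, length):
    """Unused tail if the buffer itself is handed out as the string."""
    return capacity - length


def should_copy_half(capacity, length):
    # Assumed reading of the 'half the buffer' rule:
    # allocate a trimmed copy only when less than half the buffer is used.
    return length < capacity // 2


def should_copy_fixed(capacity, length, max_waste=4):
    # Assumed 'don't waste more than max_waste' rule:
    # allocate a trimmed copy whenever the unused tail exceeds the limit.
    return wasted(capacity, length) > max_waste


def total_waste(count, capacity, length, should_copy):
    """Total unused space kept alive by `count` identical long-lived strings."""
    if should_copy(capacity, length):
        return 0  # a trimmed copy has no unused tail
    return count * wasted(capacity, length)


half_rule = total_waste(10_000, 32, 17, should_copy_half)
fixed_rule = total_waste(10_000, 32, 17, should_copy_fixed)
```

Under these assumptions the half-the-buffer rule keeps all 10,000 buffers
and wastes 150,000 units, matching the 150 KB figure quoted above, while a
4-unit limit trims every string and wastes nothing, at the cost of 10,000
extra copies.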

> In the MCS tokenizer, for large programs such as corlib, we could end up
> taking a megabyte or more extra memory and holding it.

If you know you're going to hold the strings and if there is a lot of
wasted space, you can copy the strings yourself (sb.Substring is likely
to do that on all the sb implementations). But what you don't seem to
get is that:
*) you (mscorlib user) don't know how much memory might be wasted by
sb.ToString()
*) it doesn't make sense to uglify the mcs source code (or other
programs' source code) when the gain has not even been measured
*) [mscorlib]System.Text.StringBuilder is what 99.9% of apps will
use anyway, so we need to make it perform well enough for those apps:
with that in mind, maintaining a separate package is worthless
(especially since that one has other unsolvable performance issues as
well)

FWIW, a corlib compile wasted about 700KB and only half of the ToString
calls are in the tokenizer, so, assuming all of them get kept around
(unlikely), it uses about 300KB more memory than needed.
If the StringBuilder check is changed to:
	if (_str.Length - _length > 16)
the amount of possibly wasted memory goes down to 30KB (measured:
compare this to your 'prediction' of a megabyte or more. It would be
good if people did their homework before writing emails, but maybe this
is an old-fashioned thing...).
The check likely needs to be more complete, since for large strings it's
going to needlessly copy a lot of data around, but the point is that the
issue can be fixed in the current StringBuilder for everyone; there is no
need to keep an ugly version of the code around.
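
One possible shape for such a more complete check, as a rough Python
sketch only: combine the absolute limit with a relative one, so that a
large, nearly full buffer is not copied just to reclaim a small tail.
The function name, the second condition, and the 1/8 fraction are
hypothetical choices for illustration and were not measured or proposed
in this thread; only the 16 comes from the check above.

```python
def should_copy(capacity, length, max_waste=16, waste_divisor=8):
    """Decide whether ToString() should allocate a trimmed copy.

    Hypothetical policy: copy only when the unused tail is larger than
    max_waste AND larger than capacity / waste_divisor.
    """
    waste = capacity - length
    return waste > max_waste and waste * waste_divisor > capacity


small_mostly_empty = should_copy(64, 17)          # 47 unused of 64
small_nearly_full = should_copy(32, 17)           # 15 unused of 32
large_nearly_full = should_copy(100_000, 99_900)  # 100 unused of 100,000
large_half_empty = should_copy(100_000, 50_000)   # 50,000 unused
```

With these values a small, mostly empty buffer and a large, half-empty
buffer are trimmed, while the large buffer with only 100 unused units is
returned as is, which avoids copying 99,900 units to save 100.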

lupus

-- 
-----------------------------------------------------------------
lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better


