[Mono-devel-list] The first (attempt to checkin) managed collation patch

Paolo Molaro lupus at ximian.com
Thu Jul 21 12:40:47 EDT 2005

On 07/21/05 Atsushi Eno wrote:
> > It'd be nice to optimize the format *before* we check in the binary
> > files, since optimizing will require some frequent changes.
> In terms of storage efficiency, yes, they could be made smaller.
> The table could be much smaller even if I introduced only simple
> run-length compression.
> But it also means that the live arrays (used in the collator code)
> must be created separately, instead of being internal pointers into
> the managed resources.
> I wonder if it makes sense.

180 KB is reasonable for now. It would be good to be able to
build a configuration that allows just simple us-ascii collation
for embedded systems that really care.
We can look at optimizing the size when the speed issues are sorted out.

> The corresponding code is already in svn,
> mcs/class/corlib/Mono.Globalization.Unicode/MSCompatUnicodeTable.cs.
> It creates a BinaryReader instance for each manifest resource stream,
> and for byte arrays it does Read(array, 0, size).

This is fine while you do testing, but for the release it will need to
be changed to not copy the data into managed arrays. Just make sure access to
the level*, categories, etc. fields is encapsulated so you can easily change
them later to be unmanaged byte and ushort pointers.
Also, since I noticed it: _never_ use a string as a lock object (forLock
in that file).

> If BinaryWriter.Write() (for anything other than a byte parameter)
> writes its output in a byte order that depends on the platform, or
> BinaryReader reads the stream the same way, how can I find out that
> platform-dependent byte order?

I'm not sure BinaryWriter/BinaryReader do the right thing, but their use
must be dropped anyway, because the data must not be copied.
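
One way to sidestep both the copy and the byte-order question is to decode
each value from the raw bytes with an explicit, fixed byte order. The sketch
below is in C and assumes the table is stored little-endian. That layout and
the names read_u16_le and table_u16_at are hypothetical, not taken from the
Mono sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode a 16-bit value stored little-endian (assumed file layout).
 * The result is the same on big- and little-endian hosts because the
 * bytes are never reinterpreted as a native integer. */
static uint16_t read_u16_le(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Hypothetical accessor: fetch the index-th 16-bit entry directly from
 * the raw data, without copying the table into a separate array. */
static uint16_t table_u16_at(const uint8_t *table, size_t index)
{
    return read_u16_le(table + 2 * index);
}
```

Because callers only go through the accessor, the storage behind it can
change later without touching the collator code.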

> > It might be nice to put the files in /usr/share. A few things we win by
> > doing that:
> How can we get the precise file location, especially when we specify
> different GAC to reference mscorlib?

Don't worry about that: the runtime will load the file for you, so this is
the least of the issues. It would be good if the file contained a version id
that you can check for consistency, so please add that.
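
A version check of this kind could look roughly like the following C sketch.
The header layout (a 4-byte tag followed by a 32-bit little-endian version
number), the tag "COLL", and the name coll_header_ok are all assumptions made
for illustration; the actual file format is not defined in this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COLL_MAGIC "COLL"          /* hypothetical 4-byte file tag */
#define COLL_EXPECTED_VERSION 1u   /* hypothetical version the code expects */

/* Returns 1 if the buffer starts with the expected tag and version id,
 * 0 otherwise. Assumed header: 4 tag bytes, then a 32-bit little-endian
 * version number. */
static int coll_header_ok(const uint8_t *data, size_t len)
{
    uint32_t version;

    if (data == NULL || len < 8)
        return 0;
    if (memcmp(data, COLL_MAGIC, 4) != 0)
        return 0;
    version = (uint32_t)data[4]
            | ((uint32_t)data[5] << 8)
            | ((uint32_t)data[6] << 16)
            | ((uint32_t)data[7] << 24);
    return version == COLL_EXPECTED_VERSION;
}
```

A loader would call this once after obtaining the data and refuse to use the
tables if it returns 0.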

> >       * It keeps the size of our tarballs and monolites down because the
> >         included mscorlib does not have the data
> Similarly, when the collation resources are split out, CompareInfo
> in mscorlib will break. It is similar to what happens when we
> have inconsistent versions of mscorlib.dll and the runtime.

What I'd like is a small data file embedded inside the mono binary that
contains support for the most common locales, so it's always available.
The rest could be loaded at runtime, if it's installed. How hard would it be
for your code to deal with this case?
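
To make the question concrete, here is a minimal C sketch of the lookup such
a split would need: a small table that is always present, plus an optional
one that may or may not have been loaded. The record type, the table
contents, and every name here (coll_entry, coll_lookup, extra_table) are
hypothetical and only illustrate the fallback order.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-locale record; "data" would point at the collation
 * bytes for that locale. */
typedef struct {
    const char *locale;
    const unsigned char *data;
} coll_entry;

static const unsigned char builtin_en[] = { 0 };  /* placeholder bytes */

/* Small table that is always linked into the binary. */
static const coll_entry builtin_table[] = {
    { "en-US", builtin_en },
};

/* Optional table loaded at runtime; stays NULL if the file is not
 * installed. */
static const coll_entry *extra_table = NULL;
static size_t extra_count = 0;

/* Linear search of one table for an exact locale name match. */
static const coll_entry *find_in(const coll_entry *t, size_t n,
                                 const char *locale)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(t[i].locale, locale) == 0)
            return &t[i];
    return NULL;
}

/* Look in the built-in table first, then in the optional one. */
static const coll_entry *coll_lookup(const char *locale)
{
    const coll_entry *e = find_in(builtin_table,
                                  sizeof builtin_table / sizeof builtin_table[0],
                                  locale);
    if (e == NULL && extra_table != NULL)
        e = find_in(extra_table, extra_count, locale);
    return e;
}
```

A NULL result means the locale is in neither table, and the caller has to
decide what to fall back to.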


lupus at debian.org                                     debian/rules
lupus at ximian.com                             Monkeys do it better
