[Mono-list] unicode trouble

Marcus mathpup@mylinuxisp.com
Mon, 9 Feb 2004 19:21:27 -0500


As I recall, when the CM3 Modula-3 compiler added support for Unicode, they 
used a hybrid scheme where TEXTs (their equivalent of System.String) can 
contain both 8-bit and 16-bit "chars". Only the portions of the string 
that require more than 8 bits use the wider form. Something similar could be 
done with 32-bit characters in some future library if compactness were a concern.
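
To make the idea concrete, here is a minimal sketch in Python of a hybrid
string that stores each run of characters at the narrowest width that fits.
It is not how CM3 implements TEXT; the class name HybridText, the run layout,
and the choice of 1 byte versus 4 bytes per character are all assumptions
made for illustration.

```python
class HybridText:
    """Hypothetical hybrid string: narrow runs use 1 byte per character,
    wide runs use 4 bytes per character. Illustration only."""

    def __init__(self, text):
        # Each segment is (bytes_per_char, raw_bytes).
        self.segments = []
        run, run_is_wide = [], False
        for ch in text:
            is_wide = ord(ch) > 0xFF  # does not fit in 8 bits
            if run and is_wide != run_is_wide:
                self._flush(run, run_is_wide)
                run = []
            run.append(ch)
            run_is_wide = is_wide
        if run:
            self._flush(run, run_is_wide)

    def _flush(self, run, is_wide):
        chunk = "".join(run)
        if is_wide:
            self.segments.append((4, chunk.encode("utf-32-le")))
        else:
            self.segments.append((1, chunk.encode("latin-1")))

    def __len__(self):
        return sum(len(data) // width for width, data in self.segments)

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError("HybridText index out of range")
        # Walk the runs until we reach the one holding this character.
        for width, data in self.segments:
            count = len(data) // width
            if index < count:
                raw = data[index * width:(index + 1) * width]
                return raw.decode("utf-32-le" if width == 4 else "latin-1")
            index -= count

    def storage_bytes(self):
        return sum(len(data) for _, data in self.segments)


t = HybridText("abc\u4e2d\u6587def")
print(len(t), t[3] == "\u4e2d", t.storage_bytes())
```

The trade-off in a layout like this is that indexing has to walk the runs, so
it is no longer constant time unless an offset table is kept alongside.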

By the way, how much of a performance penalty is there for accessing a single 
8-bit value on modern 32-bit processors?

On Monday 09 February 2004 7:16 am, Jonathan Pryor wrote:

> I imagine it's due to a memory trade-off.  The easiest way for the
> programmer to deal with things would be to just use UTF-32 (UCS-4) for
> all Unicode strings.  You wouldn't have to worry about surrogate pairs
> or anything else like that.
>
> It would also mean that all strings would require 32 bits for each
> character, which would eat up *lots* of memory for all strings.  The
> most common code points -- those used in the US, Europe, and Asia --
> all easily fit within 16 bits, *by design*.
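
The memory trade-off described above can be checked with a small sketch that
uses Python's standard codecs. The helper name encoded_sizes and the sample
strings are assumptions chosen for illustration, not measurements from Mono.

```python
def encoded_sizes(text):
    """Return the number of bytes needed to store text in each encoding form."""
    codecs = (("UTF-8", "utf-8"), ("UTF-16", "utf-16-le"), ("UTF-32", "utf-32-le"))
    return {name: len(text.encode(codec)) for name, codec in codecs}


for sample in ("hello", "\u65e5\u672c\u8a9e", "\U0001F600"):
    print(repr(sample), encoded_sizes(sample))
```

For text inside the Basic Multilingual Plane, UTF-32 takes twice the space of
UTF-16; only characters outside it (stored as surrogate pairs in UTF-16) cost
the same 4 bytes in both.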