[Mono-list] unicode trouble
Marcus
mathpup@mylinuxisp.com
Mon, 9 Feb 2004 19:21:27 -0500
As I recall, when the CM3 Modula-3 compiler added support for Unicode, they
used a hybrid scheme where TEXTs (their equivalent of System.String) can
contain both 8-bit and 16-bit "chars", so only the portions of the string
that require more than 8 bits use the wider form. Something similar could be
done with 32-bit characters in some future library if compactness were a
concern.
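
To make the hybrid idea concrete, here is a minimal sketch in Python. It is not CM3's actual TEXT implementation, which I am only describing from memory; the class name HybridText and its methods are hypothetical. It stores a string as consecutive runs, each held either as 8-bit units or as 16-bit units, so only the runs that need the wider form pay for it.

```python
from array import array


class HybridText:
    """Hypothetical sketch: text stored as runs of 8-bit or 16-bit units."""

    def __init__(self, s):
        # Each run is a bytearray (8-bit units) or an array('H') (16-bit units).
        self.runs = []
        for ch in s:
            cp = ord(ch)
            if cp > 0xFFFF:
                # Assumption: this sketch stops at 16 bits, like the scheme described.
                raise ValueError("sketch handles only code points up to U+FFFF")
            wide = cp > 0xFF
            if self.runs and isinstance(self.runs[-1], array) == wide:
                self.runs[-1].append(cp)  # extend the current run
            else:
                self.runs.append(array("H", [cp]) if wide else bytearray([cp]))

    def __len__(self):
        return sum(len(r) for r in self.runs)

    def __getitem__(self, i):
        if i < 0:
            raise IndexError(i)
        # Linear scan over runs; a real implementation would index them.
        for r in self.runs:
            if i < len(r):
                return chr(r[i])
            i -= len(r)
        raise IndexError(i)

    def __str__(self):
        return "".join(chr(u) for r in self.runs for u in r)

    def storage_bytes(self):
        # Payload size only; per-run bookkeeping overhead is ignored.
        return sum(
            len(r) * (r.itemsize if isinstance(r, array) else 1)
            for r in self.runs
        )
```

The trade-off is visible in __getitem__: indexing is no longer a single array lookup, so the memory saved is paid for with extra work per access unless the runs are indexed in some smarter way.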
By the way, how much of a performance penalty is there for accessing a single
8-bit value on modern 32-bit processors?
On Monday 09 February 2004 7:16 am, Jonathan Pryor wrote:
> I imagine it's due to a memory trade-off. The easiest way for the
> programmer to deal with things would be to just use UCS-4 (UTF-32) for
> all Unicode strings. You wouldn't have to worry about surrogate pairs or
> anything else like that.
>
> It would also mean that all strings would require 32 bits for each
> character, which would eat up *lots* of memory for all strings. The
> most common code points -- US, Europe, Asia -- all easily fit within
> 16 bits, *by design*.
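
The memory trade-off in the quoted text can be checked with a little arithmetic. The sketch below is only an illustration: the helper name encoded_sizes is hypothetical, and it uses Python's built-in codecs to count the bytes a string occupies as UTF-32 versus UTF-16.

```python
def encoded_sizes(s):
    """Return the byte length of s in UTF-32 and UTF-16 (no byte-order mark)."""
    return {
        "utf-32": len(s.encode("utf-32-le")),  # always 4 bytes per code point
        "utf-16": len(s.encode("utf-16-le")),  # 2 bytes, or 4 for a surrogate pair
    }


if __name__ == "__main__":
    # Sample strings are arbitrary: ASCII, BMP text, and one non-BMP character.
    for sample in ("hello", "\u65e5\u672c\u8a9e", "\U0001D11E"):
        print(repr(sample), encoded_sizes(sample))
```

For text that stays within the first 65,536 code points, UTF-16 takes half the space of UTF-32. The two only match for characters outside that range, where UTF-16 needs a surrogate pair.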