[Mono-list] unicode trouble
Mon, 09 Feb 2004 00:19:03 +0100
as i understand, characters in .net are 16-bit values.
but what about unicode characters that are above the 16-bit range,
for example OLD ITALIC LETTER A (unicode code point: U+10300)?
how do you represent those in .net?
i tried to open a textfile containing this old-italic-a:
- the length and indexing methods of string all reported that old-italic-a
is actually 2 characters => it doesn't work
- when writing the string back to a utf8 encoded textfile, then it was
so to me it seems that dotnet (mono) uses utf16 as its internal encoding
format, but indexing (and length) count 16-bit code units and don't take
surrogate pairs into account.
am i correct?
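
to make the question concrete, here is a minimal sketch of the same effect in java, whose strings are also sequences of 16-bit utf16 code units. it is only an analogy for what i see in mono, not .net code; the class and method names (SurrogateDemo, codeUnits, codePoints) are made up for illustration, and the code point methods used are the ones added in java 1.5.

```java
class SurrogateDemo {
    // U+10300 OLD ITALIC LETTER A; toChars yields its two-char surrogate pair.
    static final String OLD_ITALIC_A = new String(Character.toChars(0x10300));

    // Number of 16-bit code units, which is what length() reports.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points (surrogate pairs count once).
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        String s = "m" + OLD_ITALIC_A + "rrakesh";
        System.out.println(codeUnits(s));                            // 10
        System.out.println(codePoints(s));                           // 9
        // charAt(1) only sees the first half of the pair.
        System.out.println(Character.isHighSurrogate(s.charAt(1)));  // true
        // codePointAt(1) reassembles the full code point.
        System.out.println(Integer.toHexString(s.codePointAt(1)));   // 10300
    }
}
```

so length and indexing work on code units, and anything that has to be correct above U+FFFF needs to go through code-point-aware calls.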
are there any ways to handle those characters in dotnet?
for example the new java-1.5 contains some new string-methods that can
handle these characters. it's not perfect in java, but at least there is
if someone wants to play with it, i attached a text file containing the
text "marrakesh", encoded in utf8, where i replaced the first "a" with
the old-italic-a.
(it's easy to do with a little iconv to-from ucs4 and hexedit)
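
for anyone without iconv and a hex editor at hand, a rough sketch of producing equivalent bytes in java follows. it assumes the file is just "marrakesh" with the first "a" swapped for U+10300, with no byte order mark and no trailing newline (the real attachment may differ); MarrakeshBytes, encode and hex are hypothetical names.

```java
import java.nio.charset.StandardCharsets;

class MarrakeshBytes {
    // Build the test string and encode it as UTF-8.
    static byte[] encode() {
        String s = "m" + new String(Character.toChars(0x10300)) + "rrakesh";
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Render bytes as lowercase hex for comparison with a hex editor.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+10300 becomes the four bytes f0 90 8c 80 in UTF-8.
        System.out.println(hex(encode()));  // 6df0908c807272616b657368
    }
}
```

writing the result of encode() to a file should give a 12-byte test input under those assumptions.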