[Mono-list] unicode trouble
Fabio Montoya [@model-it]
fabio@model-it.com.mx
Mon, 9 Feb 2004 00:07:52 -0600
Sorry, I should have said "Gabor's original question persists..."
Fabio Montoya
| -----Original Message-----
| From: mono-list-admin@lists.ximian.com
| [mailto:mono-list-admin@lists.ximian.com] On Behalf Of Fabio
| Montoya [@model-it]
| Sent: Monday, February 09, 2004 12:04 AM
| To: aranym@adelphia.net; 'gabor'; mono-list@lists.ximian.com
| Subject: RE: [Mono-list] unicode trouble
|
|
|
| Gabor is right, Max! The Unicode standard defines characters in a
| code space larger than 16 bits (U+0000 to U+10FFFF); the fixed-width
| 32-bit form is called UCS-4 or UTF-32.
|
| For practical reasons, the Unicode standard defines
| transformation formats, e.g.:
|
| UTF-8   Unicode transformation format with 8-bit code units
| UTF-16  Unicode transformation format with 16-bit code units
|
| [Any transformation format with code units wider than 8 bits
| needs to handle byte-ordering issues.]
|
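To make the transformation formats concrete, here is a minimal C# sketch that prints the bytes of U+10300 in UTF-8 and in both UTF-16 byte orders. It assumes a framework that provides char.ConvertFromUtf32 (newer than the one current when this thread was written); the class name EncodingForms and the helper Hex are hypothetical names used only for illustration.

```csharp
using System;
using System.Text;

static class EncodingForms
{
    // Hypothetical helper: formats bytes as space-separated hex, e.g. "F0 90 8C 80".
    public static string Hex(byte[] bytes)
    {
        return BitConverter.ToString(bytes).Replace("-", " ");
    }

    public static void Main()
    {
        // U+10300 OLD ITALIC LETTER A, built from its code point (hex 10300).
        string s = char.ConvertFromUtf32(0x10300);

        // UTF-8 uses four 8-bit code units; no byte-order issue arises.
        Console.WriteLine("UTF-8:     " + Hex(Encoding.UTF8.GetBytes(s)));             // F0 90 8C 80

        // UTF-16 uses two 16-bit code units (D800 DF00); byte order matters.
        Console.WriteLine("UTF-16 LE: " + Hex(Encoding.Unicode.GetBytes(s)));          // 00 D8 00 DF
        Console.WriteLine("UTF-16 BE: " + Hex(Encoding.BigEndianUnicode.GetBytes(s))); // D8 00 DF 00
    }
}
```

The two UTF-16 lines contain the same code units with the bytes swapped, which is the byte-ordering issue mentioned above.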
|
| The original Max's question persists...
|
| | > but what about unicode characters, that are simply above the 16-bit
| | > limit?
| | >
| | > for example:
| | > OLD ITALIC LETTER A (unicode code: 10300).
| | >
| | > how do you represent those in .net?
|
|
| Cheers!
|
|
| Fabio Montoya
|
|
| | -----Original Message-----
| | From: mono-list-admin@lists.ximian.com
| | [mailto:mono-list-admin@lists.ximian.com] On Behalf Of max
| | Sent: Sunday, February 08, 2004 10:04 PM
| | To: gabor; mono-list@lists.ximian.com
| | Subject: Re: [Mono-list] unicode trouble
| |
| | Hi Gabor,
| | I think you're confused. Characters in .NET are 16 bits
| | BECAUSE they are unicode. 16 bits = 2 bytes = 65536 values.
| |
| | a way to check that is simple. here's some C# example code:
| |
| | string s = "a";
| | s += (char)10300;
| |
| | Console.WriteLine("s = " + s);
| | Console.WriteLine("len = " + s.Length);
| |
| | for (int i = 0; i < s.Length; i++ ) {
| |     Console.WriteLine("s["+i+"] = " + (int)s[i]);
| | }
| |
| | max
| |
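One caveat about the check above: the literal 10300 in the cast is decimal, which is U+283C (in the Braille Patterns block) and fits in a single 16-bit char, whereas OLD ITALIC LETTER A is hex 10300. A hedged variant of the same check, assuming a framework with char.ConvertFromUtf32 and char.ConvertToUtf32 (both newer than this thread), might look like the following; the class name SurrogateCheck and the helper OldItalicA are hypothetical.

```csharp
using System;

static class SurrogateCheck
{
    // Hypothetical helper: returns U+10300 as a .NET string (a surrogate pair).
    public static string OldItalicA()
    {
        return char.ConvertFromUtf32(0x10300);
    }

    public static void Main()
    {
        string s = "a" + OldItalicA();

        // Length counts 16-bit code units, not characters.
        Console.WriteLine("len = " + s.Length); // len = 3

        for (int i = 0; i < s.Length; i++)
        {
            string note = char.IsSurrogate(s[i]) ? " (surrogate)" : "";
            Console.WriteLine("s[" + i + "] = 0x" + ((int)s[i]).ToString("X4") + note);
        }

        // Reassemble the code point from the pair starting at index 1.
        Console.WriteLine("code point = " + char.ConvertToUtf32(s, 1).ToString("X")); // code point = 10300
    }
}
```

The loop prints 0x0061, 0xD800 and 0xDF00, matching Gabor's observation that the string reports the letter as two elements.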
| | On Sunday 08 February 2004 15:19, gabor wrote:
| | > hi,
| | >
| | > as i understand, characters in .net are 16-bit values.
| | >
| | > but what about unicode characters, that are simply above the 16-bit
| | > limit?
| | >
| | > for example:
| | > OLD ITALIC LETTER A (unicode code: 10300).
| | >
| | > how do you represent those in .net?
| | >
| | > i tried to open a textfile containing this old-italic-a:
| | >
| | > - the length and indexing methods of string all said that old-italic-a
| | >   is actually 2 letters => it doesn't work
| | > - when writing the string back to an utf8 encoded textfile, then it
| | >   was correctly written.
| | >
| | > so for me it seems that dotnet (mono) uses utf16 as internal encoding
| | > format, but indexing (and length) doesn't use that information.
| | >
| | > am i correct?
| | >
| | > are there any ways to handle those characters in dotnet?
| | >
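As one possible answer to the question above, here is a minimal sketch that walks a string by code point instead of by 16-bit code unit. It assumes a framework with char.IsHighSurrogate, char.IsLowSurrogate and char.ConvertToUtf32 (added after this thread was written); the class CodePoints and the helper ToCodePoints are hypothetical names, and the test string only mimics the attached "marrakesh" file.

```csharp
using System;
using System.Collections.Generic;

static class CodePoints
{
    // Hypothetical helper: decodes a UTF-16 string into a list of code points.
    public static List<int> ToCodePoints(string s)
    {
        List<int> result = new List<int>();
        for (int i = 0; i < s.Length; i++)
        {
            bool pair = char.IsHighSurrogate(s[i])
                        && i + 1 < s.Length
                        && char.IsLowSurrogate(s[i + 1]);
            if (pair)
            {
                // Combine high and low surrogate into one code point.
                result.Add(char.ConvertToUtf32(s[i], s[i + 1]));
                i++; // skip the low surrogate
            }
            else
            {
                // BMP character (or an unpaired surrogate, kept as is).
                result.Add(s[i]);
            }
        }
        return result;
    }

    public static void Main()
    {
        // "marrakesh" with the first "a" replaced by U+10300.
        string text = "m" + char.ConvertFromUtf32(0x10300) + "rrakesh";
        List<int> cps = ToCodePoints(text);

        Console.WriteLine("UTF-16 code units: " + text.Length); // 10
        Console.WriteLine("code points:       " + cps.Count);   // 9
        Console.WriteLine("second code point: " + cps[1].ToString("X")); // 10300
    }
}
```

Indexing into the returned list gives one entry per character, at the cost of a linear pass over the string.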
| | > for example the new java-1.5 contains some new string-methods that can
| | > handle these characters. it's not perfect in java, but at least there
| | > is something.
| | >
| | > if someone wants to play with it, i attached a text file containing
| | > the text "marrakesh", encoded in utf8, where i replaced the first "a"
| | > with old-italic-a (it's easy to do with a little iconv to-from ucs4
| | > and hexedit)
| | >
| | > thanks,
| | > gabor farkas
| |
| | _______________________________________________
| | Mono-list maillist - Mono-list@lists.ximian.com
| | http://lists.ximian.com/mailman/listinfo/mono-list