[Mono-bugs] [Bug 77325][Nor] Changed - Possibly incorrect ISO-8859-6 converter

bugzilla-daemon at bugzilla.ximian.com
Fri Jan 27 07:44:01 EST 2006


Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.

Changed by bruno at clisp.org.

http://bugzilla.ximian.com/show_bug.cgi?id=77325

--- shadow/77325	2006-01-26 22:38:18.000000000 -0500
+++ shadow/77325.tmp.7907	2006-01-27 07:44:01.000000000 -0500
@@ -115,6 +115,36 @@
 28596 table. I thought that .TXT support is for those .TXT files from
 unicode.org, not your own resources.
 
 We should depend such other kinds of mapping resources, at least
 should be limited to those popular ones such as unicode.org and ICU.
 
+
+------- Additional Comments From bruno at clisp.org  2006-01-27 07:44 -------
+The ucm2cp patch in the attachment supports both the unicode.org .TXT files and 
+the haible.de .TXT files. That's because the format is sufficiently similar: Every line 
+consists of the byte sequence in hexadecimal, followed by whitespace, followed 
+by the Unicode value in hexadecimal, followed by optional comments or character 
+names. 
+ 
+The http://www.unicode.org/Public/MAPPINGS .TXT files were considered the 
+best reference around 2000. In 2001, however, they declared the entire EASTASIA 
+directory obsolete. So now only the ISO8859 and VENDORS directories are 
+relevant. 
+ 
+For Windows-XP encodings, the sources that I used are listed in 
+  http://www.haible.de/bruno/charsets/conversion-tables/sources.html 
+It's 1) the MultiByteToWideChar function from a Windows-XP system. 
+    2) the pages at http://www.microsoft.com/globaldev/reference/ 
+It turns out that the latter reflects the Microsoft encodings as they were in 2000; 
+its contents appear not to have been updated since then. 
+I converted the results to .TXT format. 
+ 
+> Do you know if there is ucm file for windowsxp-CP28596? 
+ 
+I don't have one, and the ICU team (George Rhoten) doesn't have one either. 
+But I have this info in a different format: extracted from MultiByteToWideChar, as file 
+  windows-xp/CP28596.TXT 
+in the tar.bz2 file found at 
+http://www.haible.de/bruno/charsets/conversion-tables/ISO-8859-6.html, 
+and it is the same as the ISO8859/8859-6.TXT table found on ftp.unicode.org. 
+ 
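
For illustration, the .TXT line format described in the comment above (byte
sequence in hexadecimal, whitespace, Unicode value in hexadecimal, then optional
comments or character names) could be read roughly as follows. This is only a
sketch and not the code from the ucm2cp patch: the function name
parse_txt_mapping, the regular expression, and the handling of "0x"/"U+"
prefixes and "#" comments are assumptions made for the example.

```python
import re

# Hypothetical helper, not part of ucm2cp. Assumes hex fields may carry an
# optional "0x" (or "U+") prefix and that "#" starts a trailing comment.
_LINE = re.compile(
    r"^\s*(?:0x)?([0-9A-Fa-f]+)\s+(?:0x|U\+)?([0-9A-Fa-f]+)(?=\s|$)"
)


def parse_txt_mapping(text):
    """Return a dict mapping byte sequences (bytes) to Unicode code points (int).

    Lines without two hex fields (comments, blank lines, undefined entries)
    and lines mapping to more than one code point are skipped.
    """
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # drop trailing comment, if any
        match = _LINE.match(line)
        if match is None:
            continue
        hex_bytes, hex_cp = match.groups()
        if len(hex_bytes) % 2:            # pad to whole bytes, e.g. "8" -> "08"
            hex_bytes = "0" + hex_bytes
        table[bytes.fromhex(hex_bytes)] = int(hex_cp, 16)
    return table
```

With such a helper, checking whether two tables agree (for example
windows-xp/CP28596.TXT against ISO8859/8859-6.TXT) would amount to comparing
the two resulting dicts with ==, assuming both files have been read into
strings first.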

