[Mono-bugs] [Bug 23541][Wis] Changed - mcs needs to deal with the encoding of source files
bugzilla-daemon@rocky.ximian.com
bugzilla-daemon@rocky.ximian.com
4 Sep 2002 16:10:00 -0000
Please do not reply to this email. If you want to comment on the bug, go to the
URL shown below and enter your comments there.
Changed by miguel@ximian.com.
http://bugzilla.ximian.com/show_bug.cgi?id=23541
--- shadow/23541 Tue Aug 27 20:29:44 2002
+++ shadow/23541.tmp.23053 Wed Sep 4 12:10:00 2002
@@ -116,6 +116,27 @@
So IMHO this must be done either by the runtime or by our StreamReader implementation.
I don't want to "force" this bug back into the runtime, so keeping it here and setting priority to wishlist.
+
+------- Additional Comments From miguel@ximian.com 2002-09-04 12:10 -------
+CSC has an option called /codepage:XXX which is used for specifying
+the codepage for the input file.
+
+The documentation for /codepage claims that if the source code is in
+the default code page, Unicode, or UTF-8, the compiler will be
+able to figure things out on its own.
+
+I am assuming they mean `UTF-16' when they say Unicode.
+
+Distinguishing what Microsoft calls "Unicode" and "Unicode big endian"
+on Windows is easy: the first two bytes are 0xff 0xfe (Unicode) or
+0xfe 0xff (Unicode big endian).
+
+On *Windows* they use 0xef 0xbb 0xbf as the first bytes of UTF-8 encoded files.
+
+So I assume the rest is supposed to be encoded in the current
+"codepage". That is an interesting concept, because I do not know how code
+pages map to character sets on Unix, or how to tell what the current
+codepage is.
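
The detection order described in the comment above could be sketched roughly
as follows. This is not mcs code and not how CSC is known to work internally;
it is a minimal Python illustration. The function names and the `codepage`
parameter (standing in for a /codepage:XXX style option) are hypothetical.
Treating the locale's preferred encoding as the Unix stand-in for the
"current codepage" is an assumption, not something the bug report settles.

```python
import codecs
import locale

# Byte order marks mentioned in the comment, mapped to Python codec names.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE, "Unicode" on Windows
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF, "Unicode big endian"
]


def detect_source_encoding(data, codepage=None):
    """Return (encoding_name, bom_length) for the raw bytes of a source file.

    `codepage` is a hypothetical explicit override, used only when the
    file carries no byte order mark.
    """
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    if codepage is not None:
        return codepage, 0
    # Assumption: with no BOM and no override, fall back to the encoding
    # of the user's locale as the closest analog of a "current codepage".
    return locale.getpreferredencoding(False), 0


def decode_source(data, codepage=None):
    """Decode source bytes to text, skipping any byte order mark."""
    encoding, skip = detect_source_encoding(data, codepage)
    return data[skip:].decode(encoding)
```

For example, `decode_source(open("test.cs", "rb").read())` would pick UTF-8
or UTF-16 from the byte order mark when one is present, and otherwise decode
with whatever encoding the locale reports (`test.cs` is a placeholder name).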