[Mono-dev] Question about encodings - possible documentation bug?

Mads Bondo Dydensborg mbd at dbc.dk
Wed Jul 4 08:28:49 EDT 2007

Hi there

It is my understanding that Mono strings are always UTF-16 internally.

But what encoding do the source files need to be in?

The documentation (man gmcs) suggests this:

"By default files will be processed in the Latin-1 code page."

As it happens, I have some source files in Latin-1. My locale is 
en_US.UTF-8. It appears that some characters are dropped when strings 
are constructed:

string test = "foo æøå bar";

(The middle three characters are Danish, with the byte values e6 f8 e5 in 
Latin-1, which is the encoding of the source file.)

At runtime, the string test is printed as "foo  bar" (I have confirmed 
this by traversing the string character by character).
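
One possible explanation, sketched in Python rather than C#: the bytes
e6 f8 e5 are not valid UTF-8, so a decoder that follows the UTF-8 locale
and silently skips invalid bytes would yield exactly "foo  bar". That the
compiler behaves this way is an assumption; the sketch only shows that the
observed output is consistent with it.

```python
# Bytes of the literal as stored in the Latin-1 source file
# (e6, f8, e5 are the Latin-1 encodings of ae, o-slash, a-ring).
raw = b"foo \xe6\xf8\xe5 bar"

# Decoding with the encoding the file was actually saved in keeps all
# three characters.
as_latin1 = raw.decode("latin-1")

# ASSUMPTION: the compiler decodes as UTF-8 and drops invalid bytes.
# errors="ignore" mimics that; it is not taken from the Mono sources.
as_utf8 = raw.decode("utf-8", errors="ignore")

print(ascii(as_latin1), len(as_latin1))  # 11 characters
print(repr(as_utf8), len(as_utf8))       # 8 characters, two adjacent spaces
```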

Now, compiling with gmcs and -codepage:28591 (Latin-1) produces a string 
that _does_ contain the characters, suggesting that the documentation is 
in error about the default.

So, isn't this a bug in the docs? And shouldn't (g)mcs complain when it 
finds bytes that are not valid in the encoding it is using?
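
As a rough idea of what "complaining" could look like, here is a small
Python sketch of a strict check that reports the offset of the first byte
that is not valid UTF-8. The function name first_invalid_utf8_byte is
hypothetical, and nothing here reflects how (g)mcs is implemented.

```python
def first_invalid_utf8_byte(data: bytes):
    """Return the offset of the first byte that is not valid UTF-8, or None."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err.start
    return None


# The same source line saved in two encodings; the escapes \xe6 \xf8 \xe5
# stand for the three Danish characters in the literal.
line = 'string test = "foo \xe6\xf8\xe5 bar";'
latin1_source = line.encode("latin-1")
utf8_source = line.encode("utf-8")

print(first_invalid_utf8_byte(latin1_source))  # offset of the e6 byte
print(first_invalid_utf8_byte(utf8_source))    # None: nothing to report
```

A check like this could be run over source files before compiling, to
find the ones that need an explicit -codepage option.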



