[Gtk-sharp-list] Re: Encoding problems

Gaute B Strokkenes gs234-monodevelop@srcf.ucam.org
Fri, 16 Apr 2004 20:23:33 +0200

On 16 apr 2004, jonpryor@vt.edu wrote:

>>> Does it work if you add -codepage:utf8 to the mcs compile line?
>> Yes, it works, thanks :)
>> But shouldn't that be taken care of by MonoDevelop? 
>> And also - is UTF-16 a standard for Gtk# applications?
> Not having used MonoDevelop yet (yes, I'm evil!), I can only
> guess...
> I suspect the problem is the lack of a BOM (Byte Order Mark), which
> would let the compiler know the byte order of the file.

Really, the problem is the lack of autodetection.  It's easy enough to
recognise UTF-8, because the byte sequences that form valid UTF-8
are very distinctive.  While it's technically possible for, say, a
(highly contrived) ISO-8859-1 encoded file to consist only of byte
sequences that are valid UTF-8, that just doesn't happen in
practice.
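
To illustrate, here is a minimal sketch in Python (not anything mcs or
MonoDevelop actually does; the name looks_like_utf8 is made up for this
example).  A strict decode rejects any input that is not well-formed
UTF-8, which is what ordinary ISO-8859-1 text with accented letters
almost always turns out to be:

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if data is well-formed UTF-8 (pure ASCII also passes)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Ordinary Latin-1 text: e-acute is the single byte 0xE9, which in UTF-8
# is a lead byte that must be followed by continuation bytes (0x80-0xBF).
# Here it is followed by 't', so the sequence is invalid.
latin1_text = "d\u00e9tection".encode("iso-8859-1")

# The contrived case: the two Latin-1 characters U+00C3 U+00A9 encode to
# the bytes 0xC3 0xA9, which also happen to be valid UTF-8 (e-acute).
contrived = "\u00c3\u00a9".encode("iso-8859-1")

# The same word as the first example, this time encoded as UTF-8.
utf8_text = "d\u00e9tection".encode("utf-8")

print(looks_like_utf8(latin1_text))
print(looks_like_utf8(contrived))
print(looks_like_utf8(utf8_text))
```

Only the deliberately constructed Latin-1 input passes the check; the
realistic one fails at its first accented letter.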

> UTF-16 requires the presence of a BOM (0xFFFE or 0xFEFF, depending
> on big-endian or little-endian, not necessarily in that order), so
> if the BOM is present the compiler will know what codepage to use.
> UTF-8 doesn't require it.  Which means it is impossible to
> distinguish between a UTF-8 encoded file and a file encoded in the
> local codepage.  Consequently, mcs assumes that the local codepage
> is used.

I would recommend scanning the file to check for UTF-8-ness first (in
the absence of any explicit declaration).
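
One possible order of checks, sketched in Python under some
assumptions: the function name guess_encoding and the ISO-8859-1
fallback are hypothetical stand-ins (a real compiler would fall back to
the local codepage), and UTF-32 BOMs are ignored for brevity.

```python
import codecs


def guess_encoding(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Guess a codec name for data: BOM first, then UTF-8, then fallback."""
    # A BOM, when present, is the most explicit signal in the file itself.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"   # decodes as UTF-8 and strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"      # this codec reads the BOM to pick the byte order
    # No BOM: scan the whole input for UTF-8 well-formedness.
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Not UTF-8, so assume the legacy encoding (hypothetical default).
        return fallback


sample = "h\u00e9llo".encode("utf-8")
print(guess_encoding(sample))
```

An explicit declaration (such as the -codepage option) would still
take precedence over any guess of this kind.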

> The solution is to either tell mcs the correct codepage, which is
> what -codepage:UTF-8 does,

It's always a good idea to be explicit.

> or to insert a UTF-8 encoded BOM at the beginning of the file.

I strongly recommend against that; the UTF-8 BOM will break a lot of
other stuff on a Unix system.
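
One well-known instance of such breakage, offered as an illustration
rather than as the specific case meant above: an interpreter line is
only recognised when "#!" is the very first thing in the file, and a
BOM pushes it three bytes in.  A small Python sketch (the script text
is made up):

```python
script = "#!/bin/sh\necho hello\n"

plain = script.encode("utf-8")         # no BOM
with_bom = script.encode("utf-8-sig")  # same text with a UTF-8 BOM prepended

# The BOM is the three bytes EF BB BF; everything after it is unchanged.
bom = with_bom[:3]
rest_matches = with_bom[3:] == plain

# An interpreter line must begin at byte 0 of the file.
plain_has_shebang = plain.startswith(b"#!")
bom_has_shebang = with_bom.startswith(b"#!")

print(bom.hex(), rest_matches, plain_has_shebang, bom_has_shebang)
```

Tools that compare or concatenate files byte for byte can be tripped
up in a similar way, since the BOM is invisible in most editors.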

Have a look at:




Gaute Strokkenes                        http://www.srcf.ucam.org/~gs234/
Now, I think it would be GOOD to buy FIVE or SIX STUDEBAKERS