[Mono-dev] Fwd: [Mono-patches] r63710 - in trunk/mcs/class/System.Web: System.Web.UI.WebControls Test/System.Web.UI.WebControls

Tue Aug 15 09:40:22 EDT 2006

Hi,

Atsushi Eno:
> Is saving files in utf-8 without BOM possible in general western
> editors land? If yes I like the idea. If not then maybe it is not
> a good solution for us (yeah, not using non-ASCII letters is the
> most pessimistic option).
>
> (BTW I guess, with BOM you guys will get stuck, right?)

Kornél Pál:
> Usually I am using Windows XP that has support for UTF-8 and has no 
> problem with BOM. For example Notepad has no support for saving UTF-8 
> without BOM. Microsoft programs (Notepad, Visual Studio, csc, ...) can 
> recognize UTF-8 without BOM (they try to parse the entire file as UTF-8 
> and they treat it as UTF-8 if it's valid UTF-8, otherwise they use the 
> default ANSI code page). And they recognize BOM of course. For example 
> Visual Studio is saving files with BOM when they had originally and save 
> without BOM when they didn't.

Miguel de Icaza:
> Emacs can write files in UTF-8, I do not think it respects BOM though,
> but I could be wrong.

Jonathan Pryor:
> vim also handles files in UTF-8 just fine.
>
> Personally, I'd go for UTF-8 everywhere, but I know this has caused
> problems before when it was attempted across the entire class library...

Rafael Teixeira:
> I normally use gedit for coding, and it works nicely with utf-8.
> More recently I'm also using MonoDevelop that also deals with utf-8 in
> the proper way.
>
> BOM is just a visible space character for both editors and the
> responsibility for preserving it is therefore in the user's hands.

So UTF-8 without BOM seems to be a better choice than UTF-8 with BOM: 
some text editors treat the BOM as an ordinary character, so it can end up 
in the middle of the text instead of staying at the beginning of the file.
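
To illustrate the point, here is a minimal Python sketch (my own example, 
not taken from any of the tools mentioned above). It assumes two source 
files that were each saved with a BOM and are then joined byte-for-byte; 
the helper name stray_boms is hypothetical. A BOM-aware decoder strips only 
the leading BOM, so the second one survives as a U+FEFF character inside 
the text.

```python
import codecs

BOM = codecs.BOM_UTF8  # the three bytes EF BB BF


def stray_boms(data: bytes) -> int:
    """Count U+FEFF characters left after a BOM-aware decode.

    The "utf-8-sig" codec removes a single BOM at the very start of the
    data; any BOM further in is decoded as an ordinary character.
    """
    text = data.decode("utf-8-sig")
    return text.count("\ufeff")


# Two hypothetical source files, each saved as UTF-8 with BOM.
file_a = BOM + "class A {}\n".encode("utf-8")
file_b = BOM + "class B {}\n".encode("utf-8")

# Joining them (or pasting one into the other in an editor that treats
# the BOM as a character) leaves the second BOM in the middle.
joined = file_a + file_b
print(stray_boms(file_a), stray_boms(joined))
```

With BOM-less UTF-8 the same concatenation has nothing to leave behind.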

> When it comes to mcs sources, we wouldn't want to change things.
> It forces us to change all relevant sources to utf-8 because
> with BOMless utf-8 explicit compiler option -codepage:65001 is
> required.

Several months ago I introduced a CODEPAGE variable in the build system, so 
if we want to use UTF-8 (without BOM) explicitly we can either modify 
config-default.make or set CODEPAGE on a per-Makefile basis.

I've created a simple C# program (Latin1ToUtf8.cs) that converts source 
files in the mcs tree to UTF-8 (without BOM).

If you run this program on the mcs tree and do an svn diff, you will see 
that there are four kinds of files:
- ASCII
- Latin 1
- UTF-8 without BOM
- UTF-8 with BOM

The program recognizes all of them, converts them to UTF-8 without BOM, 
and reports all non-ASCII files to the console.
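
For readers who cannot open the attachment, the classification step can be 
sketched roughly as follows. This is a Python approximation under my own 
assumptions, not the attached Latin1ToUtf8.cs: the function names and the 
category labels are hypothetical, and anything that is not valid UTF-8 is 
simply assumed to be Latin 1.

```python
import codecs


def classify(data: bytes) -> str:
    """Sort raw file bytes into one of the four kinds listed above."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-bom"
    try:
        data.decode("ascii")
        return "ascii"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")  # strict: any invalid sequence raises
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"      # assumption: the only other encoding in use


def to_utf8_no_bom(data: bytes) -> bytes:
    """Return the same text encoded as UTF-8 without a BOM."""
    kind = classify(data)
    if kind == "utf-8-bom":
        return data[len(codecs.BOM_UTF8):]            # drop the BOM only
    if kind == "latin-1":
        return data.decode("latin-1").encode("utf-8")  # re-encode
    return data  # ASCII and BOM-less UTF-8 are already in the target form


sample = "Kornél".encode("latin-1")
print(classify(sample), to_utf8_no_bom(sample))
```

The order of the checks matters: pure ASCII is also valid UTF-8, so it is 
tested first in order to keep the two categories apart in the report.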

So I think some encoding cleanup is required on the mcs tree. And if we do 
so, I think we should use UTF-8, as it is a long-term solution.

Kornél 
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Latin1ToUtf8.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060815/aefb9430/attachment.pl