[Mono-dev] Fwd: [Mono-patches] r63710 - in trunk/mcs/class/System.Web: System.Web.UI.WebControls Test/System.Web.UI.WebControls
Kornél Pál
kornelpal at gmail.com
Tue Aug 15 09:40:22 EDT 2006
Hi,
Atsushi Eno:
> Is saving files in utf-8 without BOM possible in general western
> editors land? If yes I like the idea. If not then maybe it is not
> a good solution for us (yeah, not using non-ASCII letters is the
> most pessimistic option).
>
> (BTW I guess, with BOM you guys will get stuck, right?)
Kornél Pál:
> I usually use Windows XP, which supports UTF-8 and has no problem with
> a BOM. Notepad, for example, cannot save UTF-8 without a BOM.
> Microsoft programs (Notepad, Visual Studio, csc, ...) can recognize
> UTF-8 without a BOM: they try to parse the entire file as UTF-8 and
> treat it as UTF-8 if it is valid UTF-8; otherwise they use the default
> ANSI code page. And they recognize a BOM, of course. Visual Studio, for
> example, saves a file with a BOM if it originally had one and without
> a BOM if it didn't.
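The detection order described above (BOM first, then a full UTF-8 validity check, then the default ANSI code page) can be sketched roughly as follows. This is only an illustration in Python, not what the Microsoft tools actually run; the function name decode_source and the choice of cp1252 as the "default ANSI code page" are assumptions made for the example.

```python
import codecs

def decode_source(data: bytes, ansi_codepage: str = "cp1252"):
    """Return (text, label) following the detection order described above.

    ansi_codepage stands in for the system default ANSI code page;
    cp1252 is an assumption, a real system may use a different one.
    """
    # 1. An explicit UTF-8 BOM (bytes EF BB BF) wins and is dropped.
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):].decode("utf-8"), "utf-8-bom"
    # 2. No BOM: treat the file as UTF-8 only if the whole file is valid.
    try:
        return data.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # 3. Otherwise fall back to the ANSI code page. Bytes that the
        #    code page leaves undefined become U+FFFD instead of failing.
        return data.decode(ansi_codepage, errors="replace"), ansi_codepage
```

Note that a pure ASCII file passes step 2, which is harmless because ASCII is a subset of UTF-8.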
Miguel de Icaza:
> Emacs can write files in UTF-8, I do not think it respects BOM though,
> but I could be wrong.
Jonathan Pryor:
> vim also handles files in UTF-8 just fine.
>
> Personally, I'd go for UTF-8 everywhere, but I know this has caused
> problems before when it was attempted across the entire class library...
Rafael Teixeira:
> I normally use gedit for coding, and it works nicely with utf-8.
> More recently I'm also using MonoDevelop that also deals with utf-8 in
> the proper way.
>
> BOM is just a visible space character for both editors and the
> responsibility for preserving it is therefore in the user's hands.
So using UTF-8 without BOM seems to be a better choice than UTF-8 with BOM,
because some text editors treat the BOM as an ordinary character, so it can
end up in the middle of the text rather than at the beginning of the file.
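To illustrate how a BOM can end up in mid-text, here is a small Python sketch. It assumes one possible cause, joining two BOM-prefixed files byte for byte; an editor that treats the BOM as an ordinary character would produce the same result when text is pasted or inserted in front of it. The helper name stray_bom_offsets is made up for this example.

```python
import codecs

BOM_CHAR = "\ufeff"  # U+FEFF, the character the bytes EF BB BF decode to

def stray_bom_offsets(data: bytes):
    """Return the character offsets of every BOM that is not at offset 0."""
    text = data.decode("utf-8")
    return [i for i, ch in enumerate(text) if ch == BOM_CHAR and i > 0]

# Two source files, each saved as UTF-8 with BOM (hypothetical contents).
first = codecs.BOM_UTF8 + b"class A {}\n"
second = codecs.BOM_UTF8 + b"class B {}\n"

# Joining the raw bytes keeps the second BOM, now in the middle of the text.
joined = first + second
print(stray_bom_offsets(joined))
```

With BOM-less files the problem cannot arise, since there is nothing to misplace.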
> When it comes to mcs sources, we wouldn't want to change things.
> It forces us to change all relevant sources to utf-8 because
> with BOM-less utf-8 the explicit compiler option -codepage:65001 is
> required.
Several months ago I introduced a CODEPAGE variable in the build system, so
if we want to use UTF-8 (without BOM) explicitly, we can either modify
config-default.make or set CODEPAGE on a per-Makefile basis.
I've created a simple C# program (Latin1ToUtf8.cs) that converts source
files in the mcs tree to UTF-8 (without BOM).
If you run this program on the mcs tree and do an svn diff, you will see that
there are four kinds of files:
- ASCII
- Latin 1
- UTF-8 without BOM
- UTF-8 with BOM
The program recognizes all of them, converts them to UTF-8 without BOM,
and reports all non-ASCII files to the console.
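The classification and conversion could look roughly like the Python sketch below. It is not the attached Latin1ToUtf8.cs (whose contents are not reproduced here); the function names and the rule "anything that is not valid UTF-8 is taken to be Latin 1" are assumptions for illustration.

```python
import codecs

def classify(data: bytes) -> str:
    """Sort a file's bytes into one of the four kinds listed above."""
    if data.startswith(codecs.BOM_UTF8):
        return "UTF-8 with BOM"
    try:
        data.decode("ascii")
        return "ASCII"
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        return "UTF-8 without BOM"
    except UnicodeDecodeError:
        # Assumption: whatever is not valid UTF-8 is treated as Latin 1.
        return "Latin 1"

def to_utf8_without_bom(data: bytes) -> bytes:
    """Convert any of the four kinds to UTF-8 without BOM."""
    kind = classify(data)
    if kind == "UTF-8 with BOM":
        return data[len(codecs.BOM_UTF8):]  # drop the three BOM bytes
    if kind == "Latin 1":
        return data.decode("latin-1").encode("utf-8")
    return data  # ASCII and BOM-less UTF-8 need no change

def report_non_ascii(files: dict) -> list:
    """Print and return the names of all files that are not pure ASCII."""
    names = [name for name, data in files.items() if classify(data) != "ASCII"]
    for name in names:
        print(f"{name}: {classify(files[name])}")
    return names
```

Here files is a plain mapping from file name to file bytes; walking the actual tree and writing the converted bytes back is left out to keep the sketch short.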
So I think some encoding cleanup is required in the mcs tree. If we do it, I
think we should use UTF-8, as it is a long-term solution.
Kornél
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Latin1ToUtf8.cs
Url: http://lists.ximian.com/pipermail/mono-devel-list/attachments/20060815/aefb9430/attachment.pl