[MonoDevelop] Source files are UTF-8... are we sure?

Steve Deobald steve@citygroup.ca
Thu, 8 Apr 2004 00:36:29 -0600 (CST)


Hey guys (mostly Todd),

I wrote this tonight (er, yesterday?), with no luck:

// src/Addins/DisplayBindings/SourceEditor/SourceEditorBuffer.cs:
public static SourceEditorBuffer CreateTextBufferFromFile (string filename)
{
  // Compare the first bytes of the file against the UTF-8 preamble (BOM).
  byte[] preamble = Encoding.UTF8.GetPreamble();
  using (FileStream fs = new FileStream(filename, FileMode.Open, FileAccess.Read))
  {
    for (int j = 0; j < preamble.Length; j++)
    {
      if (preamble[j] != fs.ReadByte())
      {
        System.Console.WriteLine("CreateTextBufferFromFile(): file is not UTF-8. Skipping.");
        return (null);
      }
    }
  }
  System.Console.WriteLine("CreateTextBufferFromFile(): file is UTF-8. Loading into sourcebuffer...");
  SourceEditorBuffer buff = new SourceEditorBuffer ();
  buff.LoadFile (filename);
  return buff;
}
// end


So I weaseled my XP box back from a very cute girl who was over here using
it so I could write a test case of .NET running this code properly. I
wrote `Class.cs' that you can find here:
http://nofeet.com/_garbage/enc_bug/
...and tested it against `blah.txt' and `blah.exe' found in that same
directory.

Just before submitting the System.Text bug report, however, I tried running
the same test case using those 2 Windows files on this FC1/mono box. Lo
and behold, it recognizes the .txt as UTF-8 (which was set in Notepad)
just fine.
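
As a minimal sketch of why the Notepad file passes: saving as "UTF-8" in
Notepad puts the three-byte preamble EF BB BF at the start of the file,
and that is the same sequence Encoding.UTF8.GetPreamble() returns. The
helper name HasUtf8Preamble is made up for illustration; it is not part
of MD or of the test case linked above.

```csharp
using System;
using System.Text;

public static class PreambleCheck
{
    // Hypothetical helper: true only if data starts with the UTF-8
    // preamble (EF BB BF). It says nothing about the rest of the bytes.
    public static bool HasUtf8Preamble (byte[] data)
    {
        byte[] preamble = Encoding.UTF8.GetPreamble ();
        if (data.Length < preamble.Length)
            return false;
        for (int i = 0; i < preamble.Length; i++)
        {
            if (data[i] != preamble[i])
                return false;
        }
        return true;
    }
}
```

A file that starts with plain ASCII text and has no preamble returns
false here, even though its contents may be perfectly good UTF-8.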

From playing around, I know the MD code above recognizes all text files
(in a 'hello world' GTK# app, or the MD source tree) as Encoding.ASCII -
but it also recognizes binary files this way, unfortunately.

Does anyone have any suggestions? Are the files in the MD tree definitely
UTF-8? (In which case I'll have to file this bug.) Or is it possible that
they are encoded differently?
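
One possibility, sketched under assumptions: a UTF-8 file is not required
to start with a preamble, so a file can be valid UTF-8 and still fail a
preamble-only test. A rougher but preamble-independent check is to decode
the bytes with a UTF8Encoding that throws on invalid sequences. The name
IsValidUtf8 is hypothetical and not an MD API. Note that pure ASCII also
passes, since ASCII is a subset of UTF-8.

```csharp
using System;
using System.Text;

public static class Utf8Check
{
    // Hypothetical helper: true if the bytes decode as strict UTF-8,
    // whether or not they begin with a preamble.
    public static bool IsValidUtf8 (byte[] data)
    {
        // First argument: do not emit a preamble when encoding.
        // Second argument: throw on invalid byte sequences.
        UTF8Encoding strict = new UTF8Encoding (false, true);
        try
        {
            strict.GetString (data);
            return true;
        }
        catch (ArgumentException)
        {
            // Invalid UTF-8 surfaces as ArgumentException or a subclass.
            return false;
        }
    }
}
```

Binary files usually fail this quickly because they contain bytes such as
0xFF or stray continuation bytes that cannot occur in well-formed UTF-8.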

Just thought I'd check with you guys before I posted a bug that wasn't.
Thanks!

.steve