[Mono-bugs] [Bug 534137] New: StreamReader fails to detect "Unicode" encoding if only given 2 characters, not 4 in first read

bugzilla_noreply at novell.com bugzilla_noreply at novell.com
Tue Aug 25 13:27:40 EDT 2009


http://bugzilla.novell.com/show_bug.cgi?id=534137


           Summary: StreamReader fails to detect "Unicode" encoding if
                    only given 2 characters, not 4 in first read
    Classification: Mono
           Product: Mono: Class Libraries
           Version: unspecified
          Platform: x86
        OS/Version: Linux
            Status: NEW
          Severity: Normal
          Priority: P5 - None
         Component: System
        AssignedTo: mono-bugs at lists.ximian.com
        ReportedBy: novellbugzilla at c-hett.de
         QAContact: mono-bugs at lists.ximian.com
          Found By: ---


User-Agent:       Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1;
NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.21022; .NET CLR
3.5.30729; .NET CLR 1.1.4322; .NET CLR 3.0.30729)

I start a helper process and redirect its StandardOutput. The helper first
writes the little-endian "Unicode" (UTF-16LE) BOM to stdout, then opens a file
(which takes some time), and only then writes the text.

I see a race condition across my whole program (C# and helper process
together): sometimes I can read the helper's text in C#, sometimes not.

I take the BaseStream of the process's StandardOutput and wrap it in my own
StreamReader instance:

m_reader = new StreamReader(m_process.StandardOutput.BaseStream,
                            Encoding.GetEncoding(949), true);

I looked inside 
"C:\Programme\Mono-2.2\mono-2.2\mcs\class\corlib\System.IO\StreamReader.cs"

(I'm sorry, I cannot download the latest source code; the BZ2 files seem to be
broken, and my WinRAR only gives me 900 KB worth of TAR file. I'll check that
later.)

#if !NET_2_0
                if (input_buffer [0] == 0xff && input_buffer [1] == 0xfe){
                    this.encoding = Encoding.Unicode;
                    return 2;
                }
#endif

This WOULD work, but I use the .NET 2.0 profile, so this block is compiled out.

                if (input_buffer [0] == 0xfe && input_buffer [1] == 0xff){
                    this.encoding = Encoding.BigEndianUnicode;
                    return 2;
                }

No match: my stream starts with 0xff 0xfe, not 0xfe 0xff.

                if (count < 3)
                    return 0; 

Here it fails for me: the method returns 0 whenever my C# program is quick
enough to read the first 2 bytes from the helper before the helper has written
any real data, i.e. when only the BOM has arrived on my stream.


#if NET_2_0
                if (count < 4) {
                    if (input_buffer [0] == 0xff && input_buffer [1] == 0xfe && input_buffer [2] != 0) {
                        this.encoding = Encoding.Unicode;
                        return 2;
                    }
                    return 0;
                }

This would be my code path, but because my BaseStream sometimes delivers only
2 bytes on its first read, the "count < 3" check above returns before this
code is ever reached.

Reproducible: Sometimes

Steps to Reproduce:
Write a program (or a network server) that outputs a 2-byte "Unicode" BOM,
waits some time, and then outputs its data.
Or fake it using a custom Stream.

Use a StreamReader on it and let it detect the encoding.

The data that comes from the Stream must be encoded in "Unicode" encoding,
i.e. little-endian UTF-16.
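
The "custom Stream" variant of these steps could be sketched roughly as below.
This is only an illustration: ShortFirstReadStream and Repro are hypothetical
names, not part of Mono, and Encoding.UTF8 stands in for the code page 949
fallback from the report so that the sketch does not depend on that code page
being installed.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: returns at most `firstChunk` bytes from the first
// Read call, then serves the remaining bytes without any limit.
public sealed class ShortFirstReadStream : Stream
{
    private readonly MemoryStream inner;
    private readonly int firstChunk;
    private bool firstReadDone;

    public ShortFirstReadStream(byte[] data, int firstChunk)
    {
        this.inner = new MemoryStream(data, false);
        this.firstChunk = firstChunk;
    }

    public override int Read(byte[] buffer, int offset, int count)
    {
        if (!firstReadDone)
        {
            // Simulate a helper that has only written the BOM so far.
            firstReadDone = true;
            count = Math.Min(count, firstChunk);
        }
        return inner.Read(buffer, offset, count);
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] buffer, int offset, int count) { throw new NotSupportedException(); }
}

public static class Repro
{
    // Little-endian UTF-16 BOM (FF FE) followed by the UTF-16LE text.
    public static byte[] BuildPayload(string text)
    {
        byte[] bom = Encoding.Unicode.GetPreamble();
        byte[] body = Encoding.Unicode.GetBytes(text);
        byte[] all = new byte[bom.Length + body.Length];
        Buffer.BlockCopy(bom, 0, all, 0, bom.Length);
        Buffer.BlockCopy(body, 0, all, bom.Length, body.Length);
        return all;
    }

    // Reads everything through a StreamReader with BOM detection enabled.
    // Encoding.UTF8 is an assumed stand-in for the report's code page 949.
    public static string ReadAll(byte[] payload, int firstChunk)
    {
        Stream s = new ShortFirstReadStream(payload, firstChunk);
        using (StreamReader reader = new StreamReader(s, Encoding.UTF8, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

Comparing Repro.ReadAll(payload, 2) with Repro.ReadAll(payload, payload.Length)
for the same payload should show whether the short first read changes the
decoded text.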

Actual Results:  
It does not detect the right encoding unless the first read from the
BaseStream returns at least 3 or 4 bytes.

Expected Results:  
It should default to UTF-16 ("Unicode"). UTF-32 is very rare. Does anybody use
it?
I think that if StreamReader is given only 2 bytes and they look like a
"Unicode" BOM, it should treat them as one. Microsoft does the same.
Only if 3 or 4 bytes are available should it check whether the third byte is
0; otherwise it should default to UTF-16.
People who use UTF-32 will have no problem as long as THEIR BaseStream
delivers at least 4 bytes on its first read. As it stands, I have a problem
and THEY would not gain anything, because with too few bytes I think no
detection happens at all.

Alternatively, StreamReader could block until it has enough bytes for a 100%
reliable BOM detection, but I think that would break everything.
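
The first proposal (default to UTF-16 when only the 2-byte BOM is available)
might look roughly like the sketch below. It is not Mono's code: BomSketch and
DetectBom are hypothetical names, the buffer and count parameters mirror
input_buffer and count from the excerpt, and the choice to require all four
bytes FF FE 00 00 before assuming UTF-32 is my own assumption.

```csharp
using System;
using System.Text;

public static class BomSketch
{
    // Returns the number of BOM bytes to skip, or 0 if no UTF-16/UTF-32
    // little-endian or UTF-16 big-endian BOM was recognised.
    public static int DetectBom(byte[] buf, int count, out Encoding detected)
    {
        detected = null;
        if (count < 2)
            return 0;

        if (buf[0] == 0xfe && buf[1] == 0xff)
        {
            detected = Encoding.BigEndianUnicode;
            return 2;
        }

        if (buf[0] == 0xff && buf[1] == 0xfe)
        {
            // Assumption: only call it UTF-32 LE when all four BOM bytes
            // (FF FE 00 00) are already in the buffer.
            if (count >= 4 && buf[2] == 0 && buf[3] == 0)
            {
                detected = Encoding.UTF32;
                return 4;
            }
            // With only 2 or 3 bytes available, default to UTF-16 LE.
            detected = Encoding.Unicode;
            return 2;
        }

        return 0;
    }
}
```

With this logic a first read of exactly FF FE is classified as "Unicode"
instead of being rejected by an early length check.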


