[Mono-bugs] [Bug 534137] New: StreamReader fails to detect "Unicode" encoding if only given 2 characters, not 4 in first read
bugzilla_noreply at novell.com
Tue Aug 25 13:27:40 EDT 2009
http://bugzilla.novell.com/show_bug.cgi?id=534137
Summary: StreamReader fails to detect "Unicode" encoding if
only given 2 characters, not 4 in first read
Classification: Mono
Product: Mono: Class Libraries
Version: unspecified
Platform: x86
OS/Version: Linux
Status: NEW
Severity: Normal
Priority: P5 - None
Component: System
AssignedTo: mono-bugs at lists.ximian.com
ReportedBy: novellbugzilla at c-hett.de
QAContact: mono-bugs at lists.ximian.com
Found By: ---
User-Agent: Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1;
NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.5.21022; .NET CLR
3.5.30729; .NET CLR 1.1.4322; .NET CLR 3.0.30729)
I start a process with StandardOutput redirected. The process first writes the
little-endian "Unicode" (UTF-16) BOM to stdout, then opens a file (which takes
some time), and only then outputs the text.
This causes a race condition between my C# program and the helper process:
sometimes I can read the helper's text in C#, sometimes not.
I use the BaseStream of the process's StandardOutput and create my own instance
of StreamReader:
    m_reader = new StreamReader(m_process.StandardOutput.BaseStream, Encoding.GetEncoding(949), true);
I looked inside
"C:\Programme\Mono-2.2\mono-2.2\mcs\class\corlib\System.IO\StreamReader.cs"
(I'm sorry, I cannot download the latest source code; the BZ2 files seem to be
broken, and my WinRAR only gives me a 900 KB TAR file. I'll check that later.)
#if !NET_2_0
    if (input_buffer [0] == 0xff && input_buffer [1] == 0xfe) {
        this.encoding = Encoding.Unicode;
        return 2;
    }
#endif
This WOULD work, but I use the .NET 2.0 profile, so this block is compiled out.
    if (input_buffer [0] == 0xfe && input_buffer [1] == 0xff) {
        this.encoding = Encoding.BigEndianUnicode;
        return 2;
    }
No match: my BOM is little-endian (0xff 0xfe).
    if (count < 3)
        return 0;
This is where it fails for me: the method returns here whenever my C# program
was quick enough to read the first 2 bytes from the helper before the helper
had written any real data after the BOM.
#if NET_2_0
    if (count < 4) {
        if (input_buffer [0] == 0xff && input_buffer [1] == 0xfe &&
            input_buffer [2] != 0) {
            this.encoding = Encoding.Unicode;
            return 2;
        }
        return 0;
    }
This would be my code path, but it is never reached when my BaseStream delivers
only 2 bytes on its first read, which happens in some runs.
Reproducible: Sometimes
Steps to Reproduce:
Write a program (or a network server) that outputs the 2-byte "Unicode" BOM,
waits some time, and then outputs its data.
Or fake it using a custom Stream.
Use a StreamReader on it and let it detect the encoding.
Make the data that comes from the stream encoded in "Unicode", that is,
little-endian UTF-16.
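The "custom Stream" variant could look roughly like the sketch below. The names
ChunkedStream and Repro are made up for illustration and are not part of Mono
or of my program. Each Read call returns at most one pre-defined chunk, so the
first read yields only the 2 BOM bytes, the way a slow pipe writer would. The
fallback encoding is passed in by the caller; my program uses code page 949.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper: hands out its payload one chunk per Read call,
// imitating a pipe whose writer pauses after sending the BOM.
// Note: an empty chunk would make Read return 0, which readers treat as end of stream.
public sealed class ChunkedStream : Stream
{
    private readonly byte[][] chunks;
    private int index;   // which chunk comes next
    private int offset;  // position inside the current chunk

    public ChunkedStream(params byte[][] chunks) { this.chunks = chunks; }

    public override int Read(byte[] buffer, int off, int count)
    {
        if (index >= chunks.Length)
            return 0; // end of stream
        byte[] chunk = chunks[index];
        int n = Math.Min(count, chunk.Length - offset);
        Array.Copy(chunk, offset, buffer, off, n);
        offset += n;
        if (offset == chunk.Length) { index++; offset = 0; }
        return n;
    }

    public override bool CanRead { get { return true; } }
    public override bool CanSeek { get { return false; } }
    public override bool CanWrite { get { return false; } }
    public override long Length { get { throw new NotSupportedException(); } }
    public override long Position
    {
        get { throw new NotSupportedException(); }
        set { throw new NotSupportedException(); }
    }
    public override void Flush() { }
    public override long Seek(long o, SeekOrigin origin) { throw new NotSupportedException(); }
    public override void SetLength(long value) { throw new NotSupportedException(); }
    public override void Write(byte[] b, int o, int c) { throw new NotSupportedException(); }
}

public static class Repro
{
    // Reads a stream whose first read delivers only the UTF-16 LE BOM.
    public static string ReadAll(Encoding fallback)
    {
        byte[] bom = { 0xff, 0xfe };                       // little-endian UTF-16 BOM
        byte[] text = Encoding.Unicode.GetBytes("hello");  // payload, no BOM
        using (var reader = new StreamReader(new ChunkedStream(bom, text), fallback, true))
            return reader.ReadToEnd();
    }
}
```

If the BOM is detected, Repro.ReadAll should give back the original text. With
the behaviour described in this report I would expect the payload to be decoded
with the fallback encoding instead.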
Actual Results:
It will not detect the right encoding unless the first read from the BaseStream
returns at least 3 bytes.
Expected Results:
It should default to UTF-16. UTF-32 is very rare. Does anybody use that?
I think that if StreamReader is given only 2 bytes and they look like a
"Unicode" BOM, then they should be treated as one. Microsoft does the same.
Only if you have 3 or 4 bytes, check whether the third byte is 0.
Otherwise default to UTF-16.
Those people who use UTF-32 will have no problem if THEIR BaseStream is able to
deliver at least 4 bytes on its first read. But right now I have a problem, and
THEY wouldn't gain anything, because with too few bytes I think no detection
happens at all.
Alternatively, StreamReader could block until it has enough bytes for a 100%
reliable BOM detection, but I think that would break everything.
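To make the suggestion concrete, here is a minimal sketch of the detection order
I have in mind. It is not a patch against StreamReader.cs: BomSniffer and Detect
are hypothetical names, and the UTF-8 and big-endian UTF-32 BOMs are left out
for brevity. The only point is that 0xff 0xfe with fewer than 4 bytes available
falls back to UTF-16 instead of giving up.

```csharp
using System;
using System.Text;

public static class BomSniffer
{
    // Returns the number of BOM bytes to skip and sets 'detected'.
    // Returns 0 and leaves 'detected' null when no UTF-16/UTF-32 LE BOM is recognized.
    public static int Detect(byte[] buf, int count, out Encoding detected)
    {
        detected = null;
        if (count < 2)
            return 0;

        if (buf[0] == 0xfe && buf[1] == 0xff) {
            detected = Encoding.BigEndianUnicode;
            return 2;
        }

        if (buf[0] == 0xff && buf[1] == 0xfe) {
            // FF FE 00 00 is the UTF-32 LE BOM; it can only be
            // distinguished when all 4 bytes are available.
            if (count >= 4 && buf[2] == 0 && buf[3] == 0) {
                detected = Encoding.UTF32;
                return 4;
            }
            // With 2 or 3 bytes, or a non-zero follow-up, assume UTF-16 LE.
            detected = Encoding.Unicode;
            return 2;
        }

        return 0;
    }
}
```

The trade-off is the one described above: a UTF-32 stream whose first read
returns fewer than 4 bytes would be misread as UTF-16, but as far as I can tell
such a stream gets no detection today either.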