[Mono-dev] UTF8 encoding/decoding
marek.safar at gmail.com
Tue Apr 30 07:53:25 UTC 2013
> I've discovered the directory /mono/mcs/class/corlib/Test/System.Text just
> now ... I never noticed it before, sorry.
> The problem with my tests is that they are written to compare their
> output against the output of an existing reference runtime (at the moment,
> MS.NET). I will take a look at the existing UTF8EncodingTest.cs (and
> others), and I will try to add more test cases to cover the bugs that I
> reported and some other problems that I saw in the old implementation.
> As you can imagine, I'm not very familiar with the Mono sources, and I
> haven't found a policy for Mono tests (how tests are integrated, best
> practices, etc.) or a description of how the test mechanism is used that
> is more specific to Mono than the http://nunit.com/ web page. So
> any help like docs, links... will be useful :D
You can find the general Mono tests page here
> On Mon, Apr 29, 2013 at 6:48 PM, Marek Safar <marek.safar at gmail.com> wrote:
>> Hi Gerardo,
>> This is a very good Mono improvement. Could you change your tests to fit
>> the Mono NUnit test style? It should not be too hard; if you need guidance,
>> let me know or look at how it's done for other parts of Mono.
>> Secondly, to make the review and merge easier for us, could you send a pull
>> request with your changes?
>>> I am Gerardo García Peña and I'm new to this list.
>>> Some months ago I started working with Mono on a project which demands
>>> very precise manipulation of UTF8 (and other encodings) streams. When I
>>> started to write code I observed that the Mono UTF8 implementation is very
>>> buggy, while the MS.NET implementation is quite good. Then I started to
>>> isolate the bugs and filed some of them in Ximian's bugzilla (see the
>>> links at the end of this mail).
>>> They're still there and unfixed, but I think they are important: an
>>> incompatibility in the text codec subsystems affects virtually any
>>> application that needs portability between the Microsoft and Mono
>>> platforms. This matters especially for data integrity and, in some cases,
>>> for availability: the indexes and counters reported by the conversion
>>> methods and by the thrown exceptions could make apps running on Mono
>>> enter infinite loops, leaving them vulnerable to DoS attacks.
>>> The bugs are still there (unresolved), and during this time I have found
>>> some more, so I decided to start patching the UTF8 libraries (and in the
>>> future, if this patch is accepted, I will continue working on any other
>>> buggy codecs that appear).
>>> The patch that I propose is a major modification of the file
>>> /mono/mcs/class/corlib/System.Text/UTF8Encoding.cs and some minor changes
>>> in other generic classes in System.Text. The goals of my patch are the
>>> following:
>>> - give a complete and good-quality UTF8 coder & decoder implementation,
>>> - be at least as efficient as the old implementation,
>>> - better error handling and quick resync when bad sequences are found,
>>> - fix the index field in the Fallback exceptions (it is a key feature
>>>   if a program wants to handle strings with errors),
>>> - refactor the code and make it more maintainable,
>>> - full compatibility with the .NET implementation (behaviour is
>>>   exactly the same for both bad and good sequences),
>>> - complete some pending or incomplete features (MonoTODO) like
>>>   Encoder::FallbackException::IsUnknownSurrogate() or use of BOM
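To illustrate the points about error indexes and resynchronisation in the
list above, here is a minimal sketch in Python. It uses Python's built-in
UTF-8 decoder as a stand-in, not Mono's UTF8Encoding; the helper name
find_bad_sequences is hypothetical, and the exact index semantics of the .NET
fallback exceptions may differ from what Python reports.

```python
from typing import List, Tuple


def find_bad_sequences(data: bytes) -> List[Tuple[int, int]]:
    """Return (start, end) byte offsets of every malformed UTF-8 sequence.

    Illustration only: this relies on Python's decoder, not on Mono's.
    """
    bad = []
    offset = 0
    while offset < len(data):
        try:
            data[offset:].decode("utf-8")
            break  # the rest of the buffer is valid
        except UnicodeDecodeError as err:
            # err.start / err.end are relative to the slice, so shift them.
            start = offset + err.start
            end = offset + max(err.end, err.start + 1)  # always advance
            bad.append((start, end))
            offset = end  # resync right after the offending bytes
    return bad


# One stray 0xFF byte and one truncated two-byte sequence at the end.
print(find_bad_sequences(b"ab\xffcd\xc3"))
```

The loop only terminates because every reported index moves the offset
forward. A caller that skips ahead using an index that does not advance can
loop forever, which is the kind of hang the availability concern above
describes.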
>>> Please note that although I present an implementation of this codec that
>>> is fully compatible with the Microsoft implementation, my changes are not
>>> based on Microsoft's work; they are written entirely from scratch. I have
>>> not reverse-engineered any code, and the behaviour of my patches has been
>>> tuned using an extensive and exhaustive test case.
>>> My test case uses several public UTF8 test cases and one specific and
>>> giant UTF16 test case built automatically. The test case must be executed
>>> first on the Mono runtime environment and then again on the Microsoft
>>> runtime. The output of the test case is two directories (one for Mono,
>>> another for .NET) documenting the output of the Convert() method and the
>>> exceptions it throws. Once both executions are finished, there should be
>>> no difference between the two output directories.
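The comparison of the two output directories can be sketched as follows. This
is a hypothetical Python stand-in, not the actual tooling from the test suite;
the function name differing_files is invented for illustration, and it assumes
both output directories are flat (no subdirectories).

```python
import filecmp
import os
from typing import List


def differing_files(dir_a: str, dir_b: str) -> List[str]:
    """List file names whose contents differ or that exist on one side only."""
    names_a = set(os.listdir(dir_a))
    names_b = set(os.listdir(dir_b))

    # Files produced by only one runtime already count as a difference.
    only_one_side = names_a ^ names_b

    # Compare the shared files byte by byte (shallow=False ignores mtimes).
    common = sorted(names_a & names_b)
    _, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, common, shallow=False)

    return sorted(only_one_side | set(mismatch) | set(errors))
```

An empty list means both runtimes produced identical output for every test
file; any name in the list points at a behavioural difference to inspect.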
>>> The test case is focused only on the Convert() method because it allows
>>> testing any variation of the input. My implementation (and probably
>>> Microsoft's too) is based on two coder/decoder functions that are called
>>> by all the other public methods. For that reason, the best way to test
>>> both implementations is to use the method that exposes the internal
>>> functions most directly.
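One input variation that a streaming, Convert()-style interface presumably
has to survive is a multi-byte sequence split across two calls. A minimal
sketch of such a check, using Python's incremental UTF-8 decoder as a stand-in
for the .NET Decoder (the helper names are hypothetical, and the check assumes
valid input):

```python
import codecs


def decode_in_two_chunks(data: bytes, split: int) -> str:
    """Decode data as two consecutive chunks, keeping state between them."""
    decoder = codecs.getincrementaldecoder("utf-8")(errors="strict")
    head = decoder.decode(data[:split], final=False)  # may buffer a partial char
    tail = decoder.decode(data[split:], final=True)   # flushes the buffered bytes
    return head + tail


def all_splits_agree(data: bytes) -> bool:
    """True if every split point yields the same text as one-shot decoding."""
    expected = data.decode("utf-8")
    return all(
        decode_in_two_chunks(data, i) == expected
        for i in range(len(data) + 1)
    )
```

Running the same inputs through every split point on both runtimes and
comparing the results is one way to exercise the stateful code paths that
one-shot methods such as GetString() never reach.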
>>> I posted the changes and my test suite in a GitHub branch, and I have
>>> also attached them to this mail (if you want to test it quickly without
>>> doing any git operation):
>>> - mono branch with my patches
>>> - test suite
>>> To run the test suite, run the makefile and then execute the program
>>> convert.exe on the two platforms. You'll get 'cnvout-mono' and
>>> 'cnvout-other' directories which will contain the output of each test run.
>>> Once they have finished, run the 'mkdiff.sh' shell script. This script will
>>> make a 'cnvout-diff' directory, which should be empty if all files are
>>> identical.
>>> I know that this is an important patch because it affects the corlib
>>> libraries, which are critical for the Mono runtime. If you have any
>>> questions or comments about the code, or if I can do anything to improve
>>> this patch, I will be glad to help.
>>> Thanks in advance,
>>> Gerardo García Peña
>>>  10692 https://bugzilla.xamarin.com/show_bug.cgi?id=10692
>>>  10697 https://bugzilla.xamarin.com/show_bug.cgi?id=10697