[Mono-dev] UTF8 encoding/decoding
Gerardo García Peña
killabytenow at gmail.com
Mon Apr 29 21:03:46 UTC 2013
I've discovered the directory /mono/mcs/class/corlib/Test/System.Text just
now ... I never noticed it before, sorry.
The problem with my tests is that they are written to compare their output
against the output of an existing reference runtime (at the moment,
MS.NET). I will take a look at the existing UTF8EncodingTest.cs (and
others), and I will try to add more test cases to cover the bugs that I
reported and some other problems that I saw in the old implementation.
As you can imagine, I'm not very familiar with the Mono sources, and I
haven't found a policy for Mono tests (how tests are integrated, best
practices, etc.) or a description of how the test mechanism is used that
is more specific to Mono than the http://nunit.com/ web page. So any help
(docs, links...) will be useful :D
As you requested I have made a pull request through github (#622). I hope
you like it.
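To picture the comparison step: the approach described above (run the same tests on two runtimes, then compare what each one wrote) can be sketched in a few lines. This is only an illustration in Python, not the test suite from the pull request. The function name differing_files and the flat, one-file-per-case directory layout are assumptions.

```python
import filecmp
import os
import tempfile


def differing_files(dir_a, dir_b):
    """Return sorted file names that differ or exist in only one directory.

    Assumes both directories are flat (no subdirectories).
    """
    names_a = set(os.listdir(dir_a))
    names_b = set(os.listdir(dir_b))
    # A file produced by only one runtime always counts as a difference.
    only_one_side = names_a ^ names_b
    common = sorted(names_a & names_b)
    # shallow=False forces a byte-by-byte comparison of file contents.
    _, mismatch, errors = filecmp.cmpfiles(dir_a, dir_b, common, shallow=False)
    return sorted(only_one_side | set(mismatch) | set(errors))


def write_case(directory, name, payload):
    """Helper for the demo: store one test-case output as raw bytes."""
    with open(os.path.join(directory, name), "wb") as handle:
        handle.write(payload)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        mono_dir = os.path.join(root, "cnvout-mono")
        other_dir = os.path.join(root, "cnvout-other")
        os.mkdir(mono_dir)
        os.mkdir(other_dir)
        write_case(mono_dir, "0001.txt", b"ok")
        write_case(other_dir, "0001.txt", b"ok")
        write_case(mono_dir, "0002.txt", b"index=3")
        write_case(other_dir, "0002.txt", b"index=4")
        # An empty list would mean both runtimes behaved identically.
        print(differing_files(mono_dir, other_dir))
```

An empty result plays the same role as an empty diff directory. Tests in the NUnit style would instead carry the expected values inside each test, so no second runtime is needed at test time.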
On Mon, Apr 29, 2013 at 6:48 PM, Marek Safar <marek.safar at gmail.com> wrote:
> Hi Gerardo,
> This is a very good Mono improvement. Could you change your tests to fit
> the Mono NUnit test style? It should not be too hard; if you need
> guidance, let me know or look at how it's done for other Mono parts.
> Secondly, to make the review and merge easier for us, could you send a
> pull request with your changes?
>> I am Gerardo García Peña and I'm new to this list.
>> Some months ago I started working with Mono on a project which demands
>> very precise manipulation of UTF8 (and other encodings) streams. When I
>> started to write code I observed that the Mono UTF8 implementation is
>> very buggy, while the MS.NET implementation is quite good. I then started
>> to isolate the bugs and filed some of them in Ximian's bugzilla (10692
>> and 10697, linked at the end of this message).
>> They're still there and unfixed, but I think they are important: an
>> incompatibility in the text codec subsystem affects virtually any
>> application that needs portability between the Microsoft and Mono
>> platforms, especially from the data integrity point of view. In some
>> cases it also raises availability issues: the indexes and counters
>> reported by the conversion methods, and the exceptions they throw, can
>> make apps running on the Mono runtime enter infinite loops, leaving them
>> vulnerable to DoS attacks.
>> The bugs are still there (unresolved), and during this time I have found
>> some more, so I decided to start patching the UTF8 libraries (and in the
>> future, if this patch is accepted, I will continue working on any other
>> buggy codecs that appear).
>> The patch that I propose is an important modification of the file
>> /mono/mcs/class/corlib/System.Text/UTF8Encoding.cs, plus some minor
>> changes in other generic classes in System.Text. The goals of my patch
>> are the following:
>> - give a complete, good-quality UTF8 coder & decoder implementation,
>> - be at least as efficient as the old implementation,
>> - better error handling and quick resync when bad sequences are found,
>> - fix the index field in the Fallback exceptions (it is a key feature
>> if a program wants to handle strings with errors),
>> - refactor the code and make it more maintainable,
>> - full compatibility with the .NET implementation (behaviour is exactly
>> the same for both bad and good sequences),
>> - complete some pending or incomplete features (MonoTODO) like
>> Encoder::FallbackException::IsUnknownSurrogate() or use of BOM
>> Please note that although this implementation of the codec is fully
>> compatible with the Microsoft implementation, my changes are not based
>> on Microsoft's work; they are written entirely from scratch. I have not
>> reverse-engineered any code, and the behaviour of my patches has been
>> tuned using an extensive and exhaustive test case.
>> My test case uses several public UTF8 test cases and one specific,
>> giant UTF16 test case that is built automatically. The test case must be
>> executed first on the Mono runtime environment and then again on the
>> Microsoft runtime. Its output is two directories (one for mono, another
>> for net) documenting the output of the Convert() method and the
>> exceptions it throws. Once both executions have finished, there should
>> be no difference between the two output directories.
>> The test case focuses only on the Convert() method because it allows
>> testing any variation of the input. My implementation (and probably
>> Microsoft's too) is based on two coder/decoder functions that are called
>> by all the other public methods. For that reason, the best way to test
>> both implementations is to use the method that most directly exposes the
>> internal coder/decoder functions.
>> I posted the changes and my test suite in a github branch, and I also
>> have attached them to this mail (if you want to test it quickly without
>> doing any git operation):
>> - mono branch with my patches
>> - test suite
>> To run the test suite, run the makefile and then execute the program
>> convert.exe on both platforms. You'll get 'cnvout-mono' and
>> 'cnvout-other' directories, which will contain the output of each test
>> run. Once both runs have finished, run the 'mkdiff.sh' shell script. It
>> will make a 'cnvout-diff' directory, which should be empty if all files
>> are identical.
>> I know that it is an important patch because it affects the corlib
>> libraries, which are critical for the Mono runtime. If you have any
>> questions or comments about the code, or if I can do anything to improve
>> this patch, I will be glad to help.
>> Thanks in advance,
>> Gerardo García Peña
>>  10692 https://bugzilla.xamarin.com/show_bug.cgi?id=10692
>>  10697 https://bugzilla.xamarin.com/show_bug.cgi?id=10697
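The quoted points about the index reported in fallback exceptions and about infinite loops can be illustrated with a small resync loop. This is a sketch in Python, not code from the patch: Python's UnicodeDecodeError (with its start and end attributes) stands in here for .NET's DecoderFallbackException and its index, and the name decode_with_resync is hypothetical.

```python
def decode_with_resync(data, encoding="utf-8"):
    """Decode bytes, replacing each invalid span with U+FFFD.

    Returns the decoded text and a list of (start, end) byte offsets,
    one per invalid span, measured from the beginning of `data`.
    """
    pieces = []
    bad_spans = []
    pos = 0
    while pos < len(data):
        try:
            pieces.append(data[pos:].decode(encoding))
            break
        except UnicodeDecodeError as err:
            # err.start and err.end are relative to the slice we decoded.
            pieces.append(data[pos:pos + err.start].decode(encoding))
            bad_start = pos + err.start
            bad_end = pos + err.end
            bad_spans.append((bad_start, bad_end))
            pieces.append("\ufffd")
            # The caller resumes right after the reported bad span. The
            # max() guard forces progress even if a decoder reported an
            # index that does not move forward, which would otherwise
            # make this loop spin forever.
            pos = max(bad_end, bad_start + 1)
    return "".join(pieces), bad_spans


if __name__ == "__main__":
    text, spans = decode_with_resync(b"caf\xc3\xa9 \xff!")
    print(repr(text), spans)
```

A caller written this way trusts the decoder's reported position completely. Without the guard, an index that points at or before the current position is enough to stall it, which is why the accuracy of that field matters for robustness and not only for diagnostics.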