[Mono-dev] Unhandled Exception in Normalization.cs Combine()

Tom Philpot tom.philpot at logos.com
Fri Jun 19 11:59:53 EDT 2009


Atsushi,

Thanks for the fixes. From a cursory test, it appears this fixes our issues
as well. I assumed that Mono was using MbUnit. I'll send you a test case w/o
MbUnit that's more in line with the current Mono test framework when I get
the chance.

Tom Philpot


On 6/19/09 3:04 AM, "Atsushi Eno" <atsushieno at veritas-vos-liberabit.com> wrote:

> Actually, I was wrong in fixing the first "bug" you reported. It is
> actually .NET that is buggy, although, unlike older Mono, it doesn't result
> in an unhandled exception.
> 
> http://demo.icu-project.org/icu-bin/nbrowser?t=\u03B1\u0313\u0345&s=&uv=0
> 
> To examine the C# implementation, try the following:
> 
> foreach (char c in "\u03B1\u0313\u0345".Normalize ())
>     Console.Write ("{0:X04} ", (int) c);
> 
> .NET outputs: 03B1 0313 0345
> 
> I have a fix that corrects the output to: 1F80
> 
> I'll check in the fix soon. With the fix your test prints all "True".
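A self-contained version of the snippet above, offered as a sketch; the class name `NormalizationDemo` and the `Hex` helper are illustrative and not from the thread. Per the Unicode composition data, U+03B1 + U+0313 composes to U+1F00, which in turn composes with U+0345 to U+1F80, so a conforming NFC implementation should produce that single code unit.

```csharp
using System;
using System.Linq;
using System.Text;

static class NormalizationDemo
{
    // Formats each UTF-16 code unit as four upper-case hex digits, separated by spaces.
    public static string Hex(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("X4")));

    static void Main()
    {
        // GREEK SMALL LETTER ALPHA + COMBINING COMMA ABOVE + COMBINING GREEK YPOGEGRAMMENI
        string input = "\u03B1\u0313\u0345";

        // string.Normalize() with no argument uses NormalizationForm.FormC.
        string nfc = input.Normalize(NormalizationForm.FormC);

        Console.WriteLine(Hex(input));
        Console.WriteLine(Hex(nfc));

        // A conforming implementation composes the sequence to U+1F80.
        Console.WriteLine(nfc == "\u1F80");
    }
}
```

Comparing the hex dump rather than the rendered glyphs avoids the problem, raised later in the thread, of terminals and editors silently composing or mangling the text.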
> 
> Atsushi Eno
> 
> 
> Atsushi Eno wrote:
>> Hi Tom, and Tom :)
>> 
>> I have tried the Hindle version of the test.
>> 
>> Summary: the sample depends on a .NET bug; there are 2 .NET bugs and 1 Mono bug.
>> 
>> This shows exactly that .NET normalization is buggy. Here are the
>> ICU normalization results:
>> http://demo.icu-project.org/icu-bin/nbrowser?t=\u00e1bc&s=&uv=0
>> 
>> i.e. in NFKD, \u00e1bc must be decomposed to \u0061\u0301\u0062\u0063,
>> while .NET returns the same string as the input.
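The NFKD expectation can be checked directly. This is a minimal sketch; `NfkdDemo` and its `Hex` helper are illustrative names, not part of the original test.

```csharp
using System;
using System.Linq;
using System.Text;

static class NfkdDemo
{
    // Formats each UTF-16 code unit as four upper-case hex digits, separated by spaces.
    public static string Hex(string s) =>
        string.Join(" ", s.Select(c => ((int)c).ToString("X4")));

    static void Main()
    {
        // LATIN SMALL LETTER A WITH ACUTE followed by "bc"
        string input = "\u00e1bc";
        string nfkd = input.Normalize(NormalizationForm.FormKD);

        // A conforming implementation decomposes U+00E1 into U+0061 U+0301;
        // returning the input unchanged is the .NET behaviour described above.
        Console.WriteLine(Hex(nfkd));
        Console.WriteLine(nfkd == "\u0061\u0301\u0062\u0063");

        // U+00E1 has only a canonical decomposition, so NFD must agree with NFKD here.
        Console.WriteLine(nfkd == input.Normalize(NormalizationForm.FormD));
    }
}
```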
>> 
>> The sample code is confusing because it feeds each "styleName" output
>> into the next normalization as input. .NET does not correctly decompose
>> it to \u0061\u0301\u0062\u0063, while Mono does. When run on Mono, the
>> test keeps using the correct NFKD result as input to the following
>> normalizations, hence the difference in NFKC (i.e. we have no bug in
>> normalizing to NFKC, contrary to what the test claims).
>> 
>> I have created a modified version that makes this more visible:
>> http://pastebin.ca/1465907
>> 
>> However, there seems to be a Mono bug in NFD-to-NFC and NFKD-to-NFKC
>> composition. I have extracted a simpler test:
>> 
>> string s1 = "\u0061\u0301bc";
>> string s2 = "\u00e1bc";
>> Console.WriteLine (s1.Normalize () == s2);
>> 
>> *Both* Mono and .NET say "False", but it must be "True". See the
>> ICU conversion results:
>> http://demo.icu-project.org/icu-bin/nbrowser?t=\u0061\u0301bc&s=&uv=0
>> Its NFC must be \u00e1\u0062\u0063 (the string s2 above).
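A runnable form of the simpler test, extended with the reverse direction. This is a sketch; `CompositionDemo` and `CanonicallyEquivalent` are illustrative names, and comparing NFC forms ordinally is one common way to test canonical equivalence, not necessarily what Mono's own tests do.

```csharp
using System;
using System.Text;

static class CompositionDemo
{
    // Treats two strings as canonically equivalent when their NFC forms are identical.
    public static bool CanonicallyEquivalent(string a, string b) =>
        string.Equals(a.Normalize(NormalizationForm.FormC),
                      b.Normalize(NormalizationForm.FormC),
                      StringComparison.Ordinal);

    static void Main()
    {
        string s1 = "\u0061\u0301bc"; // decomposed: a + COMBINING ACUTE ACCENT
        string s2 = "\u00e1bc";       // precomposed: LATIN SMALL LETTER A WITH ACUTE

        // Composition (NFD -> NFC): expected True on a conforming implementation.
        Console.WriteLine(s1.Normalize() == s2);

        // Decomposition (NFC -> NFD): expected to give back s1.
        Console.WriteLine(s2.Normalize(NormalizationForm.FormD) == s1);

        Console.WriteLine(CanonicallyEquivalent(s1, s2));
    }
}
```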
>> 
>> I'll work on fixing the composition part of the issue.
>> 
>> I haven't tried the Philpot version, as I have never installed
>> MbUnit on this Windows machine. It would be nicer if the sample
>> compiled and ran with only the standard libraries, so that we could
>> integrate it into our NUnit tests.
>> 
>> Atsushi Eno
>> 
>> 
>> Tom Hindle wrote:
>>> Attached is my small, self-contained test case.
>>> I think the output should be 5 Trues.
>>> 
>>> I am getting 2 Trues and 3 Falses on Mono r136435.
>>> 
>>> Incidentally, .NET returns 5 Trues for this test case.
>>> 
>>> Is there a Bugzilla entry for this issue?
>>> 
>>> 
>>> 
>>> Also, normalization-tables.h now has Windows line endings (CRLF).
>>> 
>>> Thanks
>>> Tom
>>> 
>>> On Thu, 2009-06-18 at 13:51 -0700, Tom Philpot wrote:
>>>> Here is a revision of the test case I sent earlier to the list that
>>>> doesn't rely on any specific encoding (only uses '\uXXXX' characters).
>>>> 
>>>> Hopefully this will be helpful.
>>>> 
>>>> Tom
>>>> 
>>>> 
>>>> On 6/18/09 1:49 PM, "Tom Hindle" <tom_hindle at sil.org> wrote:
>>>> 
>>>>> Hi Guys,
>>>>> 
>>>>> With regard to the recent Normalization changes, I have just run our test
>>>>> suite with recent Mono r136422 and am getting a number of
>>>>> regressions.
>>>>> 
>>>>> 
>>>>> For example:
>>>>> 
>>>>> {
>>>>>     string styleName = "\u00e1bc";
>>>>>     StStyle style = new StStyle();
>>>>>     Cache.LangProject.StylesOC.Add(style);
>>>>>     style.Name = styleName;
>>>>>
>>>>>     FwStyleSheet.StyleInfoCollection styleCollection =
>>>>>         new FwStyleSheet.StyleInfoCollection();
>>>>>     styleCollection.Add(new BaseStyleInfo(style));
>>>>>
>>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormC)));
>>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormD)));
>>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKC)));
>>>>>     Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKD)));
>>>>> }
>>>>> 
>>>>> is now failing, as well as other larger unit tests.
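A self-contained sketch of the same four checks without FieldWorks types or MbUnit. It assumes, hypothetically, that the style collection matches names by comparing their NFC forms; `NormalizedNameSet` is an illustrative stand-in, not the real `FwStyleSheet.StyleInfoCollection`, whose lookup rules are not shown in the thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for FwStyleSheet.StyleInfoCollection: names are stored
// in NFC and lookups are normalized to NFC too, so canonically equivalent
// spellings of a name match.
class NormalizedNameSet
{
    private readonly HashSet<string> names = new HashSet<string>(StringComparer.Ordinal);

    public void Add(string name) => names.Add(name.Normalize(NormalizationForm.FormC));

    public bool Contains(string name) => names.Contains(name.Normalize(NormalizationForm.FormC));
}

static class StyleNameTest
{
    // Runs one lookup per normalization form and returns the results in the
    // order FormC, FormD, FormKC, FormKD.
    public static bool[] Run()
    {
        string styleName = "\u00e1bc";
        var set = new NormalizedNameSet();
        set.Add(styleName);

        var forms = new[] { NormalizationForm.FormC, NormalizationForm.FormD,
                            NormalizationForm.FormKC, NormalizationForm.FormKD };
        var results = new bool[forms.Length];
        for (int i = 0; i < forms.Length; i++)
            results[i] = set.Contains(styleName.Normalize(forms[i]));
        return results;
    }

    static void Main()
    {
        foreach (bool ok in Run())
            Console.WriteLine(ok);
    }
}
```

NFC lookup only unifies canonical equivalents; the FormKC and FormKD checks pass here because "\u00e1bc" contains no compatibility characters, so its compatibility forms are canonically equivalent to the original.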
>>>>> 
>>>>> I will look into this further to try and produce an example test program
>>>>> that doesn't contain references to our code base.
>>>>> 
>>>>> Thanks
>>>>> Tom
>>>>> 
>>>>> On Thu, 2009-06-18 at 15:01 +0900, Atsushi Eno wrote:
>>>>>> Hi,
>>>>>> 
>>>>>> If you mean the test cases from the previous email, those are what
>>>>>> I said include raw characters in your local encoding (Latin1?), which
>>>>>> I cannot read. Replace them all with ASCII escapes (\uxxxx).
>>>>>> Even if the attachment carries its encoding (do you mean BOMs?), it is
>>>>>> not readable in some environments (like the text editor I use on
>>>>>> Windows). Let me repeat: Latin1 is not universal. Don't depend on it
>>>>>> (if you do).
>>>>>> 
>>>>>> Atsushi Eno
>>>>>> 
>>>>>> 
>>>>>> Tom Philpot wrote:
>>>>>>> Atsushi,
>>>>>>> 
>>>>>>> Thanks for the feedback. For some reason, the Mac always composes
>>>>>>> Unicode strings before displaying them. I'll look at the test
>>>>>>> case in corlib tomorrow when I get in to work. Would it be helpful
>>>>>>> for the test cases if I gave you both the FormD bytes and the FormC
>>>>>>> bytes that I think are correct for the test case I sent? Perhaps the
>>>>>>> encoding did not come across in the attachment.
>>>>>>> 
>>>>>>> We have a workaround for the Mac port of our app, which would require
>>>>>>> overriding string.Normalize to p/invoke into Mac OS X's NSString
>>>>>>> library to do normalization. It would work, but we would prefer not
>>>>>>> to have to ship a custom build of Mono. The normalization on .NET
>>>>>>> appears to be "good enough" for our purposes, and we'd just like our
>>>>>>> Mac version to be consistent.
>>>>>>> 
>>>>>>> Tom
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Atsushi Eno [mailto:atsushieno at veritas-vos-liberabit.com]
>>>>>>> Sent: Wed 6/17/2009 7:51 PM
>>>>>>> To: Tom Philpot
>>>>>>> Cc: mono-devel-list at ximian.com
>>>>>>> Subject: Re: [Mono-dev] Unhandled Exception in Normalization.cs Combine()
>>>>>>> You seem to have embedded raw characters in your local encoding, which
>>>>>>> are *not* readable in Japan. Anyway, the input string you
>>>>>>> posted in the previous sample was already in FormC, so the
>>>>>>> conversion looks as if it is "doing nothing".
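A small sketch of why an already-composed input makes Normalize(FormC) appear to do nothing. `AlreadyNormalizedDemo` and `LooksUnchanged` are illustrative names; `string.IsNormalized(NormalizationForm)` is the standard .NET check.

```csharp
using System;
using System.Text;

static class AlreadyNormalizedDemo
{
    // True when normalizing to the given form returns an identical string.
    public static bool LooksUnchanged(string s, NormalizationForm form) =>
        string.Equals(s.Normalize(form), s, StringComparison.Ordinal);

    static void Main()
    {
        string composed = "\u00e1bc";    // already in NFC
        string decomposed = "a\u0301bc"; // NFD spelling of the same text

        // An NFC input comes back unchanged, which can look like "doing nothing".
        Console.WriteLine(composed.IsNormalized(NormalizationForm.FormC));
        Console.WriteLine(LooksUnchanged(composed, NormalizationForm.FormC));

        // A visible change only appears when the input is not yet in NFC.
        Console.WriteLine(decomposed.IsNormalized(NormalizationForm.FormC));
        Console.WriteLine(LooksUnchanged(decomposed, NormalizationForm.FormC));
    }
}
```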
>>>>>>> 
>>>>>>> There is a standalone normalization test, generated from the
>>>>>>> normalization conformance test, in corlib/Mono.Globalization.Unicode.
>>>>>>> We fail about 26000 cases. Far from good, but still better than the
>>>>>>> 35000 failures on .NET.
>>>>>>> 
>>>>>>> Atsushi Eno
>>>>>>> 
>>>>>>> Tom Philpot wrote:
>>>>>>>> Now, string.Normalize(NormalizationForm.FormC) doesn't do anything
>>>>>>>> with Mono (r136228).
>>>>>>>>
>>>>>>>> I've attached some test cases which will hopefully help in tracking
>>>>>>>> down what doesn't work.
>>>>>>>> 
>>>>>>>> On 6/15/09 1:58 AM, "Atsushi Eno" <atsushieno at veritas-vos-liberabit.com>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Hi again,
>>>>>>>>> 
>>>>>>>>> It should be now fixed in trunk.
>>>>>>>>> 
>>>>>>>>> Atsushi Eno
>>>>>>>>> 
>>>>>>>>> Atsushi Eno wrote:
>>>>>>>>>> I'll have a look. However, since 4 years have passed since I wrote
>>>>>>>>>> it, I'll have to revisit the spec, which will take quite some time.
>>>>>>>>>> 
>>>>>>>>>> Atsushi Eno
>>>>>>>>>> 
>>>>>> _______________________________________________
>>>>>> Mono-devel-list mailing list
>>>>>> Mono-devel-list at lists.ximian.com
>>>>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>>> 
>> 


