[Mono-dev] Unhandled Exception in Normalization.cs Combine()

Atsushi Eno atsushieno at veritas-vos-liberabit.com
Fri Jun 19 02:34:57 EDT 2009


Hi Tom, and Tom :)

I have tried Tom Hindle's version of the test.

Summary: the sample depends on a .NET bug; in total there are 2 .NET bugs and 1 Mono bug.

This shows exactly that .NET normalization is buggy. Here are the
ICU normalization results:
http://demo.icu-project.org/icu-bin/nbrowser?t=\u00e1bc&s=&uv=0

i.e. in NFKD, \u00e1bc must be decomposed to \u0061\u0301\u0062\u0063,
while .NET returns the same string as the input.
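
A minimal C# sketch of that check. It assumes a runtime whose
string.Normalize follows the Unicode normalization algorithm. The class
name NormalizationCheck and the Escape helper are illustrative only. The
program prints the NFKD result as \uXXXX escapes so the decomposed and
precomposed forms can be told apart:

```csharp
using System;
using System.Text;

class NormalizationCheck
{
    // Renders each UTF-16 code unit as a \uXXXX escape so that the
    // decomposed and precomposed forms can be told apart in the output.
    public static string Escape(string s)
    {
        var sb = new StringBuilder();
        foreach (char c in s)
            sb.AppendFormat("\\u{0:x4}", (int)c);
        return sb.ToString();
    }

    static void Main()
    {
        string input = "\u00e1bc";                    // precomposed a-acute + "bc"
        string expected = "\u0061\u0301\u0062\u0063"; // a + COMBINING ACUTE ACCENT + "bc"

        string nfkd = input.Normalize(NormalizationForm.FormKD);
        Console.WriteLine("NFKD: {0}", Escape(nfkd));
        Console.WriteLine("NFKD is decomposed: {0}", nfkd == expected);
        Console.WriteLine("NFKD unchanged from input: {0}", nfkd == input);
    }
}
```

On a conforming implementation the second line prints True and the third
prints False. An implementation with the decomposition bug described
above reverses the two.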

The sample code is confusing because it feeds each "styleName" output
into the next normalization as input. .NET does not correctly decompose
it to \u0061\u0301\u0062\u0063, while Mono does. When run on Mono, the
correct NFKD result keeps being used as the input to the following
normalizations, hence the difference in NFKC (i.e. we have no bug in
NFKC normalization, contrary to what the test claims).

I have posted a slightly modified version that makes this more visible:
http://pastebin.ca/1465907

However, there seems to be a Mono bug in NFD-to-NFC and NFKD-to-NFKC
composition. I have extracted a simpler test:

	string s1 = "\u0061\u0301bc";
	string s2 = "\u00e1bc";
	Console.WriteLine (s1.Normalize () == s2);

*Both* Mono and .NET say "False", but it must be "True". See the ICU
conversion results:
http://demo.icu-project.org/icu-bin/nbrowser?t=\u0061\u0301bc&s=&uv=0
The NFC of s1 must be \u00e1\u0062\u0063 (the string s2 above).
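
Because s1 and s2 are canonically equivalent, every normalization form
should map them to the same string, not only NFC. The following C#
sketch checks all four forms at once. The class name EquivalenceCheck and
the SameUnderAllForms method are illustrative only and are not part of
Mono's test suite. The sketch assumes a runtime with full Unicode
normalization support:

```csharp
using System;
using System.Text;

class EquivalenceCheck
{
    static readonly NormalizationForm[] Forms = {
        NormalizationForm.FormC, NormalizationForm.FormD,
        NormalizationForm.FormKC, NormalizationForm.FormKD
    };

    // True if a and b normalize to the same string under every form.
    public static bool SameUnderAllForms(string a, string b)
    {
        foreach (NormalizationForm form in Forms)
            if (a.Normalize(form) != b.Normalize(form))
                return false;
        return true;
    }

    static void Main()
    {
        string s1 = "\u0061\u0301bc"; // decomposed: a + COMBINING ACUTE ACCENT
        string s2 = "\u00e1bc";       // precomposed: LATIN SMALL LETTER A WITH ACUTE

        // Report each form separately so a failing form is easy to spot.
        foreach (NormalizationForm form in Forms)
            Console.WriteLine("{0}: {1}", form, s1.Normalize(form) == s2.Normalize(form));

        // Composition: the NFC form of s1 must be exactly s2.
        Console.WriteLine("NFC(s1) == s2: {0}", s1.Normalize() == s2);
        Console.WriteLine("all forms agree: {0}", SameUnderAllForms(s1, s2));
    }
}
```

A conforming implementation prints True on every line. With the
composition bug, the FormC line and the NFC(s1) == s2 line print False,
as the one-liner does.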

I'll work on fixing the composition part of the issue.

I haven't tried Tom Philpot's version, as I have never installed MbUnit
on this Windows machine. It would be nicer if the sample compiled and ran
with only the standard libraries, so that it could be integrated into our
NUnit tests.

Atsushi Eno


Tom Hindle wrote:
> Attached is my small, self-contained test case.
> I think the output should be 5 Trues.
> 
> I am getting 2 Trues and 3 Falses on Mono version r136435.
> 
> Incidentally, .NET returns 5 Trues for this test case.
> 
> Is there a Bugzilla entry for this issue?
> 
> 
> 
> Also, normalization-tables.h now has Windows line endings (CRLF).
> 
> Thanks
> Tom
> 
> On Thu, 2009-06-18 at 13:51 -0700, Tom Philpot wrote:
>> Here is a revision of the test case I sent earlier to the list that
>> doesn't rely on any specific encoding (only uses '\uXXXX' characters).
>>
>> Hopefully this will be helpful.
>>
>> Tom
>>
>>
>> On 6/18/09 1:49 PM, "Tom Hindle" <tom_hindle at sil.org> wrote:
>>
>>> Hi Guys,
>>>
>>> With regard to the recent normalization changes, I have just run our
>>> test suite with recent Mono r136422, and we are getting a number of
>>> regressions.
>>>
>>>
>>> For example:
>>>
>>> {
>>> string styleName = "\u00e1bc";
>>> StStyle style = new StStyle();
>>> Cache.LangProject.StylesOC.Add(style);
>>> style.Name = styleName;
>>>
>>> FwStyleSheet.StyleInfoCollection styleCollection = new
>>> FwStyleSheet.StyleInfoCollection();
>>> styleCollection.Add(new BaseStyleInfo(style));
>>>
>>>
>>> Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormC)));
>>> Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormD)));
>>> Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKC)));
>>> Assert.IsTrue(styleCollection.Contains(styleName.Normalize(NormalizationForm.FormKD)));
>>> }
>>>
>>> is now failing, as well as other larger unit tests.
>>>
>>> I will look into this further to try and produce an example test
>>> program that doesn't contain references to our code base.
>>>
>>> Thanks
>>> Tom
>>>
>>> On Thu, 2009-06-18 at 15:01 +0900, Atsushi Eno wrote:
>>>> Hi,
>>>>
>>>> If you mean the test cases in the previous email, then those are what
>>>> (as I said) include raw native encoding from your region (Latin1?),
>>>> which I cannot read. Replace them all with ASCII representations
>>>> (\uxxxx). Even if the attachment specifies the encoding (do you mean
>>>> BOMs?), it is not readable in some environments (like the text editor
>>>> I use on Windows). Let me repeat: Latin1 is not universal. Don't
>>>> depend on it (if you do).
>>>>
>>>> Atsushi Eno
>>>>
>>>>
>>>> Tom Philpot wrote:
>>>>> Atsushi,
>>>>>
>>>>> Thanks for the feedback. For some reason, the Mac always composes
>>>>> Unicode strings before displaying them. I'll look at the test case
>>>>> in corlib tomorrow when I get in to work. Would it be helpful for
>>>>> the test cases if I gave you both the FormD bytes and the FormC bytes
>>>>> that I think are correct for the test case I sent? Perhaps the
>>>>> encoding did not come across in the attachment.
>>>>>
>>>>> We have a workaround for the Mac port of our app, which would require
>>>>> overriding string.Normalize to P/Invoke Mac OS X's NSString library
>>>>> to do normalization. It would work, but we would prefer not to have
>>>>> to ship a custom build of Mono. The normalization on .NET appears to
>>>>> be "good enough" for our purposes, and we'd just like our Mac version
>>>>> to be consistent.
>>>>>
>>>>> Tom
>>>>>
>>>>> -----Original Message-----
>>>>> From: Atsushi Eno [mailto:atsushieno at veritas-vos-liberabit.com]
>>>>> Sent: Wed 6/17/2009 7:51 PM
>>>>> To: Tom Philpot
>>>>> Cc: mono-devel-list at ximian.com
>>>>> Subject: Re: [Mono-dev] Unhandled Exception in Normalization.cs Combine()
>>>>> You seem to have embedded raw native encoding from your region that
>>>>> is *not* readable in Japan. Anyway, the input string you posted in
>>>>> the previous sample was already in FormC, so the conversion will
>>>>> look like it is "doing nothing".
>>>>>
>>>>> There is a standalone normalization test generated from the Unicode
>>>>> normalization conformance test in corlib/Mono.Globalization.Unicode.
>>>>> We fail about 26000 cases. Far from good, but still better than the
>>>>> 35000 failures on .NET.
>>>>>
>>>>> Atsushi Eno
>>>>>
>>>>> Tom Philpot wrote:
>>>>>> Now, string.Normalize(NormalizationForm.FormC) doesn't do anything
>>>>>> using Mono (r136228).
>>>>>>
>>>>>> I've attached some test cases which will hopefully help in tracking
>>>>>> down what doesn't work.
>>>>>>
>>>>>> On 6/15/09 1:58 AM, "Atsushi Eno" <atsushieno at veritas-vos-liberabit.com> wrote:
>>>>>>
>>>>>>> Hi again,
>>>>>>>
>>>>>>> It should be now fixed in trunk.
>>>>>>>
>>>>>>> Atsushi Eno
>>>>>>>
>>>>>>> Atsushi Eno wrote:
>>>>>>>> I'll have a look. However, since 4 years have passed since I wrote
>>>>>>>> it, I'll have to revisit the spec, and it will take a fair amount
>>>>>>>> of time.
>>>>>>>>
>>>>>>>> Atsushi Eno
>>>>>>>>
>>>>>
>>>> _______________________________________________
>>>> Mono-devel-list mailing list
>>>> Mono-devel-list at lists.ximian.com
>>>> http://lists.ximian.com/mailman/listinfo/mono-devel-list
>>
>>


