[Mono-dev] [PATCHES] Bug 480178 - System.Globalization.CharUnicodeInfo.GetUnicodeCategory() does not handle surrogate characters appropriately.

Andreas Nahr ClassDevelopment at A-SoftTech.com
Sat May 15 07:36:17 EDT 2010


I'm the author (of a large part) of the current Char class and unfortunately I think you overlooked some things with the changes.

What you are doing is 
(heavily) degrading the performance of most of the char methods (some of which are VERY commonly used)
supporting a single (likely VERY rarely used case) in a single (likely not used very much) method.

1. Lets first start with performance. Unfortunately your micro benchmark is completely omitting the most important speed factor: All the methods were designed to be inlinable (because they tend to be called in tight loops, as they are even within corelib). After your change likely most/none of them would be inlinable anymore. Your benchmark, however, does not CALL A SINGLE METHOD. It just measures the overhead of two additional mathematical operators and a memory access. That is not the big speed-sink. But even then your approach looses 25% perf for no additional gain. The original implementation was the result of (literally) thousands of performance tests.

2. Gain vs. impact. I don't see much sense in changing lots of methods in Char to something worse just to support one corner case that could be handled separately.

3. Imho the alternative is relatively simple: Just leave the Char methods as they are and add the special casing only for the one relevant method. You could just add another table (or two for bi-level table compression or one that reuses the data from the first table) as a separate entity. So you can leave all the original methods intact and ONLY if somebody really uses that method and ONLY if he uses it to query the astral plane using surrogates he actually has to pay the price for it (in the runtime loading the relevant table into memory and using that additional lookup).

Happy hacking

P.S. And sorry for being somewhat harsh on the patches.

-----Ursprüngliche Nachricht-----
Von: mono-devel-list-bounces at lists.ximian.com [mailto:mono-devel-list-bounces at lists.ximian.com] Im Auftrag von Damien Diederen
Gesendet: Freitag, 14. Mai 2010 23:10
An: mono-devel-list at lists.ximian.com
Betreff: [Mono-dev] [PATCHES] Bug 480178 - System.Globalization.CharUnicodeInfo.GetUnicodeCategory() does not handle surrogate characters appropriately.


I just uploaded an updated series of patches addressing bug 480178:


This is ready to be committed as far as I am concerned, but comments are
of course welcome.  Note that I do *not* have commit privileges, so
somebody else will have to take care of it even in case of green light.

>From the “cover letter”:

> Should apply cleanly on top of mono/mcs r157360.
> This version uses the same technique as v2, as suggested by Paolo
> (cf. comment #18), but supports two sets of tables: one compatible
> with v2.0.50727...v3.5.21022 of Microsoft's framework, and one
> matching v4.0.30319 (which incorporates substantial changes).  The
> appropriate set is automatically chosen based on the 'corlib' profile.
> The mechanism can be extended to an arbitrary number of versions, and
> has also been tested with v1.1.4322, but these additional tables have
> been left out not to bloat the Mono binaries as recent 'mcs' link to
> the 'net_2_0' profile anyway.
> Please cf. the individual ChangeLog entries for more details about the
> compatibility and performance implications.  The first patch is for
> 'mono'; the six others for 'mcs'.


tel:  +49 21 89 29 39
cell: +49 174 34 89 428

“Strong Opinions, Weakly Held”
                 — Bob Johansen
Mono-devel-list mailing list
Mono-devel-list at lists.ximian.com

More information about the Mono-devel-list mailing list