[Mono-bugs] [Bug 480178] System.Globalization.CharUnicodeInfo.GetUnicodeCategory() does not handle surrogate characters appropriately.

bugzilla_noreply at novell.com
Tue May 4 15:01:17 EDT 2010


http://bugzilla.novell.com/show_bug.cgi?id=480178

http://bugzilla.novell.com/show_bug.cgi?id=480178#c18


--- Comment #18 from Damien Diederen <dd at crosstwine.com> 2010-05-04 19:01:15 UTC ---
Hi Paolo,

Here is a second patch series that implements your suggestion. The code is
indeed much simpler, and faster than the GLib-style encoding (see the
microbenchmark results below). This version still unconditionally mimics the
behavior of .NET 3.5 SP1.

  | Range       | Iterations | Orig. | GLib   | Paolo |
  |-------------+------------+-------+--------+-------|
  | 0000-00FF   |     256000 | 0.30s | 0.43s  | 0.35s |
  | 0000-FFFF   |      16000 | 4.75s | 13.33s | 5.67s |
  | 1000-FFFF   |      15000 | 4.18s | 11.80s | 4.99s |
  | 0000-10FFFF |       1000 | N/A   | 11.18s | 5.64s |
  |-------------+------------+-------+--------+-------|
  | Data size   |            | 64kB  | 22kB   | 30kB  |

"Orig." denotes mono/mcs at revision 156650, which only supports the Basic
Multilingual Plane (BMP). "GLib" is a straightforward translation of the
encoding used in that library, and "Paolo" is the simple bi-level table with
page sharing that you suggested. All times are in seconds.
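For readers unfamiliar with the scheme, here is a minimal sketch of how a bi-level table with page sharing might be built. It assumes the input is a flat array holding one category byte per code point (0x110000 entries); the names `BiLevelTable`, `Build` and `Lookup` are hypothetical, and this is not the generator actually used in corlib. The code space is cut into 256-entry pages, pages with identical contents are stored only once in `Data`, and `Index` maps each page number to the offset of its contents. That is why large uniform ranges (such as unassigned planes) cost almost nothing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch of a two-level ("bi-level") lookup table with page
// sharing. Illustrative only; not Mono's actual table format or generator.
public sealed class BiLevelTable
{
    const int PageBits = 8;              // 256 code points per page
    const int PageSize = 1 << PageBits;
    const int PageMask = PageSize - 1;

    // One entry per page: offset of that page's first byte in Data.
    public readonly int[] Index;
    // Concatenation of the distinct pages only.
    public readonly byte[] Data;

    BiLevelTable(int[] index, byte[] data) { Index = index; Data = data; }

    // Builds the table from a flat array with one category value per code
    // point. The length must be a multiple of the page size (e.g. 0x110000).
    public static BiLevelTable Build(byte[] flat)
    {
        if (flat.Length % PageSize != 0)
            throw new ArgumentException("length must be a multiple of 256");

        int pageCount = flat.Length / PageSize;
        var index = new int[pageCount];
        var data = new List<byte>();
        // Maps page contents (encoded as a string key) to the offset at
        // which those contents were first stored in data.
        var seen = new Dictionary<string, int>();

        for (int p = 0; p < pageCount; p++) {
            string key = Convert.ToBase64String(flat, p * PageSize, PageSize);
            if (!seen.TryGetValue(key, out int offset)) {
                // First time we see these contents: append a new page.
                offset = data.Count;
                for (int i = 0; i < PageSize; i++)
                    data.Add(flat[p * PageSize + i]);
                seen.Add(key, offset);
            }
            // Identical pages share the same offset.
            index[p] = offset;
        }
        return new BiLevelTable(index, data.ToArray());
    }

    // High bits select the page offset, low 8 bits select the entry.
    public byte Lookup(int cp) => Data[Index[cp >> PageBits] + (cp & PageMask)];
}
```

The lookup has the same shape as the expression in the benchmark method (`category_index [cp >> 8] + (cp & 0xff)`). Encoding pages as Base64 strings is merely a convenient way to detect duplicates in a sketch; any deduplication scheme works, since only the resulting `Index` and `Data` arrays are used at run time.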

Here is the method that was temporarily added to corlib to obtain these
numbers:

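    // Benchmark helper: sums the category values of every code point in
    // [from, to], repeated 'count' times. category_index and category_data
    // stand for the internal tables being measured.
    public static int CategoryMB (int count, int from, int to)
    {
        int sum = 0;

        unsafe {
            for (int i = 0; i < count; i++) {
                for (int cp = from; cp <= to; cp++)
                    // Tweak this to match the internal table format.
                    // Bi-level lookup: the high bits select a page offset,
                    // the low 8 bits index into that page.
                    sum += category_data
                        [category_index [cp >> 8] + (cp & 0xff)];
            }
        }

        return sum;
    }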

(If anybody wants to experiment with this, I can zip up a patch series
containing the variants of the internal tables, of this method, and the
benchmark program.)
