[Mono-bugs] [Bug 480178] System.Globalization.CharUnicodeInfo.GetUnicodeCategory() does not handle surrogate characters appropriately.
bugzilla_noreply at novell.com
Tue May 4 15:01:17 EDT 2010
http://bugzilla.novell.com/show_bug.cgi?id=480178
http://bugzilla.novell.com/show_bug.cgi?id=480178#c18
--- Comment #18 from Damien Diederen <dd at crosstwine.com> 2010-05-04 19:01:15 UTC ---
Hi Paolo,
Here is a second series that accommodates your suggestion. The code is indeed
much simpler and faster (below are the results of a simple microbenchmark).
This version still unconditionally mimics .NET 3.5 SP1.
| Range       | Iterations | Orig. | GLib   | Paolo |
|-------------+------------+-------+--------+-------|
| 0000-00FF   |     256000 | 0.30s | 0.43s  | 0.35s |
| 0000-FFFF   |      16000 | 4.75s | 13.33s | 5.67s |
| 1000-FFFF   |      15000 | 4.18s | 11.80s | 4.99s |
| 0000-10FFFF |       1000 | N/A   | 11.18s | 5.64s |
|-------------+------------+-------+--------+-------|
| Data size   |            | 64kB  | 22kB   | 30kB  |
"Orig." denotes mono/mcs at 156650, which only supports the BMP, "GLib" is a
straightforward translation of the encoding used in that library, and "Paolo"
is the simple bi-level table with page sharing.
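For readers unfamiliar with the layout, here is a minimal sketch of how a bi-level table with page sharing could be built from a flat per-code-point array. It is not the generator used for the patch: the class BiLevelTable, its Build and Lookup methods, and the use of a Base64 string as the page key are all assumptions made for illustration. Only the lookup expression, data [index [cp >> 8] + (cp & 0xff)], follows the benchmark method shown further down.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not Mono's actual table generator.
public static class BiLevelTable
{
    public const int PageSize = 256;

    // Splits a flat per-code-point array into 256-entry pages and stores
    // each distinct page only once. index [cp >> 8] holds the offset of
    // the page inside data, so identical pages share the same storage.
    public static void Build (byte[] flat, out ushort[] index, out byte[] data)
    {
        if (flat.Length % PageSize != 0)
            throw new ArgumentException ("length must be a multiple of 256");

        int pages = flat.Length / PageSize;
        index = new ushort [pages];
        var offsets = new Dictionary<string, int> ();
        var shared = new List<byte> ();

        for (int p = 0; p < pages; p++) {
            // Assumption: the page contents, Base64-encoded, serve as the key.
            string key = Convert.ToBase64String (flat, p * PageSize, PageSize);
            int offset;
            if (!offsets.TryGetValue (key, out offset)) {
                // First time this page is seen: append it to the shared data.
                offset = shared.Count;
                offsets.Add (key, offset);
                for (int i = 0; i < PageSize; i++)
                    shared.Add (flat [p * PageSize + i]);
            }
            // Throws if the shared data outgrows a 16-bit offset.
            index [p] = checked ((ushort) offset);
        }
        data = shared.ToArray ();
    }

    // Two array reads per lookup: one for the page offset, one for the entry.
    public static byte Lookup (ushort[] index, byte[] data, int cp)
    {
        return data [index [cp >> 8] + (cp & 0xff)];
    }
}
```

Long runs of code points with the same category (unassigned planes, CJK blocks) produce identical pages, so they cost a single page of data plus one index entry each. That kind of sharing would explain a table well below the flat 64kB while keeping the lookup to two array reads.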
Here is the method which was temporarily added to corlib in order to obtain
these numbers:
public static int CategoryMB (int count, int from, int to)
{
    int sum = 0;
    unsafe {
        for (int i = 0; i < count; i++) {
            for (int cp = from; cp <= to; cp++)
                // Tweak this to match the internal table format.
                sum += category_data
                    [category_index [cp >> 8] + (cp & 0xff)];
        }
    }
    return sum;
}
(I can zip a patch series which includes the various variants of the internal
tables, of this method, and of the benchmark program if somebody wants to play
with this.)
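For anyone who wants to reproduce this kind of measurement outside corlib, a rough harness could look like the sketch below. It is not the benchmark program mentioned above: the class CategoryBench and its Run and Time methods are hypothetical names, and the lookup is passed in as a delegate. The delegate call adds overhead that the in-corlib method avoids, so absolute timings would not be comparable with the table.

```csharp
using System;
using System.Diagnostics;

// Hypothetical harness for illustration; not the benchmark used for the table.
public static class CategoryBench
{
    // Calls lookup for every code point in [from, to], count times over.
    // Returning the sum keeps the calls from being optimized away.
    public static int Run (Func<int, int> lookup, int count, int from, int to)
    {
        int sum = 0;
        for (int i = 0; i < count; i++)
            for (int cp = from; cp <= to; cp++)
                sum += lookup (cp);
        return sum;
    }

    // Wall-clock time of one Run, measured with Stopwatch.
    public static TimeSpan Time (Func<int, int> lookup, int count,
                                 int from, int to, out int sum)
    {
        Stopwatch sw = Stopwatch.StartNew ();
        sum = Run (lookup, count, from, to);
        sw.Stop ();
        return sw.Elapsed;
    }
}
```

A BMP-only run in the spirit of the second table row might be invoked as CategoryBench.Time (cp => (int) char.GetUnicodeCategory ((char) cp), 16000, 0x0000, 0xFFFF, out sum).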