[Mono-bugs] [Bug 480178] System.Globalization.CharUnicodeInfo.GetUnicodeCategory() does not handle surrogate characters appropriately.

Wed Apr 21 15:40:49 EDT 2010

http://bugzilla.novell.com/show_bug.cgi?id=480178

http://bugzilla.novell.com/show_bug.cgi?id=480178#c17

--- Comment #17 from Damien Diederen <dd at crosstwine.com> 2010-04-21 19:40:47 UTC ---
Hi Paolo,

(In reply to comment #16)
> I was playing with a bi-level table compression myself a few months
> ago, so the general approach is fine by me.

Okay.

> On the specific implementation, I'm not sure some of the additional complexity
> in your changes is worth it. Let's consider a 256 byte page size. Any category
> lookup could be done with:
>   char_data [char_start [val >> 8] + (val & 0xff)]
> this is more compact and in most cases should be better than the branchy code
> in your patch. Care to try that out or did you already test something similar?

Possibly.  This implementation is a (more or less) direct port of
GLib's, which itself comes from libunicode, and I must admit I haven't
had time to research the history of that solution, nor to explore
alternative ones.

This is definitely worth trying and measuring, though.  I will look
into it before submitting an updated series.
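
For reference, here is a minimal sketch of the simple two-level lookup
suggested above, assuming 256-entry pages.  The names char_start and
char_data follow the quoted expression; everything else (toy_category,
build_tables, lookup_category, and the three toy categories) is
hypothetical and only stands in for real Unicode data.  It is meant as
something to measure against, not as the final table layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_BITS  8
#define PAGE_SIZE  (1u << PAGE_BITS)            /* 256 entries per page */
#define MAX_CP     0x10FFFFu                    /* last Unicode code point */
#define NUM_PAGES  ((MAX_CP >> PAGE_BITS) + 1)  /* 0x1100 pages */

/* Stage 1: per-page offset into char_data.
 * Stage 2: page contents, with identical pages stored only once. */
static uint32_t char_start[NUM_PAGES];
static uint8_t  char_data[NUM_PAGES * PAGE_SIZE]; /* worst-case size, toy only */
static uint32_t char_data_len;

/* Hypothetical stand-in for the real category data:
 * 1 = ASCII digit, 2 = surrogate code point, 0 = everything else. */
static uint8_t toy_category(uint32_t cp)
{
    if (cp >= 0x30u && cp <= 0x39u) return 1;
    if (cp >= 0xD800u && cp <= 0xDFFFu) return 2;
    return 0;
}

/* Fill both stages, reusing an existing page when the contents match. */
static void build_tables(void)
{
    uint8_t page[PAGE_SIZE];

    char_data_len = 0;
    for (uint32_t p = 0; p < NUM_PAGES; p++) {
        uint32_t off;

        for (uint32_t i = 0; i < PAGE_SIZE; i++)
            page[i] = toy_category((p << PAGE_BITS) | i);

        /* Linear search for an identical page already stored. */
        for (off = 0; off < char_data_len; off += PAGE_SIZE)
            if (memcmp(char_data + off, page, PAGE_SIZE) == 0)
                break;

        if (off == char_data_len) {             /* not found: append it */
            memcpy(char_data + off, page, PAGE_SIZE);
            char_data_len += PAGE_SIZE;
        }
        char_start[p] = off;
    }
}

/* The branch-free lookup from the quoted expression. */
static uint8_t lookup_category(uint32_t val)
{
    assert(val <= MAX_CP);
    return char_data[char_start[val >> PAGE_BITS] + (val & 0xff)];
}
```

Note that val is a full code point here, so a caller holding a UTF-16
surrogate pair would have to combine it before the lookup; that step is
not shown.  In a real build the two arrays would be generated offline
and emitted as constant data rather than filled at startup.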

> As for multiple versions of the data, we likely just want to use the
> latest, but once we have numbers about the cost of this we could
> reconsider.

Okay; I will focus on getting numbers for the simple lookup technique
first.
