[Mono-list] conversions
Jonathan Pryor
jonpryor@vt.edu
Tue, 05 Oct 2004 18:52:52 -0400
A quick perusal through Perl's "Category.pl" shows this:
(1) Numbers are categorized as "Nd"
(2) The only ranges that are "Nd" seem to be:
0030 - 0039 '0' - '9'
0660 - 0669 ARABIC-INDIC DIGIT 0 - 9 (same order as ASCII)
06F0 - 06F9 EXTENDED ARABIC-INDIC DIGIT 0-9 (same order)
0966 - 096F DEVANAGARI DIGIT 0-9
09E6 - 09EF BENGALI DIGIT 0-9
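0A66 - 0A6F GURMUKHI DIGIT 0-9
0AE6 - 0AEF GUJARATI DIGIT 0-9
0B66 - 0B6F ORIYA DIGIT 0-9
0BE7 - 0BEF TAMIL DIGIT 1-9
0C66 - 0C6F TELUGU DIGIT 0-9
0CE6 - 0CEF KANNADA DIGIT 0-9
0D66 - 0D6F MALAYALAM DIGIT 0-9
0E50 - 0E59 THAI DIGIT 0-9
0ED0 - 0ED9 LAO DIGIT 0-9
0F20 - 0F29 TIBETAN DIGIT 0-9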
... Plus 8 more...
I'm too lazy to look at all of these ranges, but the ones I did look at
all had digits in the order 0..9. The subtraction should be legal for
all of these glyphs. (Which is probably by design; it would be very odd
-- broken? -- to have so many digits in the "right" order, and then have
a few in a different order...)
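To make the ordering argument concrete, here is a small C# sketch, assuming a current .NET runtime (its Unicode tables are newer than the Perl data above, though these particular ranges have not changed). It checks a sample of the listed ranges against char.GetNumericValue, and shows what the in-order property buys you: you can subtract a range's own zero digit, whereas subtracting '0' gives the right value only for ASCII '0'..'9'. DigitOrder, IsOrderedRange and DigitValue are illustrative names, not part of Mono.

```csharp
using System;

static class DigitOrder
{
    // Zero digits of a few of the "Nd" ranges listed above (a sample, not all of them).
    static readonly char[] Zeros = { '\u0030', '\u0660', '\u06F0', '\u0966', '\u09E6' };

    // True if zero .. zero+9 are all decimal digits whose numeric value equals
    // their offset from 'zero', i.e. the range runs 0..9 in order.
    public static bool IsOrderedRange(char zero)
    {
        for (int i = 0; i <= 9; i++)
        {
            char c = (char)(zero + i);
            if (!char.IsDigit(c) || char.GetNumericValue(c) != i)
                return false;
        }
        return true;
    }

    // Hypothetical helper (not Mono's Val): value of one decimal digit in any
    // script. Uses the framework's numeric value rather than "c - '0'",
    // which is only correct for ASCII '0'..'9'.
    public static int DigitValue(char c)
    {
        if (!char.IsDigit(c))
            throw new ArgumentException("not a decimal digit", nameof(c));
        return (int)char.GetNumericValue(c);
    }

    static void Main()
    {
        foreach (char zero in Zeros)
            Console.WriteLine($"U+{(int)zero:X4}: ordered = {IsOrderedRange(zero)}");

        char arabicOne = '\u0661';                    // ARABIC-INDIC DIGIT ONE
        Console.WriteLine(arabicOne - '0');           // 1585: offset from ASCII '0'
        Console.WriteLine(arabicOne - '\u0660');      // 1: offset from its own zero
        Console.WriteLine(DigitValue(arabicOne));     // 1

        // U+4E00 (ichi) is "Letter, Other", so IsDigit rejects it.
        Console.WriteLine(char.GetUnicodeCategory('\u4E00'));   // OtherLetter
    }
}
```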
GNOME's Character Map program (gucharmap) is very handy for looking up
the Unicode Category a character belongs to. Too bad the opposite
direction (Unicode Category -> characters) tends to be more difficult
(hence consulting Perl's internal tables).
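For the Unicode Category -> characters direction, the framework can do the lookup itself. A rough C# sketch, assuming a current .NET runtime: it scans every char value and groups consecutive code points of one UnicodeCategory into ranges. It only sees the 16-bit char range (the Basic Multilingual Plane), and what it prints depends on the runtime's Unicode version, so expect more DecimalDigitNumber ranges than the 2004 list above. CategoryRanges and Find are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

static class CategoryRanges
{
    // Returns (Start, End) pairs of consecutive BMP code points in 'category'.
    public static List<(int Start, int End)> Find(UnicodeCategory category)
    {
        var ranges = new List<(int Start, int End)>();
        int start = -1;                               // -1 means "not inside a run"
        for (int cp = 0; cp <= char.MaxValue; cp++)
        {
            bool match = char.GetUnicodeCategory((char)cp) == category;
            if (match && start < 0)
            {
                start = cp;                           // a new run begins
            }
            else if (!match && start >= 0)
            {
                ranges.Add((start, cp - 1));          // the run ended at cp - 1
                start = -1;
            }
        }
        if (start >= 0)
            ranges.Add((start, (int)char.MaxValue));  // run reaching U+FFFF
        return ranges;
    }

    static void Main()
    {
        foreach (var (s, e) in Find(UnicodeCategory.DecimalDigitNumber))
            Console.WriteLine($"{s:X4} - {e:X4}");
    }
}
```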
- Jon
On Tue, 2004-10-05 at 07:31, Polton, Richard (IT) wrote:
> Thanks for this. Is it fair to say, then, that only Arabic numerals are
> counted as digits? Even though other numeric characters have integer
> values?
>
> -----Original Message-----
> From: Jonathan Pryor [mailto:jonpryor@vt.edu]
> Sent: 05 October 2004 11:32
> To: Polton, Richard (IT)
> Cc: Jambunathan Jambunathan; mono-list@lists.ximian.com
> Subject: RE: [Mono-list] conversions
>
> On Tue, 2004-10-05 at 04:34, Polton, Richard (IT) wrote:
> > In fact, having given it further thought, I have a couple of
> > questions:
> >
> > i) if I sit at a Japanese terminal (for example) and enter '-', i.e.
> > ichi or 'one', is this a valid Unicode character?
>
> Yes.
>
> > ii) how wide is the 'char' datatype? I assume it contains Unicode
> > rather than single-byte ASCII.
>
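> A 16-bit unsigned value holding one UTF-16 code unit, so it supports
> Unicode (characters outside the Basic Multilingual Plane take two chars,
> a surrogate pair).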
>
> > iii) if entering 'ichi' is valid, and char contains Unicode, then I
> > suspect that the below subtraction will return a number substantially
> > greater than one.
>
> No. At least, not if it's remotely like CVS HEAD:
>
> public static int Val (char Expression) {
>     if (char.IsDigit(Expression)) {
>         return Expression - '0';
>     }
>     else {
>         throw new ArgumentException();
>     }
> }
>
> Ichi isn't a digit, so it will generate an ArgumentException.
>
> (Assuming that Ichi is Unicode U+4E00, which certainly looks like '-'.
> It's in the Unicode category "Letter, Other".)
>
> The subtraction should be safe, as (1) it's only done on digits, and (2)
> Unicode follows the ASCII character ordering (for code points 0-127), which
> permits this subtraction.
>
> - Jon
> _______________________________________________
> Mono-list maillist - Mono-list@lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-list