[Mono-list] Character coding auto-detection in plain-text files

Antonello Provenzano antonello at deveel.com
Mon Mar 12 04:13:01 EDT 2007


Pedro,

> A port exists in C# but is very outdated

The fact is that the C# port of CharDet was made from the Java version,
and if you check, the original JCharDet is also quite outdated (its
latest release was 3 years ago).

> This library would be of great help to many applications, mostly those
> working with files in different encodings, but basically any
> application reading plain-text files.

I haven't tried it yet, but I believe the current version should still
work for detecting character encodings, since the encoding tables have
not changed since that time.
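
To make the idea of auto-detection concrete, here is a minimal sketch in
Python. It is not the Mozilla/CharDet algorithm and not the API of any of
the ports mentioned in this thread: it only checks for a byte order mark
and then tries a strict UTF-8 decode. The function name guess_encoding
and the latin-1 fallback are assumptions made for illustration.

```python
import codecs

# Byte order marks, longest first: the UTF-32 LE mark begins with the
# same two bytes as the UTF-16 LE mark, so it has to be tested earlier.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def guess_encoding(data: bytes, fallback: str = "latin-1") -> str:
    """Return a Python codec name that is a plausible guess for `data`.

    Hypothetical helper: BOM check first, then strict UTF-8 validation,
    then a caller-supplied fallback that is a guess, not a detection.
    """
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name
    try:
        # Strict decoding raises on any invalid UTF-8 byte sequence.
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return fallback


if __name__ == "__main__":
    sample = "héllo".encode("latin-1")
    print(guess_encoding(sample))
```

Pure ASCII input is reported as utf-8 here, and anything that is not
valid UTF-8 simply gets the fallback. Telling legacy encodings apart
(the hard part) needs the kind of statistical analysis that the Mozilla
detector and its ports do.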

My 2 cents.

Cheers,
Antonello

On 3/10/07, Pedro Castro <mail at pedrocastro.org> wrote:
> Hi,
>
> This comes first as a question: is there currently a way to autodetect
> encodings in text files / strings?
>
> I realize there isn't, so I would like to ask if anyone is interested in
> going forward with this. Mozilla has a great detector, written in C,
> which has been ported to other languages, like Java
> (http://jchardet.sourceforge.net/) and Python
> (http://chardet.feedparser.org/) for instance. A port exists in C# but
> is very outdated
> (http://www.conceptdevelopment.net/Localization/NCharDet/).
>
> This library would be of great help to many applications, mostly those
> working with files in different encodings, but basically any
> application reading plain-text files.
>
> --
> Pedro Castro
> http://www.pedrocastro.org
> _______________________________________________
> Mono-list maillist  -  Mono-list at lists.ximian.com
> http://lists.ximian.com/mailman/listinfo/mono-list
>

