[mono-vb] WebCrawler in vb.net (mono)

quandary quandary82 at hailmail.net
Thu Feb 18 14:17:48 EST 2010


I've wanted to do that for a long time.

You could take a look at Apache Lucene, a Java search library, which you
could port to .NET.
Perhaps you can find a way to compile the Lucene library from Java
source/bytecode directly to .NET.

Another option is to extend this CodeProject project:
http://www.codeproject.com/KB/IP/Crawler.aspx
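
To give a rough idea of what the crawling part involves, here is a
minimal sketch of my own (it is not taken from the CodeProject code). It
is written in Python only to keep it short; the same steps should map
onto VB.NET. The names LinkExtractor, same_site_links, add_to_tree and
crawl are made up for illustration, and fetch is a hypothetical callback
that returns the HTML of a URL, so the download code is up to you.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def same_site_links(base_url, html):
    """Return absolute http(s) links found in html that stay on base_url's host."""
    parser = LinkExtractor()
    parser.feed(html)
    host = urlparse(base_url).netloc
    found = []
    for href in parser.links:
        absolute = urljoin(base_url, href).split("#")[0]  # drop fragments
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            if absolute not in found:
                found.append(absolute)
    return found


def add_to_tree(tree, url):
    """Insert the path segments of url into a nested dict, one dict per node."""
    node = tree
    for segment in urlparse(url).path.strip("/").split("/"):
        if segment:
            node = node.setdefault(segment, {})
    return tree


def crawl(start_url, fetch, max_pages=50):
    """Breadth-first crawl of one site; fetch(url) must return the page HTML."""
    seen = {start_url}
    queue = deque([start_url])
    tree = {}
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        add_to_tree(tree, url)
        for link in same_site_links(url, fetch(url)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return tree
```

The nested dict that crawl returns has one entry per path segment, so it
could be walked recursively to fill a treeview, one node per key.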

Then you need a ranking algorithm, such as Google's PageRank or, perhaps
better, something like Yahoo's TrustRank. For a large index you also need
a parallel computation library and cluster software for computing the
eigenvectors of the Markov chains (indexing).

I found this site about PageRank to be particularly useful because of
its incredible simplicity:
http://www.peterbe.com/PageRank-in-Python
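
For a small link graph the eigenvector computation needs no cluster at
all. Below is a minimal sketch of PageRank by power iteration. It is my
own illustration, not the code from the page above. The function name
pagerank, the damping factor of 0.85 and the fixed iteration count are
assumptions chosen for the example, and pages without outgoing links
are handled by spreading their rank evenly over all pages.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Approximate PageRank for a dict {page: [pages it links to]}."""
    # Collect every page, including ones that only appear as link targets.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start from a uniform distribution over all pages.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly among its outgoing links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

The ranks always sum to 1, since each pass only redistributes the
existing rank. A real index would use sparse matrices and a convergence
check instead of a fixed number of passes.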


On 02/17/2010 03:21 PM, Mauro Risonho de Paula Assumpção wrote:
> I am developing open source software that needs a web crawler. I
> would like help from the list. The idea is to scan the structure of
> the site (HTTP and HTTPS) and build it up in a treeview in vb.net
> with GTK (Mono). Does anyone have any ideas?
>
> Thanks
>
