Newsgroup: alt.internet.search-engines

www.cryer.info
Managed Newsgroup Archive

Web crawler and Content analysis

Subject: Web crawler and Content analysis
Posted by: "Samir" (samir.sangha..@gmail.com)
Date: 8 Feb 2006 00:53:59 -0800

Hello,

I wish to monitor a group of 150-200 websites on a daily basis. Examples
of the information I expect from this solution/application/search engine:

1. Document statistics, such as the number and size of HTML, DOC, PDF,
etc. files
2. Compare the most recent version of a document with the previous
version and show the differences
3. Last update time of the content / site
4. HTML content parsing: W3C standards compliance
5. Meta information
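Items 1, 2, 3 and 5 can be approximated with the Python standard library alone. This is a minimal sketch under several assumptions: the file type is guessed from the URL extension (the Content-Type header would be more reliable), the "last update" comes from the server's optional Last-Modified header (many dynamic pages do not send it), and yesterday's snapshot is stored somewhere between runs. The names `fetch`, `document_stats`, `diff_versions` and `extract_meta` are hypothetical, not part of any existing tool. Item 4 (W3C compliance) is not covered; the W3C Markup Validation Service is the usual tool for that.

```python
import difflib
import posixpath
import urllib.request
from collections import defaultdict
from html.parser import HTMLParser
from urllib.parse import urlparse


def fetch(url, timeout=30):
    """Download a URL; return (body bytes, Last-Modified header or None)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read(), resp.headers.get("Last-Modified")


def document_stats(docs):
    """Count and total size per file type (item 1).

    docs: iterable of (url, size_in_bytes) pairs. Assumption: the type is
    taken from the URL extension; URLs with no extension count as "html".
    """
    stats = defaultdict(lambda: {"count": 0, "bytes": 0})
    for url, size in docs:
        ext = posixpath.splitext(urlparse(url).path)[1].lower().lstrip(".")
        ext = ext or "html"
        stats[ext]["count"] += 1
        stats[ext]["bytes"] += size
    return dict(stats)


def diff_versions(old_text, new_text, label="page"):
    """Unified diff of the previous vs. current snapshot (item 2).

    Returns an empty string when the two versions are identical.
    """
    return "\n".join(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=label + " (previous)", tofile=label + " (current)",
        lineterm="",
    ))


class MetaParser(HTMLParser):
    """Collect the <title> text and <meta name|http-equiv=... content=...>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            a = dict(attrs)
            key = a.get("name") or a.get("http-equiv")
            if key and a.get("content") is not None:
                self.meta[key.lower()] = a["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_meta(html):
    """Return (title, {meta key: content}) for an HTML string (item 5)."""
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return parser.title.strip(), parser.meta
```

A daily job would call `fetch()` for each of the 150-200 URLs, record the size and Last-Modified value, save the decoded body, and pass yesterday's and today's text to `diff_versions()`; discovering the DOC/PDF links on each site would still need a crawler on top of this.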

Please help.


Replies:
