I wish to monitor a group of 150-200 websites on a daily basis. The information I expect this solution/application/search engine to provide includes:
1. Document statistics, such as the number and size of HTML, DOC, PDF, etc. files
2. Comparison of the latest version of each document with the previous version, showing the differences
3. Date of the last content/site update
4. HTML content parsing: W3C standards compliance
5. Meta information
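Items 1, 2 and 5 can be roughed out with the Python standard library alone. The following is a minimal sketch, assuming the daily crawler (not shown) already stores the previous and latest HTML of each page as strings, along with a list of `(url, size_in_bytes)` pairs for every document it found. The function names are hypothetical. Paths without a file extension are assumed to be HTML pages.

```python
import difflib
import posixpath
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse


def diff_versions(old: str, new: str, name: str = "page") -> str:
    """Item 2: unified diff between the previous and the latest snapshot."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"{name} (previous)",
        tofile=f"{name} (latest)",
    ))


class MetaParser(HTMLParser):
    """Item 5: collects name/property/http-equiv -> content from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            a = dict(attrs)
            key = a.get("name") or a.get("property") or a.get("http-equiv")
            # Tags such as <meta charset="..."> carry no key/content pair; skip them.
            if key and a.get("content") is not None:
                self.meta[key.lower()] = a["content"]


def extract_meta(html: str) -> dict:
    parser = MetaParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def document_stats(urls_with_sizes):
    """Item 1: count and total size per file extension.

    Assumption: a URL path with no extension (e.g. "/") is counted as "html".
    """
    counts, sizes = Counter(), Counter()
    for url, size in urls_with_sizes:
        path = urlparse(url).path
        ext = posixpath.splitext(path)[1].lstrip(".").lower() or "html"
        counts[ext] += 1
        sizes[ext] += size
    return counts, sizes
```

Using plain string diffs keeps the comparison independent of how the snapshots are stored. A real monitor would also need to fetch the pages each day and persist yesterday's copy, which this sketch leaves out.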