Newsgroups : Borland : borland.public.delphi.internet.winsock : 2006 May : parse HTML result

www.cryer.info
Managed Newsgroup Archive

parse HTML result

Subject: parse HTML result
Posted by: "Bob Bedford" (b..@bedford.com)
Date: Fri, 26 May 2006 16:46:01

Hello there,

We have a commercial website that lists many sports articles from various
real shops.

The main purpose of the site is to give our clients a way to sell their
goods without needing a complex website like the one we are building, and to
benefit from the improvements we will make in the future. The site is
written in PHP.

Some of our clients already have their own (quite basic) website, fortunately
sold by the same company and with the same structure: a table which contains
10 articles per page (with several pages if there are more than 10 articles),
each row with a link to the article description. Those clients have asked us
to get the content from their website so they don't have to enter the values
twice manually. So we must create a "robot" that goes to a web page (the main
page of the articles), gets all the articles from that page (fetching the
content of the detail page linked from each article row), and then reads the
values we must use to fill in their articles in our database.
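
To make the first step of such a robot concrete, here is a minimal sketch of collecting the detail-page links from a listing page. It is written in Python using only the standard library, purely as an illustration. The names `TableLinkCollector` and `extract_article_links`, the example URLs, and the assumption that every article link sits inside a `<table>` are hypothetical and would need adapting to the real pages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class TableLinkCollector(HTMLParser):
    """Collect the href of every <a> tag found inside a <table>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url   # used to resolve relative links
        self.table_depth = 0       # > 0 while we are inside a table
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.table_depth += 1
        elif tag == "a" and self.table_depth > 0:
            href = dict(attrs).get("href")
            if href:
                # Turn "detail.asp?id=1" into an absolute URL.
                self.links.append(urljoin(self.base_url, href))

    def handle_endtag(self, tag):
        if tag == "table" and self.table_depth > 0:
            self.table_depth -= 1


def extract_article_links(html, base_url):
    """Return absolute URLs of all links located inside tables (assumed layout)."""
    parser = TableLinkCollector(base_url)
    parser.feed(html)
    return parser.links
```

Each returned URL could then be downloaded and parsed the same way to read the article values; how those values are laid out on the detail page is not described here, so that part is left open.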

I've already worked with Indy and TBrowser (which doesn't always work well),
but the results aren't very good. I mean sometimes the page isn't even
loaded and there is no error code.

What I want is a way to get the pages from any URL, specifying which links I
want and which I don't (if a given word is in the URL then read it, otherwise
don't). I've found WinHTTrack, which is free and open source (but I don't know
much C++). I'd like to use such a tool, but it can't read images, for example
(the site encodes the URL so that there is no .jpg file; it's an ASP page that
generates the image), and I can't customize the robot. Having our own solution
will give us full control.
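
The "if such a word is in the URL then read it, otherwise don't" rule can be sketched as a small filter. This is only an illustration in Python. The function name `should_follow`, the two word lists, and the choice to let exclusions win over inclusions are assumptions, not part of any existing tool.

```python
def should_follow(url, include_words, exclude_words):
    """Decide whether the robot should fetch this URL.

    A URL is rejected if it contains any excluded word. Otherwise it is
    accepted if it contains at least one included word, or if no included
    words were given at all. Matching is case-insensitive.
    """
    lowered = url.lower()
    if any(word.lower() in lowered for word in exclude_words):
        return False
    if not include_words:
        return True
    return any(word.lower() in lowered for word in include_words)
```

For example, `should_follow("http://shop.example/detail.asp?id=7", ["detail", "image"], ["basket"])` accepts the detail page, while a URL containing "basket" would be skipped. Because the test looks at words in the URL rather than at file extensions, an image generated by an ASP page can be included the same way.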

Is there any source code I can get to start building such a personalized
program? It's essentially a website copier where I can customize what I want
and what I don't. I must have the source code in order to customize the way
it works. I only want a starting point, not necessarily full-featured working
code. I'll add my own features.

Thanks for helping.

Bob
