Newsgroup: borland.public.delphi.internet.winsock
Subject: Re: Convert HTML to Text
Posted by: "Eddie Shipman" (mrbaseball34@no_spam_gmail.com)
Date: Mon, 17 Apr 2006 11:18:35
In article <44401b57$1@newsgroups.borland.com>, sjohnson@caelix.com
says...
> Hi,
>
> I'm using TIdHTTP to get a web site. The Get method returns all of the data
> and the HTML formatting codes. Is there a component that will strip the HTML
> formatting codes and return just the data?
>
It's easy to get the plain text out of HTML by loading it into an IHTMLDocument2 (MSHTML) and reading body.innerText:
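uses ..., MSHTML, ActiveX, ComObj, Variants; // Variants: VarArrayCreate (Delphi 6+)

procedure TForm1.Button1Click(Sender: TObject);
var
  IDoc: IHTMLDocument2;
  sHTMLFile: string;
  v: Variant;
begin
  // Download the raw HTML with Indy
  sHTMLFile := IdHTTP1.Get('http://www.mysite.com');
  // Create an empty, in-memory MSHTML document
  IDoc := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
  try
    // Design mode keeps the page's scripts from running while it loads
    IDoc.designMode := 'on';
    while IDoc.readyState <> 'complete' do
      Application.ProcessMessages;
    // IHTMLDocument2.write expects a SAFEARRAY of variants
    v := VarArrayCreate([0, 0], varVariant);
    v[0] := sHTMLFile;
    IDoc.write(PSafeArray(TVarData(v).VArray));
    IDoc.close; // signal that no more HTML will be written
    IDoc.designMode := 'off';
    while IDoc.readyState <> 'complete' do
      Application.ProcessMessages;
    // innerText of <body> is the page text with all markup removed
    Memo1.Lines.Text := IDoc.body.innerText;
  finally
    IDoc := nil;
  end;
end;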
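For comparison, where MSHTML isn't available (for example a Free Pascal build on a non-Windows box), a much cruder alternative is to copy only the characters that sit outside '<' ... '>'. This is a minimal sketch, not a real HTML parser: StripTags and the demo program are hypothetical names, and the function neither decodes entities such as &amp; or &nbsp; nor skips the contents of <script> and <style> blocks.

```pascal
program StripTagsDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

// Hypothetical helper: naive HTML-to-text that drops everything between
// '<' and the next '>'. Entities (&amp;, &nbsp;, ...) are left as-is and
// the contents of <script>/<style> blocks are NOT removed.
function StripTags(const HTML: string): string;
var
  I: Integer;
  C: Char;
  InTag: Boolean;
begin
  Result := '';
  InTag := False;
  for I := 1 to Length(HTML) do
  begin
    C := HTML[I];
    if C = '<' then
      InTag := True            // start of a tag: stop copying
    else if InTag then
    begin
      if C = '>' then
        InTag := False;        // end of the tag: resume copying
    end
    else
      Result := Result + C;    // ordinary text, including a stray '>'
  end;
end;

begin
  WriteLn(StripTags('<p>Hello, <b>world</b>!</p>'));
end.
```

Running it prints "Hello, world!". Appending one character at a time keeps the sketch short; for large pages a TStringBuilder (Delphi 2009 and later) would be faster.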