Newsgroup: borland.public.delphi.internet.winsock (April 2006)

www.cryer.info
Managed Newsgroup Archive

Subject: Re: Convert HTML to Text
Posted by: "Eddie Shipman" (mrbaseball34@no_spam_gmail.com)
Date: Mon, 17 Apr 2006 11:18:35

In article <44401b57$1@newsgroups.borland.com>, sjohnson@caelix.com
says...
> Hi,
>
> I'm using TIdHTTP to get a web site. The Get method returns all of the data
> and HTML formatting codes. Is there a component that will strip the formatting
> HTML codes and return just the data?
>

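It is very easy to extract the text from HTML using IHTMLDocument2 (the
MSHTML document interface): write the HTML into an in-memory document and
read back body.innerText.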


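// Variants is needed for VarArrayCreate in Delphi 6 and later.
uses  ..., MSHTML, ActiveX, ComObj, Variants;

procedure TForm1.Button1Click(Sender: TObject);
var
  IDoc:      IHTMLDocument2;
  sHTMLFile: String;
  v:         Variant;
begin
  // Download the raw HTML (markup included) with Indy.
  sHTMLFile := IdHTTP1.Get('http://www.mysite.com');
  // Create an in-memory MSHTML document; no visible browser is required.
  IDoc := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
  try
    // Switch to design mode and wait until the document is ready.
    IDoc.designMode := 'on';
    while IDoc.readyState <> 'complete' do
      Application.ProcessMessages;
    // IHTMLDocument2.write expects a SAFEARRAY of variants,
    // so wrap the HTML string in a one-element variant array.
    v := VarArrayCreate([0, 0], varVariant);
    v[0] := sHTMLFile;
    IDoc.write(PSafeArray(System.TVarData(v).VArray));
    // Leave design mode and wait for the document to settle again.
    IDoc.designMode := 'off';
    while IDoc.readyState <> 'complete' do
      Application.ProcessMessages;
    // innerText is the body text with all tags stripped.
    Memo1.Lines.Text := IDoc.body.innerText;
  finally
    IDoc := nil;
  end;
end;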
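
If the conversion is needed in more than one place, the same idea can be
moved out of the button handler into a standalone function. The following is
only a sketch under a few assumptions: HtmlToText is a hypothetical name that
does not appear in the original post, COM is assumed to be initialised already
on the calling thread (as it normally is on the main thread of a VCL
application), and the design-mode switching used above is replaced by a plain
write followed by close, which is a different variant of the technique rather
than the poster's exact method.

```pascal
uses
  ActiveX, ComObj, Variants, MSHTML;

// Hypothetical helper (not from the original post): returns the text of
// an HTML string by loading it into an in-memory MSHTML document.
function HtmlToText(const AHtml: string): string;
var
  Doc: IHTMLDocument2;
  V:   OleVariant;
begin
  Result := '';
  Doc := CreateComObject(CLASS_HTMLDocument) as IHTMLDocument2;
  // write() expects a SAFEARRAY of variants holding the markup.
  V := VarArrayCreate([0, 0], varVariant);
  V[0] := AHtml;
  Doc.write(PSafeArray(TVarData(V).VArray));
  // close() ends the write stream so the document can finish parsing.
  Doc.close;
  // body may be nil if parsing has not produced a body element.
  if Assigned(Doc.body) then
    Result := Doc.body.innerText;
end;
```

With such a helper the handler would reduce to a single line, for example
Memo1.Lines.Text := HtmlToText(IdHTTP1.Get('http://www.mysite.com')); when
called from a worker thread, CoInitialize and CoUninitialize would have to be
called on that thread first.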
