Newsgroups : Borland : borland.public.delphi.internet.winsock : 2007 Nov : Re: Sending a SMTP message with UTF-8 subject

www.cryer.info
Managed Newsgroup Archive

Re: Sending a SMTP message with UTF-8 subject

Subject:Re: Sending a SMTP message with UTF-8 subject
Posted by:"Remy Lebeau (TeamB)" (no.spam@no.spam.com)
Date:Tue, 27 Nov 2007 12:19:33

"Richard" <ritchie872@yahoo.com> wrote in message
news:474c69d0$1@newsgroups.borland.com...

> Mess.Subject = 'some wide string with 3 code pages';

As far as I know, you can't have 3 different code pages in a single string,
or even multiple code pages active in a given thread at the same time.  Even
if you could, there would be no way to tell which characters belong to which
code page, so they could not be encoded separately.

Also, such string literals are going to be compiled as Ansi (unless you are
using Delphi.NET, where strings are all Unicode), so there shouldn't be an
implicit conversion performed by the RTL.

> The first one gives us the subject '?????? ???????? some string'

That happens when a WideString contains characters that cannot be converted
to Ansi or MBCS at all.  There is nothing Indy can do about that.  The
conversion is happening outside of Indy, in the RTL itself.
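
To illustrate where the '?' characters come from, here is a minimal sketch of
the implicit conversion.  The program name and the sample characters are
illustrative only and are not taken from the original code.  The exact output
depends on the Ansi code page of the machine it runs on.

```pascal
program LossyConversionDemo;
{$APPTYPE CONSOLE}

var
  W: WideString;
  S: AnsiString;
begin
  // Build the WideString from explicit code points so the example does not
  // depend on how the source file itself is encoded.
  W := WideChar($0416);                       // a Cyrillic letter
  W := W + WideChar($4E2D) + ' some string';  // plus a CJK ideograph

  // Implicit WideString -> AnsiString conversion.  The RTL performs it using
  // the system's default Ansi code page; Indy is not involved at this point.
  S := W;

  // Any character with no mapping in that code page has already been
  // replaced (typically with '?') before the string could ever be assigned
  // to TIdMessage.Subject.
  WriteLn(S);
end.
```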

> The second case will give us some wrongly =?utf8?B? encoded subject
> (double encoded?)

Yes, it is double-encoded.  TIdMessage has no way of knowing that the input
has been encoded manually beforehand.  If you want to encode it manually
while not setting TIdMessage's NoEncode property to True as well, then you
will have to use the TIdMessage.OnInitializeISO event to set the
HeaderEncoding to '8' and the CharSet to 'ISO-8859-1' or 'US-ASCII' or even
just '' (empty string).  Then Indy should be able to transmit your input data
as-is.
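
A rough sketch of such a handler follows.  It assumes the Indy 10 event
signature; in Indy 9 the handler takes an additional leading parameter
(var VTransferHeader: TTransfer).  THeaderHelper and the sample subject are
hypothetical names and values used only for illustration, and whether the
subject goes out unchanged should be checked against the Indy version in use.

```pascal
program PreEncodedSubjectDemo;
{$APPTYPE CONSOLE}

uses
  IdMessage;

type
  // Hypothetical helper class: an event handler has to be a method of an
  // object, and a console program has no form to host it.
  THeaderHelper = class
    procedure InitializeISO(var VHeaderEncoding: Char; var VCharSet: string);
  end;

procedure THeaderHelper.InitializeISO(var VHeaderEncoding: Char;
  var VCharSet: string);
begin
  VHeaderEncoding := '8';   // 8-bit: do not apply B or Q encoding
  VCharSet := 'US-ASCII';   // or 'ISO-8859-1', or '' (empty string)
end;

var
  Msg: TIdMessage;
  Helper: THeaderHelper;
begin
  Helper := THeaderHelper.Create;
  Msg := TIdMessage.Create(nil);
  try
    Msg.OnInitializeISO := Helper.InitializeISO;
    // Subject that was encoded by hand beforehand (Base64 of the UTF-8
    // bytes of a Cyrillic word), so it is already plain 7-bit ASCII.
    Msg.Subject := '=?UTF-8?B?0JbRg9GA0L3QsNC7?=';
    // From here the message would be passed to TIdSMTP.Send() as usual.
  finally
    Msg.Free;
    Helper.Free;
  end;
end.
```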

> It seems like passing a UTF8 string to the subject will cause idMessage
> to encode to multibyte once again, giving an incorrect result.

If the UTF-8 is in a WideString, then the RTL will encode it to MBCS when
passed to TIdMessage, which will then convert it to Unicode when encoding it
to UTF-8 if the CharSet is set to 'utf-8'.

If the UTF-8 is in an AnsiString, then TIdMessage will encode the MBCS as
UTF-8 if you set the CharSet to 'utf-8'.

So, either way, setting the HeaderEncoding tells TIdMessage to always encode
the input data regardless of whether it is already encoded beforehand.

> You mean, hardcoded in binary??????? Because there must
> be *somewhere* in the code some place where this encoding
> takes place.

The encoding system is invoked inside of TIdMessage.GenerateHeader() for
every header item.  Unless you remove the calls to the EncodeHeader()
function in code for each item and then recompile Indy, the only way to
bypass the actual encoding process is as I described above.


Gambit
