Newsgroups : Borland : borland.public.delphi.internet.winsock : 2006 Oct : Re: problem with UTF-8 encoding
| Subject: | Re: problem with UTF-8 encoding |
| Posted by: | "Remy Lebeau (TeamB)" (no.spam@no.spam.com) |
| Date: | Tue, 3 Oct 2006 10:19:14 |
"Marek Weyda" <marek@slim.cz> wrote in message
news:45224f72$1@newsgroups.borland.com...
> When some e-mail is UTF-8 encoded MailMessage.Subject returns
> something like: '=?utf-8?B?w6HDqcOtw7PDusWvw ... '
As it should be.
> Where's the problem ?
There is no problem. That is perfectly normal behavior. E-mail headers are
7-bit ASCII. To send Unicode in a header, it has to be encoded into an
ASCII-only form, in this case UTF-8 bytes wrapped in base64. It is the
receiver's responsibility to decode it in order to access the original
Unicode data.
> It means that Indy doesn't know how to decode UTF-8
> encoded e-mail subject or something else ?
That is correct. UTF-8 decoding is not supported at this time, mainly
because the VCL itself is Ansi-based. Indy still uses AnsiString instead of
WideString. Even if Indy were to decode the UTF-8 data internally, any
characters outside the current Ansi codepage would be lost when the decoded
WideString is converted back to AnsiString by the VCL runtime.
If you need access to the Unicode data, then you will have to decode it
manually. The line above consists of three parts, formatted as follows:
=?charset?encoding?data?=
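As a rough illustration of that layout, here is a minimal sketch in Python (not Delphi or Indy code) that splits one such token into its three fields. The function name split_encoded_word and the short sample value are made up for the example; the sample is not the full subject quoted above.

```python
import re

# One encoded token: =?charset?encoding?data?=
# The encoding field is a single letter: B (base64) or Q (quoted-printable).
ENCODED_WORD = re.compile(r"=\?([^?]+)\?([BbQq])\?([^?]*)\?=")

def split_encoded_word(word):
    """Return (charset, encoding, data), or None if word does not match."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    charset, encoding, data = m.groups()
    # Charset and encoding are case-insensitive, so normalize them.
    return charset.lower(), encoding.upper(), data

# Hypothetical short sample value.
print(split_encoded_word("=?utf-8?B?w6HDqcOt?="))
```

Plain header text that does not match the pattern returns None and can be used as-is.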
When generating the email, the original data is transformed using the
charset, and then that encoded data is transformed using the encoding,
resulting in the final data. In this case, the 'B' stands for base64. So
the Unicode data was transformed into a sequence of bytes using UTF-8, and
then the UTF-8 bytes were encoded into ASCII using base64, and then the
base64 data was placed into the email. To decode, simply reverse the
process. Decode the base64 into a UTF-8 string, and then decode the UTF-8
data into a Unicode string.
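The two steps, and their reversal, can be sketched as follows. This is only an illustration in Python, not Delphi or Indy code. The names encode_b_word and decode_b_word are hypothetical, and the sketch handles a single base64 ('B') token only.

```python
import base64

def encode_b_word(text, charset="utf-8"):
    """Sender side: Unicode -> charset bytes -> base64 -> =?charset?B?data?="""
    raw = text.encode(charset)                    # step 1: apply the charset
    data = base64.b64encode(raw).decode("ascii")  # step 2: apply the encoding
    return "=?%s?B?%s?=" % (charset, data)

def decode_b_word(word):
    """Receiver side: reverse the two steps above."""
    # Drop the leading "=?" and trailing "?=", then split the three fields.
    charset, encoding, data = word[2:-2].split("?")
    if encoding.upper() != "B":
        raise ValueError("this sketch only handles base64 ('B')")
    raw = base64.b64decode(data)                  # undo base64 -> charset bytes
    return raw.decode(charset)                    # undo charset -> Unicode

# Hypothetical sample text, not the subject from the original message.
word = encode_b_word("\u00e1\u00e9\u00ed")
print(word)
print(decode_b_word(word) == "\u00e1\u00e9\u00ed")
```

In Delphi the same idea would apply: base64-decode the data field into a byte string, then convert those UTF-8 bytes to a WideString rather than an AnsiString, so that nothing is lost.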
Gambit