Garbage, garbage!

paul barrett paul at hivemind.net
Sat Feb 23 03:16:00 CST 2008


Warning: this may be unnecessarily pedantic :) but I thought I'd share, 
since I know about email structure from having worked on the 
implementation of an email server and with many email clients.

First let me say I try to send only plain text and prefer it that way. 
HTML is for web pages and for email which needs embedded pictures and so 
on (there are times when this is so, such as site newsletters and 
corporate mailers) but not for general mail on a mailing list.

Second, this problem has nothing to do with html as such; it looks like 
the mail *client* is at fault.  What is being sent sounds like it 
conforms 100% to the RFC spec.  That I have had no such problems seems 
to indicate that the mails are not malformed.

Ya Sam - I looked at the html source of your mails and there is nothing 
wrong with them, they are 100% correct and should be readable by any 
properly implemented mail client.  You can choose to use plain text 
instead, but there is nothing wrong with the format of your mails as sent.

Those symbols you are seeing are from quoted-printable encoding of html, 
and are standard and necessary to prevent html breaking in systems which 
do not support long lines, which in the early days of email was all of them.

Once there are no legacy systems anywhere which wrap at around 78 
characters, such encoding would no longer be needed.

It must already be happening that newer servers and clients can handle 
html without this encoding because I see more and more html mail which 
is not encoded.  This is still no excuse for a mail client not to handle it.

An = at the end of a line is a soft line break: that line and the next 
are really one line.  This prevents words being broken in two by the 
line wrap.

=3D is one example of an encoded character.  There are various such 
forms, each an = followed by two hexadecimal digits, depending on what 
was encoded; they stand for characters which are safer to send in 
encoded form.  =3D itself is the encoding of the "=" sign (hex 3D in 
ASCII), so that a real = sign is not mistaken for a line wrap.
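
For anyone curious, here is a minimal sketch of both rules using the 
quopri module from Python's standard library.  The HTML fragments are 
made up for illustration and are not taken from anyone's actual mail.

```python
import quopri

# A made-up raw fragment: "=3D" stands for a literal "=", and the "=" at
# the end of the first line is a soft line break joining it to the next.
raw = b'<a href=3D"http://example.org/">a word wrap=\nped in the middle</a>'
decoded = quopri.decodestring(raw)
print(decoded.decode("ascii"))

# Going the other way: encode one long (made-up) line and look at how it
# is wrapped.  Each "=" in the input becomes "=3D", and the long line is
# split with soft line breaks.
original = (
    b'<a href="http://example.org/a/fairly/long/path">'
    + b"some link text that runs on " * 3
    + b"</a>"
)
encoded = quopri.encodestring(original)
print(encoded.decode("ascii"))

# Decoding gives back exactly what was encoded.
round_trip = quopri.decodestring(encoded)
print(round_trip == original)
```

The first print shows the fragment with the = restored and the word 
"wrapped" back in one piece, which is all a mail client has to do before 
displaying the text.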

If the archive is converting to plain text and storing as such, then it 
is broken in its implementation of this conversion because it is not 
decoding the quoted-printable first.  If the search does not work it is 
also for the same reason, and not because the mails are broken.  A 
search should always be run on decoded text.  It is not difficult to 
know it is encoded because the mail says so (in its 
Content-Transfer-Encoding header), and the decoding algorithm is very 
simple and freely available.
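
As a rough illustration of searching decoded rather than raw text, here 
is a sketch using Python's standard email package.  The message and the 
helper name decoded_body are hypothetical, and I am not claiming this is 
how the list archive software works; it only shows that the header tells 
you what to decode and that the search then behaves.

```python
from email import message_from_string

# A made-up message whose body is quoted-printable encoded HTML.
raw_mail = (
    "From: sender@example.org\n"
    "Subject: test\n"
    "MIME-Version: 1.0\n"
    'Content-Type: text/html; charset="us-ascii"\n'
    "Content-Transfer-Encoding: quoted-printable\n"
    "\n"
    '<p class=3D"note">Gravity\'s Rain=\n'
    "bow</p>\n"
)

msg = message_from_string(raw_mail)


def decoded_body(message):
    """Return the body as text, undoing the declared transfer encoding."""
    # decode=True makes get_payload honour Content-Transfer-Encoding.
    payload = message.get_payload(decode=True)
    charset = message.get_content_charset() or "ascii"
    return payload.decode(charset, errors="replace")


body = decoded_body(msg)

print(msg["Content-Transfer-Encoding"])  # the mail says how it is encoded
print("Rainbow" in raw_mail)             # searching the raw text misses it
print("Rainbow" in body)                 # searching the decoded text finds it
```

The word is split by a soft line break in the raw mail, so a search over 
the undecoded text cannot match it.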

The reason gmail will display properly is the same reason Thunderbird 
and Outlook do - they properly decode the content.  There is no reason 
for a mail client to not decode it except either lazy or sloppy 
programming.  Those who are not seeing this "garbage" - it's because 
your email clients work properly :)

okay, lecture over :)

and let me just reiterate that I think plain text is better for most 
mails, if only because it uses less bandwidth to transmit the same content.

Krafft, John M. Dr. wrote:
> Hey, no, I'm not complaining about the intellectual level of the posts. But more and more messages lately seem to be coming through with more and more garbage characters like =, two-digit numbers and other annoyances in them, and line-breaks in the middles of words. Some approach being more trouble than they are worth to read, and some are literally unreadable--_all_ garbage. I'd think it was because of some setting in my mail reader, but the messages in the archive are junked up too. Am I the only one with a low threshold?  Anyone have a solution?
> 
> Thanks.
> 
> jmk
> 
> 
> --
> 
> John M. Krafft / English
> Miami University–Hamilton / 1601 University Blvd. / Hamilton, OH 45011-3399
> Tel: 513.785.3031 or 513.868.2330
> Fax: 513.785.3145
> E-mail: krafftjm at muohio.edu
> WWW: http://www.ham.muohio.edu/~krafftjm or http://PynchonNotes.org
> 
> 


