[Pan-users] Re: ANNOUNCE: pan-attach and pan-attach-kd, version 0.0.2006


From: Duncan
Subject: [Pan-users] Re: ANNOUNCE: pan-attach and pan-attach-kd, version 0.0.2006.10.07.0
Date: Tue, 10 Oct 2006 09:02:34 +0000 (UTC)
User-agent: pan 0.115 (Mrs. Kerr Says Remember the Tip Jar)

Dave <address@hidden> posted
address@hidden, excerpted below, on  Tue, 10 Oct 2006
01:02:38 +0100:

> On Saturday 07 October 2006 11:45, Duncan wrote:
>> pan-attach and pan-attach-kd are scripts designed to allow posting
>> attachments with pan.  URL below.
> 
> Duncan, I had some comments from a couple of users that they are just
> seeing the raw encoded textual information when I post using your script. 
> One is using KNode/0.10.4, the other Thunderbird (Mozilla 4.8 [en]
> (Windows NT 5.0; U))
> 
> Agent and Pan (at least) are both ok with it.
> 
> I got the same comments when I encoded manually and pasted the result in a
> while back too.
> 
> Any ideas?  I don't even know what I should be looking for :-(

AFAIK, neither Thunderbird nor KNode (nor OE for that matter) does yEnc.

yEnc (that's the proper capitalization, BTW: "My Encoding" without the M,
with encoding abbreviated as Enc) is the newest encoding, and the most
popular in many groups despite the fact that a lot of clients don't
understand it.  It has only ~5% overhead, compared to the traditional 33%
(four encoded bytes carry three bytes of binary file), because it takes
advantage of the fact that news is close to 8-bit clean.  (Note that if
news were entirely 8-bit clean, there'd be no need to encode at all --
you'd simply post the files and others would simply download them as they
do with HTTP and FTP.)
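
To make the overhead comparison concrete, here's a minimal Python sketch
(not code from pan or pan-attach).  base64 stands in for the traditional
4-for-3 encodings.  yenc_encode_body is a hypothetical, simplified helper:
it applies the basic yEnc byte transform (add 42, then escape NUL, LF, CR
and '=') and wraps lines, but leaves out the =ybegin/=yend lines, the CRC
and the extra escaping that real encoders do.  The exact yEnc figure
depends on the data and the line length; on random data it should land in
the low single digits.

```python
import base64
import os


def yenc_encode_body(data: bytes, line_len: int = 128) -> bytes:
    """Simplified yEnc body encoding (no =ybegin/=yend lines, no CRC)."""
    critical = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR and '=' must be escaped
    out = bytearray()
    col = 0
    for byte in data:
        enc = (byte + 42) % 256
        if enc in critical:
            # Escape: '=' followed by the encoded byte shifted by 64.
            out += bytes((0x3D, (enc + 64) % 256))
            col += 2
        else:
            out.append(enc)
            col += 1
        if col >= line_len:
            out += b"\r\n"
            col = 0
    return bytes(out)


def overhead(encoded: bytes, raw: bytes) -> float:
    """Extra bytes added by the encoding, as a fraction of the raw size."""
    return (len(encoded) - len(raw)) / len(raw)


raw = os.urandom(30000)  # stand-in for a binary file
b64 = base64.b64encode(raw)
yenc = yenc_encode_body(raw)
print(f"base64 overhead: {overhead(b64, raw):.1%}")
print(f"yEnc overhead:   {overhead(yenc, raw):.1%}")
```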

The normally accepted rule is that the poster chooses how he'll post,
since he's the one providing the content, while the downloaders get it if
they can cope with it, and get ignored if they can't.  Again, posters tend
to choose yEnc because they can either post that much more in the same
time or bandwidth, or use less time/bandwidth to post the same content.
Downloaders who complain about gibberish where there should be attachments
are told to get a decent news client.  By now, if it can't do yEnc, it's
not considered an acceptable news client for binaries, period.

I suspect that's what you were running into -- clients that don't do yEnc.
Both pan and agent do yEnc -- that is in fact one of the points in their
favor and against any clients that don't.

About the others:

UUE is the oldest of the three encodings.  Like yEnc, however, it hasn't
gone thru the full Internet RFC standards process as MIME has.  It sort of
came to be without a standard or definition of any sort (people wanted a
way to attach binaries to an otherwise text-only medium, and they
experimented with various things until UUE came into being).  UUE BTW
stands for Unix-to-Unix Encoding (I believe I have that correct) -- it was
developed in the Internet formative years when nearly all the machines on
the Internet were Unix machines, either US-DOD or University based, well
before our current Internet mail or news standards were fully defined.  As
mentioned above, the encoding overhead is 33%: four encoded bytes carry
three bytes of file.  However, UUE is mail as well as news safe and given
its age, nearly every client that handles attachments at all handles UUE.
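
For the curious, here's roughly what a UUE block looks like, as a small
Python sketch using the standard library's binascii module (this is not
how pan-attach does it).  The uuencode and uudecode helper names, the
default file name and the "644" mode are just illustrative assumptions.
Each line carries up to 45 raw bytes, and every 3 raw bytes become 4
printable characters.

```python
import binascii


def uuencode(data: bytes, name: str = "file.bin", mode: str = "644") -> str:
    """Wrap data in a classic 'begin ... end' uuencoded block."""
    lines = [f"begin {mode} {name}"]
    for i in range(0, len(data), 45):  # 45 raw bytes per encoded line
        chunk = binascii.b2a_uu(data[i:i + 45])
        lines.append(chunk.decode("ascii").rstrip("\n"))
    lines.append("`")  # a zero-length line marks the end of the data
    lines.append("end")
    return "\n".join(lines) + "\n"


def uudecode(text: str) -> bytes:
    """Decode a block produced by uuencode() above."""
    body = text.splitlines()[1:-1]  # drop the 'begin' and 'end' lines
    return b"".join(binascii.a2b_uu(line.encode("ascii")) for line in body)


print(uuencode(b"Cat"))
```

The three bytes of "Cat" come out as the four characters 0V%T, prefixed
with a length character (# for three bytes).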

While it isn't a choice here for reasons explained below, for
completeness, I'll cover MIME here as well. 

MIME, Multi-purpose Internet Mail Extensions, is actually a broad set of
formally defined standards, aspects of which are used in many other areas
as well.  Among other aspects, most Unix "file-type associations" are
based on the MIME file-types from this standard.  The same set of
file-types is used to define HTTP/web server file-types as well, as I
believe it was Apache that first borrowed them for that purpose.

Heading back to MIME as used in Internet Message standards, the framework
actually defines two different types of 7-bit ASCII text encoding (as used
in Internet Mail messages, the standards of which formed the basis for
news as well).  MIME/base64 is similar to UUE but uses a slightly
different set of 64 characters as its encoding base.  This is what is
referred to when we talk of MIME encoding in the context of binary file
attachments.  The other encoding is quoted-printable, which is very close
to plain text and is designed to handle primarily text content as
effectively as possible while still allowing the raw encoding to be for
the most part human-readable.  If you ever come across a message that has
=3D for equals signs and similar =XX hexadecimal codes for certain other
characters, that's very likely either raw MIME/quoted-printable or
a message that started out as MIME/quoted-printable but got corrupted in
some way such that the MIME headers aren't recognized, so the client
treats it as regular 7-bit ASCII text instead of MIME.  The two MIME
encoding formats are designed to be convertible directly one into the
other, but quoted-printable is most efficient with text, where it remains
mostly human readable, while base64 is most efficient with binary, where
it has the same standard ratio as UUE: four bytes of encoded text carry
three bytes of binary file, a 33% overhead.
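
The difference between the two MIME encodings is easy to see with
Python's standard quopri and base64 modules.  This is only an
illustrative sketch; the sample text and the byte pattern standing in for
a binary file are made up for the example.

```python
import base64
import quopri

# Mostly plain ASCII text containing one '=' sign.
text = b"If x = 3 then the rest of this line is ordinary readable text.\n"
# Every possible byte value, repeated: a stand-in for a binary file.
binary = bytes(range(256)) * 4

qp_text = quopri.encodestring(text)
b64_text = base64.encodebytes(text)
qp_bin = quopri.encodestring(binary)
b64_bin = base64.encodebytes(binary)

print(qp_text.decode("ascii"))   # still readable, '=' shows up as =3D
print(b64_text.decode("ascii"))  # not human readable at all
print("text:  ", len(text), "raw,", len(qp_text), "q-p,", len(b64_text), "base64")
print("binary:", len(binary), "raw,", len(qp_bin), "q-p,", len(b64_bin), "base64")
```

Quoted-printable wins on the text and loses badly on the binary, since
every byte it has to escape costs three characters.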

Because MIME is the only formally/officially defined and standardized
attachment method for binaries, nearly all modern clients understand it. 
The only major exceptions are very old clients that were around pre-MIME
standard and were never upgraded to comply with it.  As with UUE, it's
both mail and news safe.

The reason it isn't a choice for pan-attach(-kd) is due to the way the
standards are implemented.  It's a full framework, defining a specific
header (MIME-Version: 1.0) that must appear in any MIME compliant message,
with additional headers defining the number of parts and how they are
laid out in the message, and each part containing its own set of
part-headers.  In order to properly do MIME, therefore, the MIME encoder
must have control of the entire post, in order to define all the headers
appropriately.  pan only forwards the message body to the defined external
editor, and even if it forwarded the entire message at that point, there's
no guarantee that further changes wouldn't be made after pan got the
message back, therefore potentially invalidating some of the headers
declared by the external editor.  Put directly, the only way to properly
do MIME is to have pan do it all.  Since that's not possible at my
skillset level and therefore the level that pan-attach(-kd) is implemented
in, pan-attach can't properly do MIME and therefore doesn't have that
choice.
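
To show why the encoder needs control of the whole post, here's a sketch
that builds a MIME message with an attachment using Python's standard
email package.  It has nothing to do with pan's internals, and the
address, newsgroup, subject and file name are placeholders invented for
the example.  Note how the top-level headers (MIME-Version, a multipart
Content-Type with a boundary) and the per-part headers all have to agree
with the body.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "poster@example.invalid"      # placeholder address
msg["Newsgroups"] = "alt.binaries.test"     # placeholder group
msg["Subject"] = "MIME structure demo"
msg.set_content("Body text of the post.\n")

# Adding an attachment converts the message to multipart/mixed and gives
# the new part its own Content-Type, Content-Transfer-Encoding and
# Content-Disposition headers.
msg.add_attachment(
    b"\x00\x01\x02 binary payload",
    maintype="application",
    subtype="octet-stream",
    filename="demo.bin",
)

print(msg.as_string())
```

If anything edits the body after this point without also updating the
headers, the declared structure no longer matches the content.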

I could go on in some detail about MIME as it's something I've studied in
some depth (well, at least to the point of reading the main RFCs (Requests
for Comments, the way these documents start) on the subject, as I had a
reason to do so at one point and they /were/ rather fascinating, at least
to me).  However, I'll leave it there as the mail is really rather long as
it is.  I'd certainly encourage others interested in understanding the
standards that form the basis of the Internet we all use to read up on
these, however.  They aren't nearly as dry and devoid of interest as
wading thru EULAs is, for example, and the MIME RFCs are some of the more
"mere human" accessible of the RFCs.  Google MIME RFCs for a good start.

In the interest of completeness, I should mention the other "encoding"
choice that pan-attach(-kd) does have, text/identity.  Here "encoding" is
in quotes, simply because there /is/ none -- it simply includes the file
as-is.  As such, it's neither news-safe nor mail-safe to use this for any
binary format files at all.  It WILL break things, either corrupting the
message itself, or at minimum the attached file, if a binary file is
attached in text/identity mode.  In most cases, I'd not expect the post to
even make it to the server successfully as it breaks all the rules.  So
what is it good for then?  Simply this: use this choice if you have a
(7-bit ASCII normal) text file you want to include as-is.  It avoids the
encoding overhead entirely, and will remain readable as text.  In
effect, this is what pan already does with the sig file if you simply
point it at a text file -- includes it as text.
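
If you're not sure whether a file qualifies, a quick check along these
lines can help.  This is a hypothetical helper, not something
pan-attach(-kd) does for you, and the rule it applies (7-bit ASCII only,
no control bytes other than tab, LF and CR) is my assumption about what
"safe as-is" should mean.

```python
def looks_safe_for_text_identity(data: bytes) -> bool:
    """Heuristic: 7-bit ASCII only, no control bytes except tab/LF/CR."""
    allowed_controls = {0x09, 0x0A, 0x0D}
    if not data.isascii():
        return False  # any byte >= 0x80 rules the file out
    return all(
        (0x20 <= b < 0x7F) or (b in allowed_controls)
        for b in data
    )


print(looks_safe_for_text_identity(b"plain text\nsecond line\n"))
print(looks_safe_for_text_identity(b"\x89PNG\r\n\x1a\n"))
```

Anything that fails the check should go through UUE or yEnc instead.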

What's all that come down to in simple form?

1) Choose yEnc encoding if you are posting binary files and care more about
efficiency than download client compatibility (and if you are using
old-pan, which can do so, new-pan can't).

2) Choose text/identity encoding for text files that you want displayed as
part of the message itself, not as attachments.

3) Choose UUE if you are posting binaries and are either worried about
compatibility or are using new-pan, which chokes on yEnc, leaving UUE the
only binary posting choice.  Also, where the message will be gated to
mail, choose UUE, as yEnc WILL break with mail.

4) Expect a certain level of complaints if you choose yEnc, as some people
continue to insist on using clients that don't handle yEnc and are
therefore now inappropriate for binary groups (inappropriate as yEnc is
now the most popular choice for posting binaries, and users using clients
that can't grok it on binary newsgroups simply need to throw away what
might as well be their manual typewriters and join the age of the computer
and Internet).

There are two additional things to keep in mind as well.

1) The guy who originally defined yEnc specified certain conditions
(having to do with ordering and requiring the keyword yEnc as part of the
subject line) for the subject lines of posts containing yEnc encoded files.
Of course, pan-attach(-kd) can't enforce this, but the user can manually,
if desired. However, most clients that understand yEnc don't require the
strict subject line formatting and will recognize it even without it, and
in fact a lot of yEnc posts don't strictly observe the subject line
requirements in any case. There might be a few that won't work unless the
requirements are strictly met, however.  For the technical details of yEnc
including the subject requirements, see the yEnc home page here:
http://www.yenc.org/ (and note that the common short form without the www,
simply yenc.org, doesn't work in this case).

2) As UUE was never formally standardized, there are occasional minor
differences in implementation.  In the vast majority of cases, these won't
matter much and things are compatible, but one might come upon a case
that's not.  Since there's no formal spec to break, one can't properly say
such implementations are "broken", only that they aren't 100% compatible
with the way everybody else implements UUE.
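
One example of such a difference, as far as I know, is how a zero value
gets written: some encoders emit a space, others a backtick.  Python's
binascii module can produce either form (the backtick keyword needs
Python 3.7 or later), so it makes for a quick illustration.  The sample
bytes are arbitrary.

```python
import binascii

raw = b"\x00\x00\x00"  # three zero bytes, so every encoded value is zero

with_spaces = binascii.b2a_uu(raw)                    # zeros written as ' '
with_backticks = binascii.b2a_uu(raw, backtick=True)  # zeros written as '`'

print(repr(with_spaces))
print(repr(with_backticks))

# A tolerant decoder accepts both spellings and yields the same bytes.
print(binascii.a2b_uu(with_spaces) == binascii.a2b_uu(with_backticks))
```

A decoder that only knows one of the two spellings, or software along the
way that strips trailing spaces, is where the incompatibilities tend to
show up.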

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman




