
Re: [Pan-users] Errors downloading binary news


From: Duncan
Subject: Re: [Pan-users] Errors downloading binary news
Date: Sat, 28 Jul 2018 07:20:20 +0000 (UTC)
User-agent: Pan/0.146 (Hic habitat felicitas; 1de496241)

Volker Wysk posted on Fri, 27 Jul 2018 20:51:49 +0200 as excerpted:

> Hi!
> 
> I've set up pan, and it works fine, but for one problem. When I save
> an article's attachments (to /tmp/news in this case), I get this:
> 
> ------------------------------snip------------------------------
> desktop /tmp/news $ ls -l
> -rw-r--r-- 1 v v 82911 Jul 27 20:41 THBC01-06 - Trisha Yearwood Wvocal - 
> Powerful Thing.zip
> -rw-r--r-- 1 v v   126 Jul 27 20:41 THBC01-06 - Trisha Yearwood Wvocal - 
> Powerful Thing.zip.ERRORS
> 
> desktop /tmp/news $ cat THBC01-06\ -\ Trisha\ Yearwood\ Wvocal\ -\ Powerful\ 
> Thing.zip.ERRORS 
> Warning: Data looks suspicious. Decoded file might be corrupt.
> Warning: Data looks suspicious. Decoded file might be corrupt.

> desktop /tmp/news $ unzip -v *zip
> Archive:  THBC01-06 - Trisha Yearwood Wvocal - Powerful Thing.zip
> error [THBC01-06 - Trisha Yearwood Wvocal - Powerful Thing.zip]:  missing 
> 2995200 bytes in zipfile
>   (attempting to process anyway)
>  Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
> --------  ------  ------- ---- ---------- ----- --------  ----
>  2801664  Defl:N  2713384   3% 2005-08-09 13:25 e19a6a15  THBC01-06 - 
> Yearwood, Trisha wvocal - Powerful Thing.mp3
>  1260096  Defl:N   364051  71% 2005-08-03 22:30 5b936c3c  THBC01-06 - 
> Yearwood, Trisha wvocal - Powerful Thing.cdg
> --------          -------  ---                            -------
>  4061760          3077435  24%                            2 files
> ------------------------------snip------------------------------
> 
> The subject is "Attn : THBC fill-ins - "THBC01-06 - Trisha Yearwood Wvocal - 
> Powerful Thing.zip" yEnc (13/13)".

> Could it have something to do with yEnc?
> 
> I've tried it with various news postings, and it's always the same.
> 
> Any hints?

I only rarely do binaries these days and I've not seen *.ERRORS files
like that, so I can't say for sure.  But I've been using pan and reading
this list for over a decade and a half now, and more importantly for this
particular issue, I have some knowledge of yenc's functionality and
somewhat controversial history, plus USENET binary experience going back
to before pan (I was much more active with binary downloading up until
about a decade ago).  Based on that I believe I have a reasonable idea
what's going on.

1) Yenc, unlike earlier encoding formats, records the file size and a
CRC32 checksum alongside the encoded data, so again unlike earlier
encoding formats, yenc *CAN* actually say with reasonable certainty
that something's corrupt, if the decoded size and checksum don't match
what it's told they should be.

Which explains the *.ERRORS files -- size and checksum aren't matching.

(As the zip listing shows, zip records size and checksum as well, and it
sees corruption too, but that's expected when yenc is already spitting
errors in the decoding.)
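
To make the size/checksum point concrete, here's a minimal Python sketch
of the kind of check a decoder can do.  It assumes a single-part post
whose trailer line looks like "=yend size=N crc32=HEX".  The function
names are mine, purely for illustration; this is not pan's code, and I
don't know how pan implements the check internally.

```python
import re
import zlib

def yenc_decode_line(line):
    """Decode one yEnc data line (bytes, without its line ending)."""
    out = bytearray()
    i = 0
    while i < len(line):
        b = line[i]
        if b == ord("=") and i + 1 < len(line):
            # Escaped byte: the byte after '=' carries an extra +64 shift.
            i += 1
            b = (line[i] - 64) % 256
        out.append((b - 42) % 256)  # undo the basic +42 shift
        i += 1
    return bytes(out)

def check_yenc(raw):
    """Return a list of problems found in a single-part yEnc body (bytes)."""
    data = bytearray()
    trailer = None
    for line in raw.splitlines():
        if line.startswith(b"=ybegin") or line.startswith(b"=ypart"):
            continue  # header lines carry no payload
        if line.startswith(b"=yend"):
            trailer = line.decode("ascii", "replace")
            break
        data += yenc_decode_line(line)
    if trailer is None:
        return ["no =yend trailer found"]
    problems = []
    size = re.search(r"\bsize=(\d+)", trailer)
    crc = re.search(r"\bcrc32=([0-9a-fA-F]{1,8})", trailer)
    if size and int(size.group(1)) != len(data):
        problems.append("size mismatch: expected %s, got %d"
                        % (size.group(1), len(data)))
    if crc and int(crc.group(1), 16) != zlib.crc32(bytes(data)):
        problems.append("crc32 mismatch")
    return problems
```

An empty list means the decoded data agrees with what the trailer
claims.  A truncated or mangled body would normally trip one or both
checks, which is presumably the sort of thing behind the warnings in
the *.ERRORS file.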

2) Also unlike the earlier encoding types, yenc took express advantage
of specific news peering characteristics, namely the fact that news is
/almost/ 8-bit-clean (unlike the standard internet message formats used
in mail and earlier news standards, which had to stay
clean-conversion-compatible with all sorts of 7-bit and other
technically limited formats).  That allows yenc to be much more
efficient than the standard 4/3 encoding size expansion of the time
(aka 33% encoding overhead, every 4 bytes of encoded output carrying
only 3 bytes of original data), with only a few percent overhead.
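
As a rough illustration of that overhead difference, here's a small
Python sketch comparing base64 (a typical 4/3 encoding) with an
estimate of yenc's encoded size.  The yenc figure is an approximation
under my own assumptions: only the four critical bytes (NUL, LF, CR
and '=') get escaped, lines are 128 characters plus CRLF, and the
header and trailer lines are ignored.

```python
import base64

# Encoded byte values that yEnc must escape with a leading '='.
CRITICAL = {0x00, 0x0A, 0x0D, 0x3D}  # NUL, LF, CR, '='

def yenc_size(data, line_len=128):
    """Approximate yEnc size: payload, escape bytes, and CRLF line ends."""
    encoded = sum(2 if (b + 42) % 256 in CRITICAL else 1 for b in data)
    lines = -(-encoded // line_len)  # ceiling division
    return encoded + 2 * lines

def overhead(encoded_len, original_len):
    """Fractional size increase, e.g. 0.33 for 33%."""
    return encoded_len / original_len - 1

# Test data in which every byte value occurs equally often.
data = bytes(range(256)) * 48
b64_overhead = overhead(len(base64.b64encode(data)), len(data))
yenc_overhead = overhead(yenc_size(data), len(data))
print("base64: %.1f%%  yenc: %.1f%%" % (b64_overhead * 100, yenc_overhead * 100))
```

For evenly distributed data like this the estimate works out to
roughly 3% for yenc against 33% for base64.  Real posts will vary a
bit with the data and the line length the poster used.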

In practice this means that should a yenc-based message get transferred
using non-8-bit-clean methods instead of staying on the direct
news-peering network that yenc was specifically designed to take advantage
of, it can get corrupted.

It's /possible/ that's what you're seeing, particularly if it's only
messages from specific posters and/or posters uploading via specific news
providers that are exhibiting the problem.  If this is the case, it may
be possible to compare path headers from affected and unaffected posts
and isolate the problem path component that the affected posts have in
common.  
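
If you want to automate that comparison, something like the following
Python sketch would do.  It assumes you've copied the Path header value
out of a handful of affected and unaffected posts by hand; the function
names and the hostnames in the usage example are made up.

```python
def path_hosts(path_header):
    """Split a Path header value on '!' into a set of path components."""
    return {h.strip() for h in path_header.split("!") if h.strip()}

def suspect_hosts(bad_paths, good_paths):
    """Components present in every bad path and in none of the good ones."""
    bad = [path_hosts(p) for p in bad_paths]
    good = [path_hosts(p) for p in good_paths]
    common_bad = set.intersection(*bad) if bad else set()
    seen_good = set().union(*good)
    return common_bad - seen_good

# Hypothetical example data, not taken from real posts.
bad = [
    "news.a.example!relay.bad.example!post.x.example!not-for-mail",
    "news.a.example!relay.bad.example!post.y.example!not-for-mail",
]
good = [
    "news.a.example!relay.ok.example!post.x.example!not-for-mail",
]
print(suspect_hosts(bad, good))
```

With only a few samples the result is a hint, not proof.  The more
affected and unaffected posts you feed it, the more it narrows down.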

This can help if you have access to accounts on multiple news providers
that may have different peering and thus different routing for the
affected posts, or if you can have someone on a different provider
that gets them "clean" because they don't take the problem path
reupload them to you, or if you have a provider that may be willing
to work with you to try to clean up their receiving path, possibly by
changing their peering a bit to avoid the bad path component.

3) The other possibility, more probable especially if you're using a
low quality provider like the ISP's own news services tended to be
(back in the day when they provided news, few do these days), is
entirely unrelated to yenc except that yenc, due to the size and
checksum recording, may help you spot the corruption easier/sooner.

Backing up a bit...

So the ls of the zip file says it's ~81 KiB in size, but according to
unzip it should be ~3 MiB in size.  Obviously something's missing!
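
Working the numbers from the listing above thru in Python (all values
are copied from the ls and unzip output and the "(13/13)" in the
subject; treating unzip's "missing" count as exact is an assumption on
my part):

```python
on_disk = 82911        # bytes, from `ls -l`
missing = 2995200      # bytes, from unzip's "missing ... bytes in zipfile"
total_parts = 13       # from the "(13/13)" in the subject line

expected = on_disk + missing         # size the complete zip should have
compressed = 2713384 + 364051        # the two compressed members per unzip -v

# If everything except one part were missing, how big is each missing part?
per_missing_part, remainder = divmod(missing, total_parts - 1)

print("expected zip size: %d bytes (%.2f MiB)" % (expected, expected / 2**20))
print("zip headers/directory: %d bytes" % (expected - compressed))
print("per missing part: %d bytes, remainder %d" % (per_missing_part, remainder))
```

The missing byte count divides evenly by 12, at 249600 bytes per part
(1950 lines of 128 bytes).  That's at least consistent with the saved
file containing only the last of the 13 parts, tho it doesn't prove it.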

First the obvious: Are all the parts showing up?  Is pan displaying
an assembled puzzle icon before you attempt to download?  If it's still
an unassembled puzzle icon, then pan knows all the parts aren't there.
You can still force pan to download, because some files can still be
played, with some corruption and skipping around the missing part,
particularly if the first part is there, and sometimes that's useful.
But for zip files and the like that normally won't work, unless of
course there are par files available and enough of them are usable
to repair the missing and corrupted parts.

Second, are there errors in the log?  Particularly on low quality
servers and/or when messages are already expiring, it's possible the
server will /say/ everything's there, but some parts will actually be
missing when you go to download them.

Third, some parts may simply be corrupt, and possibly much (!!)
smaller than they should be.

One thing that at least /used/ to be common on low quality providers,
again often at the ISP level, was that either they or their peering
connections would limit article sizes.  (Unlike the dedicated providers,
ISPs aren't selling news directly, they're selling a connection to the
net, so you'll likely continue to pay the ISP even if the news service
stinks.)  The most common symptom of this would be a bunch of last
parts showing up, because they'd be smaller than the full size parts,
which wouldn't fit thru the size filter.

Another possibility, a favorite trick of the RIAA/MPAA etc types,
is to deliberately supersede just one part of each multi-part with
a corrupt version of the same, thus corrupting the entire download.

There are three ways to avoid that, two for the downloader, one for
the uploader.  The uploader can use techniques (which I'm not familiar
with) that make it more difficult to supersede/cancel a post.
The downloader can either get there immediately after the message is
posted, hopefully before the supersede has had its chance to work, so
you get the original uploaded version and not the corrupted supersede,
OR use a good quality provider that doesn't honor cancels/supersedes
in the binary groups, precisely to avoid problems such as this.

(Of course at least in the US the news providers must honor copyright
takedown requests or be responsible for the violations themselves, but
that's a longer term thing, and I expect they take down the whole thing,
not just individual parts.  IOW, a good provider won't honor binary
cancels/supersedes, but WILL very likely have to honor full takedown
requests, tho those typically take some time to process.  Anyway, there
again, getting there as soon as the parts are complete is clearly best,
but...)

(They use this trick on P2P as well, saying they have a good copy of
just one chunk of many, but it's corrupt, thus corrupting the entire
thing.  Smart P2Pers and/or the best P2P clients learn how to avoid
that as well, by tracking "bad" peers, etc.)

4) Finally, while it appears unlikely here due to the size differential,
it's possible for a decoder such as pan to assemble the parts in the
wrong order.  Pan used to have a bug that made this somewhat common,
but that one was fixed some years ago now, and I know of no such bug in
reasonably current versions.  But it could still happen under unusual
circumstances like say the poster mislabeling the order.

If you suspect something like this (as I said, unlikely here as the
files are smaller, not just corrupted), it's possible to have pan
save the raw message files as "text", still encoded.  You'd then
do the decoding and assembly manually, likely at the terminal, using
another tool such as uudeview, using the flexibility of the manual
process to specify a custom assembly order and/or do other similar
troubleshooting.
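
Before handing the saved files to a decoder, it can help to see what
part numbers they actually claim to be.  Here's a small Python sketch
that reads the part number from each file's "=ybegin ... part=N" line,
sorts the files into that order, and lists what's missing.  The ".msg"
extension and the function names are my own assumptions; use whatever
names you gave the files when saving them from pan.

```python
import re
from pathlib import Path

# Matches the part number on a yEnc header line such as
# "=ybegin part=3 total=13 line=128 size=... name=..."
PART_RE = re.compile(rb"^=ybegin\b.*?\bpart=(\d+)", re.MULTILINE)

def part_number(raw):
    """Return the yEnc part number found in a raw message, or None."""
    m = PART_RE.search(raw)
    return int(m.group(1)) if m else None

def assembly_order(directory, pattern="*.msg"):
    """Return saved message files sorted by their yEnc part number."""
    found = {}
    for path in Path(directory).glob(pattern):
        n = part_number(path.read_bytes())
        if n is not None:
            found[n] = path
    return [found[n] for n in sorted(found)]

def missing_parts(numbers, total):
    """Part numbers in 1..total that are not in `numbers`."""
    return sorted(set(range(1, total + 1)) - set(numbers))
```

If the part numbers in the files disagree with the order the subject
lines suggest, that points at mislabeling by the poster.  If numbers
are missing entirely, no assembly order is going to fix it.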


So as they say, the above is a bit of a crapshoot, but hopefully it's
/somewhat/ helpful at least, if in no other way, perhaps at least
by providing a bit of technical background info on yenc.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



