Re: [Pan-users] memory usage in fake multi-part posts
From: Daisy Flanders
Subject: Re: [Pan-users] memory usage in fake multi-part posts
Date: Tue, 22 May 2018 19:44:49 +0000 (UTC)
I apologize for the HTML. I rarely participate in online discussions. Please
forgive any further faux pas.
FWIW I've been using Pan since 2003 or earlier and I've seen the improvements
over the years. With 64GB RAM, it can handle over 20M multi-part articles with
around 1B total parts. It's impressive. It takes about 20 minutes to load the
group headers, but that's probably the threading. The memory management is
incredibly efficient.
That's why this attack was interesting. The header lines returned by the NNTP
server are under 400 bytes, yet Pan uses over 2 KB per article. The combined
subject, author and message-id can't account for that. It turns out that most
of it (70%, according to valgrind's massif) is reserve()d for Parts that will
never be found.
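To make the arithmetic concrete, here is a minimal C++ sketch of the effect. The struct names and field layout are assumptions for illustration, not Pan's actual types: a forged "(01/50)" subject causes capacity to be reserved for 50 parts even though only one is ever posted.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for per-part and per-article bookkeeping;
// the fields are assumptions sized only to illustrate the waste.
struct PartStub {
    unsigned number = 0;     // part index within the article
    unsigned long bytes = 0; // size reported in the overview line
    std::string message_id;  // per-part message-id
};

struct ArticleStub {
    std::string subject, author, message_id;
    std::vector<PartStub> parts;
};

// Build an article the way a forged "(01/50)" overview line would:
// reserve a slot for every claimed part, then store the one real part.
inline ArticleStub make_forged_article(unsigned claimed_parts) {
    ArticleStub a;
    a.subject = "xxxx (01/" + std::to_string(claimed_parts) + ")";
    a.parts.reserve(claimed_parts); // capacity for parts that never arrive
    a.parts.push_back(PartStub{});  // the single part actually posted
    return a;
}

// Bytes held by the vector's buffer beyond what the stored parts use.
inline std::size_t wasted_bytes(const ArticleStub& a) {
    return (a.parts.capacity() - a.parts.size()) * sizeof(PartStub);
}
```

With a 50-part claim, roughly 98% of the per-part buffer is dead weight, which is the same shape as the 70% massif reported across the whole attack.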
This is particularly effective because Pan doesn't currently clear() articles
until the group is unloaded; scoring rules don't save much memory during a
header download because the Articles remain intact. One side effect: if the
group is never opened in the header pane while headers are being fetched, it
never gets unloaded on exit, and the saved group file contains the entire
mess, even if every article scores -9999 and Pan is set to delete score -9999.
Here are some details on the attack:
The group is alt.binaries.teevee. The DoS began 21 Jan 2017 (about -271M from
current XoverHigh) and continued about 9 days, accounting for about 90% of the
post volume during that period, many millions per day. The author was unchanged
and the subject fields were similar for the first week or so; the last couple
of days, both fields were different hex hashes for every post. All were
ostensibly multi-part with only one part ever posted.
A number of issues were reported around that time for Pan (and other
newsreaders): the blind men and the elephant. AFAIK the problem went
undiagnosed and nothing was done.
This is easily reproducible with a small mod to Pan: in task-xover.cc, push
minitasks to the back rather than the front [I would also fix the obvious bugs
in the header ranges there]. This might introduce a bug or two, but it has the
nice property of retrieving headers from oldest to newest. Surprisingly, with no further changes,
this allows exiting Pan and later resuming header downloads with Get New
Headers. To reproduce, set XoverHigh for the group in newsgroups.xov then Get
New Headers to start from there. The attack is in full force in
alt.binaries.teevee at 1195000000 on the UNS family of servers (Easynews,
Astraweb, Newshosting, UsenetServer and about 40 others). Depending on the
server you may have to sample to find the date range but the attack window is
at least 50M articles.
Alternatively, just use a debugger to set the xover ranges before the minitasks
are created.
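The ordering change above can be illustrated with a toy queue. The names and chunking here are assumptions for illustration; Pan's real minitasks live in task-xover.cc:

```cpp
#include <algorithm>
#include <deque>
#include <utility>

// Hypothetical minitask: fetch XOVER headers for the range [lo, hi].
using Range = std::pair<unsigned long, unsigned long>;

// Split [lo, hi] into fixed-size chunks and queue them.
// push_back yields oldest-first retrieval; push_front (the current
// behavior being modified) yields newest-first.
inline std::deque<Range> queue_minitasks(unsigned long lo, unsigned long hi,
                                         unsigned long chunk,
                                         bool oldest_first) {
    std::deque<Range> q;
    for (unsigned long a = lo; a <= hi; ) {
        unsigned long b = std::min(hi, a + chunk - 1);
        if (oldest_first) q.push_back({a, b});
        else              q.push_front({a, b});
        a = b + 1;
    }
    return q;
}
```

Oldest-first ordering is what makes exit-and-resume work: the highest article number already fetched is a valid XoverHigh checkpoint.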
I don't think a significant rewrite is necessary to mitigate this. I have two
small patches in my local branch, tested separately and in tandem, that got
past the attack and downloaded the entire set of headers for the group
available on UsenetServer, about 1.45B.
1. Limited pre-filtering: when a line is read from the server, before an
Article is ever created, drop it if it matches a rule. This is effective, and
CPU overhead is minimal for the simple rules I needed. It's something of a
band-aid, though: the underlying problem is that Articles, once created, stick
around until the GroupHeaders is deleted.
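A minimal sketch of patch 1, assuming a regex-based rule set applied to the raw overview line; the function name and rule representation are illustrative, not Pan's filtering API:

```cpp
#include <regex>
#include <string>
#include <vector>

// Apply simple drop rules to the raw XOVER line before any Article
// object is allocated. Returning true means the line is discarded
// and no per-article memory is ever reserved for it.
inline bool drop_before_parse(const std::string& xover_line,
                              const std::vector<std::regex>& drop_rules) {
    for (const auto& re : drop_rules)
        if (std::regex_search(xover_line, re))
            return true;  // matched a rule: never build an Article
    return false;
}
```

For the attack described here, a single rule matching the hex-hash subjects is enough to keep the flood out of memory entirely.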
2. clear() Articles when DataImpl::delete_articles() is called. This involves
re-ordering some things and likely introduces bugs, because Articles are not
ref-counted; somebody who understands the code better could probably get it
right. It saves less RAM than the first patch, partly because the 120-byte
Article struct still hangs around, though I don't think that's the only
problem with my hack. There's also more CPU overhead.
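Patch 2's fragility comes from Articles not being ref-counted. Purely as a sketch of what ref-counting would buy (this is not Pan's code; the map and function are hypothetical analogues), shared ownership lets a delete_articles()-style call drop the index's reference while any outstanding reader keeps the Article alive until it's done:

```cpp
#include <map>
#include <memory>
#include <string>

// Illustrative Article with shared ownership.
struct Article {
    std::string message_id, subject;
};

// Hypothetical analogue of a group's header index.
using ArticleMap = std::map<std::string, std::shared_ptr<Article>>;

// Analogue of delete_articles() for one article: drop the index's
// reference; the Article is freed when the last holder releases it,
// so in-flight readers (score pane, tasks) never see a dangling pointer.
inline void delete_article(ArticleMap& group, const std::string& mid) {
    group.erase(mid);
}
```

With plain owned pointers, as now, the erase would have to be ordered carefully around every reader; with shared ownership the ordering problem disappears, at the cost of a control block per Article.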
I'd be happy to provide more information. There's no mystery in what's
happening here.