Re: mailman keeps holding for non-subscribers

From:	Eric Wong
Subject:	Re: mailman keeps holding for non-subscribers
Date:	Mon, 13 Apr 2020 11:52:09 +0000

Bob Proulx <address@hidden> wrote:
> Eric Wong wrote:
> > Bob Proulx <address@hidden> wrote:
> > > Eric Wong wrote:
> > > > OK, so I'm following half the recommendations
> > > > 
> > > > The ones I'm going against are:
> > > > 
> > > >         generic_nonmember_action=hold (I want Accept)
> > > >         default_member_moderation=yes (I want no)
> > > 
> > > May I try to convince you otherwise?  Because there are good reasons
> > > for the recommended settings.
> > 
> > Not unless the maximum delay can be minutes.  In other words,
> > similar to what greylisting gets without any human interaction.
> 
> The initial contact delay is the hill being defended?  On a mailing
> list that may have many interactions over time.  You and I might be
> discussing some topic.  Say the topic of mailing list operations. :-)
> We may send many messages back and forth on the mailing list.  This
> might go on for years and years over many topics.  Each of those
> happen fast and efficiently.  And it is not the continuing problem of
> spam to the mailing list that is a problem.  That spam is okay.  But
> it is the very first initial contact email message delay that is the
> showstopper?  It's beyond the pale?

The delay is one of the factors and far-off Date: headers being
flagged as spam by subscribers' MTAs is another.
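
To illustrate the Date: header concern, here is a minimal sketch of the kind
of check a subscriber's filter might apply. The function names and the
two-day threshold are made up for illustration; real filters use their own
rules and scores.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import parsedate_to_datetime


def date_skew(raw_message: str, received_at: datetime) -> timedelta:
    """Return how far the Date: header lags behind the time of receipt."""
    msg = message_from_string(raw_message)
    sent_at = parsedate_to_datetime(msg["Date"])
    return received_at - sent_at


def looks_stale(raw_message: str, received_at: datetime,
                max_skew: timedelta = timedelta(days=2)) -> bool:
    # max_skew is an arbitrary illustrative threshold, not a real filter's value.
    return abs(date_skew(raw_message, received_at)) > max_skew


# A message written on 13 April but only released from moderation two weeks later.
held = "Date: Mon, 13 Apr 2020 11:52:09 +0000\nSubject: test\n\nbody\n"
released = datetime(2020, 4, 27, 11, 52, 9, tzinfo=timezone.utc)

print(date_skew(held, released).days)  # 14
print(looks_stale(held, released))     # True
```

The longer a message sits in a hold queue, the larger this gap gets by the
time it reaches subscribers.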

The main factor, and my biggest worry, is that all the admins could
be unavailable for long periods of time (or permanently).

Maybe the admin team here is big enough for that not to be
a big problem.  I've definitely participated in some groups
where admins disappeared for days or even weeks while mail piled
up.

> How about SMTP time greylisting?  I would gather from this discussion
> so far that SMTP greylisting, which is exactly the same and creates a
> delay upon the initial contact, would also be a showstopper too then?
> Greylisting at SMTP time would also be beyond the pale?

Fwiw, I see greylisting as a "less bad" option because messages
still eventually go through when no admins are available.  I
prefer not to have greylisting at all, but it's better than
needing humans to be constantly in the loop.
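
As a rough sketch of why greylisting needs no human in the loop: the server
temporarily rejects the first attempt from an unknown (client IP, sender,
recipient) triplet and accepts a retry once a minimum delay has passed. The
class name, the five-minute delay, and the reply strings below are
illustrative assumptions, not any particular MTA's implementation.

```python
from datetime import datetime, timedelta, timezone


class Greylist:
    """Toy greylist keyed on (client IP, envelope sender, envelope recipient)."""

    def __init__(self, min_delay: timedelta = timedelta(minutes=5)):
        self.min_delay = min_delay  # illustrative value only
        self.first_seen = {}        # triplet -> time of first attempt

    def check(self, client_ip: str, sender: str, recipient: str,
              now: datetime) -> str:
        key = (client_ip, sender.lower(), recipient.lower())
        # Remember the first attempt; later attempts reuse that timestamp.
        first = self.first_seen.setdefault(key, now)
        if now - first < self.min_delay:
            return "451 try again later"  # temporary failure, sender retries
        return "250 ok"


gl = Greylist()
t0 = datetime(2020, 4, 13, 11, 52, tzinfo=timezone.utc)
triplet = ("192.0.2.1", "poster@example.org", "list@example.org")

print(gl.check(*triplet, now=t0))                         # 451 try again later
print(gl.check(*triplet, now=t0 + timedelta(minutes=6)))  # 250 ok
```

The delay is bounded by the sending MTA's retry schedule rather than by
anyone's availability.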

> I am sorry but IMNHO it is the daily day to day operations that are
> much more important to optimize and make efficient.  Because those are
> things that happen repeatedly, day after day.  One time startup costs
> should not be too onerous, but may have some cost in order to have
> benefit.  Like greylisting.  But it is the repeated operations that I
> think should be targeted for optimization.  And that is the normal day
> to day use of the mailing lists without having them filled with spam.

Interesting that you say that, especially when you rightly
admit humans also make mistakes in letting spam through, below.

> > > > So, should I remove address@hidden from moderators?
> > > > I still want automated spam filters such as SpamAssassin, though.
> > > 
> > > The listhelper anti-spam SpamAssassin et al cancel-bot depends upon
> > > the hold actions.  If messages do not get held then it has no ability
> > > to filter spam.  That's fundamental to how it works with Mailman.
> > 
> > That's unfortunate.  I'm not familiar with Mailman, but can't
> > the MTA feed the message through spam filters before Mailman
> > ever sees it?
> 
> It's interesting that you mention that.  Because for years and years
> the frontend anti-spam was poor.  Very poor.  And this is not a
> reflection upon the current FSF staff who have inherited the present
> situation.  But that is the traditional situation.  For a very long
> time the frontend anti-spam has been very poor.  And therefore we have
> been implementing the anti-spam portion mostly in the Mailman
> interface where it is possible for volunteers to interact with the
> system.
> 
> There has been discussion of how to improve the frontend anti-spam.
> At this time the systems are getting OS upgrades.  Those are dearly
> needed.  And obviously a first step in the improvement of the system.
> And there have been discussion about what needs to be done to improve
> the frontend anti-spam.  This is starting to happen.  But is still
> going to take a while from now to be improved.  As with many things
> life and time is what keeps everything from happening all at once.

Understood.

<snip>
> 
> > I use mlmmj for legacy mailing list subscribers, that just runs
> > off cron with no synchronous relationship with the MTA at all.
> > I have replay script which makes it incrementally read mail from
> > public-inbox (git).
> 
> If we are going to start listing out mailing list management software

<snip>
 
> But Mailman is an official GNU Project.  There is a benefit to "eating
> your own dogfood" as the saying goes.  That and due to other reasons
> the lists.gnu.org machine is likely to continue to run GNU Mailman
> instead of other mailing list manager programs for a while to come.

Fwiw, I'm not advocating mlmmj, either; but more wondering if
Mailman and mlmmj are similar enough that it's easy to make the
existing replay script work with Mailman, as well.

> > 100% agreed.  I've been using an inotify + Maildir-based
> > training system since 2008 or so spamc, even pre-public-inbox:
> > 
> >     https://public-inbox.org/dc-dlvr-spam-flow.html
> 
> I looked at the mail flow through the diagram and without having spent
> a huge amount of time understanding it the flow looks similar to the
> way other sites do this.  As users read mail and determine that a
> message is spam or non-spam they divert mail to different places and
> based up on those places the learning engines are trained-on-error.
> That's great!  I do that too on my non-gnu systems end user mailboxes.

OK.

> But that isn't really applicable to the way a mailing list works.
> Because a mailing list delivers (forwards) mail to other people.  The
> delivery of spam to other people's mailbox is very bad.  And it is
> difficult for implementing distributed training feedback from the
> community.  We can't not deliver a message that is spam after already
> having delivered it.

Right.  Though I prefer occasional delivery of spam (which
happens regardless of human moderators) to risking a
situation where messages can be delayed for hours (or weeks,
as I've experienced elsewhere).

Fwiw, the mlmmj replay script I have runs via inotify off the
public-inbox git repos.  Since public-inboxes are git repos,
anybody who clones the inbox can also run their own forked
mailing list off the clone.

> > Spam gets trained upon removal from archives.
> 
> Your preferred system (AFAICT) is one of a centralized storage without
> delivery.  Because there is no delivery it does not deliver spam and
> that spam can be removed "quietly behind the scenes" as it were.  That
> is what Google does with Gmail too.  And others.
> 
> However that is not a mailing list.  It's something different.  It is
> more similar to a web forum.  Even if it is also different in many
> ways from a web forum.  It feels more similar than it is different.

A public-inbox is a forkable forum which uses email for posts.
It's git-backed, so "git clone --mirror" works, so it's as
decentralized as git is.  It's fine for clients to poll the
HTTP/NNTP interfaces every minute if they're worried about data
disappearing.

(And I've been meaning to make something like IMAP IDLE for the
 HTTP endpoints to save clients' battery life)

> If I am a subscriber to a mailing list and it passes along spam then I
> will receive that spam.  (Where I can filter it out on my end but that
> is already too late to prevent the delivery of it.)
> 
> Many people would object to the centralized storage based system
> because it is centralized and creates an environment where a cabal
> could, 1984 style, remove historical messages and rewrite history.
> Don't like what someone said?  Simply remove that message from the
> storage.  Or without malice there is the possibility of technical
> failure.  A storage failure without backup would lose the entire
> mailing list history.  These problems are not possible in a
> traditional mailing list as those historical messages already were
> sent and became part of the historical record.  And they were
> distributed among all participants.  Everyone has a copy.

In public-inbox, all normal removals for spam are in git history,
so they show up in clones and "git log" can find them.

There's a nuclear "purge" operation for legally sensitive stuff
that breaks git mirrors by rewriting git history and doing gc.
Folks who know their way around git can get around it, though.
NNTP clients polling frequently would also be immune to purge.

It's all designed for low-end hardware, too, so more users can
make mirrors (and potentially fork) on a budget.

And all the problems with centralized storage also apply to
messages in the Mailman moderator queue, as well.

> > > > So if I'm away and unable to administer address@hidden, and
> > > > generic_nonmember_action is "Hold"; will the "human team" at GNU
> > > > eventually accept postings in my absence?
> > > 
> > > Yes.  Eventually usually means a few hours.
> > 
> > <snip> yikes, that seems like a lot of human labor :<
> 
> No.  It's only a few minutes a day.
> 
> While typing this message I switched over to the other window and ran
> through the mail queues.  It took less than two minutes before I was
> done and flipped back to this message.  Everything was mostly caught
> up.  There were only a dozen messages needing review at this moment.
> Other listhelpers had been at work.  We interlace randomly.  There was
> no heavy spam wave hitting the system needing a custom rule written.
> Just the normal routine activity.  A couple of minutes.
> 
> Note that I am NOT clicking around in the Mailman web interface.  I am
> either in 'mutt' looking at mail from the moderation emails, or
> running scripts which are doing things.  There is no mouse activity
> involved at all.  That would definitely be tedious.

Good to know :>

> > > It is your mailing list and this is up to you.  But people tend to be
> > > very intolerant of spam on mailing lists.
> > 
> > It depends on the quantity, I suppose.  vger.kernel.org lets a
> > few through and nobody seems to mind.  (I'm just a subscriber
> > on vger, not an admin)
> 
> And lists.gnu.org has infrequent spam slip through too.  No system is
> perfect.  And there are human mistakes at times.  Humans have a
> non-zero bit-error-rate after all.  Worse than the automation
> actually.

I agree with that.  I've certainly mistrained and misdeleted
messages myself (at least with public-inbox|git, reverting
works).

So with that, I really don't think involving human labor at
initial contact (and potential for human error) is good at all.

<snip>

> > Right.  One of my concerns with increased reliance on whitelisting
> > is that spammers will start using whitelisted addresses themselves.
> > SPF might discourage that, though.
> 
> It's somewhat of a scary potential avenue for abuse.  One that has
> only been infrequently targeted.  But SPF, DKIM, and so forth helps
> with preventing the forgeries.  Many sites do not use those however
> and are still subject to delivery of forgeries from those sites.  I
> have been thinking of ways to defend this particular potential abuse
> avenue on the mailing lists, because it prickles at me.  Hopefully in
> the arms race between user and abuser the user will win.

*shrug*  Yes, it's a frustrating arms race :<

> > Fwiw, vger.kernel.org just drops HTML, which seems to cut a lot
> > of spam, too.  They also do greylisting from what I can discern.

<snip>

> Simply dropping html mail is not a practical solution, regardless of
> how much I would wish the world would do so.  For most of the mailing
> lists we have Mailman convert the html to plain text and that seems to
> be the acceptable compromise.

It is practical for some projects depending on the target user :)

But yeah, I guess the majority of gnu.org projects have a different
demographic than kernel hackers on vger.
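
For reference, a "drop HTML" policy of the kind mentioned above could be
sketched with Python's standard email package. This is only an assumption
about what such a filter might look like, not how vger or Mailman actually
implement it, and the function name is hypothetical.

```python
from email import message_from_string


def has_html_part(raw_message: str) -> bool:
    """True if any MIME part of the message is text/html."""
    msg = message_from_string(raw_message)
    return any(part.get_content_type() == "text/html" for part in msg.walk())


plain = "Content-Type: text/plain\n\nhello\n"
multi = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="b"\n'
    "\n"
    "--b\n"
    "Content-Type: text/plain\n"
    "\n"
    "hi\n"
    "--b\n"
    "Content-Type: text/html\n"
    "\n"
    "<p>hi</p>\n"
    "--b--\n"
)

print(has_html_part(plain))  # False
print(has_html_part(multi))  # True
```

A list that converts HTML to plain text instead would use the same test to
decide which messages need conversion rather than which to drop.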
