Re: [Pan-users] Kill files

From:	Dieter Britz
Subject:	Re: [Pan-users] Kill files
Date:	Tue, 25 Apr 2017 13:07:37 +0200
Hi Duncan & Pedro et al

thanks for the replies!
Regards
Dieter

On 25 April 2017 at 09:41, Duncan <address@hidden> wrote:
> Dieter Britz posted on Mon, 24 Apr 2017 12:00:15 +0200 as excerpted:
>
>> People talk about setting up a kill file for posters to news groups that
>> annoy others, by off topic postings etc. Is it possible to do that with
>> pan?
>
> This repeats the same idea as the replies by HH, DG and Pedro in the
> other subthread, but with a bit more explanation of what pan's actually
> doing and why, and why it's like binary-choice killfiling (killfiled or
> not) but better. =:^)
>
> First, let's understand the difference between a fine-grained scoring
> mechanism like pan has and a hard binary or trinary filter mechanism.
>
> With scoring, the effects of many scoring rules can, if desired, be
> applied together to arrive at a final score for a post.  That score can
> then be used to apply some action: simply hiding the post, marking it
> read, or deleting it, or on the other end, highlighting it with various
> colors depending on how high it scores, automatically downloading the
> post to cache, or saving its attachments.
>
> A binary filter, by contrast, acts immediately on the first filter that
> applies, to either kill the post (generally hide and mark-read,
> sometimes delete, depending on the implementation) or not.  The trinary
> case adds a watch flag (and perhaps auto-download, depending on
> implementation) if the post isn't killed.
>
> So in pan, a score of -9999 is defined as ignored.  That's what binary
> filters would filter out, also known as killing, thus the term killfile.
>
> And a score of +9999 is defined as watched.
>
> Meanwhile, FWIW, there are a number of other preset score category
> levels as well.  These can be seen under the view menu, header pane.
> Here's the full listing, lowest to highest:
>
> -9999 (or lower): Ignored
>
> Either multiple scoring rules combined to result in the message being
> ignored, *OR* a single scoring rule set ignored/-9999 and stopped
> further score processing.
>
> By default pan doesn't display these messages, but doesn't take any other
> action (marking them read, deleting them, etc).
>
> -9998 to -1: Low
>
> The result of one or more scoring rules lowered the message score into
> negative territory, but not enough to make it ignored.
>
> 0: Default
>
> Of course 0 is the default score, if no scoring rules apply, or if the
> scoring rules exactly balance each other out.
>
> 1 to 4999: Medium
>
> The result of one or more scoring rules was a moderate scoring boost,
> but still below 5000/high.
>
> There's an option to display these in a different color, but I don't
> believe it's on by default.  (FWIW I've been running pan since 2002, a
> decade and a half now, and long ago forgot what the defaults were for
> many of the options I've customized.)
>
> 5000 to 9998: High
>
> The result of one or more scoring rules was a higher scoring boost, more
> than 4999, but less than 9999.
>
> Again, there's an option to display these in a different color, but I
> don't believe it's on by default.
>
> 9999 (or higher): Watched
>
> Either multiple scoring rules resulted in a score at or above 9999, *OR*
> a single scoring rule set it to watched/9999 and stopped further scoring
> rule processing.
>
> Pan should display these in a different color, by default I believe.
> There are options (off by default) that allow auto-downloading or the
> like.
>
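The category boundaries listed above can be written down as a small
lookup.  This is only a sketch of the ranges as Duncan describes them,
not pan's own code, and the function name score_category is made up
for illustration.

```python
def score_category(score: int) -> str:
    """Map a final article score to the category names listed above."""
    if score <= -9999:
        return "ignored"
    if score < 0:
        return "low"
    if score == 0:
        return "default"
    if score < 5000:
        return "medium"
    if score < 9999:
        return "high"
    return "watched"


# Walk the boundary values from the listing.
for s in (-9999, -1, 0, 4999, 5000, 9999):
    print(s, score_category(s))
```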
>
> As you should already see, scoring allows a far richer and more nuanced
> setup than arbitrary binary kill/show or trinary kill/show/watch
> filters.  But by using the watched/ignored options only, which basically
> set +9999/-9999 respectively and stop further score processing, you can
> have a simpler binary or trinary setup if you wish.
>
> It's up to you. =:^)
>
> Meanwhile, as I already mentioned, there are choices under view, header
> pane, to match (or not) each of these scoring categories separately.
> Again under view, header pane, pan can then be set to display either
> explicitly matched posts, matched posts and their subthreads, or matched
> posts and their entire threads, as desired.
>
> It's up to you. =:^)
>
> And in the preferences dialog (edit menu, preferences), on the colors
> tab, you can set the colors for each scoring category.
>
> It's up to you. =:^)
>
> (Tho do note that these days, pan only shows those colors in the score
> column, not the entire line as it used to do.  So you have to have the
> score column in your listing or you won't see the colors.  I preferred it
> coloring the entire line, but oh, well, I'm a user, not a dev... and
> unfortunately, that's NOT a user available option.  As I'm writing this,
> however, I'm wondering just how hard it might be to find that and patch
> it to whole line, tho.  I /am/ an advanced enough user that even tho I
> don't claim to be a dev, I can /sometimes/ work out patches on my own,
> and as I run gentoo, I normally build everything from sources and can and
> often do apply my own patches or those I've picked up from others to
> various packages, including pan.  So I'll have to look into patching
> this...)
>
>
> OK, so you can set whether the various score categories are displayed or
> not, and if displayed, you can set the color per category, but what about
> more practical score-based actions?  In particular, consider those who
> track things via marked-read, and who don't have pan set to
> automatically mark everything in the group read when they fetch headers
> or leave a group.  For them, not displaying ignored posts AND not having
> them automatically marked read is frustrating, because then they hang
> around, still marked unread!
>
> Of course if you've been paying attention, you already know the answer,
> as I mentioned it above.
>
> It is (of course) up to you! =:^)
>
> (Noticing the trend yet? =:^)
>
> Preferences dialog, actions tab.
>
> One possible setup might be:
>
> Delete articles scoring at:     -9999 or less   (ignored)
>
> This would auto-delete ignored articles.
>
> Mark articles read scoring at:  -9998 to -1     (low/negative)
>
> This would auto-mark-read negative/low-scoring articles, but wouldn't
> delete them.  The idea here is to let you hide them by default (by
> showing only unread), but still keep them around in case you see a reply
> and you want to see the message it's replying to.
>
> (I /believe/ it'll mark anything read UNDER the named category as well,
> so it would mark ignored articles read too, if they're not deleted with
> the earlier option, above.  But I'm not actually sure on this bit.)
>
> Alternatively, if you don't delete ignored articles, you can simply mark
> them read, and still show negative/low-scoring articles that aren't
> entirely ignored.
>
> Cache articles scoring at:      1 to 4999       (medium)
>
> Of course you can set this to high/5000-9998 or watched/9999 instead, if
> that fits your needs better.
>
> The idea is that if an article is sufficiently highly scored, you want it
> cached for you so it's already there when you would otherwise have to
> download it to cache.
>
> Do be aware that pan's cache size is pretty small, 10 MB by default, and
> especially if you're doing binaries and using this setting, you'll
> probably want a larger cache.  That's set in preferences, on the behavior
> tab.
>
> (Again, I /believe/ it'll do the same with the higher categories, high
> and watched, too, but I've not actually tested it to be sure.)
>
> Download attachments of articles scoring at:    Disabled
>
> If you're doing binaries, you might want to set this instead of the cache
> option.
>
> Generally, people download binaries using one of two strategies.
>
> Here, I prefer to have pan's cache set way big, and download messages to
> cache first, so they're local.  Then when they're already cached so I
> won't be waiting for the download, I can go thru and sort out what I
> really want, saving it where I want it, and deleting what I don't really
> want.  This works best for (relatively) small binaries that you will
> download many hundreds or thousands of, like still images or audio clips
> mostly under 10 minutes in length, with the occasional longer audio clip
> or short video.  It also requires a much larger cache setting (on the
> order of gigabytes, for me), or pan will start deleting previously
> downloaded to cache but still unread messages, to make room for the
> newest still downloading to cache messages.
>
> For that binaries strategy or for text messages, the auto-download-to-
> cache action exists.  Just be aware of the cache size requirements and
> adjust it accordingly.
>
> The other strategy, which is obviously pan's default given the very small
> 10 MB default cache size, is to have pan download and save off the
> binaries immediately, without caring at all about the messages they're
> attached to.  Because the attachments are saved immediately and the
> messages they were attached to don't matter, those messages can be
> deleted from cache as soon as the attachment is saved, so this requires a
> far smaller cache and pan's default 10 MB cache suffices.
>
> This works best for very large binaries, typically half-hour or longer
> videos like TV series episodes or feature-length movies.  It works best
> if you don't care about the messages containing the attachments at all
> (no discussion of the series, etc), since unless you increase the size of
> the cache anyway, they'll be deleted effectively immediately after the
> attachment processing is completed.
>
> It is for this binaries strategy that the auto-download-(and-save)-
> attachments action exists.  Obviously this isn't going to work too well
> if your interest is primarily text groups (and people post binaries there
> too, and the messages score high enough for the action to trigger),
> because you'll end up with a bunch of random binaries that happened to be
> attached to watched or whatever level scoring messages saved off to
> wherever you have pan saving them.
>
>
> OK, but what about the scoring itself?
>
> First of all, the watch (thread) and ignore (thread or author) entries on
> the articles menu are the GUI method to create scoring rules that set the
> +/-9999 score and abort further score processing.
>
> Next, there's the edit article's watch/ignore/score and add a scoring
> rule entries, again on the articles menu.  These bring up a dialog,
> either directly (for add) or indirectly (for edit, using the add button
> there), that lets you setup a more detailed scoring rule.  This is more
> flexible than the arbitrary watch/ignore options above, allowing you to
> match various options and if matched either set a specific score and
> abort further scoring as the above watch/ignore options do, or
> alternatively, to simply add/subtract whatever score and continue
> processing further scoring rules.  You can also set an expiry for the
> rule, if desired, or make it permanent.
>
> It's this last option, to add/subtract some score value and continue
> processing more scoring rules, that's where the real flexibility comes
> in.  You can match on multiple subject keywords in multiple rules, adding
> or subtracting based on the match, then add/subtract based on author,
> then do some more based on references (effectively thread, only sometimes
> message-ids are deleted from the header and it won't match the thread any
> longer), then subtract points if it's cross-posted/spammed to too many
> groups, and add or subtract more points based on size in bytes or line
> count.
>
> As long as no match sets an arbitrary score and stops further processing,
> all these matches will result in a final score that combines the effects
> and the relative scoring weight of all the others, and pan uses that
> final score to decide what scoring category the message belongs in, and
> thus whether to show it and how, as well as what automated actions to
> apply.
>
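To make the difference between "add and continue" and "set and stop"
concrete, here is a rough model of how the final score comes about.
It is a sketch under assumptions, not pan's implementation: the Rule
class, its fields, the sample headers and the idea that rules are
tried in list order are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Rule:
    matches: Callable[[dict], bool]  # hypothetical test against an article's headers
    value: int
    final: bool = False  # True: set the score to `value` and stop processing


def final_score(article: dict, rules: Iterable[Rule]) -> int:
    """Combine additive rules; a matching `final` rule overrides and stops."""
    score = 0
    for rule in rules:
        if not rule.matches(article):
            continue
        if rule.final:
            return rule.value
        score += rule.value
    return score


rules = [
    Rule(lambda a: "pan" in a["subject"].lower(), 500),
    Rule(lambda a: a["lines"] > 1000, -200),
    Rule(lambda a: "spammer" in a["from"].lower(), -9999, final=True),
]

article = {"subject": "Pan scoring question",
           "from": "someone@example.invalid", "lines": 40}
print(final_score(article, rules))
```

With only additive rules matching, the contributions simply sum.  As
soon as the last rule matches, the accumulated score no longer matters
and the article lands on -9999.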
> See how much richer a good scoring system is, compared to arbitrary
> binary/trinary-choice filtering on just ONE match-factor?
>
> Of course if that's too complex for you, just use the watch/ignore and be
> done with it.
>
> It's up to you. =:^)
>
>
> Meanwhile, as the others suggested, the real advanced stuff is reserved
> for those who choose to directly edit the scorefile itself.  They posted
> the link to the format description.
>
> http://www.slrn.org/docs/score.txt
>
> But, keep in mind that the link above is for a different news client,
> slrn, which shares a general scorefile format with pan.  Unfortunately,
> however, pan's score-processing code isn't quite as advanced as slrn's,
> so some of the more complex stuff described there doesn't work in pan.
> Pan hasn't implemented the include statement, for instance, so don't try
> to use it.  The {} grouping logic isn't implemented either, AFAIK.
>
> And, pan hasn't implemented the score keyword's single-colon AND logic,
> so single or double colon doesn't matter, it's always interpreted as OR
> (double-colon).  This is unfortunate, but the effect can be partially
> counteracted by simply creating multiple conditions, each of which gives
> partial points.  So instead of an AND score with five conditions to meet
> and a +1000 value, you can use pan's OR scoring on each of the five
> conditions, with a +200 value on each.  The total if all match will still
> be +1000, but of course the result can be less predictable if only some
> of the conditions match, particularly when that partial score combines
> with another would-be compound rule that also only partially matches.
>
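The arithmetic of the partial-points workaround, including the
caveat about partial matches, can be sketched like this.  The
function name and the second, negative "compound" are invented for
the example; only the five conditions at +200 each come from the
text above.

```python
def partial_points(matched, points_each):
    """Add points_each once per matched condition (each is its own OR rule)."""
    return points_each * sum(1 for hit in matched if hit)


# Would-be AND rule: five conditions, +1000 in total, so +200 each.
all_five = partial_points([True] * 5, 200)
three_of_five = partial_points([True, True, True, False, False], 200)

# A second would-be compound (hypothetical): two conditions at -300 each.
one_of_two = partial_points([True, False], -300)

print(all_five)                    # 1000
print(three_of_five)               # 600
print(three_of_five + one_of_two)  # 300
```

The last line is the less predictable case: neither compound fully
matched, yet the article still ends up with a score of 300.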
> Another difference is that pan's scoring matches are always case
> insensitive.  So don't worry about John vs. JOHN vs. john vs. JoHN, the
> same regex will match them all without any fancy regex footwork.
>
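For anyone unsure what case-insensitive matching means in practice,
the behaviour is the same as a regex compiled with an ignore-case
flag.  Python's re module is used here purely as an illustration;
pan uses its own regex engine, not this one.

```python
import re

# One lowercase pattern, matched without regard to case.
pattern = re.compile(r"john", re.IGNORECASE)

names = ["John", "JOHN", "john", "JoHN"]
all_match = all(pattern.search(name) for name in names)
print(all_match)  # True
```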
>
> Some additional scorefile format notes:
>
> * Unfortunately for some, understanding regular expressions is really
> necessary to take full advantage of scoring, particularly when editing
> the scorefile itself, but it's worth it, and pan's GUI does allow simple
> scoring even if you don't know regex.
>
> It's up to you. =:^)
>
> * The note in section 1.1 recommending that one stick to the overview
> headers (typically subject/from/date/message-id/references/bytes/lines
> and often xref), but allowing others, most definitely applies.
> Unfortunately it's a technical limitation of the protocol, not something
> pan (or slrn or any other news client) can do anything about.
>
> The thing is that pan can score headers in the overview without
> downloading the full message (or full headers).  For the most part,
> that's the headers needed to display the message in the headers pane,
> author, subject, date, etc, plus message-id and references for threading
> and tracking across multiple servers, etc.  But for the more exotic
> headers, pan won't get them, and thus can't score them, until the article
> is downloaded to cache.
>
> So if you have an abuser that keeps nym-shifting and otherwise
> deliberately changing everything in the headers he has access to, in
> order to try to avoid killfiling, but who always posts thru a provider
> that adds an xtrace header with a consistent value you can score on, you
> *CAN* score on it, but you'll have to download the messages to cache
> first.
>
> Take it from someone who was once in the position of trying to killfile
> a poster like that, before pan could score such non-overview headers:
> being able to ignore-score the poster only after downloading to cache
> sucks, but it definitely sucks less than having to actually show the
> message in order to see who it is and block it!
>
>
> * Note that while you can set an expiry on the score in the pan GUI, and
> at that point pan will indeed quit applying that score, it won't actually
> remove it from the scorefile.  The only way to actually remove the score
> from the scorefile is to manually edit it.
>
> Unfortunately, this does mean that if you actively add expiring scoring
> rules and never manually remove them, eventually your scorefile will be
> cluttered with perhaps hundreds or thousands of expired rules.  They'll
> begin to affect scorefile loading performance, as pan still has to
> process each one at least far enough to see that it's expired, and then
> work out how far to skip to the beginning of the next possibly still
> valid rule.
>
> So you'll probably want to either clear out the scorefile and start new
> occasionally, or manually edit it to at least clean out the expired rules
> from time to time, or simply not use expiring scores at all, living with
> the annoyance unless it's worth a permanent rule.
>
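As an aid for that manual cleanup, a small script can at least point
out which rules have expired.  This is a sketch under assumptions: it
relies on rules carrying an "Expires: MM/DD/YYYY" (or DD-MM-YYYY)
line as described in the slrn format document linked above, and I
have not checked exactly what pan itself writes.  The sample
scorefile text and the function names are invented, and treating a
rule as expired only when its date is strictly before today is also
an assumption.  It only reports line numbers; removing the rules is
still left to the editor.

```python
import re
from datetime import date

# An uncommented "Expires:" line; lines starting with % do not match.
EXPIRES = re.compile(r"^\s*Expires:\s*(\S+)", re.IGNORECASE)


def parse_expiry(text):
    """Parse MM/DD/YYYY or DD-MM-YYYY; return None if unrecognised."""
    try:
        m = re.fullmatch(r"(\d{1,2})/(\d{1,2})/(\d{4})", text)
        if m:
            month, day, year = map(int, m.groups())
            return date(year, month, day)
        m = re.fullmatch(r"(\d{1,2})-(\d{1,2})-(\d{4})", text)
        if m:
            day, month, year = map(int, m.groups())
            return date(year, month, day)
    except ValueError:  # e.g. month 13 or day 31 in a short month
        return None
    return None


def find_expired(lines, today):
    """Return (line_number, expiry_date) for every expiry before today."""
    expired = []
    for number, line in enumerate(lines, start=1):
        m = EXPIRES.match(line)
        if not m:
            continue
        when = parse_expiry(m.group(1))
        if when is not None and when < today:
            expired.append((number, when))
    return expired


# Hypothetical scorefile content, for illustration only.
sample = """\
[*]
Score: -500
Expires: 1/15/2017
Subject: some old flamewar

Score: =-9999
From: spammer@example\\.invalid
""".splitlines()

print(find_expired(sample, date(2017, 4, 25)))
```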
> * Yes, an initial % on a line *DOES* mean it's a comment.
>
> By implication, most of the lines pan adds when you add a score via the
> GUI are comments and don't matter for the actual scoring at all.  They're
> only there to aid human readers.
>
> Of course that means you can edit or delete them as you wish, without
> affecting actual operation.
>
> Here, I tend to delete pretty much all of pan's added comments, with the
> exception of the date added comments for expiring scores, since that way
> I can see how long I had set the expiry.
>
> * If you do heavy scoring with lots of rules, the rules that pan's GUI
> writes aren't particularly efficient for machine processing.  The example
> in the linked documentation is somewhat more efficient, but it's too short
> to really get the point across.  If you're planning to do a lot of manual
> scorefile editing or simply want to make your scorefile more efficient,
> either check past scoring threads for this list/group (the list is
> available as a newsgroup on news.gmane.org) where I've posted a longer
> example from my scorefile, or ask for such an example.
>
> * Similarly, if you're not good with regular expressions and need some
> help designing a score that's more complex than you can easily do with
> the pan GUI, or if something's just not working as you expected it to,
> with scoring or something else, ask for help.  We've dealt with a number
> of such queries over the years. =:^)
>
>
> OK, so hope that's of help.  Some people just want an answer to plug in
> without understanding it.  Others want to understand what's going on, so
> next time they want to do something similar but not identical, they can
> figure out how to do it themselves.  I'm certainly in this latter group,
> and my posts tend to go to the extreme in explaining things.  That
> frustrates the first group, but I've stacks of thanks from people who
> preferred the better understanding my explanatory if extremely verbose
> style gave them, and sometimes I get new insights or ideas (like possibly
> patching the score coloring to the whole line instead of just the score
> column, above) as I'm writing things down, and it's the combination of
> both of those that's my motivation to keep posting as I do. =:^)
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman
>
>
> _______________________________________________
> Pan-users mailing list
> address@hidden
> https://lists.nongnu.org/mailman/listinfo/pan-users



-- 
Hilsen / Regards
Dieter
http://www.dieterbritz.dk