
Re: [Pan-users] new post counts not adding up


From: Duncan
Subject: Re: [Pan-users] new post counts not adding up
Date: Sun, 5 May 2013 08:30:43 +0000 (UTC)
User-agent: Pan/0.140 (Chocolate Salty Balls; GIT b00f96e /usr/src/portage/src/egit-src/pan2)

memilanuk posted on Sat, 04 May 2013 19:00:23 +0000 as excerpted:

> Most of the time this seems to work... but in groups/lists with lower
> traffic I started noticing that the number of new articles shown in
> parentheses next to the group name doesn't always match the sum of the
> number of unread posts shown in parentheses next to each collapsed
> thread.

> Since I have no scoring rules in effect at this point, I don't think
> that there are new posts not being shown because of that.
> 
> How can I get Pan to show new posts in both threads and sub-threads at
> the same time, without having to go and manually toggle the settings
> back and forth?

Thanks for specifying that you don't have any scoring in effect.  That 
was my first thought.

There are three additional factors that may apply.  I'd guess one or 
more of them are behind what you're seeing here.

First, it's worth noting that in collapsed threads, the unread count is 
the count of NOT SHOWN unread posts.  If the visible post is ITSELF 
unread, it won't be included in that number.  Thus, for example, if an 
entire thread with 10 posts is unread, the number in parentheses beside 
the initial post will be (9), because the initial post itself is shown.  
There are 9 /hidden/ posts marked unread that will be shown if the thread 
is expanded.

That throws me off occasionally as well, because I intuitively expect it 
to be the total unread count for the (sub)thread, including the displayed 
header if that message too is unread.
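A minimal sketch of that counting rule, assuming a simple tree of articles.  The `Article` class and both function names are hypothetical and not taken from Pan's source; the point is only that the collapsed-row number skips the visible root.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Article:
    """Hypothetical minimal article node; not Pan's real data model."""
    subject: str
    unread: bool
    replies: list[Article] = field(default_factory=list)


def hidden_unread(root: Article) -> int:
    """Unread articles BELOW the visible root of a collapsed thread.

    The root itself is skipped, because it is already shown in the list.
    """
    count = 0
    stack = list(root.replies)
    while stack:
        node = stack.pop()
        if node.unread:
            count += 1
        stack.extend(node.replies)
    return count


def total_unread(root: Article) -> int:
    """The "intuitive" count: hidden unread articles plus the root if unread."""
    return hidden_unread(root) + (1 if root.unread else 0)


# A 10-post thread, all unread: the collapsed row shows (9), not (10).
thread = Article("root", True, [Article(f"Re: root #{i}", True) for i in range(9)])
print(hidden_unread(thread), total_unread(thread))  # 9 10
```

The difference between the two functions is exactly the one post that's already on screen.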

Second, note that for multi-part posts, generally of large binary 
attachments, pan transparently combines the parts into one for display.  
So a message split into 100 individual parts, which must be combined to 
get the attachment, will display as a single post.[1]  Before pan 
actually fetches the group overview file, all it sees is that the highest 
post sequence number for that group is N higher than it was the previous 
time, and it simply does the math to deduce how many posts there are 
between the two.  But once pan fetches the overview file, it can do this 
combining, which will reduce the number of unread posts, sometimes 
significantly, particularly for binary groups.

But that mostly applies to binary groups, where multi-part posts are 
common.  In text groups these don't appear so often, so this is unlikely 
to explain discrepancies there.
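To illustrate how combining shrinks the count, here's a rough sketch.  It assumes, purely for illustration, that the posting software marks its automatic split with a trailing "(n/N)" counter in the subject; real subjects vary, and this is not how pan actually detects parts.  Because only the trailing counter is stripped, a pre-split counter like "[01/10]" earlier in the subject stays part of the key, so separately posted pre-split files remain separate entries (the distinction made in footnote [1]).

```python
import re

# Assumed convention: a trailing "(n/N)" counter marks automatic splitting,
# e.g. '"disk.iso" yEnc (1/100)'.  Illustrative only.
PART_COUNTER = re.compile(r"\s*\((\d+)/(\d+)\)\s*$")


def displayed_count(subjects: list[str]) -> int:
    """Number of list entries after merging auto-split parts of one post."""
    seen: set[tuple[str, int]] = set()
    entries = 0
    for subject in subjects:
        match = PART_COUNTER.search(subject)
        if match:
            # Key on the subject with the counter removed, plus the part total.
            key = (subject[: match.start()], int(match.group(2)))
            if key in seen:
                continue  # another part of a post already counted
            seen.add(key)
        entries += 1
    return entries


raw = [f'"disk.iso" yEnc ({i}/100)' for i in range(1, 101)]
raw += ["Question about pan", "Re: Question about pan"]
print(len(raw), displayed_count(raw))  # 102 3
```

The naive sequence-number math would report 102 new articles here, while the list shows only 3 entries.  A text post that happens to end in something like "(1/2)" would be wrongly merged by this heuristic, which is one reason real clients rely on more than the subject.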


Which brings me to number three, the most common reason (once #1 is 
accounted for) for such discrepancies in low-traffic text groups, where 
they're most likely to be noticed.

As mentioned in passing above, put in simple terms the way news works is 
that when a client first asks about a newsgroup, it gets a reply that 
gives some information about the group and the articles the server has 
for that group.  This information includes the post sequence numbers of 
the earliest available post (the low water mark) and of the newest 
available post (the high water mark).  For subscribed groups, a news 
client typically remembers the high water mark from the last time it 
connected, and can then do the math to deduce the POSSIBLE number of 
unread articles available.  (Of course, if it's the first time you've 
visited the group, or if you haven't visited in a while and everything 
from before has already expired, so the low water mark is higher than the 
remembered high water mark, the available posts are only those between 
the earliest and the latest that the server has.)
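That arithmetic can be sketched as a small function.  The name and parameters are hypothetical, not any client's real API; the logic follows the description above, with one added assumption that a high water mark below the low water mark means an empty group.

```python
from typing import Optional


def possible_unread(low: int, high: int, last_seen_high: Optional[int]) -> int:
    """Upper-bound estimate of new articles, from watermarks alone.

    low, high      -- low and high water marks the server reports now
    last_seen_high -- high water mark remembered from the previous visit,
                      or None if the group has never been visited
    """
    if high < low:
        return 0  # empty group
    if last_seen_high is None or last_seen_high < low:
        # First visit, or everything seen last time has since expired:
        # only the server's current range counts.
        return high - low + 1
    return max(0, high - last_seen_high)


print(possible_unread(43000, 43562, 43552))  # 10
print(possible_unread(43000, 43562, None))   # 563
print(possible_unread(43500, 43562, 42000))  # 63
```

Note that this is only ever an upper bound on what's really there, for the reasons that follow.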

*BUT*, for various reasons, while the post sequence numbers are indeed in 
sequence, ever increasing, a server DOES NOT NECESSARILY have ALL the 
posts in a particular range available.  A user may have canceled or 
superseded a post, for instance, or the server may apply spam filters 
after assigning sequence numbers, deleting articles from the middle of 
the sequence.  It's actually quite common for larger news service 
providers to have dedicated incoming-post machines that assign the 
sequence numbers and then forward the posts to filter machines.  Those 
filter out the spam, creating holes in the sequence, and pass the rest on 
to the front-ends that users and their news clients actually connect to.  
In such a setup, it's also quite possible for posts to appear at the 
front-ends out of order, such that, say, # 43561 and 43562 appear before 
# 43553-43560.  Then # 43557 and 43559 are filtered as spam, and # 43554 
is canceled by the original poster, leaving posts numbered 43553, 
43555-43556, 43558 and 43560, appearing after # 43562, which all the 
while is still the high water mark.  And # 43554 might appear, then 
disappear when the cancel gets processed.
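Working that example through in code makes the gap concrete.  The previous high water mark of 43552 is a hypothetical assumption (the example above doesn't say what the client saw last time), and `available_after_filtering` is an illustrative helper, not part of any real server.

```python
def available_after_filtering(first: int, last: int, removed: set[int]) -> list[int]:
    """Article numbers in first..last that survive spam filtering and cancels."""
    return [n for n in range(first, last + 1) if n not in removed]


spam = {43557, 43559}
cancelled = {43554}
# Hypothetical: the client last saw high water mark 43552.
survivors = available_after_filtering(43553, 43562, spam | cancelled)
print(43562 - 43552, len(survivors))  # 10 7
print(survivors)
# [43553, 43555, 43556, 43558, 43560, 43561, 43562]
```

Ten numbers were handed out, but only seven articles actually exist on the front-end.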

When pan first checks that group, it'll see # 43562 as the high water 
mark and do the math, displaying the resulting unread count.  Then when 
it fetches the actual overview file, pan learns which messages are 
actually available.  The number of unread messages can then drop, or 
possibly go up if there are more backfills than new gaps in the message 
numbers.  Then later, when you actually try to fetch those messages, some 
listed in the overview may no longer be available.  Old ones may have 
expired, someone may have canceled a message, a scanner may have detected 
copyrighted content and issued a takedown, or further spam filtering may 
have been applied.
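Here's a sketch of that second, more accurate count.  It assumes the overview arrives as tab-separated lines whose first field is the article number, as in the NNTP OVER/XOVER format; the function name and the sample data are hypothetical.

```python
def unread_from_overview(overview_lines: list[str], read: set[int]) -> int:
    """Count unread articles that actually appear in an OVER/XOVER listing.

    Each line is tab-separated and starts with the article number.  Numbers
    the server no longer has simply do not appear, so holes in the sequence
    drop out of the count automatically.
    """
    listed = set()
    for line in overview_lines:
        number = line.split("\t", 1)[0]
        if number.isdigit():
            listed.add(int(number))
    return len(listed - read)


# Hypothetical overview data matching the example above (other fields elided).
overview = [f"{n}\tSubject {n}\tposter@example.invalid" for n in
            (43553, 43555, 43556, 43558, 43560, 43561, 43562)]
print(unread_from_overview(overview, read=set()))    # 7 (estimate was 10)
print(unread_from_overview(overview, read={43561}))  # 6
```

Even this count is only a snapshot: an article listed here can still vanish before it's fetched, in which case the server answers the fetch with an error (NNTP uses a 423 "no article with that number" reply) instead of the article.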

Thus in general the unread count can only be a rough estimate, found 
initially by simply doing the math between the new high water mark and 
either the old high water mark or the new low water mark, as 
appropriate.  (Actually, the server passes along its own estimate of the 
number of messages between its low and high water marks too, but that's 
only an estimate as well.  The standard requires that there be no MORE 
messages available than that, but it specifically allows there to be 
fewer, and many servers simply do the same math to arrive at their 
estimate, even though in theory they have enough information available 
to pass a more accurate count if they wanted to.)
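For the curious, here's a sketch of reading that server estimate.  As I read RFC 3977, a successful GROUP reply has the form "211 number low high group", where "number" is the server's estimate; the helper names below are hypothetical.  `naive_estimate` shows the "same math" a lazy server might use, which is also the largest value the estimate can legitimately take.

```python
from typing import NamedTuple


class GroupStatus(NamedTuple):
    estimate: int  # server's estimated article count
    low: int       # low water mark
    high: int      # high water mark
    name: str


def parse_group_response(line: str) -> GroupStatus:
    """Parse a '211 number low high group' reply to the NNTP GROUP command."""
    parts = line.split()
    if len(parts) < 5 or parts[0] != "211":
        raise ValueError(f"not a successful GROUP reply: {line!r}")
    estimate, low, high = (int(p) for p in parts[1:4])
    return GroupStatus(estimate, low, high, parts[4])


def naive_estimate(status: GroupStatus) -> int:
    """The full watermark span, i.e. the 'same math' many servers use."""
    return max(0, status.high - status.low + 1)


status = parse_group_response("211 1234 3000234 3002322 misc.test")
print(status.estimate, naive_estimate(status))  # 1234 2089
```

A server reporting 2089 here would be doing the simple math; one reporting 1234 is giving a tighter (but still not guaranteed exact) count.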

---
[1] The terminology here isn't fully standardized and can be confusing.  
An extremely large binary file like an ISO image is often pre-split into 
multiple files before posting, with each of these pre-split parts posted 
separately.  The individually posted parts are then often automatically 
split by the posting software, with each individual message post only 
containing an incomplete attachment that must be combined with the others 
in order to retrieve the pre-split file part.  It's this automatic 
splitting that pan detects and displays as a single post, not the 
pre-split parts.  Of course, complicating things is the fact that multi-part 
can also refer to the reverse, a single article containing multiple 
parts, say a plain-text message, the HTML form of the same message, and 
various attachments for the images linked into the HTML message, all 
appearing as part of the same single posted message.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman



