
Re: [Pan-users] Thanks and comments/offer


From: Lee Reynolds
Subject: Re: [Pan-users] Thanks and comments/offer
Date: 15 Feb 2003 20:47:45 -0700

Sorry about the HTML mail.  I thought I had it turned off, but I've
upgraded my machine, reinstalled Evolution, and missed turning it off.
I didn't even realize it was on, so thanks.

I struggled with my own duplicate-media system as well.  I ended up with
one program that scans targeted directories and stores each file's
location and checksum, and another that rescans the machine to see what
has been moved, removed, etc.  All told, I have 2 different programs that
perform 6 or 7 different functions.
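A minimal sketch of that scan/rescan pair, assuming MD5 as the checksum (the post doesn't say which one is used); `file_checksum`, `scan`, and `rescan` are hypothetical names, not anything from Pan.  A file counts as "moved" when its path vanished but the same checksum turned up at a new path.

```python
import hashlib
import os


def file_checksum(path, chunk_size=65536):
    """MD5 hex digest of a file, read in chunks (MD5 is an assumed choice)."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def scan(roots):
    """Walk each targeted directory and map every file's location to its checksum."""
    index = {}
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                index[path] = file_checksum(path)
    return index


def rescan(old_index, new_index):
    """Compare two scans and report (moved, removed, added) files."""
    gone = {p: d for p, d in old_index.items() if p not in new_index}
    appeared = {p: d for p, d in new_index.items() if p not in old_index}
    # Group newly appearing paths by checksum so a vanished file can be matched to its new home.
    appeared_by_sum = {}
    for path, digest in sorted(appeared.items()):
        appeared_by_sum.setdefault(digest, []).append(path)
    moved, removed = [], []
    for path, digest in sorted(gone.items()):
        candidates = appeared_by_sum.get(digest)
        if candidates:
            moved.append((path, candidates.pop(0)))  # same content, new location
        else:
            removed.append(path)  # no newly appearing file has this checksum
    # Whatever new paths were not claimed as a move destination are genuinely new.
    added = sorted(p for paths in appeared_by_sum.values() for p in paths)
    return moved, removed, added
```

Saving the result of `scan()` between runs and passing the old and new indexes to `rescan()` gives the moved/removed report; persisting that index is exactly the kind of stored record the proposal below tries to avoid inside Pan itself.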

As far as duplicates go, I'd propose that we start with a clean database
at each startup of Pan, with nothing stored anywhere between sessions.
There would be options to turn on duplicate checking, with a file-size
limit, perhaps per newsgroup.  That way, folks who download big binaries
won't be burdened with checksumming in newsgroups where it would
significantly reduce speed.  We could also put a reasonable limit on
checksum calculation: duplicates would only be checked on files smaller
than 1024K (a threshold each user can set to whatever they choose).
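The per-newsgroup switch plus size limit could look something like this sketch; `DupCheckSettings` and its fields are hypothetical, with checking off by default, a 1024K default threshold taken from the proposal, and an optional per-group override.

```python
from dataclasses import dataclass, field


@dataclass
class DupCheckSettings:
    """Hypothetical per-newsgroup duplicate-checking options (off unless enabled)."""
    default_limit_kb: int = 1024                          # user-selectable threshold
    enabled_groups: set = field(default_factory=set)      # groups with checking turned on
    group_limits_kb: dict = field(default_factory=dict)   # optional per-group override

    def should_check(self, group: str, size_bytes: int) -> bool:
        """Checksum only in enabled groups, and only files under that group's limit."""
        if group not in self.enabled_groups:
            return False
        limit_kb = self.group_limits_kb.get(group, self.default_limit_kb)
        return size_bytes < limit_kb * 1024
```

Because groups are opt-in, someone pulling large binaries from a group that isn't enabled pays nothing for the feature.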

In operation, we would only cache a checksum for files downloaded during
that session, or when a duplicate name is detected (subject to the size
limits).  This alone will catch a large share of the duplicates most
users run across.  I don't know enough about NNTP headers, yEnc, and
other encoding schemes to say whether there's some way to detect a
duplicate before download, so this won't save bandwidth, only the time
spent looking through the same batch of images multiple times.  Since
nothing persists between sessions, there's nothing stored that can be
reported except that session's information, which I hope I never see
anyway.
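A sketch of the session-only cache, assuming an MD5 digest plus file size as the key and a 1024K default limit; `SessionDupCache` is a hypothetical name.  Everything lives in an in-memory set, so it disappears when the program exits and nothing is written to disk.

```python
import hashlib


class SessionDupCache:
    """Checksums of files saved this session only; held in memory, never written to disk."""

    def __init__(self, limit_bytes=1024 * 1024):
        self.limit_bytes = limit_bytes
        self._seen = set()  # (size, md5 hex digest) pairs seen this session

    def is_duplicate(self, data: bytes) -> bool:
        """Return True if identical data was already saved this session; otherwise record it."""
        if len(data) >= self.limit_bytes:
            return False  # over the size limit: skip checksumming entirely
        key = (len(data), hashlib.md5(data).hexdigest())
        if key in self._seen:
            return True
        self._seen.add(key)
        return False
```

Keying on content rather than filename means a renamed repost of the same image is still caught within the session.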

Anyway, just a thought.

Lee

On Sat, 2003-02-15 at 19:41, Duncan wrote:
> On Sat 15 Feb 2003 09:41, Lee Reynolds posted as excerpted below:
> > Another nice feature of NewsBin Pro that should be easier to implement
> > was the ability to eliminate, i.e. not save, images in a download
> > directory that were exact duplicates.  How difficult would it be, or where
> > would I look, to add a few lines of code that check for duplicate filenames
> > (already done with the _copy_ tags) and compare a CRC-64 checksum of a
> > selected saved file with that of the temporary downloaded file, before it's saved?
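A sketch of the name-collision check described in the question.  Python's standard library has no CRC-64, so this swaps in CRC-32 from `zlib`; the function names are hypothetical, and it does not reproduce Pan's `_copy_` renaming, it only decides whether the incoming file matches the one already saved under that name.

```python
import os
import zlib


def crc32_of(path, chunk_size=65536):
    """CRC-32 of a file, computed incrementally (stand-in for the CRC-64 mentioned above)."""
    crc = 0
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            crc = zlib.crc32(chunk, crc)
    return crc


def is_duplicate_of_saved(tmp_path, dest_dir, filename):
    """True if dest_dir already holds `filename` with identical size and CRC-32."""
    saved = os.path.join(dest_dir, filename)
    if not os.path.exists(saved):
        return False  # no name collision, nothing to compare against
    if os.path.getsize(saved) != os.path.getsize(tmp_path):
        return False  # cheap size check before reading either file
    return crc32_of(saved) == crc32_of(tmp_path)
```

If this returns True the temporary file can simply be discarded; otherwise the existing duplicate-name handling applies as before.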
> 
> This has been talked about a bit.  The db switch could be used here as well,
> I suppose, since one way to do it would be to track the MD5 (or whatever)
> checksum and size, and probably have a three-way always/ask/never option on
> what to do if there was a match.  (The "never" setting might turn off
> tracking, or that might be a separate option, for the paranoid and for those
> who d/l such volumes that the checksum database itself becomes a performance
> issue.)
> 
> One reason for that method, of course, is that it would be name-neutral, so
> renamed pix wouldn't be saved either.  Hence the tracking at save time.  The
> other alternative is to check only when there's a name collision, which
> would certainly be better than what we have at present, but would retain far
> more duplicates as stuff is cycled out of the d/l target location for
> archiving, or across different groups.
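A sketch of the name-neutral tracking database with the three-way option, using `sqlite3` as an assumed storage choice and (size, MD5) as the key.  `SaveTracker` is hypothetical, and the policy meanings are assumptions: "always" skips a match, "ask" defers to a callback, and "never" turns tracking off entirely (one of the two readings suggested above).

```python
import hashlib
import sqlite3


class SaveTracker:
    """Hypothetical name-neutral tracking DB: remembers (size, MD5) of every saved file."""

    def __init__(self, db_path=":memory:", policy="always"):
        # On a match: "always" skip the save, "ask" the user, "never" = tracking disabled.
        if policy not in ("always", "ask", "never"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS saved (size INTEGER, md5 TEXT, PRIMARY KEY (size, md5))"
        )

    def should_save(self, data: bytes, ask=lambda: True) -> bool:
        """Decide whether to save `data`; `ask` returns True to save a duplicate anyway."""
        if self.policy == "never":
            return True  # tracking off: nothing is looked up or stored
        key = (len(data), hashlib.md5(data).hexdigest())
        seen = self.db.execute(
            "SELECT 1 FROM saved WHERE size = ? AND md5 = ?", key
        ).fetchone() is not None
        if seen and (self.policy == "always" or not ask()):
            return False
        self.db.execute("INSERT OR IGNORE INTO saved VALUES (?, ?)", key)
        self.db.commit()
        return True
```

No filename is involved anywhere, which is what makes renamed reposts match; the price is that this table is exactly the persistent record the privacy concern below is about.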
> 
> The flip side is the potential IDing of saved material now off the machine,
> based on the tracking d/b.  This has already come up with regard to yEnc and
> the way some readers handle it with similar anti-duplicate tracking.  The
> paranoid see that db and wonder just what all goes into it, and whether
> there's an ulterior motive and reporting of the data to someone at some
> point.  Formerly, I would have figured only certain folks in China, etc.,
> would have to worry about that, but with Ashcroft and co.'s anti-privacy,
> anti-freedom activities of late, one does have to wonder how long it will be
> until those types of things are commonly used here as well.
> 
> BTW, PLEASE turn off your HTML!!  Do you realize how HTML has contributed to
> security problems in mail and news?  How many vulns would OE and Outlook have
> if they didn't render HTML?  How does ZERO strike you?  HTML mail is a
> favorite of the spammers, both because it allows tracking (if your reader
> allows scripting or off-site fetching of images, for instance), and because
> it allows them to do the equivalent of placing flashing lights around their
> mail, in an effort to get attention.  If it can't get my attention in plain
> text, I have better things to do with my time.  I filter HTML mail to my
> trash folder (you'd be amazed at how effective just that single filter is
> against spam), and that's where yours went.  Your post only got read because
> I scan the subject lines before closing my mail client, which auto-deletes
> the trash on exit.
Thanks,
Lee Reynolds