man-db-devel

Re: [Man-db-devel] Database update profiling


From: Kari Pahula
Subject: Re: [Man-db-devel] Database update profiling
Date: Sat, 7 Dec 2013 23:34:24 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Sat, Dec 07, 2013 at 01:00:11PM +0000, Colin Watson wrote:
> On Fri, Dec 06, 2013 at 04:46:43PM -0500, Francis Giraldeau wrote:
> > Le 2013-12-06 01:48, Kari Pahula a écrit :
> > > None of that code has yet made its way to mandb.
> > 
> > It's a good start, let's try make it ready.
> 
> For what it's worth, I'm actually slightly less interested in the patch
> cleanup.  What I'm more interested in, and the reason I hadn't just gone
> ahead and dealt with Kari's patch directly (sorry for not explaining
> this!) is a more detailed analysis of Kari's comment in the bug report:
> "there's something off with the code and it gives false positives on
> differing mtimes".  What exactly is going on here?

test_manfile gets confused with symbolic links, the files they link to
and their mtimes.  It uses lstat to get a man file's mtime and
compares it to the linked file's stored mtime and proceeds only if
they differ.  It can go through the same db entry multiple times due
to symlinks, but the symlinks and the manfile itself can have
differing mtimes.  The different mtimes get stored to the db on those
passes and whenever it changes, it'll cause the next pass to again run
all the code in test_manfile.

To take one example, I had this on my system:

$ stat -c "%Y %N" man1/sha1.1ssl.gz 
1383329434 ‘man1/sha1.1ssl.gz’ -> ‘dgst.1ssl.gz’
$ stat -c "%Y %N" man1/dgst.1ssl.gz 
1383329399 ‘man1/dgst.1ssl.gz’

test_manfile gets run on both sha1.1ssl.gz and dgst.1ssl.gz, and the
database is updated twice, once with each value.
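
To make the flip-flop concrete, here is a toy model in Python. It is
not man-db's code: count_rewrites and demo are hypothetical names, and
the dict stands in for the database. It only assumes that the symlink
and its target share one db entry and that the mtime comes from lstat,
as described above.

```python
import os
import tempfile


def count_rewrites(paths, db, passes):
    """Toy model: rewrite the db entry whenever lstat's mtime differs from it."""
    rewrites = 0
    for _ in range(passes):
        for path in paths:
            key = os.path.realpath(path)        # symlink and target share one entry
            mtime = os.lstat(path).st_mtime_ns  # lstat does not follow the symlink
            if db.get(key) != mtime:
                db[key] = mtime                 # stored value flips on every visit
                rewrites += 1
    return rewrites


def demo():
    with tempfile.TemporaryDirectory() as d:
        target = os.path.join(d, "dgst.1ssl.gz")
        link = os.path.join(d, "sha1.1ssl.gz")
        with open(target, "wb"):
            pass
        os.symlink("dgst.1ssl.gz", link)
        # Reuse the two mtimes from the stat output above.
        t_target = 1383329399 * 10**9
        t_link = 1383329434 * 10**9
        os.utime(target, ns=(t_target, t_target))
        os.utime(link, ns=(t_link, t_link), follow_symlinks=False)
        return count_rewrites([target, link], {}, passes=3)
```

In this model the stored mtime never settles: every pass rewrites the
entry once per path, so three passes over the two paths give six
rewrites.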

I didn't come up with a fix for this issue.  All my attempts ran into
failures in the test suite.  Perhaps you or Francis will have better
luck with it; I'm not going to attempt that one again myself.  I agree
that figuring out a fix for the mtime issues would be a good thing, and
what I did amounted to sidestepping the issue, but I ran out of ideas
for confronting it directly.

> I would really be more comfortable continuing to use mtimes if possible;
> it is the more appropriate stat field to use, as it describes changes to
> the file's contents rather than its metadata.  Using ctimes seems to me
> to be a mistake.

I wasn't suggesting dropping that, but I think that checking ctimes on
a mandb -p run would be a good additional check, prior to even
starting to consider mtimes.  I don't see any scenario where anything
with an older ctime could add anything to that run, and you can't
touch the mtime without bumping the ctime.
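
A quick way to check that last claim, sketched in Python rather than
taken from my patch (ctime_is_newer and demo are made-up names, and
last_update_ns stands for whatever timestamp the database records for
its last update):

```python
import os
import tempfile


def ctime_is_newer(path, last_update_ns):
    """Return True if the inode changed after the last database update."""
    return os.lstat(path).st_ctime_ns > last_update_ns


def demo():
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "page.1")
        with open(path, "wb"):
            pass
        old = 946684800 * 10**9        # 2000-01-01 UTC, in nanoseconds
        os.utime(path, ns=(old, old))  # backdate atime and mtime
        st = os.stat(path)
        return st.st_mtime_ns, st.st_ctime_ns
```

On a POSIX filesystem, demo() returns the backdated mtime together with
a ctime from the moment utime was called, so a ctime pre-check would
still notice a file whose mtime had been set into the past.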

I was concerned about false negatives, which is why I added the
subsecond checking.  I don't know if that's really necessary.  It
might be safer to just offset the time by one second and compare
against that, to catch anything odd.  These are just thoughts off the
top of my head.
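
The one-second margin could look something like this sketch (Python,
hypothetical names, not what the patch does): treat a file as possibly
changed unless its ctime is more than slack_ns older than the last
update.

```python
NS_PER_SEC = 1_000_000_000


def may_have_changed(ctime_ns, last_update_ns, slack_ns=NS_PER_SEC):
    """Conservative check: only skip files clearly older than the last update."""
    return ctime_ns + slack_ns >= last_update_ns
```

With the default margin, a ctime half a second before the last update
still counts as possibly changed, while one two seconds before it does
not.  Passing slack_ns=0 gives the strict nanosecond comparison.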

This is as far as I got with it.  I hope some of this helps.


