bug-coreutils
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Bug#548439: Inotify support breaks tail -F on syslog


From: Jim Meyering
Subject: Re: Bug#548439: Inotify support breaks tail -F on syslog
Date: Sun, 15 Nov 2009 14:04:51 +0100

Arjan Opmeer wrote:
> Since coreutils 7.5 tail uses inotify. However it is not capable to follow
> the sequence of file operations that happen when /var/log/syslog is rotated
> by its daily cron job. This means that 'tail -F /var/log/syslog' will stop
> producing output as soon as the file is rotated.
>
> Let me give a demonstration. Open two terminal windows and enter the
> following commands:
>
> Terminal 1                              Terminal 2
> ----------                              ----------
>
> echo "line 1" > somefile
> tail -F somefile
>
>                                         echo "line 2" >> somefile
>                                         savelog -m 640 -c 7 somefile
>                                         echo "line 3" >> somefile
>                                         echo "line 4" >> somefile
>
> This results in the following output from the tail command:
>
> line 1
> line 2
> tail: `somefile' has been replaced;  following end of new file
>
> Notice that tail fails to follow the newly created somefile and never shows
> lines 3 and 4. That this is caused by the new inotify code can be
> demonstrated by using the latest git checkout of coreutils. This version
> still fails the above scenario, but by supplying the undocumented
> '---disable-inotify' flag the inotify code is disabled and the old polling
> behaviour is used. This old behaviour can cope with the logrotate sequence
> and will correctly show the contents of the new file.
>
> Could you use this information to file a bug report for GNU coreutils to
> have the inotify implementation of tail fixed? In the meantime should the
> Debian version of tail maybe be build without inotify support because it no
> longer functions the way that users have come to expect?

Thanks a lot for that bug report.
At first I thought it must have been fixed, because a few
attempts to reproduce it showed no problem.
However, with repeated tests, I'd see a failure in ~1 of 7 trials
on one system, and about 1 of 4 on another.  Those were multi-core
systems.  I found it much easier to trigger the failure on a slower,
single-core system.


touch k x && timeout 1 strace -o log-$(date +%j.%T) ./tail -F k & \
echo b >> k; sleep .1; env mv x k; sleep .1; echo ok >> k

Here's output from a hacked tail.c that decodes the events
and demonstrates the problem.  The first one worked, and the
following one, run just a couple of seconds later failed:

=============================  ok  ============================
read(4, "\2\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0\2\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0", 36) 
= 32
write(2, "IN_ATTRIB 0\n", 12)           = 12
write(2, "name: ***\n", 10)             = 10
write(2, "IN_DELETE_SELF 0\n", 17)      = 17
write(2, "name: ***\n", 10)             = 10
read(4, "\2\0\0\0\0\200\0\0\0\0\0\0\0\0\0\0", 36) = 16
write(2, "IN_IGNORED 0\n", 13)          = 13
write(2, "name: ***\n", 10)             = 10
read(4, "\3\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0", 36) = 16
write(2, "IN_MODIFY 0\n", 12)           = 12
write(2, "name: k\n", 8)                = 8
fstat(5, {st_mode=S_IFREG|0644, st_size=3, ...}) = 0
read(5, "ok\n", 8192)                   = 3
write(1, "ok\n", 3)                     = 3
read(5, "", 8192)                       = 0
read(4, 0x12d21a0, 36)                  = -1 EINTR (Interrupted system call)

=============================  bad  ============================
read(4, "\2\0\0\0\4\0\0\0\0\0\0\0\0\0\0\0", 36) = 16
write(2, "IN_ATTRIB 0\n", 12)           = 12
write(2, "name: ***\n", 10)             = 10
read(4, "\3\0\0\0\0\10\0\0\0\0\0\0\0\0\0\0", 36) = 16
write(2, "IN_MOVE_SELF 0\n", 15)        = 15
write(2, "name: k\n", 8)                = 8
inotify_rm_watch(4, 3)                  = 0
open("k", O_RDONLY|O_NONBLOCK)          = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
close(3)                                = 0
read(4, "\3\0\0\0\0\200\0\0\0\0\0\0\0\0\0\0", 36) = 16
write(2, "IN_IGNORED 0\n", 13)          = 13
write(2, "name: ***\n", 10)             = 10
read(4, "\2\0\0\0\0\4\0\0\0\0\0\0\0\0\0\0", 36) = 16
write(2, "IN_DELETE_SELF 0\n", 17)      = 17
write(2, "name: ***\n", 10)             = 10
read(4, "\2\0\0\0\0\200\0\0\0\0\0\0\0\0\0\0", 36) = 16
write(2, "IN_IGNORED 0\n", 13)          = 13
write(2, "name: ***\n", 10)             = 10
read(4, 0x13f91a0, 36)                  = -1 EINTR (Interrupted system call)

Here's a patch that fixes it.
I'll write a test before pushing the result.
So far, I've used this:

#!/bin/sh
for i in $(seq 1000); do
  d=d-$i
  mkdir $d
  ( cd $d
    :>k && :>x && timeout 1 ../tail -F k > out 2>/dev/null &
    echo b >> k; sleep .01; env mv x k; sleep .01; echo ok >> k
    sleep .01; grep ok out > /dev/null && ok=. || ok=X; printf $ok
  )
  rm -rf $d
done

Interestingly, on one system (modern, quad-core), it printed this for the
unpatched version of tail (replacing ../tail above): "X" represents failure
........................X.....................................................\
........X...............................XX....................................\
....................................XX......................X.................\
...................X...................XX.............X.............X.......X.\
.................................X..............................X.....X.......\
............................X.......X...X................................X....\
..............................................................................\
..........................................X...................................\
..X...................................X.......................................\
.X..................................................X.....................X.X.\
.......X..............................X.......................................\
......................X....X...........................X......................\
.................X....X.XX.......X......X.....................X.


Running the same script on an aging uniprocessor system gave this:

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\
X.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\
XXXXXXXXXXXXXXXXXXXXXXXX......X.....X....XXX.X........................XXXXXXXX\
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\
XXXXX.....XXX.XXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\
X.XXXXXXXXXXXXXXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\
X.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.X\
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.....\
XXXXXXXXXXXXXXXXXX.XXXXXXXXXXX.XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX\
XX.XXXXXXXXXXXXXXXXXXXX..XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX.X.....X\
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX..X....X.XXXXXXXXXXXXXXXXXXXXXXXXXX\
XXXXXXXXXXXXXXXX.......XXXXXXXXXXXXXXXXXXXX.XX..XXXXXXXXXX.XXXXX


>From 203d51c2425f87811bd388c654466b7d24d1f1a1 Mon Sep 17 00:00:00 2001
From: Jim Meyering <address@hidden>
Date: Sun, 15 Nov 2009 09:25:29 +0100
Subject: [PATCH] tail -F: fix a bug in the inotify-based code

Tailing forever and by-name (--follow=name, -F), tail would
sometimes fail to follow a file that had been removed via rename.
* src/tail.c (tail_forever_inotify): When tailing by name (-F),
do not un-watch a file upon receipt of the IN_MOVE_SELF event.
Reported by Arjan Opmeer in http://bugs.debian.org/548439.
* NEWS (Bug fixes): Mention it.
---
 NEWS       |    5 +++++
 THANKS     |    3 ++-
 src/tail.c |    8 +++++++-
 3 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index c9fe6ca..e3763cd 100644
--- a/NEWS
+++ b/NEWS
@@ -48,6 +48,11 @@ GNU coreutils NEWS                                    -*- 
outline -*-
   were first renamed or unlinked or never modified.
   [The race was introduced in coreutils-7.5]

+  tail -F (inotify-enabled) now consistently tails a file that has been
+  replaced via renaming.  That operation provokes either of two sequences
+  of inotify events.  The less common sequence is now handled as well.
+  [The bug came with the implementation change in coreutils-7.5]
+
   timeout now doesn't exit unless the command it is monitoring does,
   for any specified signal. [bug introduced in coreutils-7.0].

diff --git a/THANKS b/THANKS
index 7955da7..f9bc476 100644
--- a/THANKS
+++ b/THANKS
@@ -52,6 +52,7 @@ Andy Longton                        address@hidden
 Anthony Thyssen                     address@hidden
 Antonio Rendas                      address@hidden
 Ariel Faigon                        address@hidden
+Arjan Opmeer                        address@hidden
 Arne H. Juul                        address@hidden
 Arne Henrik Juul                    address@hidden
 Arnold Robbins                      address@hidden
@@ -401,7 +402,7 @@ Max Chang                           address@hidden
 Meelis Roos                         address@hidden
 Michael                             address@hidden
 Michael ???                         address@hidden
-Michael Bacarella                   address@hidden>
+Michael Bacarella                   address@hidden
 Michael Deutschmann                 address@hidden
 Michael Elizabeth Chastain          address@hidden
 Michael Gaughen                     address@hidden
diff --git a/src/tail.c b/src/tail.c
index 09afeec..9a2f5ae 100644
--- a/src/tail.c
+++ b/src/tail.c
@@ -1404,7 +1404,13 @@ tail_forever_inotify (int wd, struct File_spec *f, 
size_t n_files,

       if (ev->mask & (IN_ATTRIB | IN_DELETE_SELF | IN_MOVE_SELF))
         {
-          if (ev->mask & (IN_DELETE_SELF | IN_MOVE_SELF))
+          /* For IN_DELETE_SELF, we always want to remove the watch.
+             However, for IN_MOVE_SELF (the file we're watching has
+             been clobbered via a rename), when tailing by NAME, we
+             must continue to watch the file.  It's only when following
+             by file descriptor that we must remove the watch.  */
+          if ((ev->mask & IN_DELETE_SELF)
+              || ((ev->mask & IN_MOVE_SELF) && follow_mode == 
Follow_descriptor))
             {
               inotify_rm_watch (wd, f[i].wd);
               hash_delete (wd_table, &(f[i]));
--
1.6.5.2.372.gc0502




reply via email to

[Prev in Thread] Current Thread [Next in Thread]