bug-coreutils

bug#17145: head fails with implicit stdin on darwin


From: Pádraig Brady
Subject: bug#17145: head fails with implicit stdin on darwin
Date: Mon, 31 Mar 2014 13:32:50 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130110 Thunderbird/17.0.2

On 03/30/2014 09:40 PM, Denis Excoffier wrote:
> Hello,
> 
> head -n -1 -- -
> or equivalently
> head -n -1
> returns immediately (ie does not wait for further stdin) and prints nothing.
> 
> I use coreutils 8.22 compiled (with gcc-4.8.2) on top of darwin 13.1.0 
> (Mavericks).
> 
> However the following seem to work perfectly:
> head -n 1
> head -c -1
> cat | head -n -1
> head -n -1 ---presume-input-pipe
> on cygwin: head -n -1
> 
> What is weird on my system is lseek() at the beginning of 
> elide_tail_lines_file():
> lseek(fd, 0, SEEK_CUR) returns a (random?) number, something like 6735, 539 
> etc.
> lseek(fd, 0, SEEK_END) returns 0

So:
  head -n -1 # returns immediately
while:
  cat | head -n -1 # waits as expected

It seems we might be using non-portable code here. POSIX says:

  "The behavior of lseek() on devices which are incapable of seeking is
   implementation-defined. The value of the file offset associated with
   such a device is undefined."

and also:

  "The lseek() function shall fail [with ESPIPE if] the fildes argument is
   associated with a pipe, FIFO, or socket"

So tty devices fall outside the scope of that POSIX requirement.
Furthermore, the FreeBSD lseek man page states:

  "Some devices are incapable of seeking and POSIX does not specify which
   devices must support it.

   Linux specific restrictions: using lseek on a tty device returns
   ESPIPE. Other systems return the number of written characters, using
   SEEK_SET to set the counter. Some devices, e.g. /dev/null do not cause
   the error ESPIPE, but return a pointer which value is undefined."
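
To see what a given system actually reports, something like the following
could be used. This is only a sketch: probe_offsets() is a hypothetical
helper, not anything in coreutils. It records the SEEK_CUR and SEEK_END
results for a descriptor and restores the original offset afterwards.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not from coreutils): record what lseek() reports
   for FD.  Return 0 on success, otherwise the errno of the failed call.  */
int
probe_offsets (int fd, off_t *cur, off_t *end)
{
  *cur = lseek (fd, 0, SEEK_CUR);
  if (*cur < 0)
    return errno;

  *end = lseek (fd, 0, SEEK_END);
  if (*end < 0)
    return errno;

  /* SEEK_END moved the offset; put it back where it was.  */
  if (lseek (fd, *cur, SEEK_SET) < 0)
    return errno;

  return 0;
}
```

Called on the read end of a pipe this should return ESPIPE, as POSIX
requires. Called on a tty, the result is whatever the platform decides,
which is the problem reported above.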

Now head(1) isn't the only place we use this logic. In dd we have:

  offset = lseek (STDIN_FILENO, 0, SEEK_CUR);
  input_seekable = (0 <= offset);

I wonder whether we should be using something like:

  bool seekable (int fd)
  {
    return ! isatty (fd) && lseek (fd, 0, SEEK_CUR) >= 0;
  }

Though this only handles the tty case, and there
could be other devices for which this could be an issue.
So the general question is, is there a way we can robustly
determine if we have a seekable device or not?
Perhaps by using SEEK_SET in combination with SEEK_CUR,
but notice the BSD lseek man page above says that tty devices
support SEEK_SET also :/ Anyway...

Note the original head(1) code to detect seekable input was introduced with:
  http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=61ba51a6
and that was changed recently, due to a Coverity-identified logic issue, to:
  http://git.sv.gnu.org/gitweb/?p=coreutils.git;a=commit;h=5fdb5082
However, that now logically consistent code will return immediately in your case.

I also notice the related `head -c -1` check is more conservative
in that it only uses the more efficient lseek() code for regular files,
which would mean we don't operate as efficiently as we could on a disk device
for example. But that's much better than undefined operation of course.
If we were to do the same for lines then we would also introduce a change
in behavior with devices like /dev/zero. Currently on Linux, this will
return immediately:
  head -n -1 /dev/zero
I.e., we currently treat such devices as empty and return immediately with
success status, whereas treating them as a stream of NULs would result in
memory exhaustion while buffering and waiting for a complete line.
That is probably the more consistent operation at least.

So the attached uses this more conservative test for the --lines=-N case.

thanks,
Pádraig.

Attachment: head-tty-bsd.patch
Description: Text Data

