coreutils

Re: stdbuf feature request - line buffering but for null-terminated data


From: Kaz Kylheku
Subject: Re: stdbuf feature request - line buffering but for null-terminated data
Date: Tue, 12 Mar 2024 09:41:42 -0700
User-agent: Roundcube Webmail/1.4.15

On 2024-03-09 08:30, Zachary Santer wrote:
> 'stdbuf --output=L' will line-buffer the command's output stream.
> Pretty useful, but that's looking for newlines. Filenames should be
> passed between utilities in a null-terminated fashion, because the
> null byte is the only byte that can't appear within one.
>
> If I want to buffer output data on null bytes, the closest I can get
> is 'stdbuf --output=0', which doesn't buffer at all. This is pretty
> inefficient.
> 
> 0 means unbuffered, and Z is already taken for, I guess, zebibytes.
> --output=N, then?
> 
> Would this require a change to libc implementations, or is it possible now?

Yes, it would require a libc change, because stdbuf only selects among
the existing stdio stream buffering modes.

There is no null byte flush mode in standard C, nor as a GNU extension.

The null byte flush mode idea is interesting, separately from
whether it is controlled by stdbuf.

I would say that if it is implemented, the programs which require
it should all make provisions to set it up themselves.
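
As a rough illustration of a program making that provision itself,
here is a minimal C sketch. It is not code from any existing utility,
and emit_record0 is a hypothetical name. It writes one null-terminated
record and flushes right away, which approximates a "flush on null
byte" mode using only standard stdio calls:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of any library): write one name
   followed by its terminating null byte, then flush the stream so
   the record reaches the reader without waiting for a full buffer.
   Returns 0 on success, -1 on a write or flush error. */
static int emit_record0(FILE *out, const char *name)
{
    assert(out != NULL && name != NULL);

    size_t len = strlen(name) + 1;   /* +1 includes the '\0' */
    if (fwrite(name, 1, len, out) != len)
        return -1;

    return fflush(out) == 0 ? 0 : -1;
}
```

A producer would call emit_record0(stdout, path) once per path found.
The stream can stay fully buffered; the explicit fflush marks the
record boundary.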

stdbuf is a hack/workaround for programs that ignore the
issue of buffering. Specifically, programs which send information
to one of the three standard streams, such that the information
is required in a timely way.  Of those streams, stdin and stdout
become fully buffered when not connected to a terminal; stderr
is never fully buffered.

Programs can have the issue for other streams, like log files
that they explicitly open. stdbuf won't fix that.
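
For such a stream, the program has to choose the buffering mode itself.
A minimal sketch, assuming a log file opened with fopen
(open_log_line_buffered is a hypothetical name, and whether line
buffering is the right mode depends on the program):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: open a log file for appending and request
   line buffering, so each complete line is handed to the OS as soon
   as its newline is written. Returns NULL on failure. */
static FILE *open_log_line_buffered(const char *path)
{
    assert(path != NULL);

    FILE *log = fopen(path, "a");
    if (log == NULL)
        return NULL;

    /* setvbuf must be called before any other operation on the
       stream. Passing NULL lets stdio allocate the buffer. */
    if (setvbuf(log, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(log);
        return NULL;
    }
    return log;
}
```

Passing _IONBF instead of _IOLBF would make the stream unbuffered, at
the cost of one write per output call.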

The main reasons for wanting messages sent without delay are
so that information is available in real time, so that a user
sees an important prompt on the terminal before being
asked for input, or so that a log message is flushed before
a crash occurs. Or so that log messages from multiple sources
are "chronologically clustered" with a granularity fine enough
that they can be correlated.

There can be a performance issue also, though! Suppose
we run "find" to find certain files over a large file tree,
with its output sent to a pipe. It finds only a small number
of files: all the file paths identified fit into a single
buffer, which is not flushed until the program terminates.

We pipe this to some program which does some processing
on those files. We would like the processing to start as
soon as the first file has been identified, not when find is done!
It could be that find discovers all the relevant files
early in its execution and then spends a minute finding
nothing else. That minute is added to the processing time
of the files that were found.

That is the compelling reason for wanting file names to
be flushed individually, whether they are newline terminated
or null terminated.



