bug-coreutils

Re: feature request: -0 option for tr


From: Bob Proulx
Subject: Re: feature request: -0 option for tr
Date: Sun, 28 Jun 2009 12:10:26 -0600
User-agent: Mutt/1.5.18 (2008-05-17)

Craig Sanders wrote:
> Eric Blake wrote:
> > According to Craig Sanders:
> > > please add a -0 option to tr, which is equivalent to
> > > running:
> > > 
> > >     tr '\n' '\000'
> > 
> > Why should we burn an option letter, when it is not that much more typing
> > to get what you wanted anyways?  
> 
> because it's a PITA to have to remember and type "tr '\n' '\000'" every
> single time when 'tr -0' would be easier,

If this is something that you do often then it makes sense to create
a shell script for your own use.  But if it isn't used by enough other
people then it doesn't have the activation energy to become a part of
the shared universe of tools.
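
A minimal sketch of such a script, assuming the name print0 (borrowed
from the quoted message further down; the body is an illustration, not
anything shipped with coreutils).  Note that xargs -0 in the usage line
is itself an extension found in GNU and BSD xargs.

```shell
#!/bin/sh
# print0: hypothetical helper that turns every newline on stdin into a
# NUL byte, i.e. a named wrapper around tr '\n' '\000'.
print0() {
    tr '\n' '\000'
}

# Usage: feed a newline-separated list to xargs -0.
# Each input line arrives as exactly one argument, spaces included.
printf '%s\n' 'a file' 'another file' | print0 | xargs -0 printf '[%s]\n'
```

Kept as a function in a shell startup file or as a one-line script on
PATH, this gives the short spelling without a new option letter.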

> and mnemonic. transforming newline separated input into null
> separated input is such an extremely common use-case that a
> convenience feature to support it is worthwhile.

Hmm...  I disagree that it is very common.  Anecdotally I can't recall
ever having needed that particular feature in many years of hard core
script writing.

> and it's not like option letters are in short supply for tr. tr
> currently only uses four (plus the usual --version, --help etc).

The issue there is that interfaces are forever.  Caution at adopting
single letter options is a good thing.

> > An option letter makes the most sense only when there is no other easy
> > way to do the same task.
> 
> GNU grep has -z and -Z options - by your reasoning above, these
> convenience options are completely unnecessary because you can easily
> run "tr '\0' '\n'" before grep and "tr '\n' '\0'" after grep.
> the same can be said of GNU sort's -z option.

That is incorrect.  They are not the same.  In the case of grep and
sort the options preserve the data exactly.  Your suggestion of
converting the data before or after does not preserve the data exactly
and has failure modes such as the handling of embedded newlines.  If
you don't care about embedded newlines that is fine but the options
you noted above do care about that case.
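
A small sketch of the difference, assuming GNU sort (for -z) and a
hypothetical records function that emits two NUL terminated records,
the first of which contains an embedded newline:

```shell
# Two NUL-terminated records; the first holds an embedded newline.
records() {
    printf 'b\nx\0a\0'
}

# sort -z treats NUL as the record terminator, so the record
# boundaries survive: counting NUL bytes still gives two records.
records | sort -z | tr -cd '\000' | wc -c

# Converting to newlines first makes the embedded newline look like a
# record boundary: sort now sees three lines instead of two records.
records | tr '\0' '\n' | sort | wc -l
```

The first pipeline counts two records and the second counts three
lines, which is the failure mode described above.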

> and tr itself is pointless because you can do everything it does
> with sed.  and grep can be seen as just a convenience subset of
> either sed or awk for wusses who don't want to type long command
> lines.

Agreed.  But I think sed duplicated the functionality of tr and not
the other way around and so already the interface for tr was set.
And often grep is used inappropriately.  Unfortunate but true.  You
make many good points about mistakes of the past that we should learn
from and avoid making today for the future.

> and both of them are mere conveniences for people who don't
> want to write their own C program...which is again a convenience for
> people who don't want to write assembler.

Sorry but you are drifting off topic.  You have here for the first
time brought up C and assembler programming.  But that point has not
been previously introduced into the discussion.  At this point that is
just a FUD statement.  Please don't do that.  It weakens your claims.

> > > this is a useful command for converting \n-terminated input lines to
> > > null-terminated strings suitable for feeding into 'xargs -0' as many
> > > programs can not generate null-terminated output by themselves.
> > 
> > This proposal doesn't really buy you anything.  
> 
> actually, it does. it buys you a standard, consistent, and documented
> way of transforming newline-separated input into null-separated
> output for feeding into xargs -0 (or any other program which can use
> null-separated input).

Sorry but it does not.  It would be a GNU extension and would *NOT* be
standard.  (I am sure that it would be well documented though. :-) You
would not be able to use it in portable scripts with any expectation
of success.  There are several ways to do this in a standard way and
using tr as previously noted is one way.

Conversion of newline terminated data to zero terminated data is
problematic because the data in the newline terminated form is
already ambiguous.  Switching to the stricter zero terminated
encoding, which can handle arbitrary data, doesn't really help in that
case.  Things are still no better than if newline terminated data were
used throughout, and the tools already handle newline terminated data
acceptably.  So the data should either be in zero terminated format
from start to finish without conversion, or you might as well just use
newline terminated data throughout.

> it also buys you that feature on every system that has GNU coreutils
> installed without having to write a trivial print0 shell-script on each
> system.

True.  But I don't think the trade-off is worth it.  You have not
convinced me.

I also fear that if such a feature/program existed it would confuse
people into thinking that in general they can convert from newline
terminated data into zero terminated data and that this would be
correct in the general case.  Obviously it doesn't improve upon the
newline terminated data case.

> > Either the output is already nul-terminated (in which case you don't
> > need an extra tr process in the mix), or it is newline terminated (in
> > which case you should just use plain xargs instead of 'xargs -0',

Agreed.

> or it is newline-separated because one or more of the commands in the
> pipeline you've constructed don't handle null-terminated input but the

Then at that point you are introducing a bug.  Data with included
newlines will not be delimited properly and will be split.  Please
consider other options to avoid introducing that type of problem.
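
To illustrate the split, here is a sketch that counts the same files
two ways.  The function names count_nul and count_nl are made up for
this example, and find -print0 is an extension (GNU and BSD find), not
something every find provides.

```shell
# count_nul: count files using NUL delimiters from start to finish.
count_nul() {
    find "$1" -type f -print0 | tr -cd '\000' | wc -c
}

# count_nl: count files using newline delimiters.
count_nl() {
    find "$1" -type f | wc -l
}

# Demo in a scratch directory with three files, one of which has a
# newline embedded in its name.
dir=$(mktemp -d)
touch "$dir/plain" "$dir/with space" "$dir/with
newline"
count_nul "$dir"    # three files
count_nl "$dir"     # four lines: the embedded newline split one name
rm -rf "$dir"
```

Running tr '\n' '\000' on the newline delimited list would carry the
extra, bogus item straight through to xargs -0.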

> final result may have spaces or other problematic chars in the filenames
> so needs to be fed into xargs with -0.

Consider using the GNU xargs -d "\n" option.  However it isn't standard.
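
A short sketch of that option, assuming GNU xargs; show_items is a
made-up name for the example.  With -d the input is split only on the
given delimiter, and spaces, quotes and backslashes are taken
literally.

```shell
# show_items: print each newline-delimited input item in brackets.
# Requires GNU xargs for the -d option.
show_items() {
    xargs -d '\n' printf '[%s]\n'
}

# Spaces and an apostrophe pass through untouched, one item per line.
printf '%s\n' 'a file' "it's here" | show_items
```

This handles the spaces-and-quotes case without any conversion to
zero terminated data, though embedded newlines still split.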

> there are dozens (hundreds!) of tools which can be used to
> create/transform lists of filenames etc(*) which can not handle
> null-terminated input or produce null-terminated output. using tr as the
> final stage of the pipeline before feeding it into xargs -0 makes it
> possible to use all of them without hassle or complication.

But as Eric pointed out doing that translation does not gain
anything.  If you don't care about using zero terminated strings to
preserve data exactly then you don't need to use them at all.  Why
introduce a layer that is effectively a "noop"?  As Nike didn't say,
"Just don't do it."

> > and you already have to worry about the potential for filenames with
> > embedded newlines).
> 
> filenames with embedded newlines are an extremely rare pathological case
> beyond the scope of this request.

If so then you don't need to introduce a layer converting to and from
zero terminated strings either.

> spaces and quote characters and even backslashes in filenames are far
> more common, especially on systems where files are uploaded by users from
> non-unix systems (e.g. ftp upload to web servers, samba file servers,
> etc)

Yes.  And cursed they are but so it is. :-)

> yes, i've already written variations of that script. in fact, i've
> written it hundreds of times over the years because i've needed it on
> hundreds of different systems.
> 
> i'm just sick and tired of having to write it again and again (or scp it
> from somewhere else) when it's something that should be standard.  so
> i submitted a feature request.

Methinks that perhaps you have writer's block and need something to
kick yourself out of it and to think about the problem differently.
You have latched onto a style of coding that I don't see others using.
It probably isn't the best way of doing things then.  (I have been
there many times.)

As I read it you are converting newline terminated data to zero
terminated data and then using the commands that work with zero
terminated data.  That works as far as it goes.  But it doesn't make
sense to me.  Since you want to work with newline terminated data you
could just use the tools as they are presently to work with newline
terminated data.  Don't try to force them into a zero terminated model
as a facade over the underlying newline terminated model.

> it's also irksome that a shell interpreter has to be involved.

Perhaps if you feel that way you should consider writing in native
Perl, Python or Ruby then never shelling out to external commands?
The shell is the glue that binds the Unix universe together.
Resisting it is like resisting gravity.

Bob



