bug-parallel

GNU Parallel Bug Reports why is parallel invoking a shell **by default**


From: Stephane Chazelas
Subject: GNU Parallel Bug Reports why is parallel invoking a shell **by default** and associated bugs
Date: Sat, 23 May 2015 21:50:51 +0100
User-agent: Mutt/1.5.21 (2010-09-15)

Hello,

When you do:

seq 10 | parallel cmd

It looks like parallel goes to all the trouble of spawning a shell
(taking extra trouble deciding which one to use) and building a
command line that looks alright for that shell, with the cmd and
the arguments read from stdin properly quoted.

Why? Or at least why do it by default, and why not give the user
the option to disable that?

Most of the time, when you do:

seq 10 | parallel cmd

You intend it to run ["cmd", "1"], ["cmd", "2"]... in parallel,
not ["myshell", "-c", "cmd 1"], ["myshell", "-c", "cmd 2"],
hoping that the shell will eventually (after initialisation,
loading libraries, startup files...) run ["cmd", "1"]...
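To illustrate the difference between the two execution models, here is a
minimal sketch in plain POSIX sh. It does not use parallel at all; the
helper names run_direct and run_via_shell are made up for this example,
and the second one deliberately pastes the argument unquoted into a
"sh -c" string to show what goes wrong when quoting is missed.

```shell
# Hypothetical helpers for illustration only (not part of parallel).

# Pass the argument straight to the command as one argv entry.
run_direct() {
    printf '<%s>\n' "$1"
}

# Paste the argument, unquoted, into a command line that a second
# shell has to parse again.
run_via_shell() {
    sh -c "printf '<%s>\\n' $1"
}

arg='a  b; echo injected'

run_direct "$arg"
# prints:
# <a  b; echo injected>

run_via_shell "$arg"
# prints:
# <a>
# <b>
# injected
```

In the direct case there is nothing to quote, so there is nothing to get
wrong; in the shell case the result depends entirely on how well the
argument was escaped for that particular shell.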

I understand, there are cases where you may want to use shell
constructs in there, but that's not the common case. I'd be more
than happy to do things like:

seq 10 | parallel sh -c 'cmd "$1" > "$1.out"' sh

for instance, rather than hoping parallel does the right thing
and spawns the right shell in

seq 10 | parallel 'cmd {} > {}'

At the moment, depending on the shell (and it's not always clear
which one you'll get) there are a few bugs.

For instance with zsh:

$ printf '=z\n'  | PARALLEL_SHELL=zsh parallel  'printf "<%s>\n"'
zsh:1: z not found

In zsh, a leading = is a globbing operator that is not currently
escaped by parallel.

With csh/tcsh:

$ printf 'a\nb\0'  | PARALLEL_SHELL=tcsh parallel -0 'printf "<%s>\n"'
Unmatched '.
Unmatched '.

(good luck getting the quoting right with csh)

With rc/es/akanga:

$ printf "'"  | PARALLEL_SHELL=rc parallel -0 "printf '<%s>\n'"
line 1: eof in quoted string near eof

rc/es have only one kind of quotes: single quotes

Even with POSIX shells, the quoting will only be right in the
main context:

~$ printf '%s\n' '\' '\x' | PARALLEL_SHELL=sh parallel  'printf "<%s>\n" "`printf \"<%s>\n\" {}`"'
<<\>>
<<x>>
~$ printf '%s\n' '\' '\x' | PARALLEL_SHELL=bash parallel 'printf "<%s>\n" "`printf \"<%s>\n\" {}`"'
<<>>
<<x>>

(inside backticks, you need another level of escaping for \ (and
`, $, ")).
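The extra level of processing can be seen without parallel, in any POSIX
sh. The sketch below uses three made-up function names and the
already-escaped form \\x, which is how the value \x would be written
in the main context. It only covers the \x case; the lone backslash
case, where sh and bash differ above, is left out.

```shell
# Hypothetical function names, for illustration only.

# Main context: \\x is one escaped backslash followed by x.
main_context() {
    printf '<%s>\n' \\x
}

# Inside backticks, \\ is first reduced to \ and the result (\x) is
# then parsed again as shell code, so the backslash is lost.
in_backticks() {
    printf '<%s>\n' "`printf '%s' \\x`"
}

# $(...) does not apply that extra backslash pass.
in_dollar_paren() {
    printf '<%s>\n' "$(printf '%s' \\x)"
}

main_context      # prints <\x>
in_backticks      # prints <x>
in_dollar_paren   # prints <\x>
```

So text that is correctly escaped for the main context is no longer
correctly escaped once the user's template places {} inside backticks.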

There are of course contexts that parallel can't always get
right like typeset -A a; a[{1}]=1... (though of course one can
find work-arounds).

There's the problem of empty arguments:

$ printf '%s\n' 1 2 3 a '' c A B C | PARALLEL_SHELL=sh parallel -n3 'f() { IFS=,; echo "$# $*";}; f'
3 1,2,3
2 a,c
3 A,B,C
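One way to keep empty arguments when building a command line for a
Bourne-like shell is to wrap every argument in single quotes, so that an
empty string becomes ''. The following is only a sketch under that
assumption: quote_sq and count_args are hypothetical names, this is not
parallel's actual quoting code, and it is only valid for POSIX-style
shells (not csh or rc).

```shell
# Hypothetical helper: quote one argument for a POSIX shell.
quote_sq() {
    # Replace each ' with '\'' ; the trailing "x" sentinel keeps the
    # command substitution from stripping trailing newlines.
    quoted=$(printf '%s\n' "$1" | sed "s/'/'\\\\''/g"; printf x)
    quoted=${quoted%?x}   # drop the newline added above plus the sentinel
    printf "'%s'" "$quoted"
}

# Hypothetical helper: report how many arguments were received.
count_args() {
    echo "$#"
}

# The empty middle argument survives re-parsing by the shell.
eval "count_args $(quote_sq a) $(quote_sq '') $(quote_sq c)"   # prints 3
```

Because the whole argument ends up inside single quotes, a leading =
(as in the zsh example above) is also left alone by this scheme.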

None of those have issues if you use

... | xargs -P4 -d '\n' -n "$n" cmd

or if you do need a shell:

... | xargs -P4 -d '\n' -n "$n" sh -c 'cmd "$@"' sh

My problem with the parallel way is that it's trying to be too
smart.

While it works in most of the common cases, that means a
significant overhead (when the whole point of using a
"parallel" command is to improve performance) for little benefit
(see echo x | strace -fe process parallel echo for instance
compared to echo x | strace -fe process xargs -P4 echo).

And that means when you want reliability even in corner cases,
you need to second-guess what "parallel" will do and try to
outsmart it, which defeats the whole point.

Now that decision may come from the fact that parallel does
support working over "rsh/ssh" and there you can't avoid the
shell (and can't know which one you'll get), but even then
I'd say it would have been preferable to turn that ssh mode into
a non-shell mode (though that might be difficult if supporting the
4 main shell families: Bourne, csh, rc, fish), rather than
making the non-ssh mode a shell one.

Sorry, that was a lot of ranting, and not much of it constructive.
Now, to sum up, I'd say there are a few things that can
be corrected without much effort, like:

- escape that = for zsh
- document that shells of the rc family are not supported
- document that multiline arguments are not supported with
  csh/tcsh (no point in using -0 with csh)
- escape empty arguments as ''
- document limitations when using {} in some shell contexts.

A feature that I would really welcome would be some --no-shell 
option that skips all that business about running a shell and
building a correct command-line for it and just executes the
command (a bit like xargs).

Best regards,
Stephane



