Re: [Help-bash] Fastest way to join an array into a string
From: Peng Yu
Subject: Re: [Help-bash] Fastest way to join an array into a string
Date: Tue, 27 Aug 2019 08:46:00 -0500
On 8/27/19, Stephane Chazelas <address@hidden> wrote:
> 2019-08-26 13:47:36 -0500, Peng Yu:
> [...]
>> echo -n "$1"
>
> You can't use echo for arbitrary data. Depending on the
> environment and how bash was built, that will either
> - output that -n
> - fail if $1 is -n, -E, -e or any combination like -neEneE
> - fail if $1 contains backslashes
>
> Use printf again: printf %s "$1"
>
>> shift
>
> You'll get an error if $@ is the empty list (no argument).
>
>> if (($#)); then
>> declare x
>
> Why "declare x"
>
>> printf "${separator//%/%%}%s" "$@"
> [...]
>
> You also need to escape \. You're also missing the newline
> delimiter.
>
> Note that if you're printing it, that means that if you need to
> store it in a variable, you'll need to use command substitution,
> which involves running an extra process and writing/reading the
> data through a pipe.
>
> Even if you use printf -v, like in:
>
> join_into() {
>   local -n _var="$1"
>   local _sep="$2"
>   shift 2
>   if (($# > 1)); then
>     _sep=${_sep//%/%%}
>     _sep=${_sep//\\/\\\\}
>     printf -v _var "$_sep%s" "${@:2}"
>     _var=$1$_var
>   else
>     _var=$1
>   fi
> }
>
> I find it's still orders of magnitude slower than the approach that uses
> ${*/#/$sep}.
This assumes that the
>
> On my system, it is significantly slower than invoking perl or python
>
> like
>
> result=$(perl -le 'print join shift, @ARGV' -- sep "$@")
>
> or
>
> result=$(python -c 'import sys; print(sys.argv[1].join(sys.argv[2:]))' sep
> "$@")
So something like this will fix the bugs that you mentioned?

function strjoin/format {
  declare separator=$1
  shift
  if (($#)); then
    printf '%s' "$1"
    shift
    if (($#)); then
      separator=${separator//%/%%}
      printf "${separator//\\/\\\\}%s" "$@"
    fi
  fi
  [[ $nonewline ]] || echo
}

function strjoin/replace {
  declare separator=$1
  shift
  if (($#)); then
    printf '%s' "$1"
    shift
    if (($#)); then
      printf '%s' "${@/#/$separator}"
    fi
  fi
  [[ $nonewline ]] || echo
}

Indeed, the replace version is slightly faster.

$ time for((i=0;i<10000;++i)); do
    strjoin/format $'\t' a b c
  done > /dev/null
real 0m0.795s
user 0m0.771s
sys 0m0.020s

$ time for((i=0;i<10000;++i)); do
    strjoin/replace $'\t' a b c
  done > /dev/null
real 0m0.744s
user 0m0.721s
sys 0m0.020s

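
Both functions above print their result, so storing it in a variable still needs a command substitution, which is the extra cost Stephane mentions. Below is a minimal sketch of the replace approach that assigns straight to a caller-named variable instead. The name `strjoin_into` and its argument order are my own assumptions for illustration, not something from this thread, and it needs bash 4.3 or later for `local -n`.

```shell
#!/usr/bin/env bash
# strjoin_into is a hypothetical helper name, not a function from this thread.
# Usage: strjoin_into VARNAME SEPARATOR [STRING...]
strjoin_into() {
  local -n _strjoin_out=$1        # nameref to the caller's variable (bash 4.3+)
  local _strjoin_sep=$2
  shift 2
  _strjoin_out=
  (($#)) || return 0              # nothing to join: leave the variable empty
  local _strjoin_first="$1" IFS=''  # empty IFS makes "$*" concatenate with no delimiter
  shift
  # Prefix each remaining element with the separator, then glue everything
  # onto the first element. The inner quotes keep the separator literal.
  _strjoin_out=$_strjoin_first"${*/#/"$_strjoin_sep"}"
}

strjoin_into joined ', ' a b c
printf '%s\n' "$joined"
```

No subshell or pipe is involved, and no printf format string is built from the separator, so `%` and `\` need no escaping. One caveat of the nameref: the caller must not pass a variable that is itself named `_strjoin_out`.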
I don't think calling external programs will be fast, especially when
the number of strings to join is small. In the following runs,
`strjoin` is a function that does the join with a default separator of
$'\t'. As you can see, it is much faster than python and perl. It is
also interesting that perl is much faster than python, probably
because perl's startup time is shorter than python's.

time for ((i=0;i<100;++i)); do
  strjoin a b c
done > /dev/null
real 0m0.020s
user 0m0.018s
sys 0m0.001s

time for ((i=0;i<100;++i)); do
  python -c 'import sys; print(sys.argv[1].join(sys.argv[2:]))' $'\t' a b c
done > /dev/null
real 0m5.171s
user 0m2.004s
sys 0m2.496s

time for ((i=0;i<100;++i)); do
  perl -le 'print join shift, @ARGV' -- $'\t' a b c
done > /dev/null
real 0m0.966s
user 0m0.315s
sys 0m0.353s

--
Regards,
Peng