Re: [Help-bash] Fastest way to join an array into a string


From: Peng Yu
Subject: Re: [Help-bash] Fastest way to join an array into a string
Date: Tue, 27 Aug 2019 08:46:00 -0500

On 8/27/19, Stephane Chazelas <address@hidden> wrote:
> 2019-08-26 13:47:36 -0500, Peng Yu:
> [...]
>>             echo -n "$1"
>
> You can't use echo for arbitrary data. Depending on the
> environment and how bash was built, that will either
> - output that -n
> - fail if $1 is -n, -E, -e or any combination like -neEneE
> - fail if $1 contains backslashes
>
> Use printf again: printf %s "$1"
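
A minimal sketch of that difference with bash's builtin echo (assuming
default shell options; `show_echo` and `show_printf` are names made up
for illustration only):

```bash
# Hypothetical helpers, only to contrast echo with printf on option-like data.
show_echo() {
  echo -n "$1"        # bash's echo may parse "$1" itself as an option (-n, -e, -E)
}
show_printf() {
  printf '%s' "$1"    # the argument is only ever data for the %s conversion
}

for arg in '-n' '-e' 'a\tb' 'plain'; do
  printf 'echo: [%s]  printf: [%s]\n' "$(show_echo "$arg")" "$(show_printf "$arg")"
done
```

The printf column should always show the argument unchanged; what the
echo column shows depends on the shell and its options.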
>
>>             shift
>
> You'll get an error if $@ is the empty list (no argument).
>
>>             if (($#)); then
>>                 declare x
>
> Why "declare x"
>
>>                 printf "${separator//%/%%}%s" "$@"
> [...]
>
> You also need to escape \. You're also missing the newline
> delimiter.
>
> Note that if you're printing it, that means that if you need to
> store it in a variable, you'll need to use command substitution,
> which involves running an extra process and writing/reading the
> data through a pipe.
>
> Even if you use printf -v, like in:
>
> join_into() {
>   local -n _var="$1"
>   local _sep="$2"
>   shift 2
>   if (($# > 1)); then
>     _sep=${_sep//%/%%}
>     _sep=${_sep//\\/\\\\}
>     printf -v _var "${_sep}%s" "${@:2}"
>     _var=$1$_var
>   else
>     _var=$1
>   fi
> }
>
> I find it's still orders of magnitude slower than the approach that uses
> ${*/#/$sep}.

This assumes that the
>
> On my system, it is significantly slower than invoking perl or python, like
>
> result=$(perl -le 'print join shift, @ARGV' -- sep "$@")
>
> or
>
> result=$(python -c 'import sys; print(sys.argv[1].join(sys.argv[2:]))' sep "$@")

So would something like this fix the bugs that you mentioned?

function strjoin/format {
  declare separator=$1
  shift

  if (($#)); then
    printf '%s' "$1"
    shift
    if (($#)); then
      separator=${separator//%/%%}
      printf "${separator//\\/\\\\}%s" "$@"
    fi
  fi
  [[ $nonewline ]] || echo
}

function strjoin/replace {
  declare separator=$1
  shift

  if (($#)); then
    printf '%s' "$1"
    shift
    if (($#)); then
      printf '%s' "${@/#/$separator}"
    fi
  fi
  [[ $nonewline ]] || echo
}
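
One way to check the escaping is sketched below. `check_join` is a
hypothetical helper, not something from this thread; it assumes the
join function takes the separator as its first argument and that
`nonewline` is unset.

```bash
# Hypothetical checker: runs a join function (given by name) against a few
# awkward separators and compares with the plainly concatenated result.
check_join() {
  local fn=$1 sep got want
  for sep in ',' '%' '%s' '\' '\n' $'\t'; do
    want="a${sep}b${sep}c"
    got=$("$fn" "$sep" a b c)    # command substitution strips the trailing newline
    if [[ $got != "$want" ]]; then
      printf 'FAIL %s sep=%q got=%q\n' "$fn" "$sep" "$got" >&2
      return 1
    fi
  done
  printf 'ok %s\n' "$fn"
}
```

With the two functions above defined, `check_join strjoin/format` and
`check_join strjoin/replace` should each print an "ok" line if the
separators come through literally.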

Indeed, the replace version is slightly faster (about 6% in the
timings below, over 10000 calls each).

$ time for((i=0;i<10000;++i)); do
  strjoin/format $'\t' a b c
done > /dev/null
real    0m0.795s
user    0m0.771s
sys     0m0.020s
$ time for((i=0;i<10000;++i)); do
  strjoin/replace $'\t' a b c
done > /dev/null
real    0m0.744s
user    0m0.721s
sys     0m0.020s
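
Both functions print their result, so storing it still needs a command
substitution, as pointed out above. A minimal sketch of a variant that
assigns to a caller-named variable instead, using the ${*/#/$sep} idea:
the name `strjoin_into` and its argument order are my assumptions, it
needs bash 4.3 or later for `local -n`, and it has only been thought
through for separators without `&` or backslashes.

```bash
# Hypothetical: strjoin_into VARNAME SEPARATOR [STRING...]
strjoin_into() {
  local -n _out=$1        # nameref to the caller's variable (bash >= 4.3)
  local _sep=$2
  shift 2
  local IFS=              # empty IFS: "$*" concatenates with nothing in between
  _out="${*/#/$_sep}"     # prefix every argument with the separator, then concatenate
  _out=${_out#"$_sep"}    # drop the separator in front of the first argument
}

strjoin_into joined $'\t' a b c
printf '%s\n' "$joined"
```

The caller's variable must not itself be named `_out` or `_sep`, since
those are used as locals inside the function.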

I don't think calling external programs will be fast, especially when
the number of strings to join is small. In the following run,
`strjoin` is a function that does the join with a default separator of
$'\t'. As you can see, it is much faster than python and perl (0.020s
versus 5.171s and 0.966s for 100 calls). It is also interesting that
perl is much faster than python, probably because perl starts up
faster than python.
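
The definition of `strjoin` is not included in this message. A minimal
sketch consistent with the description above might look like the
following; taking the separator from an optional variable named
`separator` is an assumption made for the sketch.

```bash
# Hypothetical sketch of strjoin: joins its arguments with a tab unless the
# (assumed) variable `separator` is set.
strjoin() {
  local sep=$'\t'
  [[ ${separator+set} ]] && sep=$separator
  if (($#)); then
    printf '%s' "$1"
    shift
    if (($#)); then
      printf '%s' "${@/#/$sep}"   # prefix each remaining argument with the separator
    fi
  fi
  echo
}
```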

$ time for ((i=0;i<100;++i)); do
  strjoin a b c
done > /dev/null

real    0m0.020s
user    0m0.018s
sys     0m0.001s
$ time for ((i=0;i<100;++i)); do
  python -c 'import sys; print(sys.argv[1].join(sys.argv[2:]))' $'\t' a b c
done > /dev/null

real    0m5.171s
user    0m2.004s
sys     0m2.496s
$ time for ((i=0;i<100;++i)); do
  perl -le 'print join shift, @ARGV' -- $'\t' a b c
done > /dev/null

real    0m0.966s
user    0m0.315s
sys     0m0.353s


-- 
Regards,
Peng


