Re: tokenize honoring quotes
From: Greg Wooledge
Subject: Re: tokenize honoring quotes
Date: Fri, 5 Aug 2022 16:01:56 -0400

On Fri, Aug 05, 2022 at 02:32:38PM -0400, Chet Ramey wrote:
> On 8/5/22 1:43 PM, Robert E. Griffith wrote:
> > Is there an efficient native bash way to tokenize a string into an array
> > honoring potentially nested double and single quotes?
> >
> > For example...
> >
> > $ str='echo "hello world"'
>
> Some variant of this:
>
> str='echo "hello world"'
>
> declare -a a
> eval a=\( "$str" \)
>
> declare -p a
The biggest problem here is that there's no way to prevent command
substitutions and other code injections from occurring, when all you
actually wanted is the word splitting/parsing.
unicorn:~$ str='"hello world" "$(date 1>&2)"'
unicorn:~$ declare -a a
unicorn:~$ eval a=\( "$str" \)
Fri Aug 5 15:44:11 EDT 2022
Simply disallowing $() and backticks isn't sufficient either, as there
are code injections hiding all over.
unicorn:~$ x='y[$(date 1>&2)0]'
unicorn:~$ str='"hello world" ${a[x]}'
unicorn:~$ declare -a a
unicorn:~$ eval a=\( "$str" \)
Fri Aug 5 15:46:26 EDT 2022
The second biggest problem is unwanted globbing. One might argue that
one can disable this with set -f before the eval, and set +f after. (Or
variants involving a wrapper function and "local -", which restores the
shell options when the function returns.) Nevertheless, it's a concern
that must be addressed.
unicorn:~$ str='"hello world" *.txt'
unicorn:~$ declare -a a
unicorn:~$ eval a=\( "$str" \)
unicorn:~$ declare -p a
declare -a a=([0]="hello world" [1]="37a.txt" [2]="68x68.txt"
[3]="Application.txt" [4]="bldg.txt" [5]="burger15.txt" [...]
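
For what it's worth, the "local -" variant mentioned above might look
roughly like the sketch below. The function name split_noglob is made up
for illustration, and "local -" needs bash 4.4 or newer. This only
addresses the globbing; the eval inside is still wide open to every code
injection shown earlier, so it is not a safe tokenizer.

```shell
#!/bin/bash
# Hypothetical helper (not from the original message): split a string
# with eval while pathname expansion is turned off.
# WARNING: this does nothing about code injection.
split_noglob() {
    local -          # bash 4.4+: shell options are restored on return
    set -f           # disable globbing for the eval below
    eval a=\( "$1" \)
}

declare -a a
split_noglob '"hello world" *.txt'
declare -p a         # *.txt stays a literal word instead of expanding
```

Because of "local -", the caller's own noglob setting is left untouched
after the function returns.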
The only way to safeguard against code injections is to write an actual
parser, and not rely on shell tricks like eval, tempting as they may be.
Here's an extremely simplistic one that handles only properly balanced
double quotes, without any kind of nesting. Quotes, if present, must
completely surround the word they enclose, not be partially embedded
inside a word. It also splits only on spaces, not tabs or other
whitespace, but it could easily be extended for that if desired.
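
#!/bin/bash
shopt -s extglob    # needed for the +( ) pattern below

str=' one two "hello world" three "four"'
a=()

# Strip leading spaces.
str=${str##+( )}

# Keep going while there is a separator left or a quoted word is next.
while [[ $str = *\ * || $str = \"* ]]; do
    if [[ $str = \"* ]]; then
        # Quoted word: take everything up to the next double quote.
        word=${str:1}
        word=${word%%\"*}
        a+=("$word")
        str=${str#\"*\"}
    else
        # Unquoted word: take everything up to the next space.
        word=${str%% *}
        a+=("$word")
        str=${str#* }
    fi
    str=${str##+( )}
done

# Whatever remains is the final unquoted word.
if [[ $str ]]; then a+=("$str"); fi

declare -p a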
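
As for the extension to tabs, one possible sketch is below. It wraps the
same loop in a function (the name tokenize is just an illustration),
splits on [[:blank:]] (space or tab) instead of a literal space, and
returns 1 on an unbalanced double quote rather than looping forever. It
still assumes quotes surround a whole word, with no nesting and no
escapes, and it stores the result in the global array "a".

```shell
#!/bin/bash
shopt -s extglob    # needed for the +( ) patterns below

# Hypothetical function: tokenize STRING
# Fills the global array "a"; returns 1 on an unbalanced double quote.
tokenize() {
    local str=$1 word
    a=()
    str=${str##+([[:blank:]])}              # strip leading spaces/tabs
    while [[ $str ]]; do
        if [[ $str = \"* ]]; then
            [[ $str = \"*\"* ]] || return 1 # no closing quote: give up
            word=${str:1}
            word=${word%%\"*}               # text up to the closing quote
            str=${str#\"*\"}                # drop the quoted word
        else
            word=${str%%[[:blank:]]*}       # text up to next space/tab
            str=${str:${#word}}             # drop the word just taken
        fi
        a+=("$word")
        str=${str##+([[:blank:]])}          # strip separators
    done
}

tokenize $' one\ttwo  "hello world"\tthree "four"'
declare -p a
```

With that input the array should end up holding five words: one, two,
"hello world", three and four.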