repeated extended pattern substitution incredibly slow w/large variables
From: address@hidden
Subject: repeated extended pattern substitution incredibly slow w/large variables
Date: Sun, 18 Sep 2016 11:32:45 +0200 (MEST)
Configuration Information [Automatically generated, do not change]:
Machine: i686
OS: linux-gnu
Compiler: gcc
Compilation CFLAGS: -DPROGRAM='bash' -DCONF_HOSTTYPE='i686'
-DCONF_OSTYPE='linux-gnu' -DCONF_MACHTYPE='i686-pc-linux-gnu'
-DCONF_VENDOR='pc' -DLOCALEDIR='/usr/share/locale' -DPACKAGE='bash' -DSHELL
-DHAVE_CONFIG_H -DDEBUG -DMALLOC_DEBUG -I. -I. -I./include -I./lib -g -O2
-Wno-parentheses -Wno-format-security
uname output: Linux Xaox 4.4.0-tm3 #2 Mon Feb 22 13:26:44 CET 2016 i686
GNU/Linux
Machine Type: i686-pc-linux-gnu
Bash Version: 4.4
Patch Level: 0
Release Status: rc2 / release
Description:
The tests below were performed with 4.4.0-rc2. The problem is still
present in 4.4.0-release, where execution times are about 20% higher.
Repeated pattern substitution (here: removal) using an extended pattern
on variables of considerable size consumes an enormous amount of time
and CPU.
The command that revealed the problem was:
D=${C//\[+([0-9])\]=}
The variable C contains the output of 'declare -p A', where A is an
array with 510 file names, so C contains 510 matches. As can be seen
below, commands like
D=${C//u+([a-z])} or D=${C//@(usr)}
also trigger the problem, but _not_ commands like
D=${C//usr} or D=${C//u[a-z][a-z]}
See the test case and statistics below.
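For readers who want to see what the substitution does before looking at the timings, here is a minimal sketch on a tiny array. It assumes that extglob is enabled (otherwise +(...) is not treated as a pattern operator); the three file names are hypothetical stand-ins for the real 510, and the 'declare -a B=' prefix is stripped with a parameter expansion instead of sed.

```bash
# Assumption: extglob must be on for +(...) and @(...) to act as patterns.
shopt -s extglob

# Hypothetical three-element array standing in for the 510 file names.
B=( /usr/share/a.psf /usr/share/b.psf /usr/share/c.psf )

# Take the 'declare -p' output and drop everything up to the first '='.
C=$(declare -p B)
C=${C#*=}

# Remove every '[<subscript>]=' from the list, as in the report.
D=${C//\[+([0-9])\]=}

printf '%s\n%s\n' "$C" "$D"
```

The second printed line should be the same list as the first, minus the subscripts.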
Of course, the problem is easily solved by a mini sed(1) script, but
every now and then I try commands like the above, because I think that
simple tasks should be doable without the aid of external programs.
But in many such cases I must sadly accept that using external programs,
especially sed(1), is the quicker method.
Additionally, I will have to revise my script (a ~100kb font editor)
and possibly replace other constructs that use extended pattern matching.
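One possible way to stay inside the shell without an extended pattern is a loop around the [[ =~ ]] regex operator. This is only a sketch: the function name strip_subscripts is hypothetical, and no timings were taken for it, so whether it is fast enough for a 27 kB variable is an open question.

```bash
# Hypothetical helper: delete every "[<digits>]=" from a string without
# extglob and without an external program.
strip_subscripts() {
    local rest=$1 out='' re='\[[0-9]+\]='
    while [[ $rest =~ $re ]]; do
        # Keep the text before the leftmost match, drop the match itself.
        out+=${rest%%"${BASH_REMATCH[0]}"*}
        rest=${rest#*"${BASH_REMATCH[0]}"}
    done
    printf '%s\n' "$out$rest"
}

strip_subscripts '([0]="a" [1]="b" [12]="c")'
# prints: ("a" "b" "c")
```

The regex is kept in a variable so that the brackets and backslashes reach the regex engine unchanged. The matched text is quoted inside the two expansions so that it is compared literally rather than as a glob.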
Repeat-By:
-----------------------------------------------------------------------
declare -a B A=( /usr/share/consolefonts/* ) # column 2: here 510 files
# A=( "${A[@]##*/}" ) # column 3: pure filenames
# A=( "${A[@]/*/a}" ) # column 4: "a"
# A=( "${A[@]/*}" ) # column 5: "" (empty)
for matches in {10..500..10}; do
B=( "${A[@]:0:matches}" ) # reduce array
C=`declare -p B | sed -r "s/^[^=]+=?//"` # rm 'declare -<attr> <name>='
time D="${C//\[+([0-9])\]=}" # rm '[<subscr>]='
done
------------------------------------------------------------------------
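For anyone without /usr/share/consolefonts, the test case might be approximated with synthetic names. This is a sketch under assumptions: make_names and the font<N>.psf.gz names are invented for illustration, extglob is assumed to be enabled, and the loop stops at 30 matches to keep the run short on affected versions. Timings will differ from the figures in this report.

```bash
shopt -s extglob   # assumption: needed for +([0-9]) to act as a pattern

# Hypothetical stand-in for /usr/share/consolefonts/*: n synthetic paths.
make_names() {
    local n=$1 i
    A=()
    for (( i = 0; i < n; i++ )); do
        A+=( "/usr/share/consolefonts/font$i.psf.gz" )
    done
}

make_names 30
for matches in 10 20 30; do
    B=( "${A[@]:0:matches}" )            # reduce array
    C=$(declare -p B); C=${C#*=}         # rm 'declare -a B='
    printf '%s matches, %s bytes\n' "$matches" "${#C}"
    time D=${C//\[+([0-9])\]=}           # rm '[<subscr>]='
done
```

Raising the upper bound of the loop step by step should show whether the run time grows much faster than the size of C.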
results (all with >99% cpu):
number of |             contents of array elements
matches   | size=${#C}   path/file |    file |    "a" |   empty
---------------------------------------------------------------
   10:    |   369 bytes     0.099s |  0.014s | 0.007s |  0.005s
   20:    |   900           1.261s |  0.315s | 0.048s |  0.036s
   30:    |  1453           5.274s |  1.538s | 0.168s |  0.134s
   40:    |  2070          15.030s |  4.868s | 0.406s |  0.324s
   50:    |  2655          31.830s | 10.694s | 0.814s |  0.644s
   60:    |  3240          56.831s | 19.203s | 1.423s |  1.130s
   70:    |  3837          94.022s | 32.356s | 2.299s |  1.829s
   80:    |  4384         139.000s | 47.079s | 3.473s |  2.751s
   90:    |  4998         204.683s |         | 4.955s |  3.932s
  100:    |  5567         283.118s |         | 6.871s |  5.452s
  110:    |  6135                  |         | 9.495s |  7.547s
  120:    |  6664                  |         |        | 10.164s
  200:    | 15554                  |         |        | 55.529s
I was too impatient to wait for the run with the complete array of 510
elements to finish.
The following test results all use the same data as columns 1 + 2
(number of matches, full path names).
For comparison, the command: time D=`sed -r "s/\[[0-9]+\]=//g"<<<"$C"`
510: | 27137 bytes, real 0.020s, user 0.007s, sys 0.007s, 67.66% cpu  ok!
other commands:
        size=${#C}    D=${C//usr}    D=${C//u[a-z][a-z]}
--------------------------------------------------------
100:     5567 bytes   0.004s         0.004s    ok!
200:    11167         0.012s         0.012s
300:    16712         0.024s         0.024s
400:    21818         0.038s         0.040s
500:    26647         0.056s         0.057s
but:                  D=${C//u+([a-z])}   D=${C//@(usr)}
 10:                  0.136s              0.112s    >99% cpu
 20:                  1.647s              1.078s
 30:                  6.467s              4.014s
 40:                 17.912s             10.886s
 50:                 38.178s             22.391s
which seems to indicate that extended pattern matching causes the
problem.
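Where an extended pattern has an exact plain equivalent, rewriting it is one way to sidestep the slow path. The sketch below checks, on a made-up sample string, that @(usr) and usr remove the same text; extglob is assumed to be enabled. Note that u+([a-z]) and u[a-z][a-z] are not equivalent in general (the first matches any number of letters after the u), so that pair cannot be swapped blindly.

```bash
shopt -s extglob   # assumption: needed for @(usr) to act as a pattern

# Hypothetical sample text, not taken from the report.
C='/usr/share/consolefonts/usr-demo /usr/lib/usrlocal'

D_ext=${C//@(usr)}     # extended pattern with a single alternative
D_plain=${C//usr}      # plain string pattern

[[ $D_ext == "$D_plain" ]] && echo "same result"
```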
Please CC answers to me as I am not subscribed to the list.