From: Ralf Wildenhues
Subject: Re: config files substitution with awk
Date: Tue, 21 Nov 2006 19:48:22 +0100
User-agent: Mutt/1.5.13 (2006-08-11)
[ apologies for the resend ]
* Paul Eggert wrote on Tue, Nov 21, 2006 at 06:30:07PM CET:
> +In traditional Awk, @code{FS} must be a string containing just one
> +ordinary character, and similarly for the field-separator argument to
> +@code{split}.
Thanks. FWIW, Solaris awk seems to use only the first character of FS
rather than erroring out (which was not obvious to me from the above
description). I see two ways out:
- Choose for FS a character unlikely to occur often; I'd guess # or ~
  should work?
- Do something like
    sed 's/~/|#_!!_#|/g' | awk -f "$tmp/subs.awk" | sed 's/|#_!!_#|/~/g'
  to work around this (with FS="~" in the awk script).
WDYT?
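
A minimal sketch of the second alternative, assuming a POSIX shell, sed
and awk. The one-line awk program is a hypothetical stand-in for
"$tmp/subs.awk" and only reports the field count, to show that the line
is no longer split:

```shell
# Hide every "~" behind the marker, run awk with FS="~", then restore.
# Since no "~" reaches awk, each input line stays a single field.
input='a~b~c'
result=$(printf '%s\n' "$input" |
  sed 's/~/|#_!!_#|/g' |
  awk 'BEGIN { FS = "~" } { print NF ":" $0 }' |
  sed 's/|#_!!_#|/~/g')
echo "$result"
```

Note that a literal |#_!!_#| already present in the input would come
back as "~" with this round trip, so the marker would have to stay
reserved.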
I have another Solaris awk issue, and don't know how to get around
this easily: it supports `index in array' only in for statements:
  $ awk 'END { v="x"; F[v]=1; if (v in F) print v; }' </dev/null
  awk: syntax error near line 1
  awk: illegal statement near line 1
Interestingly, this difference is not mentioned in autoconf.texi, nor
in the gawk.texi or awkcard.ps files of GNU awk.
Note that rewriting this to, say, test for nonempty 'array[index]'
instead (and using a marker to distinguish empty replacement strings)
could be quite memory-intensive, due to all the new array members
created on the way, so I'd prefer not to go that way, but I admit to
not having tested this. Looping over array members has the wrong work
complexity, so that would be a step in the wrong direction as well.
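
To illustrate the memory concern: in an awk that does accept `in'
outside of for statements (so not the Solaris one), a failed `in' test
leaves the array untouched, whereas merely referencing the element
creates it. A small sketch, with F and the key "x" chosen only for
illustration:

```shell
# Count the members of F after a failed membership test ...
with_in=$(awk 'BEGIN { if ("x" in F) print "hit"
                       n = 0; for (k in F) n++; print n }')
# ... and after testing the value instead, which creates F["x"].
with_ref=$(awk 'BEGIN { if (F["x"] != "") print "hit"
                        n = 0; for (k in F) n++; print n }')
echo "members after in: $with_in, after reference: $with_ref"
```

With one such reference per @word@ seen in the input, the array grows
with the number of distinct non-substituted words.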
One further portability note: awkcard.ps colors the description of
'next' in blue, the notation for a nonportable feature. OTOH, the V7
manual describes it. Hmm.
FWIW, below's what I have currently, with above issues not addressed.
Cheers,
Ralf
Changes:
- Get rid of sub/gsub completely. Besides avoiding the \& quoting
  issue, this also shaves another 8 seconds off ./config.status time
  for the large example package:
    3.20user 3.66system 0:13.84elapsed 49%CPU (0avgtext+0avgdata 0maxresident)k
    0inputs+0outputs (0major+925981minor)pagefaults 0swaps
  Which only reconfirms that hashing is faster than regular expression
  matching. ;-)
- As a consequence, the |#_!!_#| marker can be avoided entirely.
  I left it in as the setting for FS for now, pending the question above.
- Parenthesize the file name argument to getline, as suggested in the
  gawk manual.
- Added a few more comments about curiosities in the script.
- Only match AC_SUBST_FILEs if they are alone on the line (except for
  white space). I don't particularly care about being precise enough
  to give an error for a line such as
    @substed_var@ @substed_file@
  when $substed_var happens to be empty. Does anyone else?
  (It would require looping once for F[] and once for S[].)
- Some more tests from Paolo's examples.
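
The sub/gsub-free lookup from the first item can be sketched as follows.
This is a simplified stand-in, not the script from the patch: it builds
the result in a separate variable `out', hard-codes two hypothetical
substitutions in S[], leaves out the AC_SUBST_FILE handling, and assumes
an awk that accepts `key in S' outside of for statements.

```shell
subst_demo() {
  awk '
  BEGIN {
    # Hypothetical substitutions; config.status would generate these.
    S["prefix"] = "/usr/local"
    S["amp"] = "R&D"
  }
  {
    # Every piece after the first was preceded by an "@"; a piece that
    # is also followed by an "@" is a candidate variable name.
    n = split($0, field, "@")
    out = field[1]
    for (i = 2; i <= n; i++) {
      if (i < n && (field[i] in S)) {
        # Hash lookup: the value is copied verbatim, so "&" and
        # backslashes need no quoting, and it is never rescanned.
        out = out S[field[i]] field[i + 1]
        i++
      } else {
        out = out "@" field[i]
      }
    }
    print out
  }'
}
result=$(printf '%s\n' 'prefix=@prefix@ amp=@amp@ keep=@unknown@ stray @' |
  subst_demo)
echo "$result"
```

Unknown names and stray @ characters pass through unchanged, and each
candidate costs one array lookup instead of one regular expression match
per substituted variable.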
2006-11-21 Ralf Wildenhues <address@hidden>
Rewrite config files generation: avoid quadratic growth in
the number of substituted variables by using awk instead of sed
for the bulk of the substitutions.
* lib/autoconf/status.m4 (_AC_AWK_LITERAL_LIMIT): New macro.
(_AC_OUTPUT_FILES_PREPARE): Instead of several sed scripts,
generate just one large awk script for substitutions,
eliminating much of the earlier complexity, while adding some
new complexity. Only expand the substitution templates at
configure time, for smaller configure script size. The awk
script was written with help from Paolo Bonzini and Paul Eggert.
(_AC_SUBST_CMDS): Renamed from...
(_AC_SED_CMDS): ...this.
(_AC_DELIM_NUM): Renamed from...
(_AC_SED_DELIM_NUM): ...this.
(_AC_SED_CMD_NUM, _AC_SED_FRAG, _AC_SED_FRAG_NUM): Removed.
(_AC_OUTPUT_FILE): Use _AC_SUBST_CMDS.
* tests/torture.at (Substitute a 2000-byte string): Also
substitute a line with 1000 words, and a variable with several
long lines.
(Substitute and define special characters): Test awk special
characters, and put substitution input strings address@hidden@' in the
output, to test that no recursion happens; test several other
combinations from Paolo Bonzini.
* doc/autoconf.texi (Setting Output Variables): The marker
`|#_!!_#|' can appear in the substituted files again.
* NEWS: Update.
--- NEWS 2006-11-20 18:42:44.000000000 +0100
+++ NEWS 2006-11-21 18:52:53.000000000 +0100
@@ -1,5 +1,8 @@
* Major changes in Autoconf 2.61a (??)
+** config.status now uses awk for substitutions, for improved scaling
+ with the number of substituted variables.
+
* Major changes in Autoconf 2.61 (2006-11-17)
** New macros AC_C_FLEXIBLE_ARRAY_MEMBER, AC_C_VARARRAYS.
--- doc/autoconf.texi 2006-11-17 18:49:39.000000000 +0100
+++ doc/autoconf.texi 2006-11-21 18:57:28.000000000 +0100
@@ -8351,9 +8351,7 @@
is called. The value can contain newlines.
The substituted value is not rescanned for more output variables;
occurrences of @samp{@@@var{variable}@@} in the value are inserted
-literally into the output file. (The algorithm uses the special marker
-@samp{|#_!!_#|} internally, so the substituted value cannot contain
-@samp{|#_!!_#|}.)
+literally into the output file.
If @var{value} is given, in addition assign it to @var{variable}.
--- lib/autoconf/status.m4 2006-11-20 18:42:44.000000000 +0100
+++ lib/autoconf/status.m4 2006-11-21 18:41:50.000000000 +0100
@@ -311,6 +311,16 @@
[99])
+# _AC_AWK_LITERAL_LIMIT
+# ---------------------
+# Evaluate the maximum number of characters to put in an awk
+# string literal, not counting escape characters.
+#
+# Some awk's have small limits, such as Solaris and AIX awk.
+m4_define([_AC_AWK_LITERAL_LIMIT],
+[148])
+
+
# _AC_OUTPUT_FILES_PREPARE
# ------------------------
# Create the sed scripts needed for CONFIG_FILES.
@@ -319,7 +329,7 @@
# The intention is to have readable config.status and configure, even
# though this m4 code might be scaring.
#
-# This code was written by Dan Manthey.
+# This code was written by Dan Manthey and rewritten by Ralf Wildenhues.
#
# This macro is expanded inside a here document. If the here document is
# closed, it has to be reopened with "cat >>$CONFIG_STATUS <<\_ACEOF".
@@ -328,81 +338,42 @@
[#
# Set up the sed scripts for CONFIG_FILES section.
#
-dnl ... and define _AC_SED_CMDS, the pipeline which executes them.
-m4_define([_AC_SED_CMDS], [])dnl
+dnl ... and define _AC_SUBST_CMDS, the pipeline which executes them.
+m4_define([_AC_SUBST_CMDS], [| awk -f "$tmp/subs.awk" ])dnl
# No need to generate the scripts if there are no CONFIG_FILES.
# This happens for instance when ./config.status config.h
if test -n "$CONFIG_FILES"; then
+echo 'BEGIN {' >"$tmp/subs.awk"
_ACEOF
-m4_pushdef([_AC_SED_FRAG_NUM], 0)dnl Fragment number.
-m4_pushdef([_AC_SED_CMD_NUM], 2)dnl Num of commands in current frag so far.
-m4_pushdef([_AC_SED_DELIM_NUM], 0)dnl Expected number of delimiters in file.
-m4_pushdef([_AC_SED_FRAG], [])dnl The constant part of the current fragment.
-dnl
m4_ifdef([_AC_SUBST_FILES],
-[# Create sed commands to just substitute file output variables.
-
-m4_foreach_w([_AC_Var], m4_defn([_AC_SUBST_FILES]),
-[dnl End fragments at beginning of loop so that last fragment is not ended.
-m4_if(m4_eval(_AC_SED_CMD_NUM + 3 > _AC_SED_CMD_LIMIT), 1,
-[dnl Fragment is full and not the last one, so no need for the final un-escape.
-dnl Increment fragment number.
-m4_define([_AC_SED_FRAG_NUM], m4_incr(_AC_SED_FRAG_NUM))dnl
-dnl Record that this fragment will need to be used.
-m4_define([_AC_SED_CMDS],
- m4_defn([_AC_SED_CMDS])[| sed -f "$tmp/subs-]_AC_SED_FRAG_NUM[.sed" ])dnl
-[cat >>$CONFIG_STATUS <<_ACEOF
-cat >"\$tmp/subs-]_AC_SED_FRAG_NUM[.sed" <<\CEOF
-/@[a-zA-Z_][a-zA-Z_0-9]*@/!b
-]m4_defn([_AC_SED_FRAG])dnl
-[CEOF
-
-_ACEOF
-]m4_define([_AC_SED_CMD_NUM], 2)m4_define([_AC_SED_FRAG])dnl
-])dnl Last fragment ended.
-m4_define([_AC_SED_CMD_NUM], m4_eval(_AC_SED_CMD_NUM + 3))dnl
-m4_define([_AC_SED_FRAG],
-m4_defn([_AC_SED_FRAG])dnl
-[/^[ address@hidden@[ ]*$/{
-r $]_AC_Var[
-d
-}
-])dnl
+[# Create commands to substitute file output variables.
+
+{
+ echo "cat >>$CONFIG_STATUS <<_ACEOF"
+ echo 'cat >>"\$tmp/subs.awk" <<\CEOF'
+ echo "$ac_subst_files" | sed 's/.*/F@<:@"&"@:>@ = "$&"/'
+ echo "CEOF"
+ echo "_ACEOF"
+} >conf$$files.sh
+. ./conf$$files.sh
+rm -f conf$$files.sh
])dnl
-# Remaining file output variables are in a fragment that also has non-file
-# output varibles.
-])
-dnl
-m4_define([_AC_SED_FRAG], [
-]m4_defn([_AC_SED_FRAG]))dnl
-m4_foreach_w([_AC_Var],
-m4_ifdef([_AC_SUBST_VARS], [m4_defn([_AC_SUBST_VARS]) ])address@hidden@],
-[m4_if(_AC_SED_DELIM_NUM, 0,
-[m4_if(_AC_Var, address@hidden@],
-[dnl The whole of the last fragment would be the final deletion of `|#_!!_#|'.
-m4_define([_AC_SED_CMDS], m4_defn([_AC_SED_CMDS])[| sed 's/|#_!!_#|//g' ])],
-[
-ac_delim='%!_!# '
-for ac_last_try in false false false false false :; do
- cat >conf$$subs.sed <<_ACEOF
-])])dnl
-m4_if(_AC_Var, address@hidden@],
- [m4_if(m4_eval(_AC_SED_CMD_NUM + 2 <= _AC_SED_CMD_LIMIT), 1,
- [m4_define([_AC_SED_FRAG], [ end]m4_defn([_AC_SED_FRAG]))])],
-[m4_define([_AC_SED_CMD_NUM], m4_incr(_AC_SED_CMD_NUM))dnl
-m4_define([_AC_SED_DELIM_NUM], m4_incr(_AC_SED_DELIM_NUM))dnl
-_AC_Var!$_AC_Var$ac_delim
-])dnl
-m4_if(_AC_SED_CMD_LIMIT,
- m4_if(_AC_Var, address@hidden@], m4_if(_AC_SED_CMD_NUM, 2, 2, _AC_SED_CMD_LIMIT), _AC_SED_CMD_NUM),
-[_ACEOF
-
-dnl Do not use grep on conf$$subs.sed, since AIX grep has a line length limit.
- if test `sed -n "s/.*$ac_delim\$/X/p" conf$$subs.sed | grep -c X` = _AC_SED_DELIM_NUM; then
+{
+ echo "cat >conf$$subs.awk <<_ACEOF"
+ echo "$ac_subst_vars" | sed 's/.*/&!$&$ac_delim/'
+ echo "_ACEOF"
+} >conf$$subs.sh
+ac_delim_num=`echo "$ac_subst_vars" | grep -c '$'`
+ac_delim='%!_!# '
+for ac_last_try in false false false false false :; do
+ . ./conf$$subs.sh
+
+dnl Do not use grep on conf$$subs.awk, since AIX grep has a line length limit.
+ if test `sed -n "s/.*$ac_delim\$/X/p" conf$$subs.awk | grep -c X` = $ac_delim_num; then
break
elif $ac_last_try; then
AC_MSG_ERROR([could not make $CONFIG_STATUS])
@@ -410,51 +381,92 @@
ac_delim="$ac_delim!$ac_delim _$ac_delim!! "
fi
done
+rm -f conf$$subs.sh
dnl Similarly, avoid grep here too.
-ac_eof=`sed -n '/^CEOF[[0-9]]*$/s/CEOF/0/p' conf$$subs.sed`
+ac_eof=`sed -n '/^CEOF[[0-9]]*$/s/CEOF/0/p' conf$$subs.awk`
if test -n "$ac_eof"; then
ac_eof=`echo "$ac_eof" | sort -nru | sed 1q`
ac_eof=`expr $ac_eof + 1`
fi
-
-dnl Increment fragment number.
-m4_define([_AC_SED_FRAG_NUM], m4_incr(_AC_SED_FRAG_NUM))dnl
-dnl Record that this fragment will need to be used.
-m4_define([_AC_SED_CMDS],
-m4_defn([_AC_SED_CMDS])[| sed -f "$tmp/subs-]_AC_SED_FRAG_NUM[.sed" ])dnl
-[cat >>$CONFIG_STATUS <<_ACEOF
-cat >"\$tmp/subs-]_AC_SED_FRAG_NUM[.sed" <<\CEOF$ac_eof
-/@[a-zA-Z_][a-zA-Z_0-9]*@/!b]m4_defn([_AC_SED_FRAG])dnl
-[_ACEOF
-sed '
-s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g
-s/^/s,@/; s/!/@,|#_!!_#|/
-:n
-t n
-s/'"$ac_delim"'$/,g/; t
-s/$/\\/; p
-N; s/^.*\n//; s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g; b n
-' >>$CONFIG_STATUS <conf$$subs.sed
-rm -f conf$$subs.sed
-cat >>$CONFIG_STATUS <<_ACEOF
-]m4_if(_AC_Var, address@hidden@],
-[m4_if(m4_eval(_AC_SED_CMD_NUM + 2 > _AC_SED_CMD_LIMIT), 1,
-[m4_define([_AC_SED_CMDS], m4_defn([_AC_SED_CMDS])[| sed 's/|#_!!_#|//g' ])],
-[[:end
-s/|#_!!_#|//g
-]])])dnl
-CEOF$ac_eof
-_ACEOF
-m4_define([_AC_SED_FRAG], [
-])m4_define([_AC_SED_DELIM_NUM], 0)m4_define([_AC_SED_CMD_NUM], 2)dnl
-
-])])dnl
-dnl
-m4_popdef([_AC_SED_FRAG_NUM])dnl
-m4_popdef([_AC_SED_CMD_NUM])dnl
-m4_popdef([_AC_SED_DELIM_NUM])dnl
-m4_popdef([_AC_SED_FRAG])dnl
+dnl Initialize an awk array of substitutions, keyed by variable name.
+dnl
+dnl First read a whole (potentially multi-line) substitution,
+dnl and construct `S["VAR"] ='. Then split it into pieces that fit
+dnl in an awk literal. Each piece then gets active characters escaped:
+dnl (if we escape earlier we risk splitting inside an escape sequence).
+dnl Output as separate string literals, joined with backslash-newline.
+dnl Eliminate the newline after `=' in a second script, for readability.
+dnl
+dnl Notes to the main part of the awk script:
+dnl - the unusual FS value helps to avoid the limit of 99 fields,
+dnl - the space in `$ 0' avoids expansion by m4,
+dnl - we avoid sub/gsub because of the \& quoting issues, see
+dnl http://www.gnu.org/software/gawk/manual/html_node/Gory-Details.html
+dnl
+dnl m4-double-quote most of the scripting for readability.
+[cat >>$CONFIG_STATUS <<_ACEOF
+cat >>"\$tmp/subs.awk" <<\CEOF$ac_eof
+_ACEOF
+sed '
+t line
+:line
+s/'"$ac_delim"'$//; t gotline
+N; b line
+:gotline
+h
+s/^/S["/; s/!.*/"] = /; p
+g
+s/^.*!//
+:more
+t more
+h
+s/\(.\{]_AC_AWK_LITERAL_LIMIT[\}\).*/\1/
+t notlast
+s/["\\]/\\&/g; s/\n/\\n/g
+s/^/"/; s/$/"/
+b
+:notlast
+s/["\\]/\\&/g; s/\n/\\n/g
+s/^/"/; s/$/"\\/
+p
+g
+s/.\{]_AC_AWK_LITERAL_LIMIT[\}//
+b more
+' <conf$$subs.awk | sed '
+/^[^"]/{
+ N
+ s/\n//
+}
+' >>$CONFIG_STATUS
+rm -f conf$$subs.awk
+cat >>$CONFIG_STATUS <<_ACEOF
+ FS = "[|]#_!!_#[|]"
+}
+{
+ nfields = split($ 0, field, "@")
+ len = length(field[1])
+ for (i = 2; i < nfields; i++) {
+ key = field[i]
+ keylen = length(key)
+ if (key in S) {
+ $ 0 = substr($ 0, 1, len) "" S[key] "" substr($ 0, len + keylen + 3)
+ len += length(S[key]) + length(field[++i])
+ } else {
+ len += 1 + keylen
+ if (key in F && $ 0 ~ "^[ ]*@" key "@[ ]*$") {
+ while ((getline aline < (F[key])) > 0)
+ print(aline)
+ close(F[key])
+ next
+ }
+ }
+ }
+ print
+}
+CEOF$ac_eof
+_ACEOF
+]dnl end of double-quoted part
# VPATH may cause trouble with some makes, so we remove $(srcdir),
# ${srcdir} and @srcdir@ from VPATH if srcdir is ".", strip leading and
@@ -554,7 +566,7 @@
])dnl
m4_ifndef([AC_DATAROOTDIR_CHECKED], [$ac_datarootdir_hack
])dnl
-" $ac_file_inputs m4_defn([_AC_SED_CMDS])>$tmp/out
+" $ac_file_inputs m4_defn([_AC_SUBST_CMDS])>$tmp/out
m4_ifndef([AC_DATAROOTDIR_CHECKED],
[test -z "$ac_datarootdir_hack$ac_datarootdir_seen" &&
--- tests/torture.at 2006-11-08 18:41:56.000000000 +0100
+++ tests/torture.at 2006-11-21 18:38:48.000000000 +0100
@@ -539,18 +539,26 @@
# Solaris 9 /usr/ucb/sed that rejects commands longer than 4000 bytes. HP/UX
# sed dumps core around 8 KiB. However, POSIX says that sed need not
# handle lines longer than 2048 bytes (including the trailing newline).
-# So we'll just test a 2000-byte value.
+# So we'll just test a 2000-byte value, and for awk, we test a line with
+# almost 1000 words, and one variable with 4 lines of 500 bytes each.
AT_SETUP([Substitute a 2000-byte string])
AT_DATA([Foo.in], address@hidden@
])
+AT_DATA([Bar.in], address@hidden@
+])
+AT_DATA([Baz.in], address@hidden@
+])
AT_DATA([configure.ac],
[[AC_INIT
AC_CONFIG_AUX_DIR($top_srcdir/build-aux)
AC_SUBST([foo], ]m4_for([n], 1, 100,, ....................)[)
-AC_CONFIG_FILES([Foo])
+AC_SUBST([bar], "]m4_for([n], 1, 100,, @ @ @ @ @ @ @ @ @ @@)[")
+AC_SUBST([baz], "]m4_for([n], 1, 4,, m4_for([m], 1, 25,, ... ... ... ... ....)
+)[")
+AC_CONFIG_FILES([Foo Bar Baz])
AC_OUTPUT
]])
@@ -558,6 +566,11 @@
AT_CHECK_CONFIGURE
AT_CHECK([cat Foo], 0, m4_for([n], 1, 100,, ....................)
)
+AT_CHECK([cat Bar], 0, m4_for([n], 1, 100,, @ @ @ @ @ @ @ @ @ @@)
+)
+AT_CHECK([cat Baz], 0, m4_for([n], 1, 4,, m4_for([m], 1, 25,, ... ... ... ... ....)
+)
+)
AT_CLEANUP
@@ -584,25 +597,57 @@
## Substitute and define special characters. ##
## ------------------------------------------ ##
-# Use characters special to the shell, sed, and M4.
+# Use characters special to the shell, sed, awk, and M4.
AT_SETUP([Substitute and define special characters])
AT_DATA([Foo.in], address@hidden@
address@hidden@@notsubsted@@baz@ stray @ and more@@@baz@
address@hidden@address@hidden
address@hidden@@address@hidden
address@hidden@@address@hidden@
address@hidden @address@hidden
address@hidden @address@hidden@
address@hidden @baz@@baz@
address@hidden@
+ @file@
address@hidden@
address@hidden@X
+])
+
+AT_DATA([File],
address@hidden@@bar@
])
AT_CONFIGURE_AC(
-[[foo="AS@&address@hidden([[X*'[]+ ", `\($foo]])"
+[[foo="AS@&address@hidden([[X*'[]+ ",& &`\($foo \& \\& \\\& \\\\& \ \\ \\\]])"
+bar="@foo@ @baz@"
+baz=bla
AC_SUBST([foo])
-AC_DEFINE([foo], [[X*'[]+ ", `\($foo]], [Awful value.])
+AC_SUBST([bar])
+AC_SUBST([baz])
+file=File
+AC_SUBST_FILE([file])
+AC_DEFINE([foo], [[X*'[]+ ",& &`\($foo]], [Awful value.])
AC_CONFIG_FILES([Foo])]])
AT_CHECK_AUTOCONF
AT_CHECK_AUTOHEADER
AT_CHECK_CONFIGURE
-AT_CHECK([cat Foo], 0, [[X*'[]+ ", `\($foo
-]])
-AT_CHECK_DEFINES([[#define foo X*'[]+ ", `\($foo
+AT_CHECK([cat Foo], 0, [[X*'[]+ ",& &`\($foo \& \\& \\\& \\\\& \ \\ \\\
address@hidden@ @baz@@address@hidden stray @ and more@@bla
address@hidden@ @address@hidden@baz
address@hidden@ @address@hidden
address@hidden@ @address@hidden@
address@hidden blabaz
address@hidden blabaz@
address@hidden blabla
address@hidden@@bar@
address@hidden@@bar@
address@hidden@
address@hidden@X
+]])
+AT_CHECK_DEFINES([[#define foo X*'[]+ ",& &`\($foo
]])
AT_CLEANUP