Re: config files substitution with awk
From: Ralf Wildenhues
Subject: Re: config files substitution with awk
Date: Mon, 20 Nov 2006 21:22:49 +0100
User-agent: Mutt/1.5.13 (2006-08-11)
Hello Paul,
Thanks for the review.
* Paul Eggert wrote on Mon, Nov 20, 2006 at 07:03:18PM CET:
> Ralf Wildenhues <address@hidden> writes:
>
> > I did not see an easy way to write it portably to ancient awk[...]
>
> What difficulties do you see with ancient awk?
Not being very experienced with it, mostly.
> For example, this non-ancient loop
[...]
> can easily be written in ancient awk using something like this:
Ah, good.
> This is arguably even more readable when written in the ancient style
> (though I admit I don't know what that 'skip' is doing there in the
> original :-).
Yeah, blame it on lack of concentration. (The original idea was to have
a loop to allow both recursive and nonrecursive substitution, without
needing a marker. But we don't need to pursue that.)
> > One drawback for AC_SUBST_FILE currently present causes a noticeable
> > regression due to the fact that awk's system function is used for each
> > such substitution.
>
> Why can't we use a repeated getline/print loop here?
Because I was scared of:
> > The autoconf.texi note leaves me uncertain what we
> > can portably expect from awk's getline.
I've used such a loop now.
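For illustration, a minimal standalone sketch of such a getline/print loop, separate from the patch. The file names `frag.txt` and `in.txt` and the marker `@frag@` are made up for this example, and it assumes an awk that supports `-v` and a system that has `mktemp -d`:

```shell
# Hypothetical demo files; none of these names come from the patch.
tmpdir=$(mktemp -d) || exit 1
printf 'line one\nline two\n' >"$tmpdir/frag.txt"
printf 'before\n@frag@\nafter\n' >"$tmpdir/in.txt"

# Replace the marker line with the contents of the fragment file,
# reading it with getline instead of spawning a process via system().
out=$(awk -v frag="$tmpdir/frag.txt" '
/^@frag@$/ {
  while ((getline aline < frag) > 0)
    print aline
  close(frag)   # allow the same file to be read again for a later marker
  next
}
{ print }
' "$tmpdir/in.txt")

echo "$out"
rm -rf "$tmpdir"
```

The `> 0` comparison matters: getline returns -1 on error, so a bare `while (getline aline < frag)` would loop forever on an unreadable file.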
> (Maybe we should deprecate AC_SUBST_FILE?....)
I don't see a good reason for that. I think it can be useful, given the
size restrictions on the value of AC_SUBST that are still in place.
> > Is it necessary to 'chmod +x' a file before sourcing it ('. ./file')?
>
> No.
OK. Updated patch below, not requiring AC_PROG_AWK.
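A quick way to check this, as a sketch only (the file name `frag.sh` and the variable `sourced_value` are invented for the example, and `mktemp -d` is assumed to be available):

```shell
# Create a fragment, make sure it is not executable, and source it anyway.
tmpdir=$(mktemp -d) || exit 1
echo 'sourced_value=ok' >"$tmpdir/frag.sh"
chmod a-x "$tmpdir/frag.sh"

# The dot command only needs to read the file; no execute bit is required.
. "$tmpdir/frag.sh"
echo "$sourced_value"
rm -rf "$tmpdir"
```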
If I read 'gawk.info(Gory Details)' correctly, then we do need a test
for the runs-of-backslashes-before-ampersand escaping rules, in order
to add the right number of backslashes. :-/
I've amended the testsuite, but I still need to think more about the
code.
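To make the issue concrete, here is a small sketch of the simplest case only, which I would expect to behave the same in the usual awk implementations. The input string `x=@v@` is made up. Longer runs of backslashes before the ampersand are exactly where implementations may differ, so this example deliberately does not cover them:

```shell
# An unescaped "&" in the replacement stands for the matched text.
plain=$(echo 'x=@v@' | awk '{ sub("@v@", "[&]"); print }')

# "\\&" in the awk string literal is the two characters \& at run time,
# which sub() turns into a literal ampersand.
literal=$(echo 'x=@v@' | awk '{ sub("@v@", "[\\&]"); print }')

echo "$plain"     # x=[@v@]
echo "$literal"   # x=[&]
```

Note the use of `$(...)` rather than backquotes: inside backquotes the shell itself would consume one level of backslashes before awk ever saw the script.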
Cheers,
Ralf
2006-11-20 Ralf Wildenhues <address@hidden>
Rewrite config files generation: replace quadratic growth in
the number of substituted variables with loglinear growth by
using awk instead of sed for the bulk of the substitutions.
* lib/autoconf/status.m4 (_AC_AWK_LITERAL_LIMIT): New macro.
(_AC_OUTPUT_FILES_PREPARE): Instead of several sed scripts,
generate just one large awk script for substitutions,
eliminating much of the earlier complexity, while adding some
new complexity. Only expand the substitution templates at
configure time, for smaller configure script size.
(_AC_SUBST_CMDS): Renamed from...
(_AC_SED_CMDS): ...this.
(_AC_DELIM_NUM): Renamed from...
(_AC_SED_DELIM_NUM): ...this.
(_AC_SED_CMD_NUM, _AC_SED_FRAG, _AC_SED_FRAG_NUM): Removed.
(_AC_OUTPUT_FILE): Use _AC_SUBST_CMDS.
* tests/torture.at (Substitute a 2000-byte string): Also
substitute a line with 1000 words, and a variable with several
long lines.
(Substitute and define special characters): Also substitute
ampersands, and put substitution input strings address@hidden@' in the
output, to test that no recursion happens.
* NEWS: Update.
--- NEWS 17 Nov 2006 20:01:04 -0000 1.413
+++ NEWS 20 Nov 2006 19:45:01 -0000
@@ -1,5 +1,8 @@
* Major changes in Autoconf 2.61a (??)
+** config.status now uses awk for substitutions, for improved scaling
+ with the number of substituted variables.
+
* Major changes in Autoconf 2.61 (2006-11-17)
** New macros AC_C_FLEXIBLE_ARRAY_MEMBER, AC_C_VARARRAYS.
--- lib/autoconf/status.m4 2006-11-18 04:04:15.000000000 +0100
+++ lib/autoconf/status.m4 2006-11-20 20:44:17.000000000 +0100
@@ -311,6 +311,16 @@
[99])
+# _AC_AWK_LITERAL_LIMIT
+# ---------------------
+# Evaluate the maximum number of characters to put in an awk
+# string literal, not counting escape characters.
+#
+# Some awk's have small limits, such as Solaris and AIX awk.
+m4_define([_AC_AWK_LITERAL_LIMIT],
+[148])
+
+
# _AC_OUTPUT_FILES_PREPARE
# ------------------------
# Create the sed scripts needed for CONFIG_FILES.
@@ -319,7 +329,7 @@
# The intention is to have readable config.status and configure, even
# though this m4 code might be scaring.
#
-# This code was written by Dan Manthey.
+# This code was written by Dan Manthey and rewritten by Ralf Wildenhues.
#
# This macro is expanded inside a here document. If the here document is
# closed, it has to be reopened with "cat >>$CONFIG_STATUS <<\_ACEOF".
@@ -328,81 +338,42 @@
[#
# Set up the sed scripts for CONFIG_FILES section.
#
-dnl ... and define _AC_SED_CMDS, the pipeline which executes them.
-m4_define([_AC_SED_CMDS], [])dnl
+dnl ... and define _AC_SUBST_CMDS, the pipeline which executes them.
+m4_define([_AC_SUBST_CMDS], [| awk -f "$tmp/subs.awk" ])dnl
# No need to generate the scripts if there are no CONFIG_FILES.
# This happens for instance when ./config.status config.h
if test -n "$CONFIG_FILES"; then
+echo 'BEGIN {' >"$tmp/subs.awk"
_ACEOF
-m4_pushdef([_AC_SED_FRAG_NUM], 0)dnl Fragment number.
-m4_pushdef([_AC_SED_CMD_NUM], 2)dnl Num of commands in current frag so far.
-m4_pushdef([_AC_SED_DELIM_NUM], 0)dnl Expected number of delimiters in file.
-m4_pushdef([_AC_SED_FRAG], [])dnl The constant part of the current fragment.
-dnl
m4_ifdef([_AC_SUBST_FILES],
-[# Create sed commands to just substitute file output variables.
-
-m4_foreach_w([_AC_Var], m4_defn([_AC_SUBST_FILES]),
-[dnl End fragments at beginning of loop so that last fragment is not ended.
-m4_if(m4_eval(_AC_SED_CMD_NUM + 3 > _AC_SED_CMD_LIMIT), 1,
-[dnl Fragment is full and not the last one, so no need for the final un-escape.
-dnl Increment fragment number.
-m4_define([_AC_SED_FRAG_NUM], m4_incr(_AC_SED_FRAG_NUM))dnl
-dnl Record that this fragment will need to be used.
-m4_define([_AC_SED_CMDS],
- m4_defn([_AC_SED_CMDS])[| sed -f "$tmp/subs-]_AC_SED_FRAG_NUM[.sed" ])dnl
-[cat >>$CONFIG_STATUS <<_ACEOF
-cat >"\$tmp/subs-]_AC_SED_FRAG_NUM[.sed" <<\CEOF
-/@[a-zA-Z_][a-zA-Z_0-9]*@/!b
-]m4_defn([_AC_SED_FRAG])dnl
-[CEOF
-
-_ACEOF
-]m4_define([_AC_SED_CMD_NUM], 2)m4_define([_AC_SED_FRAG])dnl
-])dnl Last fragment ended.
-m4_define([_AC_SED_CMD_NUM], m4_eval(_AC_SED_CMD_NUM + 3))dnl
-m4_define([_AC_SED_FRAG],
-m4_defn([_AC_SED_FRAG])dnl
-[/^[ address@hidden@[ ]*$/{
-r $]_AC_Var[
-d
-}
-])dnl
+[# Create commands to substitute file output variables.
+
+{
+ echo "cat >>$CONFIG_STATUS <<_ACEOF"
+ echo 'cat >>"\$tmp/subs.awk" <<\CEOF'
+ echo "$ac_subst_files" | sed 's/.*/F@<:@"&"@:>@ = "$&"/'
+ echo "CEOF"
+ echo "_ACEOF"
+} >conf$$files.sh
+. ./conf$$files.sh
+rm -f conf$$files.sh
])dnl
-# Remaining file output variables are in a fragment that also has non-file
-# output varibles.
-])
-dnl
-m4_define([_AC_SED_FRAG], [
-]m4_defn([_AC_SED_FRAG]))dnl
-m4_foreach_w([_AC_Var],
-m4_ifdef([_AC_SUBST_VARS], [m4_defn([_AC_SUBST_VARS]) ])address@hidden@],
-[m4_if(_AC_SED_DELIM_NUM, 0,
-[m4_if(_AC_Var, address@hidden@],
-[dnl The whole of the last fragment would be the final deletion of `|#_!!_#|'.
-m4_define([_AC_SED_CMDS], m4_defn([_AC_SED_CMDS])[| sed 's/|#_!!_#|//g' ])],
-[
-ac_delim='%!_!# '
-for ac_last_try in false false false false false :; do
- cat >conf$$subs.sed <<_ACEOF
-])])dnl
-m4_if(_AC_Var, address@hidden@],
- [m4_if(m4_eval(_AC_SED_CMD_NUM + 2 <= _AC_SED_CMD_LIMIT), 1,
- [m4_define([_AC_SED_FRAG], [ end]m4_defn([_AC_SED_FRAG]))])],
-[m4_define([_AC_SED_CMD_NUM], m4_incr(_AC_SED_CMD_NUM))dnl
-m4_define([_AC_SED_DELIM_NUM], m4_incr(_AC_SED_DELIM_NUM))dnl
-_AC_Var!$_AC_Var$ac_delim
-])dnl
-m4_if(_AC_SED_CMD_LIMIT,
- m4_if(_AC_Var, address@hidden@], m4_if(_AC_SED_CMD_NUM, 2, 2, _AC_SED_CMD_LIMIT), _AC_SED_CMD_NUM),
-[_ACEOF
-
-dnl Do not use grep on conf$$subs.sed, since AIX grep has a line length limit.
- if test `sed -n "s/.*$ac_delim\$/X/p" conf$$subs.sed | grep -c X` = _AC_SED_DELIM_NUM; then
+{
+ echo "cat >conf$$subs.awk <<_ACEOF"
+ echo "$ac_subst_vars" | sed 's/.*/&!$&$ac_delim/'
+ echo "_ACEOF"
+} >conf$$subs.sh
+ac_delim_num=`echo "$ac_subst_vars" | grep -c '$'`
+ac_delim='%!_!# '
+for ac_last_try in false false false false false :; do
+ . ./conf$$subs.sh
+
+dnl Do not use grep on conf$$subs.awk, since AIX grep has a line length limit.
+ if test `sed -n "s/.*$ac_delim\$/X/p" conf$$subs.awk | grep -c X` = $ac_delim_num; then
break
elif $ac_last_try; then
AC_MSG_ERROR([could not make $CONFIG_STATUS])
@@ -410,51 +381,89 @@
ac_delim="$ac_delim!$ac_delim _$ac_delim!! "
fi
done
+rm -f conf$$subs.sh
dnl Similarly, avoid grep here too.
-ac_eof=`sed -n '/^CEOF[[0-9]]*$/s/CEOF/0/p' conf$$subs.sed`
+ac_eof=`sed -n '/^CEOF[[0-9]]*$/s/CEOF/0/p' conf$$subs.awk`
if test -n "$ac_eof"; then
ac_eof=`echo "$ac_eof" | sort -nru | sed 1q`
ac_eof=`expr $ac_eof + 1`
fi
-
-dnl Increment fragment number.
-m4_define([_AC_SED_FRAG_NUM], m4_incr(_AC_SED_FRAG_NUM))dnl
-dnl Record that this fragment will need to be used.
-m4_define([_AC_SED_CMDS],
-m4_defn([_AC_SED_CMDS])[| sed -f "$tmp/subs-]_AC_SED_FRAG_NUM[.sed" ])dnl
-[cat >>$CONFIG_STATUS <<_ACEOF
-cat >"\$tmp/subs-]_AC_SED_FRAG_NUM[.sed" <<\CEOF$ac_eof
-/@[a-zA-Z_][a-zA-Z_0-9]*@/!b]m4_defn([_AC_SED_FRAG])dnl
-[_ACEOF
-sed '
-s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g
-s/^/s,@/; s/!/@,|#_!!_#|/
-:n
-t n
-s/'"$ac_delim"'$/,g/; t
-s/$/\\/; p
-N; s/^.*\n//; s/[,\\&]/\\&/g; s/@/@|#_!!_#|/g; b n
-' >>$CONFIG_STATUS <conf$$subs.sed
-rm -f conf$$subs.sed
-cat >>$CONFIG_STATUS <<_ACEOF
-]m4_if(_AC_Var, address@hidden@],
-[m4_if(m4_eval(_AC_SED_CMD_NUM + 2 > _AC_SED_CMD_LIMIT), 1,
-[m4_define([_AC_SED_CMDS], m4_defn([_AC_SED_CMDS])[| sed 's/|#_!!_#|//g' ])],
-[[:end
-s/|#_!!_#|//g
-]])])dnl
-CEOF$ac_eof
-_ACEOF
-m4_define([_AC_SED_FRAG], [
-])m4_define([_AC_SED_DELIM_NUM], 0)m4_define([_AC_SED_CMD_NUM], 2)dnl
-
-])])dnl
-dnl
-m4_popdef([_AC_SED_FRAG_NUM])dnl
-m4_popdef([_AC_SED_CMD_NUM])dnl
-m4_popdef([_AC_SED_DELIM_NUM])dnl
-m4_popdef([_AC_SED_FRAG])dnl
+dnl Initialize an awk array of substitutions, keyed by variable name.
+dnl
+dnl First read a whole (potentially multi-line) substitution,
+dnl and construct `S["VAR"] ='. Then, escape '@' in the value,
+dnl and split it into pieces that fit in an awk literal.
+dnl Each piece then gets active characters escaped:
+dnl " -> \"
+dnl \ -> \\
+dnl newline -> \n
+dnl & -> \\& (otherwise & will be active in awk's sub)
+dnl
+dnl (if we escape earlier we risk splitting inside an escape sequence).
+dnl Output as separate string literals, joined with backslash-newline.
+dnl Eliminate the newline after `=' in a second script, for readability.
+dnl
+dnl m4-double-quote most of the scripting for readability.
+[cat >>$CONFIG_STATUS <<_ACEOF
+cat >>"\$tmp/subs.awk" <<\CEOF$ac_eof
+_ACEOF
+sed '
+t line
+:line
+s/'"$ac_delim"'$//; t gotline
+N; b line
+:gotline
+h
+s/^/S["/; s/!.*/"] = /; p
+g
+s/^.*!//; s/@/@|#_!!_#|/g
+:more
+t more
+h
+s/\(.\{]_AC_AWK_LITERAL_LIMIT[\}\).*/\1/
+t notlast
+s/["\\]/\\&/g; s/\n/\\n/g; s/&/\\\\&/g
+s/^/"/; s/$/"/
+b
+:notlast
+s/["\\]/\\&/g; s/\n/\\n/g; s/&/\\\\&/g
+s/^/"/; s/$/"\\/
+p
+g
+s/.\{]_AC_AWK_LITERAL_LIMIT[\}//
+b more
+' <conf$$subs.awk | sed '
+/^[^"]/{
+ N
+ s/\n//
+}
+' >>$CONFIG_STATUS
+rm -f conf$$subs.awk
+cat >>$CONFIG_STATUS <<_ACEOF
+ FS = "[|]#_!!_#[|]"
+}
+/@[a-zA-Z_][a-zA-Z_0-9]*@/ {
+ nfields = split($ 0, field, "@")
+ for (i = 1; i <= nfields; i++) {
+ key = field[i]
+ if (key in S)
+ sub("@" key "@", S[key])
+ else if (key in F) {
+ while ((getline aline < F[key]) > 0)
+ print(aline)
+ close(F[key])
+ next
+ }
+ }
+}
+{
+ gsub("[|]#_!!_#[|]", "")
+ print
+}
+CEOF$ac_eof
+_ACEOF
+]dnl end of double-quoted part
# VPATH may cause trouble with some makes, so we remove $(srcdir),
# ${srcdir} and @srcdir@ from VPATH if srcdir is ".", strip leading and
@@ -554,7 +563,7 @@
])dnl
m4_ifndef([AC_DATAROOTDIR_CHECKED], [$ac_datarootdir_hack
])dnl
-" $ac_file_inputs m4_defn([_AC_SED_CMDS])>$tmp/out
+" $ac_file_inputs m4_defn([_AC_SUBST_CMDS])>$tmp/out
m4_ifndef([AC_DATAROOTDIR_CHECKED],
[test -z "$ac_datarootdir_hack$ac_datarootdir_seen" &&
--- tests/torture.at 28 Oct 2006 09:41:07 -0000 1.72
+++ tests/torture.at 20 Nov 2006 20:09:28 -0000
@@ -539,18 +539,26 @@
# Solaris 9 /usr/ucb/sed that rejects commands longer than 4000 bytes. HP/UX
# sed dumps core around 8 KiB. However, POSIX says that sed need not
# handle lines longer than 2048 bytes (including the trailing newline).
-# So we'll just test a 2000-byte value.
+# So we'll just test a 2000-byte value, and for awk, we test a line with
+# almost 1000 words, and one variable with 4 lines of 500 bytes each.
AT_SETUP([Substitute a 2000-byte string])
AT_DATA([Foo.in], address@hidden@
])
+AT_DATA([Bar.in], address@hidden@
+])
+AT_DATA([Baz.in], address@hidden@
+])
AT_DATA([configure.ac],
[[AC_INIT
AC_CONFIG_AUX_DIR($top_srcdir/build-aux)
AC_SUBST([foo], ]m4_for([n], 1, 100,, ....................)[)
-AC_CONFIG_FILES([Foo])
+AC_SUBST([bar], "]m4_for([n], 1, 100,, . . . . . . . . . ..)[")
+AC_SUBST([baz], "]m4_for([n], 1, 4,, m4_for([m], 1, 25,, ... ... ... ... ....)
+)[")
+AC_CONFIG_FILES([Foo Bar Baz])
AC_OUTPUT
]])
@@ -558,6 +566,11 @@
AT_CHECK_CONFIGURE
AT_CHECK([cat Foo], 0, m4_for([n], 1, 100,, ....................)
)
+AT_CHECK([cat Bar], 0, m4_for([n], 1, 100,, . . . . . . . . . ..)
+)
+AT_CHECK([cat Baz], 0, m4_for([n], 1, 4,, m4_for([m], 1, 25,, ... ... ... ... ....)
+)
+)
AT_CLEANUP
@@ -589,20 +602,26 @@
AT_SETUP([Substitute and define special characters])
AT_DATA([Foo.in], address@hidden@
address@hidden@@notsubsted@@baz@ stray @ and more@@@baz@
])
AT_CONFIGURE_AC(
-[[foo="AS@&address@hidden([[X*'[]+ ", `\($foo]])"
+[[foo="AS@&address@hidden([[X*'[]+ ",& &`\($foo \& \\& \\\& \\\\&]])"
+bar="@foo@ @baz@"
+baz=bla
AC_SUBST([foo])
-AC_DEFINE([foo], [[X*'[]+ ", `\($foo]], [Awful value.])
+AC_SUBST([bar])
+AC_SUBST([baz])
+AC_DEFINE([foo], [[X*'[]+ ",& &`\($foo]], [Awful value.])
AC_CONFIG_FILES([Foo])]])
AT_CHECK_AUTOCONF
AT_CHECK_AUTOHEADER
AT_CHECK_CONFIGURE
-AT_CHECK([cat Foo], 0, [[X*'[]+ ", `\($foo
+AT_CHECK([cat Foo], 0, [[X*'[]+ ",& &`\($foo \& \\& \\\& \\\\&
address@hidden@ @baz@@address@hidden stray @ and more@@bla
]])
-AT_CHECK_DEFINES([[#define foo X*'[]+ ", `\($foo
+AT_CHECK_DEFINES([[#define foo X*'[]+ ",& &`\($foo
]])
AT_CLEANUP
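
The core idea of the generated awk script (one array lookup per `@`-separated field instead of one sed command per variable) can be tried in isolation. The following is a simplified sketch, not the script that config.status generates: the table `S` is filled by hand with a made-up value, and the `|#_!!_#|` marker handling and the `F` file table are left out.

```shell
out=$(printf 'prefix=@prefix@\nkeep @unknown@ as is\n' | awk '
BEGIN {
  # Hypothetical value; the real table is written at configure time.
  S["prefix"] = "/usr/local"
}
/@[a-zA-Z_][a-zA-Z_0-9]*@/ {
  # Every candidate variable name is one of the fields between "@" signs.
  nfields = split($0, field, "@")
  for (i = 1; i <= nfields; i++) {
    key = field[i]
    if (key in S)
      sub("@" key "@", S[key])
  }
}
{ print }
')

echo "$out"
```

The first input line becomes `prefix=/usr/local`, while `@unknown@` is left alone because it has no entry in `S`. The cost per input line depends on the number of fields in that line rather than on the total number of substituted variables.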