Re: config files substitution with awk
From: Ralf Wildenhues
Subject: Re: config files substitution with awk
Date: Wed, 6 Dec 2006 14:38:42 +0100
User-agent: Mutt/1.5.13 (2006-11-01)
* Pascal Bourguignon wrote on Wed, Dec 06, 2006 at 11:00:51AM CET:
> Ralf Wildenhues <address@hidden> writes:
> >
> > s/@var1@/@|#_!!_#|var2@/g
> > s/@var2@/text2/g
> > ...
> > s/|#_!!_#|//g
> Yes. I was pointing to the semantics of sed, not to the restricted
> usage autoconf needs. With this later s/|#_!!_#//g we still have the
> problem, if we must use a generic sed. IMO, it would be better to use
> a specific tool, since it could be easily implemented in
> O(length(input-file)), and wouldn't even need to implement
> sophisticated DFA at all (given the @..@ convention).
I'm not sure if we're talking past each other or simply in violent
agreement. My reasoning is as follows:
- problem: Autoconf-generated config.status scripts are slow for large
packages.
- analysis: it uses a suboptimal sed-based algorithm for substitution.
- Any solution to the problem must be extremely portable, so it should
adhere to POSIX, the GNU Coding Standards, and also take into account
further known limitations of real-world systems (the Autoconf manual
has a guide for portability issues).
- awk is portable and available everywhere, and it allows for a better
algorithm: we can exploit the hashing that is used in array index
lookup.
- According to Paul, it's ok to assume (ancient V7) awk.
- result: use portable awk to accomplish the same task.
Fix the GNU Coding Standards to allow awk, so we comply.
- outlook: more modern awk could allow for an algorithm with even
better asymptotic scaling, as outlined in [1]. But for real-world
configure scripts, this step doesn't seem necessary yet.
So are you now saying this job can be done even better, without
resorting to awk? If yes, details please?
Note that the current code doesn't use a regex engine _at all_. It
simply splits the input on `@' characters. Each substring between
two such characters is then used for index lookup in an awk array.
Depending on the index set I of the array and the quality of the awk
implementation, each lookup typically costs either O(log |I|) or constant time.
The latter would correspond to your O(length(input-file)). In practice,
the difference is largely lost in the noise.
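
A minimal sketch of the split-and-lookup idea described above. This is not
the code Autoconf generates: the function name subst, the arrays S and
is_set, and the two table entries are assumptions made up for
illustration. The separate is_set array avoids relying on the `in'
operator, which older awks may lack.

```shell
# Hypothetical helper: reads text on stdin and replaces @name@ tokens.
# The substitution table below is made up for illustration.
subst() {
  awk '
  BEGIN {
    S["CC"] = "gcc";            is_set["CC"] = 1
    S["prefix"] = "/usr/local"; is_set["prefix"] = 1
  }
  {
    # Split the line on "@"; f[2], f[3], ... are the candidate names.
    n = split($0, f, "@")
    out = f[1]
    i = 2
    while (i <= n) {
      if (i < n && is_set[f[i]]) {
        # f[i] sits between two "@" and is a known name: emit its value,
        # then the literal text that follows the closing "@".
        out = out S[f[i]] f[i + 1]
        i += 2
      } else {
        # Not a known name: put the "@" back and keep the text as is.
        out = out "@" f[i]
        i++
      }
    }
    print out
  }'
}

printf '%s\n' 'cc=@CC@ in @prefix@/bin, mail user@host' | subst
```

The last line prints `cc=gcc in /usr/local/bin, mail user@host': the two
known names are replaced with one array lookup each, and the lone `@' in
user@host is passed through untouched. No regex matching is involved.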
But maybe you're just misreading here that Autoconf falls back on
generic sed: it does not. That was merely a suggestion of mine
for the bootstrapping of the one specific package named GNU gawk.
Paul argued that this is not necessary. So I'm done.
Paolo already addressed the proposal of adding optimization to GNU sed.
Cheers,
Ralf
[1] http://lists.gnu.org/archive/html/autoconf-patches/2006-11/msg00049.html