bug-m4
Re: format bug


From: Eric Blake
Subject: Re: format bug
Date: Wed, 30 May 2007 18:56:32 -0600
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.10) Gecko/20070221 Thunderbird/1.5.0.10 Mnenhy/0.7.5.666

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

According to Eric Blake on 5/28/2007 10:15 PM:
> Try this for a fun time:
> 
> $ echo 'format(%*.*d,-1,-1,1)' | m4 | wc
>       1       1 2280281
> 

A related question I have about the format builtin:

Consider printf(1).  As specified by POSIX, it converts an integer value
to a character (with ASCII, "printf %b '\x09'" results in a literal TAB),
converts a character to its integer value (with ASCII, "printf %d '"a'"
results in 97), treats %c as a synonym for %.1s ("printf %c 9" results
in 9, not TAB), and honors certain escape sequences (both directly in
the format string, and indirectly via %b).
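
The printf(1) behaviors listed above can be tried directly in a POSIX
shell.  This is only a sketch: the variable names are made up for
illustration, and TAB is spelled as octal \011 here because that is the
form POSIX requires %b to accept (the \x09 spelling works in bash and
GNU printf but is an extension).

```shell
#!/bin/sh
# Integer value to character: %b expands the octal escape \011 to TAB.
tab=$(printf '%b' 'a\011b')

# Character to integer value: a leading quote yields the character's code.
code=$(printf '%d' '"a')

# %c takes the first character of its string argument, like %.1s.
first=$(printf '%c' 9)

printf '%s %s\n' "$code" "$first"
```

With an ASCII-based locale, the last line should print the character
code of "a" followed by a literal 9.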

Meanwhile, m4's format builtin, for the past 17 years, has handled %c as
a conversion from integer to character (with ASCII, format(%c,9) results
in TAB, and format(%c,a) results in a NUL which truncates the string),
has no way to convert a character to an integer, does not support escape
sequences, and requires the use of %s to grab raw characters.  I find
this rather confusing.

I'm thinking of changing this setup so that m4's format is more like
printf(1).  (Unlike C, where printf(3) can distinguish between character
literals and integers, m4's format is restricted in that all arguments
start out as strings, much like the shell's printf(1).)  But this is a
backwards-incompatible change.  So what I am proposing is to make m4
1.4.10 implement %b, and issue a warning when \ is encountered in the
format or when %c is encountered in the format with an integer argument,
but keep output identical with earlier m4 1.4.x except that %b now
results in content instead of the undocumented behavior of being
skipped.  Then m4 2.0 could just use the newer printf(1) semantics
without worry.  I would also update the documentation to mention the
change in direction, as well as these portability guidelines for using
format consistently across both 1.4.x and 2.0:

- - avoid \ in the first argument to format
- - if you want a literal \, use format(%s,\) or just rely on m4
concatenation of \ outside of format
- - if you want to convert an integer to a character, write a wrapper:
ifelse(format(%b,1),1,format(%b,\x09),format(%c,9))
- - no portable way to convert a character to an integer short of a
255-element reverse-lookup table (you could use a forloop recursion
construct, but be sure your iterator and quote characters are
multi-character for the duration of the loop to avoid parse problems; hmm,
maybe I should code this up and add it to the examples directory)
- - if you want the first character of a string, use %.1s instead of %c

Any objections to this approach?

- --
Don't work too hard, make some time for fun as well!

Eric Blake             address@hidden
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (Cygwin)
Comment: Public key at home.comcast.net/~ericblake/eblake.gpg
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFGXh1A84KuGfSFAYARAl+XAJ9a3YI6NE3WJUPlStNj1+TNu0eWVQCfSSYO
tfBpbp46K8kgRHRs36E1EeQ=
=3eA6
-----END PGP SIGNATURE-----



