Re: string questions


From: G. Branden Robinson
Subject: Re: string questions
Date: Wed, 30 Nov 2022 12:10:25 -0600

Hi Dave,

At 2022-11-30T07:49:48-0600, Dave Kemper wrote:
> I have a couple questions about groff strings.
> 
> 1. In its section about the \* escape to interpolate a string, the
> current manual text states, "The delimited form need not use the
> neutral apostrophe; see *note Delimiters::."  But the example above
> shows [ and ] as delimiters, and in fact single quotes (neutral
> apostrophes) do not seem to work as \* delimiters:
> 
> $ cat test1
> .ds mystring hello
> \*[mystring]
> \*'mystring'
> $ groff -Tascii test1 | cat -s
> hello mystring'
> 
> The output indicates that groff is parsing "\*'mystring'" as a(n
> undefined) string called "\*'" followed by the text "mystring'".
> 
> Is this reference to neutral apostrophes an error,

That's exactly what it is.  I did a bunch of commits like a3a7f6e0edcbb
(11 October) updating internal cross references (also see
0a502f1bec08fe, 9 October) after a heavy revision of text on 25
September (66b7cb51ba9439).  Thanks for catching this; I apologize for
the confusion.

\*[] is analogous to \f[] (select font with "long" typeface identifier),
\n[] (interpolate register with "long" identifier), and \[] (interpolate
special character with "long" identifier).

> or is there some aspect of this I'm not understanding?

Nope.  I'll review this and recast.

> 2. The manual also states, "In contrast to macro invocations, however,
> a closing bracket as a string argument must be enclosed in double
> quotes."  But in practice this seems to be true of not just a closing
> bracket on its own, but anything containing a closing bracket.

You're right.  The language here is excessively specific.  I think I
inherited this one; I have recast it somewhat but kept the excess
precision.

Our Texinfo manual in groff 1.22.4 says:

     ... Only the syntax form using brackets can take arguments that are
     handled identically to macro arguments; the single exception is
     that a closing bracket as an argument must be enclosed in double
     quotes.

There is not a single exception, but a class of them.

[reduced example]
> .ds mystring2 hello \\$1
> \*[mystring2 \[aq]]
> $ groff -Tascii test3 | cat -s
> troff:test3:6: error: newline character not allowed in escape sequence parameter
> 
> Is this the expected behavior?

The output didn't surprise me, but the diagnostic...

> The error message in particular gives little indication of what the
> actual problem is.

...is very meh.

> More significantly, it's unclear why "\*[mystring2 \[aq]]" shouldn't
> be parseable.  Clearly a "]" on its own needs the quotes to
> disambiguate it from the closing bracket of the \* escape.  But the
> opening bracket of the inner "\[aq]" ought to be able to tell groff
> that the next closing one is associated with that escape rather than
> with the outer one.

I _think_ this is a consequence of how the hand-written recursive
descent parser in src/roff/troff/input.cpp works.  The good news is
that I've spent some time wrapping my head around it over the past
three years.

But I will have to step through it with GDB to remember (or perhaps
learn in the first place) why the foregoing parse didn't startle me.  So
I'll do that after fixing the bad cross reference above.

As a guess, once entering a bracket-delimited structure like this, the
parser might look ahead for the very next "]" not within double quotes,
and perform a recursive parse only on that material, to (waves hands)
keep the recursion bounded.
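
The guess above can be made concrete with a small Python model.  This
is only a sketch of the guessed lookahead, not of what input.cpp
actually does: the function name is hypothetical, and quote handling is
simplified to a plain toggle on each double quote.

```python
def split_at_first_unquoted_bracket(text):
    # `text` is the input that follows the opening bracket of the escape.
    # Returns (body, rest), split at the first "]" that is not inside
    # double quotes, or None if the line ends before one is found.
    in_quotes = False
    for i, ch in enumerate(text):
        if ch == "\n":
            return None  # hit the newline without finding a closing "]"
        if ch == '"':
            in_quotes = not in_quotes  # simplification: plain toggle
        elif ch == "]" and not in_quotes:
            return text[:i], text[i + 1:]
    return None


# Unquoted: the "]" belonging to \[aq] is taken as the end of the escape.
print(split_at_first_unquoted_bracket(r'mystring2 \[aq]]'))
# Quoted: the inner "]" is skipped and the final "]" closes the escape.
print(split_at_first_unquoted_bracket(r'mystring2 "\[aq]"]'))
```

Under this model the unquoted input is cut short at the bracket that
closes \[aq], which would leave a stray "]" behind; that is at least
consistent with quoting being required for any argument containing a
closing bracket.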

First consider what happens with the following.

\*[nonexistent arg1 arg2 fnord]

What becomes of parameters to a string (or macro) that doesn't do
anything with them?

Now back to your input.

Loosely, I can see the parse of
  \*[mystring2 \[aq]]
proceeding like
  \
  ^ escape time; get function selector
  \*
   ^ handle string interpolation; next character determines which
     approach we take[1]
  \*[
    ^ Aha!  Interpolate a string name with optional parameters!
    get the name of the string
    look ahead for unquoted ]
    look up name "mystring2 \[aq"
    this name is not defined
    ("call" empty-named string [we would expect to discard parameters])
    collect parameters...
    first parameter is "]"[2]
    next parameter is...
    whoops!  newline!  I wasn't expecting that!  <complain>
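
The three-way choice made right after "\*" (see footnote [1]) can be
sketched in Python as follows.  This is an illustrative toy, not the
code in input.cpp; the function name and the returned labels are
hypothetical, and for the bracketed form it deliberately returns the
unparsed remainder rather than guessing where the name ends.

```python
def classify_string_escape(text):
    # `text` is the input immediately following the "\*" of the escape.
    # Returns (form, payload) for the three cases in footnote [1].
    if text.startswith("("):
        # Two-character form: the next two characters are the name.
        return "two-character", text[1:3]
    if text.startswith("["):
        # Bracketed form: name plus optional parameters follow; how far
        # the name extends is left to a later parsing step.
        return "bracketed", text[1:]
    # Otherwise the single next character is the name.
    return "one-character", text[:1]


print(classify_string_escape("(ab"))
print(classify_string_escape(r"[mystring2 \[aq]]"))
print(classify_string_escape("x"))
```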

> So, is the fact that this doesn't parse as expected a bug?  Or is this
> not supposed to work, but the error text is suboptimal for explaining
> the problem?

The diagnostic is indeed unhelpful.  Until I understand the parser state
at the time it is thrown, I won't know how easy it would be to improve
the message.

Regards,
Branden

[1] If the next character is '(', look up and interpolate the next two
    characters as a string name.  If '[', do a groffish string
    interpolation as shown.  Otherwise interpolate string with
    one-character name.
[2] This is the part of the parse I'm mentally really iffy about.
