[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2] Support 2-digit \sNN only in compatibility mode.
From: |
Ingo Schwarze |
Subject: |
Re: [PATCH v2] Support 2-digit \sNN only in compatibility mode. |
Date: |
Sat, 4 Apr 2020 16:11:38 +0200 |
User-agent: |
Mutt/1.12.2 (2019-09-21) |
Hi Branden,
G. Branden Robinson wrote on Sat, Apr 04, 2020 at 05:29:38AM +1100:
> Second attempt.
I have no very strong opinion about this, so i don't strictly object
to changing it, if people here really consider it valuable to have.
Then again, I'm still not quite sure what exactly the point is.
* We can discourage using \s12 and \s48 in the documentation
without changing the language syntax and semantics, if we want
to (that would probably make sense, as far as it is not already
done).
* We can throw a warning about \s36 (Consider using \s[36] because
that's more readable.) and also about \s40 (Do you really want
to print the digit zero in point size 4? If so, consider saying
\s[4]0 for clarity.) without changing any behaviour.
* Either way, i would consider \s12 a legacy idiom, supported for
backward compatibility only. What is the point of making a
feature more intuitive that is only supported for backward
compatibility in the first place?
* While the new interpretation of \s12 is formally rigorous, it
makes extremely little sense for practical use. Why would anybody
ever want to print anything in point size 1? Or even in point
size 2 or 3? That feels really useless. Syntactic purity and
simplicity of a language is indeed a goal, but not for its own
sake, only in so far as it makes learning the language easier,
and that's not the case here because no new users need to learn
about \s12 anyway.
* While i don't really see much benefit in making a legacy feature
nicer when there already are better modern features providing
the same functionality, there definitely is a cost: some
documents will become harder to process in a correct way,
and some people who value their finger memory more than code
clarity will be annoyed for little gain. It's definitely
setting a trap for some experienced users.
* "Compatibility mode", in general, is much less helpful than it
usually seems. Every time you think about it, you tend to only
think about that one feature at hand and all seems fine in that
mindset: if you want to old behaviour, switch the mode on, else
leave it off.
But software evolves gradually over the decades. There isn't a
single point in time such that before is legacy and afterwards
is modern. So the more you add to compatibility mode, the more
often people will have the problem that they need to process
documents that *require* compatibility mode in some respects,
but only work *without* compatibility mode in other respects,
because the documents were written at a time when some mildly
modern features had already been introduced and could already
be used freely (outside compatibility mode, that is), while at
that same time, some antique feature were still fully supported
that now begin to require compatibilty mode. The more time goes
by, the more severly broken the whole fundamentally flawed concept
of compatibility mode will eventually become. In that sense,
the concept of compatibilty mode is a time bomb, or more precisely
a smouldering fire that will slowly, but eventually develop into
a considerable blaze. The less it is touched, the less risk is
added.
[...]
> From 682ba81d314497014f72f26a8c73ff4505a6eee9 Mon Sep 17 00:00:00 2001
> From: "G. Branden Robinson" <address@hidden>
> Date: Sat, 4 Apr 2020 05:14:09 +1100
> Subject: [PATCH] Support 2-digit \sNN only in compatibility mode.
>
> * src/roff/troff/input.cpp (read_size): Move special-case
> interpretation of single-digit point-size escapes with two digits to
The phrase "single-digit point-size escapes with two digits" sounds
confusing and oxymoronic. Maybe "point size escapes of the form
\sNN", if this goes in?
> compatibility mode (groff -C) only, and throw error diagnostic with
> suggestion for remedy if encountered.
>
> The problem is that traditionally '\s36A' is interpreted as "set point
> size to 36, then emit 'A'". However, only values in the range 10-39
> are handled specially; '\s40A' is interpreted as a four-point "0A".
> This is unlike anything else in *roff grammar; see \*, \$, \f, \F, \g,
> \k, \m, \M, \n, \V, and \Y.
>
> To anticipate objections: Why not throw only a warning? Because there
> isn't a warning category for supported but ambiguous syntax
Indeed, the groff warning categories are severely ill-designed.
Several of them are overly specific. There is really no point in
switching something like "el" or "right-brace" on or off individually.
At the same time, even though the number of categories is way too
large, there are no categories for what really matters. The most
important categories would be somewhat like these:
1. critical syntax errors: the author's intent is unclear and the
document will likely be considerably misformatted, or text from
the source file is likely to be lost, not appearing in the output
at all (could subsume char, delim, di, el, escape, input; and
probably parts of missing, number, range, right-brace, scale)
2. resource errors: the syntax is OK, but the document is likely
to be severely mangled anyway, possibly also with information
loss, because external resources are missing (file and font are
typical examples)
3. minor syntax errors: the author's intent is unclear but the
consequences will likely be limited to minor ugliness of
formatting; document content is unlikely to be lost (examples:
missing - e.g. for requests of minor importance, like .af or .hcode
number - e.g. when it occurs in a point size or similar context
scale - maybe in some situations ... etc.
4. layout warnings: there is no syntax error and the author's intent is
clear, but it just doesn't work well (break is a typical example;
others would be filling problems like unusually wide inter-word
distances in one line of a narrow column, or one isolated line of
a paragraph in the first or last line of a page or column, or a
single short word wrapping over to a new line).
5. style warnings: intent is clear and formatting will be just fine,
but maybe the author should have a look anyway because there
may be a better way to do the same, or the situation may or may not
indicate some oversight (ig, mac, reg, space could be subsumed here,
and maybe syntax or parts of it; this is where a warning about
\s48 in compat mode or about \s12 in non-compat mode would belong)
Of course it need not be exactly like this. There could be fewer
categories (e.g. one could combine 1+2 and/or 3+4), or some might
want to split a category (e.g. 1 could be split into character-related,
programming-related, layout-related), but this is how warning categories
ought to be designed, in principle.
I have no idea how to deal with the current mess. Just add it to
whatever category comes closest, even if it doesn't really fit?
In any case, *please* do not use shortcomings of the GNU troff
specific message system as an argument for or against any syntactic
or semantic choices regarding the roff(7) language at large.
> (this
> behavior of AT&T troff dates to 1976 but apparently was not documented
> until 1992). Why not throw the warning outside of compatibility mode
> too? Because outside of compatibility mode we (now) have an
> unambiguous parse.
Right, but the meaning of the what was parsed makes absolutely no
sense, so it would be clearly useful to warn about \s12 in the
new world: what it does is almost certainly not what the author
intended, and with such a small point size, the misformatting
will even be quite severe, likely producing unreadable text.
[...]
> diff --git a/src/roff/troff/input.cpp b/src/roff/troff/input.cpp
> index 6f35dadf..d21bd698 100644
> --- a/src/roff/troff/input.cpp
> +++ b/src/roff/troff/input.cpp
> @@ -5076,13 +5076,30 @@ static int read_size(int *x)
> }
> else if (csdigit(c)) {
> val = c - '0';
> - if (!inc && c != '0' && c < '4') {
> + if (compatible_flag && !inc && c != '0' && c < '4') {
> + // Note: Very special case! If we have \s followed immediately by
> + // a digit (not '(', '+', or '-'), and that digit is 1, 2, or
> + // 3...read another digit! This is because the Graphic Systems
> + // C/A/T phototypesetter (the original device target for AT&T
> + // troff) only supported a few discrete point sizes in the range
> + // 6..36. Kernighan warned of this in the 1992 revision of CSTR
> + // #54 (section 2.3), and more recently, McIlroy referred to it as
> + // a "living fossil". This DWIM syntax is surprising to the user;
> + // it clashes with the syntax of several other escapes (\*, \$,
> + // \f, \F, \g, \k, \m, \M, \n, \V, and \Y). We therefore support
> + // it only in compatibility mode.
> + //
> + // See:
> + // https://lists.gnu.org/archive/html/groff/2020-03/msg00054.html
> + // et seq.
This comment feels quite excessive to me. Very long comments may
occasionally make sense above functions that are very long and very
complicated, but you clearly shouldn't comment a five-line if block
in the middle of a relatively simple, straight-forward function
with a 15-line comment, in particular if the block is neither
all that important nor hard to understand. Such a long comment
makes the actual code hard to find, and makes the file hard to read
because you have to scroll so much.
"Note: Very special case!" says nothing at all.
"If we have .. read another digit" and "We therefore support it
only in compatibility mode" is totally obvious in the first place:
i++; /* increment the variable i by one */
"This is because ... living fossil" may be appropriate in a commit
message, but not as a comment, and in the commit message, a reference
to the mailing list discussion may nearly suffice.
"This DWIM syntax is surprising" doesn't belong in the code at all;
we might want to warn users in the manual page, though.
I don't think any comment is really needed at all because the
code is very obvious and straightforward. If you absolutely
want to say something,
// Support the legacy form \s10 to \s39.
or something like that should be sufficient.
Yours,
Ingo