
Re: [Groff] Status of the portability work, and plans for the future


From: Eric S. Raymond
Subject: Re: [Groff] Status of the portability work, and plans for the future
Date: Mon, 8 Jan 2007 15:36:35 -0500
User-agent: Mutt/1.4.2.2i

Werner LEMBERG <address@hidden>:
> My mistake.  Anyway, I think XML also knows Unicode character
> entities, right?  This is what I have meant.

Yes, you can embed Unicode entities in XML.  Right now doclifter does this 
for a handful of cases in which the right ISO entities don't exist.  The
set of these is currently:

        ("lh",  "&#x261C;"),    # Hand pointing left
        ("rh",  "&#x261E;"),    # Hand pointing right
        ("CR",  "&#x240D;"),    # Carriage return symbol
        ("fo",  "&#x2039;"),    # Single left-pointing angle quotation mark
        ("fc",  "&#x203a;"),    # Single right-pointing angle quotation mark
        # These are from groff
        ("yogh", "&#x021d;"),   # Small letter yogh
        ("ohook", "&#x01a1;"),  # Small letter o with hook or ogonek
        ("udot",  "&#x0323;"),  # Combining underdot (hex, not decimal 323)

The names on the left are aliases that doclifter generates internally
so that the generated XML doesn't contain hard-to-read raw Unicode hex
literals.  Instead, it emits one named entity definition per hex
literal in the XML preamble, then uses the named entity in the body.
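As a rough illustration of that preamble trick, here is a minimal Python sketch.  It assumes a bare root element and an internal DTD subset with no public identifier (real DocBook output would carry the DocBook DOCTYPE), and the function names are hypothetical, not doclifter's actual code.

```python
import xml.etree.ElementTree as ET

# Illustrative subset of an alias table like the one above: (entity name, hex reference).
ALIASES = [
    ("CR",   "&#x240D;"),   # Carriage return symbol
    ("fo",   "&#x2039;"),   # Single left-pointing angle quotation mark
    ("yogh", "&#x021d;"),   # Small letter yogh
]


def make_preamble(root, aliases):
    """Build a DOCTYPE whose internal subset declares one named entity per alias."""
    decls = "\n".join(f'<!ENTITY {name} "{ref}">' for name, ref in aliases)
    return f"<!DOCTYPE {root} [\n{decls}\n]>"


def use_named_entities(text, aliases):
    """Swap each raw hex character reference in text for its named entity."""
    for name, ref in aliases:
        text = text.replace(ref, f"&{name};")
    return text


def build_document(root, body, aliases):
    """Emit the preamble followed by a body that uses only the named entities."""
    return make_preamble(root, aliases) + f"\n<{root}>{use_named_entities(body, aliases)}</{root}>"


if __name__ == "__main__":
    doc = build_document("para", "end of line &#x240D;", ALIASES)
    print(doc)
    # Parsing shows the named entity expands back to the U+240D character.
    print(repr(ET.fromstring(doc).text))
```

Because the entity values are declared once in the preamble, a reader of the XML body sees `&CR;` instead of an opaque hex literal, while any conforming parser still resolves it to the same character.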

> > In fact, *all* defined groff-1.19 glyphs except the old Bell Labs
> > bracket-pile graphics get mapped to ISO entities -- even the exotica
> > like yogh and o-with-ogonek.
> 
> o-with-ogonek isn't an exotic letter at all!  All Poles will object to
> your assertion :-)

Not to mention the Lithuanians, and nobody wants to offend a country
so full of good-looking women. :-) Looking at my code, I see there is one more
exception: troff \(an, the horizontal arrow extension, can't be mapped either.
You'd think there'd be an ISO entity for this somewhere in the ISOamsa arrow
set, but there isn't.  Nor have I found a Unicode equivalent.

> Whatever decision we will find, I won't force anything right now.
> Maybe later.  Thus I don't evade a decision but postpone it.

Well, OK, but you'll only get to temporize until I turn in my patches
for the 1.19 development tree.
  
> Hmm.  To exaggerate, the only `technical ground' currently is that
> doclifter can't handle it.   Up to now nobody has ever claimed problems
> with groffer.1 -- while I understand your arguments, I don't see an
> urgent need to react immediately.

I see you've forgotten Gunnar's post on this topic.  He actually posted a
screen dump showing how badly groffer.1 gets mangled in some viewers.  If
that doesn't constitute "urgent need", I'm not sure what would.

I'm really not trying to use the viewer-portability argument to solve
problems exclusive to doclifter.  I don't have to do that, because
doclifter plus XML stylesheets already generates better HTML from a
wider range of manual pages than any of the viewers can.  And it's
still improving; I just added code to parse ad-hoc tables built with .ta
and tab characters rather than tbl markup, and I think I'll be able
to bite a large corner off the .ti problem next.
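For readers wondering what "parsing ad-hoc tables" might involve, here is one simple heuristic sketched in Python.  It is an assumption for illustration, not doclifter's actual algorithm: once a .ta request has been seen, each run of consecutive non-request lines containing tab characters is treated as a table, with cells split at the tabs.

```python
def is_control_line(line, name=None):
    """True if line is a troff control line (optionally, a specific request)."""
    if not line.startswith((".", "'")):
        return False
    if name is None:
        return True
    parts = line[1:].split()
    return bool(parts) and parts[0] == name


def find_tab_tables(lines):
    """Collect runs of tab-separated text lines that appear after a .ta request.

    Returns a list of tables; each table is a list of rows, each row a list of
    cell strings.
    """
    tables, rows = [], []
    tabs_set = False  # becomes True once a .ta request has been seen
    for line in lines:
        if is_control_line(line, "ta"):
            tabs_set = True
        elif tabs_set and not is_control_line(line) and "\t" in line:
            rows.append(line.split("\t"))
            continue
        # Any line that is not a tabbed text line ends the current run.
        if rows:
            tables.append(rows)
            rows = []
    if rows:
        tables.append(rows)
    return tables
```

A real lifter would also have to cope with tab stops being reset, with cells that contain font escapes, and with single tabbed lines that are really indented prose rather than tables.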

The constraint is actually the other way around.  Gunnar demonstrated
that my initial cut at a portable request set was far too large,
because doclifter is better at emulating troff than the viewers are.
(The price we pay for this is that doclifter's running time is too slow
and variable for on-the-fly use, even before counting the XSLT pass that
renders the DocBook to HTML, which takes much longer still.  The
toolchain is just too slow; you have to batch-translate your man pages
in advance and cache the HTML somewhere.)
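A batch pre-translation step with a make-style freshness check might look roughly like the Python sketch below.  The `translate` argument is a hypothetical stand-in for the whole doclifter-plus-XSLT pipeline, and the `.html` naming in the cache directory is likewise an assumption.

```python
from pathlib import Path
from typing import Callable, Iterable, List


def batch_translate(sources: Iterable, cache_dir, translate: Callable[[str], str]) -> List[Path]:
    """Pre-render man pages to HTML, skipping pages whose cached copy is current.

    `translate` (hypothetical) takes man-page source text and returns HTML text.
    Returns the output paths that were (re)generated on this run.
    """
    cache = Path(cache_dir)
    cache.mkdir(parents=True, exist_ok=True)
    regenerated = []
    for src in map(Path, sources):
        out = cache / (src.name + ".html")
        # Like make: rebuild only when the source is newer than the cached output.
        if out.exists() and out.stat().st_mtime >= src.stat().st_mtime:
            continue
        out.write_text(translate(src.read_text()))
        regenerated.append(out)
    return regenerated
```

Run from cron or a package post-install hook, this keeps the slow pipeline off the request path; a viewer then just serves whatever HTML is already in the cache.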

So I am trying to solve the viewer-portability problem now rather
than grinding an axe for doclifter.  Thank Gunnar for this, because he
convinced me it was both worthwhile and possible.  He made me realize
that the gap between what we would have to do to solve doclifter's
problems alone and the larger set of things we will have to do to
solve viewer portability is small enough that tackling both at once
makes sense.

So I'm going after the bigger one now, and treating the solution to
doclifter's problems as a happy and motivating side effect.  If I
were still only trying to solve doclifter's problems, groffer.1 could
be allowed to live as it is -- I could make whatever changes doclifter
needed to translate it, though that would be painful and I would still
prefer not to.

(The connection between solving the viewer-portability problem and
solving the structure-lifting problem is not an accident.  To solve
the viewer-portability problem, you have to define a sublanguage of
troff+man that does not require knowing the fine physical capabilities
of the output medium.  This turns out to be almost the same subset as
the one that can be structurally translated.)
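One way to make such a sublanguage checkable is a simple allowlist linter, sketched here in Python.  The PORTABLE set below is a hypothetical example built from common man(7) macros plus a few basic requests; it is not the request set actually proposed in this thread.

```python
# Hypothetical portable subset: common man(7) macros plus a few basic requests.
# Illustration only; not the set under discussion on the list.
PORTABLE = frozenset({
    "TH", "SH", "SS", "PP", "LP", "P", "IP", "TP", "HP", "RS", "RE",
    "B", "I", "BR", "BI", "IB", "RB", "RI", "IR", "SM", "SB",
    "nf", "fi", "br", "sp",
})


def nonportable_requests(lines, allowed=PORTABLE):
    """Return (line_number, name) for each control line outside the allowed set."""
    problems = []
    for number, line in enumerate(lines, start=1):
        if not line.startswith((".", "'")):
            continue  # ordinary text line
        parts = line[1:].split()
        if not parts or parts[0].startswith('\\"'):
            continue  # empty request or comment line
        if parts[0] not in allowed:
            problems.append((number, parts[0]))
    return problems
```

Requests such as .ti or .ll show up as violations because they presuppose knowledge of the output medium's physical layout, which is exactly what both a non-groff viewer and a structure lifter lack.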

> > The problem is that once it is known that you have one, people
> > invent all sorts of clever, plausible reasons they should be on it
> > rather than doing the bit of extra work needed for a clean solution.
> > [... omitting shameless exaggerations ...]
> 
> According to your analysis, groffer.1 is basically the only candidate
> which is not going to be fixed easily -- for whatever reasons.  Not
> bad to have just one single exception out of 10000...

That would be one out of 13,000, and no, groffer.1 isn't the only one.
There are about 54 others currently on the too-broken-to-live list.  These
are mostly pages that will break viewers much worse than they break
doclifter.  Here is a rough breakdown:

* 21 pages associated with netpbm.  
* 8 pages associated with groff.
* 6 empty pages generated by broken Perl build machinery.
* 5 seriously mangled pages generated by the Canna project.  (They
    run several pages together as one, complete with multiple .TH headers.)
* 2 pages shipped by a defunct project called wordtrans (viewers handle
    these OK).
* 4 pages generated from Doxygen sources by a very broken reporting tool.
* 6 other pages with markup so gnarled that doclifter barfs on it --
    mostly these are weird edge cases that trip over bugs in my mandoc 
    interpreter.  Maybe three of these could be patched around if I 
    didn't have higher-priority things to do.

This is really not good company for the groff documentation to be in.
And it's going to be worse company in a couple of weeks when Bryan
Henderson and I have fixed the netpbm problems (scheduled, and I know
exactly how to do it, but it's not done yet).

> It's not necessary to tell anyone that an exception list exists :-)

Trust me.  They find out :-(    <--- Bitter experience speaking again.
 
> > And even for pages that can't be strictly viewer-portable, simplifying
> > them to the point where doclifter can lift them will have benefits.
> 
> Uh, oh, I'm not comfortable with `simplifying until doclifter can
> handle it'. 

No, no.  You misunderstand.  Simplifying until doclifter can handle it
is *easier* than the real problem -- cross-viewer portability -- not
harder.  Simplifying until a page doesn't break non-groff viewers 
normally solves all of doclifter's problems handily.

The exception cases where a page intrinsically can't be
viewer-portable are *extremely* rare.  Offhand I can think of only four:
groff_char.7 and three from the Canna project that are broken
for other reasons.

> > > Ideally, they should use groff for formatting (opening a TTY
> > > window showing `man' output would be sufficient IMHO) if the
> > > number of problems exceeds a certain threshold.
> >
> > And that's an excellent idea for a general fallback.
> 
> groffer.1 comes to my mind :-)

Well, yes.  But for this to work, we'd have to push patches out for
every single viewer first.  That's a rather high price to pay to avoid
offending one sulky groff contributor when we can fix the problem in
one spot upstream.
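For concreteness, the per-viewer fallback Werner suggests might look something like this Python sketch.  VIEWER_SUPPORTS and the threshold of 3 are hypothetical placeholders (each viewer would have its own list, and "a certain threshold" was never specified); the groff invocation uses the standard -man and -Tutf8 options.

```python
# Hypothetical: the macros/requests one particular non-groff viewer can render.
VIEWER_SUPPORTS = frozenset({
    "TH", "SH", "SS", "PP", "LP", "IP", "TP", "RS", "RE", "B", "I", "BR", "IR",
})


def unsupported_count(lines, supported=VIEWER_SUPPORTS):
    """Count control lines the viewer cannot handle (comments are ignored)."""
    count = 0
    for line in lines:
        if not line.startswith((".", "'")):
            continue
        parts = line[1:].split()
        if parts and not parts[0].startswith('\\"') and parts[0] not in supported:
            count += 1
    return count


def fallback_command(path, lines, threshold=3):
    """Return None to render natively, or a groff command to run in a terminal."""
    if unsupported_count(lines) > threshold:
        # groff -man -Tutf8 formats the page with the man macros for a UTF-8 tty.
        return ["groff", "-man", "-Tutf8", path]
    return None
```

A viewer could pass the returned list to subprocess.run and show the output in a terminal window.  The catch, as noted above, is that this logic would have to be patched into every viewer separately.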
-- 
                <a href="http://www.catb.org/~esr/">Eric S. Raymond</a>



