Re: [groff] [UTROFF] Troff and xml
From: Pierre-Jean
Date: Sun, 03 Dec 2017 12:55:28 +0100
User-agent: mail v14.8.16
Ralph Corderoy <address@hidden> wrote:
> > It seems to be removed by the mailing list server
>
> Yes, I expect Mailman has been configured to strip text/html parts.
> Just include it in the text/plain part of your email? Or have your MUA
> send it as a text/plain part since it's the HTML we want to see, not the
> rendering of it.
Here it is; thank you, Ralph!
Pierre-Jean.
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Utroff</title><meta
http-equiv="Content-Type" content="text/html; charset=UTF-8"></meta><meta
name="generator" content="heirloom nroff -mux"></meta><meta name="date"
content="2017-12-03T12:49:06"></meta><link rel="stylesheet" href="style.css"
type="text/css" media="screen"></link></head><body>
<p>Hello all,</p>
<p>Continuing my short series of articles, today I explain
how Utmac connects to the XML world.</p>
<h4>Troff and XML</h4>
<p>We all remember the various attempts to produce XML
files from troff documents. Some aim to be universal: since
they work on the raw troff requests, they can only output
non-semantic HTML with hard-coded styles. Others, dedicated
to a particular macro package, fail to handle the raw troff
requests a user may need in a document (see the source of
ms2html, where the author notes that he keeps implementing
more and more raw troff requests).</p>
<p>XML files are nothing but plain text files carrying
semantic information. A troff document, on the other hand,
contains structured information that gets its meaning only
within the context of a macro package. Come to think of it,
we already have a tool that interprets troff source within
the context of a macro package to produce plain text files:
nroff. Could we use nroff to produce XML files? I tried, and
the approach works well.</p>
<p>The idea is simple: write a macro file that implements all
the interface macros (paragraphs, headings, and so on) so
that they add XML tags to the output. For example, here are
simple macros that produce XML paragraphs and headings:</p>
<pre><span class="F">.</span><span class="F">de</span> <span class="F">PP</span>
<span class="F">.</span> <span class="F">\" first, we close the previous block</span>
<span class="F">.</span> <span class="F">\" by printing its recorded tag</span>
<span class="F">.</span> <span class="F">if</span> <span class="F">d</span> xml-block <span class="F">\</span><span class="F">\*[</span><span class="F">xml-block</span><span class="F">]</span>
<span class="F">.</span> <span class="F">\" secondly, we define the closing tag for the block</span>
<span class="F">.</span> <span class="F">ds</span> <span class="F">xml-block</span> <span class="F"></p></span>
<span class="F">.</span> <span class="F">\" and last, we print the opening tag.</span>
<p>
<span class="F">.</span><span class="F">.</span>
<span class="F">.</span><span class="F">de</span> <span class="F">H1</span>
<span class="F">.</span> <span class="F">if</span> <span class="F">d</span> xml-block <span class="F">\</span><span class="F">\*[</span><span class="F">xml-block</span><span class="F">]</span>
<span class="F">.</span> <span class="F">rm</span> xml-block
<h1><span class="F">\</span><span class="F">\$*</span></h1>
<span class="F">.</span><span class="F">.</span></pre>
<p>Nroff has to be configured to produce correct XML files:
we do not want hyphenation, lines need not be adjusted, and
the page length has to be set so that nroff does not pad the
output with blank lines.</p>
<pre><span class="F">.</span><span class="F">\" page length is one line</span>
<span class="F">.</span><span class="F">pl</span> 1v
<span class="F">.</span><span class="F">ll</span> 75
<span class="F">.</span><span class="F">\" don't adjust or hyphenate</span>
<span class="F">.</span><span class="F">na</span>
<span class="F">.</span><span class="F">nh</span>
<span class="F">.</span><span class="F">\" the ending macro is doc:end</span>
<span class="F">.</span><span class="F">em</span> doc:end
<span class="F">.</span><span class="F">\" print the XML declaration</span>
<?xml version="1.0" encoding="UTF-8"?>
<span class="F">.</span><span class="F">\" open the root tag</span>
<utmac>
<span class="F">.</span><span class="F">de</span> <span class="F">doc:end</span>
<span class="F">.</span> <span class="F">\" doc:end needs some more space to output text</span>
<span class="F">.</span> <span class="F">pl</span> <span class="F">\</span><span class="F">\n(</span><span class="F">nl</span>u+3v
<span class="F">.</span> <span class="F">\" close the previous block</span>
<span class="F">.</span> <span class="F">if</span> <span class="F">d</span> xml-block <span class="F">\</span><span class="F">\*[</span><span class="F">xml-block</span><span class="F">]</span>
<span class="F">.</span> <span class="F">\" close the root tag</span>
</utmac>
<span class="F">.</span> <span class="F">\" set the final page length</span>
<span class="F">.</span> <span class="F">pl</span> <span class="F">\</span><span class="F">\n(</span><span class="F">nl</span>u
<span class="F">.</span><span class="F">.</span></pre>
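<p>A minimal sketch for trying the settings above together with the PP macro: the shell script below saves them in one macro file, writes a two-paragraph test document, and passes both to nroff, macro file first. The file names xml.tmac, doc.tr, and doc.xml are hypothetical, and nroff is only run if one is installed; with Heirloom nroff, -Tlocale can be added as in the full pipeline.</p>

```sh
#!/bin/sh
# Sketch: save the XML settings and the PP macro in one macro file,
# write a small test document, and format both with nroff.
# The file names xml.tmac, doc.tr and doc.xml are hypothetical.

# A quoted delimiter ('EOF') keeps the backslashes exactly as written.
cat > xml.tmac <<'EOF'
.pl 1v
.ll 75
.na
.nh
.em doc:end
<?xml version="1.0" encoding="UTF-8"?>
<utmac>
.de doc:end
.pl \\n(nlu+3v
.if d xml-block \\*[xml-block]
</utmac>
.pl \\n(nlu
..
.de PP
.if d xml-block \\*[xml-block]
.ds xml-block </p>
<p>
..
EOF

cat > doc.tr <<'EOF'
.PP
A first paragraph.
.PP
A second paragraph.
EOF

# The macro file comes first, so its settings and macros are read
# before the document; nroff is only run if one is installed.
if command -v nroff >/dev/null 2>&1; then
    nroff xml.tmac doc.tr > doc.xml
fi
```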
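<p>Since fonts in Utmac are hierarchical and defined as
strings, they are easy to implement as well. Each use of the
string B flips the register f-b between 0 and 1, and its new
value selects the string to print: the first \*B prints the
opening tag (font-bold1), the next one the closing tag
(font-bold0).</p>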
<pre><span class="F">.</span><span class="F">ds</span> <span class="F">font-bold0</span> <span class="F"></B></span>
<span class="F">.</span><span class="F">ds</span> <span class="F">font-bold1</span> <span class="F"><B></span>
<span class="F">.</span><span class="F">nr</span> <span class="F">f-b</span> <span class="F">0</span>
<span class="F">.</span><span class="F">ds</span> <span class="F">B</span> <span class="F">\ER'f-b 1-\En[f-b]'\E*[font-bold\En[f-b]]</span></pre>
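<p>To make the register arithmetic explicit, here is the same toggle emulated in POSIX shell. It is an illustration, not troff code; since shell variable names cannot contain hyphens, f-b becomes f_b and font-bold0/1 become font_bold0/1.</p>

```sh
#!/bin/sh
# Shell emulation (not troff) of the bold toggle: f-b is renamed f_b
# and font-bold0/1 font_bold0/1, because "-" is invalid in shell names.
font_bold0='</B>'   # .ds font-bold0 </B>
font_bold1='<B>'    # .ds font-bold1 <B>
f_b=0               # .nr f-b 0

# Emulates interpolating \*B.
B() {
    # \R'f-b 1-\n[f-b]': flip the register between 0 and 1.
    f_b=$((1 - f_b))
    # \*[font-bold\n[f-b]]: build the string name from the new value.
    eval "printf '%s' \"\$font_bold$f_b\""
}

printf 'plain '; B; printf 'bold'; B; printf ' plain\n'
# prints: plain <B>bold</B> plain
```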
<p>The only real problem with using nroff to produce XML
documents is that, as with troff, the spaces and newlines it
inserts automatically are hard to control. I tried .chop and
\c, but without reliable results. To solve that problem, and
to escape the reserved characters a user may type in a
document ('<', '>', and '&'), I wrote a small
post-processor, <span class="I">postxml</span>, which translates a custom set of tags
into XML special characters. Among those tags, a special tag
removes newlines:</p>
<pre>#[      becomes <
#]      becomes >
#(      becomes &
#)      becomes ;
<span class="F">\n#</span>-<span class="F">\n</span>  is deleted from the stream, which joins the lines around it.</pre>
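<p>As a sketch of how such a post-processor can work, assuming nothing about the actual postxml beyond the table above, here is a hypothetical postxml written as a shell function around awk. The order of the steps is the design choice that matters: the characters typed by the user are escaped before the custom tags are turned into '<', '>', '&', and ';', so the markup produced by the macros is never escaped by mistake.</p>

```sh
#!/bin/sh
# Hypothetical postxml sketch; the real postxml from the troffxml archive
# may work differently. Reads nroff output on stdin, writes XML on stdout.
postxml() {
    awk '
    { buf = buf $0 "\n" }   # keep the whole stream in one string
    END {
        # "\n#-\n" is deleted, joining the lines around it.
        gsub(/\n#-\n/, "", buf)
        # Escape the characters the user may have typed; in a gsub
        # replacement, "\\&" stands for a literal "&".
        gsub(/&/, "\\&amp;", buf)
        gsub(/</, "\\&lt;", buf)
        gsub(/>/, "\\&gt;", buf)
        # Only now turn the custom tags into XML syntax.
        gsub(/#\[/, "<", buf)
        gsub(/#]/, ">", buf)
        gsub(/#\(/, "\\&", buf)
        gsub(/#\)/, ";", buf)
        printf "%s", buf
    }'
}

printf '%s\n' 'Tom & Jerry' '#-' '#[/pp#]' | postxml
# prints: Tom &amp; Jerry</pp>
```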
<p>So, instead of writing XML tags directly, the nroff macros
write these custom tags, which postxml translates later. Our
paragraph macro becomes:</p>
<pre><span class="F">.</span><span class="F">de</span> <span class="F">PP</span>
<span class="F">.</span> <span class="F">if</span> <span class="F">d</span> xml-block <span class="F">\{</span><span class="F">\</span>
<span class="F">.</span> <span class="F">\" tag to remove unwanted newlines</span>
#-
<span class="F">.</span> <span class="F">\" closing xml tag</span>
<span class="F">\</span><span class="F">\*[</span><span class="F">xml-block</span><span class="F">]</span>
<span class="F">.</span> <span class="F">\}</span>
<span class="F">.</span> <span class="F">ds</span> <span class="F">xml-block</span> <span class="F">#[/pp#]</span>
<span class="F">.</span> <span class="F">\" opening xml tag</span>
#[pp#]
<span class="F">.</span> <span class="F">\" tag to remove unwanted newlines</span>
#-
<span class="F">.</span><span class="F">.</span></pre>
<p>A preprocessor, prexml, escapes any occurrence of these
tags in the user's document. The troffxml archive, available
at <a href="http://utroff.org">http://utroff.org</a>,
provides prexml, postxml, and two XSL stylesheets that
produce HTML and FODT (flat OpenDocument) files, and Utmac
provides the ux macros that emit the custom tags. So, the
commands to produce an XML document from a troff source, and
then HTML and FODT files from it, are:</p>
<pre>prexml <span class="F"><</span> f.tr <span class="F">|</span> nroff -Tlocale -mux <span class="F">|</span> postxml <span class="F">></span> f.xml
xsltproc utohtml.xsl f.xml <span class="F">></span> f.html
xsltproc utofodt.xsl f.xml <span class="F">></span> f.fodt</pre>
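<p>How prexml escapes the tags is not detailed here. One scheme compatible with the tags above, assumed for this sketch and not taken from the troffxml archive, is to replace every '#' in the source with '#(#35#)': postxml then turns it into '&#35;', the XML character reference for '#', so a '#[' or '#-' typed by the user is never read as a tag.</p>

```sh
#!/bin/sh
# Hypothetical prexml sketch; the real prexml may use another scheme.
# Every "#" in the user's troff source becomes "#(#35#)". The postxml
# translation (#( -> &, #) -> ;) then turns it into "&#35;", the XML
# character reference for "#", so a "#[" or "#-" typed by the user can
# never be read as a postxml tag.
prexml() {
    sed 's/#/#(#35#)/g'
}

printf 'Use #[ and #- freely\n' | prexml
# prints: Use #(#35#)[ and #(#35#)- freely
```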
<p>Since I expect you will want to look at the result, I have
attached the XML, HTML, and FODT versions of this article as
produced by this system (they reveal that the FODT code
blocks need some more work...).</p>
<p>In my next mail about Utmac, I will present some
goodies.</p>
<p>Kind regards,
Pierre-Jean</p>
</body></html>