On 25 March 2011 04:38, Werner LEMBERG wrote:
Justin,
a simple example says more than a thousand words... So please give us
an example we can examine.
Hear! Hear!
At a first glance, it seems you have an encoding problem (but this
doesn't explain the strange things you see). The default encoding of
groff is latin1, and your input file is probably UTF-8. Starting with
version 1.20, groff can handle UTF-8 by using a new preprocessor.
The HTML output driver is still experimental (and basically
unmaintained currently due to lack of time and interest); it is easily
possible that you've found a bug.
Equally -- perhaps more -- likely, Justin has encountered a hyphenation
issue. This:
On the 11th line in my groff file, an "â" character is found after 64
characters have been printed, within the word hamburger, the text gets
parsed and printed as "hamâburger". If I change hamburger to donations
I have the "â" character show up at the 60th character on the line,
with donations being "donaâtions".
is reminiscent of an issue I myself observed earlier this week. I had
run some informally structured ASCII text through a sed filter, and then
through nroff (v1.20.1), to produce an alternative layout. Although I
had suppressed hyphenation (.hy 0), I did have several explicit ASCII
hyphen characters in the input stream; each of these was replaced, in
the output stream, by the three-byte octal sequence 342 200 220 (which
I guess is the UTF-8 encoding of U+2010 -- the Unicode hyphen which
groff_char(7) documents as the output form for hyphen).
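
The guess about the byte sequence can be checked outside groff. The
following is a minimal Python sketch (Python is an assumption here, not
something used in the original exchange) that decodes the three octal
bytes as UTF-8 and reports the resulting code point; the variable names
are illustrative only.

```python
import unicodedata

# The three octal byte values seen in the nroff output stream.
octal_bytes = bytes([0o342, 0o200, 0o220])

# Show the same bytes in hexadecimal, the form most tools print.
hex_form = " ".join(f"{b:02x}" for b in octal_bytes)

# Decode the sequence as UTF-8; a valid sequence gives one character.
decoded = octal_bytes.decode("utf-8")
code_point = f"U+{ord(decoded):04X}"
char_name = unicodedata.name(decoded)

print(hex_form, code_point, char_name)  # e2 80 90 U+2010 HYPHEN
```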
Viewing this output with "less", on my UTF-8 aware console, it looked
absolutely fine, but after uploading as a package description file on my
SourceForge downloads page, each hyphen was rendered by Firefox with
unwanted whitespace surrounding it; rendered by Internet Explorer, each
hyphen was replaced by three characters of garbage, one of them being
the "â" observed by Justin, IIRC.
So yes, I guess what you actually see is dependent on encoding (and on
how the viewer interprets the U+2010 character, however it is encoded). In my
case, I wanted real ASCII hyphens in my output stream; adding "-Tascii"
to my nroff command gave me that.
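
The dependence on the viewer's encoding can be illustrated with a small
Python sketch. It assumes the viewer treats the UTF-8 bytes as Latin-1;
which single-byte encoding Internet Explorer actually applied is not
stated above, so this is only one plausible reconstruction. Under that
assumption the first byte becomes "â" and the other two become C1
control characters, which many viewers do not display.

```python
# A word with a Unicode hyphen (U+2010) at a hyphenation point, as
# nroff might emit it on a UTF-8 output device.
word = "ham\u2010burger"

# Encode as UTF-8, then (wrongly) decode the bytes as Latin-1, which
# is the assumed behaviour of the misconfigured viewer.
misread = word.encode("utf-8").decode("latin-1")

# Drop characters a viewer would typically not display (the two C1
# control characters produced by bytes 0x80 and 0x90).
visible = "".join(c for c in misread if c.isprintable())

print(len(misread) - len(word))  # 2 extra characters
print(visible)                   # hamâburger
```

The visible result matches the "hamâburger" reported earlier in the
thread, which is consistent with a hyphen encoded as UTF-8 being read
in a single-byte encoding.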
--
Regards,
Keith.