bug-groff


From: G. Branden Robinson
Subject: [bug #64279] Proposal to rename roff(7)
Date: Sat, 3 Jun 2023 13:15:41 -0400 (EDT)

Update of bug #64279 (project groff):

                Category:                    None => General                
                Severity:              3 - Normal => 1 - Wish               
              Item Group:                    None => Documentation          
                  Status:                    None => Need Info              

    _______________________________________________________

Follow-up Comment #1:

Are you sure you're looking at a copy of roff(7) from the latest release
candidate, 1.23.0.rc4?

The page covers much more than just *roff history.


[...]
    Below we present typographical concepts that form the background of
    all roff implementations, narrate the development history of some
    roff systems, detail the command pipeline managed by groff(1),
    survey the formatting language, suggest tips for editing roff input,
    and recommend further reading materials.

Concepts
    roff input files contain text interspersed with instructions to
    control the formatter.  Even in the absence of such instructions, a
    roff formatter still processes its input in several ways, by
    filling, hyphenating, breaking, and adjusting it, and supplementing
    it with inter-sentence space.  These processes are basic to
    typesetting, and can be controlled at the input document's
    discretion.

    When a device-independent roff formatter starts up, it obtains
    information about the device for which it is preparing output from
    the latter's description file (see groff_font(5)).  An essential
    property is the length of the output line, such as "6.5 inches".

    The formatter interprets plain text files employing the Unix line-
    ending convention.  It reads input a character at a time, collecting
    words as it goes, and fits as many words together on an output line
    as it can--this is known as filling.  To a roff system, a word is
    any sequence of one or more characters that aren't spaces, tabs, or
    newlines.  The exceptions separate words.
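    The filling process just described can be sketched in Python.  This is a minimal model, not groff's algorithm.  It assumes that every character occupies one cell of a fixed-width line and that words are joined by single spaces.  The names split_words and fill are illustrative only.

```python
import re

def split_words(text: str) -> list[str]:
    # A roff "word": one or more characters that are not spaces, tabs, or newlines.
    return re.findall(r"[^ \t\n]+", text)

def fill(text: str, line_length: int) -> list[str]:
    """Greedily pack words onto output lines of at most line_length cells.

    Assumes every character occupies one cell and adjacent words are
    separated by a single space; a word longer than line_length is given
    a line of its own (no hyphenation is attempted).
    """
    lines: list[str] = []
    current: list[str] = []
    width = 0
    for word in split_words(text):
        needed = width + 1 + len(word) if current else len(word)
        if current and needed > line_length:
            lines.append(" ".join(current))  # automatic break
            current, width = [word], len(word)
        else:
            current.append(word)
            width = needed
    if current:
        lines.append(" ".join(current))  # the end of input causes a break
    return lines

print(fill("The quick brown\tfox\njumps over", 10))
# → ['The quick', 'brown fox', 'jumps over']
```

    Tabs and newlines in the input separate words just as spaces do.  They do not survive into the filled output.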

    A roff formatter attempts to detect boundaries between sentences,
    and supplies additional inter-sentence space between them.  It flags
    certain characters (normally "!", "?", and ".") as potentially
    ending a sentence.  When the formatter encounters one of these end-
    of-sentence characters at the end of an input line, or one of them
    is followed by two (unescaped) spaces on the same input line, it
    appends an inter-word space followed by an inter-sentence space in
    the output.  The dummy character escape sequence \& can be used
    after an end-of-sentence character to defeat end-of-sentence
    detection on a per-instance basis.  Normally, the occurrence of a
    visible non-end-of-sentence character (as opposed to a space or tab)
    immediately after an end-of-sentence character cancels detection of
    the end of a sentence.  However, several characters are treated
    transparently after the occurrence of an end-of-sentence character.
    That is, a roff does not cancel end-of-sentence detection when it
    processes them.  This is because such characters are often used as
    footnote markers or to close quotations and parentheticals.  The
    default set is ", ', ), ], *, \[dg], \[dd], \[rq], and \[cq].  The
    last four are examples of special characters, escape sequences whose
    purpose is to obtain glyphs that are not easily typed at the
    keyboard, or which have special meaning to the formatter (like \).
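    Below is a simplified model of end-of-sentence detection at the end of an input line.  It assumes the default sentence-ending and transparent characters listed above.  The function name ends_sentence is hypothetical, and the case of two spaces within a line is not modeled.

```python
END_OF_SENTENCE = {".", "?", "!"}
# Default transparent characters.  Multi-character special characters are
# listed first so that, e.g., "\[dg]" is stripped whole rather than its "]".
TRANSPARENT = [r"\[dg]", r"\[dd]", r"\[rq]", r"\[cq]", '"', "'", ")", "]", "*"]

def ends_sentence(line: str) -> bool:
    """Report whether a text line ends a sentence at its end.

    Simplified model: trailing spaces are discarded, trailing transparent
    characters are skipped, and the last remaining character must be an
    end-of-sentence character.  Anything else at the end, including the
    dummy character \\&, cancels detection.
    """
    s = line.rstrip(" ")
    changed = True
    while changed:
        changed = False
        for t in TRANSPARENT:
            if s.endswith(t):
                s = s[: -len(t)]
                changed = True
                break
    return s[-1:] in END_OF_SENTENCE

print(ends_sentence('He said "stop."'), ends_sentence(r"Dr.\&"))  # → True False
```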

    When an output line is nearly full, it is uncommon for the next word
    collected from the input to exactly fill it--typically, there is
    room left over only for part of the next word.  The process of
    splitting a word so that it appears partially on one line (with a
    hyphen to indicate to the reader that the word has been broken) with
    its remainder on the next is hyphenation.  Hyphenation points can be
    manually specified; groff also uses a hyphenation algorithm and
    language-specific pattern files to decide which words can be
    hyphenated and where.  Hyphenation does not always occur even when
    the hyphenation rules for a word allow it; it can be disabled, and
    when not disabled there are several parameters that can prevent it
    in certain circumstances.
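    When a word does not fit, the hyphenation decision can be sketched as a simple rule.  Choose the last permitted hyphenation point whose prefix, plus the hyphen, fits in the remaining room.  The break offsets are assumed to be supplied already, standing in for manual points or pattern-file results.  hyphenate_to_fit is a hypothetical name, and the parameters that can suppress hyphenation are not modeled.

```python
from typing import Optional

def hyphenate_to_fit(word: str, points: list[int], room: int) -> Optional[tuple[str, str]]:
    """Split `word` at the latest permitted hyphenation point that fits.

    `points` are character offsets where a break is allowed; `room` is the
    number of cells left on the output line, one of which the hyphen uses.
    Returns (head_with_hyphen, remainder), or None if no point fits.
    """
    best = None
    for p in sorted(points):
        if 0 < p < len(word) and p + 1 <= room:  # +1 cell for the hyphen
            best = p
    if best is None:
        return None
    return word[:best] + "-", word[best:]

print(hyphenate_to_fit("typesetting", [4, 7], 6))  # → ('type-', 'setting')
```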

    Once an output line is full, the next word (or remainder of a
    hyphenated one) is placed on a different output line; this is called
    a break.  In this document and in roff discussions generally, a
    "break", if not further qualified, always refers to the termination
    of an output line.  When the formatter is filling text, it introduces
    breaks automatically to keep output lines from exceeding the
    configured line length.  After an automatic break, a roff formatter
    adjusts the line if applicable (see below), and then resumes
    collecting and filling text on the next output line.

    Sometimes, a line cannot be broken automatically.  This usually does
    not happen with natural language text unless the output line length
    has been manipulated to be extremely short, but it can with
    specialized text like program source code.  groff provides a means
    of telling the formatter where the line may be broken without
    hyphens.  This is done with the non-printing break point escape
    sequence \:.
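    The \: escape sequence can be modeled in a similar way.  The word may break at a marker, no hyphen is added, and the marker itself prints nothing.  This sketch assumes fixed-width cells, and split_at_break_point is a hypothetical helper.

```python
from typing import Optional

BREAK_POINT = "\\:"  # the non-printing break point escape sequence

def split_at_break_point(word: str, room: int) -> Optional[tuple[str, str]]:
    """Split `word` at the last \\: marker whose prefix fits in `room` cells.

    The markers print nothing, so they are removed from both halves; no
    hyphen is added.  Returns None when no marker yields a fitting prefix.
    """
    pieces = word.split(BREAK_POINT)
    for i in range(len(pieces) - 1, 0, -1):
        head = "".join(pieces[:i])
        if len(head) <= room:
            return head, "".join(pieces[i:])
    return None

print(split_at_break_point(r"/usr/\:share/\:groff/\:tmac", 12))
# → ('/usr/share/', 'groff/tmac')
```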

    There are several ways to cause a break at a predictable location.
    A blank input line not only causes a break, but by default it also
    outputs a one-line vertical space (effectively a blank output line).
    Macro packages may discourage or disable this "blank line method" of
    paragraphing in favor of their own macros.  A line that begins with
    one or more spaces causes a break.  The spaces are output at the
    beginning of the next line without being adjusted (see below).
    Again, macro packages may provide other methods of producing
    indented paragraphs.  Trailing spaces on text lines (see below) are
    discarded.  The end of input causes a break.

    After the formatter performs an automatic break, it may then adjust
    the line, widening inter-word spaces until the text reaches the
    right margin.  Extra spaces between words are preserved.  Leading
    and trailing spaces are handled as noted above.  Text can be aligned
    to the left or right margin only, or centered, using requests.
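    Adjustment can be sketched as spreading the unused width of a line across its inter-word gaps.  This sketch assumes fixed-width cells and gives leftover cells to the leftmost gaps, though a real formatter may distribute them differently.  Per the text above, only lines ended by an automatic break would be adjusted.  The name adjust is illustrative.

```python
def adjust(words: list[str], line_length: int) -> str:
    """Widen inter-word gaps so the line reaches line_length exactly.

    Assumes one cell per character; leftover cells go to the leftmost gaps.
    """
    if len(words) < 2:
        return " ".join(words)  # no gap to widen
    gaps = len(words) - 1
    slack = line_length - sum(len(w) for w in words)
    if slack < gaps:
        raise ValueError("words do not fit on the line")
    base, extra = divmod(slack, gaps)
    out = words[0]
    for i, w in enumerate(words[1:]):
        out += " " * (base + (1 if i < extra else 0)) + w
    return out

print(repr(adjust(["a", "b", "c"], 6)))  # → 'a  b c'
```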

    A roff formatter translates horizontal tab characters, also called
    simply "tabs", in the input into movements to the next tab stop.
    These tab stops are by default located every half inch measured from
    the current position on the input line.  With them, simple tables
    can be made.  However, this method can be deceptive, as the
    appearance (and width) of the text in an editor and the results from
    the formatter can vary greatly, particularly when proportional
    typefaces are used.  A tab character does not cause a break and
    therefore does not interrupt filling.  The formatter provides
    facilities for sophisticated table composition; there are many
    details to track when using the "tab" and "field" low-level
    features, so most users turn to the tbl(1) preprocessor for table
    construction.
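    Some tab stop arithmetic shows why tab-separated tables can be deceptive.  In this sketch, stops lie every half inch from the start of the line.  The resolution of 1000 basic units per inch and the column widths are hypothetical.  The sketch also assumes that a tab at an exact stop advances to the following one.

```python
def next_tab_stop(position: int, interval: int) -> int:
    """First tab stop strictly to the right of `position` (basic units).

    Stops are assumed to lie at every multiple of `interval`, measured from
    the start of the line.
    """
    return (position // interval + 1) * interval

RES = 1000              # hypothetical basic units per inch
HALF_INCH = RES // 2    # default stop spacing described above

# Hypothetical rendered widths of first-column entries in a proportional font.
for label, width in [("Name", 400), ("Description", 700)]:
    print(label, next_tab_stop(width, HALF_INCH))
# → Name 500
# → Description 1000
```

    "Description" is wider than half an inch, so its second field starts at the second stop.  The second field after "Name" starts at the first stop.  As a result, the columns do not line up, even though each row holds exactly one tab.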

  Requests and macros
    A request is an instruction to the formatter that occurs after a
    control character, which is recognized at the beginning of an input
    line.  The regular control character is a dot ".".  Its counterpart,
    the no-break control character, a neutral apostrophe "'", suppresses
    the break implied by some requests.  These characters were chosen
    because it is uncommon for lines of text in natural languages to
    begin with them.  If you require a formatted period or apostrophe
    (closing single quotation mark) where the formatter is expecting a
    control character, prefix the dot or neutral apostrophe with the
    dummy character escape sequence, "\&".

    An input line beginning with a control character is called a control
    line.  Every line of input that is not a control line is a text
    line.

    Requests often take arguments, words (separated from the request
    name and each other by spaces) that specify details of the action
    the formatter is expected to perform.  A request that needs
    arguments is typically ignored when given none.  Of key importance are
    the requests that define macros.  Macros are invoked like requests,
    enabling the request repertoire to be extended or overridden.
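    The following sketch recognizes control lines and splits a request into its name and arguments.  It assumes the default control characters and that only spaces separate arguments.  Quoting, escape sequences, and changed control characters are ignored, and parse_line is a hypothetical name.

```python
from typing import Optional

CONTROL_CHARS = {".": "control", "'": "no-break control"}

def parse_line(line: str) -> tuple[str, Optional[str], list[str]]:
    """Classify an input line; for control lines, split out name and arguments.

    Assumes the default control characters and space-separated arguments.
    """
    kind = CONTROL_CHARS.get(line[:1])
    if kind is None:
        return "text", None, [line]   # every other line is a text line
    fields = line[1:].split()
    if not fields:
        return kind, "", []           # a control character alone
    return kind, fields[0], fields[1:]

print(parse_line(".sp 2"))  # → ('control', 'sp', ['2'])
```

    A line such as \&.5 inches is classified as a text line, because the dummy character precedes the dot.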

    A macro can be thought of as an abbreviation you can define for a
    collection of control and text lines.  When the macro is called by
    giving its name after a control character, it is replaced with what
    it stands for.  The process of textual replacement is known as
    interpolation.  Interpolations are handled as soon as they are
    recognized, and once performed, a roff formatter scans the
    replacement for further requests, macro calls, and escape sequences.

    In roff systems, the "de" request defines a macro.
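    Interpolation with rescanning can be sketched as a recursive expansion over control lines.  The macro table stands in for definitions made with "de", and the macro names and bodies are hypothetical.  Arguments are not supported, and the depth limit is an arbitrary safeguard.

```python
def expand(lines: list[str], macros: dict[str, list[str]], depth: int = 0) -> list[str]:
    """Interpolate macro calls, rescanning each replacement for further calls.

    Only names present in `macros` are expanded; requests and text lines
    pass through unchanged.  Macro arguments are ignored in this sketch.
    """
    if depth > 50:  # arbitrary guard against a macro that calls itself
        raise RecursionError("macro nesting too deep")
    out: list[str] = []
    for line in lines:
        fields = line[1:].split() if line[:1] in (".", "'") else []
        if fields and fields[0] in macros:
            # Replace the call with the macro body, then rescan that body.
            out.extend(expand(macros[fields[0]], macros, depth + 1))
        else:
            out.append(line)
    return out

# Hypothetical macros, as if defined with "de".
macros = {"P": [".sp", ".ti 0.5i"], "HD": [".P", "Heading"]}
print(expand([".HD", "Body text."], macros))
# → ['.sp', '.ti 0.5i', 'Heading', 'Body text.']
```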

  Page geometry
    roff systems format text under certain assumptions about the size of
    the output medium, or page.  For the formatter to correctly break a
    line it is filling, it must know the line length, which it derives
    from the page width.  For it to decide whether to write an output
    line to the current page or wait until the next one, it must know
    the page length.  A device's resolution converts practical units
    like inches or centimeters to basic units, a convenient length
    measure for the output device or file format.  The formatter and
    output driver use basic units to reckon page measurements.  The
    device description file defines its resolution and page dimensions
    (see groff_font(5)).
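    Converting a measurement into basic units is a multiplication by the device resolution.  This sketch covers only a few scaling units (inches, centimeters, points, and picas) and uses exact fractions to avoid rounding drift.  The resolutions passed to it are hypothetical, because a real device declares its own in its description file.

```python
from fractions import Fraction

# Inches per scaling unit: i (inch), c (centimeter), p (point), P (pica).
INCHES_PER_UNIT = {
    "i": Fraction(1),
    "c": Fraction(50, 127),  # 1 cm = 1/2.54 in
    "p": Fraction(1, 72),
    "P": Fraction(1, 6),
}

def to_basic_units(value: str, unit: str, resolution: int) -> int:
    """Convert a measurement such as ("6.5", "i") to basic units."""
    return round(Fraction(value) * INCHES_PER_UNIT[unit] * resolution)

print(to_basic_units("6.5", "i", 72000))  # → 468000
```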

    A page is a two-dimensional structure upon which a roff system
    imposes a rectangular coordinate system with its upper left corner
    as the origin.  Coordinate values are in basic units and increase
    down and to the right.  Useful ones are therefore always positive
    and within numeric ranges corresponding to the page boundaries.

    While the formatter (and, later, output driver) is processing a
    page, it keeps track of its drawing position, which is the location
    at which the next glyph will be written, from which the next motion
    will be measured, or where a geometric primitive will commence
    rendering.  Notionally, glyphs are drawn from the text baseline
    upward and to the right.  (groff does not yet support right-to-left
    scripts.)  The text baseline is a (usually invisible) line upon
    which the glyphs of a typeface are aligned.  A glyph therefore
    "starts" at its bottom-left corner.  If drawn at the origin, a
    typical letter glyph would lie partially or wholly off the page,
    depending on whether, like "g", it features a descender below the
    baseline.

    Such a situation is nearly always undesirable.  It is furthermore
    conventional not to write or draw at the extreme edges of the page.
    Therefore the initial drawing position of a roff formatter is not at
    the origin, but below and to the right of it.  This rightward shift
    from the left edge is known as the page offset.  (groff's terminal
    output devices have page offsets of zero.)  The downward shift
    leaves room for a text output line.

    Text is arranged on a one-dimensional lattice of text baselines from
    the top to the bottom of the page.  Vertical spacing is the distance
    between adjacent text baselines.  Typographic tradition sets this
    quantity to 120% of the type size.  The initial vertical drawing
    position is one unit of vertical spacing below the page top.
    Typographers term this unit a vee.

    Vertical spacing has an impact on page-breaking decisions.
    Generally, when a break occurs, the formatter moves the drawing
    position to the next text baseline automatically.  If the formatter
    were already writing to the last line that would fit on the page,
    advancing by one vee would place the next text baseline off the
    page.  Rather than let that happen, roff formatters instruct the
    output driver to eject the page, start a new one, and again set the
    drawing position to one vee below the page top; this is a page
    break.
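    A little arithmetic connects type size, vertical spacing, and page breaks.  This sketch makes several assumptions: a resolution of 72 basic units per inch (so one unit is a point), an 11-inch page, and 10-point type.  It also assumes a baseline fits if it is not below the page bottom.

```python
def vee_for(type_size: float) -> int:
    """Vertical spacing at the traditional 120% of the type size, rounded."""
    return round(type_size * 1.2)

def baselines(page_length: int, vee: int) -> list[int]:
    """Positions (from the page top) of the text baselines that fit on a page.

    Starts one vee below the top and advances one vee per break.  When the
    next advance would pass the bottom, a formatter breaks the page instead.
    """
    positions: list[int] = []
    y = vee  # initial drawing position: one vee below the page top
    while y <= page_length:
        positions.append(y)
        y += vee
    return positions

lines = baselines(11 * 72, vee_for(10))  # 11-inch page, 10-point type, in points
print(len(lines), lines[0], lines[-1])  # → 66 12 792
```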

    When the last line of input text corresponds to the last output line
    that fits on the page, the break caused by the end of input will
    also break the page, producing a useless blank one.  Macro packages
    keep users from having to confront this difficulty by setting
    "traps"; moreover, all but the simplest page layouts tend to have
    headers and footers, or at least bear vertical margins larger than
    one vee.

  Other language elements
    Escape sequences start with the escape character, a backslash \, and
    are followed by at least one additional character.  They can appear
    anywhere in the input.

    With requests, the escape and control characters can be changed;
    further, escape sequence recognition can be turned off and back on.

    Strings store character sequences.  In groff, they can be
    parameterized as macros can.

    Registers store numerical values, including measurements.  The
    latter are generally in basic units; scaling units can be appended
    to numeric expressions to clarify their meaning when stored or
    interpolated.  Some read-only predefined registers interpolate text.

    Fonts are identified either by a name or by a mounting position (a
    non-negative number).  Four styles are available on all devices.  R
    is "roman": normal, upright text.  B is bold, an upright typeface
    with a heavier weight.  I is italic, a face that is oblique on
    typesetter output devices and usually underlined instead on terminal
    devices.  BI is bold-italic, combining both of the foregoing style
    variations.  Typesetting devices group these four styles into
    families of text fonts; they also typically offer one or more
    special fonts that provide unstyled glyphs; see groff_char(7).

    groff supports named colors for glyph rendering and drawing of
    geometric primitives.  Stroke and fill colors are distinct; the
    stroke color is used for glyphs.

    Glyphs are visual representation forms of characters.  In groff, the
    distinction between those two elements is not always obvious (and a
    full discussion is beyond our scope).  In brief, "A" is a character
    when we consider it in the abstract: to make it a glyph, we must
    select a typeface with which to render it, and determine its type
    size and color.  The formatting process turns input characters into
    output glyphs.  A few characters commonly seen on keyboards are
    treated specially by the roff language and may not look correct in
    output if used unthinkingly; they are the (double) quotation mark
    ("), the neutral apostrophe ('), the minus sign (-), the backslash
    (\), the caret or circumflex accent (^), the grave accent (`), and
    the tilde (~).  All of these and more can be produced with special
    character escape sequences; see groff_char(7).

    groff offers streams, identifiers for writable files, but for
    security reasons this feature is disabled by default.

    A further few language elements arise as page layouts become more
    sophisticated and demanding.  Environments collect formatting
    parameters like line length and typeface.  A diversion stores
    formatted output for later use.  A trap is a condition on the input
    or output, tested automatically by the formatter, that is associated
    with a macro, calling it when that condition is fulfilled.

    Footnote support often exercises all three of the foregoing
    features.  A simple implementation might work as follows.  A pair of
    macros is defined: one starts a footnote and the other ends it.  The
    author calls the first macro where a footnote marker is desired.
    The macro establishes a diversion so that the footnote text is
    collected at the place in the body text where its corresponding
    marker appears.  An environment is created for the footnote so that
    it is set in a smaller type size.  The footnote text is formatted in
    the diversion using that environment, but it does not yet appear in
    the output.  The document author calls the footnote end macro, which
    returns to the previous environment and ends the diversion.  Later,
    after much more body text in the document, a trap, set a small
    distance above the page bottom, is sprung.  The macro called by the
    trap draws a line across the page and emits the stored diversion.
    Thus, the footnote is rendered.
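    The footnote mechanism described above can be sketched as a small event loop.  Everything here is hypothetical.  Events stand in for the footnote start and end macro calls, and a two-level environment stack holds only a type size.  The trap is set at a body line count rather than at a vertical position.  As in a naive implementation, a footnote begun after the trap has sprung is never emitted.

```python
def format_page(events: list[tuple[str, str]], trap_line: int) -> list[tuple[int, str]]:
    """Return (type_size, text) output items for one page.

    events: ("text", s), ("fn_start", ""), or ("fn_end", "").
    trap_line: body line count at which the footnote trap springs.
    """
    envs = [{"size": 10}]        # environment stack; envs[-1] is current
    diversion = None             # a list while footnote text is diverted
    stored: list[tuple[int, str]] = []  # diverted output awaiting the trap
    out: list[tuple[int, str]] = []
    body = 0
    sprung = False
    for kind, text in events:
        if kind == "fn_start":
            envs.append({"size": 8})     # new environment: smaller type
            diversion = []               # collect instead of outputting
        elif kind == "fn_end":
            stored.extend(diversion)     # end the diversion, keep its contents
            diversion = None
            envs.pop()                   # return to the previous environment
        elif diversion is not None:
            diversion.append((envs[-1]["size"], text))
        else:
            out.append((envs[-1]["size"], text))
            body += 1
        if not sprung and body >= trap_line:
            sprung = True                # the trap macro runs once per page:
            out.append((envs[-1]["size"], "____"))  # draw a rule...
            out.extend(stored)           # ...and emit the stored diversion
            stored = []
    return out

events = [("text", "Body line one*"), ("fn_start", ""), ("text", "* A footnote."),
          ("fn_end", ""), ("text", "Body line two"), ("text", "Body line three")]
print(format_page(events, 3))
```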

History
[...]

The "History" section is only about one third of the page by line count.
Significant, but not even a majority of the content.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?64279>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



