.ig
make : ./busgrap.pl NewGropdf.trf | groff -Tpdf -ms -dpaper=a4 -P-e > NewGropdf.pdf
..
.
.nr PO 2c
.nr HM 2c
.nr FM 1c
.nr LL 17c
.ds CH Gropdf Enhancements
.RP
.TL
Gropdf Enhancements
.AU
Deri James
.AB
I have made a number of changes, mainly to gropdf, and they need to be tested before they are released into the wild.
These changes can be tested by checking out the git branch "deri-gropdf-ng" before building.
.AE
.
.defcolor ink rgb #2a586f
.defcolor hd rgb #568da3
.fam H
.de CODE
.fl
\M[cornsilk]\c
.nr rgt \\n[.o]u+\\n[.l]u
.pdfbackground boxfill \\n[.o]z 0 \\n[rgt]z 29.7c
\M[]\fB\c
.in 5p
.ll -10p
.fam U-C
.ps 9.7
.vs 10
.sp .5
.gcolor ink
.nf
..
.de ENDCODE
.fi
.fam
.ft P
.ps
.vs
.sp -.3
.pdfbackground off
.ll +10p
.in -5p
.sp 5p
.gcolor
.fl
..
.de HD1
.sp .8
.ps +2
.gcolor hd
.pdfbookmark 1 \\$*
.SH 1
\fB\\$*\fP
.LP
.gcolor
.ps -2
.sp .2
..
.de HD2
.sp .2
.ps +1
.gcolor hd
.pdfbookmark 2 \\$*
.SH 2
\\$*
.LP
.gcolor
.ps -1
.sp .2
..
.HD1 The groff PDF output driver
.LP
Gropdf has been enhanced in a number of ways:-
.LP
.IP \[bu] 3n
Fonts can be subsetted to include only the glyphs required to render the document.
This can substantially reduce the size of the finished pdf file if very large fonts have been used.
.IP \[bu] 3n
Gropdf now produces pdfs which conform to the PDF 1.7 standard (ISO 32000-1).
This format also reduces the size of the pdf file.
.IP \[bu] 3n
A new macro, ".pdfpagenumbering", controls the page numbers shown in the overview panel.
It is common for a document to start with roman numbering and then reset to decimal 1 for the body of the document; this can now be reflected in the overview panel as well.
.IP \[bu] 3n
When multiple man pages are bundled into a collection (such as groff-man-pages.pdf or LinuxManPages.pdf), links between the pages in the collection are clickable.
.IP \[bu] 3n
Unicode support for bookmarks has been implemented.
Bookmarks which include Unicode characters are stored as UTF-16 strings and are displayed correctly in the overview panel.
.IP \[bu] 3n
A new environment variable, GROPDF_OPTIONS, can be used to supply default options to gropdf.
.LP
.HD2 Font Subsetting
Deconstructing and subsetting very large fonts is slow in a scripted language, so there is an option to use a routine written in C rather than Perl.
To use the fast version, install the Perl module Inline::C and gropdf will use the faster routine.
If you use fonts with more than 1000 glyphs, installing Inline::C is recommended.
The module is widely available, usually packaged as perl-Inline-C.rpm or libinline-c-perl.deb.
.LP
If you use a large font without having installed Inline::C, you will see this warning:-
.CODE
\&./gropdf:groff.7: warning: Font 'SauceHanSansJP-R (GR)' has 18482 glyphs
You would see a noticeable speedup if you install the perl module Inline::C
.ENDCODE
This can be ignored if you are happy to use the pure Perl version.
.LP
The first test document is a 17-page Japanese document which uses 3 fonts, each of which has over 18,000 glyphs.
The second is the 375-page groff-man-pages.pdf document, which uses the 35 standard fonts, all of them embedded.
The relative timings (in seconds) are:-
.LP
.sp .5
.BGS graph
Frame:9.5c 4.3c
PS:10
Caption:Inline::C v. Pure Perl
Origin:0 .1c 6c 4c
KeyBox:y
BoxFrame:6.5c 1c 1.5c 2c
HGrid:y
VGrid:y
BGCol:grey90
WallCol:grey95
Floor:0
Steps:7
Just:centre
Flow:n
SER:X Japanese GroffMan
Ser:1
BAR:Inline::C 2.1 8.46
Ser:2
BAR:Pure Perl 9.6 9.79
Ser:3
BAR:Gropdf 1.22.4 . 8.67
Ser:4
BAR:Grops+ps2pdf 32.1 .
PDFBookMark:3
.BGE
.LP
Gropdf 1.22.4 is unable to produce the Japanese document, since it cannot cope with more than 255 glyphs in a font.
The Grops+ps2pdf figure is the total time taken to create the pdf by that route; the result has no overview panel, because UTF-16 bookmarks are not supported.
.LP
Inline::C creates a work directory called "_Inline" in your home directory.
If you would prefer it to be somewhere else, set the environment variable PERL_INLINE_DIRECTORY to a writable directory.
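.LP
To illustrate what subsetting means, here is a minimal Python sketch.
It is not gropdf\[aq]s implementation (gropdf is written in Perl, with an optional C routine via Inline::C); it models a font as nothing more than a mapping from glyph names to glyph data, which is an assumption of this example, and keeps only the glyphs the document uses, plus .notdef.
.CODE
```python
# Minimal, hypothetical model of font subsetting; not gropdf's code.
# A "font" here is just a dict mapping glyph names to glyph data.


def glyphs_used(pages):
    """Collect every glyph name used on any page.

    Each page is modelled as a list of glyph names.
    """
    used = set()
    for page in pages:
        used.update(page)
    return used


def subset_font(font, used):
    """Return a copy of font holding only the glyphs in used.

    The .notdef glyph is always kept, because font formats such as
    Type 1 require it.
    """
    keep = set(used) | {".notdef"}
    return {name: data for name, data in font.items() if name in keep}


# A toy 5-glyph font; the 2-page document uses only A, B and C.
font = {".notdef": b"nd", "A": b"a", "B": b"b", "C": b"c", "D": b"d"}
pages = [["A", "B"], ["B", "C"]]
small = subset_font(font, glyphs_used(pages))
print(sorted(small))  # prints ['.notdef', 'A', 'B', 'C']
```
.ENDCODE
Only the kept glyphs are embedded, so the saving is largest when a document uses a small part of a very large font.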
.LP
.HD2 More compact pdf - PDF 1.7
The major advantage of font subsetting is a reduction in the overall size of the generated pdf file.
This graph shows the file sizes for the 375-page groff-man-pages.pdf.
.LP
.sp .5
.BGS graph
Frame:14c 4.3c
PS:10
VS:15
Caption:File Sizes
Origin:5c 0c 8.5c 4c
KeyBox:y
BoxFrame:1c .4c 8c 4c
BoxHeads:Type Size (Mb)
BoxLabels:$text $value
BoxTabs:0cL 3.2cR
Horiz:y
Sync:y
HGrid:y
VGrid:y
BGCol:grey90
WallCol:grey95
Floor:0
Just:centre
Flow:n
SER:X GroffMan \N'40'Mb)
Ser:1 1.5 1.5 Subset 1.5
Ser:2 3.9 3.9 No Subset 3.9
Ser:3 1.7 1.7 PDF1.4 1.7
Ser:4 1.1 1.1 No embed 1.1
Ser:5 5.8 5.8 Gropdf 1.22.4 5.8
PDFBookMark:3
.BGE
.LP
Starting from the top, bars 2 and 5 did not use font subsetting.
The difference between bars 1 and 3 is due to the more compact PDF 1.7 format: the third bar uses this version of gropdf, but with a flag requesting PDF 1.4 output.
Bar 4 produces the smallest file, but no fonts are embedded at all, so the pdf viewer will choose whichever font it considers best, which can produce strange results.
.LP
The default for gropdf is to subset fonts, to use the PDF 1.7 format, and not to embed the standard fonts.
It is recommended to always embed all fonts with the -P-e flag to ensure faithful reproduction of the document.
.LP
.HD2 New macro .pdfpagenumbering
.B "\[rs]X\[aq]pdf: pagenumbering\~"
.I "type prefix start" \[aq]
.RS
This is used to control the page numbering shown in the pdf reader\[aq]s outline pane, which contains your bookmarks.
Normally the page number shown against a bookmark is the physical page number in the file, but this may not match the page numbering styles used within the document.
.LP
In a single document there may be a cover sheet (which has no page number), a TOC (which uses lowercase roman numbers), and the main body of the document (which has decimal page numbers).
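.LP
The labels a viewer shows for such a document can be modelled in a few lines of Python.
This sketch is illustrative only, not gropdf\[aq]s code: it assumes each numbering change is recorded as a range (first physical page, style, prefix, start number), that the style letters are the PDF page-label styles D, R, r, A and a, and that the alphabetic styles follow the PDF specification (A to Z, then AA to ZZ, then AAA and so on).
.CODE
```python
# Illustrative model of PDF page labels; not gropdf's implementation.
# Styles: "D" decimal, "R"/"r" upper/lower roman,
# "A"/"a" upper/lower letters, None for no number (prefix only).


def roman(n):
    """Upper-case roman numeral for n >= 1."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    out = ""
    for value, symbol in numerals:
        while n >= value:
            out += symbol
            n -= value
    return out


def letters(n):
    """A..Z for 1-26, then AA..ZZ for 27-52, then AAA.., and so on."""
    letter = chr(ord("A") + (n - 1) % 26)
    return letter * ((n - 1) // 26 + 1)


def label(style, prefix, number):
    """Label for one page: the prefix followed by the number in style."""
    if style == "D":
        text = str(number)
    elif style == "R":
        text = roman(number)
    elif style == "r":
        text = roman(number).lower()
    elif style == "A":
        text = letters(number)
    elif style == "a":
        text = letters(number).lower()
    else:
        text = ""
    return prefix + text


def page_labels(ranges, page_count):
    """ranges: (first_page, style, prefix, start) tuples, sorted by
    first_page; the first range must start on physical page 1."""
    labels = []
    for page in range(1, page_count + 1):
        # The last range that begins on or before this page applies.
        first, style, prefix, start = [r for r in ranges if r[0] <= page][-1]
        labels.append(label(style, prefix, start + page - first))
    return labels


# A cover, two TOC pages in lower-case roman, then a decimal body.
ranges = [(1, None, "Cover", 1), (2, "r", "", 1), (4, "D", "", 1)]
print(page_labels(ranges, 6))
# prints ['Cover', 'i', 'ii', '1', '2', '3']
```
.ENDCODE
Each page takes its label from the last range starting on or before it, which is why the numbering only needs to be specified where it changes.
.LP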
Use this command somewhere on the page where the numbering system changes.
Once set, the numbers increment automatically until the numbering system changes again, so don\[aq]t call it for every page, only when you want to change the numbering.
.LP
The parameters are:-
.IP \fItype\fP 7n
This specifies the type of numbering to use from this page onward.
It should be one of
.B "\[lq]Decimal | Roman | roman | Alpha | alpha\[rq]" .
Only the initial letter is relevant.
The alphabetic numbering systems use A to Z for the first 26 pages, then (as defined by the PDF specification for page labels) AA to ZZ for the next 26, and so on.
The
.I type
may also be
.B . \[rq] \[lq]
which means that no numbering system is chosen, but you may still provide a
.I prefix
to give the pages a custom name (such as "Cover").
.IP \fIprefix\fP
Provides a string to insert before the number.
If the document has an appendix with page numbers in the form A-\c
.I n ,
the
.I prefix
would be set to \[lq]A-\[rq] and the
.I type
would be
.B Decimal .
.
.IP \fIstart\fP
Gives the start number for the incrementing page numbers in the outline pane.
If no value is given for
.I start ,
it defaults to 1, which is usually correct.
.LP
The convenience macro for this command is
.B .pdfpagenumbering "" \[lq]
.I "type prefix start" \[rq],
using \[lq].\[rq] in place of any omitted parameter that precedes one you do supply, or just
.B .pdfpagenumbering \[rq] \[lq]
on its own to have no page numbers shown in the outline pane.
.RE
.LP
The pdf viewer is aware of these "virtual" page numbers and should show the virtual page number as well as the physical page number.
Also, if you enter a page number to jump to, the viewer uses the virtual page number, so entering "i" may jump to the first page of the TOC if you have used roman numbering for the TOC.
.LP
.HD2 Man page collections
.LP
If you bundle multiple man pages into a single groff run, the output is a single file containing all the man pages, and any links to other pages in the collection are clickable.
.LP
.HD2 Unicode bookmarks
.LP
Preconv (which is called by including -k on the groff command line) converts UTF-8 characters to \e[uXXXX] escape sequences.
When these are passed to .pdfbookmark or \&.pdfinfo, they are converted to UTF-16 characters.
.LP
.HD2 GROPDF_OPTIONS
.LP
A new environment variable, GROPDF_OPTIONS, is read before the command-line options are parsed.
So you could instruct gropdf to produce only PDF 1.4 format files by including
export GROPDF_OPTIONS="--pdfver=1.4"
in your shell login script.
Do not include -P as part of the flags.
.LP
.HD1 Other Changes
.LP
.pdfbookmark 2 GMPfront.t
The new file "GMPfront.t" is the front page of groff-man-pages.pdf; it also controls the overview page numbering so that the front page has no page number and the first man page is on page 1.
It is used only in the build of groff-man-pages.pdf, so it should not be installed.
.LP
.pdfbookmark 2 doc.am
Changes to "doc/doc.am" use "pdfmom --roff" rather than plain groff to produce groff-man-pages.pdf, since this takes care of the forward references in the document.
.LP
.pdfbookmark 2 gropdf.1.man
The man page "src/devices/gropdf/gropdf.1.man" now documents that gropdf is no longer limited in the number of glyphs it can use from a single font, and describes the new feature which controls the appearance of page numbers in the overview panel.
.LP
.pdfbookmark 2 pdfmom.pl
Changes to "src/devices/gropdf/pdfmom.pl" have generalised its use.
Previously it was only useful for generating documents which used the mom macros; now, if you pass the --roff flag, it can be used with any macro set.
To satisfy forward references in an ms file you could use "pdfmom --roff -ms" to create a pdf.
Further, if you set up a symbolic link to pdfmom whose name is "pdf" followed by the macro package name (such as "pdfms"), you don\[aq]t need to include "-ms" on the command line when you use this command.
Since this command only produces pdfs, there is no need to include -Tpdf or \%-mpdfmark.
.LP
.pdfbookmark 2 afmtodit.pl
An extra column has been added to the output of "src/utils/afmtodit/afmtodit.pl": the 4-digit hex code of the character, as determined from the AGL_to_unicode mapping table.
This information is needed to support UTF-16 characters in the overview panel, because for certain special characters, such as \e(em, it is not possible to derive the Unicode code point from the existing information.
The extra column has no impact on grops, which ignores it.
It is also useful when the groff name for the character is a composite, such as the glyph Hcircumflex, which groff knows as \e[u0048_0302] but whose actual Unicode code point is U+0124.
.LP
.pdfbookmark 2 an.tmac
Changes to the .MR macro in "tmac/an.tmac" allow man page references to be treated as clickable links, but only if they point to a man page which is part of the same collection.
Single man pages never have links in them.
This is what enables the links between pages in groff-man-pages.pdf.
.LP
.pdfbookmark 2 anmark.tmac
The file "tmac/anmark.tmac" is another support file for creating groff-man-pages.pdf, like GMPfront.t above.
It is used only in the build and should not be installed.
.LP
.pdfbookmark 2 papersize.tmac
If the flag -dpaper= is given to groff together with -Tpdf, the selected paper size is also passed to gropdf.
If you also pass the -P-p flag to groff, it takes precedence.
This allows a different media size to be used for the pdf than groff uses for typesetting, should you require it.
.LP
.pdfbookmark 2 pdf.tmac
A new macro, .pdfpagenumbering, has been added to "tmac/pdf.tmac"; it controls the page numbers shown in the overview panel.
The text used in bookmarks is no longer "cleaned" using .asciify; the raw text is now passed to gropdf, where it is cleaned and may be converted to UTF-16.
.LP
.pdfbookmark 2 pseudo glyphs
Preparation has been made for replacing the pseudo-slanted lowercase Greek characters (used in equations) with real glyphs, i.e.
providing a Symbol-Slanted pfb font for gropdf.
At the moment it requires the following intervention to make it work with test-groff:-
.LP
Copy the files SS and StandardSymSL.pfb from the source directory to the build directory.
.LP
Add the following line to the download file:-
.LP
Symbol-Slanted ./StandardSymSL.pfb
.LP
This should make the output of equations using -Tpdf exactly match the PostScript equivalent, because it allows the subtle italic corrections to be applied.
It requires integration into the groff build system; see the relevant git log entry.
.bp
.HD1 Experimental gropdf flag --opt=
.LP
During development I introduced an option flag which turns some pieces of the new code on or off.
The flag can be set by passing -P--opt=n on the command line, where "n" is a number whose bits control the following:-
.LP
1 = SUBSET: subset the fonts; if unset, the entire font is included, as before.
(If someone has a document using a font which produces a bad pdf that can\[aq]t be viewed, repeating with this bit unset will show whether the problem is with the subsetting.
Also, if they send this pdf together with a source file, I will be able to extract the actual fonts and use them to diagnose and fix the problem.)
.LP
2 = USESPACE: use a space character in text to separate words; if unset, use a horizontal motion.
The difference is this:-
.CODE
[(This is groff)] TJ    % using space
.sp .5
or
.sp .5
[(This) -250.000 (is) -250.000 (groff)] TJ    % using horizontal motion
.ENDCODE
The first method is preferable since it is more compact, but there are some fonts where it can\[aq]t be used.
Usually this is because there is no space character in the font, or it is called something else (I\[aq]ve seen u0020 and u00a0).
If gropdf uses the compact method in this situation, spaces are likely to be shown as the .notdef glyph.
Gropdf should detect which fonts can\[aq]t use the compact format and adjust automatically, but this bit can be unset to force gropdf always to use horizontal motion.
It can also be used to check whether the width of the space glyph used by the pdf viewer is the same as the width groff uses.
.LP
4 = COMPRESS: compress the pdf instructions; when unset, the instructions (such as those above) are not compressed, so they can be viewed in a text editor.
.LP
8 = NOFILE: if set, no fonts are embedded in the pdf, even if the download file says they must be!
This is not particularly useful, but it does produce the smallest files.
.LP
The default value is 7 (SUBSET | USESPACE | COMPRESS).
.LP
I don\[aq]t intend to release with the --opt= flag.
It may be better to drop some of these bits, to provide separate flags for others, or to add them to the -d flag, since some of them are definitely for debugging!
.LP
If, while using the new version of gropdf, you notice a problem with a particular font, which may manifest itself as the viewer showing an error message (mupdf gives helpful diagnostics) or the document using the wrong font, please run the document with the flag -P--opt=6 to check whether the problem is related to font subsetting.
If this "fixes" the issue, I would be very grateful for a bug report containing a sample source file plus the two pdfs, with and without -P--opt=6, rather than just the incorrect version.
.LP
.HD1 Known Issues
.LP
I am aware of the following issues.
.LP
.HD2 pdfmark destinations
.LP
The named destinations, i.e. the
.I name
in \[lq]\&.HEADING n NAMED name "text"\[rq], must not contain special characters, such as Unicode characters converted by preconv.
The heading
.I text
may contain Unicode and any other special characters.
If you see warnings such as \[lq]warning: special character \[aq]u0413\[aq] not defined\[rq], the pdf may not be affected at all, but you may be able to suppress the message by adding an -f flag to the command line, specifying a font family which includes the missing Unicode glyphs.
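.LP
If your headings are generated by a script, a quick check of each destination name can catch the problem before troff does.
The Python sketch below is only an illustration; the set of characters it treats as safe (ASCII letters, digits, underscore and hyphen) is an assumption made for this example, not a rule taken from mom or gropdf.
.CODE
```python
import re

# Characters accepted in a destination name by this sketch (an assumption).
SAFE_NAME = re.compile("[A-Za-z0-9_-]+")


def check_destination(name):
    """Return a list of problems with a proposed destination name."""
    non_ascii = sorted({ch for ch in name if ord(ch) > 127})
    if non_ascii:
        codes = ", ".join("u%04X" % ord(ch) for ch in non_ascii)
        return ["non-ASCII characters: " + codes]
    if not SAFE_NAME.fullmatch(name):
        return ["characters outside [A-Za-z0-9_-]"]
    return []


print(check_destination("Russian"))   # prints []
print(check_destination("Гуля"))
```
.ENDCODE
The u0413-style names it reports match the special character names used in the troff warning quoted above.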
.LP
If you do use Unicode for the destination name, such as:-
.CODE
\&.HEADING 1 NAMED Гуляйпольщина "Гуляйпольщина"
.ENDCODE
it may work, but you will see this error:-
.CODE
troff::32: error: bad string definition
.ENDCODE
and the result may not be quite right!
.bp
.HD2 expandos
.LP
The use of "expandos" (as explained in the document "Producing PDFs with groff and mom") is not supported if you are using Unicode for the text; you have to insert the text yourself.
So code such as:-
.CODE
\&.HEADING 1 NAMED Russian "Гуляйпольщина"
\&...
\&.PDF_LINK Russian PREFIX ( SUFFIX ) "see: +"
\&                                          ^
\&                                          |
\&                                       expando
.ENDCODE
will cause this error:-
.CODE
troff: ../src/roff/troff/input.cpp:523: static int input_stack::finish_get():
\&    Assertion `level == 0' failed.
groff: error: troff: Aborted (core dumped)
.ENDCODE
Surprisingly, the error occurs not at the point where the expando is used, but at the end of the document.
To avoid the problem until it is fixed, use this instead:-
.CODE
\&.PDF_LINK Russian PREFIX ( SUFFIX ) "see: \e[dq]8. Гуляйпольщина\e[dq]"
.ENDCODE
Here the text in the link is a copy of the text used when the destination "Russian" was created.
If the text does not contain Unicode characters, expandos work normally.