Re: [Groff] pdfmom grep (was parallel text processing)


From: Peter Schaffter
Subject: Re: [Groff] pdfmom grep (was parallel text processing)
Date: Sat, 9 Sep 2017 17:47:46 -0400
User-agent: Mutt/1.5.24 (2015-08-30)

Ralph --

On Sat, Sep 09, 2017, Ralph Corderoy wrote:
> I think you're smuggling a -k or -K through to the first groff that
> pdfmom runs.  Here's its -Tpdf pipeline again.
> 
>     groff -Tpdf -dPDF.EXPORT=1 -mom -z $cmdstring 2>&1 |
>     grep '^.ds' |
>     groff -Tpdf -mom - $preconv $cmdstring

The pipeline in the current pdfmom is actually

  groff -Tpdf -dLABEL.REFS=1 -mom -z $preconv $cmdstring 2>&1 | 
  grep '^\\. *ds' |
  groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z - $preconv $cmdstring 2>&1 |
  grep '^\\. *ds' |
  groff -Tpdf -mom $preconv - $cmdstring

> The problem is grep seeing invalid UTF-8 and thus deciding stdin is
> binary.  A preconv(1) would turn your UTF-8 troff source into
> ISO-8859-1, and any non-ASCII characters in that would probably be
> invalid UTF-8.  But pdfmom has tried to spot -k or -K in its arguments
> and arrange for them to be moved from $cmdstring to $preconv and so used
> only by the second groff.  If its simplistic argv[] parsing has failed,
> because you've -xyzk for example, then your -k remains in $cmdstring and
> affects the first groff.
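
The binary-file heuristic described above can be reproduced without
groff. This is a minimal sketch, assuming GNU grep and an installed
en_US.UTF-8 locale (both are assumptions; other grep versions or
locales may behave differently). The helper name `sample` and the
string it emits are made up for illustration.

```shell
# Emit one ".ds" line containing a Latin-1 e-acute (octal 351).
# That lone byte is not a valid UTF-8 sequence.
sample() { printf '.ds LABEL caf\351\n'; }

# UTF-8 locale: recent GNU grep may print a "Binary file ... matches"
# notice instead of the line (depends on grep version and on the
# locale being installed).
sample | LC_ALL=en_US.UTF-8 grep '^\. *ds' | cat -v

# -a (--text) makes grep treat the input as text regardless.
sample | grep -a '^\. *ds' | cat -v

# C locale: input is plain bytes, so no encoding error can arise.
sample | LC_ALL=C grep '^\. *ds' | cat -v
```

The trailing `cat -v` only renders the non-ASCII byte visibly on the
terminal; it plays no part in the matching.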

I wish that were the case, but consider this:

***pdfmom pipeline entered literally at the command line
  groff -Tpdf -dLABEL.REFS=1 -mom -z -k camus.mom 2>&1 | \
  grep '^\.  *ds' | \
  groff -Tpdf -dPDF.EXPORT=1 -dLABEL.REFS=1 -mom -z -k - camus.mom 2>&1 | \
  grep '^\. *ds' | \
  groff -Tpdf -mom -k - camus.mom > camus.pdf
- grep does not report a binary file hit

***pdfmom itself at the command line
  pdfmom -k camus.mom > camus.pdf
- grep reports a binary file hit

strace on 'pdfmom -k camus.mom > camus.pdf' produces

  3225  execve("/usr/local/bin/pdfmom", ["pdfmom", "-k", "camus.mom"], [/* 86 vars */]) = 0
  3226  execve("/bin/sh", ["sh", "-c", "groff -Tpdf -dLABEL.REFS=1 -mom "...], [/* 86 vars */]) = 0
  3227  execve("/usr/local/bin/groff", ["groff", "-Tpdf", "-dLABEL.REFS=1", "-mom", "-z", "-k", "camus.mom"], [/* 86 vars */]) = 0
  3228  execve("/bin/grep", ["grep", "^\\. *ds"], [/* 86 vars */]) = 0
  3229  execve("/usr/local/bin/groff", ["groff", "-Tpdf", "-dPDF.EXPORT=1", "-dLABEL.REFS=1", "-mom", "-z", "-", "-k", "camus.mom"], [/* 86 vars */] <unfinished ...>
  3230  execve("/bin/grep", ["grep", "^\\. *ds"], [/* 86 vars */] <unfinished ...>
  3229  <... execve resumed> )            = 0
  3230  <... execve resumed> )            = 0
  3231  execve("/usr/local/bin/groff", ["groff", "-Tpdf", "-mom", "-k", "-", "camus.mom"], [/* 86 vars */]) = 0
  3232  execve("/usr/local/bin/preconv", ["preconv", "-", "camus.mom"], [/* 87 vars */]) = 0
  3233  execve("/usr/local/bin/troff", ["troff", "-dPDF.EXPORT=1", "-dLABEL.REFS=1", "-mom", "-z", "-Tpdf"], [/* 87 vars */]) = 0
  3234  execve("/usr/local/bin/preconv", ["preconv", "camus.mom"], [/* 87 vars */]) = 0
  3235  execve("/usr/local/bin/troff", ["troff", "-dLABEL.REFS=1", "-mom", "-z", "-Tpdf"], [/* 87 vars */]) = 0
  3234  +++ exited with 0 +++
  3227  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3234, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3237  execve("/usr/local/bin/troff", ["troff", "-mom", "-Tpdf"], [/* 87 vars */]) = 0
  3236  execve("/usr/local/bin/preconv", ["preconv", "-", "camus.mom"], [/* 87 vars */] <unfinished ...>
  3238  execve("/usr/local/bin/gropdf", ["gropdf"], [/* 87 vars */]) = 0
  3236  <... execve resumed> )            = 0
  3235  +++ exited with 0 +++
  3227  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3235, si_uid=1000, si_status=0, si_utime=8, si_stime=0} ---
  3227  +++ exited with 0 +++
  3226  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3227, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3228  +++ exited with 1 +++
  3226  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3228, si_uid=1000, si_status=1, si_utime=0, si_stime=0} ---
  3232  +++ exited with 0 +++
  3229  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3232, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3230  +++ exited with 0 +++
  3226  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3230, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3236  +++ exited with 0 +++
  3231  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3236, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3233  --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3233, si_uid=1000} ---
  3233  +++ killed by SIGPIPE +++
  3229  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3233, si_uid=1000, si_status=SIGPIPE, si_utime=7, si_stime=0} ---
  3229  +++ exited with 0 +++
  3226  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3229, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3237  +++ exited with 0 +++
  3231  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3237, si_uid=1000, si_status=0, si_utime=8, si_stime=0} ---
  3238  +++ exited with 0 +++
  3231  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3238, si_uid=1000, si_status=0, si_utime=11, si_stime=0} ---
  3231  +++ exited with 0 +++
  3226  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3231, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3226  +++ exited with 0 +++
  3225  --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3226, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
  3225  +++ exited with 0 +++

Unless my eyesight is worse than I think (very possible), it looks
as if pdfmom is processing its pipeline identically to the long
version at the command line, where the reinvocations of preconv(1)
(via the repetitions of the -k flag) aren't doing any harm.  Yet the
binary file match shows up when the file is processed with pdfmom.
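
One way to narrow down which stage hands grep bytes it dislikes is to
check each stage's output for valid UTF-8 separately. This is a rough
sketch, assuming an `iconv` that exits non-zero on an illegal input
sequence (glibc's does); `is_valid_utf8` is a made-up helper name, not
part of groff or pdfmom.

```shell
# Exit 0 if stdin decodes cleanly as UTF-8, non-zero otherwise.
# Assumes iconv reports illegal sequences through its exit status.
is_valid_utf8() { iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; }

# Self-check with known input: a two-byte UTF-8 e-acute, then a bare
# Latin-1 e-acute (octal 351), which is not valid UTF-8 on its own.
printf 'caf\303\251\n' | is_valid_utf8 && echo "valid"
printf 'caf\351\n'     | is_valid_utf8 || echo "invalid"

# Possible use against the first stage of the pipeline quoted above:
#   groff -Tpdf -dLABEL.REFS=1 -mom -z -k camus.mom 2>&1 |
#     is_valid_utf8 || echo "stage 1 output is not valid UTF-8"
```

Running the same check under pdfmom's environment and under the
interactive shell might show whether the two really see the same
bytes.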

-- 
Peter Schaffter
http://www.schaffter.ca
