PostScript-to-ASCII
From: Jonathan Monsarrat
Subject: PostScript-to-ASCII
Date: Wed, 10 Nov 93 22:50:46 -0500
HI!
There aren't any good PostScript to ASCII converters. I'd know; I maintain
the PostScript FAQ, which sez:
3.9 How can I convert PostScript to ASCII?
In general, when you say ``I want to convert PostScript to ASCII''
what you really mean is ``I want to convert MacWrite (which makes
PostScript output) to ASCII'' or ``I want to convert somebody's TeX
document (which I have in PostScript) to ASCII''.
Unfortunately, programs like these (if they're smart) do a lot of
fancy stuff like kerning, which means that where they would
normally execute the PostScript command for
``print water fountain''
instead they execute the PostScript commands for
``print wat'' (move a little to get the spacing *just* right)
``print er'' (move a little to get the spacing *just* right)
``print foun'' (move a little to get the spacing *just* right)
``print tain'' (move a little to get the spacing *just* right)
So if I write a program to look through a PostScript file for
strings, like ps2ascii.pl, it can't tell where the words really
end. Here my program would see 4 strings
``wat'' ``er'' ``foun'' ``tain''
And it doesn't see any difference between the spacing between
``foun'' and ``tain'' (not a word break) and the spacing between
``er'' and ``foun'' (a real word break).
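
To make the failure concrete, here is a rough Python sketch of a naive
string extractor. It is not the code of ps2ascii.pl; the PostScript
fragment, the offsets in it, and the names PS_FRAGMENT, STRING_RE and
extract_strings are all made up for illustration.

```python
import re

# Hypothetical PostScript fragment: a kerned "water fountain".
# The coordinates and the use of rmoveto are assumptions for this example.
PS_FRAGMENT = """\
72 700 moveto
(wat) show 0.3 0 rmoveto
(er) show 3.1 0 rmoveto
(foun) show 0.2 0 rmoveto
(tain) show
"""

# Matches simple (string) literals followed by show.
# Escapes and nested parentheses are deliberately ignored.
STRING_RE = re.compile(r"\(([^()\\]*)\)\s*show")


def extract_strings(ps_text):
    """Return the shown strings in order, discarding all positioning."""
    return STRING_RE.findall(ps_text)


fragments = extract_strings(PS_FRAGMENT)
print(fragments)            # the four pieces, with no hint of the word break
print(" ".join(fragments))  # a space everywhere: too many breaks
print("".join(fragments))   # no spaces at all: the real break is lost
```

Neither way of joining the pieces gives back ``water fountain'',
because the extractor threw away the only clue (the size of each move).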
The problem is that PostScript for text formatting is usually
machine generated by a text formatter. A PostScript
generator like dvips might have a special command like ``boop''
that differentiates between a real word break and a fake one. But
every text formatter that generates PostScript has its own name
for the ``boop'' command.
So you really want a ``PostScript to ASCII converter for dvips
output''.
The only general solution I can see would be to redefine the show
operator to print out the currentpoint for every letter being
printed, like gs2asc, and then make up an ASCII page based on this
by sticking ASCII characters where they go in a two-dimensional
array. That would convert PostScript to ASCII ``formatted''.
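
A minimal Python sketch of the second half of that idea, assuming some
earlier step has already recorded an (x, y, character) triple for every
letter shown. This is not gs2asc's actual code; render_page, the glyph
coordinates, and the 6 pt by 12 pt cell size are assumptions chosen for
the example.

```python
def render_page(glyphs, cols=40, rows=5, char_width=6.0,
                line_height=12.0, top=60.0):
    """Place (x, y, char) triples into a character grid.

    x and y are in points, with y growing upward as in PostScript,
    so the line with y == top becomes row 0 of the ASCII page.
    """
    grid = [[" "] * cols for _ in range(rows)]
    for x, y, ch in glyphs:
        col = int(round(x / char_width))
        row = int(round((top - y) / line_height))
        if 0 <= row < rows and 0 <= col < cols:
            grid[row][col] = ch
    lines = ["".join(r).rstrip() for r in grid]
    return "\n".join(lines).rstrip("\n")


# Hypothetical positions for a kerned "water fountain" on one line.
xs = [0.0, 6.2, 11.9, 18.3, 24.1,
      36.0, 41.8, 48.2, 54.1, 59.9, 66.2, 72.0, 77.9]
glyphs = [(x, 60.0, ch) for x, ch in zip(xs, "waterfountain")]

page = render_page(glyphs)
print(page)
```

Because each letter is snapped to the nearest cell, the small kerning
moves vanish and the larger gap before ``foun'' shows up as an empty
column. The cell size has to be guessed per document, which is one
reason this stays a sketch.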
But even that wouldn't solve the problem, because special bitmap
fonts and standard fonts like Symbol don't always print a ``P''
when you say the letter ``P''. Sometimes they print the Greek Pi
symbol or a chess piece or a ZapfDingbats symbol.
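
As a rough illustration, a converter would need to know the current font
before trusting a character code. The Python sketch below assumes the
font name is available for each shown string; decode_show and
SYMBOL_NAMES are hypothetical, and the table is only a small excerpt of
the Symbol font's letter-to-Greek mapping.

```python
# Partial, illustrative table: what a few Latin letter codes draw
# when the current font is Symbol.
SYMBOL_NAMES = {"P": "Pi", "p": "pi", "S": "Sigma", "D": "Delta"}


def decode_show(text, font_name):
    """Return an ASCII rendering of text as drawn in font_name."""
    if font_name == "Symbol":
        # Unknown codes become <?> rather than a misleading letter.
        return "".join("<%s>" % SYMBOL_NAMES.get(ch, "?") for ch in text)
    return text


print(decode_show("P", "Times-Roman"))
print(decode_show("P", "Symbol"))
```

Bitmap fonts are worse still, since nothing like this table exists for
them unless the generating program supplies it.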
Use ps2a, ps2ascii, ps2txt, ps2ascii.ps or ps2ascii.pl.
If anybody wants these programs, ask me (or see FAQ for more info).
-Jon