Subject: Re: Draft v2: London and Reiser's UNIX/32V paper, reconstructed
From: G. Branden Robinson
Date: Wed, 12 Jun 2024 15:44:59 -0500
Hi Oliver,
At 2024-06-12T22:12:34+0200, Oliver Corff wrote:
> Absolutely reasonable.
[...]
> Do you know whether your scan is scaled 1:1? In this case only, direct
> measures could be taken from the image, assuming that the paper size
> is letter.
I attached the scan I used to my earlier email. I think a quick glance
is enough to reveal that such hopes are much too high for this document.
The pages aren't even scanned _straight_. This made it tedious to
repair the OCR output generated from it. The OCR engine appears to have
imposed resolutely horizontal baselines on the page images and quantized
word positions to them, which scrambled the word order with high
reliability. I've attached the OCR text output; you may find it
amusing.
(There _was_ an "unskew" option in the OCR UI. I clicked it. It seems
to have done little or nothing.)
The pages also appear not to be cropped consistently, which annoys me
for another reason: it makes it impossible for me to judge the sizes of
the page margins used by the formatter/macro package.
Given these problems, I think the geometry of the page scans is pretty far
from a rectangle with a consistent aspect ratio.
I think it's more likely than not that the paper format was U.S. letter.
If this had been a journal article, that bet would be off.
> Which, in return, makes visual identity an ideal tool to check for
> undiscovered glitches which have the potential to cause inconsistent
> line breaks.
>
> When working in a negotiation team quite a few years ago, we would
> take pages of claimed-to-be identical copies or transcripts of text,
> superimpose them and check them against the light of a strong lamp for
> gray areas --- mismatches in print. This helped us discover a good few
> issues like altered digits etc., and the method was *much* faster than
> reading side by side.
Right. I think I've mentioned this on the list before, but this is the
principle behind the blink comparator, the tool that helped Clyde
Tombaugh discover the dwarf planet Pluto.
And I do in fact use that technique in groff development (including
today while comparing nroff mode output for various memorandum types
between DWB 3.3 and groff). Since I run terminals maximized to the
screen geometry anyway, such comparison is always a keyboard chord away.
I simply don't have any hope of applying the technique here. Not unless
a much superior scan of an authenticated original document turns up.
Regards,
Branden
32vscan_ocr.txt
Description: Text document