From: Karsten Hilbert
Subject: Re: [Gnumed-devel] Re: Scanning Xsane, gscan2pdf, Simple Scan, Tesseract OCR
Date: Tue, 26 Jan 2010 17:00:09 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Tue, Jan 26, 2010 at 07:48:40AM -0800, Jim Busser wrote:

> I am only wondering what constrains or otherwise defines
> the ability of GNUmed (postgres) to "look inside" a part no
> matter its type.

GNUmed can easily look inside each and every document part
regardless of the type. It cannot make sense of it, however.

> Is it as simple as GNUmed looking for ASCII or UTF-8 text strings?

The seemingly obvious entity "*text* strings" is, in fact, not
well defined: which bytes count as "text" depends on the
encoding and on the file format.
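A small illustration of why "text strings" is ambiguous at the byte level (a sketch, not GNUmed code; the name "Müller" and the two encodings are arbitrary choices). The same stored bytes read as different text depending on which encoding the reader assumes, and a byte-level search only matches if it guesses the same encoding the writer used:

```python
# Illustrative sketch only: the same bytes yield different "text" depending
# on which encoding the reader assumes. The bytes alone do not say which is right.
raw = "Müller".encode("utf-8")  # what a UTF-8 writer would store

as_utf8 = raw.decode("utf-8")      # correct reading
as_latin1 = raw.decode("latin-1")  # mojibake: each UTF-8 byte read as one char

# A byte-level search for the UTF-8 form of the name succeeds ...
found_utf8 = "Müller".encode("utf-8") in raw
# ... but a search for the Latin-1 form of the very same name fails.
found_latin1 = "Müller".encode("latin-1") in raw

print(as_utf8, as_latin1, found_utf8, found_latin1)
```

PDF adds further layers on top of this (compressed streams, font-specific glyph encodings, text split into positioned fragments), so a plain byte search is even less reliable there.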

> If in this case the PDF has some combination of
> - images + PDF-formatting-encumbered-non-readable text AND
> - a layer of human readable text
>       (if the latter is, by luck, a layer in a "searchable PDF")
> 
> 1) should GNUmed then be able to find this document part?

Given sufficient programming resources, sure.

> 2) will this be incredibly slow

This entirely depends on how access to the "text strings" is
implemented.

> , or does GNUmed (postgres) index all of the text that is readable "in" the
> parts?

Surely not. PostgreSQL understands even *less* about the
data than GNUmed does. GNUmed itself does not index *any*
text inside its database.

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346
