gnumed-devel

Re: [Gnumed-devel] doc_med , txt blob


From: Karsten Hilbert
Subject: Re: [Gnumed-devel] doc_med , txt blob
Date: Fri, 15 Sep 2006 16:00:24 +0200
User-agent: Mutt/1.5.13 (2006-08-11)

On Thu, Sep 14, 2006 at 08:22:33PM +0800, Syan Tan wrote:

> how to store a doc_obj that is just a text file ?
My knee-jerk reaction would be, why, of course, dump it
into doc_obj.data.

While this would certainly work and not lose any data, I'll
venture a paraphrase of the question:

How to store a text blob as a document and not lose
*information* ?

Bytea will not lose data, but it will lose information unless
the data is self-descriptive to some degree. PDF is
self-descriptive; plain "text" is not. The latter needs to be
accompanied by at least one piece of metadata to make it
safely transferable by purely technical means: the
encoding.
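To illustrate the difference between losing data and losing
information, here is a minimal Python sketch (Python chosen only
for illustration): the same bytes, as they might sit in a bytea
column such as doc_obj.data, decode to different text depending
on which encoding the reader assumes. The sample string and the
variable names are made up for this example.

```python
# The bytes as they might be stored in doc_obj.data (sample value, made up).
raw = "Müller".encode("utf-8")

# Reader guesses the right encoding: the text comes back intact.
as_utf8 = raw.decode("utf-8")

# Reader guesses wrong: no byte is lost, but the text is garbled.
as_latin1 = raw.decode("latin-1")

print(as_utf8)    # prints Müller
print(as_latin1)  # prints MÃ¼ller
```

Re-encoding the garbled string with latin-1 gives back the exact
original bytes, which is the point: the data survived, the
information about how to read it did not.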

So, there are several possible solutions:

- Convert the text into UTF-x (UTF-8, UTF-16, ...), create a
  Unicode file with the proper byte order mark (BOM) at the
  start and store that in doc_obj.data. Probably the cleanest
  and most recommendable solution.

- Store the text in doc_obj.data and keep the encoding
  information elsewhere, such as in doc_desc, comments, etc.

- Store the text in doc_desc, where it is properly encoded,
  and keep a special value in doc_obj.data pointing to doc_desc.

- Store an enriched version (custom format) of the text in
  doc_obj.data which contains the encoding in a
  computationally extractable way (such as XML).
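The first approach might look roughly like this minimal Python
sketch. It assumes the original encoding of the text file is
known when the document is imported and that doc_obj.data simply
receives raw bytes. The helpers to_unicode_blob() and
from_unicode_blob() are hypothetical, not part of GNUmed. The BOM
is what lets a later reader detect the Unicode encoding without
any extra metadata.

```python
import codecs

# BOMs checked in this order: the UTF-32 LE mark (FF FE 00 00) begins
# with the UTF-16 LE mark (FF FE), so the longer marks must come first.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def to_unicode_blob(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode text of a known encoding as UTF-8 with a BOM."""
    # May raise UnicodeDecodeError if source_encoding does not match the bytes.
    text = raw.decode(source_encoding)
    # The "utf-8-sig" codec prepends codecs.BOM_UTF8 when encoding.
    return text.encode("utf-8-sig")


def from_unicode_blob(blob: bytes) -> str:
    """Hypothetical helper: decode a blob using the encoding implied by its BOM."""
    for bom, encoding in _BOMS:
        if blob.startswith(bom):
            # These codecs consume the BOM themselves, so it is not part of the text.
            return blob.decode(encoding)
    raise ValueError("blob has no Unicode byte order mark")


# Example: a Latin-1 text file converted before being stored in doc_obj.data.
blob = to_unicode_blob("Müller".encode("latin-1"), "latin-1")
print(from_unicode_blob(blob))  # prints Müller
```

Because the conversion happens once, at import time, every later
reader only has to look at the first few bytes of the blob.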

I'd suggest either the first or the last approach. The first
is preferable, I suppose.
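For the last approach, one possible Python sketch wraps the
decoded text in a small XML document whose declaration names the
encoding of the stored bytes. The element name text_document, the
attribute original_encoding, and the helpers wrap_text() and
unwrap_text() are invented for this example; they are not a
GNUmed schema or API. The xml_declaration argument of
ET.tostring() requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def wrap_text(raw: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: wrap text in an XML document that states its own encoding."""
    # Hypothetical element/attribute names; the original encoding is kept as extra metadata.
    root = ET.Element("text_document", {"original_encoding": source_encoding})
    root.text = raw.decode(source_encoding)
    # The XML declaration (<?xml ... encoding='utf-8'?>) makes the blob self-describing.
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


def unwrap_text(blob: bytes) -> Tuple[str, Optional[str]]:
    """Hypothetical helper: return the text and the recorded original encoding."""
    # The parser reads the encoding from the XML declaration, no guessing needed.
    root = ET.fromstring(blob)
    return root.text or "", root.get("original_encoding")
```

One caveat: XML 1.0 cannot represent most ASCII control
characters (e.g. NUL), so text containing them would need
escaping or a different wrapper.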

Karsten
-- 
GPG key ID E4071346 @ wwwkeys.pgp.net
E167 67FD A291 2BEA 73BD  4537 78B9 A9F9 E407 1346



