Re: [Freecats-Dev] early proof of concept implementations?
From: Marc Prior
Date: Tue, 1 Apr 2003 20:12:46 +0200
Firstly, thank you to all members of Free CATS for your confidence in the
future of the OmegaT project.
Keith will no doubt begin making contributions of his own to the list in due
course, but at the moment both he and I are under time pressure owing to
other activities. Please be patient!
In response to Simos' post and more particularly to Henri's reply:
> A few weeks ago and after advice from Thierry Sourbier, I suggested a
> tagged bilingual document file format derived from the .TMX format and that
> seemed much in line with what Yves Champollion is doing with Wordfast.
An industry-standard tagged bilingual file format would be a major
breakthrough. I am currently arguing vehemently that TMX, and not Trados'
native translation memory format, should be regarded as the industry-standard
translation memory format. Trados' "uncleaned file" format, however, has no
industry-standard equivalent, so it can effectively claim this status by
default. :-(
However, I find it difficult to conceive of an industry-standard tagged
bilingual file format in the absence of an industry-standard tagged
(monolingual) word processing file format.
If, for the sake of argument, OpenOffice.org's file format (which is at least
open, documented, extensible, and has been submitted to OASIS for formal
recognition as a standard) is accepted as the standard for a *monolingual*
word processing file format, the step to a tagged bilingual file format is
trivial. It may well be possible to add such functionality with no alteration
to the OOo code, purely by modification of the XML mechanisms (DTD etc.).
The difficult part is the first step. :-)
> Also note that Keith developed his own bilingual file format, but I don't
> remember he provided its specification
This is not correct. OmegaT, as a standalone application, does not need a
bilingual file format and does not have one, though having one would
certainly enhance OmegaT's functionality. The question is how that
functionality can be added.
OmegaT's *translation memory format* is TMX1. Prior to version 1.0.0, OmegaT
had a dedicated translation memory format. (That format is documented in the
0.9.7 manual, incidentally.) This is probably the reason for the confusion.
> I believe everybody will agree that once we have published such a bilingual
> document file format, many developers will be able to add conversion
> scripts in order to allow translators to localize various file formats.
Why wait for a bilingual file format to appear? There are many conversion
filters that would be useful in their own right. A .po-to-TMX filter (and
vice versa), for example, would benefit OmegaT, and I think that benefit is
independent of any bilingual file format. Even a TMX2-to-TMX1 filter would be
an advantage. Such filters may well already exist.
On the subject of conversion filters, some initial work was done within the
Semerkent project, see:
http://sourceforge.net/projects/semerkent/
That said, I agree with Simos that scripts are a much more practical
solution: I believe learning Perl or Tcl/Tk in order to manipulate plain-text
formats such as XML is within many translators' abilities. Learning C for
this purpose is a different proposition.
Marc