
Re: [Freecats-Dev] early proof of concept implementations?


From: Marc Prior
Subject: Re: [Freecats-Dev] early proof of concept implementations?
Date: Tue, 1 Apr 2003 20:12:46 +0200

Firstly, thank you to all members of Free CATS for your confidence in the 
future of the OmegaT project.

Keith will no doubt begin making contributions of his own to the list in due 
course, but at the moment both he and I are under time pressure owing to 
other activities. Please be patient!

In response to Simos' post and more particularly to Henri's reply:

> A few weeks ago and after advice from Thierry Sourbier, I suggested a
> tagged bilingual document file format derived from the .TMX format and that
> seemed much in line with what Yves Champollion is doing with Wordfast.

An industry-standard tagged bilingual file format would be a major 
breakthrough. I am currently in the position of arguing vehemently that TMX, 
and not Trados' native translation memory format, should be regarded as the 
industry-standard translation memory format. With its "uncleaned file" 
format, however, Trados has a bilingual file format for which there is no 
industry-standard equivalent, and so the Trados format can effectively claim 
this status by default.  :-(

However, I find it difficult to conceive of an industry-standard tagged 
bilingual file format in the absence of an industry-standard tagged 
(monolingual) word processing file format.

If, for the sake of argument, OpenOffice.org's file format (which is at least 
open, documented, extensible, and has been submitted to OASIS for formal 
recognition as a standard) is accepted as the standard for a *monolingual* 
word processing file format, the step to a tagged bilingual file format is 
trivial. It may well be possible to add such functionality with no alteration 
to the OOo code, purely by modification of the XML mechanisms (DTD etc.).

The difficult part is the first step. :-)

> Also note that Keith developed his own bilingual file format, but I don't
> remember he provided its specification

This is not correct. OmegaT, as a standalone application, does not need a 
bilingual file format, and does not have one, though it would certainly 
enhance OmegaT's functionality if it were to have one. The question is how 
that functionality can be added.

OmegaT's *translation memory format* is TMX Level 1. Prior to version 1.0.0, 
OmegaT had a dedicated translation memory format of its own. (That format is 
documented in the 0.9.7 manual, incidentally.) This is probably the reason 
for the confusion.

> I believe everybody will agree that once we have published such a bilingual
> document file format, many developers will be able to add conversion
> scripts in order to allow translators to localize various file formats. 

Why wait for the appearance of a bilingual file format? There are lots of 
conversion filters which would be advantageous in their own right. .po to 
TMX, for example, and vice versa, would be beneficial to OmegaT, and I think 
that benefit is independent of a bilingual file format. Even TMX Level 2 to 
TMX Level 1 would be an advantage. It may well be that such filters already 
exist.
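
To give an idea of the scale of such a filter, here is a minimal sketch of a 
.po to TMX conversion in Python. It is only an illustration under stated 
assumptions, not an existing OmegaT or Free CATS tool: the function names 
(parse_po, po_to_tmx), the header values and the use of TMX 1.4-style 
attribute names are all assumptions, and plural forms, msgctxt and fuzzy 
flags are ignored.

```python
import ast
import xml.etree.ElementTree as ET


def parse_po(text):
    """Return (msgid, msgstr) pairs from simple .po text.

    Assumption: only plain msgid/msgstr entries matter; the header entry
    (empty msgid) and untranslated entries (empty msgstr) are skipped.
    """
    pairs = []
    fields = {}
    current = None

    def flush():
        if fields.get("msgid") and fields.get("msgstr"):
            pairs.append((fields["msgid"], fields["msgstr"]))
        fields.clear()

    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            flush()
            current = None
        elif line.startswith("#"):
            continue  # comments and flags are ignored in this sketch
        elif line.startswith("msgid "):
            flush()
            current = "msgid"
            # literal_eval approximates the C-style escapes used in .po files
            fields[current] = ast.literal_eval(line[len("msgid "):])
        elif line.startswith("msgstr "):
            current = "msgstr"
            fields[current] = ast.literal_eval(line[len("msgstr "):])
        elif line.startswith('"') and current:
            fields[current] += ast.literal_eval(line)  # continuation line
        else:
            current = None  # unsupported keyword, e.g. msgid_plural
    flush()
    return pairs


def po_to_tmx(po_text, srclang, tgtlang):
    """Build a plain-text (Level 1 style) TMX document as a string."""
    tmx = ET.Element("tmx", version="1.4")
    ET.SubElement(tmx, "header", {
        "creationtool": "po2tmx-sketch",  # hypothetical tool name
        "creationtoolversion": "0.1",
        "segtype": "sentence",
        "o-tmf": "po",
        "adminlang": "en",
        "srclang": srclang,
        "datatype": "plaintext",
    })
    body = ET.SubElement(tmx, "body")
    for source, target in parse_po(po_text):
        tu = ET.SubElement(body, "tu")
        for lang, segment in ((srclang, source), (tgtlang, target)):
            tuv = ET.SubElement(tu, "tuv", {"xml:lang": lang})
            ET.SubElement(tuv, "seg").text = segment
    return ET.tostring(tmx, encoding="unicode")
```

Calling po_to_tmx(po_text, "en", "de") on the contents of a .po file would 
return the TMX as a string; the reverse direction would be a similar loop 
over the <tu> elements.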

On the subject of conversion filters, some initial work was done within the 
Semerkent project, see:

http://sourceforge.net/projects/semerkent/

- though I agree with Simos that scripts are a much more practical solution, 
as I believe learning Perl or Tcl/Tk in order to manipulate plain text 
formats such as XML is within the realm of many translators' abilities. 
Learning C for this purpose is a different proposition.
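
As a rough indication of how small such a script can be, the following 
sketch reduces TMX Level 2 segments to Level 1 by dropping the inline code 
elements and keeping only the translatable text. It is written in Python 
rather than Perl or Tcl purely for illustration; the function names 
(plain_text, to_level1) are hypothetical, and the handling of inline elements 
is a simplification (anything inside a code element, including <sub>, is 
discarded).

```python
import xml.etree.ElementTree as ET

# Inline elements whose content is native formatting code, not translatable
# text. <hi> is deliberately absent: its content is kept.
CODE_ELEMENTS = {"bpt", "ept", "it", "ph", "ut"}


def plain_text(elem):
    """Collect the text of elem, skipping the content of code elements."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in CODE_ELEMENTS:
            parts.append(plain_text(child))
        parts.append(child.tail or "")  # text following the child is kept
    return "".join(parts)


def to_level1(tmx_xml):
    """Return the TMX document with every <seg> reduced to plain text."""
    root = ET.fromstring(tmx_xml)
    for seg in root.iter("seg"):
        text = plain_text(seg)
        for child in list(seg):
            seg.remove(child)
        seg.text = text
    return ET.tostring(root, encoding="unicode")
```

A segment such as "Click <bpt>...</bpt>OK<ept>...</ept>" would come out of 
to_level1 as plain "Click OK", at the cost of losing the formatting 
information.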

Marc



