bug-gnu-utils

Re: libgettextpo: Ability to work with non-ASCII without a header


From: Dwayne Bailey
Subject: Re: libgettextpo: Ability to work with non-ASCII without a header
Date: Tue, 23 Oct 2007 10:30:38 +0200

On Sun, 2007-10-21 at 16:31 +0200, Bruno Haible wrote:
> Hi,
> 
> Dwayne Bailey wrote:
> > In some of our tools in the Translate Toolkit e.g. pogrep we only pull
> > out bits of a file and create a PO file without a header.  Of course if
> > these contain non-ASCII chars then libgettextpo complains about invalid
> > multibyte sequences since it has no header and thus no information about
> > what encoding to use.  
> 
> Yes. A PO file without a header is not a PO file, it's just a bit of data.
> You can store it on a filesystem, or in a database. But you cannot view it
> in an editor, nor make a .mo file from it, because you would have to know the
> encoding for that.

The main problem for me is that I can't store it on a filesystem: the
multibyte error makes that impossible.  I use un-headered files in two
cases: 1) testing conversions, 2) quick roundtrips (grep, edit, merge),
in which the encoding remains unchanged.

> > A solution of course is to put in a header
> 
> Yes, that's the solution that we chose that minimizes the risk of
> misinterpretation, while at the same time not enforcing global choices to
> the users.

The use case I'm looking at does not involve published PO files, so it
is unlikely to cause misinterpretation.
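
For reference, the "put in a header" workaround discussed above can be
sketched in a few lines of Python.  This is only an illustration, not
part of libgettextpo or the Translate Toolkit: the function names
has_header and with_minimal_header are hypothetical, and the header
detection is deliberately naive (it only looks for an empty msgid
followed directly by a msgstr at the start of the text).

```python
def has_header(po_text):
    """Naive check: does the text start with an empty-msgid entry?

    Hypothetical helper. Comment lines (starting with '#') and blank
    lines are skipped before looking at the first entry.
    """
    lines = [ln.strip() for ln in po_text.splitlines()]
    lines = [ln for ln in lines if ln and not ln.startswith("#")]
    return (
        len(lines) >= 2
        and lines[0].split() == ["msgid", '""']
        and lines[1].startswith("msgstr")
    )


def with_minimal_header(po_text, charset="UTF-8"):
    """Prepend a minimal header entry that declares the charset.

    Hypothetical helper. The text is returned unchanged if it already
    appears to have a header; the charset is an assumption supplied by
    the caller, not something detected from the data.
    """
    if has_header(po_text):
        return po_text
    header = (
        'msgid ""\n'
        'msgstr ""\n'
        f'"Content-Type: text/plain; charset={charset}\\n"\n'
        '"Content-Transfer-Encoding: 8bit\\n"\n'
        "\n"
    )
    return header + po_text


fragment = '#: foo.c:1\nmsgid "Hello"\nmsgstr "Héllo"\n'
print(with_minimal_header(fragment))
```

The caller still has to write the result out in the same encoding that
the header declares; the sketch only adds the declaration.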

> > but it would be nice to be
> > able to specify the encoding of the file/text even when it has no
> > header.  So a function that allows you to set the encoding independent
> > of the header, of course if there is a header then the same function
> > would update the encoding information and change the current data.
> 
> Without a header, it's too messy: How should the conversion know what was
> the old encoding?

I think the conversion suggestion is confusing the motivation for my
original request, so please ignore that.

My request is simple.  Without a header, any multibyte sequence
fails.  I would like the ability to say that the PO file is in encoding X,
so that when output is produced without a header libgettextpo does not
complain.  Updating and maintaining the header entry would remain the
programmer's responsibility, just as it is now.

Of course you could tweak the header initialisation to set the charset
if one has been defined, and to produce an error if you try to change an
already-defined charset.

Hope that is clearer.

> With a header: msgconv does this. Invoke msgconv. And, of course, be prepared
> for failures if the target encoding is not UTF-8.
> 
> I'm looking forward to the day when all PO files are in UTF-8. Then, a few
> years later, we can change the interpretation of PO files without a charset
> specification from ASCII to UTF-8. But we're not there yet.
> 
> Bruno
> 
> PS: A propos 'pogrep': It could use 'msggrep'. msggrep is much faster now
> since version 0.14.2.
> 
-- 
Dwayne Bailey
Translate.org.za

+27-12-460-1095 (w)
+27-83-443-7114 (cell)




