Re: libgettextpo: Ability to work with non-ASCII without a header
From: Bruno Haible
Subject: Re: libgettextpo: Ability to work with non-ASCII without a header
Date: Sun, 21 Oct 2007 16:31:27 +0200
User-agent: KMail/1.5.4
Hi,
Dwayne Bailey wrote:
> In some of our tools in the Translate Toolkit e.g. pogrep we only pull
> out bits of a file and create a PO file without a header. Of course if
> these contain non-ASCII chars then libgettextpo complains about invalid
> multibyte sequences since it has no header and thus no information about
> what encoding to use.
Yes. A PO file without a header is not a PO file, it's just a bit of data.
You can store it on a filesystem, or in a database. But you cannot view it
in an editor, nor make a .mo file from it, because you would have to know the
encoding for that.
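To illustrate why the encoding has to be known, here is a small Python sketch (Python only because the Translate Toolkit is written in it; the sample bytes and the helper name is_ascii are made up for this example). The same two bytes are valid in both UTF-8 and ISO-8859-1 but yield different text, and they are not valid ASCII at all:

```python
# Hypothetical sample: the two bytes that encode "e with acute accent" in UTF-8.
raw = b"\xc3\xa9"

as_utf8 = raw.decode("utf-8")         # one character
as_latin1 = raw.decode("iso-8859-1")  # two characters, a different string


def is_ascii(data: bytes) -> bool:
    """Return True if the bytes can be decoded as plain ASCII."""
    try:
        data.decode("ascii")
        return True
    except UnicodeDecodeError:
        return False
```

Nothing in the bytes themselves says which of the two decodings was intended; that information has to come from somewhere else, such as the header.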
> A solution of course is to put in a header
Yes, that's the solution we chose: it minimizes the risk of
misinterpretation, while at the same time not imposing global choices on
the users.
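For a tool that extracts entries into a new file, adding the header can be as small as the following Python sketch. The function name add_minimal_header is hypothetical, and the sketch assumes the fragment has no header entry yet and that the caller knows the real encoding of the data. Only the charset declaration is included; a complete header normally carries more fields.

```python
def add_minimal_header(po_fragment: str, charset: str = "UTF-8") -> str:
    """Prepend a minimal PO header entry that declares the charset.

    Assumes po_fragment contains no header entry (no entry with msgid "").
    """
    header = (
        'msgid ""\n'
        'msgstr ""\n'
        f'"Content-Type: text/plain; charset={charset}\\n"\n'
        '"Content-Transfer-Encoding: 8bit\\n"\n'
        "\n"
    )
    return header + po_fragment
```

The result then has to be written to disk in the same encoding that the header declares, for example with open(path, "w", encoding="utf-8") when the charset is UTF-8.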
> but it would be nice to be
> able to specify the encoding of the file/text even when it has no
> header. So a function that allows you to set the encoding independent
> of the header, of course if there is a header then the same function
> would update the encoding information and change the current data.
Without a header, it's too messy: How should the conversion know what was
the old encoding?
With a header: msgconv does this. Invoke msgconv. And, of course, be prepared
for failures if the target encoding is not UTF-8.
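A minimal sketch of invoking msgconv from Python, assuming msgconv is installed and on the PATH and that the input file already has a header. The function names msgconv_command and convert are hypothetical; --to-code and -o are the msgconv options for the target encoding and the output file.

```python
import subprocess


def msgconv_command(src: str, dst: str, to_code: str = "UTF-8") -> list:
    """Build the msgconv argument list; src must already have a header."""
    return ["msgconv", f"--to-code={to_code}", "-o", dst, src]


def convert(src: str, dst: str, to_code: str = "UTF-8") -> bool:
    """Run msgconv and report success instead of raising on a failed conversion.

    Raises FileNotFoundError if msgconv itself is not installed.
    """
    result = subprocess.run(
        msgconv_command(src, dst, to_code),
        capture_output=True,
        text=True,
    )
    return result.returncode == 0
```

Checking the return value matters most when the target is not UTF-8, since the conversion can fail for characters the target encoding cannot represent.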
I'm looking forward to the day when all PO files are in UTF-8. Then, a few
years later, we can change the interpretation of PO files without a charset
specification from ASCII to UTF-8. But we're not there yet.
Bruno
PS: A propos 'pogrep': It could use 'msggrep'. msggrep is much faster now
since version 0.14.2.