
Re: Bug#594059: gettext: xgettext should not complain about UTF-8 (fwd)


From: Bruno Haible
Subject: Re: Bug#594059: gettext: xgettext should not complain about UTF-8 (fwd)
Date: Sun, 29 Aug 2010 12:28:15 +0200
User-agent: KMail/1.9.9

Hi,

W. Martin Borgert wrote:
> # -*- coding: utf-8 -*-
> name=_("Åland Islands")
> name=_("Côte d'Ivoire")
> 
> and the command line
> 
> $ xgettext --language=Python --omit-header --output=/dev/null utf8.py
> 
> leads to the following warnings:
> ...
> utf8.py:2: invalid multibyte sequence

This is an error, not a warning.

> Btw. there is a similar bug report at
> http://code.djangoproject.com/ticket/4734

Thanks for this pointer. That ticket says "It's due to --omit-header option
passed to xgettext". This behaviour is documented in the xgettext manual:

  `--omit-header'
       Don't write header with `msgid ""' entry.
       ...
       Note that using this option will lead to an error if the resulting
       file would not entirely be in ASCII.

> Because UTF-8 is the default and recommended encoding in
> Debian (and, as far as I know, in PO files)

Nope, the default encoding in PO files is ASCII. The header entry contains
meta-information (the charset declaration) that allows the PO file to contain
non-ASCII characters encoded in UTF-8. If you tell xgettext to omit the header
entry, it cannot produce an unambiguous PO file any more.
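
To illustrate why the header matters, here is a minimal sketch in Python of
how a reader of a PO file could determine its encoding. The function name
po_charset and the sample strings are made up for this illustration, and the
regular expression is a simplification, not a real PO parser: it only looks
for the charset declared in the Content-Type line of the header entry and
falls back to ASCII when no declaration is found.

```python
import re


def po_charset(po_text: str) -> str:
    """Return the charset declared in the PO header, or "ASCII" if none.

    Hypothetical helper for illustration only; it does not parse PO syntax
    and would also match a Content-Type line outside the header entry.
    """
    # The header entry (msgid "") carries a line such as
    #   "Content-Type: text/plain; charset=UTF-8\n"
    match = re.search(r'Content-Type:[^"\n]*charset=([A-Za-z0-9_.:-]+)', po_text)
    return match.group(1) if match else "ASCII"


# A POT file with a header entry declares its encoding explicitly.
with_header = (
    'msgid ""\n'
    'msgstr ""\n'
    '"Content-Type: text/plain; charset=UTF-8\\n"\n'
    '\n'
    'msgid "Åland Islands"\n'
    'msgstr ""\n'
)

# The same entry without a header: nothing says how the bytes are encoded.
without_header = 'msgid "Åland Islands"\nmsgstr ""\n'

print(po_charset(with_header))
print(po_charset(without_header))
```

With the header, the non-ASCII msgid is covered by the declared charset;
without it, a reader can only assume ASCII, which is why xgettext reports
an error instead of writing such a file.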

The default encoding in Debian does not matter for PO and POT files, because
PO and POT files are meant to be transferred to and from the machines of
translators, that is, between machines that run different operating systems,
and Debian is not the only OS in the world.

The solution is:
  1. always keep and generate a header in each PO and POT file,
  2. use xgettext to combine two POT files, and use the other tools (msgcat,
     msgattrib, etc.) when needed.
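
As a rough sketch of the two points above, the following Python snippet only
builds the command lines and prints them; it does not run anything. The
helper names and the file names (messages.pot, a.pot, b.pot, combined.pot)
are placeholders chosen for this example, and msgcat is used here as one
possible tool for the combining step. The point is simply that
--omit-header is left out, so the header entry with the charset declaration
is kept.

```python
import shlex


def xgettext_cmd(source: str, output: str) -> list[str]:
    """Build an xgettext call that keeps the header entry.

    Hypothetical helper: --omit-header is deliberately not passed, and
    --from-code tells xgettext that the source file is UTF-8 encoded.
    """
    return [
        "xgettext",
        "--language=Python",
        "--from-code=UTF-8",
        f"--output={output}",
        source,
    ]


def msgcat_cmd(inputs: list[str], output: str) -> list[str]:
    """Build a msgcat call that concatenates POT files into one file."""
    return ["msgcat", f"--output-file={output}", *inputs]


# Print the commands in shell form instead of executing them.
print(shlex.join(xgettext_cmd("utf8.py", "messages.pot")))
print(shlex.join(msgcat_cmd(["a.pot", "b.pot"], "combined.pot")))
```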

Bruno


