Re: msggrep problems when priming bison-runtime's pump
From: Bruno Haible
Subject: Re: msggrep problems when priming bison-runtime's pump
Date: Mon, 25 Jul 2005 15:21:18 +0200
User-agent: KMail/1.5
Paul Eggert wrote:
> I wanted to extract from (say) po/et.po the subset of msgids that are
> mentioned in runtime-po/bison-runtime.pot.
You can do this through
$ msgcat --more-than=1 po/et.po runtime-po/bison-runtime.pot
or
$ msgcomm po/et.po runtime-po/bison-runtime.pot
The two commands do not give the same result. The first one is usually
better from a translator's point of view; the second one may be better
for a maintainer.
> The Gettext manual gives this as an example:
>
> msggrep --location src/getopt.c -o compendium.po file.po
>
> So I tried this command:
>
> msggrep --location runtime-po/bison-runtime.pot po/et.po
Err, the --location option matches against the '#:' source reference
comments of the messages. The messages you are looking for carry the
reference
#: data/yacc.c:NN
not
#: runtime-po/bison-runtime.pot:NN
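To see which file names --location can actually match, one can list the
source references recorded in a catalog's '#:' comments. The following is
only a sketch using grep, sed, tr and sort; the helper name list_locations,
the sample catalog and its line numbers are made up for illustration.

```shell
# list_locations FILE: print the distinct source files named in "#:" comments.
list_locations() {
    grep '^#:' "$1" |
        sed 's/^#:[ ]*//' |
        tr ' ' '\n' |
        sed 's/:[0-9][0-9]*$//' |
        sort -u
}

# Hypothetical two-message catalog; file names and line numbers are invented.
sample=$(mktemp)
cat > "$sample" <<'EOF'
#: data/yacc.c:1045
msgid "syntax error"
msgstr ""

#: data/yacc.c:1300 src/main.c:12
msgid "memory exhausted"
msgstr ""
EOF

list_locations "$sample"
rm -f "$sample"
```

A name printed this way (here data/yacc.c) is the kind of argument that
--location compares against. Whether selecting by that name yields exactly
the messages of bison-runtime.pot is a separate question.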
> so I then generated the msgids by hand (there are only a few) and
> tried this:
>
> msggrep -K 'memory exhausted' -K 'syntax error' po/et.po
>
> but this isn't the correct usage for msggrep.
Yup, it is hard for a command-line program to accept both basic and
extended regexps for 5 different roles. Here we stumble on limitations
of what can reasonably be done with command-line options.
> This worked, except that I didn't want one of the 'syntax error'
> messages. That is, of the following msgids extracted by that
> msggrep:
>
> msgid "memory exhausted"
> msgid "syntax error"
> msgid "syntax error, unexpected %s"
> msgid "syntax error, unexpected %s, expecting %s or %s or %s or %s"
> msgid "syntax error, unexpected %s, expecting %s or %s or %s"
> msgid "syntax error, unexpected %s, expecting %s or %s"
> msgid "syntax error, unexpected %s, expecting %s"
> msgid "syntax error: cannot back up"
> msgid "syntax error; also memory exhausted"
>
> I didn't want the last one (since it's no longer in bison-runtime).
> However, I couldn't come up with a pattern to do that. For example,
>
> msggrep -K -E -e 'memory exhausted' -e '^syntax error($|[^;])' po/et.po
>
> still outputs that last msgid.
Yes, it does this because the last msgid matches the first pattern. The
different patterns are implicitly "or"ed together.
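The "or" behaviour can be reproduced with plain grep on the bare msgid
strings. This is only a sketch: grep stands in for msggrep here, on the
assumption that msggrep -K -E applies the patterns to a msgid the way grep
applies them to a line. The variable names are made up.

```shell
# Bare msgid strings from the list above (without the msgid "..." wrapper).
msgids='memory exhausted
syntax error
syntax error, unexpected %s
syntax error, unexpected %s, expecting %s or %s or %s or %s
syntax error, unexpected %s, expecting %s or %s or %s
syntax error, unexpected %s, expecting %s or %s
syntax error, unexpected %s, expecting %s
syntax error: cannot back up
syntax error; also memory exhausted'

# Original patterns: the last string contains "memory exhausted",
# so the first pattern selects it even though the second one rejects it.
loose=$(printf '%s\n' "$msgids" |
    grep -E -e 'memory exhausted' -e '^syntax error($|[^;])')

# Anchoring the first pattern too leaves no pattern that matches the last string.
strict=$(printf '%s\n' "$msgids" |
    grep -E -e '^memory exhausted$' -e '^syntax error($|[^;])')

printf '%s\n' "$loose" | wc -l
printf '%s\n' "$strict" | wc -l
```

The first count includes the unwanted "syntax error; also memory exhausted"
line; the second does not. A line is selected as soon as any one pattern
matches, so every pattern has to exclude the unwanted string.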
> Finally, msggrep outputs lots of messages like this:
>
> msggrep: warning: Locale charset "UTF-8" is different from
> input file charset "ISO-8859-15".
> Output of 'msggrep' might be incorrect.
> Possible workarounds are:
> - Set LC_ALL to a locale with encoding ISO-8859-15.
> - Convert the translation catalog to UTF-8 using
> 'msgconv', then apply 'msggrep',
> then convert back to ISO-8859-15 using 'msgconv'.
>
> These messages are alarming, and I don't think they apply here.
The warning indeed is not useful here, because the regexp that you
provided would yield the same results in ISO-8859-15 encoding as in
UTF-8 encoding. But other regexps like 'foo \(.\)\1' (as a basic regexp)
do not have this property.
The problems I have here with msggrep are:
1) We don't have code that executes a regexp in an arbitrary encoding,
if no locale for this encoding is present on the system.
2) We don't have code that detects whether a regexp's result will be
encoding dependent or not.
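The encoding dependence of a pattern like 'foo \(.\)\1' can be seen with
plain grep. This is a sketch, not how msggrep is implemented: grep in the C
locale (byte-by-byte matching) stands in for a matcher that has no suitable
locale for the input, and the test string "foo ää" and the helper name
matches_doubled are made up. It assumes a grep that treats every byte as a
character in the C locale, as current GNU grep does.

```shell
# matches_doubled: succeed if stdin has "foo " followed by the same
# character twice, where "character" means a single byte (C locale).
matches_doubled() {
    LC_ALL=C grep -q 'foo \(.\)\1'
}

# "foo ää" in ISO-8859-15: a-umlaut is the single byte 0xE4 (octal 344).
if printf 'foo \344\344\n' | matches_doubled; then
    latin='match'
else
    latin='no match'
fi

# The same text in UTF-8: a-umlaut is the two bytes 0xC3 0xA4 (octal 303 244).
if printf 'foo \303\244\303\244\n' | matches_doubled; then
    utf8='match'
else
    utf8='no match'
fi

echo "ISO-8859-15 bytes: $latin"
echo "UTF-8 bytes:       $utf8"
```

The same text gives different answers depending on how its bytes are split
into characters, which is why a back-reference pattern cannot safely be run
in a locale whose charset differs from the catalog's.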
> At any rate, there should be a reliable way to do this little task
> without getting the warning, and without having to set LC_ALL to a
> different value for each catalog, in a catalog-dependent way.
Yes, I agree with you.
Ideas or code to fix this are welcome.
Bruno