bug-gnubg
Re: [Bug-gnubg] utf-8 coding in the Icelandic translation


From: Jim Segrave
Subject: Re: [Bug-gnubg] utf-8 coding in the Icelandic translation
Date: Thu, 16 Oct 2003 08:24:22 +0200
User-agent: Mutt/1.4.1i

On Wed 15 Oct 2003 (23:00 +0100), Hlynur Sigurgíslason wrote:
> On Wed, 15 Oct 2003 20:57:48 +0100, Hlynur Sigurgíslason <address@hidden> 
> wrote:
> 
> >Hi.
> >I just got the new gnubg language package off the internet and I am 
> >afraid the coding of the is.po file got quite scrambled.  The thing is 
> >it has to be saved in utf-8 coding (in Emacs: C-x RET f utf-8 RET).  
> >Any changes to the file without preserving the coding will 
> >change the special Icelandic letters into gibberish.  This is strange 
> >since all the Icelandic letters are in the ISO 8859-1 (Latin-1) 
> >character list.  I don't know why this is, but if I don't use utf-8 
> >coding, no menu items containing Icelandic letters are displayed.
> >
> >If there is any way around this I would like to know, since this causes 
> >trouble for anyone interested in adding to the translation.
> >
> >When I translate I write into a file which is Latin-1 coded, then I 
> >write the file to is.po, change the coding to utf-8, and save.  Then I 
> >create the gnubg.mo file as usual.  I use Emacs on win9x.

I've seen what the problem was and I think I know what happened. I
should have realised it was going wrong when I was fixing the problem
in drawboard.c.

Your first file was in UTF-8 encoding; the one you just sent is in
Latin-1. So in the first file, the translation for File is

'S' 'k' 'r' 0xc3 0xa1 'i' 'n'

The two bytes 0xc3 0xa1 are the UTF-8 encoding of the character with
the value 0xe1 (a with an acute accent).

Your new file is in Latin-1 and the translation is

'S' 'k' 'r' 0xe1 'i' 'n'
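
As an illustration, the two byte sequences can be reproduced with a
short Python sketch. The variable names are made up for the example;
the only input is the word quoted above, with U+00E1 as its fourth
character.

```python
# The translated word from the message; \u00e1 is the character with value 0xe1.
word = "Skr\u00e1in"

# The same text serialised in the two encodings discussed in the thread.
utf8_bytes = word.encode("utf-8")
latin1_bytes = word.encode("latin-1")

print(utf8_bytes)    # b'Skr\xc3\xa1in'  (0xe1 becomes the pair 0xc3 0xa1)
print(latin1_bytes)  # b'Skr\xe1in'      (0xe1 stays a single byte)
```

Both byte strings decode to the same six-character word, provided the
reader is told which encoding was used.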

The problem was that the first file wasn't being processed as a UTF-8
input file. It was being processed as a Latin-1 file, so msgfmt very
helpfully treated 0xc3 and 0xa1 as two separate Latin-1 characters and
inserted the UTF-8 encoding of each into the translation when building
the is.gmo file.
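
The effect can be mimicked in a few lines of Python. This is only a
sketch of the byte-level mix-up, assuming the conversion behaves like a
plain Latin-1 decode followed by a UTF-8 encode; it does not call
msgfmt, and the variable names are hypothetical.

```python
# Bytes as they stood in the first (UTF-8 encoded) is.po file.
utf8_bytes = b"Skr\xc3\xa1in"

# Misreading them as Latin-1 turns each byte into its own character,
# so the two bytes 0xc3 0xa1 become two characters instead of one.
misread = utf8_bytes.decode("latin-1")

# Converting that misread text to UTF-8 encodes each character again.
double_encoded = misread.encode("utf-8")
print(double_encoded)  # b'Skr\xc3\x83\xc2\xa1in'

# The damage is reversible as long as nothing else touched the bytes.
repaired = double_encoded.decode("utf-8").encode("latin-1").decode("utf-8")
print(repaired == "Skr\u00e1in")  # True
```

One accented letter ends up as four bytes, which displays as two
unrelated characters in place of the intended one.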

From the gettext documentation, I'm pretty sure that this was because
the header had the line:

"Content-Type: text/plain; charset=ISO-8859-1\n"
but this should have been
"Content-Type: text/plain; charset=UTF-8\n"

I should have noticed the anomaly that the translation of "off" was
turning into 5 UTF-8 characters instead of 3.

Sorry for the mixup, I can well believe it's very frustrating to have
I'll commit your new version straight away. You should be able to
choose which of the two input formats you want to use, UTF-8 or
Latin-1, as long as the charset in the header line matches your
choice.
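
A rough way to check that the header and the file contents agree is
sketched below in Python. The helper names (declared_charset,
looks_like_utf8, header_matches) are invented for this example and are
not part of gettext or gnubg. Since any byte sequence is valid Latin-1,
the Latin-1 case relies on a heuristic: non-ASCII content that also
decodes cleanly as UTF-8 is treated as suspicious.

```python
import re

def declared_charset(po_bytes):
    """Return the charset named in the PO header, or None if absent."""
    match = re.search(rb"charset=([A-Za-z0-9_.:-]+)", po_bytes)
    return match.group(1).decode("ascii") if match else None

def looks_like_utf8(po_bytes):
    """True if the data has non-ASCII bytes and decodes cleanly as UTF-8."""
    try:
        po_bytes.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return any(b >= 0x80 for b in po_bytes)

def header_matches(po_bytes):
    """Heuristic check that the declared charset fits the actual bytes."""
    charset = declared_charset(po_bytes)
    if charset is None:
        return False
    if charset.upper() in ("ISO-8859-1", "LATIN1", "LATIN-1"):
        # Latin-1 never fails to decode, so flag UTF-8-looking content.
        return not looks_like_utf8(po_bytes)
    try:
        po_bytes.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True

# A UTF-8 translation under a Latin-1 header, as in the first is.po.
mismatched = (b'"Content-Type: text/plain; charset=ISO-8859-1\\n"\n'
              b'msgid "File"\nmsgstr "Skr\xc3\xa1in"\n')
print(header_matches(mismatched))  # False
```

In practice the function would be given the raw contents of the .po
file, read in binary mode, before msgfmt is run.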

Looking at the other language files, everyone is using
charset=ISO-8859-1 and Latin-1 encodings except the Japanese one, but
I see no reason why you can't use UTF-8 if that's easier.

-- 
Jim Segrave           address@hidden




