Re: xgettext and Windows newlines (CRLF) in multi-line source
From: Adrien Morel
Subject: Re: xgettext and Windows newlines (CRLF) in multi-line source
Date: Sun, 22 Aug 2010 17:17:15 +0200
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; fr; rv:1.9.2.8) Gecko/20100802 Lightning/1.0b2pre Thunderbird/3.1.2
Hi Bruno!
On 15/08/2010 02:28, Bruno Haible wrote:
Discussions on mailing lists take place with plain-text mail. Please avoid
sending HTML formatted mails to mailing lists. There is surely an option
for this in Thunderbird.
Sorry about the HTML. Thunderbird unfortunately handles this in an odd way (you
must Shift-click "Write" or "Reply"; I didn't know that).
Thanks for the report. So, what you are saying is that:
When a source file in PHP syntax has Windows line endings, then
newlines in string literals are encoded as CR LF, but when the source
file has Unix line endings, then newlines in string literals are
encoded as LF.
Speaking about the way they are encoded in internal PHP processing, yes. PHP
does not internally convert CRLF newlines into LF. But I think this is
intended. You probably know that better than me.
I consider this a flaw in the design of PHP, because
1) For more than 10 years, the Unicode consortium recommends that on
input, CR LF and LF should be treated the same. See
<http://www.unicode.org/reports/tr13/tr13-9.html>
But to me that means something else. As I understand it, PHP should not convert
CRLF to LF, but should treat them the same way. The same applies to the gettext
extension's code: it should find the "Hello you.\r\nWelcome!" string in the
catalog even though the entry mentions "Hello you.\nWelcome!".
2) PHP is used mainly for web programming, and it makes no sense for a
web application to behave differently whether the programmer wrote
his programs on a Windows or on a Unix machine, or whether the server
is running on a Windows or on a Unix machine.
Absolutely.
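To make the mismatch concrete, here is a minimal PHP sketch. It assumes the two
literals below stand for the same message as PHP would read it from a CRLF and
from an LF source file; the variable names are illustrative only. It shows that
the strings differ byte for byte, which is presumably why an exact msgid lookup
misses, and that folding CR LF into LF makes them equal.

```php
<?php
// The same visible message, as it would look coming from a source file with
// Windows line endings and from one with Unix line endings (assumption: the
// escapes below stand in for the raw newlines inside the string literal).
$fromWindowsSource = "Hello you.\r\nWelcome!";
$fromUnixSource    = "Hello you.\nWelcome!";

// The two strings are not identical: the first carries one extra CR byte.
var_dump($fromWindowsSource === $fromUnixSource);   // bool(false)
var_dump(strlen($fromWindowsSource) - strlen($fromUnixSource)); // int(1)

// After folding CR LF into LF, the strings compare equal.
$normalized = str_replace("\r\n", "\n", $fromWindowsSource);
var_dump($normalized === $fromUnixSource);          // bool(true)
```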
Because of this guideline, to treat CR LF and LF the same, strings in POT files
usually contain \n as newline marker. Usually - when the source file is using
Unix newlines, and xgettext is running on a Unix machine, or when the source
file is using Windows newlines, and xgettext is running on a Windows machine.
Strings in a POT file could contain LF, CR, or CRLF; that should not change
anything, because they should all be considered the same entity, if I got it
right.
Currently, however, for a file with Windows newlines and xgettext running on a
Unix machine, the resulting POT file will contain \r\n as newline marker
inside strings. This may be considered a bug, but before I fix it, it would
be good to have an official statement about this issue from the PHP people.
I cannot find anything on this topic in
<http://www.php.net/manual/en/langref.php>. In this situation, xgettext also
emits warnings:
warning: internationalized messages should not contain the `\r' escape
sequence
The reason is that translator tools are supposed to work with \n and not with
\r\n.
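As a rough way to check whether a generated POT file is affected, one could
scan its msgid entries for the \r escape. The sketch below is only an
illustration: find_cr_escapes() is a hypothetical helper, not part of gettext
or PHP, and it assumes PHP 7 or later. It does a plain substring search, so an
escaped backslash followed by the letter r would be reported as well.

```php
<?php
// Hypothetical helper: return the 1-based line numbers of msgid lines
// (including their continuation lines) that contain the two-character
// escape sequence \r.
function find_cr_escapes(string $potText): array
{
    $hits = [];
    $inMsgid = false;
    foreach (preg_split('/\r\n|\r|\n/', $potText) as $index => $line) {
        if (preg_match('/^msgid(_plural)?\s/', $line)) {
            $inMsgid = true;            // a msgid entry starts here
        } elseif (preg_match('/^msgstr/', $line) || trim($line) === '') {
            $inMsgid = false;           // the msgid part of the entry is over
        }
        // '\\r' in single quotes is a backslash followed by the letter r.
        if ($inMsgid && strpos($line, '\\r') !== false) {
            $hits[] = $index + 1;
        }
    }
    return $hits;
}
```

One would call it with the file contents, for example
find_cr_escapes(file_get_contents('messages.pot')), where the file name is
again just a placeholder.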
I see two possible solutions for your problem:
a) PHP should be fixed so that newlines in string literals are '\n',
independent of the platform.
That could cause many PHP programs to stop working, I guess, since many
developers are not aware of this and rely on the presence of CRLF
markers to detect newlines. I know it's a bad habit, but it's a fact.
b) The PHP gettext function family
<http://de2.php.net/manual/en/book.gettext.php>
gets changed to preprocess CR LF into LF in the argument string
before looking up the translation.
I'm convinced that's the best solution. It means treating all newlines the
same way, and developers would no longer have to worry about having all
newlines as LF on every platform.
For each of these solutions, you should report a bug at <http://bugs.php.net/>.
I'll do it, at least for the second one.
Other than that, there are two workarounds:
c) You change the line terminator conventions of your source files.
Many text editors on Windows nowadays support this.
Sure, Notepad++, which I use, does that without any problem, but I only
receive the files; I cannot ask for this change.
d) You convert CR LF to LF in all strings before you call gettext, using
string operations <http://www.php.net/manual/en/book.strings.php>.
There are about 20,000 strings in the code, so that would mean changing every
_("...") call into a double call. I could simply define a new function, __()
for example, which does this replacement and then calls _().
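A minimal sketch of such a wrapper might look as follows. It assumes PHP 7 or
later; normalize_newlines() is a hypothetical helper name, and the
function_exists() guard is an added assumption so that the sketch also runs
where the gettext extension is not loaded.

```php
<?php
// Hypothetical helper (not part of PHP or gettext): fold CR LF into LF.
function normalize_newlines(string $text): string
{
    return str_replace("\r\n", "\n", $text);
}

// Wrapper named __() as proposed above: normalize the msgid first, then
// look it up. Without the gettext extension, the normalized msgid is
// returned untranslated (an assumption made for this sketch).
function __(string $msgid): string
{
    $normalized = normalize_newlines($msgid);
    return function_exists('gettext') ? gettext($normalized) : $normalized;
}
```

Call sites would then change from _("...") to __("..."), which is a mechanical
search and replace. xgettext would need --keyword=__ to pick up the new
function name, and the wrapper only helps if the msgids in the catalog use \n.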
Well, thank you for your time. I'll report any further news here.
Adrien