Re: [Po4a-dev]HTML module (first revision)
From: Laurent Hausermann
Subject: Re: [Po4a-dev]HTML module (first revision)
Date: Tue, 18 Feb 2003 10:48:00 +0100
User-agent: Internet Messaging Program (IMP) 3.1

Hi all,
> In fact, since HTML is defined by an SGML DTD, I guess it could be easier
> to handle this format with the Sgml.pm module, which offers the whole
> mechanism to do what I wanted from HTML.pm...
>
> You would only have to add the HTML-specific parts after the specific
> parts for docbook and debiandoc, and provided that your documents start
> with a line like
> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
> (as they should), it will work (I guess)!
OK, I'll look deeper into Sgml.pm. It seems to be a good idea in theory, but
I wonder whether an SGML module can parse the "peculiar" HTML produced by
people who don't care about the HTML 3.2 or 4.0 DTDs :)
> > > I have developed an HTML module for po4a. It still has some bugs and
> > > it's not perfect, but I think it's a good starting point.
> > > It uses HTML::TokeParser ( apt-get install libhtml-parser-perl )
> > > I sent the whole diff to Martin Quinson, not to this list
> > Ok, I committed this to the CVS, so that others can see it.
Thanks.
> > This module isn't ready to release yet in my opinion. Here are my
> > objections:
> > * The parser you used doesn't allow retrieving the line number. Why not
> > use the HTML::Parser module, which seems somewhat more powerful?
You are right. HTML::Parser is more powerful, but parsing HTML with it seemed
more difficult to me...
I am not an i18n expert. Can you explain to me why the line number is so
important, especially for SGML/XML/HTML?
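
For context on the line-number question: gettext PO files can carry a
"#: file:line" reference comment on each entry, which lets a translator find
the string in the source document. The sketch below shows the idea of
recording a line number for each extracted string. It uses Python's standard
html.parser as a stand-in and is not the po4a code or the Perl HTML::Parser
API; the names LineTrackingParser, extract_with_lines and SAMPLE are made up
for the illustration.

```python
from html.parser import HTMLParser


class LineTrackingParser(HTMLParser):
    """Collect non-empty text chunks together with their line number."""

    def __init__(self):
        super().__init__()
        self.strings = []  # list of (line_number, text) tuples

    def handle_data(self, data):
        text = data.strip()
        if text:
            # getpos() returns (line, column) of the chunk being handled.
            lineno, _offset = self.getpos()
            self.strings.append((lineno, text))


def extract_with_lines(html):
    """Return (line_number, text) pairs for every text chunk in html."""
    parser = LineTrackingParser()
    parser.feed(html)
    parser.close()
    return parser.strings


# Hypothetical sample document, one element per line.
SAMPLE = (
    "<html>\n"
    "<body>\n"
    "<p>Hello world</p>\n"
    "<p>Second paragraph</p>\n"
    "</body>\n"
    "</html>\n"
)

if __name__ == "__main__":
    for lineno, text in extract_with_lines(SAMPLE):
        # Print each string with a PO-style reference comment.
        print(f"#: sample.html:{lineno}")
        print(f'msgid "{text}"')
```

The reported position is where the text chunk starts, so a chunk that begins
with a newline would be attributed to the line of the preceding tag; a real
module would have to decide how to handle that.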
> > That is to say that sentences are broken in subparts, which is BAD.
> > (see http://www.ens-lyon.fr/~mquinson/l10n.html for a rationale).
Yes, you are probably right about that too. But poedit, for example, a tool
that translators may use, won't render text inside <b> or <i> tags in bold or
italic... And I think a translator should not have to be an HTML expert. The
<a> tag is too difficult to "translate" to leave it under the translator's
control.
> > * Your version doesn't put the entry type in the po, which prevents
> > using gettextization (see po4a(7) for more details). I quickly hacked
> > support for that into the version in CVS, but that's not perfect yet.
Oops, I missed that point. I had a look at your "hack", but I don't see a
better way to handle gettextization. Do you have any other ideas on that
point?
> > I suggest that:
> > - you move to a parser that allows you to retrieve the line number (or
> > explain to me that I'm an idiot and that this parser does allow you to
> > retrieve the line number, and how)
I will look into the internals of HTML::TokeParser.
> > - you look at the sgml module to see how we handle the fact that some
> > tags delimit a paragraph (like <p>), and should be translated, and that
> > some other tags shouldn't be touched because they don't delimit a
> > sentence (like <b>, <i> and so on)
OK. I will see whether I can add HTML to the SGML module.
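
To make the distinction above concrete, here is a minimal sketch in which
paragraph-level tags close a translation unit while inline tags stay inside
it, so that a sentence is not broken into subparts. It is written with
Python's html.parser for illustration only and is not how Sgml.pm is
implemented; the INLINE_TAGS set and the UnitExtractor and extract_units
names are assumptions.

```python
from html.parser import HTMLParser

# Assumed list of tags that do not delimit a sentence; every other tag
# is treated as a unit boundary in this sketch.
INLINE_TAGS = {"b", "i", "em", "strong", "a"}


class UnitExtractor(HTMLParser):
    """Split a document into translation units at non-inline tags."""

    def __init__(self):
        super().__init__()
        self.units = []
        self._buffer = []

    def _flush(self):
        # Join buffered pieces and collapse runs of whitespace.
        text = " ".join("".join(self._buffer).split())
        if text:
            self.units.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in INLINE_TAGS:
            # Keep the tag as written so the translator sees it in context.
            self._buffer.append(self.get_starttag_text())
        else:
            self._flush()

    def handle_endtag(self, tag):
        if tag in INLINE_TAGS:
            self._buffer.append(f"</{tag}>")
        else:
            self._flush()

    def handle_data(self, data):
        self._buffer.append(data)

    def close(self):
        super().close()
        self._flush()


def extract_units(html):
    """Return the list of translation units found in html."""
    parser = UnitExtractor()
    parser.feed(html)
    parser.close()
    return parser.units


if __name__ == "__main__":
    doc = '<p>This is <b>very</b> important.</p>\n<p>See <a href="x.html">the manual</a>.</p>'
    for unit in extract_units(doc):
        print(unit)
```

With this split, "This is <b>very</b> important." reaches the translator as
one string instead of three fragments.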
> > Sorry, but I really can't release this module as is...
> > Anyway, thanks for your contribution, it IS a good start.
Don't be sorry. You are responsible for providing the community with good
po4a software, and I am a "monger beginner" :)