Re: [Grammatica-users] HTML grammar??


From: Per Cederberg
Subject: Re: [Grammatica-users] HTML grammar??
Date: Sun, 18 Dec 2005 13:47:41 +0100

Well, I guess it would be possible to write an HTML
grammar for Grammatica. But the real question is
whether it would be a good fit. The trouble with HTML
is that *lots* of real-world web pages are
syntactically invalid.

So I think that to write a good HTML parser, one
really needs to do it by hand, adding special code
everywhere to recover from common problems.

Also, HTML has a very lenient syntax, allowing unknown
new tags to be used, end tags to be omitted, and so
on. So it is very hard to create a correct BNF grammar
that covers all of that and still provides something
more than a pure tokenizer.
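
To make the point concrete, here is a rough sketch (in
Python, not Grammatica) of the kind of hand-written
recovery I mean. Everything in it is an assumption made
up for illustration: the parse() function, the
IMPLICIT_CLOSE and VOID tables, and the simplistic token
regex. A real parser would need far more rules than
this.

```python
import re

# Hypothetical rule tables, for illustration only.
# Opening one of these tags while the same tag is still open
# implicitly closes the earlier one (e.g. <li> after <li>).
IMPLICIT_CLOSE = {"p", "li", "td", "tr"}
# Tags that never take an end tag.
VOID = {"br", "img", "hr"}

# Simplistic tokenizer: a start/end tag, or a run of text.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)")

def parse(html):
    """Build a (name, children) tree, recovering from sloppy markup."""
    root = ("#root", [])
    stack = [root]
    for m in TOKEN.finditer(html):
        closing, name, text = m.group(1), m.group(2), m.group(3)
        if text is not None:
            if text.strip():
                stack[-1][1].append(text.strip())
            continue
        name = name.lower()
        if closing:
            # Recovery: close everything up to the matching open
            # tag, or silently ignore a stray end tag.
            if name in [n for n, _ in stack[1:]]:
                while stack[-1][0] != name:
                    stack.pop()
                stack.pop()
            continue
        if name in IMPLICIT_CLOSE and stack[-1][0] == name:
            stack.pop()  # the omitted end tag is implied
        node = (name, [])  # unknown tags are accepted as-is
        stack[-1][1].append(node)
        if name not in VOID:
            stack.append(node)
    return root
```

Feeding it "<ul><li>one<li>two</ul><foo>x</bar>" gives a
tree with two sibling li elements under ul and an
unknown foo element, with the stray </bar> dropped.
None of those decisions come out of a grammar; they are
all special cases written by hand.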

Cheers,

/Per

On Thu, 2005-12-15 at 11:33 -0800, John Kleven wrote:
> Hi all,
> 
> Curious if anybody has used Grammatica to create an
> HTML parser?
> 
> Not sure if that's a good fit for Grammatica or not, but
> it seemed like it might be.  The existing C# HTML
> parsers out there all seem to leave something (or
> quite a bit) to be desired.
> 
> Any info appreciated!
> Thanks
> John




