From: | Paolo Bonzini |
Subject: | Re: [Help-smalltalk] HTML parser in GST |
Date: | Sat, 05 Jun 2010 14:41:14 +0200 |
User-agent: | Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b2pre Thunderbird/3.0.4 |
On 06/04/2010 06:11 PM, Holger Hans Peter Freyther wrote:
On 06/04/2010 11:56 PM, Andrei Stebakov wrote:
I've noticed there are a number of XML parsers in the package. I wonder if I can use one of them as an HTML parser (similar to Soup, http://news.squeak.org/2009/01/19/soup-for-squeak/). Are there any examples of this? The task is simple: retrieve a web page, parse it, and look for a tag with a particular value.
If Soup has some kind of SAX interface it would be easy to use it to build the DOM and then query it with XPath.
Well, HTML parsers are a funny thing... The best thing to do is to take the HTML5 parser specification and implement it from scratch. To my knowledge, it is the first specification that says how to handle missing tags (e.g. how many elements to close, a.k.a. tag priorities).
Agreed.

Paolo
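For illustration only, here is a rough sketch of the idea discussed above: SAX-style parser events are used to build a tree, missing end tags are tolerated, and the tree is then queried with an XPath-like expression. It is written in Python with the standard html.parser and xml.etree.ElementTree modules, not GST. The names TreeBuilder, parse_html, VOID and IMPLIED_CLOSE are hypothetical, and the "a new <p> or <li> closes the open one" rule is a heavily simplified assumption, not the HTML5 tree-construction algorithm.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag.
VOID = {"br", "img", "hr", "meta", "link", "input"}
# Assumption (much simpler than HTML5): opening one of these tags
# implicitly closes a currently open element of the same name.
IMPLIED_CLOSE = {"p", "li", "td", "tr", "option"}


class TreeBuilder(HTMLParser):
    """Builds an ElementTree from SAX-like start/end/data events."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")
        self.stack = [self.root]  # currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in IMPLIED_CLOSE and self.stack[-1].tag == tag:
            self.stack.pop()  # supply the missing end tag
        attrib = {name: value or "" for name, value in attrs}
        node = ET.SubElement(self.stack[-1], tag, attrib)
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close everything up to the matching open element;
        # a stray end tag with no match is ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        node = self.stack[-1]
        if len(node):  # text after a child belongs to that child's tail
            last = node[-1]
            last.tail = (last.tail or "") + data
        else:
            node.text = (node.text or "") + data


def parse_html(source):
    """Parse an HTML string and return the synthetic root element."""
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()
    return builder.root
```

With this sketch, parse_html("<ul><li class='a'>one<li class='b'>two</ul>") yields a tree in which root.findall(".//li[@class='b']") finds the second item even though neither <li> was closed. ElementTree supports only a subset of XPath, so a full XPath engine would be needed for more complex queries.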