Re: [Help-smalltalk] HTML parser in GST
From: Holger Hans Peter Freyther
Subject: Re: [Help-smalltalk] HTML parser in GST
Date: Sat, 05 Jun 2010 00:11:03 +0800
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b1 Thunderbird/3.0.4
On 06/04/2010 11:56 PM, Andrei Stebakov wrote:
> I've noticed there are a number of XML parsers in the package. I
> wonder if I can use it as an HTML parser (similar to Soup
> http://news.squeak.org/2009/01/19/soup-for-squeak/)
> Are there any examples using it? The task is a simple web page
> retrieval and parsing, hunting for some tag with a value.
Well,
HTML parsers are a funny thing... The best approach is to take the
HTML5 parser specification and implement it from scratch. To my
knowledge, it is the first specification that defines how to handle
missing tags (e.g. how many elements to close, a.k.a. tag priorities).
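
To make the "how many elements to close" problem concrete, here is a minimal
sketch in Python (not GST, and not the HTML5 algorithm itself). It uses the
standard library's html.parser tokenizer and a small, hand-written table of
implied end tags; the names IMPLIED_END, VOID, TagHunter and find_text are
hypothetical, and the table is a deliberately tiny assumption, far smaller
than what the specification describes.

```python
from html.parser import HTMLParser

# Assumption: a tiny illustrative table. Opening the key tag implicitly
# closes any open element listed in its value set. The HTML5 rules are
# considerably more detailed than this.
IMPLIED_END = {
    "p": {"p"},
    "li": {"li"},
    "td": {"td", "th"},
    "th": {"td", "th"},
    "tr": {"tr", "td", "th"},
}
# Elements that never have a closing tag, so they are never pushed.
VOID = {"br", "img", "hr", "meta", "link", "input"}


class TagHunter(HTMLParser):
    """Collect the text of every `wanted` element, tolerating missing end tags."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.stack = []   # open elements as (tag, text_parts) pairs
        self.found = []   # text of each completed `wanted` element

    def _close_top(self):
        tag, parts = self.stack.pop()
        if tag == self.wanted:
            self.found.append("".join(parts).strip())

    def handle_starttag(self, tag, attrs):
        # Close every open element that the new tag implicitly ends.
        closes = IMPLIED_END.get(tag, set())
        while self.stack and self.stack[-1][0] in closes:
            self._close_top()
        if tag not in VOID:
            self.stack.append((tag, []))

    def handle_endtag(self, tag):
        if not any(open_tag == tag for open_tag, _ in self.stack):
            return  # stray end tag with no matching open element: ignore it
        # Close elements until the matching one has been closed.
        while self.stack:
            top = self.stack[-1][0]
            self._close_top()
            if top == tag:
                break

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self.stack:
            parts.append(data)

    def close(self):
        super().close()   # flush any buffered text first
        while self.stack:  # then close whatever is still open at end of input
            self._close_top()


def find_text(html, wanted):
    hunter = TagHunter(wanted)
    hunter.feed(html)
    hunter.close()
    return hunter.found
```

For example, find_text("<ul><li>one<li>two</ul>", "li") returns
['one', 'two'] even though neither li is explicitly closed: the second li
closes the first, and the closing ul closes the second. A real parser needs
many more of these rules, which is what the specification spells out.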