Re: [Help-smalltalk] HTML parser in GST

From:	Paolo Bonzini
Subject:	Re: [Help-smalltalk] HTML parser in GST
Date:	Sat, 05 Jun 2010 14:41:14 +0200
User-agent:	Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.1.9) Gecko/20100430 Fedora/3.0.4-3.fc13 Lightning/1.0b2pre Thunderbird/3.0.4

On 06/04/2010 06:11 PM, Holger Hans Peter Freyther wrote:

On 06/04/2010 11:56 PM, Andrei Stebakov wrote:

I've noticed there are a number of XML parsers in the package. I
wonder whether one of them can be used as an HTML parser (similar to
Soup, http://news.squeak.org/2009/01/19/soup-for-squeak/). Are there
any examples of doing so? The task is simple: retrieve a web page,
parse it, and look for a particular tag and its value.


If Soup has some kind of SAX interface, it would be easy to use that
to build the DOM and then query the DOM with XPath.
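
To make the SAX-then-XPath idea concrete, here is a minimal sketch in Python rather than Smalltalk, since the thread does not show the GST XML API. It assumes only the standard library: html.parser supplies the SAX-style start/end/text events, and xml.etree.ElementTree supplies the tree plus a limited XPath subset. The names TreeBuilder, parse_html, VOID and the sample page are made up for illustration, and the recovery for unbalanced tags is deliberately naive.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET

# Elements that never take an end tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class TreeBuilder(HTMLParser):
    """Turn SAX-style start/end/text events into an ElementTree."""

    def __init__(self):
        super().__init__()
        self.root = ET.Element("document")   # synthetic root (assumption)
        self.stack = [self.root]              # currently open elements

    def handle_starttag(self, tag, attrs):
        # Valueless attributes arrive as None; store them as "".
        node = ET.SubElement(self.stack[-1], tag,
                             {k: v or "" for k, v in attrs})
        if tag not in VOID:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Naive recovery: pop back to the nearest matching open element
        # and silently ignore end tags that match nothing.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        node = self.stack[-1]
        if len(node):
            # Text after a child element belongs to that child's tail.
            last = node[-1]
            last.tail = (last.tail or "") + data
        else:
            node.text = (node.text or "") + data


def parse_html(text):
    builder = TreeBuilder()
    builder.feed(text)
    builder.close()
    return builder.root


page = ('<html><body><p>intro<br>more</p>'
        '<span class="price">42</span></body></html>')
tree = parse_html(page)

# ElementTree understands a small XPath subset, enough for
# "find this tag with this attribute".
prices = [e.text for e in tree.findall(".//span[@class='price']")]
print(prices)
```

The event handlers know nothing about the query, and the query knows nothing about how the tree was built, which is the separation the SAX/DOM/XPath split is meant to give.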

Well, HTML parsers are a funny thing... the best thing to do is to
take the HTML5 parser specification and implement it from scratch. To
my knowledge, it is the first specification that defines how to
handle missing tags (e.g. how many elements to close, a.k.a. tag
priorities).
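
As a rough illustration of the "how many elements to close" question, the sketch below (Python, standard library only) decides how many open elements an incoming start tag implicitly closes. It is not the HTML5 tree-construction algorithm: the IMPLIED_CLOSE and BOUNDARY tables are hypothetical, heavily reduced stand-ins for the specification's rules, and formatting-element reconstruction is ignored entirely.

```python
# Hypothetical, heavily simplified table: an incoming start tag (key)
# implicitly closes the nearest open element of one of these types.
IMPLIED_CLOSE = {
    "p": {"p"},
    "li": {"li"},
    "dt": {"dt", "dd"},
    "dd": {"dt", "dd"},
}

# Open elements that stop the search, so that a nested list does not
# close an item of the enclosing list (also a simplification).
BOUNDARY = {"ul", "ol", "dl", "table", "div", "body", "html"}


def elements_to_close(open_stack, new_tag):
    """Return how many open elements new_tag implicitly closes.

    open_stack lists the open element names, outermost first.
    """
    closers = IMPLIED_CLOSE.get(new_tag, set())
    for depth, tag in enumerate(reversed(open_stack), start=1):
        if tag in closers:
            return depth       # close everything down to this element
        if tag in BOUNDARY:
            break              # never look past a scoping element
    return 0


# <ul><li><b>text<li> : the second <li> closes both <b> and the first <li>.
print(elements_to_close(["html", "body", "ul", "li", "b"], "li"))
# <ul><li><ul><li> : the inner <li> must not close the outer one.
print(elements_to_close(["html", "body", "ul", "li", "ul"], "li"))
```

A real implementation would take these tables and scopes from the specification rather than inventing them, which is the point of having the behaviour written down.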


Agreed.

Paolo


