help-gnu-emacs

Re: How to get title of web page by url?


From: Andreas Röhler
Subject: Re: How to get title of web page by url?
Date: Wed, 28 Jul 2010 18:03:58 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i686; de; rv:1.9.1.11) Gecko/20100711 Thunderbird/3.0.6

[ ... ]

The real solution for extracting the title from HTML text is not regular
expressions but a dedicated HTML parser. The Lisp way to write such a
parser would be to turn the document (or only the head part) into nested
lists and other s-expressions and then descend into the list to find the
title. Such parsers already exist for Common Lisp, but I'm not sure about
Emacs Lisp.
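
As a rough illustration of the nested-list idea described above, here is a minimal sketch in Python rather than Emacs Lisp. It uses the standard-library html.parser module to build a tree of the form [tag, child, child, ...] and then walks that tree to find the title. The names TreeBuilder, find_title, title_of and VOID_TAGS are made up for this example, and the void-tag list is an incomplete assumption; it is not taken from any of the packages mentioned in this thread.

```python
from html.parser import HTMLParser

# Assumption: a small, incomplete set of tags that never take a closing tag.
VOID_TAGS = {"meta", "link", "br", "img", "hr", "input", "base"}


class TreeBuilder(HTMLParser):
    """Build a nested-list tree: [tag, child, ...]; text nodes are plain str."""

    def __init__(self):
        super().__init__()
        self.root = ["#document"]
        self.stack = [self.root]  # chain of currently open elements

    def handle_starttag(self, tag, attrs):
        node = [tag]
        self.stack[-1].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        self.stack[-1].append(data)


def find_title(node):
    """Depth-first search of the nested lists for the first title element."""
    if isinstance(node, str):
        return None
    if node[0] == "title":
        return "".join(c for c in node[1:] if isinstance(c, str)).strip()
    for child in node[1:]:
        found = find_title(child)
        if found is not None:
            return found
    return None


def title_of(html):
    """Parse an HTML string and return its title text, or None if absent."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return find_title(builder.root)
```

Calling title_of on the text of a fetched page returns the title string, or None when the page has no title element. Fetching the page by URL is left out here on purpose.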



beg-end.el, at

http://bazaar.launchpad.net/~a-roehler/s-x-emacs-werkstatt

is an attempt at such a parser.

See also thing-at-point-markup.el, which handles markup languages such as XML and HTML.

thing-at-point-utils.el offers functions that grab everything between angle brackets and take nesting into account.

Try ar-angled-lesser-atpt, for example.

All of this requires thingatpt-utils-base.el, where the core routines reside.

Have a look there at how the parser mentioned above is employed via beginning-of-form-base and end-of-form-base.
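
To make the idea of grabbing everything between angle brackets while counting nesting more concrete, here is a minimal sketch in Python. It is only an illustration of nest counting in general: the function name angled_span is hypothetical, and this is not a port of ar-angled-lesser-atpt or of the routines in thingatpt-utils-base.el.

```python
def angled_span(text, pos):
    """Return (start, end) of the innermost <...> pair enclosing index pos.

    Nested pairs are counted so that an inner <...> is skipped over.
    Requires 0 <= pos < len(text). Returns None if pos is not enclosed.
    """
    # Scan backward for the opening "<", skipping complete inner pairs.
    depth = 0
    start = None
    for i in range(pos, -1, -1):
        ch = text[i]
        if ch == ">" and i != pos:
            depth += 1          # an inner pair closes here (seen from the right)
        elif ch == "<":
            if depth == 0:
                start = i       # unmatched opener: this one encloses pos
                break
            depth -= 1
    if start is None:
        return None

    # Scan forward from the opener for its matching ">".
    depth = 0
    for j in range(start + 1, len(text)):
        ch = text[j]
        if ch == "<":
            depth += 1
        elif ch == ">":
            if depth == 0:
                return (start, j + 1)
            depth -= 1
    return None
```

The returned pair can be used as a slice, so text[start:end] includes both brackets. A real markup-aware version would also have to deal with angle brackets inside attribute values and comments, which this sketch ignores.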


Andreas



--
https://code.launchpad.net/~a-roehler/python-mode
https://code.launchpad.net/s-x-emacs-werkstatt/









