From: Oscar Celma
Subject: Re: [Librefm-discuss] Re: lastscrape.py
Date: Tue, 02 Feb 2010 06:11:32 +0100
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Hi,

is there any need *now* to scrape the user's page to get all the tracks he's been playing? As of a few months ago, the last.fm API lets you get all the tracks from a user (using pagination of the results).

See: http://bugs.libre.fm/wiki/LastToLibre
and http://www.last.fm/api/show?service=278

Cheers,
Oscar

Seth Woodworth wrote:
> I would suggest, when possible, using the html5lib parser together with
> the traverser from BeautifulSoup. The author himself suggests[1] this
> whenever BS-3.1.0 or 3.0.8 behaves poorly.
>
> I have been doing work with Python, BeautifulSoup and html5lib lately,
> and I've been collecting and slowly improving Python scripts (like this
> one) to liberate data from websites like Reddit or the Ubuntu forums.
> I would love to get involved with the lastscrape.py script.
>
> [1] http://www.crummy.com/software/BeautifulSoup/3.1-problems.html
>
> --Seth
>
> On Mon, Feb 1, 2010 at 6:32 PM, Gordon Haverland <address@hidden> wrote:
>> Hello.
>>
>> My TODO list got low enough to look into this a little bit. By
>> default, I have the too-new 3.1 version of BeautifulSoup (shortened to
>> BS for this note). I also downloaded BS-3.0.7a and BS-3.0.8.
>>
>> I have a "few" songs to download (364884). I tried lastscrape as
>> written with 3.0.8, and it died after a while. Thinking maybe the time
>> between pages was too short, I edited lastscrape to use a 20 second
>> pause instead of a 1 second pause. It still crashes with both the
>> 3.0.8 and 3.0.7a versions of BS.
>>
>> I can see that at some point, a download of my data from Last.fm is
>> probably going to take a day or more. Then I will have the pleasure of
>> getting rid of duplicates. But are there things a person can do to
>> get over humps in the download?
>>
>> I'm not really a Python person, I prefer Perl. But I can probably
>> adjust the Python to do what is needed too.
>>
>> Thanks,
>> Gord
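
For the question about getting over humps in a long download, here is a minimal Python sketch of a paginated fetch that retries a failing page, reports where to resume, and then drops duplicates. It is not how lastscrape.py works. `fetch_page` is a hypothetical callable (assumed to return one page of tracks plus the total page count, and to raise on failure) that you would write yourself against the paginated API mentioned above. The names `fetch_all_pages` and `dedupe`, and the representation of a track as a `(timestamp, artist, title)` tuple, are also assumptions made for illustration.

```python
import time


def fetch_all_pages(fetch_page, first_page=1, max_retries=3, pause=1.0,
                    sleep=time.sleep):
    """Collect tracks page by page, retrying a failed page instead of aborting.

    fetch_page(page) is a hypothetical callable: it returns
    (tracks, total_pages) for one page of results and raises on failure.
    Returns (tracks, resume_page); resume_page is None when every page
    was fetched, otherwise the page number to restart from.
    """
    tracks = []
    page = first_page
    total_pages = None  # unknown until the first page has been fetched
    while total_pages is None or page <= total_pages:
        for attempt in range(1, max_retries + 1):
            try:
                page_tracks, total_pages = fetch_page(page)
                break
            except Exception:  # broad on purpose: this is only a sketch
                if attempt == max_retries:
                    # Give up, but keep what we have and say where to resume.
                    return tracks, page
                sleep(pause * attempt)  # wait a little longer on each retry
        tracks.extend(page_tracks)
        page += 1
        sleep(pause)  # be polite between pages
    return tracks, None


def dedupe(tracks):
    """Drop repeated tracks while keeping the first occurrence in order.

    Assumes each track is hashable, e.g. a (timestamp, artist, title) tuple.
    """
    seen = set()
    unique = []
    for track in tracks:
        if track not in seen:
            seen.add(track)
            unique.append(track)
    return unique
```

If a run stops early, passing the returned page number back in as `first_page` continues from there instead of starting over, which matters when the whole download takes a day or more.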