Re: [Librefm-discuss] Re: lastscrape.py


From: Oscar Celma
Subject: Re: [Librefm-discuss] Re: lastscrape.py
Date: Tue, 02 Feb 2010 06:11:32 +0100
User-agent: Thunderbird 2.0.0.23 (X11/20090817)

Hi,

Is there any need *now* to scrape the user's page to get all the tracks they have been playing?
For the past few months, the Last.fm API has allowed fetching all of a user's tracks (using pagination of the results).

See:
http://bugs.libre.fm/wiki/LastToLibre
and
http://www.last.fm/api/show?service=278
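
A minimal sketch of the paginated approach, not a tested replacement for lastscrape.py. It assumes the method behind the second link is user.getRecentTracks, that the endpoint is http://ws.audioscrobbler.com/2.0/, and that the JSON reply reports the page count under recenttracks/@attr/totalPages. The names build_url, fetch_page and fetch_all_tracks are made up for illustration, and the API key is whatever key you have registered.

```python
import json
import time
import urllib.parse
import urllib.request

API_ROOT = "http://ws.audioscrobbler.com/2.0/"  # assumed endpoint


def build_url(user, api_key, page, limit=200):
    """Build the request URL for one page of a user's played tracks."""
    params = {
        "method": "user.getrecenttracks",  # assumed method name
        "user": user,
        "api_key": api_key,
        "page": page,
        "limit": limit,
        "format": "json",
    }
    return API_ROOT + "?" + urllib.parse.urlencode(params)


def fetch_page(user, api_key, page):
    """Download and decode one page (needs network access)."""
    with urllib.request.urlopen(build_url(user, api_key, page)) as resp:
        return json.loads(resp.read().decode("utf-8"))


def fetch_all_tracks(user, api_key, fetch=fetch_page, pause=1.0):
    """Yield every track, walking the pages until totalPages is reached."""
    page = 1
    while True:
        data = fetch(user, api_key, page)["recenttracks"]
        tracks = data.get("track", [])
        if isinstance(tracks, dict):  # assumed: a lone track may arrive unwrapped
            tracks = [tracks]
        for track in tracks:
            yield track
        # Assumed layout: the page count is a string under "@attr".
        total = int(data.get("@attr", {}).get("totalPages", page))
        if page >= total:
            break
        page += 1
        time.sleep(pause)  # be polite between requests
```

The fetch argument exists so the loop can be tried against canned data before pointing it at the real service.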

Cheers, Oscar

Seth Woodworth wrote:
I would suggest, when possible, parsing with html5lib and then walking
the result with the BeautifulSoup traverser.  The author himself
suggests[1] this whenever BS-3.1.0 or 3.0.8 behaves poorly.

I have been doing work with Python, BeautifulSoup and html5lib lately,
and I've been collecting and slowly improving Python scripts (like
this) to liberate data from websites like Reddit or the Ubuntu forums.
I would love to get involved with the lastscrape.py script.


[1] http://www.crummy.com/software/BeautifulSoup/3.1-problems.html

--Seth

On Mon, Feb 1, 2010 at 6:32 PM, Gordon Haverland
<address@hidden> wrote:
  
Hello.

My TODO list got short enough for me to look into this a little bit.
By default, I have the too-new 3.1 version of BeautifulSoup
(shortened to BS for this note).  I also downloaded BS-3.0.7a and
BS-3.0.8.

I have a "few" songs to download (364884).  I tried lastscrape as
written with 3.0.8, and it died after a while.  Thinking maybe the
time between pages was too short, I edited lastscrape to use a 20
second pause instead of a 1 second pause.  It still crashes with
both the 3.0.8 and 3.0.7a versions of BS.
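
One thing that sometimes helps with a long download is to retry a failed page instead of letting the whole run die. The thread does not say what the crash actually is, so this is only a sketch under the assumption that the failure is transient (a timeout or a bad page). The helper with_retries is hypothetical and not part of lastscrape.py; the 20 second starting pause is taken from the note above.

```python
import time


def with_retries(action, attempts=5, first_pause=20.0, sleep=time.sleep):
    """Call action(); on failure wait and try again, doubling the pause."""
    pause = first_pause
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception as exc:  # broad on purpose: the real error is unknown
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            print("attempt %d failed (%s); sleeping %.0fs" % (attempt, exc, pause))
            sleep(pause)
            pause *= 2
```

It would wrap whatever call downloads a single page, for example with_retries(lambda: get_page(n)), where get_page stands in for the script's own page download.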

I can see that at some point, a download of my data from Last.fm
is probably going to take a day or more.  Then I will have the
pleasure of getting rid of duplicates.  But, are there things a
person can do to get over humps in the download?
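
Two things that may help here are remembering the last page finished, so a restart picks up where it stopped, and dropping repeated plays afterwards. The sketch below assumes each play is a dict with "artist", "track" and "timestamp" keys and that progress is kept in a small JSON file; both are assumptions for illustration, not how lastscrape.py stores its data.

```python
import json
import os


def load_checkpoint(path):
    """Return the next page to fetch, or 1 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 1
    with open(path) as fh:
        return json.load(fh)["next_page"]


def save_checkpoint(path, next_page):
    """Write to a temp file, then rename, so a crash cannot leave half a file."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"next_page": next_page}, fh)
    os.replace(tmp, path)


def dedupe(plays):
    """Drop repeated plays, keeping the first of each in the original order."""
    seen = set()
    unique = []
    for play in plays:
        key = (play["artist"], play["track"], play["timestamp"])
        if key not in seen:
            seen.add(key)
            unique.append(play)
    return unique
```

Including the timestamp in the key means the same song played twice at different times counts as two plays; only exact repeats, such as a page fetched twice after a restart, are removed.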

I'm not really a Python person; I prefer Perl.  But I can probably
adjust the Python to do what is needed.

Thanks,
Gord




    


  

