librefm-discuss

Re: [Librefm-discuss] Re: lastscrape.py


From: Seth Woodworth
Subject: Re: [Librefm-discuss] Re: lastscrape.py
Date: Mon, 1 Feb 2010 23:29:42 -0500

I would suggest, where possible, parsing with html5lib and then
traversing the resulting tree with BeautifulSoup.  The author himself
suggests[1] this whenever BS-3.1.0 or 3.0.8 behaves poorly.
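
A minimal sketch of that combination, with some loud assumptions: it
uses the current bs4 package (not the BS 3.x releases discussed in this
thread), which can delegate parsing to html5lib when both packages are
installed. The markup is made up for illustration and is not Last.fm's
real page structure.

```python
from bs4 import BeautifulSoup  # assumes bs4 and html5lib are installed

# Deliberately sloppy, made-up markup: the <td> and <tr> tags are never
# closed, which is the kind of input that can trip a strict parser.
html = (
    "<table>"
    "<tr><td><a href='/track/1'>First Song</a>"
    "<tr><td><a href='/track/2'>Second Song</a>"
    "</table>"
)

# html5lib repairs the markup the way a browser would; BeautifulSoup
# then provides the familiar traversal API over the repaired tree.
soup = BeautifulSoup(html, "html5lib")

titles = [a.get_text() for a in soup.find_all("a")]
links = [a["href"] for a in soup.find_all("a")]
print(titles, links)
```

The appeal is that html5lib follows the HTML5 parsing rules for broken
markup, so the scraping code itself does not need to change.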

I have been working with Python, BeautifulSoup and html5lib lately,
and I've been collecting and slowly improving Python scripts (like
this one) to liberate data from websites like Reddit or the Ubuntu
forums.  I would love to get involved with the lastscrape.py script.


[1] http://www.crummy.com/software/BeautifulSoup/3.1-problems.html

--Seth

On Mon, Feb 1, 2010 at 6:32 PM, Gordon Haverland
<address@hidden> wrote:
> Hello.
>
> My TODO list got short enough for me to look into this a little.
> By default, I have the too-new 3.1 version of BeautifulSoup
> (shortened to BS for this note).  I also downloaded BS-3.0.7a and
> BS-3.0.8.
>
> I have a "few" songs to download (364884).  I tried lastscrape as
> written with 3.0.8, and it died after a while.  Thinking maybe the
> time between pages was too short, I edited lastscrape to use a
> 20-second pause instead of a 1-second pause.  It still crashes with
> both the 3.0.8 and 3.0.7a versions of BS.
>
> I can see that at some point, a download of my data from Last.fm
> is probably going to take a day or more.  Then I will have the
> pleasure of getting rid of duplicates.  But, are there things a
> person can do to get over humps in the download?
>
> I'm not really a Python person, I prefer Perl.  But I can probably
> adjust the Python to do what is needed too.
>
> Thanks,
> Gord
>
>
>
>
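
On getting over humps in a long download: one common approach is to
retry a failed page with a growing pause, and to record the last
completed page so that a crash resumes where it stopped instead of
starting again from page one. The following is a rough sketch of that
idea only, not how lastscrape.py actually works. The names
fetch_with_retry, scrape_all, fetch_page, out_path and state_path are
all hypothetical, and fetch_page stands in for whatever function
downloads and parses one page into a list of text rows.

```python
import json
import os
import time


def fetch_with_retry(fetch_page, page, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch_page(page), pausing 1x, 2x, 4x... base_delay between tries."""
    for attempt in range(attempts):
        try:
            return fetch_page(page)
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries, let the caller see the error
            sleep(base_delay * 2 ** attempt)


def scrape_all(fetch_page, last_page, out_path, state_path, sleep=time.sleep):
    """Fetch pages up to last_page, resuming from the page saved in state_path."""
    start = 1
    if os.path.exists(state_path):
        with open(state_path) as fh:
            start = json.load(fh)["next_page"]
    for page in range(start, last_page + 1):
        rows = fetch_with_retry(fetch_page, page, sleep=sleep)
        # Append this page's rows so earlier pages survive a later crash.
        with open(out_path, "a") as out:
            out.writelines(row + "\n" for row in rows)
        # Record progress only after the page has been written out.
        with open(state_path, "w") as fh:
            json.dump({"next_page": page + 1}, fh)
```

If the script dies between writing a page and saving the checkpoint,
that one page is fetched again on the next run, so a final pass to
remove duplicates is still worth keeping.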



