Re: [Bug-wget] Implementation suggestion for JavaScript execution
From: Giuseppe Scrivano
Subject: Re: [Bug-wget] Implementation suggestion for JavaScript execution
Date: Mon, 26 May 2014 15:20:19 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
Andrew Pennebaker <address@hidden> writes:
> Tumblr and other websites delay loading some of their content (images)
> through JavaScript events like *onload*. It would be nice if wget supported
> a *-j* flag for executing this, in order to access these dynamically loaded
> resources. Execution may add some time to downloads, but for users that
> really want the content, having the option is better than not.
>
> Possible solutions:
>
> The HtmlUnit <http://htmlunit.sourceforge.net/> library can already do
> this, but it's written in Java and I believe wget is written in C?
Correct, wget is written in C.
> Another consideration for attaching JS execution to wget is Node
> <http://nodejs.org/>, a C++ implementation, though we probably only want
> the core, the V8 <https://code.google.com/p/v8/> JavaScript engine itself.
>
> Other possibilities include SpiderMonkey
> <http://en.wikipedia.org/wiki/SpiderMonkey_(JavaScript_engine)>, the JS
> engine for Firefox, or JavaScriptCore
> <http://www.webkit.org/projects/javascript/>, Safari's JS engine.
How would you retrieve these links programmatically? By triggering
"onload" or other events? I wonder how many of these cases we could
cover simply by parsing patterns like document.location='foo', without
involving any JS engine.
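
To make the engine-free idea concrete, here is a minimal sketch, written in Python for brevity rather than in wget's C. The names LOCATION_ASSIGNMENT and extract_js_locations are hypothetical and not part of wget. It assumes we only care about string literals assigned directly to a location property, so anything computed at run time is missed.

```python
import re
from urllib.parse import urljoin

# Hypothetical heuristic: match literal assignments such as
#   document.location='foo'
#   window.location.href = "/bar.html"
# Comparisons (location == 'x') and computed values are not matched.
LOCATION_ASSIGNMENT = re.compile(
    r"""\b(?:(?:window|document|self|top)\.)?location(?:\.href)?\s*=\s*(["'])(?P<url>[^"'\\]+)\1"""
)


def extract_js_locations(html, base_url):
    """Return absolute URLs for literal location assignments found in html."""
    return [
        urljoin(base_url, match.group("url"))
        for match in LOCATION_ASSIGNMENT.finditer(html)
    ]


if __name__ == "__main__":
    page = """<body onload="document.location='foo'">"""
    for url in extract_js_locations(page, "http://example.com/dir/page.html"):
        print(url)
```

A pattern like this would not recover the lazily loaded images mentioned above, since those URLs are typically built by script code. It only covers the simple redirect case.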
Giuseppe