bug-wget

Re: [Bug-wget] Hello again


From: 'Darshit Shah'
Subject: Re: [Bug-wget] Hello again
Date: Thu, 11 Oct 2018 11:34:52 +0200
User-agent: NeoMutt/20180716

* address@hidden <address@hidden> [181009 17:12]:
> 
> Hello Darshit Shah,
> 
> Thank you for your welcome message. I am glad to be part of your project!
> 
> I don't understand the term "javascript engine". AFAIK javascript is code that 
> runs on the browser side, and we have no problem fetching it.
>
Exactly! Javascript is code that is executed on the client side and hence
requires a javascript engine which interprets the code and executes it.
However, Wget does not and will not package a javascript engine in order to run
those scripts. This means that sites where Javascript is used to create
hyperlinks won't work well when scraped through Wget.
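
To illustrate the point, here is a minimal Python sketch. It is not how Wget
itself parses pages; the sample page, the LinkCollector class and the
static_links helper are made up for this example. It shows that a parser
which only reads the HTML markup finds the link written as an <a> tag, but
not the link that a script would add at run time.

```python
from html.parser import HTMLParser

# Hypothetical sample page: one link is plain markup, the other only
# exists after a browser has executed the script.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<script>
  var link = document.createElement("a");
  link.href = "/archive.html";
  link.textContent = "Archive";
  document.body.appendChild(link);
</script>
</body></html>
"""


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html):
    """Return the links visible without executing any script."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


if __name__ == "__main__":
    # Only the static link is found; /archive.html is never seen.
    print(static_links(PAGE))
```

A crawler that never runs the script has no way to learn about
/archive.html, so that page would be missing from a static copy.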
> 
> There might be "ajax" issues with sites that rely on it. Ajax is used heavily by 
> programmers, and they will have to take some action on their site to 
> incorporate the engine.

Similarly, sites that use Javascript to show menus or create AJAX requests are
usually not amenable to being scraped as a static HTML page.
> 
> POST requests for comments and mail will need to be taken care of so they will 
> work on a static site. One solution is to use a hosted supplier that will carry 
> out the task and deliver spam removal as well.
> I think I will be able to write a howto document on that.
> 
> Michael
> 
> -----Original Message-----
> From: Darshit Shah <address@hidden> 
> Sent: Tuesday, 9 October, 2018 2:52 PM
> To: address@hidden
> Cc: address@hidden
> Subject: Re: [Bug-wget] Hello again
> 
> Hi Michael,
> 
> Nice to hear from you again. I vaguely remember a mention of someone who wanted
> to work on this feature. When deciding to make this work, please remember that
> any of this can only work if the site does not rely on Javascript; which given
> Wordpress is a difficult thing. The reason for this is that we do _not_ intend
> to ship a javascript engine along with Wget2. It is too large, unwieldy and too
> much of a maintenance nightmare. However, if the site can work without
> Javascript, then I would assume that Wget2 can already handle making a static
> copy. If it can't handle something, please let us know / file a bug report
> about it.
> 
> Of course, I welcome you to work on Wget2 as you see fit. And we would love to
> look at any contributions you can make. We will also try and help you out as
> much as possible when dealing with the codebase.
> 
> About the dev setup, I only use vim and gdb to work with Wget. As Tim has
> already mentioned, he uses Netbeans and might be able to help you out.
> 
> You also mentioned something about the lib/ directory. That is an
> auto-generated dir with compatibility libs that you don't need to care about.
> All the code for Wget2 is in src/ and the code for the library is in libwget/.
> Those are the two main directories you need to care about. And of course tests/
> for the tests.
> 
> * address@hidden <address@hidden> [181008 21:22]:
> > 
> > Hello again,
> > 
> > My name is Michael. I approached you about a year ago.
> > 
> > I am interested in making wget2 a tool that can convert content management
> > systems (like WordPress) output to HTML. This actually limits the content
> > management system to generate the website every time it is changed, and the
> > presentation is done using the HTTP server only.
> > 
> > This is an important feature as it prevents a security risk: hackers
> > penetrating the site and installing viruses or stealing data.
> > It also allows the website to be delivered much faster as no PHP code needs
> > to run in order to deliver the content. Google already announced that site
> > download speed is a factor in its SEO evaluation.
> > 
> > I will be able to work for 3 hours every week on the project. I do need some
> > guidance from you.
> > 
> > I have started to configure Netbeans IDE as using a debugger can help me
> > delve into the code much faster. There are some issues with Netbeans. Do
> > you use an IDE? Which one?
> > 
> > Best regards,
> > 
> > Michael
> > 
> > 
> > 
> > 
> 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
> 
> 

-- 
Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
