Re: Problem during using of GNU wget
From: Darshit Shah
Subject: Re: Problem during using of GNU wget
Date: Sat, 16 Nov 2024 00:01:25 +0100
Hi Pawel,
On Tue, Jul 16, 2024, at 08:52, Pawel Wojciech Glod wrote:
> Hello GNU support,
>
> I am an employee of CERN and one of my tasks is web scraping the
> internal pages of our organisation. To do this, I use wget to download
> the entire directory structure of the website along with the HTML files.
>
> I have a problem with websites whose top-level domain (TLD) is ".cern".
> An example page is https://openlab.cern/
> According to our documentation, it does not require cookies or a
> session token. Unfortunately, a single HTML file is downloaded
> containing only the code of the home page. Are you able to diagnose why
> this is happening? Perhaps the website has additional security features
> or it requires a session token or cookies.
How are you invoking Wget? When I run:
`wget -r https://openlab.cern/`
I see multiple pages and their associated files being downloaded. Could you
please give a more detailed description of the problem, along with the command
you used and the full output?
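One possible way to collect that information is sketched below. The log file name `wget-debug.log` and the helper function `capture_wget_log` are made up for illustration, and `--debug` only produces extra detail if your Wget build was compiled with debug support.

```shell
# Hypothetical log file name; pick whatever suits you.
LOG=wget-debug.log

# Hypothetical helper: run a recursive download and write all of Wget's
# messages (including debug output, if available) to $LOG.
capture_wget_log() {
    wget --recursive --debug --output-file="$LOG" "$1"
}

# Example call (not run here):
#   capture_wget_log https://openlab.cern/
```

The resulting log, together with the exact command line, is what should go into the follow-up mail.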
>
> My second question concerns the issue of when we need to download
> cookies and the session token. We have our own tool for this, but how
> do we take into account redirecting to another authentication page
> using wget so that after authentication, the wget command works
> correctly? What url address should be included?
>
I'm not sure I understand your question. If your authentication generates a
session token that is stored as a cookie, you can save it in a standard cookie
text file (both Firefox and Chrome have ways to do this) and then import that
file into Wget using the `--load-cookies` option.
Is there something else you need?
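As a rough sketch of that workflow: the block below writes a cookie file in the Netscape format that `--load-cookies` reads, then defines a helper that passes it to Wget. The host `intranet.example`, the cookie name `session_token`, its value, the file name `cookies.txt`, and the function `fetch_with_cookies` are all placeholders, not details of your setup. In practice the file would come from your browser or from your own tool.

```shell
# Hypothetical cookie file name.
COOKIE_FILE=cookies.txt

# One cookie per line, seven tab-separated fields:
# domain, include-subdomains flag, path, secure flag, expiry (Unix time),
# cookie name, cookie value. All values here are placeholders.
{
    printf '# Netscape HTTP Cookie File\n'
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
        'intranet.example' 'FALSE' '/' 'TRUE' '2147483647' \
        'session_token' 'PLACEHOLDER_VALUE'
} > "$COOKIE_FILE"

# Hypothetical helper: recursive download that sends the stored cookies.
fetch_with_cookies() {
    wget --recursive --load-cookies="$COOKIE_FILE" "$1"
}

# Example call (not run here):
#   fetch_with_cookies https://intranet.example/
```

Wget follows HTTP redirects by default, so the URL to pass would normally be the page you actually want rather than the login page. Whether the cookie alone is enough for your authentication system is something only a test run can confirm.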
> I would appreciate a prompt reply.
>
> Best regards, Pawel Glod
> CERN, BE-CSS