
Re: Relative links for wget2


From: Tim Rühsen
Subject: Re: Relative links for wget2
Date: Sat, 21 Aug 2021 19:15:49 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.12.0

Hi Matt,

this works as expected with wget2 built from the latest git master, which reminds me that we urgently need a new release.

If you want to build wget2 from a tarball (which is more hassle-free than building from git master), follow the instructions at https://gitlab.com/gnuwget/wget2/#downloading-and-building-from-tarball. Don't forget to install the prerequisites beforehand.

Feel free to ask here if you run into trouble.

Regards, Tim


On 18.08.21 02:08, Matt Huszagh wrote:
Hello,

I'm trying to archive a single webpage for offline viewing with
wget2. To accomplish this, I'm invoking the following command:

```
wget2 --robots=off --page-requisites --adjust-extension --convert-links=on http://www.ke5fx.com/k22.htm
```

From reading the help menu, it's my understanding that wget2 should
download everything needed to display this page (from page-requisites) and
convert the links to these resources to point to the local copies (with
convert-links).

However, this is not the behavior I observe. For example, the HTML for
several of the images shows up as

```
<i>Click on photos below to enlarge</i>
<hr>
<br clear=all>
<a href="http://www.ke5fx.com/k22/ext_large.jpg"><img src="http://www.ke5fx.com/k22/ext_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/int_large.jpg"><img src="http://www.ke5fx.com/k22/int_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/bfg_large.jpg"><img src="http://www.ke5fx.com/k22/bfg_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<hr>
```
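A quick way to check a saved page for images whose `src` was not converted is to parse the file and list every `src` that still names a host. This is only a minimal sketch using the Python standard library; the class and function names (`SrcCollector`, `unconverted_sources`) and the sample string are illustrative assumptions, not part of wget or wget2.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class SrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


def unconverted_sources(html_text):
    """Return the img src values that still contain a host name."""
    parser = SrcCollector()
    parser.feed(html_text)
    # A converted link such as "k22/ext_sm.jpg" has an empty netloc;
    # "http://host/..." and "//host/..." both have a non-empty one.
    return [src for src in parser.sources if urlsplit(src).netloc]


# Sample markup modelled on the snippets in this message.
sample = (
    '<a href="http://www.ke5fx.com/k22/ext_large.jpg">'
    '<img src="http://www.ke5fx.com/k22/ext_sm.jpg" hspace=30 vspace=30></a>'
    '<img src="k22/int_sm.jpg" hspace=30 vspace=30>'
)
print(unconverted_sources(sample))
```

To run it against the real download, the contents of `www.ke5fx.com/k22.htm.html` could be read and passed in place of `sample`.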

The downloaded directory structure does, however, appear correct:

```
$ tree www.ke5fx.com
www.ke5fx.com
├── k22
│   ├── bfg_sm.jpg
│   ├── compare.png
│   ├── ext_sm.jpg
│   ├── HP_k22_s21_s12.gif
│   ├── int_sm.jpg
│   ├── k22_s11.png
│   └── k22_s21_s12.gif
└── k22.htm.html
```

Moreover, doing the same thing with wget works as I'd expect:

```
wget -e robots=off --page-requisites --adjust-extension --convert-links http://www.ke5fx.com/k22.htm
```

```
<hr>
<br clear=all>
<a href="http://www.ke5fx.com/k22/ext_large.jpg"><img src="k22/ext_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/int_large.jpg"><img src="k22/int_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<a href="http://www.ke5fx.com/k22/bfg_large.jpg"><img src="k22/bfg_sm.jpg" hspace=30 vspace=30></a>
<br clear=all>
<hr>
```
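In the wget output, the `src` of each downloaded thumbnail is a path relative to the saved page, while the `href` targets (the `*_large.jpg` files, which do not appear in the directory tree) keep their absolute URLs. The relative form can be reproduced conceptually as below. This is a sketch of the idea only, not wget's actual implementation, and `relative_link` is a hypothetical helper name.

```python
import posixpath


def relative_link(page_path, resource_path):
    """Return resource_path expressed relative to the directory of page_path."""
    page_dir = posixpath.dirname(page_path)
    return posixpath.relpath(resource_path, start=page_dir)


# Local paths taken from the directory tree shown earlier in this message.
page = "www.ke5fx.com/k22.htm.html"
image = "www.ke5fx.com/k22/ext_sm.jpg"
print(relative_link(page, image))
```

For the paths above this yields `k22/ext_sm.jpg`, which matches the `src` value in the wget output.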

When I attempt the same wget2 command with a Wikipedia page, I get
different results:

```
wget2 --robots=off --page-requisites --adjust-extension --convert-links=on https://en.wikipedia.org/wiki/EPROM
```

```
<div class="thumb tleft"><div class="thumbinner" style="width:252px;"><a href="/wiki/File:EPROM_Intel_C1702A.jpg" class="image"><img alt="" src="//upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/250px-EPROM_Intel_C1702A.jpg" 
decoding="async" width="250" height="130" class="thumbimage" srcset="//upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/375px-EPROM_Intel_C1702A.jpg 1.5x, //upload.wikimedia.org/wikipedia/commons/thumb/3/39/EPROM_Intel_C1702A.jpg/500px-EPROM_Intel_C1702A.jpg 2x" 
data-file-width="1275" data-file-height="665" /></a>  <div class="thumbcaption"><div class="magnify"><a href="/wiki/File:Eprom.jpg" class="internal" title="Enlarge"></a></div>An Intel 1702A EPROM, one of the earliest EPROM types (1971), 
256 by 8 bit. The small quartz window admits UV light for erasure.</div></div></div>
```

Rather than the src pointing to a remote URL or local file, it points to
a nonexistent "//upload.wikimedia.org/...".
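For context, a reference that begins with `//` is a scheme-relative (network-path) reference: it names a host and inherits the scheme of the page it appears in. The sketch below uses Python's `urllib.parse.urljoin` to show how such a `src` resolves against the original page URL and against a local `file://` URL. The local path `/home/user/...` is a made-up example.

```python
from urllib.parse import urljoin

src = ("//upload.wikimedia.org/wikipedia/commons/thumb/3/39/"
       "EPROM_Intel_C1702A.jpg/250px-EPROM_Intel_C1702A.jpg")

# Resolved against the online page, the reference inherits "https".
online_page = "https://en.wikipedia.org/wiki/EPROM"
online = urljoin(online_page, src)
print(online)

# Resolved against a locally saved copy (hypothetical path), it inherits
# "file" instead, so it no longer points at the remote server.
local_page = "file:///home/user/en.wikipedia.org/wiki/EPROM.html"
local = urljoin(local_page, src)
print(local)
```

This suggests why the link works online but looks broken when the saved page is opened from disk: the same string resolves under a different scheme.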

It's worth mentioning that wget doesn't give me the expected behavior
here either. The image files reference remote URLs rather than local
paths.

Am I misusing wget2 somehow? If so, what are the correct flags to
achieve what I want?

Thanks
Matt

