Re: [Bug-wget] persistence with multiple hostnames


From: Tony Lewis
Subject: Re: [Bug-wget] persistence with multiple hostnames
Date: Tue, 17 Apr 2012 11:15:00 -0700

It seems to me that wget should not reuse a connection from one host to
access another (even if those hosts share an IP address). I suspect the
current behavior is accidental rather than intentional.

Tony
-----Original Message-----
From: address@hidden
[mailto:address@hidden On Behalf Of Ryan Rawdon
Sent: Tuesday, April 17, 2012 9:16 AM
To: address@hidden
Subject: [Bug-wget] persistence with multiple hostnames

I was speaking with Micah on IRC today regarding a behavior in wget which
differs from that of curl and most or all browsers.

Generally, HTTP clients do not use a given persistent connection for more
than one hostname. This is why tricks such as spreading static content
across multiple name-based vhosts on the same IP address work: they
encourage more parallelization in the fetching of a page's static elements.

However, wget appears to use persistent connections for multiple hostnames
(see below).  In the case below, a connection is opened to soldat.pl which
302s to a new hostname.  Wget resolves the new hostname and selects the same
address, and decides to reuse the existing connection to this IP address.

The RFC does not appear to address the re-use of persistent connections with
regard to hostname, so the behavior is permissible (and fine from a protocol
standpoint since Host is specified with each request).
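
The difference between the two reuse policies can be sketched as follows.
This is only an illustration, not wget's actual code: PersistentConn,
can_reuse_by_address and can_reuse_by_hostname are hypothetical names, and
the resolved addresses are passed in by hand rather than looked up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PersistentConn:
    """Hypothetical record of an open keep-alive connection."""
    host: str      # hostname the connection was originally opened for
    port: int
    peer_ip: str   # address the socket is actually connected to


def can_reuse_by_address(conn, host, port, resolved_ips):
    """Policy matching the observed behavior: reuse the connection if the
    new hostname resolves to the address the socket already talks to."""
    if port != conn.port:
        return False
    if host.lower() == conn.host.lower():
        return True
    return conn.peer_ip in resolved_ips


def can_reuse_by_hostname(conn, host, port):
    """Stricter policy: reuse only for the same hostname and port."""
    return port == conn.port and host.lower() == conn.host.lower()


# Values taken from the log in this message.
conn = PersistentConn("soldat.pl", 80, "2607:fd50:1:91b0::50:1d8")
ips = ["2607:fd50:1:91b0::50:1d8", "192.168.152.5"]

print(can_reuse_by_address(conn, "soldat.thd.vg", 80, ips))  # True
print(can_reuse_by_hostname(conn, "soldat.thd.vg", 80))      # False
```

Under the stricter policy the redirect to soldat.thd.vg would open a new
connection, which is what curl and browsers are described as doing above.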

The problem stems from the use of privilege separation between virtual
hosts. In the case below, before I fixed it today, wget was receiving 403 on
the second request because the user that owned this fd on the server side
did not have privileges to access the content for the soldat.thd.vg vhost.

This behavior is probably reproducible with any page fetched with wget that
302s between two privilege-separated vhosts on the same server, or when
scraping a page with elements from two or more hosts on the same IP address.

This behavior appears to be permissible based on the RFC, so this is more a
discussion of whether this is intended behavior in wget, a bug, or an
opportunity to behave more like curl and everyday GUI browsers.

Micah took a quick look over the source (or was previously familiar with
it), and it sounds like there may be checks in place which should have
prevented this; however, I did look to confirm.

nova-dhcp-host111:tmp ryan$ wget http://soldat.pl
--2012-04-17 11:57:25--  http://soldat.pl/
Resolving soldat.pl (soldat.pl)... 2607:fd50:1:91b0::50:1d8, 192.168.152.5
Connecting to soldat.pl (soldat.pl)|2607:fd50:1:91b0::50:1d8|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/ [following]
--2012-04-17 11:57:25--  http://soldat.thd.vg/
Resolving soldat.thd.vg (soldat.thd.vg)... 2607:fd50:1:91b0::50:1d8, 192.168.152.5
Reusing existing connection to soldat.pl:80.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/en/ [following]
--2012-04-17 11:57:26--  http://soldat.thd.vg/en/
Reusing existing connection to soldat.pl:80.
HTTP request sent, awaiting response... 200 OK
Cookie coming from soldat.thd.vg attempted to set domain to soldat.thd.vg
Cookie coming from soldat.thd.vg attempted to set domain to soldat.thd.vg
Cookie coming from soldat.thd.vg attempted to set domain to soldat.thd.vg
Length: unspecified [text/html]

Here is the original report from a user, which shows the 403:

address@hidden:~$ wget www.soldat.pl
--2012-04-17 11:50:29--  http://www.soldat.pl/
Resolving www.soldat.pl... 67.23.118.186, 2607:fd50:1:91b0::50:1d8
Connecting to www.soldat.pl|67.23.118.186|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/ [following]
--2012-04-17 11:50:29--  http://soldat.thd.vg/
Resolving soldat.thd.vg... 67.23.118.186, 2607:fd50:1:91b0::50:1d8
Reusing existing connection to www.soldat.pl:80.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-17 11:50:29 ERROR 403: Forbidden.

address@hidden:~$ wget -6 www.soldat.pl
--2012-04-17 11:50:39--  http://www.soldat.pl/
Resolving www.soldat.pl... 2607:fd50:1:91b0::50:1d8
Connecting to www.soldat.pl|2607:fd50:1:91b0::50:1d8|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/ [following]
--2012-04-17 11:50:39--  http://soldat.thd.vg/
Resolving soldat.thd.vg... 2607:fd50:1:91b0::50:1d8
Reusing existing connection to www.soldat.pl:80.
HTTP request sent, awaiting response... 403 Forbidden
2012-04-17 11:50:39 ERROR 403: Forbidden.
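
For anyone who wants to check their own wget output for this pattern, a
small script along these lines may help. It is a rough sketch that assumes
the log format shown above (a "--timestamp--  URL" line per request and a
"Reusing existing connection to host:port." line); cross_host_reuses is a
hypothetical helper name, and the pattern tolerates mail-wrapped lines.

```python
import re
from urllib.parse import urlsplit

# Either a request header line ("--date time--  URL") or a reuse notice.
EVENT = re.compile(
    r"--\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}--\s+(?P<url>\S+)"
    r"|Reusing\s+existing\s+connection\s+to\s+(?P<host>\S+):(?P<port>\d+)\."
)


def cross_host_reuses(log_text):
    """Return (requested_host, connection_host) pairs where a connection
    opened for one hostname was reused for a different hostname."""
    found = []
    current_host = None
    for m in EVENT.finditer(log_text):
        if m.group("url"):
            # Remember which host the current request is addressed to.
            current_host = urlsplit(m.group("url")).hostname
        elif current_host and m.group("host").lower() != current_host:
            found.append((current_host, m.group("host")))
    return found


sample = """\
--2012-04-17 11:50:29--  http://www.soldat.pl/ Resolving www.soldat.pl...
67.23.118.186, 2607:fd50:1:91b0::50:1d8 Connecting to
www.soldat.pl|67.23.118.186|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://soldat.thd.vg/ [following]
--2012-04-17 11:50:29--  http://soldat.thd.vg/ Resolving soldat.thd.vg...
67.23.118.186, 2607:fd50:1:91b0::50:1d8 Reusing existing connection to
www.soldat.pl:80.
HTTP request sent, awaiting response... 403 Forbidden
"""

print(cross_host_reuses(sample))  # [('soldat.thd.vg', 'www.soldat.pl')]
```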
