[bug #60287] Windows recursive download escapes utf8 URLs twice

bug-wget

From:	Cameron Tacklind
Subject:	[bug #60287] Windows recursive download escapes utf8 URLs twice
Date:	Sun, 28 Mar 2021 23:28:21 -0400 (EDT)
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36

Follow-up Comment #11, bug #60287 (project wget):

Except that a URI is, by design, always in a restricted character set,
precisely so that these encoding issues go away.

I hear the point about writing the file to disk and making sure the path used
on disk can be reliably generated from an arbitrary encoding scheme. But that
should happen independently of concatenating the relative URI with the base
URI, both of which always consist of a restricted set of octets that is a
subset of printable ASCII.

So, while I agree that a conversion to the local charset needs to happen, that
should *only* happen with regard to the file system file name, which is
independent from the request line sent to the HTTP server.
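
The separation argued for above can be sketched in a few lines of Python. This
is only an illustration: the base URL and `href` are made-up values, and
Python's `urllib` stands in for wget's own URL handling, which is not shown in
this message.

```python
from urllib.parse import urljoin, urlsplit, unquote

# Hypothetical values, for illustration only.
base = "http://example.com/docs/index.html"
href = "caf%C3%A9/men%C3%BC.html"  # percent-encoded UTF-8, pure ASCII

# Step 1: resolve the relative reference against the base.
# Both inputs are ASCII, so no charset conversion is involved.
request_url = urljoin(base, href)
assert request_url.isascii()

# Step 2 (separate concern): derive a local file name.
# Only here are the escaped octets decoded into characters.
local_path = unquote(urlsplit(request_url).path, encoding="utf-8")

print(request_url)
print(local_path)
```

The string sent in the request line is the result of step 1; whatever
conversion the file system needs is applied only to the result of step 2.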

The 404 is *exactly* the problem I think is a bug. The downloaded HTML file
has embedded <a> tags with `href` attributes that are *never* outside of the
printable ascii range.

This 404 happens, as far as I can tell, because wget *assumes* the local
character set matters here, instead of doing what the HTML/HTTP standards
specify, as far as I understand them: no character encoding translation at
all.
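
As a rough illustration of how a charset translation can corrupt an
already-escaped URL, here is a Python sketch. It shows the general failure
mode, not a trace of wget's actual code path; the `href` value and the choice
of `cp1252` as the local charset are assumptions.

```python
from urllib.parse import quote, unquote_to_bytes

# Hypothetical href as it appears in the HTML: already escaped, pure ASCII.
href = "caf%C3%A9"

# Failure mode A: unescape, misread the UTF-8 octets in an assumed
# local charset (cp1252 here), then escape again.
raw = unquote_to_bytes(href)      # the original UTF-8 octets
misread = raw.decode("cp1252")    # two wrong characters instead of one
mangled = quote(misread)          # re-escaped as UTF-8 of the wrong text

# Failure mode B: escape the already-escaped string once more,
# so every '%' turns into '%25'.
double_escaped = quote(href)

print(href)
print(mangled)
print(double_escaped)
```

Neither `mangled` nor `double_escaped` names the resource the server
published, so a request for either would be expected to fail; passing `href`
through untouched avoids both.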

    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60287>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/

