bug-wget

[bug #60287] Windows recursive download escapes utf8 URLs twice


From: Eli Zaretskii
Subject: [bug #60287] Windows recursive download escapes utf8 URLs twice
Date: Fri, 26 Mar 2021 15:48:56 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0

Follow-up Comment #6, bug #60287 (project wget):

> Isn't the encoding specified in the HTTP header?

Not the local one.  (And not every page you download has these headers, so the
remote one isn't always known, either.)

You must specify the local encoding, especially on MS-Windows, because Windows
filesystems aren't agnostic about the encoding of file names: they don't allow
arbitrary byte sequences to be part of a file name.  The file names are
written on disk in UTF-16, and so the file I/O APIs on Windows must convert
file names to UTF-16, and for that they need to know the original encoding.
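
To illustrate the point about conversion, here is a minimal Python sketch (not
Wget's actual code).  The byte string and the helper name to_utf16 are
assumptions made up for the example; the sketch only shows that the same bytes
produce different UTF-16 file names depending on which source encoding is
assumed.

```python
# Hypothetical file name as raw bytes: "é.html" encoded in UTF-8.
raw_name = b"\xc3\xa9.html"


def to_utf16(name_bytes: bytes, assumed_encoding: str) -> bytes:
    """Decode with the assumed encoding, then re-encode as UTF-16LE,
    roughly what a conversion layer has to do before storing the name."""
    return name_bytes.decode(assumed_encoding).encode("utf-16-le")


# Correct assumption: the bytes are UTF-8, so the name is 6 characters long.
as_utf8 = to_utf16(raw_name, "utf-8")

# Wrong assumption: the bytes are treated as Windows-1252, so the two-byte
# UTF-8 sequence becomes two separate characters and the name is mangled.
as_cp1252 = to_utf16(raw_name, "cp1252")

# ascii() keeps the output printable on any console code page.
print(ascii(raw_name.decode("utf-8")))
print(ascii(raw_name.decode("cp1252")))
print(as_utf8 != as_cp1252)
```

Without knowing the source encoding there is no way to pick between the two
results, which is why the local encoding has to be stated explicitly.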

> It feels like a bug because my browser handles the links just fine, without
the charset specified by the server.

The browser just shows the page; it doesn't save it to a disk file.  So the
encoding of the page's name isn't an issue for the browser, as it is for
Wget.


    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?60287>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/



