[bug #60287] Windows recursive download escapes utf8 URLs twice
From: Eli Zaretskii
Subject: [bug #60287] Windows recursive download escapes utf8 URLs twice
Date: Sat, 27 Mar 2021 02:43:56 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 5.1; rv:52.0) Gecko/20100101 Firefox/52.0
Follow-up Comment #8, bug #60287 (project wget):
> Is this because wget first downloads the html file and then reads the
> contents off disk
No. It's because Wget downloads the pages you told it to, and saves them as
disk files. Any links in the downloaded pages that lead to other pages
produce additional disk files (e.g., if you told Wget to download
recursively).
IOW, the file-name encoding issue happens when a Web page needs to be saved to
a file for some reason.
> If the bytes were downloaded with the correct encoding, and written to the
> file system with the correct encoding, I would expect it to be able to parse
> the file with the correct encoding.
What is the "correct encoding", though?
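To illustrate why that question has no single answer, here is a minimal Python sketch (not Wget's actual code, which is written in C). The link `caf%C3%A9.html` is a made-up example, and the two codecs are assumptions chosen only for illustration: the same decoded bytes yield different file names depending on which encoding is assumed.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical percent-encoded link; not taken from the bug report.
href = "caf%C3%A9.html"

# Percent-decoding gives raw bytes; it says nothing about their encoding.
raw = unquote_to_bytes(href)

# The same bytes read under two different assumed encodings.
as_utf8 = raw.decode("utf-8")     # bytes taken as UTF-8
as_cp1252 = raw.decode("cp1252")  # bytes taken as a Windows ANSI codepage

print(raw)
print(as_utf8)
print(as_cp1252)
```

Both results are valid decodings of the same bytes; which one is "correct" depends on what the server, the page, and the local file system each assume.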
> the file `wget-test.html` has no non-ascii characters in it
Of course, it doesn't: the non-ASCII characters appear when we decode the
hex-encoded bytes.
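As a rough sketch of that point, and of what "escaping twice" in the bug title would look like, consider the following Python snippet. It uses Python's standard `urllib.parse` rather than anything from Wget, and the link `caf%C3%A9.html` is a hypothetical example, not the contents of `wget-test.html`.

```python
from urllib.parse import quote, unquote

# Hypothetical link as it might appear in an HTML file: pure ASCII.
href = "caf%C3%A9.html"
is_ascii = href.isascii()

# Decoding the hex-encoded bytes (as UTF-8) is where the
# non-ASCII character first appears.
decoded = unquote(href)

# Escaping the decoded name once restores the original link.
escaped_once = quote(decoded)

# Escaping an already-escaped string also escapes the '%' signs,
# which is the double-escaping pattern.
escaped_twice = quote(escaped_once)

print(is_ascii, decoded, escaped_once, escaped_twice)
```

The doubly escaped form no longer refers to the same resource, because `%25C3%25A9` decodes to the literal text `%C3%A9` rather than to the original character.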
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?60287>