[Bug-wget] [bug #47701] wget 1.17.1 fails to convert from percent encoding to unicode correctly (mingw32)

From: anonymous
Subject: [Bug-wget] [bug #47701] wget 1.17.1 fails to convert from percent encoding to unicode correctly (mingw32)
Date: Fri, 15 Apr 2016 04:31:09 +0000
User-agent: Mozilla/5.0 (Windows NT 10.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/49.0.2623.112 Safari/537.36


                 Summary: wget 1.17.1 fails to convert from percent encoding to unicode correctly (mingw32)
                 Project: GNU Wget
            Submitted by: None
            Submitted on: Fri 15 Apr 2016 04:31:08 AM UTC
                Category: Program Logic
                Severity: 3 - Normal
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
         Originator Name: Anonymous Coward
        Originator Email: 
             Open/Closed: Open
         Discussion Lock: Any
                 Release: 1.17.1
        Operating System: Microsoft Windows
         Reproducibility: None
           Fixed Release: None
         Planned Release: None
              Regression: None
           Work Required: None
          Patch Included: None



The version of GNU Wget you were using

1.17.1 (mingw32)

How you invoked wget

wget -d -r -np -nH --cut-dirs=4

(in case this is some sort of mingw32-compiler bug)
I obtained my binary copy of mingw32 wget from
https://eternallybored.org/misc/wget/ as recommended by

What you expected wget to do

Recursively download the requested open directory.

What wget did (include output messages).

It didn't download the directory.

I tried running with debug on. From looking at the
output, it seems wget converted the percent-encoded
URL to w32's native Unicode encoding incorrectly.

(full copy (input and output) of my run of wget is
included as an attachment named "wget-output.txt".)

Specifically, the output includes this line (exactly):

(ASCII) -> 'https://leoandpeto.com/Music/Peto/Non/o-z/RC6yksopp - The
Understanding/' (UTF-8)

wget changed the word from "R%C3%B6yksopp" to "RC6yksopp"
with no percent signs.

It seems to be stripping the high (most significant)
bit off of each of the bytes %C3 (11000011) and
%B6 (10110110), and so converting them into the
7-bit ASCII characters that remain:

| HEX | BIN      | ASCII |
| C3  | 11000011 | --    |
| 43  |  1000011 | C     |
| B6  | 10110110 | --    |
| 36  |  0110110 | 6     |
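
To make the hypothesis above easy to check, here is a minimal Python sketch that reproduces the same result. It is not wget's code and says nothing about where in wget the problem actually lies. It only shows that decoding the percent escapes and then masking each byte down to 7 bits gives exactly the string from the debug output. The variable names are my own.

```python
from urllib.parse import unquote_to_bytes

encoded = "R%C3%B6yksopp"

# Turn the percent escapes back into raw bytes: b'R\xc3\xb6yksopp'
raw = unquote_to_bytes(encoded)

# Expected behaviour: treat the bytes as UTF-8.
correct = raw.decode("utf-8")

# Suspected behaviour (an assumption, not confirmed from wget's source):
# the high bit of every byte is dropped, so 0xC3 -> 0x43 and 0xB6 -> 0x36.
stripped = bytes(b & 0x7F for b in raw).decode("ascii")

print(correct)
print(stripped)
```

The second printed string is "RC6yksopp", which matches the debug line quoted above.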

Also, I'm not 100% sure that this isn't a duplicate of
but I figured it's best to let you developers decide
rather than failing to file a bug-report.

Thank you for making/working on wget, and


File Attachments:

Date: Fri 15 Apr 2016 04:31:08 AM UTC  Name: wget-output.txt  Size: 5kB   By:



