Re: [Bug-wget] Incorrect handling of Cyrillic characters in http request


From: Tim Rühsen
Subject: Re: [Bug-wget] Incorrect handling of Cyrillic characters in http request - any workaround?
Date: Tue, 31 Mar 2015 22:50:18 +0200
User-agent: KMail/4.14.2 (Linux/3.16.0-4-amd64; KDE/4.14.2; x86_64; ; )

Hi Stephen,

On Tuesday, 31 March 2015, 18:11:58, Stephen Wells wrote:
> Dear all - I am currently trying to use wget to obtain mp3 files from the
> Google Translate TTS system. In principle this can be done using:
> 
> wget -U Mozilla -O "${string}.mp3" "http://translate.google.com/translate_tts?tl=TL&q=${string}";
> 
> where TL is a two-letter language code (en, fr, de and so on).
> 
> However I am meeting a serious error when I try to send Russian strings
> (tl=ru) in Cyrillic characters. I'm working in a UTF-8 environment (under
> Cygwin) and the file system will display the Cyrillic strings no problem.
> If I provide a command like this:
> 
> http://translate.google.com/translate_tts?tl=ru&q=мазать
> 
> wget incorrectly processes the Cyrillic characters _before_ sending the
> http request, so what it actually requests is:
> 
> http://translate.google.com/translate_tts?tl=ru&q=%D0%BC%D0%B0%D0%B7%D0%B0%D1%82%D1%8C

This seems to be the correct behavior for a web client.
The URL in the GET request is transmitted UTF-8 encoded, and percent-escaping
is applied to every byte above 127 (leaving control characters aside here).
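
As a small illustration of that encoding step (a sketch using Python's standard
library, not wget's actual code), the word from the example above can be
percent-encoded from its UTF-8 bytes and decoded again. The variable names are
made up for this example.

```python
from urllib.parse import quote, unquote

# The query value from the example URL above.
word = "мазать"

# Percent-encode the UTF-8 bytes of the value. This only illustrates the
# escaping rule; it is not how wget implements it internally.
encoded = quote(word, safe="")
print(encoded)

# Decoding the escapes as UTF-8 recovers the original string, so the
# percent-encoded URL carries the same text the user typed.
decoded = unquote(encoded)
print(decoded == word)
```

Each Cyrillic letter occupies two bytes in UTF-8, which is why every letter
turns into two %XX escapes in the request.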

> This of course produces a string of gibberish in the resulting mp3 file!

This is something different. If you are talking about the file name,
there is --restrict-file-names=nocontrol. Did you give it a try?

> Is there any way to make wget actually send the string it is given, instead
> of mangling it on the way out? This is really blocking me.

From what you write, I am unsure whether you are talking about the resulting
file name or about HTTP URL encoding in a GET request.

Regards, Tim
