bug-wget

From: Stephen Wells
Subject: [Bug-wget] Incorrect handling of Cyrillic characters in http request - any workaround?
Date: Tue, 31 Mar 2015 18:11:58 +0100

Dear all - I am currently trying to use wget to obtain mp3 files from the
Google Translate TTS system. In principle this can be done using:

wget -U Mozilla -O "${string}.mp3" "http://translate.google.com/translate_tts?tl=TL&q=${string}"

where TL is a two-letter language code (en, fr, de and so on).

However, I run into a serious problem when I try to send Russian strings
(tl=ru) in Cyrillic characters. I'm working in a UTF-8 environment (under
Cygwin), and the file system displays the Cyrillic strings without any
problem. If I give wget a URL like this:

http://translate.google.com/translate_tts?tl=ru&q=мазать

wget rewrites the Cyrillic characters as percent-escapes _before_ sending
the HTTP request, so what it actually requests is:


http://translate.google.com/translate_tts?tl=ru&q=%D0%BC%D0%B0%D0%B7%D0%B0%D1%82%D1%8C

This of course produces a string of gibberish in the resulting mp3 file!
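
For reference, the escaped query value can be compared with the original
word. The sketch below is a minimal check in Python, assuming standard UTF-8
percent-encoding as implemented by urllib.parse; the variable names are
illustrative and nothing here is part of wget itself.

```python
from urllib.parse import quote, unquote

word = "мазать"
escaped = "%D0%BC%D0%B0%D0%B7%D0%B0%D1%82%D1%8C"  # query value from the request above

# quote() percent-encodes the UTF-8 bytes of each non-ASCII character.
encoded = quote(word, safe="")

# unquote() reverses this, decoding the bytes as UTF-8 by default.
decoded = unquote(escaped)

print(encoded == escaped)
print(decoded == word)
```

Both comparisons print True: the escaped value is the UTF-8 percent-encoding
of the same six Cyrillic letters, two bytes per letter.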

Is there any way to make wget actually send the string it is given, instead
of mangling it on the way out? This is really blocking me.

Cheers,
Stephen

