bug-wget

Re: wget: unable to resolve host address


From: Tim Rühsen
Subject: Re: wget: unable to resolve host address
Date: Fri, 18 Feb 2022 13:35:19 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.5.1

On 16.02.22 21:04, Seymour J Metz wrote:
> Given that RFCs 3490-3492 came out in 2003 and 5890-5895 came out in 2010, I
> would have expected IDNA support by now. Does anybody know for sure?

This issue has nothing to do with IDN support.

It is about the fact that the input file uses a charset that is not compatible with UTF-8 or ASCII, namely UTF-16 [1].

UTF-16 uses 2 or 4 bytes per character, so the file needs to be converted to UTF-8 before wget can read it. The file also starts with a BOM (byte order mark) [2], which needs to be processed.

This does the job:
iconv -f utf-16 -t utf-8 /tmp/url-list.txt > url-list-utf8.txt
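
On a system without iconv (the original report used wget.exe on Windows), the same conversion could be sketched in Python. This is only an illustration, not part of Wget: the function name `convert_utf16_to_utf8` and the file names are assumptions. It relies on Python's `utf-16` codec, which reads the BOM to pick the byte order and drops it from the decoded text.

```python
from pathlib import Path


def convert_utf16_to_utf8(src: str, dst: str) -> None:
    """Re-encode a UTF-16 URL list as UTF-8 (hypothetical helper)."""
    # Work on raw bytes so that line endings are passed through unchanged.
    raw = Path(src).read_bytes()
    # The "utf-16" codec uses a leading BOM to select the byte order and
    # removes it; without a BOM it assumes the platform's native order.
    text = raw.decode("utf-16")
    # Plain "utf-8" writes no BOM, so the first URL starts at byte 0.
    Path(dst).write_bytes(text.encode("utf-8"))
```

Calling `convert_utf16_to_utf8("url-list.txt", "url-list-utf8.txt")` and then passing the new file to `wget -i` should be equivalent to the iconv command above.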

Just a small glimpse over to Wget2 :-)
Wget2 understands `--input-encoding=utf-16`, BUT it currently doesn't handle the BOM. This is easy to implement as the code already exists to deal with HTML files encoded as UTF-16 with or without BOM.
I created https://gitlab.com/gnuwget/wget2/-/issues/586 for this.
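
To illustrate what "handling the BOM" means here, the following is a minimal Python sketch and not Wget2's actual code; the function name `split_utf16_bom` is hypothetical. It checks the first two bytes of the input, derives the byte order from them, and returns the remaining bytes without the BOM.

```python
import codecs


def split_utf16_bom(data: bytes):
    """Return (codec_name, payload) for UTF-16 input (hypothetical helper).

    codec_name is None when no UTF-16 BOM is present; the caller then has
    to fall back to an explicitly given encoding.
    """
    # FF FE marks little-endian UTF-16. Caveat: a UTF-32 LE BOM
    # (FF FE 00 00) starts with the same two bytes; this sketch ignores that.
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le", data[2:]
    # FE FF marks big-endian UTF-16.
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be", data[2:]
    return None, data
```

The returned payload can then be decoded with the returned codec name, so the BOM never ends up as part of the first URL.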

Regards, Tim

[1] https://en.wikipedia.org/wiki/UTF-16
[2] https://en.wikipedia.org/wiki/Byte_order_mark


________________________________________
From: Bug-wget <bug-wget-bounces+smetz3=gmu.edu@gnu.org> on behalf of pythonomorpha@gmail.com <pythonomorpha@gmail.com>
Sent: Tuesday, February 8, 2022 1:26 PM
To: bug-wget@gnu.org
Subject: wget: unable to resolve host address

Hello,

I am trying to download from a list of files (jpeg images). The website
utilizes Cyrillic in its URL. I get the following error message: wget:
unable to resolve host address 'xn--h-xubc'

I've checked the links manually and they do work.

I am enclosing a shortened version of the file list.

I've tried different commands to no avail:

wget.exe -i C:\dl_files\url-list.txt --secure-protocol=auto --remote-encoding=Windows-1251 -nc -c -P C:\dl_files\

I've used Windows-1251 as I did not see a list of encoding names in the
manual
https://www.gnu.org/software/wget/manual/wget.html#Wgetrc-Commands

wget.exe -i C:\dl_files\url-list.txt --secure-protocol=auto -nc -c -P C:\dl_files\



Apparently the problem is caused by Cyrillic characters. I have an inkling that
I am not using the correct options for the program.

I would appreciate it if you gave me a hint on how to solve the problem.



Regards,

Max





