Re: [Bug-wget] bad filenames (again)


From: Eli Zaretskii
Subject: Re: [Bug-wget] bad filenames (again)
Date: Tue, 18 Aug 2015 17:45:13 +0300

> Date: Tue, 18 Aug 2015 12:55:50 +0200
> From: "Andries E. Brouwer" <address@hidden>
> Cc: address@hidden, "Andries E. Brouwer" <address@hidden>,
>         Eli Zaretskii <address@hidden>
> 
> The point is: it is the user's choice to load a font. (Or to set a locale.)

Most users never change a locale, unless they are trying something
special, precisely because their file names will display as mojibake.
So wget should IMO by default cater to this use case, and allow saving
the bytes verbatim as an option.

> For historical reasons a single directory can have files with names
> in several character sets.

Again, this is a rare situation.  We shouldn't punish the majority on
behalf of such rare use cases.

> All this is about the local situation. One cannot know "the character set"
> of a filename because that concept does not exist in Unix.

Of course, it exists.  The _filesystem_ doesn't know it, but users do.

> About the remote situation even less is known.

Assuming UTF-8 will go a long way towards resolving this.  When this
is not so, we have the --remote-encoding switch.

> It would be terrible if wget decided to use obscure heuristics to
> invent a remote character set and then invoke iconv.

But what you suggest instead -- create a file name whose bytes are an
exact copy of the remote -- is just another heuristic.  And the
effects are no less terrible, because file names will become
illegible, especially on systems where UTF-8 is not the locale's
codeset.

I'm okay with having an option to do that, but it shouldn't be the
default, IMO.
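
To illustrate the default argued for above, here is a minimal sketch in
Python, not wget's actual code.  It assumes the remote name is UTF-8
unless told otherwise, converts it to the locale's codeset, and falls
back to the verbatim bytes when the conversion fails.  The function
name and its parameters are hypothetical; only the idea of a
remote-encoding override comes from the discussion.

```python
import locale


def local_file_name(remote_bytes, remote_encoding="utf-8", local_encoding=None):
    """Convert file-name bytes from the remote encoding to the local codeset.

    Hypothetical helper for illustration only. If the bytes cannot be
    decoded or re-encoded, the original bytes are returned unchanged.
    """
    if local_encoding is None:
        # Codeset of the user's current locale, without changing the locale.
        local_encoding = locale.getpreferredencoding(False)
    try:
        text = remote_bytes.decode(remote_encoding)
        return text.encode(local_encoding)
    except (UnicodeError, LookupError):
        # Not valid in the assumed remote encoding, not representable
        # locally, or an unknown encoding name: keep the bytes verbatim.
        return remote_bytes


if __name__ == "__main__":
    name = "caf\u00e9.txt".encode("utf-8")
    print(local_file_name(name, local_encoding="latin-1"))
```

With a Latin-1 locale the UTF-8 bytes are rewritten so the name stays
legible; a name that is not valid UTF-8 is passed through untouched,
which is the "verbatim" behaviour as a fallback rather than the default.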