bug-wget

Re: [Bug-wget] bad filename


From: Andries E. Brouwer
Subject: Re: [Bug-wget] bad filename
Date: Thu, 24 Apr 2014 20:00:18 +0200
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Apr 24, 2014 at 03:43:40PM +0200, Tim Ruehsen wrote:

> 1. How do you know what filesystem you are writing to?
> I just think of these fat32 USB sticks flying around everywhere. 
> UTF-8 might be a problem (see 
> http://en.wikipedia.org/wiki/Comparison_of_file_systems).
> I just mention fat32, because it is pretty common.

Wget already knows about such restrictions.
These "high control bytes" have no special status in FAT32,
so not escaping them does not introduce any problems there.
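
To illustrate what escaping such bytes does to a UTF-8 name, here is a minimal Python sketch. It assumes that "high control bytes" means the range 0x80-0x9F, and `escape_high_control` and `is_valid_utf8` are hypothetical helpers written for this example, not Wget's actual code.

```python
def escape_high_control(raw: bytes) -> bytes:
    """Percent-escape every byte in 0x80..0x9F (assumed range); keep the rest."""
    out = bytearray()
    for b in raw:
        if 0x80 <= b <= 0x9F:
            out += f"%{b:02X}".encode("ascii")
        else:
            out.append(b)
    return bytes(out)


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# U+00D6 (O with diaeresis) is encoded in UTF-8 as the two bytes C3 96.
# The second byte falls inside the assumed "high control" range.
original = "\u00d6.txt".encode("utf-8")
escaped = escape_high_control(original)

print(original, is_valid_utf8(original))
print(escaped, is_valid_utf8(escaped))
```

In this sketch the escaped name is no longer valid UTF-8, because the lead byte C3 has lost its continuation byte. Leaving the bytes alone keeps the name intact.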

> 2. Backward compatibility.

In this particular case I see no reason to expect any problems.

> 3. (Strictly another issue) If we touch the code, what about
> --restrict-file-names=nocontrol,lowercase ?
> Should we case-convert UTF-8 ?

Fortunately, as you say, our present topic is unrelated to case conversion.

> My answer is yes (and that is what I did in the already mentioned Mget).

My answer would be that case converting UTF-8 is something to avoid.
For ASCII, case conversion is simple and well-defined.
For backward compatibility it may be necessary to convert a-z to A-Z
or vice versa.
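
An ASCII-only conversion of this kind can be sketched in a few lines of Python. The names `ascii_lower` and `ascii_upper` are made up for this example; the point is only that the 26 letters a-z and A-Z are mapped and every other character passes through unchanged.

```python
import string

# Translation tables that touch only the 26 ASCII letters.
_TO_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)
_TO_UPPER = str.maketrans(string.ascii_lowercase, string.ascii_uppercase)


def ascii_lower(name: str) -> str:
    """Map A-Z to a-z; leave all other characters untouched."""
    return name.translate(_TO_LOWER)


def ascii_upper(name: str) -> str:
    """Map a-z to A-Z; leave all other characters untouched."""
    return name.translate(_TO_UPPER)


print(ascii_lower("Stra\u00dfe.HTML"))
print(ascii_upper("stra\u00dfe.html"))
```

Non-ASCII characters such as 'ß' or 'İ' are left exactly as they were, so the result does not depend on locale or on Unicode case tables.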

(What is the upper case of 'i'? Maybe 'I'? In Turkey it is İ,
while I is the upper case of ı. What is the upper case of 'ß'?
Maybe 'SS'? But then, what is the lower case of 'SS'? Maybe 'ss'?
What is the lower case of Σ? Is it σ or ς?
Case conversion is difficult.)
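
These questions can be tried out with Python's built-in `str.upper()` and `str.lower()`, which apply the locale-independent Unicode default case mappings. This is only an illustration of the ambiguity, not a suggestion for how Wget should behave.

```python
# Python's str methods ignore the locale, so Turkish rules are not applied.
upper_i = "i".upper()                # 'I', not U+0130 as Turkish would want
upper_dotless = "\u0131".upper()     # dotless i also uppercases to 'I'
lower_dotted = "\u0130".lower()      # 'i' followed by U+0307 COMBINING DOT ABOVE

# Sharp s expands to two letters and does not round-trip.
upper_sharp = "\u00df".upper()       # 'SS'
round_trip = upper_sharp.lower()     # 'ss', not the original sharp s

# Capital sigma has two lower-case forms depending on its position.
lone_sigma = "\u03a3".lower()                        # U+03C3 (medial sigma)
final_sigma = "\u039f\u0394\u039f\u03a3".lower()     # ends in U+03C2 (final sigma)

print(upper_i, upper_dotless, len(lower_dotted))
print(upper_sharp, round_trip)
print(lone_sigma, final_sigma)
```

Two distinct lower-case letters map to the same 'I', one letter becomes two, and the result for sigma depends on context, so a name cannot in general be case-converted and converted back without loss.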

Andries


