help-gnu-emacs

Re: Is there a way to "asciify" a string?


From: James K. Lowden
Subject: Re: Is there a way to "asciify" a string?
Date: Thu, 31 May 2018 19:23:48 -0400

On Thu, 31 May 2018 17:42:33 +0200
Marcin Borkowski <mbork@mbork.pl> wrote:

> > I really strongly recommend you try to solve this problem by doing
> > nothing: keep the name in its full glory.  Nowadays users *should*
> > expect this to work.
> 
> It's tempting, but no: these files will eventually be sent to
> e.g. people on Windows XP and the like.  I don't want to take risks of
> unreadable filenames.

It's good advice, though treacherous.  If you use any encoding other
than ASCII, you'll need to indicate the encoding used, and put up with
recipients who don't know what "encoding" is, or can't re-encode the
names to their machine's preferred encoding.  

For instance, if you send UTF-8, you can expect befuddlement from
Windows users, whose system implicitly recognizes UTF-16LE.  

I can hardly blame you for not wanting to do that.  

If Windows's filename rules were the actual constraint, the set of
characters allowed in a Windows filename is well defined.  The
prohibited characters could be URL-encoded or similar.  That would
yield a recognizable, unique name, and the original could be recovered
by reversing the process.  
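
A minimal sketch of that URL-style escaping in Python, under some
assumptions: the prohibited set is taken to be < > : " / \ | ? * plus
the control characters 0-31, "%" is escaped as well so that decoding
is unambiguous, and the function names are made up for illustration.
Reserved device names (CON, NUL, ...) and trailing dots or spaces are
not handled.

```python
import re

# Characters Windows rejects in file names, plus "%" itself so that
# every "%XX" in the output is guaranteed to be an escape.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*%\x00-\x1f]')
_ESCAPED = re.compile(r"%([0-9A-Fa-f]{2})")


def escape_windows_name(name: str) -> str:
    """Replace each prohibited character with %XX (hex code point)."""
    return _FORBIDDEN.sub(lambda m: "%{:02X}".format(ord(m.group())), name)


def unescape_windows_name(escaped: str) -> str:
    """Reverse escape_windows_name()."""
    return _ESCAPED.sub(lambda m: chr(int(m.group(1), 16)), escaped)


print(escape_windows_name("a:b?c"))   # a%3Ab%3Fc
```

Non-ASCII characters pass through untouched here; only the characters
Windows forbids are rewritten.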

If I were solving your problem, I'd look for something similar to what
you describe, but wholly reversible.  I'd use ascii//TRANSLIT or similar
to get the "unaccented" version of the character, and insert a
URL-style escape after each one representing the original
Unicode character in hex.  So, 

        Jönköping

becomes

        Jo%F6nko%F6ping

If you escape literal percent signs too ("%" becomes "%%25"), then
the reversal rule is simply "for every /%[[:xdigit:]]{2}/, replace the
previous character with the indicated codepoint".  
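
Here is a rough Python sketch of the scheme and its reversal, not a
finished tool.  Assumptions on my part: Unicode NFKD decomposition
stands in for ascii//TRANSLIT, "_" is used as a placeholder when a
character has no ASCII base, and anything above U+00FF is rejected
because two hex digits can't hold it.  The names asciify and deasciify
are invented for the example.

```python
import re
import unicodedata


def asciify(name: str) -> str:
    """Return an ASCII-only, reversible form of name."""
    out = []
    for ch in name:
        if ch == "%":
            out.append("%%25")            # literal percent sign
        elif ord(ch) < 0x80:
            out.append(ch)                # plain ASCII stays as-is
        elif ord(ch) <= 0xFF:
            # First character of the decomposition, e.g. "ö" -> "o".
            base = unicodedata.normalize("NFKD", ch)[0]
            if ord(base) >= 0x80:
                base = "_"                # assumption: no ASCII base exists
            out.append("{}%{:02X}".format(base, ord(ch)))
        else:
            raise ValueError("code point does not fit in two hex digits")
    return "".join(out)


# One character followed by an escape; the escape replaces that character.
_ESCAPE = re.compile(r".%([0-9A-Fa-f]{2})", re.DOTALL)


def deasciify(escaped: str) -> str:
    """Reverse asciify()."""
    return _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), escaped)


print(asciify("J\u00f6nk\u00f6ping"))     # Jo%F6nko%F6ping
```

Because the decoder throws away the character before each escape, the
choice of base character affects only readability, never whether the
original can be recovered.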

This approach preserves uniqueness in the filename, so you can dispense
with "uniquifying" it with a meaningless integer.  

--jkl

