help-gnu-emacs

Re: on eshell's encoding


From: Eli Zaretskii
Subject: Re: on eshell's encoding
Date: Tue, 02 Aug 2016 18:12:19 +0300

> From: Daniel Bastos <dbastos@toledo.com>
> Date: Tue, 02 Aug 2016 10:24:32 -0300
> 
> > Like I said, Eshell is not a shell, it just pretends to be one.  It
> > will eventually cause execve, or something like it, to be called, but
> > before it, the command-line arguments will be encoded in the locale's
> > encoding, since that's what execve expects.  This is true on Windows
> > and on Unix alike.  
> 
> That's true of EMACS.  You're saying EMACS always encodes the command
> line arguments.  But what I said about UNIX is that whatever execve
> receives in argv[] will remain as such, which apparently is not the
> MS-Windows behavior.
> 
> Precisely: if on UNIX I use EMACS to call /program/ with argv[] encoded
> in X, then /program/ will definitely receive its argv[] as prepared by
> EMACS.  That does not happen on MS-Windows.  EMACS encodes the command
> line in utf-8, but /program/ receives it in another encoding.

That's not true.  Emacs encodes the command line passed to
subprocesses on Windows and Unix alike.  On each OS, it always encodes
it in the locale's codeset.  If the Unix locale specifies UTF-8 as
its codeset, then the command line will be encoded in UTF-8, but
that's no more than a coincidence.  (On Windows, the locale's codeset,
a.k.a. "system codepage", can never be UTF-8, but that's the only
difference between Unix and Windows with respect to how Emacs encodes
the command lines of subprocesses.)
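
To make the point concrete, here is a rough Python sketch of the idea,
not Emacs's actual code.  The function name encode_command_line and
the sample argument are made up for illustration; the only claim is
that the same text produces different bytes depending on the codeset
it is encoded in, and that UTF-8 is just one possible codeset.

```python
import locale


def encode_command_line(args, codeset=None):
    """Encode each argument string in the given codeset.

    Hypothetical helper: when no codeset is given, fall back to the
    one reported by the current locale.
    """
    if codeset is None:
        codeset = locale.getpreferredencoding(False)
    return [arg.encode(codeset) for arg in args]


# The same argument, as it would look under two different codesets.
print(encode_command_line(["caf\u00e9"], "utf-8"))   # [b'caf\xc3\xa9']
print(encode_command_line(["caf\u00e9"], "cp1252"))  # [b'caf\xe9']
```

Calling it without the second argument uses whatever codeset the
locale of the running process reports.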

So, as long as you launch processes from Emacs, the difference between
Windows and Unix in this respect is all but non-existent.

The difference between the two OSes comes into play when you put
arbitrary byte sequences into the argv[] passed to execve and similar
calls.  (This cannot be easily done in Emacs, but you can do that in
your own programs.)  If those bytes are not valid for the locale's
codeset, Unix will nevertheless pass them verbatim to the subprogram.
By contrast, Windows will convert those bytes to UTF-16, assuming they
are in the current locale's codeset, then convert back to that codeset
when it invokes the subprogram.  This conversion is lossy when the
bytes are not valid for the locale, as Windows will replace the
invalid bytes with close equivalents, blanks, or question marks.
(When these bytes are all valid in the current locale, this conversion
happens as well, but it's not lossy, and therefore its effect is
exactly as on Unix.)
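
The round trip can be modeled roughly in Python as below.  This is
only a sketch under stated assumptions: the function names are
hypothetical, cp1252 stands in for the system codepage, and Python's
codec tables and its "replace" error handling stand in for Windows'
own conversion routines, which use different tables and may pick
best-fit characters instead of question marks.

```python
def unix_style_pass_through(raw):
    """Model of Unix: the argv[] bytes reach the subprogram untouched."""
    return raw


def windows_style_round_trip(raw, codepage="cp1252"):
    """Model of Windows: codepage bytes -> UTF-16 -> codepage bytes.

    Bytes that are invalid in the codepage are replaced on the way in,
    and the replacement is what survives on the way out.
    """
    text = raw.decode(codepage, errors="replace")
    wide = text.encode("utf-16-le")  # the form a Unicode program would see
    return wide.decode("utf-16-le").encode(codepage, errors="replace")


valid = b"caf\xe9"   # every byte is defined in cp1252
invalid = b"a\x81b"  # 0x81 has no mapping in Python's cp1252 table

print(unix_style_pass_through(invalid))   # b'a\x81b'
print(windows_style_round_trip(valid))    # b'caf\xe9'
print(windows_style_round_trip(invalid))  # b'a?b'
```

With valid bytes the two functions agree; they differ only for bytes
the codepage cannot represent.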

> This surprises me.  MS-Windows should not care what a program puts in
> argv[].

It cares, because it attempts to transparently support both Unicode
programs, which expect their arguments in UTF-16, and non-Unicode
programs, which expect their arguments in the locale's codeset.

> I think it violates an important principle: an operating system
> should help programs to communicate, but it should not care what they're
> saying to each other.  That's an important principle UNIX has given us.

Clearly, Unix and Windows differ in their philosophy in this regard.
Each alternative has its advantages and disadvantages; which one you
like better is up to you.


