help-gnu-emacs

Re: Cyrillic VC Git commit messages


From: Eli Zaretskii
Subject: Re: Cyrillic VC Git commit messages
Date: Sat, 22 Nov 2014 15:42:05 +0200

> From: Nikolay Kudryavtsev <nikolay.kudryavtsev@gmail.com>
> Date: Fri, 21 Nov 2014 19:48:47 +0300
> 
> > Can you tell how you decided that, or where you saw that described?
> That part implies that there is some new functionality in msysgit that 
> does the recoding for Windows cmd.exe.
> 
> > And it sends codepage 1252 (not 1251) to the cmd.exe window?
> It first decodes the message with logoutputencoding, then recodes it 
> with windows-1252. If you set logoutputencoding to windows-1251, like I 
> do, it breaks cmd.exe output.
> 
> > Moreover, you seem to say that Git outputs in UTF-8 even though you
> > customized i18n.logoutputencoding to be windows-1251?
> For the VC log, the second recoding to windows-1252 does not happen.
> 
> For the commit message, Git first recodes from windows-1251 to UTF-8 
> and then recodes to commitencoding. This behavior is the same whether 
> Git is called from VC or from cmd.exe.
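For illustration only, here is a minimal Python sketch of why an extra, mismatched recoding step garbles Cyrillic output: bytes produced in windows-1251 and then read as windows-1252 come out as Latin-looking garbage. The function name `misinterpret` is hypothetical, and this is not msysGit's actual code path; it only models the effect of decoding with the wrong codepage.

```python
def misinterpret(text: str, produced_as: str, read_as: str) -> str:
    """Encode text in one codepage, then decode the bytes as another.

    Models a consumer that assumes the wrong encoding for the bytes it
    receives (hypothetical helper, not msysGit code).
    """
    raw = text.encode(produced_as)
    # errors="replace" turns bytes undefined in read_as into U+FFFD
    return raw.decode(read_as, errors="replace")


message = "Привет"
garbled = misinterpret(message, "cp1251", "cp1252")
print(garbled)  # Ïðèâåò

# Undoing the wrong step (re-encode as cp1252, decode as cp1251)
# recovers the original text, since no bytes were lost here.
recovered = garbled.encode("cp1252").decode("cp1251")
print(recovered == message)  # True
```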

I looked into this some more and ran some simple tests, and I'm not
sure I see the same behavior as the one you describe.

First, preliminaries: I tried this with msysGit version
1.9.4.msysgit.2 (the latest binary release) on Windows XP SP3.  I
cannot easily set up a Cyrillic locale on my machine, so I tried the
Latin-1 locale, i.e. codepage 1252, instead.  Also, I only have access
to a Git repository whose commit log messages are encoded in UTF-8, so
that's what I tried.

What I see is this:

  . By default, Git outputs commit log messages in UTF-8 when its
    output is redirected to a file or sent to Emacs.  When it writes to
    the console, Git seems to convert the text from UTF-8 to UTF-16 and
    use the WriteConsoleW API.  The Windows console then displays that
    text according to the current codepage, mapping characters to
    supported ones where it can and displaying '?' characters where it
    cannot.

  . If I set i18n.logoutputencoding = windows-1252, Git outputs commit
    log messages in that encoding to the cmd.exe console, to a file when
    redirected, and to Emacs (I used the "C-x v L" command to check the
    latter).
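A rough Python model of the two behaviors above, under stated assumptions: a Python str stands in for the UTF-16 text handed to WriteConsoleW, and the codec's errors="replace" stands in for the console's '?' substitution. Windows' "best fit" character mappings are not modeled, and both function names are hypothetical.

```python
def console_view(utf8_bytes: bytes, console_codepage: str) -> str:
    """Approximate what a console using console_codepage can display.

    Characters the codepage cannot represent become '?'.  Real Windows
    consoles may also apply "best fit" mappings, not modeled here.
    """
    text = utf8_bytes.decode("utf-8")  # stands in for UTF-8 -> UTF-16
    return text.encode(console_codepage, errors="replace").decode(console_codepage)


def log_output_bytes(utf8_bytes: bytes, log_output_encoding: str) -> bytes:
    """Re-encode UTF-8 commit text as if i18n.logoutputencoding were set."""
    return utf8_bytes.decode("utf-8").encode(log_output_encoding, errors="replace")


msg = "Café, Привет".encode("utf-8")
print(console_view(msg, "cp1252"))      # Café, ??????
print(log_output_bytes(msg, "cp1252"))  # b'Caf\xe9, ??????'
```

In this model the Latin-1 characters survive in codepage 1252 while the Cyrillic ones are lost, which matches the '?' behavior described for the console.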

This behavior looks reasonable and expected, given what the
documentation says.  In particular, I see no difference between the
encoding Git outputs to the console and the one it outputs to Emacs.

Please note that there's one more player in this game, when you invoke
Git from cmd.exe prompt: in some versions of msysGit, when you type a
Git command at cmd.exe prompt, what gets invoked is a git.cmd batch
file supplied by msysGit, and that batch file manipulates the console
codepage.  (On my system, I disabled that manipulation, because it
interferes with Git invocations from Emacs.)  So it could be that what
that batch file does is one reason for the unreasonable behavior you
describe.

If git.cmd is not the culprit, or if you run Git without going through
such a batch file, then perhaps you could check which encoding Git
emits in each of the three scenarios above: to the console, to a file,
and to Emacs.  Also, please tell how you determined the encoding in
each case.
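One way to determine the encoding of captured output, sketched in Python under the assumption that you know the exact text of a commit message containing non-ASCII characters: encode that text in each candidate encoding and look for the resulting bytes in the capture. The helper name `find_encoding` and the candidate list (including cp866, the usual Cyrillic console codepage) are illustrative choices, not anything Git provides. A pure-ASCII sample would match every candidate, so pick a message with Cyrillic letters.

```python
from typing import Iterable, Optional


def find_encoding(raw: bytes, expected: str,
                  candidates: Iterable[str] = ("utf-8", "cp1251", "cp1252", "cp866"),
                  ) -> Optional[str]:
    """Return the first candidate encoding in which `expected` appears in `raw`.

    Hypothetical helper: `expected` should be known text containing
    non-ASCII characters, otherwise every candidate matches.
    """
    for enc in candidates:
        try:
            needle = expected.encode(enc)
        except UnicodeEncodeError:
            continue  # the expected text is not representable in this encoding
        if needle in raw:
            return enc
    return None


# Usage: bytes as they might appear in a redirected `git log` file.
captured = b"commit 1234abcd\n\n    " + "Привет".encode("cp1251") + b"\n"
print(find_encoding(captured, "Привет"))  # cp1251
```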

P.S.  I tried to verify my observations by looking at the msysGit
sources, but I cannot find the source distribution that corresponds to
the 1.9.4.msysgit.2 binaries I installed.  The download page provides
a link to "Source code", but clicking that link downloads binaries
without sources, which AFAIU is against the GPL.

HTH


