[AUCTeX-devel] preview-latex coding system problem with Japanese LaTeX

From: Ikumi Keita
Subject: [AUCTeX-devel] preview-latex coding system problem with Japanese LaTeX
Date: Fri, 30 Sep 2016 22:09:26 +0900

Dear AUCTeX developers,

I have some problems with preview-latex with regard to the coding system
when I use Japanese LaTeX.  Since recent TeXLive includes Japanese
LaTeX by default, I suppose that non-Japanese users can reproduce the
problems if sample files are provided.  So I have organized this email
into the following three parts:

A. The problems are described with the attached sample files so
   that anyone can actually experience the situation and examine
   what's going on in detail.
B. The causes of the problems are explained, and tentative fixes are
   proposed in the attached patches.
C. The patches in B fix the problems only partially.  The remaining
   problem is described, with a call for help.

A. There are two problems.  I will describe them in order.
A-1. How to reproduce:
(1) Start a new emacs session with
env LC_ALL=ja_JP.SJIS emacs &
    and enable preview-latex.
(2) Open the attached file "preview-error-test.tex", which has many
    \section lines.  They are all commented out initially.
(3) Uncomment any one of them and start preview-latex with C-c C-p C-d.
    Answer n to the "Cache preamble?" question.  Then the error or bad
    result described on the line following the uncommented \section will
    occur, e.g.
Invalid regexp: "Unmatched ( or \\("
(4) Comment out that \section line again, uncomment another \section
    line, and try C-c C-p C-d again.  Another error will occur.
(5) Repeat the procedure described in (4).

Step (3) will not work if your TeX distribution lacks the Japanese LaTeX
command binary "platex".  In that case, please check the following
list.
o Be sure to install TeXLive.  Other tex distributions usually lack
  Japanese TeX engines.
o If you (or the package manager you are using) didn't select a scheme
  large enough when installing TeXLive, the Japanese LaTeX suite is not
  present on your machine.
o Japanese TeX was first included in TeXLive several years ago.  Thus if
  your TeXLive is older than that, Japanese LaTeX is not available.
o If your ghostscript is not configured to handle PS files with Japanese
  fonts, the characters in the preview image may be garbled.  However,
  that is not the point I'm speaking of now.  Rather, it is the error in
  the regexp match, which prevents preview-latex from doing its job,
  that I'd like you to look at.

A-2. How to reproduce:
(1) This time, start a new emacs session with another locale
env LC_ALL=ja_JP.eucJP emacs &
    and enable preview-latex.
(2) Open the attached file "preview-error-test2.tex" and type C-c C-p
    C-d.  This time, answer with y, not n, to "Cache preamble?"
(3) Then the preview image will appear at the wrong position.

This example requires the `platex' binary, too.

B. The reasons and tentative fixes to the problems.
B-1. Shift-JIS encoding problem.
The bad results demonstrated in A-1 are caused by the nature of the
coding system `japanese-shift-jis' (SJIS for short).  SJIS is one of the
major encodings for Japanese text and the standard encoding in the
Japanese edition of Windows for historical reasons.  Basically, SJIS
represents one Japanese character by two bytes.  Examples of such
two-byte sequences are, in hexadecimal form:

8E 82
81 5B

While the first byte of the sequence is always 8-bit (MSB on), the
second is not necessarily so.  In the above two examples, the second
byte of the first example (82) is 8-bit, but that of the second example
(5B) is 7-bit (MSB off).  It is this 7-bit byte that causes the problems
in A-1 above.  Unfortunately, this 7-bit byte sometimes coincides with a
regexp meta character, so it is altered by `regexp-quote' in the
function `preview-error-quote'.  Roughly speaking, `preview-error-quote'
works along this flow:
1. Encodes the string in the given coding system (i.e., SJIS in this
   case).
2. Replaces text beginning with "^^" with the corresponding byte.
3. Builds a regular expression, used later to locate the position in
   the buffer where the preview image is put, protecting the meta
   characters in the original text with `regexp-quote'.
4. Decodes the resulting string from the coding system again.
However, when `regexp-quote' in step 3 quotes the 7-bit byte in SJIS,
decoding fails to recover the original character.

The following example illustrates what is going on:
(let* ((s1 (char-to-string (make-char 'japanese-jisx0208 37 63)))
       ;; s1 is multibyte Japanese string.
       ;; Encode s1 in SJIS.
       (s2 (encode-coding-string s1 'shift_jis))
       ;; At this point s2 is "\203^".
       (s3 (regexp-quote s2))
       ;; Now s3 is "\203\\^".
       ;; Then decode back assuming SJIS encoding.
       (s4 (decode-coding-string s3 'shift_jis)))
  (string-equal s1 s4))
=> nil ;; no longer goes back to the original string s1.
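
To see which 7-bit second bytes are at risk, the check can be sketched
as follows.  This is only an illustration: the function name is
hypothetical, and it relies on the fact that the 7-bit part of the SJIS
second-byte range is #x40 through #x7E.  It collects every byte in that
range which `regexp-quote' would alter.  The byte 5B from the second
example above is the first one it finds.

```elisp
;; Sketch: find the 7-bit SJIS second bytes that `regexp-quote' alters.
;; The function name is hypothetical and used for illustration only.
(defun my-sketch-sjis-quoted-second-bytes ()
  "Return the 7-bit SJIS second bytes changed by `regexp-quote', as strings."
  (let (hits)
    ;; #x40 through #x7E is the 7-bit part of the SJIS second-byte range.
    (dolist (byte (number-sequence #x40 #x7e))
      (let ((s (string byte)))
        ;; A byte is affected when quoting it yields a different string.
        (unless (string= s (regexp-quote s))
          (push s hits))))
    (nreverse hits)))

(my-sketch-sjis-quoted-second-bytes)
;; => ("[" "\\" "^")
```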

The attached patch "preview-latex-fix" is my approach to fixing this
problem.  It avoids handling the encoded string and performs the
relevant operations on the decoded string consistently.  (In addition,
it fixes a problem that `char-to-string' in the original code does not
do the expected job in Unicode-based Emacs for chars of #x80 through
#xFF.  I changed it to use `byte-to-string' instead when that function
is available.)
B-2. preview-latex drops the necessary command option.
The Japanese TeX commands sometimes need the "-kanji" option to know the
coding system of the given TeX file.  In AUCTeX, this requirement is
usually covered by the "%(kanjiopt)" construct in the following lines
quoted from tex-jp.el:

(setq TeX-engine-alist-builtin
      (append TeX-engine-alist-builtin
             '((ptex "pTeX" "ptex %(kanjiopt)" "platex %(kanjiopt)" "eptex")
               (jtex "jTeX" "jtex" "jlatex" nil)
               (uptex "upTeX" "euptex" "uplatex" "euptex"))))

This "%(kanjiopt)" is replaced with a suitable option string such as
"-kanji XXX" when necessary.  However, if the answer to the question
"Cache preamble?" is y, preview-latex drops this option, which leads to
the results described in A-2 above.

The reason why the option "-kanji XXX" is missing is that
`TeX-inline-preview-internal' transforms the command line passed to the
OS shell by `(preview-do-replacements command
preview-undump-replacements)' when caching preamble is enabled.  Here
the regular expression in `preview-undump-replacements' is designed to
pick up the very first word of the value of the variable `command',
leaving behind the option "-kanji XXX".
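
The effect can be sketched with a simplified stand-in.  The pattern,
the replacement text, the format name "prv_sample" and the file name
below are all hypothetical; the real value of
`preview-undump-replacements' is more elaborate.  The sketch only shows
how a rewrite that keeps just the first word of the command loses the
option.

```elisp
;; Sketch with a hypothetical, simplified pattern and replacement.
;; It is not the real value of `preview-undump-replacements'.
(defun my-sketch-undump-command (command)
  "Rewrite COMMAND keeping only its first word, as a simplified undump step."
  (replace-regexp-in-string
   ;; Group 1 is the first word; everything after it is discarded.
   "\\`\\([^ ]+\\).*\\'"
   "\\1 -interaction=nonstopmode \"&prv_sample\" sample.tex"
   command))

(my-sketch-undump-command "platex -kanji XXX sample.tex")
;; => "platex -interaction=nonstopmode \"&prv_sample\" sample.tex"
```

The "-kanji XXX" part of the input does not survive the rewrite, which
corresponds to the situation described above.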

The attached patch "preview-latex-fix2" aims to resolve this problem.
It gives back the latex command options provided in the entry which
`(TeX-engine-alist)' returns so that the command will run smoothly.

C. Call for help
Some problems still remain.  I think we should have an integrated
framework which can serve both preview-latex and tex-jp.el to determine
the suitable process coding system.

The coding systems used to communicate with the Japanese TeX commands
are not constant but vary with the environment.  In fact, they can only
be determined at run time.  Currently that situation is handled by the
function `japanese-TeX-set-process-coding-system' in tex-jp.el during
the normal runs.  That function is set to the value of
`TeX-after-start-process-function' and called after the TeX process
starts.  In that way, the process coding systems are set to suitable
values under the environment at that point of time.  However, the way
preview-latex handles process coding systems sometimes conflicts with
such setting.  For example, `TeX-inline-preview-internal' overwrites the
process coding system after `japanese-TeX-set-process-coding-system'
does its job.  (Current preview-latex uses the value of
`TeX-japanese-process-output-coding-system', but it is not sufficient to
rely on such a constant value.  In fact the default value of
`TeX-japanese-process-output-coding-system' was changed to nil
recently.)  Even my patch "preview-latex-fix" is not sufficient in this
respect.  The coding-system argument supplied to
`decode-coding-string' should not simply be `buffer-file-coding-system'.

I would appreciate it if anyone with deeper knowledge of AUCTeX could
help resolve all these coding system issues in preview-latex.

Best regards,
Ikumi Keita

P.S. I subscribed to auctex-devel ML temporarily, so it is not necessary
to put me on CC: when replying.  I will stay on the ML until the
discussion about this issue is settled.

Attachment: preview-error-test.tex
Description: bad results with SJIS encoding

Attachment: preview-error-test2.tex
Description: command option is dropped when caching preamble

Attachment: preview-latex-fix
Description: tentative fix to SJIS problem

Attachment: preview-latex-fix2
Description: tentative fix to regain option string
