Subject: [AUCTeX-devel] preview-latex coding system problem with Japanese LaTeX
Date: Fri, 30 Sep 2016 22:09:26 +0900
Dear AUCTeX developers,

I have some problems with preview-latex with regard to the coding
system when I use Japanese LaTeX. Since recent TeXLive releases include
Japanese LaTeX by default, I suppose that non-Japanese users can
reproduce the problems too if sample files are provided. So I have
organized this email into the following 3 parts:

A. The problems are described with the attached sample files so that
   anyone can actually experience the situation and examine what's
   going on in detail.
B. The reasons for the problems are explained and tentative fixes are
   proposed in the attached patches.
C. The patches in B. fix the problems only partially. The remaining
   problem is described, with a call for help.

A. There are two problems. I will describe them in order.

A-1. How to reproduce:

(1) Start a new emacs session with
        env LC_ALL=ja_JP.SJIS emacs &
    and enable preview-latex.
(2) Open the attached file "preview-error-test.tex", which has many
    \section lines. They are all commented out initially.
(3) Uncomment any one of them and start preview-latex with
    C-c C-p C-d. Answer n to the "Cache preamble?" question. Then the
    error or bad result described on the line following the
    uncommented \section will occur, e.g.
        Invalid regexp: "Unmatched ( or \\("
(4) Comment out that \section line again, uncomment another \section
    line, and try C-c C-p C-d again. Another error will come out.
(5) Repeat the procedure described in (4).

Step (3) will not work if your TeX distribution lacks the Japanese
LaTeX command binary "platex". In that case, please check the
following points.

o Be sure to install TeXLive. Other TeX distributions usually lack
  Japanese TeX engines.
o If you (or the package manager you are using) didn't select a scheme
  large enough when installing TeXLive, the Japanese LaTeX suite is
  not present on your machine.
o Japanese TeX was first included in TeXLive several years ago. Thus
  if your TeXLive is older than that, Japanese LaTeX is not available.
o If your ghostscript is not configured to handle PS files with
  Japanese fonts, the characters in the preview image may be garbled.
  However, that is not the point I'm speaking of now. Rather, it is
  the error in the regexp match, which prevents preview-latex from
  doing its job, that I'd like you to look at.

A-2. How to reproduce:

(1) This time, start a new emacs session with another locale
        env LC_ALL=ja_JP.eucJP emacs &
    and enable preview-latex.
(2) Open the attached file "preview-error-test2.tex" and type
    C-c C-p C-d. This time, answer y, not n, to the "Cache preamble?"
    question.
(3) Then the preview image will come out at the wrong position.

This example requires the `platex' binary, too.

B. The reasons and tentative fixes to the problems.

B-1. Shift-JIS encoding problem.

The bad results demonstrated in A-1 are caused by the nature of the
coding system `japanese-shift-jis' (SJIS for short). SJIS is one of
the major encodings for Japanese text and, for historical reasons, the
standard encoding in the Japanese edition of Windows.

Basically, SJIS represents one Japanese character by two bytes.
Examples of such two-byte sequences are, in hexadecimal form:
    8E 82   and   81 5B
While the first byte of the sequence is always 8-bit (MSB on), the
second is not necessarily so. In the above two examples, the second
byte of the first example (82) is 8-bit, but that of the second
example (5B) is 7-bit (MSB off). It is this 7-bit byte that causes the
problems in A-1 above.

Unfortunately, this 7-bit byte sometimes coincides with a regexp meta
character (5B, for instance, is the ASCII code of "["). Thus
`regexp-quote' in the function `preview-error-quote' interferes with
it. Roughly speaking, `preview-error-quote' works along this flow:

1. Encodes the string in the given coding system (i.e., SJIS in this
   example).
2. Replaces texts which begin with "^^" with the corresponding byte.
3. Builds a regular expression, for later use in locating the position
   in the buffer where the preview image is put, guarding the meta
   characters in the original text by `regexp-quote'.
4. Decodes the obtained string back from the coding system.

However, when `regexp-quote' in item 3 quotes the 7-bit byte of an
SJIS character, decoding back fails to recover the original character.
The following example illustrates what is going on:

    (let* ((s1 (char-to-string (make-char 'japanese-jisx0208 37 63)))
           ;; s1 is a multibyte Japanese string.
           ;; Encode s1 in SJIS.
           (s2 (encode-coding-string s1 'shift_jis))
           ;; At this point s2 is "\203^".
           (s3 (regexp-quote s2))
           ;; Now s3 is "\203\\^".
           ;; Then decode back assuming SJIS encoding.
           (s4 (decode-coding-string s3 'shift_jis)))
      (string-equal s1 s4))
    => nil ;; no longer goes back to the original string s1.

The attached patch "preview-latex-fix" is my approach to fixing this
problem. It avoids handling the encoded string and does the relevant
operations consistently on the decoded string.
(In addition, it fixes a problem that `char-to-string' in the original
code does not do the expected job in unicode-based emacs for chars of
#x80 through #xFF. I changed it to use `byte-to-string' instead when
that function is available.)

B-2. preview-latex drops the necessary command option.

The Japanese TeX command sometimes needs the "-kanji" option to know
the coding system of the given TeX file. In AUCTeX, this requirement
is usually covered by the "%(kanjiopt)" construct in the following
lines quoted from tex-jp.el:

    (setq TeX-engine-alist-builtin
          (append TeX-engine-alist-builtin
                  '((ptex "pTeX" "ptex %(kanjiopt)" "platex %(kanjiopt)" "eptex")
                    (jtex "jTeX" "jtex" "jlatex" nil)
                    (uptex "upTeX" "euptex" "uplatex" "euptex"))))

This "%(kanjiopt)" is changed to a suitable option string like
"-kanji XXX" when necessary. However, if the answer to the question
"Cache preamble?" is y, preview-latex drops this option, which leads
to the results described in A-2 above.
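As a rough illustration of how an option can get lost when a command
line is rebuilt from its first word only, here is a minimal sketch. It
is not the actual code or the actual value of
`preview-undump-replacements'; the function names, the regexps, the
format name "prv_foo" and the sample command line (including the
"-kanji=euc" value) are all assumptions made for this example.

```elisp
;; Hypothetical, simplified stand-in for the rewriting done when the
;; preamble is cached.  This is NOT the real
;; `preview-undump-replacements'; names and regexps are assumptions.

(defun demo-undump-first-word-only (command format-name)
  "Rewrite COMMAND to load FORMAT-NAME, keeping only its first and last word.
Everything between the program name and the file name is discarded."
  (if (string-match "\\`\\([^ ]+\\) \\(?:.* \\)?\\([^ ]+\\)\\'" command)
      (format "%s \"&%s\" %s"
              (match-string 1 command)   ; program name
              format-name
              (match-string 2 command))  ; file name
    command))

(defun demo-undump-keep-options (command format-name)
  "Like `demo-undump-first-word-only', but keep the options of COMMAND."
  (if (string-match "\\`\\([^ ]+\\) \\(\\(?:.* \\)?\\)\\([^ ]+\\)\\'" command)
      (format "%s %s\"&%s\" %s"
              (match-string 1 command)   ; program name
              (match-string 2 command)   ; options, with trailing space
              format-name
              (match-string 3 command))  ; file name
    command))

;; Illustrative command line after "%(kanjiopt)" has been expanded.
(defvar demo-command "platex -kanji=euc -interaction=nonstopmode foo.tex")

(demo-undump-first-word-only demo-command "prv_foo")
;; => "platex \"&prv_foo\" foo.tex"

(demo-undump-keep-options demo-command "prv_foo")
;; => "platex -kanji=euc -interaction=nonstopmode \"&prv_foo\" foo.tex"
```

In the first call the "-kanji" option has disappeared, so the TeX
process has to guess the encoding of the file. The second function only
shows the general idea of carrying the options along; it is not what
the attached patch does.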
The reason why the option "-kanji XXX" is missing is that
`TeX-inline-preview-internal' transforms the command line passed to
the OS shell by
    (preview-do-replacements command preview-undump-replacements)
when caching the preamble is enabled. Here the regular expression in
`preview-undump-replacements' is designed to pick up only the very
first word of the value of the variable `command', leaving behind the
option "-kanji XXX".

The attached patch "preview-latex-fix2" aims to resolve this problem.
It restores the LaTeX command options given in the entry returned by
`(TeX-engine-alist)' so that the command runs correctly.

C. Call for help

Some problems still remain. I think we should have an integrated
framework which can serve both preview-latex and tex-jp.el to
determine the suitable process coding system.

The coding systems used to communicate with the Japanese TeX command
are not constant but vary with the environment. In fact they can only
be determined at run time. Currently that situation is handled by the
function `japanese-TeX-set-process-coding-system' in tex-jp.el during
normal runs. That function is set as the value of
`TeX-after-start-process-function' and called after the TeX process
starts. In that way, the process coding systems are set to values
suitable for the environment at that point in time.

However, the way preview-latex handles process coding systems
sometimes conflicts with that setting. For example,
`TeX-inline-preview-internal' overwrites the process coding system
after `japanese-TeX-set-process-coding-system' has done its job.
(Current preview-latex uses the value of
`TeX-japanese-process-output-coding-system', but relying on such a
constant value is not sufficient. In fact the default value of
`TeX-japanese-process-output-coding-system' was changed to nil
recently.)

Even my patch "preview-latex-fix" is not sufficient on this point. The
coding-system argument supplied to `decode-coding-string' should not
simply be `buffer-file-coding-system'.

I would appreciate it if anyone who has deeper knowledge of AUCTeX
could help to resolve all these coding system issues in preview-latex.

Best regards,
Ikumi Keita

P.S. I subscribed to the auctex-devel ML temporarily, so it is not
necessary to put me on CC: when replying. I will stay on the ML until
the discussion about this issue is settled.
Description: bad results with SJIS encoding
Description: command option is dropped when caching preamble
Description: tentative fix to SJIS problem
Description: tentative fix to regain option string