[6995] xetex multibyte support

texinfo-commits

From:	Gavin D. Smith
Subject:	[6995] xetex multibyte support
Date:	Sun, 07 Feb 2016 10:23:35 +0000
Revision: 6995
          http://svn.sv.gnu.org/viewvc/?view=rev&root=texinfo&revision=6995
Author:   gavin
Date:     2016-02-07 10:23:34 +0000 (Sun, 07 Feb 2016)
Log Message:
-----------
xetex multibyte support

Modified Paths:
--------------
    trunk/ChangeLog
    trunk/doc/texinfo.tex

Modified: trunk/ChangeLog
===================================================================
--- trunk/ChangeLog     2016-02-06 20:04:40 UTC (rev 6994)
+++ trunk/ChangeLog     2016-02-07 10:23:34 UTC (rev 6995)
@@ -1,3 +1,48 @@
+2016-02-07  Masamichi Hosoda  <address@hidden>
+
+        * doc/texinfo.tex:
+       Add native Unicode support for XeTeX and LuaTeX.
+
+       (\iftxinativeunicodecapable): New switch.
+       (\iftxiusebytewiseio): New switch.
+
+        (\setbytewiseio): Set I/O by bytes instead of UTF-8 sequences
+       for XeTeX and LuaTeX with non-UTF-8 (byte-wise) encodings.
+
+       (\documentencoding): Remove input by bytes settings for XeTeX.
+       Add I/O by bytes settings for single-byte encodings.
+       Add native Unicode settings for UTF-8 encoding.
+
+       (\U): With native Unicode, any Unicode character can be used.
+
+       (\DeclareUnicodeCharacterUTFviii): Rename from
+       \DeclareUnicodeCharacter.
+       (\DeclareUnicodeCharacterNative): For native Unicode,
+       definition macro that replaces the Unicode character.
+       (\DeclareUnicodeCharacterNativeThru): For native Unicode,
+       definition macro that passes the Unicode character through unreplaced.
+       (\DeclareUnicodeCharacterNativeAtU): For native Unicode,
+       definition macro used by the @U command.
+       (\DeclareUnicodeCharacterNativeOther): For native Unicode,
+       definition macro that sets catcode other, non-globally.
+
+       (\unicodechardefs): Rename from \utfeightchardefs.
+       (\utfeightchardefs): UTF-8 byte sequence definitions (replacement and
+       @U command).  Makes UTF-8 byte sequences be replaced.
+       (\nativeunicodechardefs): Native Unicode character replacement
+       definitions.  Makes the Unicode characters be replaced.
+       (\nativeunicodechardefsthru): Native Unicode character ``through''
+       definitions.  Makes the Unicode characters pass through
+       unreplaced.
+       (\nativeunicodechardefsatu): Native Unicode @U command definitions.
+       (\nativeunicodecharscatcodeothernonglobal):
+       Native Unicode definitions that set catcode other, non-globally.
+       (\setcharscatcodeothernonglobal):
+       Set non-ASCII or native Unicode character catcodes to other, non-globally.
+
+       (\throughcharactersdefs): Character ``through'' definitions.
+       Makes the characters pass through unreplaced.
+
 2016-02-06  Gavin Smith  <address@hidden>
 
        * configure.ac: Update version to 6.1dev.
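
The ChangeLog entry above introduces two switches, \iftxinativeunicodecapable and \iftxiusebytewiseio. It also lists the \documentencoding branches that consult them.

The following is a hypothetical Python model of that decision logic, written only for illustration. None of these functions exist in texinfo.tex, and engines are plain strings. One part is assumed from the encoding names texinfo.tex accepts rather than shown in this diff: ISO-8859-1, ISO-8859-2 and ISO-8859-15 map to \latone, \lattwo and \latnine.

```python
# Hypothetical model of the switches added in revision 6995; the names
# mirror the TeX macros, but none of this code exists in texinfo.tex.
from dataclasses import dataclass


@dataclass
class Switches:
    native_unicode_capable: bool  # models \iftxinativeunicodecapable
    use_bytewise_io: bool         # models \iftxiusebytewiseio


def detect(engine: str) -> Switches:
    r"""Mirror the \XeTeXrevision / \luatexversion tests done at load time."""
    if engine in ("xetex", "luatex"):
        # Native Unicode engines read and write UTF-8 by default.
        return Switches(native_unicode_capable=True, use_bytewise_io=False)
    # pdfTeX and other 8-bit engines always do byte-wise I/O.
    return Switches(native_unicode_capable=False, use_bytewise_io=True)


# Assumption: the names matched by \latone, \lattwo and \latnine.
SINGLE_BYTE = {
    "ISO-8859-1": "latonechardefs",
    "ISO-8859-2": "lattwochardefs",
    "ISO-8859-15": "latninechardefs",
}


def documentencoding(sw: Switches, name: str) -> str:
    """Return the definition set that @documentencoding NAME selects."""
    if name == "US-ASCII":
        return "asciichardefs"
    if name in SINGLE_BYTE:
        if sw.native_unicode_capable:
            # \setbytewiseio: XeTeX/LuaTeX fall back to byte-wise I/O.
            sw.use_bytewise_io = True
        return SINGLE_BYTE[name]
    if name == "UTF-8":
        if sw.native_unicode_capable:
            return "nativeunicodechardefs"
        # pdfTeX: \utfeightchardefs already ran at load time, so only
        # bytes 128-255 are made active here.
        return "setnonasciicharscatcode"
    return "ignored"  # "Ignoring unknown document encoding"


def setcharscatcodeothernonglobal(sw: Switches) -> str:
    """Which macro gives non-ASCII input catcode 'other', non-globally."""
    if sw.use_bytewise_io:
        return "setnonasciicharscatcodenonglobal"
    return "nativeunicodecharscatcodeothernonglobal"


def throughcharactersdefs(sw: Switches) -> str:
    """Which macro makes characters print as themselves."""
    if sw.use_bytewise_io:
        return "nonasciistringdefs"
    return "nativeunicodechardefsthru"
```

For example, under LuaTeX a declaration of ISO-8859-1 turns the byte-wise switch on. After that, \setcharscatcodeothernonglobal and \throughcharactersdefs take their byte-oriented branches instead of the native Unicode ones.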

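The ChangeLog entry above describes one shared table, \unicodechardefs. That table is expanded several times, each time with \DeclareUnicodeCharacter \let-bound to a different definer.

The sketch below is a hypothetical Python model of that pattern. The class and method names are illustrative, only two of the real table entries are reproduced, and TeX catcodes and expansion are reduced to dictionaries and sets.

```python
# Hypothetical model of the \DeclareUnicodeCharacter dispatch pattern.
from typing import Callable, Dict, List, Set, Tuple

# Two entries from \unicodechardefs; the real table is much longer.
TABLE: List[Tuple[str, str]] = [("00A0", r"\tie"), ("00A1", r"\exclamdown")]


def unicodechardefs(declare: Callable[[str, str], None]) -> None:
    """Run the shared table through whichever definer is bound."""
    for code, replacement in TABLE:
        declare(code, replacement)


class NativeState:
    """Stand-in for the TeX state the native definers modify."""

    def __init__(self) -> None:
        self.active: Dict[str, str] = {}  # active character -> expansion
        self.uni: Dict[str, str] = {}     # \csname uni:XXXX\endcsname
        self.other: Set[str] = set()      # characters given catcode 12

    def native(self, code: str, repl: str) -> None:
        # \DeclareUnicodeCharacterNative: the character expands to repl.
        self.active[chr(int(code, 16))] = repl

    def native_thru(self, code: str, repl: str) -> None:
        # \DeclareUnicodeCharacterNativeThru: the character expands to itself.
        ch = chr(int(code, 16))
        self.active[ch] = ch

    def native_at_u(self, code: str, repl: str) -> None:
        # \DeclareUnicodeCharacterNativeAtU: only defines uni:XXXX for @U.
        self.uni[code] = repl

    def native_other(self, code: str, repl: str) -> None:
        # \DeclareUnicodeCharacterNativeOther: catcode other (local in TeX).
        self.other.add(chr(int(code, 16)))

    def at_u(self, code: str) -> str:
        """@U{XXXX} on a native Unicode engine after this commit."""
        if code in self.uni:
            return self.uni[code]
        # Unknown code point: \uccode`\.="XXXX and \uppercase{.} typeset it
        # directly; the glyph may still be missing from the font.
        return chr(int(code, 16))
```

Calling `unicodechardefs(state.native_at_u)` corresponds to what texinfo.tex does at load time on XeTeX and LuaTeX. After that, `state.at_u("00A1")` yields the `\exclamdown` definition, and an unlisted code point such as `263A` falls through to the raw character.
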
Modified: trunk/doc/texinfo.tex
===================================================================
--- trunk/doc/texinfo.tex       2016-02-06 20:04:40 UTC (rev 6994)
+++ trunk/doc/texinfo.tex       2016-02-07 10:23:34 UTC (rev 6995)
@@ -3,7 +3,7 @@
 % Load plain if necessary, i.e., if running under initex.
 \expandafter\ifx\csname fmtname\endcsname\relax\input plain\fi
 %
-\def\texinfoversion{2016-02-05.07}
+\def\texinfoversion{2016-02-07.10}
 %
 % Copyright 1985, 1986, 1988, 1990, 1991, 1992, 1993, 1994, 1995,
 % 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006,
@@ -7781,7 +7781,7 @@
   \catcode`\_=\other
   \catcode`\|=\other
   \catcode`\~=\other
-  \ifx\declaredencoding\ascii \else \setnonasciicharscatcodenonglobal\other \fi
+  \ifx\declaredencoding\ascii \else \setcharscatcodeothernonglobal \fi
 }
 
 \def\scanargctxt{% used for copying and captions, not macros.
@@ -8896,7 +8896,7 @@
   \catcode`\\=\other
   %
   % Make the characters 128-255 be printing characters.
-  {\setnonasciicharscatcodenonglobal\other}%
+  {\setcharscatcodeothernonglobal}%
   %
   % @ is our escape character in .aux files, and we need braces.
   \catcode`\{=1
@@ -9501,43 +9501,68 @@
   \global\righthyphenmin = #3\relax
 }
 
-% Get input by bytes instead of by UTF-8 codepoints for XeTeX and LuaTeX, 
-% otherwise the encoding support is completely broken.
+% XeTeX and LuaTeX can handle native Unicode.
+% Their default I/O uses UTF-8 sequences rather than bytes.
+% Other TeX engines (pdfTeX etc.) do byte-wise I/O.
+%
+\newif\iftxinativeunicodecapable
+\newif\iftxiusebytewiseio
+
 \ifx\XeTeXrevision\thisisundefined
+  \ifx\luatexversion\thisisundefined
+    \txinativeunicodecapablefalse
+    \txiusebytewiseiotrue
+  \else
+    \txinativeunicodecapabletrue
+    \txiusebytewiseiofalse
+  \fi
 \else
-\XeTeXdefaultencoding "bytes"  % For subsequent files to be read
-\XeTeXinputencoding "bytes"  % Effective in texinfo.tex only
-% Unfortunately, there seems to be no corresponding XeTeX command for
-% output encoding.  This is a problem for auxiliary index and TOC files.
-% The only solution would be perhaps to write out @U{...} sequences in
-% place of UTF-8 characters.
+  \txinativeunicodecapabletrue
+  \txiusebytewiseiofalse
 \fi
 
-\ifx\luatexversion\thisisundefined
-\else
-\directlua{
-local utf8_char, byte, gsub = unicode.utf8.char, string.byte, string.gsub
-local function convert_char (char)
-  return utf8_char(byte(char))
-end
+% Set I/O by bytes instead of UTF-8 sequences for XeTeX and LuaTeX
+% for non-UTF-8 (byte-wise) encodings.
+%
+\def\setbytewiseio{%
+  \ifx\XeTeXrevision\thisisundefined
+  \else
+    \XeTeXdefaultencoding "bytes"  % For subsequent files to be read
+    \XeTeXinputencoding "bytes"  % For document root file
+    % Unfortunately, there seems to be no corresponding XeTeX command for
+    % output encoding.  This is a problem for auxiliary index and TOC files.
+    % The only solution would be perhaps to write out @U{...} sequences in
+    % place of non-ASCII characters.
+  \fi
 
-local function convert_line (line)
-  return gsub(line, ".", convert_char)
-end
+  \ifx\luatexversion\thisisundefined
+  \else
+    \directlua{
+    local utf8_char, byte, gsub = unicode.utf8.char, string.byte, string.gsub
+    local function convert_char (char)
+      return utf8_char(byte(char))
+    end
 
-callback.register("process_input_buffer", convert_line)
+    local function convert_line (line)
+      return gsub(line, ".", convert_char)
+    end
 
-local function convert_line_out (line)
-  local line_out = ""
-  for c in string.utfvalues(line) do
-     line_out = line_out .. string.char(c)
-  end
-  return line_out
-end
+    callback.register("process_input_buffer", convert_line)
 
-callback.register("process_output_buffer", convert_line_out)
+    local function convert_line_out (line)
+      local line_out = ""
+      for c in string.utfvalues(line) do
+         line_out = line_out .. string.char(c)
+      end
+      return line_out
+    end
+
+    callback.register("process_output_buffer", convert_line_out)
+    }
+  \fi
+
+  \txiusebytewiseiotrue
 }
-\fi
 
 
 % Helpers for encodings.
@@ -9564,13 +9589,6 @@
 %
 \def\documentencoding{\parseargusing\filenamecatcodes\documentencodingzzz}
 \def\documentencodingzzz#1{%
-  % Get input by bytes instead of by UTF-8 codepoints for XeTeX,
-  % otherwise the encoding support is completely broken.
-  % This settings is for the document root file.
-  \ifx\XeTeXrevision\thisisundefined
-  \else
-    \XeTeXinputencoding "bytes"
-  \fi
   %
   % Encoding being declared for the document.
   \def\declaredencoding{\csname #1.enc\endcsname}%
@@ -9587,22 +9605,37 @@
      \asciichardefs
   %
   \else \ifx \declaredencoding \lattwo
+     \iftxinativeunicodecapable
+       \setbytewiseio
+     \fi
      \setnonasciicharscatcode\active
      \lattwochardefs
   %
   \else \ifx \declaredencoding \latone
+     \iftxinativeunicodecapable
+       \setbytewiseio
+     \fi
      \setnonasciicharscatcode\active
      \latonechardefs
   %
   \else \ifx \declaredencoding \latnine
+     \iftxinativeunicodecapable
+       \setbytewiseio
+     \fi
      \setnonasciicharscatcode\active
      \latninechardefs
   %
   \else \ifx \declaredencoding \utfeight
-     \setnonasciicharscatcode\active
-     % since we already invoked \utfeightchardefs at the top level
-     % (below), do not re-invoke it, then our check for duplicated
-     % definitions triggers.  Making non-ascii chars active is enough.
+     \iftxinativeunicodecapable
+       % For native Unicode (XeTeX and LuaTeX)
+       \nativeunicodechardefs
+     \else
+       % For UTF-8 byte sequence (pdfTeX)
+       \setnonasciicharscatcode\active
+       % since we already invoked \utfeightchardefs at the top level
+       % (below), do not re-invoke it, then our check for duplicated
+       % definitions triggers.  Making non-ascii chars active is enough.
+     \fi
   %
   \else
     \message{Ignoring unknown document encoding: #1.}%
@@ -9917,13 +9950,26 @@
 % @U{xxxx} to produce U+xxxx, if we support it.
 \def\U#1{%
   \expandafter\ifx\csname uni:#1\endcsname \relax
-    \errhelp = \EMsimple       
-    \errmessage{Unicode character U+#1 not supported, sorry}%
+    \iftxinativeunicodecapable
+      % Any Unicode characters can be used by native Unicode.
+      % However, if the font lacks the glyph, the character will be missing.
+      \begingroup
+        \uccode`\.="#1\relax
+        \uppercase{.}
+      \endgroup
+    \else
+      \errhelp = \EMsimple     
+      \errmessage{Unicode character U+#1 not supported, sorry}%
+    \fi
   \else
     \csname uni:#1\endcsname
   \fi
 }
 
+% For UTF-8 byte sequence (pdfTeX)
+% Definition macro to replace the Unicode character
+% Definition macro that is used by @U command
+%
 \begingroup
   \catcode`\"=12
   \catcode`\<=12
@@ -9932,7 +9978,7 @@
   \catcode`\;=12
   \catcode`\!=12
   \catcode`\~=13
-  \gdef\DeclareUnicodeCharacter#1#2{%
+  \gdef\DeclareUnicodeCharacterUTFviii#1#2{%
     \countUTFz = "#1\relax
     %\wlog{\space\space defining Unicode char U+#1 (decimal \the\countUTFz)}%
     \begingroup
@@ -9990,6 +10036,44 @@
     \uppercase{\gdef\UTFviiiTmp{#2#3#4}}}
 \endgroup
 
+% For native Unicode (XeTeX and LuaTeX)
+% Definition macro to replace the Unicode character
+%
+\def\DeclareUnicodeCharacterNative#1#2{%
+  \catcode"#1=\active
+  \begingroup
+    \uccode`\~="#1\relax
+    \uppercase{\gdef~}{#2}%
+  \endgroup}
+
+% For native Unicode (XeTeX and LuaTeX)
+% Definition macro not to replace (through) the Unicode character
+%
+\def\DeclareUnicodeCharacterNativeThru#1#2{%
+  \catcode"#1=\active
+  \begingroup
+    \uccode`\.="#1\relax
+    \uppercase{\endgroup \def\UTFNativeTmp{.}}%
+  \begingroup
+    \uccode`\~="#1\relax
+    \uppercase{\endgroup \edef~}{\UTFNativeTmp}%
+}
+
+% For native Unicode (XeTeX and LuaTeX)
+% Definition macro that is used by @U command
+%
+\def\DeclareUnicodeCharacterNativeAtU#1#2{%
+  \def\UTFAtUTmp{#2}
+  \expandafter\globallet\csname uni:#1\endcsname \UTFAtUTmp
+}
+
+% For native Unicode (XeTeX and LuaTeX)
+% Definition macro that is set catcode other non global
+%
+\def\DeclareUnicodeCharacterNativeOther#1#2{%
+  \catcode"#1=\other
+}
+
 % https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_M
 % U+0000..U+007F = https://en.wikipedia.org/wiki/Basic_Latin_(Unicode_block)
 % U+0080..U+00FF = https://en.wikipedia.org/wiki/Latin-1_Supplement_(Unicode_block)
@@ -10004,7 +10088,7 @@
 % We won't be doing that here in this simple file.  But we can try to at
 % least make most of the characters not bomb out.
 %
-\def\utfeightchardefs{%
+\def\unicodechardefs{%
   \DeclareUnicodeCharacter{00A0}{\tie}
   \DeclareUnicodeCharacter{00A1}{\exclamdown}
   \DeclareUnicodeCharacter{00A2}{{\tcfont \char162}}% 0242=cent
@@ -10674,14 +10758,57 @@
   %
   \global\mathchardef\checkmark="1370 % actually the square root sign
   \DeclareUnicodeCharacter{2713}{\ensuremath\checkmark}
-}% end of \utfeightchardefs
+}% end of \unicodechardefs
 
+% UTF-8 byte sequence (pdfTeX) definitions (replacing and @U command)
+% It makes the setting that replace UTF-8 byte sequence.
+\def\utfeightchardefs{%
+  \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterUTFviii
+  \unicodechardefs
+}
+
+% Native Unicode (XeTeX and LuaTeX) character replacing definitions
+% It makes the setting that replace the Unicode characters.
+\def\nativeunicodechardefs{%
+  \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterNative
+  \unicodechardefs
+}
+
+% Native Unicode (XeTeX and LuaTeX) character ``through'' definitions
+% It makes the setting that does not replace the Unicode characters.
+\def\nativeunicodechardefsthru{%
+  \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterNativeThru
+  \unicodechardefs
+}
+
+% Native Unicode (XeTeX and LuaTeX) @U command definitions
+\def\nativeunicodechardefsatu{%
+  \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterNativeAtU
+  \unicodechardefs
+}
+
+% Native Unicode (XeTeX and LuaTeX) catcode other non global definitions
+\def\nativeunicodecharscatcodeothernonglobal{%
+  \let\DeclareUnicodeCharacter\DeclareUnicodeCharacterNativeOther
+  \unicodechardefs
+}
+
+% Catcode (non-ascii or native Unicode) are set to other non global.
+\def\setcharscatcodeothernonglobal{%
+  \iftxiusebytewiseio
+    \setnonasciicharscatcodenonglobal\other
+  \else
+    \nativeunicodecharscatcodeothernonglobal
+  \fi
+}
+
 % US-ASCII character definitions.
 \def\asciichardefs{% nothing need be done
    \relax
 }
 
-% Latin1 (ISO-8859-1) character definitions.
+% Non-ASCII bytes ``through'' definitions.
+% It makes the setting that does not replace the non-ASCII byte.
 \def\nonasciistringdefs{%
   \setnonasciicharscatcode\active
   \def\defstringchar##1{\def##1{\string##1}}%
@@ -10727,9 +10854,23 @@
   \defstringchar^^fc\defstringchar^^fd\defstringchar^^fe\defstringchar^^ff%
 }
 
+% Character ``through'' definitions.
+% It makes the setting that does not replace the characters.
+\def\throughcharactersdefs{%
+  \iftxiusebytewiseio
+    \nonasciistringdefs
+  \else
+    \nativeunicodechardefsthru
+  \fi
+}
 
+
 % define all the unicode characters we know about, for the sake of @U.
-\utfeightchardefs
+\iftxinativeunicodecapable
+  \nativeunicodechardefsatu
+\else
+  \utfeightchardefs
+\fi
 
 
 % Make non-ASCII characters printable again for compatibility with
@@ -11078,7 +11219,7 @@
 %
 address@hidden = @active
  @address@hidden
-   @nonasciistringdefs
+   @throughcharactersdefs
    @address@hidden
    @let"address@hidden
    @address@hidden %$ font-lock fix
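
The LuaTeX half of \setbytewiseio registers two callbacks:

- `process_input_buffer` turns every input byte b into the UTF-8 encoding of code point b.
- `process_output_buffer` writes each code point c back as the single byte c.

Below is a Python sketch of the same byte mapping, in which Python `bytes` stand in for Lua strings. It is an illustration only, not code used by texinfo.tex.

```python
# Python analog of the two Lua callbacks registered by \setbytewiseio.


def convert_line(raw: bytes) -> bytes:
    """process_input_buffer: byte b becomes the UTF-8 form of code point b.

    This equals decoding the line as Latin-1 and re-encoding it as UTF-8,
    so the engine then sees exactly one character per input byte.
    """
    return b"".join(chr(b).encode("utf-8") for b in raw)


def convert_line_out(line: bytes) -> bytes:
    """process_output_buffer: code point c is written as the single byte c.

    Like Lua's string.char, this fails (ValueError) for code points > 255.
    """
    return bytes(ord(ch) for ch in line.decode("utf-8"))
```

Applying `convert_line_out` to the result of `convert_line` returns the original bytes for every value from 0 to 255, so single-byte documents survive a round trip through auxiliary files on LuaTeX. XeTeX has only \XeTeXdefaultencoding and \XeTeXinputencoding. As the comment in the diff notes, its output to index and TOC files gets no equivalent conversion.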