Re: different encodings for input and output file names and command line
From: Gavin Smith
Subject: Re: different encodings for input and output file names and command line
Date: Fri, 4 Mar 2022 18:36:09 +0000
On Fri, Mar 04, 2022 at 08:15:54AM +0100, Patrice Dumas wrote:
> > + if ($self->get_conf('DOC_ENCODING_FOR_INPUT_FILE_NAME')) {
> > + my $document_encoding;
> > + $document_encoding = $self->{'parser_info'}->{'input_perl_encoding'}
> > + if ($self->{'parser_info'}
> > + and defined($self->{'parser_info'}->{'input_perl_encoding'}));
> > + return Texinfo::Common::encode_file_name($self, $file_name,
> > + $document_encoding);
> > + } else {
> > + return Texinfo::Common::encode_file_name($self, $file_name,
> > + $self->get_conf('LOCALE_INPUT_FILE_NAME_ENCODING'));
> > + }
> > +}
>
> The code looks right.
I've implemented this in the XS parser, although I haven't been able to
get DOC_ENCODING_FOR_INPUT_FILE_NAME=0 to work.
(Note: this email is UTF-8 encoded, but the following is copied and pasted
from a Latin-1 terminal.)
$ locale
LANG=en_GB.UTF-8
LANGUAGE=en_GB:en
LC_CTYPE="fr_FR"
LC_NUMERIC="fr_FR"
LC_TIME="fr_FR"
LC_COLLATE="fr_FR"
LC_MONETARY="fr_FR"
LC_MESSAGES="fr_FR"
LC_PAPER="fr_FR"
LC_NAME="fr_FR"
LC_ADDRESS="fr_FR"
LC_TELEPHONE="fr_FR"
LC_MEASUREMENT="fr_FR"
LC_IDENTIFICATION="fr_FR"
LC_ALL=fr_FR
$ cat é.texi
\input texinfo
@documentencoding ISO-8859-1
@setfilename ü.info
@include aß.texi
@bye
$ cat aß.texi
faerrra
$ ../texi2any.pl é.texi
é.texi: warning: document without nodes
$ cat ü.info
This is ü.info, produced by texi2any version 6.8dev+dev from é.texi.
faerrra
Tag Table:
End Tag Table
Local Variables:
coding: iso-8859-1
End:
$ # so far so good
$ ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0 -c LOCALE_INPUT_FILE_NAME_ENCODING=ISO-8859-1
é.texi:7: @include: could not find aß.texi
$ ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0
é.texi:7: @include: could not find aß.texi
$ TEXINFO_XS=omit ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0
é.texi:7: @include: could not find aß.texi
$ TEXINFO_XS=omit ../texi2any.pl é.texi -c DOC_ENCODING_FOR_INPUT_FILE_NAME=0 -c LOCALE_INPUT_FILE_NAME_ENCODING=ISO-8859-1
é.texi:7: @include: could not find aß.texi
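
As a rough illustration of how a lookup like this can fail, here is a minimal
sketch of a file created under its Latin-1 name and then looked up under the
UTF-8 spelling of the same name. It is not a diagnosis of the failures above.
It assumes a filesystem that stores names as raw bytes (as is usual on
GNU/Linux), and the temporary directory and variable names are made up for the
example.

```shell
#!/bin/sh
# Sketch: the name "aß.texi" as Latin-1 bytes and as UTF-8 bytes.
# Octal escapes keep the script itself pure ASCII.
tmpdir=$(mktemp -d)
latin1_name=$(printf 'a\337.texi')      # ß is the single byte 0xDF in ISO-8859-1
utf8_name=$(printf 'a\303\237.texi')    # ß is the two bytes 0xC3 0x9F in UTF-8

# Create the file under its Latin-1 name, as a Latin-1 terminal would.
: > "$tmpdir/$latin1_name"

# Record whether each byte sequence names an existing file.
if [ -e "$tmpdir/$latin1_name" ]; then latin1_found=yes; else latin1_found=no; fi
if [ -e "$tmpdir/$utf8_name" ]; then utf8_found=yes; else utf8_found=no; fi

echo "latin1 lookup: $latin1_found"
echo "utf8 lookup: $utf8_found"
rm -rf "$tmpdir"
```

The point is only that the bytes passed to the filesystem have to match the
bytes the file was created with, so the encoding used for the include file
name matters.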
--------------
On making the following change, it appears that LOCALE_INPUT_FILE_NAME_ENCODING
is undefined:
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 3920bd076e..8a23690cf7 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -2024,6 +2024,8 @@ sub _encode_file_name($$)
     return Texinfo::Common::encode_file_name($self, $file_name,
                   $self->{'info'}->{'input_perl_encoding'});
   } else {
+    warn "<" . $self->get_conf('LOCALE_INPUT_FILE_NAME_ENCODING')
+      .">\n";
     return Texinfo::Common::encode_file_name($self, $file_name,
                   $self->get_conf('LOCALE_INPUT_FILE_NAME_ENCODING'));
   }
With TEXINFO_XS=omit, this leads to the following extra output:
Use of uninitialized value in concatenation (.) or string at ../../tp/Texinfo/ParserNonXS.pm line 2027.
<>
I've attached these two files in a tar file to this email.
I haven't investigated yet why LOCALE_INPUT_FILE_NAME_ENCODING is undefined.
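
A possible first check, assuming the default for this variable is meant to
come from the locale's codeset (an assumption based on the variable name, not
something established here), is to see what codeset the locale reports. This
sketch uses the POSIX `locale charmap` utility and falls back to a placeholder
if it is unavailable.

```shell
#!/bin/sh
# Sketch: print the codeset of the current locale.
# "unknown" is a placeholder used only when the locale utility is missing
# or fails.
codeset=$(locale charmap 2>/dev/null || echo unknown)
echo "locale codeset: $codeset"
```

If this reports a sensible codeset, the question becomes why that value does
not reach the parser's configuration.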
test.tar
Description: Unix tar archive