branch master updated: * tp/Texinfo/Common.pm (locate_include_file), tp
From: Patrice Dumas
Subject: branch master updated: * tp/Texinfo/Common.pm (locate_include_file), tp/Texinfo/Convert/Converter.pm (encode_file_name, txt_image_text), tp/Texinfo/Convert/DocBook.pm (_convert), tp/Texinfo/Convert/HTML.pm (html_image_file_location_name), tp/Texinfo/Convert/IXIN.pm (output_ixin), tp/Texinfo/Convert/Info.pm (format_image), tp/Texinfo/Convert/LaTeX.pm (_convert), tp/Texinfo/Convert/Utils.pm (expand_verbatiminclude), tp/Texinfo/ParserNonXS.pm (_end_line): encode file name before calling locate_include_file() in ord [...]
Date: Wed, 23 Feb 2022 18:08:21 -0500
This is an automated email from the git hooks/post-receive script.
pertusus pushed a commit to branch master
in repository texinfo.
The following commit(s) were added to refs/heads/master by this push:
new 26111f550e * tp/Texinfo/Common.pm (locate_include_file),
tp/Texinfo/Convert/Converter.pm (encode_file_name, txt_image_text),
tp/Texinfo/Convert/DocBook.pm (_convert), tp/Texinfo/Convert/HTML.pm
(html_image_file_location_name), tp/Texinfo/Convert/IXIN.pm (output_ixin),
tp/Texinfo/Convert/Info.pm (format_image), tp/Texinfo/Convert/LaTeX.pm
(_convert), tp/Texinfo/Convert/Utils.pm (expand_verbatiminclude),
tp/Texinfo/ParserNonXS.pm (_end_line): encode file name before calling locate_
[...]
26111f550e is described below
commit 26111f550e284dde82828e87ce1b5f2c810c2840
Author: Patrice Dumas <pertusus@free.fr>
AuthorDate: Thu Feb 24 00:08:07 2022 +0100
* tp/Texinfo/Common.pm (locate_include_file),
tp/Texinfo/Convert/Converter.pm (encode_file_name, txt_image_text),
tp/Texinfo/Convert/DocBook.pm (_convert),
tp/Texinfo/Convert/HTML.pm (html_image_file_location_name),
tp/Texinfo/Convert/IXIN.pm (output_ixin),
tp/Texinfo/Convert/Info.pm (format_image),
tp/Texinfo/Convert/LaTeX.pm (_convert),
tp/Texinfo/Convert/Utils.pm (expand_verbatiminclude),
tp/Texinfo/ParserNonXS.pm (_end_line): encode file name before
calling locate_include_file() in order to be able to pass file
names already encoded (case of CSS_FILES) and find the information
in different structures in converters and NonXS Parser.
Note in comments that no information source is available in
Texinfo/Convert/Utils.pm expand_verbatiminclude().
---
.gitignore | 1 +
ChangeLog | 17 +++++
tp/Texinfo/Common.pm | 20 ------
tp/Texinfo/Convert/Converter.pm | 28 +++++++-
tp/Texinfo/Convert/DocBook.pm | 3 +-
tp/Texinfo/Convert/HTML.pm | 6 +-
tp/Texinfo/Convert/IXIN.pm | 7 +-
tp/Texinfo/Convert/Info.pm | 5 +-
tp/Texinfo/Convert/LaTeX.pm | 10 ++-
tp/Texinfo/Convert/Utils.pm | 10 ++-
tp/Texinfo/ParserNonXS.pm | 19 +++++-
tp/tests/many_input_files/Makefile.am | 12 ++--
tp/tests/many_input_files/input_dir_non_ascii.sh | 74 ++++++++++++++++++++++
.../dir_\303\256ncl\303\271de/file_image.png" | 0
.../dir_\303\256ncl\303\271de/included_file.texi" | 1 +
.../input_files/simple_including_file.texi | 8 +++
16 files changed, 180 insertions(+), 41 deletions(-)
diff --git a/.gitignore b/.gitignore
index 6d157fd7c0..81ac6b313a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -210,6 +210,7 @@ tp/tests/test_scripts/*.trs
tp/tests/many_input_files/different_encodings
tp/tests/many_input_files/different_languages_gen_master_menu
+tp/tests/many_input_files/input_dir_non_ascii
tp/tests/many_input_files/tex_l2h
tp/tests/many_input_files/tex_t4ht
tp/tests/many_input_files/raw_out
diff --git a/ChangeLog b/ChangeLog
index eb4ae89f63..22b80f7ee7 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,20 @@
+2022-02-23 Patrice Dumas <pertusus@free.fr>
+
+ * tp/Texinfo/Common.pm (locate_include_file),
+ tp/Texinfo/Convert/Converter.pm (encode_file_name, txt_image_text),
+ tp/Texinfo/Convert/DocBook.pm (_convert),
+ tp/Texinfo/Convert/HTML.pm (html_image_file_location_name),
+ tp/Texinfo/Convert/IXIN.pm (output_ixin),
+ tp/Texinfo/Convert/Info.pm (format_image),
+ tp/Texinfo/Convert/LaTeX.pm (_convert),
+ tp/Texinfo/Convert/Utils.pm (expand_verbatiminclude),
+ tp/Texinfo/ParserNonXS.pm (_end_line): encode file name before
+ calling locate_include_file() in order to be able to pass file
+ names already encoded (case of CSS_FILES) and find the information
+ in different structures in converters and NonXS Parser.
+ Note in comments that no information source is available in
+ Texinfo/Convert/Utils.pm expand_verbatiminclude().
+
2022-02-23 Gavin Smith <gavinsmith0123@gmail.com>
Recode filenames into input encoding
diff --git a/tp/Texinfo/Common.pm b/tp/Texinfo/Common.pm
index 5e6fcfb597..746e1e4a60 100644
--- a/tp/Texinfo/Common.pm
+++ b/tp/Texinfo/Common.pm
@@ -1511,26 +1511,6 @@ sub locate_include_file($$)
my $text = shift;
my $file;
- # Reverse the decoding of the file name from the input encoding. When
- # dealing with file names, we want Perl strings representing sequences of
- # bytes, not Unicode codepoints.
- # This is necessary even if the name of the included file is purely
- # ASCII, as the name of the directory it is located within may contain
- # non-ASCII characters.
- # Otherwise, the -e operator and similar may not work correctly.
- #
- if ($configuration_information) {
- my $info = Texinfo::Parser::global_information($configuration_information);
- my $encoding = $info->{'input_perl_encoding'};
- if ($encoding) {
- if ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict') {
- utf8::encode($text);
- } else {
- $text = Encode::encode($encoding, $text);
- }
- }
- }
-
my $ignore_include_directories = 0;
my ($volume, $directories, $filename) = File::Spec->splitpath($text);
diff --git a/tp/Texinfo/Convert/Converter.pm b/tp/Texinfo/Convert/Converter.pm
index d03a017676..b70c0d1e56 100644
--- a/tp/Texinfo/Convert/Converter.pm
+++ b/tp/Texinfo/Convert/Converter.pm
@@ -1009,12 +1009,38 @@ sub present_bug_message($$;$)
warn "You found a bug: $message\n\n".$additional_information;
}
+# Reverse the decoding of the file name from the input encoding. When
+# dealing with file names, we want Perl strings representing sequences of
+# bytes, not Unicode codepoints.
+# This is necessary even if the name of the included file is purely
+# ASCII, as the name of the directory it is located within may contain
+# non-ASCII characters.
+# Otherwise, the -e operator and similar may not work correctly.
+sub encode_file_name($$)
+{
+ my $self = shift;
+ my $file_name = shift;
+
+ # FIXME use the locale instead?
+ my $info = $self->{'parser_info'};
+ if ($info) {
+ my $encoding = $info->{'input_perl_encoding'};
+ if ($encoding and ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict')) {
+ utf8::encode($file_name);
+ } else {
+ $file_name = Encode::encode($encoding, $file_name);
+ }
+ }
+ return $file_name;
+}
sub txt_image_text($$$)
{
my ($self, $element, $basefile) = @_;
- my $txt_file = Texinfo::Common::locate_include_file($self, $basefile.'.txt');
+ my $text_file_name = $self->encode_file_name($basefile.'.txt');
+
+ my $txt_file = Texinfo::Common::locate_include_file($self, $text_file_name);
if (!defined($txt_file)) {
return undef;
} else {
diff --git a/tp/Texinfo/Convert/DocBook.pm b/tp/Texinfo/Convert/DocBook.pm
index 0a48ad2ea7..e997d6f542 100644
--- a/tp/Texinfo/Convert/DocBook.pm
+++ b/tp/Texinfo/Convert/DocBook.pm
@@ -1118,7 +1118,8 @@ sub _convert($$;$)
}
my @files;
foreach my $extension (@docbook_image_extensions) {
- if ($self->Texinfo::Common::locate_include_file ("$basefile.$extension")) {
+ my $file_name = $self->encode_file_name("$basefile.$extension");
+ if ($self->Texinfo::Common::locate_include_file($file_name)) {
push @files, ["$basefile.$extension", uc($extension)];
}
}
diff --git a/tp/Texinfo/Convert/HTML.pm b/tp/Texinfo/Convert/HTML.pm
index f29f3f832e..484ef4e09d 100644
--- a/tp/Texinfo/Convert/HTML.pm
+++ b/tp/Texinfo/Convert/HTML.pm
@@ -261,6 +261,7 @@ sub html_image_file_location_name($$$$)
my $image_file;
my $image_basefile;
my $image_extension;
+ # this variable is bytes encoded in the filesystem encoding
my $image_path;
if (defined($args->[0]->{'monospacetext'}) and $args->[0]->{'monospacetext'} ne '') {
$image_basefile = $args->[0]->{'monospacetext'};
@@ -270,13 +271,16 @@ sub html_image_file_location_name($$$$)
unshift @extensions, ("$extension", ".$extension");
}
foreach my $extension (@extensions) {
+ my $file_name = $self->encode_file_name($image_basefile.$extension);
my $located_image_path
- = $self->Texinfo::Common::locate_include_file($image_basefile.$extension);
+ = $self->Texinfo::Common::locate_include_file($file_name);
if (defined($located_image_path) and $located_image_path ne '') {
$image_path = $located_image_path;
# use the @-command argument and not the file found using the
# include paths. It is considered that the files in include paths
# will be moved by the caller anyway.
+ # If the file path found was to be used it should be decoded to perl
+ # codepoints too.
$image_file = $image_basefile.$extension;
$image_extension = $extension;
last;
diff --git a/tp/Texinfo/Convert/IXIN.pm b/tp/Texinfo/Convert/IXIN.pm
index 94002fad8d..aca2ca24e3 100644
--- a/tp/Texinfo/Convert/IXIN.pm
+++ b/tp/Texinfo/Convert/IXIN.pm
@@ -838,8 +838,9 @@ sub output_ixin($$)
@extension = ($extension);
}
foreach my $extension (@extension, @image_files_extensions) {
- my $filename = $basefile.'.'.$extension;
- my $file = $self->Texinfo::Common::locate_include_file($filename);
+ my $file_name_text = "$basefile.$extension";
+ my $file_name = $self->encode_file_name($file_name_text);
+ my $file = $self->Texinfo::Common::locate_include_file($file_name);
if (defined($file)) {
my $filehandle = do { local *FH };
if (open ($filehandle, $file)) {
@@ -866,7 +867,7 @@ sub output_ixin($$)
}
$blobs_index .= $self->ixin_element('blobentry',
['bloblen', $blob_len, 'encoding', 'base64',
- 'mimetype', $mime_type, 'filename', $filename]) ."\n";
+ 'mimetype', $mime_type, 'filename', $file_name_text]) ."\n";
}
}
}
diff --git a/tp/Texinfo/Convert/Info.pm b/tp/Texinfo/Convert/Info.pm
index 7de96d0c7c..712d2d3d8f 100644
--- a/tp/Texinfo/Convert/Info.pm
+++ b/tp/Texinfo/Convert/Info.pm
@@ -510,9 +510,12 @@ sub format_image($$)
}
my $image_file;
foreach my $extension (@extensions) {
- if ($self->Texinfo::Common::locate_include_file ($basefile.$extension)) {
+ my $file_name = $self->encode_file_name($basefile.$extension);
+ if ($self->Texinfo::Common::locate_include_file($file_name)) {
# use the basename and not the file found. It is agreed that it is
# better, since in any case the files are moved.
+ # If the file path found was to be used it should be decoded to perl
+ # codepoints too.
$image_file = $basefile.$extension;
last;
}
diff --git a/tp/Texinfo/Convert/LaTeX.pm b/tp/Texinfo/Convert/LaTeX.pm
index 07b100c77e..666fc488bd 100644
--- a/tp/Texinfo/Convert/LaTeX.pm
+++ b/tp/Texinfo/Convert/LaTeX.pm
@@ -2308,19 +2308,17 @@ sub _convert($$)
my $image_file;
foreach my $extension (@LaTeX_image_extensions) {
+ my $file_name = $self->encode_file_name("$basefile.$extension");
my $located_file =
- $self->Texinfo::Common::locate_include_file("$basefile.$extension");
+ $self->Texinfo::Common::locate_include_file($file_name);
if (defined($located_file)) {
# use the basename and not the file found. It is agreed that it is
# better, since in any case the files are moved.
+ # If the file path found was to be used it should be decoded to perl
+ # codepoints too.
# using basefile with escaped characters, no extension to let LaTeX choose the
# extension
$image_file = $converted_basefile;
- #my ($image_volume, $image_directories, $image_filename)
- # = File::Spec->splitpath($located_file);
- ## using basefile with escaped characters
- #$image_file = File::Spec->catpath($image_volume,
- # $image_directories, $converted_basefile);
}
}
if (not defined($image_file)) {
diff --git a/tp/Texinfo/Convert/Utils.pm b/tp/Texinfo/Convert/Utils.pm
index cffd54c890..317ae979c5 100644
--- a/tp/Texinfo/Convert/Utils.pm
+++ b/tp/Texinfo/Convert/Utils.pm
@@ -197,14 +197,18 @@ sub expand_verbatiminclude($$$)
my $current = shift;
return unless ($current->{'extra'} and
defined($current->{'extra'}->{'text_arg'}));
- my $text = $current->{'extra'}->{'text_arg'};
- my $file = Texinfo::Common::locate_include_file($configuration_information, $text);
+ my $file_name_text = $current->{'extra'}->{'text_arg'};
+ # FIXME $file_name_text should be encoded to the file system
+ # encoding here to be passed to locate_include_file
+ my $file = Texinfo::Common::locate_include_file($configuration_information,
+ $file_name_text);
my $verbatiminclude;
if (defined($file)) {
if (!open(VERBINCLUDE, $file)) {
if ($registrar) {
+ # FIXME $file should be decoded to perl internal codepoints here
$registrar->line_error($configuration_information,
sprintf(__("could not read %s: %s"), $file, $!),
$current->{'line_nr'});
@@ -235,7 +239,7 @@ sub expand_verbatiminclude($$$)
} elsif ($registrar) {
$registrar->line_error($configuration_information,
sprintf(__("\@%s: could not find %s"),
- $current->{'cmdname'}, $text),
+ $current->{'cmdname'}, $file_name_text),
$current->{'line_nr'});
}
return $verbatiminclude;
diff --git a/tp/Texinfo/ParserNonXS.pm b/tp/Texinfo/ParserNonXS.pm
index 14fbc5cf14..6845082616 100644
--- a/tp/Texinfo/ParserNonXS.pm
+++ b/tp/Texinfo/ParserNonXS.pm
@@ -3206,7 +3206,22 @@ sub _end_line($$$)
} elsif ($superfluous_arg) {
# An error message is issued below.
} elsif ($command eq 'include') {
- my $file = Texinfo::Common::locate_include_file($self, $text) ;
+ my $file_name = $text;
+ # When dealing with file names, we want Perl strings representing sequences
+ # of bytes, not codepoints in the internal perl encoding.
+ # This is necessary even if the name of the included file is purely
+ # ASCII, as the name of the directory it is located within may contain
+ # non-ASCII characters.
+ # Otherwise, the -e operator and similar may not work correctly.
+ if (defined $self->{'info'}->{'input_perl_encoding'}) {
+ my $encoding = $self->{'info'}->{'input_perl_encoding'};
+ if ($encoding and ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict')) {
+ utf8::encode($file_name);
+ } else {
+ $file_name = Encode::encode($encoding, $file_name);
+ }
+ }
+ my $file = Texinfo::Common::locate_include_file($self, $file_name);
if (defined($file)) {
my $filehandle = do { local *FH };
if (_open_in ($self, $filehandle, $file)) {
@@ -3223,6 +3238,8 @@ sub _end_line($$$)
# expand the @-command
$current->{'type'} = 'replaced';
} else {
+ # FIXME $text does not show the include directory. However using $file
+ # would require to decode it to perl internal codepoints
$self->_command_error($current, $line_nr,
__("\@%s: could not open %s: %s"),
$command, $text, $!);
diff --git a/tp/tests/many_input_files/Makefile.am b/tp/tests/many_input_files/Makefile.am
index af41e48241..a595b0fc5f 100644
--- a/tp/tests/many_input_files/Makefile.am
+++ b/tp/tests/many_input_files/Makefile.am
@@ -12,16 +12,20 @@
EXTRA_DIST = $(TESTS) \
tex_l2h_res tex_t4ht_res different_encodings_res different_languages_gen_master_menu_res \
input_files/no_master_menu_fr.texi \
- input_files/no_master_menu_no_documentlanguage.texi
-
+ input_files/no_master_menu_no_documentlanguage.texi \
+ input_files/simple_including_file.texi \
+ input_files/dir_înclùde/file_image.png \
+ input_files/dir_înclùde/included_file.texi
TESTS = tex_l2h.sh tex_t4ht.sh \
- different_encodings.sh different_languages_gen_master_menu.sh
+ different_encodings.sh different_languages_gen_master_menu.sh \
+ input_dir_non_ascii.sh
AM_TESTS_ENVIRONMENT = srcdir="$(srcdir)"; export srcdir; top_srcdir="$(top_srcdir)"; export top_srcdir; builddir="$(builddir)"; export buildir; top_builddir="$(top_builddir)"; export top_builddir;
tex_html_dirs = tex_l2h tex_t4ht
-tests_dirs = different_encodings different_languages_gen_master_menu
+tests_dirs = different_encodings different_languages_gen_master_menu \
+ input_dir_non_ascii
long-checks: all
$(MAKE) $(AM_MAKEFLAGS) check LONG_TESTS=yes
diff --git a/tp/tests/many_input_files/input_dir_non_ascii.sh b/tp/tests/many_input_files/input_dir_non_ascii.sh
new file mode 100755
index 0000000000..dcbd96b742
--- /dev/null
+++ b/tp/tests/many_input_files/input_dir_non_ascii.sh
@@ -0,0 +1,74 @@
+#! /bin/sh
+#
+# Copyright 2022 Free Software Foundation, Inc.
+#
+# This file is free software; as a special exception the author gives
+# unlimited permission to copy and/or distribute it, with or without
+# modifications, as long as this notice is preserved.
+#
+# This program is distributed in the hope that it will be useful, but
+# WITHOUT ANY WARRANTY, to the extent permitted by law; without even the
+# implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
+#
+# Originally written by Patrice Dumas.
+
+LC_ALL=C.UTF-8; export LC_ALL
+LANGUAGE=C.UTF-8; export LANGUAGE
+
+basename=input_dir_non_ascii
+diffs_dir=diffs
+raw_output_dir=raw_out
+logfile=$basename.log
+stdout_file=stdout_$basename.out
+
+[ "z$srcdir" = 'z' ] && srcdir=.
+
+. ../../defs || exit 1
+
+[ -d $diffs_dir ] || mkdir $diffs_dir
+staging_dir=$diffs_dir/staging
+[ -d $staging_dir ] || mkdir $staging_dir
+[ -d $raw_output_dir ] || mkdir $raw_output_dir
+
+echo "$basename" > $logfile
+
+[ -d $basename ] && rm -rf $basename
+raw_outdir=$raw_output_dir/$basename
+[ -d $raw_outdir ] && rm -rf $raw_outdir
+mkdir $basename
+: > $basename/$stdout_file
+
+echo "$PERL -I $srcdir/../.. -I $srcdir/../../maintain/lib/Unicode-EastAsianWidth/lib/ -I $srcdir/../../maintain/lib/libintl-perl/lib -I $srcdir/../../maintain/lib/Text-Unidecode/lib/ -w $srcdir/../../texi2any.pl --html --no-split --set-customization-variable 'TEST 1' -I $srcdir/input_files/dir_înclùde --conf-dir $srcdir/../../init --out $basename/ $srcdir/input_files/simple_including_file.texi --force >> $basename/$stdout_file 2>$basename/${basename}.2" >> $logfile
+$PERL -I $srcdir/../.. -I $srcdir/../../maintain/lib/Unicode-EastAsianWidth/lib/ -I $srcdir/../../maintain/lib/libintl-perl/lib -I $srcdir/../../maintain/lib/Text-Unidecode/lib/ -w $srcdir/../../texi2any.pl --html --no-split --set-customization-variable 'TEST 1' -I $srcdir/input_files/dir_înclùde --conf-dir $srcdir/../../init --out $basename/ $srcdir/input_files/simple_including_file.texi --force >> $basename/$stdout_file 2>$basename/${basename}.2
+
+return_code=0
+ret=$?
+if [ $ret != 0 ]; then
+ echo "F: $basename/$basename.2"
+ return_code=1
+else
+ outdir=$basename
+ cp -pr $outdir $raw_output_dir
+
+ dir=$basename
+ if [ -d "$srcdir/${dir}_res" ]; then
+ rm -rf $staging_dir/${dir}_res
+ cp -pr "$srcdir/${dir}_res" $staging_dir
+ chmod -R u+w "$staging_dir/${dir}_res"
+ diff $DIFF_U_OPTION -r "$staging_dir/${dir}_res" "$outdir" 2>>$logfile > "$diffs_dir/$dir.diff"
+ dif_ret=$?
+ if [ $dif_ret != 0 ]; then
+ echo "D: $diffs_dir/$dir.diff"
+ return_code=1
+ else
+ rm "$diffs_dir/$dir.diff"
+ fi
+ else
+ echo "no res: ${dir}_res"
+ fi
+fi
+
+rm -rf $tmp_dir
+
+exit $return_code
+
diff --git "a/tp/tests/many_input_files/input_files/dir_\303\256ncl\303\271de/file_image.png" "b/tp/tests/many_input_files/input_files/dir_\303\256ncl\303\271de/file_image.png"
new file mode 100644
index 0000000000..e69de29bb2
diff --git "a/tp/tests/many_input_files/input_files/dir_\303\256ncl\303\271de/included_file.texi" "b/tp/tests/many_input_files/input_files/dir_\303\256ncl\303\271de/included_file.texi"
new file mode 100644
index 0000000000..37ada09a47
--- /dev/null
+++ "b/tp/tests/many_input_files/input_files/dir_\303\256ncl\303\271de/included_file.texi"
@@ -0,0 +1 @@
+In included file
diff --git a/tp/tests/many_input_files/input_files/simple_including_file.texi b/tp/tests/many_input_files/input_files/simple_including_file.texi
new file mode 100644
index 0000000000..3be1318e73
--- /dev/null
+++ b/tp/tests/many_input_files/input_files/simple_including_file.texi
@@ -0,0 +1,8 @@
+\input texinfo
+
+@node Top
+@top top section
+
+@include included_file.texi
+
+@image{file_image}
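
The idea behind the change, encoding a decoded file name back to bytes
before handing it to locate_include_file(), can be sketched in isolation
as follows. This is only an illustration and not code from the commit:
encode_name_for_filesystem() is a hypothetical stand-alone helper, loosely
modelled on encode_file_name() above, and it takes the encoding as a plain
argument instead of reading it from a converter or parser object. Returning
the name unchanged when no encoding is known is a choice made for this
sketch.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not part of Texinfo): turn a decoded Perl string
# into the byte string that file tests such as -e should be given.
sub encode_name_for_filesystem {
  my ($file_name, $encoding) = @_;

  # Assumption for this sketch: with no known input encoding, the name
  # is returned unchanged.
  return $file_name unless defined($encoding) and $encoding ne '';

  if ($encoding eq 'utf-8' or $encoding eq 'utf-8-strict') {
    # utf8::encode() works in place; $file_name is a local copy here.
    utf8::encode($file_name);
    return $file_name;
  }
  return Encode::encode($encoding, $file_name);
}

# Directory name with two non-ASCII characters (U+00EE and U+00F9),
# as a decoded string of codepoints.
my $decoded = "dir_\x{ee}ncl\x{f9}de/included_file.texi";
my $bytes = encode_name_for_filesystem($decoded, 'utf-8');

# In UTF-8 each of the two non-ASCII characters takes two bytes.
printf "%d characters, %d bytes\n", length($decoded), length($bytes);
print "found\n" if -e $bytes;
```

A name that is already a byte string, which the commit message gives as
the case of CSS_FILES, would be passed to locate_include_file() directly,
without going through such a helper.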