Re: Encoding error when reading file with ISO-8859-1 filename
From: Patrice Dumas
Subject: Re: Encoding error when reading file with ISO-8859-1 filename
Date: Sun, 6 Mar 2022 15:14:26 +0100
On Sat, Mar 05, 2022 at 09:00:03PM +0000, Gavin Smith wrote:
> Here's something that came up when I was testing filename encodings
> and a proposed fix to silence a warning message.
>
> A one-line fix is the following:
>
> diff --git a/tp/Texinfo/Convert/Converter.pm b/tp/Texinfo/Convert/Converter.pm
> index df9d68d701..30eaea1e13 100644
> --- a/tp/Texinfo/Convert/Converter.pm
> +++ b/tp/Texinfo/Convert/Converter.pm
> @@ -546,7 +546,8 @@ sub determine_files_and_directory($;$)
>    my $input_file_name = $self->{'parser_info'}->{'input_file_name'};
>    my $encoding = $self->get_conf('DATA_INPUT_ENCODING_NAME');
>    if (defined($encoding)) {
> -    $input_file_name = decode($encoding, $input_file_name);
> +    $input_file_name = decode($encoding, $input_file_name,
> +                              sub { '?' });
>    }
>    my ($directories, $suffix);
>    ($input_basefile, $directories, $suffix) = fileparse($input_file_name);
>
>
> This eliminates the problematic U+FFFD character at the point of reading
> the filename. In the output Info file, a question mark will harmlessly
> appear in the filename, like:
>
> This is ü.info, produced by texi2any version 6.8dev+dev from ?.texi.
>
> Patrice, do you think it's ok to commit the above change?
Your analysis and solution look good. I added a note that a
corresponding test could be added, but it would require having an
encoded file name (I believe) in the test suite.
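
For illustration, the effect of the extra argument in the patch can be
sketched in isolation. This is only a minimal example, not code from
the Texinfo tree: the byte string and the choice of UTF-8 as the
decoding are assumptions standing in for an ISO-8859-1 file name read
while DATA_INPUT_ENCODING_NAME is UTF-8. It relies on the documented
behaviour of Encode's decode(), which accepts a code reference as its
third (CHECK) argument.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: the ISO-8859-1 byte for "ü" (0xFC) followed by
# ".texi".  Taken on its own, 0xFC is not valid UTF-8.
my $bytes = "\xFC.texi";

# Without a CHECK argument, the malformed byte is replaced by
# U+FFFD REPLACEMENT CHARACTER.
my $default = decode('UTF-8', $bytes);

# With a code reference as CHECK, Encode calls it for the undecodable
# byte and uses the returned string as the replacement.
my $patched = decode('UTF-8', $bytes, sub { '?' });

# Print code points rather than the raw string, to avoid a
# "Wide character" warning on a non-UTF-8 output handle.
printf "default starts with U+%04X\n", ord($default);
print  "patched: $patched\n";
```

The callback receives the ordinal of the offending byte as its argument,
so a variant could return something more informative than '?', such as
a hex escape, if that were ever preferred.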
--
Pat