Re: Encoding error when reading file with ISO-8859-1 filename

bug-texinfo

From:	Patrice Dumas
Subject:	Re: Encoding error when reading file with ISO-8859-1 filename
Date:	Sun, 6 Mar 2022 15:14:26 +0100

On Sat, Mar 05, 2022 at 09:00:03PM +0000, Gavin Smith wrote:
> Here's something that came up when I was testing filename encodings
> and a proposed fix to silence a warning message.
> 
> A one-line fix is the following:
> 
> diff --git a/tp/Texinfo/Convert/Converter.pm b/tp/Texinfo/Convert/Converter.pm
> index df9d68d701..30eaea1e13 100644
> --- a/tp/Texinfo/Convert/Converter.pm
> +++ b/tp/Texinfo/Convert/Converter.pm
> @@ -546,7 +546,8 @@ sub determine_files_and_directory($;$)
>      my $input_file_name = $self->{'parser_info'}->{'input_file_name'};
>      my $encoding = $self->get_conf('DATA_INPUT_ENCODING_NAME');
>      if (defined($encoding)) {
> -      $input_file_name = decode($encoding, $input_file_name);
> +      $input_file_name = decode($encoding, $input_file_name,
> +                                sub { '?' });
>      }
>      my ($directories, $suffix);
>      ($input_basefile, $directories, $suffix) = fileparse($input_file_name);
> 
> 
> This eliminates the problematic U+FFFD character at the point of reading
> the filename.  In the output Info file, a question mark will harmlessly
> appear in the filename, like:
> 
> This is ü.info, produced by texi2any version 6.8dev+dev from ?.texi.
> 
> Patrice, do you think it's ok to commit the above change?

Your analysis and solution look good.  I added a note that a
corresponding test could be added, but it would (I believe) require
having a file with an encoded name in the test suite.
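
For reference, the behaviour the patch relies on can be checked without
any file on disk. The following is only a minimal sketch, not part of
texi2any or its test suite: it assumes the input encoding is UTF-8 and
uses a made-up byte string standing in for an ISO-8859-1 encoded file
name. The variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: "ü.texi" as ISO-8859-1 bytes. The lone byte 0xFC
# is not valid UTF-8.
my $bytes = "\xFC.texi";

# Without a CHECK argument, malformed input is replaced by U+FFFD.
my $default = decode('UTF-8', $bytes);

# With a code reference as CHECK, Encode calls it for the malformed
# input and uses its return value as the substitute, as in the patch.
my $fallback = decode('UTF-8', $bytes, sub { '?' });

# Print code points rather than raw characters to avoid terminal
# encoding issues.
for my $pair (['default', $default], ['fallback', $fallback]) {
    my ($label, $string) = @$pair;
    printf "%-9s %s\n", "$label:",
        join ' ', map { sprintf 'U+%04X', ord } split //, $string;
}
```

This only exercises Encode itself; testing the converter end to end
would still need a file with such a name.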

-- 
Pat
