Re: Non-ASCII characters in @include search path
From: Patrice Dumas
Subject: Re: Non-ASCII characters in @include search path
Date: Sat, 26 Feb 2022 00:17:46 +0100
On Mon, Feb 21, 2022 at 08:46:56PM +0000, Gavin Smith wrote:
> On Sun, Feb 20, 2022 at 10:32:00PM +0100, Patrice Dumas wrote:
> > On Sun, Feb 20, 2022 at 05:27:51PM +0000, Gavin Smith wrote:
> > > If the error message became something like
> > >
> > > "nœud « �sseul� » non référencé"
> > >
> > > then encoding this to UTF-8 would break the parts which already were in
> > > UTF-8.
> >
> > I just committed input decoding (command line, environment, translated
> > messages) and output messages encoding. I left file names as is, but
> > prepared a customization variable for them.
> >
> > Now the error message is:
> >
> > testé.texi:8: warning: nœud « ésseulé » non référencé
>
> One way of fixing this would be to store the filename separately along with
> the rest of the error message, and prepend the filename when it is output.
> I can try to implement this.

I am reviewing the code to find where we mix character strings with file
names that will be used as bytes at some point, and such mixing is very
common.

* Unless I missed something, string constants are character strings. If
  they are to appear mostly in file names, we need to encode them at some
  point, but it does not seem easy to me to decide when, except once we
  are sure that the string will only be treated as a byte sequence from
  then on.
* Many strings can come from documents, as character strings, or from
  the command line, possibly kept encoded. For example, the document file
  name can come from @setfilename or from the command line (or from a
  customization variable).
* Many strings are used both in file names and in texts, for example
  the customization variable 'EXTENSION'. Even strings that are almost
  only used as bytes can appear in error messages, which means that we
  need to keep the information on how to decode them somewhere.
* It is much simpler to require customization variables from init
  files to be character strings, which means that we need an API to
  encode those we want to mix with bytes, and since we cannot do this
  early, it means more complexity.
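
To make the hazard in the points above concrete, here is a minimal sketch in Python. It is not the Perl code in texi2any; the variable names are made up for illustration, and it assumes the file name bytes are UTF-8. It shows that a byte-string file name cannot be combined with a character-string message unless we know how to decode it, and that guessing the wrong encoding corrupts the output.

```python
# Illustration only: hypothetical names, and the file name is assumed
# to arrive as UTF-8 encoded bytes (for example from the command line).
file_name_bytes = "testé.texi".encode("utf-8")
message = "nœud « ésseulé » non référencé"  # character string

# Python refuses to concatenate bytes and character strings directly.
mixed_ok = True
try:
    line = file_name_bytes + b": " + message
except TypeError:
    mixed_ok = False

# Decoding with the wrong encoding and then encoding the whole line as
# UTF-8 double-encodes the file name part.
wrong = (file_name_bytes.decode("latin-1") + ": " + message).encode("utf-8")

# Decoding with the right encoding first gives a clean line.
right = (file_name_bytes.decode("utf-8") + ": " + message).encode("utf-8")
```

The 'wrong' line is only avoided if the encoding of the file name is recorded somewhere, which is the bookkeeping cost described above.
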

For all those reasons, I really think that we should use character
strings almost everywhere and encode when needed, so that there is
no need to track down where a string comes from to know whether it
is encoded or not. We already decode and encode in many places, as we
have file names used in error messages combined with character strings,
and character strings from Texinfo manuals that need to be encoded. The
gain from avoiding decoding and encoding a few strings does not, in my
opinion, outweigh the complexity of having strings that cannot be mixed.

In some cases we can still decide to keep strings encoded, but I
think that it should only be if we are sure that they will never be
mixed with decoded character strings.
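
As a rough sketch of the "character strings almost everywhere, encode when needed" approach, again in Python rather than Perl: every function and parameter name below is hypothetical and none of them is the texi2any API. Input is decoded once on the way in, all internal manipulation uses character strings, and encoding happens only at the output boundary, with separate encodings for file names and for messages.

```python
def decode_input(raw: bytes, input_encoding: str = "utf-8") -> str:
    """Decode bytes from the command line or environment once, on input."""
    return raw.decode(input_encoding)


def format_message(file_name: str, line_nr: int, text: str) -> str:
    """Build a message from character strings only, so mixing is safe."""
    return f"{file_name}:{line_nr}: warning: {text}"


def encode_file_name(name: str, file_name_encoding: str = "utf-8") -> bytes:
    """Encode a file name just before it is handed to the file system."""
    return name.encode(file_name_encoding)


def encode_message(message: str, message_encoding: str = "utf-8") -> bytes:
    """Encode a message just before it is written out."""
    return message.encode(message_encoding)


# Usage: the base name comes in as bytes, the extension is a character
# string (as a customization variable would be), and they mix freely.
base = decode_input("testé".encode("utf-8"))
file_name = base + ".texi"
message = format_message(file_name, 8, "nœud « ésseulé » non référencé")

# The same character string can be encoded differently per destination.
on_disk = encode_file_name(file_name, "iso-8859-1")
on_terminal = encode_message(message, "utf-8")
```

With this layout nothing downstream of decode_input needs to know where a string came from; the cost is a few extra decode and encode calls.
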
--
Pat