Re: Build from git broken - missing gperf?

bug-texinfo

From:	Gavin Smith
Subject:	Re: Build from git broken - missing gperf?
Date:	Wed, 7 Feb 2024 22:51:31 +0000

On Tue, Feb 06, 2024 at 07:13:09PM +0100, Patrice Dumas wrote:
> On Mon, Feb 05, 2024 at 07:35:59PM +0000, Gavin Smith wrote:
> > I don't know if uniconv/u8-conv-from-enc is a necessary module.  It's
> > not easy to find out how the module is used as the documentation is
> > lacking, but it appears to match libunistring.  The documentation is
> > here:
> > https://www.gnu.org/software/libunistring/manual/html_node/uniconv_002eh.html
> > 
> > I found uses of "u8_strconv_from_encoding" throughout the XS code,
> > although most of the uses (I didn't check them all) have "UTF-8" as one
> > of the arguments, making it appear that we are converting from UTF-8
> > to UTF-8.
> 
> It is the case.  We actually already discussed that issue previously.  In
> the code I wrote, in order to follow what I understood from the
> libunistring documentation, char * is converted to uint8_t * by calling
> u8_strconv_from_encoding even though the string is already UTF-8.  In
> your code in xspara.c you simply cast to uint8_t *.  It could also be done
> like that in other code; I do not know what is best.

The immediate solution is to require gperf as a tool for developers, just
like automake, autoconf, etc.

Getting away from u8_strconv_from_encoding could take some more effort
and isn't immediately necessary, but would be nice to do to reduce bloat.
Since we only use it for UTF-8 validation, we could do this in some other
function that is simpler and doesn't pull in as much from gnulib.
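
As a rough illustration of what a simpler validation function might look
like, here is a minimal sketch of a standalone UTF-8 well-formedness check
that needs no gnulib or libunistring code.  The name utf8_is_valid is
hypothetical and this is not code from the Texinfo tree; it only shows the
general shape such a function could take.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of Texinfo or libunistring.
   Return 1 if the LEN bytes at S are well-formed UTF-8, 0 otherwise.
   Rejects overlong forms, surrogates (U+D800..U+DFFF) and code points
   above U+10FFFF. */
static int
utf8_is_valid (const unsigned char *s, size_t len)
{
  size_t i = 0;

  while (i < len)
    {
      unsigned char c = s[i];
      size_t n;      /* number of continuation bytes expected */
      size_t k;
      uint32_t cp;   /* code point decoded so far */
      uint32_t min;  /* smallest code point allowed for this length */

      if (c < 0x80)
        {
          i++;       /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { n = 1; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { n = 2; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { n = 3; cp = c & 0x07; min = 0x10000; }
      else
        return 0;    /* stray continuation byte or invalid lead byte */

      if (len - i <= n)
        return 0;    /* sequence is truncated */

      for (k = 1; k <= n; k++)
        {
          unsigned char cc = s[i + k];
          if ((cc & 0xC0) != 0x80)
            return 0;  /* not a continuation byte */
          cp = (cp << 6) | (cc & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;    /* overlong, out of range, or surrogate */

      i += n + 1;
    }
  return 1;
}
```

A caller would pass the byte pointer and length obtained from Perl and
treat a return value of 0 as an error to report or recover from.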

I saw your private email from November 2023.  Here's part of what
I wrote in my response (for the benefit of the mailing list):

  We can assume the text strings coming out of Perl are encoded already
  in UTF-8, so running a conversion on them is pointless and confusing.

  According to the libunistring manual:

    The five types char *, uint8_t *, uint16_t *, uint32_t *, and wchar_t
    * are incompatible types at the C level. Therefore, ‘gcc -Wall’
    will produce a warning if, by mistake, your code contains a mismatch
    between these types. In the context of using GNU libunistring, even
    a warning about a mismatch between char * and uint8_t * is a sign of
    a bug in your code that you should not try to silence through a cast.

    https://www.gnu.org/software/libunistring/manual/libunistring.html#In_002dmemory-representation

  However, I don't understand how this can possibly be avoided, other than
  by running pointless conversions.  SvPV, which we use in XSParagraph.xs
  to get the pointer, returns a char * value.  Unless the Perl API can
  give a value with a type of uint8_t * to represent a UTF-8 string,
  then we can only avoid such warnings with a cast.

I can see the appeal of not fully trusting Perl's API to provide correct
values for use in our own XS code.  I suggest that if we do use a cast
we can do it in one single place in the code along with any validation
we do on the UTF-8.  We could start with a wrapper around
u8_strconv_from_encoding.  I'm happy to work on this myself when I have
time.
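
To make the "one single place" idea concrete, here is a minimal sketch
under some assumptions.  The names utf8_from_perl_bytes, utf8_check_fn
and ascii_only are hypothetical and do not exist in the Texinfo sources.
The validation step is passed in as a function pointer so that it could
be backed by whatever check is eventually chosen; the ASCII-only check
below is only a stand-in to keep the example self-contained.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical validator type: returns nonzero if the LEN bytes at S
   are acceptable as UTF-8. */
typedef int (*utf8_check_fn) (const uint8_t *s, size_t len);

/* Hypothetical helper: the only place where char * is cast to
   uint8_t *.  Returns NULL if S is NULL or if CHECK rejects the bytes.
   A NULL CHECK means "trust the caller" and skips validation. */
static const uint8_t *
utf8_from_perl_bytes (const char *s, size_t len, utf8_check_fn check)
{
  const uint8_t *u = (const uint8_t *) s;

  if (s == NULL)
    return NULL;
  if (check != NULL && !check (u, len))
    return NULL;
  return u;
}

/* Stand-in check for illustration only: accepts 7-bit ASCII.  Real
   code would use a proper UTF-8 validator here. */
static int
ascii_only (const uint8_t *s, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (s[i] >= 0x80)
      return 0;
  return 1;
}
```

With something like this, XS code would call the helper once on the
pointer and length it gets from SvPV and use the uint8_t * result
everywhere else, so no other cast is needed.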

> That being said, we also directly use gnulib iconv, so I think that
> iconv_open would still be brought in anyway.

We'd have to see if this module was still worth using for the platforms
it supports and the problems it solves.
