bug-texinfo

Re: Build from git broken - missing gperf?


From: Gavin Smith
Subject: Re: Build from git broken - missing gperf?
Date: Wed, 7 Feb 2024 22:51:31 +0000

On Tue, Feb 06, 2024 at 07:13:09PM +0100, Patrice Dumas wrote:
> On Mon, Feb 05, 2024 at 07:35:59PM +0000, Gavin Smith wrote:
> > I don't know if uniconv/u8-conv-from-enc is a necessary module.  It's
> > not easy to find out how the module is used as the documentation is
> > lacking, but it appears to match libunistring.  The documentation is
> > here:
> > https://www.gnu.org/software/libunistring/manual/html_node/uniconv_002eh.html
> > 
> > I found uses of "u8_strconv_from_encoding" throughout the XS code,
> > although most of the uses (I didn't check them all) have "UTF-8" as one
> > of the arguments, making it appear that we are converting from UTF-8
> > to UTF-8.
> 
> It is the case.  We actually already discussed that issue previously, in
> the codes I did, and in order to follow what I understood from the
> libunistring documentation, char * is converted to uint8_t by calling
> u8_strconv_from_encoding even though the string is already UTF-8.  In
> your code in xspara.c you simply cast to uint8_t.  It could also be done
> like that in other codes, I do not know what is best.

The immediate solution is to require gperf as a tool for developers, just
like automake, autoconf, etc.

Getting away from u8_strconv_from_encoding could take some more effort
and isn't immediately necessary, but would be nice to do to reduce bloat.
Since we only use it for UTF-8 validation, we could do this in some other
function that is simpler and doesn't pull in as much from gnulib.

I saw your private email from November 2023.  Here's part of what
I wrote in my response (for the benefit of the mailing list):

  We can assume the text strings coming out of Perl are encoded already
  in UTF-8, so running a conversion on them is pointless and confusing.

  According to the libunistring manual:

    The five types char *, uint8_t *, uint16_t *, uint32_t *, and wchar_t
    * are incompatible types at the C level. Therefore, ‘gcc -Wall’
    will produce a warning if, by mistake, your code contains a mismatch
    between these types. In the context of using GNU libunistring, even
    a warning about a mismatch between char * and uint8_t * is a sign of
    a bug in your code that you should not try to silence through a cast.

  
  https://www.gnu.org/software/libunistring/manual/libunistring.html#In_002dmemory-representation

  However, I don't understand how this can possibly be avoided, other than
  by running pointless conversions.  SvPV, which we use in XSParagraph.xs
  to get the pointer, returns a char * value.  Unless the Perl API can
  give a value with a type of uint8_t * to represent a UTF-8 string,
  then we can only avoid such warnings with a cast.

I can see the appeal of not fully trusting Perl's API to provide correct
values for use in our own XS code.  I suggest that if we do use a cast
we can do it in one single place in the code along with any validation
we do on the UTF-8.  We could start with a wrapper around
u8_strconv_from_encoding.  I'm happy to work on this myself when I have
time to.
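
As a rough illustration of the "cast in one single place, together with
validation" idea, here is a minimal sketch in plain C.  It is not taken from
the Texinfo sources: the names utf8_is_valid and utf8_from_char are
hypothetical, and the hand-written validator is only a stand-in for whatever
check we settle on (u8_strconv_from_encoding, libunistring's u8_check, or
something else).  It assumes the caller already has the pointer and byte
length, as SvPV would supply them.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return 1 if the N bytes at S are well-formed
   UTF-8 (no overlong forms, no surrogates, nothing above U+10FFFF),
   and 0 otherwise. */
static int
utf8_is_valid (const uint8_t *s, size_t n)
{
  size_t i = 0;

  while (i < n)
    {
      uint8_t c = s[i];
      size_t len, k;
      uint32_t cp, min;

      if (c < 0x80)
        {
          i++;                  /* plain ASCII byte */
          continue;
        }
      else if ((c & 0xE0) == 0xC0)
        { len = 2; cp = c & 0x1F; min = 0x80; }
      else if ((c & 0xF0) == 0xE0)
        { len = 3; cp = c & 0x0F; min = 0x800; }
      else if ((c & 0xF8) == 0xF0)
        { len = 4; cp = c & 0x07; min = 0x10000; }
      else
        return 0;               /* stray continuation or invalid lead byte */

      if (n - i < len)
        return 0;               /* sequence truncated at end of buffer */

      for (k = 1; k < len; k++)
        {
          if ((s[i + k] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
          cp = (cp << 6) | (s[i + k] & 0x3F);
        }

      if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;               /* overlong, out of range, or surrogate */

      i += len;
    }
  return 1;
}

/* Hypothetical wrapper: the only place where char * is cast to
   uint8_t *.  Returns the same pointer viewed as UTF-8, or NULL if
   the bytes are not valid UTF-8.  No copy or conversion is made. */
static const uint8_t *
utf8_from_char (const char *s, size_t n)
{
  const uint8_t *u = (const uint8_t *) s;
  return utf8_is_valid (u, n) ? u : NULL;
}
```

Callers would then write something like "u = utf8_from_char (str, len);" and
treat a NULL result as an error, so the cast and the question of how much to
trust the input both live in one function that we can change later.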

> That being said, we also directly use gnulib iconv, so I think that
> iconv_open would still be brought in anyway.

We'd have to see if this module was still worth using for the platforms
it supports and the problems it solves.


