octave-bug-tracker



From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #63930] fprintf writes incorrect characters when converting the encoding
Date: Tue, 4 Apr 2023 14:42:33 -0400 (EDT)

Follow-up Comment #63, bug #63930 (project octave):

This looks like something isn't quite right in libc++:
https://github.com/llvm/llvm-project/blob/5c950a3127da7c4121da75df9751208ba2aa9cad/libcxx/include/locale#L4110

            do
            {
                const char_type* __e;
                __r = __cv_->out(__st_, this->pbase(), this->pptr(), __e,
                                        __extbuf_, __extbuf_ + __ebs_, __extbe);
                if (__e == this->pbase())
                    return traits_type::eof();
[...]
            } while (__r == codecvt_base::partial);



At this point, `this->pptr()` seems to point only one past `this->pbase()`
(inspecting with gdb; I'm not sure why that is, though). That means that we got
an incomplete UTF-8 character. We need to reset `__e` to `this->pbase()` so
that the conversion restarts once there are more characters in the buffer.
The `if (__e == this->pbase())` check that follows then terminated the
conversion.

Our codecvt facet currently doesn't reset `__e` (it is passed as the
`from_next` argument of `out`) correctly. That's the reason for the random
crashes. With `libstdc++`, it is initialized to a sensible value, so we don't
need to touch it when we don't convert anything. But the standard doesn't seem
to make any guarantees about that.
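
To make that contract concrete, here is a minimal sketch of a `do_out` override that assigns `from_next` and `to_next` on every return path, including the case where nothing can be consumed because the input ends in an incomplete UTF-8 sequence. The class name `utf8_passthrough_codecvt` is hypothetical, and it only copies bytes instead of actually transcoding, so it stands in for our facet rather than reproducing it. It also does no validation; stray continuation bytes and invalid lead bytes are copied one at a time.

```cpp
#include <cstddef>  // std::size_t
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::codecvt_base

// Hypothetical stand-in for a transcoding facet: it copies UTF-8 input
// unchanged so that the example can focus on how from_next/to_next are set.
class utf8_passthrough_codecvt
  : public std::codecvt<char, char, std::mbstate_t>
{
public:
  explicit utf8_passthrough_codecvt(std::size_t refs = 0)
    : std::codecvt<char, char, std::mbstate_t>(refs) {}

protected:
  // Length of the UTF-8 sequence started by LEAD.  No validation: ASCII,
  // stray continuation bytes and invalid lead bytes all count as length 1.
  static int sequence_length(unsigned char lead)
  {
    if ((lead & 0xE0) == 0xC0) return 2;
    if ((lead & 0xF0) == 0xE0) return 3;
    if ((lead & 0xF8) == 0xF0) return 4;
    return 1;
  }

  result do_out(state_type&, const char* from, const char* from_end,
                const char*& from_next, char* to, char* to_end,
                char*& to_next) const override
  {
    // Set both "next" pointers first so that every return path, including
    // "nothing consumed", reports a valid position to the caller.
    from_next = from;
    to_next = to;

    while (from_next < from_end)
      {
        const int n = sequence_length(static_cast<unsigned char>(*from_next));
        if (from_end - from_next < n)
          return partial;  // incomplete sequence: wait for more input
        if (to_end - to_next < n)
          return partial;  // output buffer is full
        for (int i = 0; i < n; ++i)
          *to_next++ = *from_next++;
      }
    return ok;
  }

  // The char/char base facet claims "no conversion"; a transcoding facet
  // must not, or the stream buffer would bypass do_out entirely.
  bool do_always_noconv() const noexcept override { return false; }
  int do_encoding() const noexcept override { return 0; }
  int do_max_length() const noexcept override { return 4; }
};
```

With only a lead byte in the buffer (the situation seen in gdb, where `pptr()` is one past `pbase()`), this returns `partial` with `from_next == from`, which the libc++ loop quoted above treats as "nothing converted".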

This might have worked before because we never moved the `from_next` pointer
backward. But that was wrong, because it could lead to incorrect conversions of
partial multi-byte UTF-8 sequences.

Judging by the following defect report and its resolution, libc++ might be
working according to the standard:
https://cplusplus.github.io/LWG/issue76

However, `libstdc++` seems to do just fine with that situation.

I don't know what the best solution is now. 🤷‍♂️
We probably need to rethink the entire transcoding. It wouldn't even help to
switch to UTF-16 internally: characters outside the Basic Multilingual Plane
(BMP) are encoded as surrogate pairs, so the current approach still wouldn't be
standard-compliant for them. (UTF-32 might work.)
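
A small sketch of the BMP problem, and of why UTF-32 looks more promising, using U+1F937 (the shrug emoji) as the example. In UTF-16 that code point takes two code units, so a facet handed one `char16_t` at a time can receive half a character. In UTF-32 every code unit is a whole code point, so converting one internal character per call, which as far as I understand is what the LWG 76 resolution asks of a `codecvt` facet used by `basic_filebuf`, succeeds for every valid code point. Both helper names are hypothetical; the second one uses the standard `std::codecvt<char32_t, char, std::mbstate_t>` facet (UTF-32 to UTF-8), which is deprecated since C++20 but still provided.

```cpp
#include <cstddef>  // std::size_t
#include <cwchar>   // std::mbstate_t
#include <locale>   // std::codecvt, std::use_facet, std::locale
#include <string>

// Hypothetical helper: split a code point above U+FFFF into the UTF-16
// surrogate pair that encodes it.
std::u16string to_surrogate_pair(char32_t cp)
{
  const char32_t v = cp - 0x10000;
  return {static_cast<char16_t>(0xD800 + (v >> 10)),
          static_cast<char16_t>(0xDC00 + (v & 0x3FF))};
}

// Hypothetical helper: convert UTF-32 to UTF-8 with the standard facet,
// passing exactly one internal character per call to out().  Returns
// false if any of those single-character calls does not succeed.
bool utf32_to_utf8_one_at_a_time(const std::u32string& in, std::string& utf8)
{
  using cvt_t = std::codecvt<char32_t, char, std::mbstate_t>;
  const cvt_t& cvt = std::use_facet<cvt_t>(std::locale::classic());
  std::mbstate_t state{};
  utf8.clear();
  for (char32_t c : in)
    {
      char buf[8];
      const char32_t* from_next = nullptr;
      char* to_next = nullptr;
      const auto r = cvt.out(state, &c, &c + 1, from_next,
                             buf, buf + sizeof buf, to_next);
      if (r != std::codecvt_base::ok || from_next != &c + 1)
        return false;
      utf8.append(buf, static_cast<std::size_t>(to_next - buf));
    }
  return true;
}
```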

As a short-term workaround, it might make sense to disable the transcoding when
building with libc++.
Is there a way to detect at compile time that we will be linking to libc++?
Are there configure checks for that?
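
For the compile-time part, one option is to check the library's own macros after including any standard header: libc++ defines `_LIBCPP_VERSION` and libstdc++ defines `__GLIBCXX__`. Strictly, this detects the headers being compiled against, which normally matches the library that gets linked. The constant names below, including the `enable_stream_transcoding` switch, are hypothetical; a configure check could presumably compile a similar snippet instead.

```cpp
#include <cstddef>  // any standard header pulls in the library's config macros

// _LIBCPP_VERSION comes from libc++'s headers, __GLIBCXX__ from libstdc++'s.
#if defined(_LIBCPP_VERSION)
constexpr const char* cxx_stdlib_name = "libc++";
constexpr bool stdlib_is_libcxx = true;
#elif defined(__GLIBCXX__)
constexpr const char* cxx_stdlib_name = "libstdc++";
constexpr bool stdlib_is_libcxx = false;
#else
constexpr const char* cxx_stdlib_name = "unknown";
constexpr bool stdlib_is_libcxx = false;
#endif

// Hypothetical switch implementing the short-term workaround discussed in
// this comment: transcode only when not building against libc++.
constexpr bool enable_stream_transcoding = !stdlib_is_libcxx;
```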

We should also make sure not to crash, even with libc++. I can probably look at
that part sometime this week.



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?63930>

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/



