Re: [lmi] Dealing with deleted operator<<() overloads for wide char type


From: Vadim Zeitlin
Subject: Re: [lmi] Dealing with deleted operator<<() overloads for wide char types in C++20
Date: Tue, 2 Mar 2021 15:13:09 +0100

On Mon, 1 Mar 2021 21:59:59 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:

GC> On 2/26/21 12:29 AM, Vadim Zeitlin wrote:
GC> > On Thu, 25 Feb 2021 22:35:34 +0000 Greg Chicares <gchicares@sbcglobal.net> wrote:
GC> > 
GC> > GC> On 2/25/21 8:55 PM, Vadim Zeitlin wrote:
[...]
GC> >  This is perfectly understandable, but IMO casting chars is exactly a case
GC> > which just doesn't make sense. You wouldn't, presumably, attempt to make
GC> > bourn_cast() work with strings, so why would you ever want to use it with
GC> > char{16,32}_t which can be exactly the same thing as UTF-8-encoded
GC> > multi-byte string?
GC> 
GC> I'm not sure I follow. As you know, C++20 (N4861) says [6.8.1]:
GC> 
GC> | 11. Types bool, char, wchar_t, char8_t, char16_t, char32_t, and
GC> | the signed and unsigned integer types are collectively called
GC> | integral types. A synonym for integral type is integer type.
GC> 
GC> | 12. There are three floating-point types: float, double, and
GC> | long double. ... Integral and floating-point types are
GC> | collectively called arithmetic types.
GC> 
GC> Shouldn't the domain of bourn_cast<T,U> be all arithmetic types {T,U}?

 I don't think so, but we clearly disagree about it. To briefly define my
position before attempting to defend it below, I think its domain should be
all "normal arithmetic types" where the "normal" part means exactly "those
for which it makes sense".

GC> I don't even know what these goofy types are supposed to be: maybe
GC> UCS-2 and UCS-4, or something like that. But I don't see why the
GC> motivation for their existence would need to be considered here:
GC> they're arithmetic types, and isn't that all I need to know?

 IMO it doesn't make much (or maybe any) sense to treat them as arithmetic
types, even though they are defined as integral types in the standard. They
are only ever useful as the type of elements of an array representing a
string in a particular encoding (UTF-{8,16,32} respectively) and they should
never be used for anything else.

 The traditional "char" type has always been overloaded to mean "byte" in
C++ (and in C before it, of course) and this was considered to be bad but
mostly unavoidable (and C++17 std::byte is not going to change this, or at
least not for another decade). With the new types, I really don't see any
justification for using them as integral types, whatever the Standard says.

GC> I did something much simpler in commit e89831e23. IOW, here:
GC> 
GC> > GC> I'd be inclined to replace
GC> > GC>   INVOKE_BOOST_TEST_EQUAL(x, y, file, line);
GC> > GC> (only where it is actually problematic) with
GC> > GC>   INVOKE_BOOST_TEST(x == y, file, line);
GC> 
GC> what does the (ambiguous) "actually problematic" mean? I
GC> intended "on any line flagged in a compiler diagnostic,
GC> regardless of what remote line it was invoked from": thus,
GC> "potentially problematic" might have been a better phrase.
GC> 
GC> The tests for equality are preserved. All that's lost is
GC> the incidental printing of their (unequal) values.

 IMO this is very far from incidental: it's invaluable to see the values
resulting in the test failures, especially in this age when tests are
often run remotely, on some CI system. In practice I've very often
regretted not seeing the values resulting in the test failures and this is,
in fact, one of the major reasons I prefer CATCH[1] or any of the newer
testing frameworks based on it, such as doctest[2]: they always give you
the values for free, without having to use the ugly special versions (with
_EQUAL or whatever suffix) of the testing macro.

[1]: https://github.com/catchorg/Catch2/
[2]: https://github.com/onqtam/doctest


GC> But maybe I misunderstand something. From my POV above,
GC> integral types are integral types, and that's that, so
GC> making some of them streamable and others not is lunacy.

 The fact is that these types are not streamable and there is nothing we
can do about it. IMO it's just another reason to not group them with the
"normal integral types".
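
 To make the "not streamable" point concrete, here is a minimal sketch of a
compile-time check for whether "os << t" is well-formed. The names
is_streamable and opaque are hypothetical and not part of lmi; the sketch
assumes C++17 or later.

```cpp
#include <ostream>
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical detection trait (not part of lmi): true when `os << t`
// is a well-formed expression for a std::ostream and a value of type T.
template<typename T, typename = void>
struct is_streamable : std::false_type {};

template<typename T>
struct is_streamable
    <T
    ,std::void_t<decltype(std::declval<std::ostream&>() << std::declval<T const&>())>
    > : std::true_type {};

template<typename T>
inline constexpr bool is_streamable_v = is_streamable<T>::value;

struct opaque {}; // A type with no operator<<() at all.

static_assert( is_streamable_v<int>);
static_assert( is_streamable_v<double>);
static_assert( is_streamable_v<std::string>);
static_assert(!is_streamable_v<opaque>);

// With a standard library that implements the C++20 deleted overloads,
// is_streamable_v<char16_t> and is_streamable_v<char32_t> are expected
// to be false as well, because selecting a deleted function makes the
// expression ill-formed. No assertion is made here, since the result
// depends on the language mode and library version in use.
```

 A trait like this only reports the situation; it cannot make the charN_t
types streamable, so some customization point is still needed to print
them.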

GC> > So I was rather
GC> > thinking of implementing a complete solution for this by defining a
GC> > template lmi_test::to_string() function
GC> 
GC> Nothing new to invent:
GC> 
GC> template<typename T>
GC> std::string lmi_test::to_string(T t)
GC>   {return value_cast<std::string>(t);}
GC> 
GC> Would you be in favor of a change such as the following?
GC> 
GC>  #define LMI_ASSERT_EQUAL(observed,expected)                         \
GC>      LMI_ASSERT_WITH_MSG                                             \
GC>          ((observed) == (expected)                                   \
GC> -        ,"expected " << (expected) << " vs observed " << (observed) \
GC> +        ,      "expected " << value_cast<std::string>(expected) \
GC> +        << " vs observed " << value_cast<std::string>(observed) \
GC>          )

 No, if we change anything here, I'd really like to use
lmi_test::to_string() inside the macro in order to allow customizing the
appearance of values specifically for the test failure messages. Again,
this is something that more or less all test frameworks do and there is a
reason for it: it can be very useful to see just the _relevant_ parts of
the values when the tests fail.

GC> I'd be somewhat opposed because that introduces heavyweight
GC> header "value_cast.hpp" into lightweight "assert_lmi.hpp",
GC> compounding the fragility of assertion code that already
GC> uses heavyweight std:: stuff that can throw exceptions.

 I'd have no problem[*] with passing "std::ostream& os" to to_string() and
using "os << t" as its default implementation. It's really more about
customizing the output for some types than about dealing with charN_t types
specifically.
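
 A minimal sketch of that idea follows. It assumes a to_string() taking
the stream as its first argument; the helper write_code_point(), the
"U+XXXX" rendering of charN_t values and describe_mismatch() are
illustrative choices only, not anything that exists in lmi.

```cpp
#include <cassert>
#include <cstdint>
#include <iomanip>
#include <ostream>
#include <sstream>
#include <string>

namespace lmi_test
{
// Default implementation: rely on the type's own operator<<().
template<typename T>
void to_string(std::ostream& os, T const& t)
{
    os << t;
}

// Hypothetical helper: write a code unit as "U+" followed by at least
// four uppercase hex digits, without changing the caller's stream flags.
inline void write_code_point(std::ostream& os, std::uint_least32_t cp)
{
    std::ostringstream tmp;
    tmp << "U+" << std::hex << std::uppercase
        << std::setw(4) << std::setfill('0') << cp;
    os << tmp.str();
}

// Non-template overloads are preferred over the template above, so the
// deleted C++20 operator<<() overloads are never selected for these types.
inline void to_string(std::ostream& os, char16_t c) {write_code_point(os, c);}
inline void to_string(std::ostream& os, char32_t c) {write_code_point(os, c);}

// Hypothetical message builder mirroring the wording of the macro
// discussed above ("expected ... vs observed ...").
template<typename T, typename U>
std::string describe_mismatch(T const& observed, U const& expected)
{
    std::ostringstream oss;
    oss << "expected ";
    to_string(oss, expected);
    oss << " vs observed ";
    to_string(oss, observed);
    return oss.str();
}

// Small usage demonstration.
inline void demo()
{
    assert("expected 3 vs observed 2" == describe_mismatch(2, 3));
    assert("expected U+0041 vs observed U+0042" == describe_mismatch(u'B', u'A'));
}
} // namespace lmi_test
```

 Other types could get their own overloads in the same way, for example
to print only the part of a large value that is relevant to a failure.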

GC> >  But my preference would still be to just forbid using bourn_cast() with
GC> > the wide char types as I remain convinced that this is an operation which
GC> > simply never makes sense.
GC> 
GC> Again, maybe I'm missing something, but to me:
GC>  - arithmetic types should be streamable
GC>  - the real error is that types like char16_t and char32_t exist

 It's a valid point of view, but you can't correct this error. So I suggest
working around it by introducing the concept of "normal arithmetic type"
excluding char{8,16,32}_t. IMHO it's a more constructive way to look at
things as they are.
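
 As a rough sketch of what "normal arithmetic type" could look like in
code, assuming C++17 or later: the names is_code_unit,
is_normal_arithmetic_v and guarded_cast are hypothetical, and guarded_cast
is only a placeholder showing where the check would go, not the actual
bourn_cast implementation. Whether wchar_t should be excluded too is left
open here.

```cpp
#include <type_traits>

// Hypothetical trait: the character types that exist only to hold code
// units of a particular Unicode encoding.
template<typename T> struct is_code_unit : std::false_type {};
template<> struct is_code_unit<char16_t> : std::true_type {};
template<> struct is_code_unit<char32_t> : std::true_type {};
#if defined __cpp_char8_t
template<> struct is_code_unit<char8_t> : std::true_type {};
#endif

// "Normal arithmetic type": arithmetic, but not one of the types above.
template<typename T>
inline constexpr bool is_normal_arithmetic_v =
       std::is_arithmetic_v<T>
    && !is_code_unit<std::remove_cv_t<T>>::value
    ;

// Placeholder showing where the restriction would be enforced. The real
// bourn_cast does range checking; this sketch just uses static_cast.
template<typename To, typename From>
constexpr To guarded_cast(From from)
{
    static_assert
        (is_normal_arithmetic_v<To> && is_normal_arithmetic_v<From>
        ,"guarded_cast: only normal arithmetic types are supported"
        );
    return static_cast<To>(from);
}

static_assert( is_normal_arithmetic_v<int>);
static_assert( is_normal_arithmetic_v<double>);
static_assert( is_normal_arithmetic_v<char>);
static_assert(!is_normal_arithmetic_v<char16_t>);
static_assert(!is_normal_arithmetic_v<char32_t const>);
static_assert(3 == guarded_cast<int>(3.0));
```

 With such a trait, guarded_cast<int>(u'A') would fail to compile with a
readable message instead of failing somewhere inside a test macro.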

GC> > [*] We really, really ought to rename these macros to LMI_TEST_EQUAL() or
GC> >     something else not starting with BOOST_, pretending that they come from
GC> >     Boost is still confusing, even after all these years, and just imagine
GC> >     how puzzling it is to anybody new to lmi code base (I didn't have to
GC> >     imagine this when I saw Ilya being completely misled by their names
GC> >     relatively recently).
GC> 
GC> Okay, I'll do that.

 TIA!
VZ

[*] Well, to be totally honest, of course I would have a problem with using
    std::ostream here or, in fact, anywhere else, because I hope for its
    total eradication and replacement by std::format and related C++20
    facilities (available as far back as C++11 via the fmt library from
    https://fmt.dev/), but this is a fight for another day.
