Integer Conversions and Safe Comparisons in C++20 (cppstories.com)
46 points by ibobev on Sept 13, 2022 | 28 comments


Virgil has a family of completely well-defined (i.e. no UB) fixed-size integer types with some hard-fought rules that I eventually got around to documenting here:

https://github.com/titzer/virgil/blob/master/doc/tutorial/Fi...

One of the key things is that values are never silently truncated or changed (other than the two's-complement wrap-around built into arithmetic); only promotions are implicit. The only sane semantics for over-shifts (shift amounts at least as large as the width of the type) is to shift the bits out, like a window.

The upshot of all that is that Virgil has pretty sane semantics for fixed-size integers, IMHO. Particularly comparisons. Comparing signed and unsigned ints is not UB and cannot silently give the wrong answer; integers are always compared on the same number line. It takes at most one extra comparison to achieve this. AFAICT this is what is achieved by the std::cmp_* functions referenced in the article.


Looking at horror stories like this along with their "solutions" that are done by slapping more standard functions on top of the already existing piles, I'm glad that I do my personal programming in languages which 1) have meaningful numeric towers and automatically do promotion/demotion as appropriate, 2) don't implicitly convert signed integers into unsigned integers and error out instead.

With each standard revision, C++ becomes more and more horrible even as it strives to become a better and more useful language.


Pray tell, what are your personal programming languages?


Lisp and Erlang. The latter doesn't have a numeric tower but has automatic bigint conversion and handling. The former is eager to error upon signed/unsigned type violations once you specify them.


Not sure Erlang is strong on the basic types front on the whole though; I've never been a fan of strings in Erlang. Greatly prefer C++ on this front.


Erlang binaries on the other hand are awesome, and the use of lists in what Erlang refers to as chardata/iodata is really nice when generating data that will be written to some kind of output device, since no preprocessing is needed to convert the data to a flat array of bytes. Overall though, yeah, it plays pretty loose with types, though it is generally strict about not using integers where floats are expected and vice versa.


> has automatic bigint conversion

how can one implement this without a branch before literally any arithmetic operation?


Erlang checks for overflow after arithmetic ops are performed internally (IIRC), so you are generally only penalized when overflow actually occurs and requires promotion, which probably isn’t much different in performance from a language that would raise an error on overflow, though obviously not as efficient as a language that just wraps around on overflow.


In CL, you can declare that you need modulo arithmetic, which will give you branchless arithmetic operations.

In Erlang, you don't; the language was never made for that sort of speed.


You're not using Erlang for speed. You're using Erlang for reliability. (Note this is not the same thing as correctness or "safety")


Once again the comments prove that people are more interested in trash-talking C++ than in solving real problems.

C++20 IS fixing many complaints about design choices made in the past.

The article casually mentions at the end that with -Werror -Wall -Wextra, the unsigned/signed comparisons will fail to compile. Most serious projects use these flags anyway.

I wonder if clang-tidy will also catch this?


I'm fully aware of the pitfalls of C++ (and C), but I keep coming back to them because all the alternatives have failed to provide the same speed, performance, and small size.

Every language has its dark corners; none is perfect, and C and C++ are no exception. That C and C++ have been popular for decades is a fact, like it or not; the data does not lie.


As a person working on big C++ telco codebases for a living, I'm fully aware that C++ is solving real problems and that -Weverything makes it somewhat possible to work with the language. At the same time, I'm appalled by the negative consequences of the "let's fix problems with the old and broken comparison operator by introducing a new, better comparison operator, since we need to maintain almost complete backwards compatibility" approach that C++ takes here.

I'm interested in solving real problems, but C++ deserves trash talking for this reason alone.


What's the alternative? Change the semantics of the comparison operator so that code looks cleaner? That means that when you read a comparison in source you have to know how it's going to be compiled in order to tell if the code is correct or not.


You can opt into new behavior; the attribute syntax added in C++11 should make this much more bearable. In fact, I'm honestly surprised that this mode of language evolution didn't seem to be frequently discussed in the C++ WG at all; ECMAScript had `use strict`, which is very much the same thing and was well received, and if ECMAScript can do it with a lot of legacy code, there is no real reason C++ can't do the same.


This isn't that, though. C/C++ continue to leave undefined behavior in specifically for performance. It would, AFAICT, be standard-compliant for a compiler to handle mixed-sign comparisons involving negative numbers in the intuitive number-line way, but the language specification doesn't require this because it would mean another comparison[1]. In the name of performance, it's the programmer's responsibility to avoid having negative numbers in such comparisons. And woe to those who have bugs in their programs; the compiler may even exploit that UB to assume the code is never reached. The default is all wrong because the priorities are all wrong.

[1] It's non-obvious to me why an implicit conversion with potential UB would ever be a good idea. If a programmer really did want a UB-having single-machine compare between a signed (but non-negative) int and an unsigned int, they could write an explicit static_cast and then compare. Such a static_cast would be the UB-having nop that unlocked a non-UB comparison. You'd get the exact machine code that you wanted, but you had to opt into UB. Bad default that the short, intuitive-looking code has UB.


I don't think this is correct. Comparison operators are defined to perform the usual arithmetic conversions before they are applied, and the usual arithmetic conversions for converting to unsigned integer types are defined like so:

> If the destination type is unsigned, the resulting value is the least unsigned integer congruent to the source integer (modulo 2^n where n is the number of bits used to represent the unsigned type). [Note: In a two's complement representation, this conversion is conceptual and there is no change in the bit pattern (if there is no truncation). ]


Converting a signed negative number to unsigned is UB.


Can you provide a citation? I have above quoted the part of the standard that defines the value of the result of the conversion.

It looks like C++20 updated this so that conversion both ways is defined in a way that preserves two's complement bit-patterns, which you can see here: https://eel.is/c++draft/conv#integral-3


I don't know why they don't do that either. Perhaps because a transition like that will take multiple decades of consistent and steady guidance to succeed, given the nature and size of C++ codebases and nobody will take the responsibility for it?


Welcome to every language that has the balls to say "yeah you know what, that didn't work, here's a revision". Stuff is deprecated all the time.


> Most serious projects use these flags anyway.

Most serious projects are opting in to being unable to compile on future compilers whenever new warnings are added?


Obviously, -Werror shouldn't be enabled by default, only for things like CI tests.


On the contrary, it should be on by default for every build configuration! If you move to a newer compiler version and the compilation breaks, then you just saved yourself valuable debugging time. The alternative is that the new compiler has a new optimization that causes your code to break at runtime. You can always disable specific warnings if you think they're not relevant for your code.


The usual arithmetic conversions linked from the article [0] are a bit different from the ones I know. In particular, if int can represent all values of all operands, both of them will be converted to int.

Thus comparing unsigned short and short will usually (if short is narrower than int) do the right thing. C++Insights (which I didn't know! This is the real pearl of this article.) agrees with me [1].

[0] https://en.cppreference.com/w/cpp/language/operator_arithmet...

[1] https://cppinsights.io/lnk?code=I2luY2x1ZGUgPGNzdGRpbz4KCmlu...


The relevant part of your first link seems to be right at the beginning:

> If the operand passed to an arithmetic operator is integral or unscoped enumeration type, then before any other action (but after lvalue-to-rvalue conversion, if applicable), the operand undergoes integral promotion.

Clicking "integral promotion" leads here: https://en.cppreference.com/w/cpp/language/implicit_conversi...


If there are already std functions that are "safe" for integral comparison, why not replace the implementations of the < and > operators with those? Is it because it violates existing specs? If so, shouldn't the specs be considered bad and get revised?

I thought C and C++ were advertised as languages that don't do a lot of hand-holding and require programmers to know what they are doing, and many aspects of them, such as memory management, really reflect that. So why is there implicit integral type conversion? Why not require the programmer to always convert explicitly, like Rust does?


Forcing the programmer to convert integer types explicitly doesn't really help much. It makes it obvious that a conversion is happening, but that's it. People will simply add the required explicit casts without thinking about possible truncation or sign changes. UBSan has a very useful -fsanitize=implicit-conversion flag which can detect when truncation or sign changes occur, but this stops working when you make the cast explicit. So in practice, implicit casts actually allow more errors to be detected, especially in connection with fuzzing. Languages like Go or Rust would really need two types of casts to detect unexpected truncation.



