New Safe C++ Proposal

There is now a Safe C++ proposal that extends C++ by defining a superset of the language that can be used to write code with strong safety guarantees. See:
https://www.infoq.com/news/2024/10/safe-cpp-proposal/
Rather lean on details, but an interesting proposal all the same. Make it a 3rd party library like Boost and I'd be willing to consider using it, as long as there is some documentation and examples of usage and of what could go wrong.
sounds like the managed code idea, 3.0
The idea is fine, but so far implementations of safer coding (including Java and other 'new' languages) cut out too much. It's one thing to prevent abject stupidity (which C++ does currently) and another to remove swaths of the language because they could be used poorly, or to add molasses to everything to ensure no screw-ups (e.g. vectors where [x] is banned and only .at(x) is allowed, adding a major performance hit for unnecessary checks). If you gut it far enough to be absolutely safe, there won't be much left in terms of power and performance. If you do a half measure, you still have bugs and security flaws. There is no win here that works -- this kind of thing requires a new language with safety-first, performance-second design from the get-go, IMHO.

That said, maybe take a page from Python and just fork the language over to a new one that keeps the C++ syntax mostly intact but reworks everything with a safety-first design. That would be very doable; in the example above you'd still have the [] syntax, but it's doing the .at() work.
Last edited on
sounds like the managed code idea, 3.0
This doesn't sound like managed code a la Java or C#, this sounds like prohibiting undefined behavior through an appropriate type system like Rust does.

vectors where [x] is banned and only .at(x) is allowed, adding a major performance hit for unnecessary checks
Nah. std::vector::operator[]() not doing bounds checking is a mistake informed by the sorry state of compilers at the time the decision was made. Nowadays, between the advent of speculative processors and the major advances in compiler technology developed from formal methods since the '80s and '90s, it makes no sense at all to expose the programmer to the risk of undefined behavior for the tiny performance cost of a simple bounds check. Modern compilers are smart enough that they can prove the index parameter will never exceed the bounds of the vector and eliminate the check altogether in those cases. Not in every case, but in enough cases that omitting the check isn't worthwhile.

If you gut it far enough to be absolutely safe, there won't be much left in terms of power and performance.
Rust proves that's not true. I'm not its greatest fan, but I'm not going to argue that rustc generates slow code. It's a major load off your mind not having to be on constant alert about avoiding footguns.
(It does come at a mental cost of a different kind, but that's an unrelated issue.)
I agree with that! What I was trying to say is that you need a new (or different, not C++) language; applying all this on top of C++ specifically is troublesome.

Smarter compilers work for some of the issues, that is true. Even with those tools, though, I still think trying to overlay this onto C++ is going to end up akin to managed code (specifically Microsoft's managed C++ framework, not the other languages), where it prevents using some of the language.

If all it does is detect UB, then I am wrong. But it looks like it's deeper than that. Maybe I am reading too much into "where only a rigorously safe subset of C++ is allowed."
Last edited on
I read this as "we want to have Rust-like safety assurances, but we want to still work with C++". Then the goal is not to detect UB, but rather to forbid it either by defining previously undefined behavior, or by making operations that lead to UB forbidden. For example, one simple way to eliminate some UB is to not allow pointer arithmetic. If you want to handle arrays then you must know how long they are.

where only a rigorously safe subset of C++ is allowed.
Yes, it's like an inversion of Rust's unsafe. C++ code is already "unsafe" (since it allows doing unsafe things), so you introduce a new "safe" context where only safe operations are allowed, and which adds new language constructs for support.

applying all this on top of C++ specifically is troublesome
Language extensions like this are difficult to pull off, for sure. But you never know. C++ was born out of C like this, after all.
What I'm worried about is that right now there are multiple initiatives like this, it's uncertain which ones will reach a feature-complete state and which ones (if any) will become runaway successes, and it feels like development in this space is progressing too slowly to be useful. I don't think C++ will pass into irrelevance for a long time, but if this takes too long, the standard practice for developing safe code may become "write in Rust what you can and in C++ what you must".
Never understood this idea of fearing the all-powerful "unsafe code". If you're using C++, I don't see many situations where a person is forced to use unsafe code.

C++ doesn't feel any more unsafe to me than, let's say, Python. You could have a Python program that works fine for weeks, then crashes because some code you wrote that it hadn't gotten to yet was incorrect.

You're always allowed to be unsafe, but there are always safe approaches.


The feature I really want in C++ is better error messages that can actually point you to the line of code that caused the issue... C# is really good with that.
If you're using C++, I don't see many situations where a person is forced to use unsafe code.
In the sense being used here, C++ is unsafe, period. It has no facilities to separate safe from unsafe constructs (and in fact doesn't define which is which), and does not require the programmer to use safe constructs only.

C++ doesn't feel anymore unsafe to me than, let's say, Python. You could have a Python program that works fine for weeks, then crashes because some code you wrote that it hadn't gotten to yet was incorrect.
Importantly though, the Python program crashes. It doesn't corrupt its own state and keep going, allowing input data to execute arbitrary code.
Safe code is not about not crashing, it's about predictability. Too many operations in C++ have unpredictable consequences that can't be reasoned about at compile time.

You're always allowed to be unsafe, but there are always safe approaches.
Yes, you always can write correct code instead of incorrect code. The point of safety features is to disallow whole classes of bugs entirely through automated and rigorous checking. It's the same reason nobody recommends using new and delete anymore; you can use them correctly, but it's been proven that over a long enough time frame you'll make a mistake. The compiler will never forget to release your memory or release it twice.
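As a minimal sketch of that trade-off (the Session type here is hypothetical):

```cpp
#include <memory>

struct Session {
    bool open = true;
};

// Manual lifetime: correct as written, but every early return, thrown
// exception, or future edit between new and delete is a chance to
// leak or double-delete.
bool manual() {
    Session* s = new Session;
    bool ok = s->open;
    delete s;
    return ok;
}

// RAII: the unique_ptr's destructor releases the Session exactly once,
// on every path out of the function. The whole class of leak/double-
// free bugs is gone, not just avoided.
bool automatic() {
    auto s = std::make_unique<Session>();
    return s->open;
}
```

The point isn't that `manual()` is wrong, it's that `automatic()` cannot be wrong in that particular way.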
Ignoring idiocy like out-of-bounds indexing or blown pointers or the like, what I would care to see is more like Java, where the same code does the same thing on all reasonably up-to-date compilers, OSes, etc. Currently, C++ code that works great on one compiler and OS can go a bit nuts on another setup... a lot less of that than in "the old days", but plenty of it to go around. If I were to focus on C++ safety, it would be a mix of defining and trapping UB, such that when things go south the program exits gracefully rather than trudging on (potentially running spliced-in hack code), and standardization on the executable side (such that the same code does the same thing, as I said above).

Screw-ups that just crash don't bother me. It's the subtle things, where you can send the instruction pointer off into the ether, that need a lot of work. And to be fair, a lot of that is machine-level... you can fubar the instruction pointer in any program in any language, if the conditions are ripe enough.
Importantly though, the Python program crashes. It doesn't corrupt its own state and keep going, allowing input data to execute arbitrary code.

I suppose this is possible, but usually C++ will crash as well... if you're using safer methods. An array may not crash, but a vector will.

Too many operations in C++ have unpredictable consequences that can't be reasoned about at compile time.

I can't say I remember running into any issues like this, though I haven't used C++ for anything crazy. I have run into such unpredictability issues with x86 though.

Safe code is not about not crashing, it's about predictability.

Biggest reason Javascript is unsafe..
I suppose this is possible, but usually C++ will crash as well... if you're using safer methods. An array may not crash, but a vector will.
That's the thing about UB. All behaviors are permissible.

I can't say I remember running into any issues like this, though I haven't used C++ for anything crazy.
You don't need to do anything crazy. One time I corrupted my memory and crashed somewhere unrelated by switching on an uninitialized variable.

Biggest reason Javascript is unsafe..
JS is safe. Barring bugs in the runtime, a JS program cannot be used to take control of the machine.
You can get JS to write bad files, a batch or shell script or similar, and then run it, which can do whatever. I am not sure how much of that is tolerated via a browser JS script, but a command-line script can do some damage.

I like Node.js. It's a great little language, esp. for parallel processing without the fuss.
Last edited on
The issue with UB is that it's UB. You don't know what's going to happen and what happens can easily change between different compiler versions and between different vendors. If a program with UB seems 'to work' with one version of a compiler, with the next version the program could format the disk - as the behaviour isn't defined it could be anything and still be valid!
You can get JS to write bad files, a batch or shell script or similar, and then run it, which can do whatever.
By that standard no language is safe. You can use any language to write a VM and make it execute untrusted input.
The difference is that in JS you have to explicitly code that capability in, and if it's not there it can't be used. There's a reason eval() is considered a dangerous function. Meanwhile in C++ an attacker can use a buggy program for arbitrary code execution, regardless of what the program did originally. If the program processes untrusted input it's potentially exploitable.

Just like static and strong typing, memory safety is not about making buggy software impossible, it's about making a certain class of bugs impossible.
Years ago MS created "safe" versions of the C stdlib functions that manipulate strings/char arrays; they could still be exploited. C11 officially added safe versions of the functions (in the optional Annex K), distinguishable from the unsafe versions by an _s suffix on the function name, same as what MS did:

https://en.cppreference.com/w/c/string/byte
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/strcpy-s-wcscpy-s-mbscpy-s?view=msvc-170

The C++ version of the C string library continues to offer only the earlier, unsafe C versions.

Even the C stdlib safe versions do not make executables bullet-proof against exploitation by a determined bad actor, or against doing unintended operations; they just theoretically make it harder. And they can still have UB.
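Since Annex K is optional and many C libraries never shipped it, here's a sketch of the contrast (`safe_copy` is a made-up helper), using snprintf as a portable stand-in for the _s behavior:

```cpp
#include <cstdio>
#include <string>

// strcpy(dst, src) would overflow the 8-byte buffer below: undefined
// behavior. Annex K's strcpy_s(dst, sizeof dst, src) would fail and
// report an error instead, but Annex K is optional and many libcs
// never implemented it. A portable stand-in: snprintf never writes
// past the buffer and always NUL-terminates, truncating if needed.
std::string safe_copy(const char* src) {
    char dst[8];
    std::snprintf(dst, sizeof dst, "%s", src);
    return dst;  // at most 7 characters survive, plus the terminator
}
```

Note that truncation is itself a (defined, detectable) failure mode; the _s functions make that explicit by returning an error code instead.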

Look at what is happening with the Windows 11 24H2 update to see what can happen. *ouch!*
One time I corrupted my memory and crashed somewhere unrelated by switching on an uninitialized variable.

Yea, but you crashed! Yes, likely the operating system closed your program, but how much damage are you likely to do on a modern OS? Particularly since such UB programs have been used in cyber attacks, Windows looks out for them.

Maybe that would make a good video - testing how much damage you can do to a computer with UB code.

Barring bugs in the runtime, a JS program cannot be used to take control of the machine.

Maybe not the machine, but could definitely corrupt or crash your browser.

I mean, undefined behavior is just a state in which the code's behavior cannot be predicted - you still have to write such bad code. Is it really that much different than simply writing bad code whose behavior would be defined?

Of course, in this case, the code would theoretically have defined behavior, but the complexity of the code may make it difficult to actually determine what behaviors are possible.


Not that I'd mind a safer C++, but I wouldn't go out of my way to get it.
Yea, but you crashed!
Yeah, somewhere else. An indeterminate amount of time after the corruption happened. That's almost as bad as not crashing ever.

but how much damage are you likely to do on a modern OS?
Forget about getting attacked. The fact that the program continues running after its state is corrupt means the damage it can do to its own (and your) data is unbounded. Imagine a program with corrupted memory that starts issuing write commands to a database with garbage. How long will such a program continue running unsupervised before it's killed or crashes? The answer is: it's unknowable.

Maybe not the machine, but could definitely corrupt or crash your browser.
Again, barring bugs in the runtime, that's impossible, other than by excessive allocation, perhaps. Not sure what kind of policies browsers have in place for that.
An indeterminate amount of time after the corruption happened. That's almost as bad as not crashing ever.

Well, again, you'd have to write the code poorly to get to this point. I don't consider it easy to corrupt your own memory - especially when modern compilers will warn you about using uninitialized variables.

That said, I do recall my 3-star programming professor writing code so inconceivable that somehow the creation of a class variable did not trigger a constructor call (I don't remember the details), and therefore a variable did not get initialized but was still used.

I don't think it was like this, but this example shows how it could happen:

#include <iostream>

class MyClass
{
public:

    int a;

    MyClass()
    {
        a = 0;
        std::cout << "Constructor called" << std::endl;
    }
    void Display() {
        std::cout << "Displaying: " << a << std::endl;
    }
};

int main()
{
    // Allocate raw memory for MyClass
    char buffer[sizeof(MyClass)];

    // Create a pointer to the raw memory. No MyClass object is ever
    // constructed here, so the constructor never runs...
    MyClass* myClassPtr = reinterpret_cast<MyClass*>(buffer);

    // ...and this call reads the uninitialized 'a': undefined behavior.
    myClassPtr->Display();

    return 0;
}


This is one reason I recommend just setting the variables to a default value where you declare them in the class, rather than writing out "a = 0;" in the constructor and wasting your time. I know that wouldn't work for this example, but it did work for the 3-star professor's code, which I don't recall exactly.
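A sketch of that recommendation using default member initializers (the Counter class here is hypothetical):

```cpp
#include <string>

// With default member initializers, every constructor that doesn't
// explicitly set a member falls back to the declared default, so
// there is no "a = 0;" line to forget in each constructor body.
class Counter {
public:
    int a = 0;                   // initialized even by the implicit ctor
    std::string label = "none";

    Counter() = default;                       // a == 0, label == "none"
    explicit Counter(int start) : a(start) {}  // label still defaults
};
```

(It still won't help the reinterpret_cast case above, since there no constructor runs at all.)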

But you don't get to this point by accident; you have to go out of your way to do things the "wrong" (dangerous) way.


Also, I don't think I've used any language that didn't allow for race conditions (undefined behavior) when using multi-threading - arguably the easiest thing to make a mistake on while coding.


I think if you're coding so poorly that you cause serious and damaging undefined behavior, you were probably gonna shoot yourself in the foot with defined behavior too.
Last edited on
heh, that is a good one.
To be fair, raw byte reshapes have always been risky -- it's (part of) why they nerfed unions into oblivion. reinterpret_cast is on par with unioning something, and yeah, you don't get ctor/dtor with those in all cases. I won't do it anymore, with the single exception of casting into and out of raw byte arrays, and then only at great need.

Last edited on
Well, again, you'd have to write the code poorly to get to this point.
If everyone just wrote code perfectly, we wouldn't need any kind of checking at compile time. We add static and strong typing because we know people's judgment will eventually lapse. It's a matter of when, not if. If you can, you will eventually write shitty code that will behave weirdly.

This being one reason I recommend just setting the variables to a default value when you're declaring them in the class rather than writing out "a = 0;" in the constructor and wasting your time.
It doesn't matter. The compiler will put those initializations in the constructor, so if the constructor doesn't get called, they will not run.

Also, I don't think I've used any language that didn't allow for race conditions (undefined behavior) when using multi-threading - arguably the easiest thing to make a mistake on while coding.
Rust doesn't let you, due to the way the borrow checker works. To share mutable state between threads you have to put it behind an Arc and wrap it in a Mutex.

MyClass* myClassPtr = reinterpret_cast<MyClass*>(buffer);
I've done something vaguely like that (only safe-ish, with placement new) one time when I needed to create a bunch of objects and had to allocate everything first and construct it later.
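For comparison, a minimal placement-new sketch (Widget and demo are hypothetical names): unlike the bare reinterpret_cast version above, the constructor actually runs before the object is used:

```cpp
#include <new>  // placement new

struct Widget {
    int id;
    explicit Widget(int i) : id(i) {}
};

// Allocate raw, correctly aligned storage first, then construct into
// it later. Placement new runs the constructor, so the object is
// properly initialized before any member is touched.
int demo() {
    alignas(Widget) unsigned char buffer[sizeof(Widget)];
    Widget* w = new (buffer) Widget(42);  // constructor runs here
    int id = w->id;
    w->~Widget();  // placement new means a manual destructor call;
                   // never 'delete w' -- the storage wasn't heap-new'd
    return id;
}
```

This is the idiom behind "allocate everything, then construct": containers and arenas do exactly this under the hood.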