Commit 9295f7b

Add blog post about C/C++ ease of use.
1 parent 13ed56b commit 9295f7b

1 file changed

+70
-0
lines changed

@@ -0,0 +1,70 @@
1+
---
2+
layout: post
3+
title: "CHERI Myths: Writing C/C++ for CHERI is hard"
4+
date: 2024-08-22
5+
categories: cheri myths
6+
author: David Chisnall
7+
---
8+
9+
I've had several conversations over the past six months where people who have never written C/C++ code on CHERI have told me that they expect it to be harder than on non-CHERI systems.
10+
I struggle a bit to understand this.
11+
If it were true, using tools like [valgrind](https://valgrind.org) and [Address Sanitizer](https://github.com/google/sanitizers/wiki/AddressSanitizer) would make development harder, which makes you wonder why these tools exist.
12+
13+
I recently wrote some C and C++ for a non-CHERI target and, honestly, I can't believe I used to do that regularly given how much harder it is.
14+
Even in environments with a fully working interactive debugger, writing working C/C++ is more effort than on CHERIoT, where we don't (yet) have debugger support.
15+
16+
Imagine you have an off-by-one error that overflows a buffer.
17+
On non-CHERI systems, it's hard to track down.
18+
On the stack, it may be in padding and have no effect.
19+
It may have no effect in debug builds, but cause corruption in release builds where the stack layout is different.
20+
If it's in the heap, it may corrupt some unrelated object and the symptoms show up much later.
21+
I'd run the program under valgrind or address sanitiser and hope to get a useful result.
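
As a rough illustration of the bug described above, here is a minimal C++ sketch of an off-by-one loop silently clobbering a neighbouring byte. The names (`Memory`, `clear_buggy`, `clear_fixed`, `neighbour_survives`) are made up for this example, and the "neighbour" is modelled as the last element of one larger array so that the sketch itself stays within defined behaviour.

```cpp
#include <array>
#include <cstddef>

// Hypothetical layout: bytes 0..7 model an 8-byte buffer and byte 8 models
// whatever unrelated object happens to sit next to it in memory.
constexpr std::size_t BufferSize = 8;
using Memory = std::array<unsigned char, BufferSize + 1>;

// Buggy: `<=` makes the loop write BufferSize + 1 bytes, one past the buffer.
inline void clear_buggy(Memory &memory)
{
	for (std::size_t i = 0; i <= BufferSize; i++)
	{
		memory[i] = 0;
	}
}

// Fixed: `<` stops at the last byte of the buffer.
inline void clear_fixed(Memory &memory)
{
	for (std::size_t i = 0; i < BufferSize; i++)
	{
		memory[i] = 0;
	}
}

// Returns true if the modelled neighbour still holds its original value
// after `clear` has run.
inline bool neighbour_survives(void (*clear)(Memory &))
{
	Memory memory;
	memory.fill(0xAA);
	clear(memory);
	return memory[BufferSize] == 0xAA;
}
```

Nothing complains when `clear_buggy` runs: the only symptom is that `neighbour_survives(clear_buggy)` is false, which is exactly the kind of quiet corruption that shows up much later in a real program.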
22+
23+
On any CHERI target, I get a deterministic fault.
24+
Every time I read or write that one-byte-out-of-bounds value, I get the same fault.
25+
On [CheriBSD](https://www.cheribsd.org), I'd attach the debugger and see where it happened.
26+
On CHERIoT, until we get a working debugger, I'd include the (somewhat poorly named) [`fail-simulator-on-error.h`](https://github.com/CHERIoT-Platform/cheriot-rtos/blob/main/sdk/include/fail-simulator-on-error.h) header, which installs a default error handler.
27+
When the error is triggered, this prints the exact instruction that tried to read or write out of bounds.
28+
I'd then look in the dump file, which would tell me the line number, and fix it.
29+
This typically takes me a minute or two, if that.
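
To make "deterministic fault" concrete, here is a toy software model of a pointer that carries its own bounds and rejects any access outside them. This is only an analogy: CHERI enforces bounds in hardware, not with a C++ class, and `BoundedPointer` and `first_rejected_index` are hypothetical names invented for this sketch.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model only: a pointer that remembers how many elements it may access.
template<typename T>
class BoundedPointer
{
	T          *base;
	std::size_t length;

	public:
	BoundedPointer(T *base, std::size_t length) : base(base), length(length) {}

	// Every access is checked before it touches memory.
	T &at(std::size_t index) const
	{
		if (index >= length)
		{
			throw std::out_of_range("access outside bounds");
		}
		return base[index];
	}
};

// Writes `count` elements through a pointer bounded to `length` elements.
// Returns the index of the first rejected access, or -1 if all succeeded.
inline long first_rejected_index(std::size_t length, std::size_t count)
{
	std::vector<int>    storage(length);
	BoundedPointer<int> pointer(storage.data(), length);
	for (std::size_t i = 0; i < count; i++)
	{
		try
		{
			pointer.at(i) = 1;
		}
		catch (const std::out_of_range &)
		{
			return static_cast<long>(i);
		}
	}
	return -1;
}
```

In this model, writing nine elements into an eight-element buffer is rejected at index 8 every time, before the stray write happens, which mirrors the "same fault, every time" behaviour described above.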
30+
31+
Similarly, if I have a use-after-free error, there's some probability that address sanitiser will find it.
32+
Valgrind is a bit better, but is *very* slow.
33+
On CHERIoT, I get a trap as soon as I try to use the dangling pointer and I fix it in the same way as a spatial error.
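
The same idea can be sketched for temporal errors. The example below is a software analogy only, built on `std::weak_ptr`, and says nothing about how CHERIoT actually implements this; `CheckedHandle` and `dangling_read_is_caught` are hypothetical names. It shows a handle whose first use after the object is freed fails immediately instead of reading stale memory.

```cpp
#include <memory>
#include <stdexcept>

// Toy model only: a handle that checks the object is still live on each use.
class CheckedHandle
{
	std::weak_ptr<int> target;

	public:
	explicit CheckedHandle(const std::shared_ptr<int> &owner) : target(owner) {}

	int read() const
	{
		std::shared_ptr<int> live = target.lock();
		if (!live)
		{
			throw std::runtime_error("use after free");
		}
		return *live;
	}
};

// Reads through the handle, frees the object, then reads again.
// Returns true if the first read worked and the dangling read was rejected.
inline bool dangling_read_is_caught()
{
	std::shared_ptr<int> owner = std::make_shared<int>(42);
	CheckedHandle        handle(owner);
	int                  before = handle.read();
	owner.reset(); // The object is freed here; `handle` now dangles.
	try
	{
		handle.read();
	}
	catch (const std::runtime_error &)
	{
		return before == 42;
	}
	return false;
}
```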
34+
35+
Importantly, the CHERI exception happens *before* any data corruption.
36+
I'm not trying to work backwards from a point where my heap or stack is corrupted to find the place where the corruption occurred; I'm told exactly where the bug is.
37+
The first use of a dangling pointer or the first out-of-bounds access to an object will trigger a CHERI exception and point to precisely the instruction that is doing the wrong thing.
38+
39+
Note that all of this is about *incorrect* code.
40+
CHERI C and C++ try very hard to give you a standards-compliant (and de facto standards-compliant, allowing things that the standard leaves open to implementations but everyone assumes are fine) implementation.
41+
Almost all of the C and C++ code that we've tried to run on CHERIoT has worked with no source-code modifications.
42+
Most of these are well-tested codebases, sometimes MISRA C with loads of static analyses run, which *probably* don't have any memory-safety bugs.
43+
44+
The things that cause CHERI traps are undefined behaviour in C/C++.
45+
When your program does something that is undefined behaviour, the space of possible behaviours is unbounded.
46+
You may get a segmentation fault.
47+
You may get arbitrary data corruption.
48+
You may get a totally unexpected sequence of instructions executed.
49+
Bugs that introduce undefined behaviour are the *hardest* to debug, because they mean that later code (or, in some exciting examples, [earlier code](https://devblogs.microsoft.com/oldnewthing/20140627-00/?p=633)) is all depending on properties that are not true and so can do absolutely anything.
50+
Trapping on these things, rather than corrupting state, is a *huge* improvement to the debugging experience.
51+
52+
If you're writing correct code, you probably won't notice the difference between CHERI and non-CHERI systems.
53+
If you're writing buggy code (which, let's face it, we all do, at least some of the time), CHERI lets you catch errors sooner.
54+
55+
We've heard from several of the companies that prototyped on [Morello](https://www.morello-project.org) that they want to keep their Morello systems for CI for precisely this reason: testing on Morello finds bugs earlier.
56+
57+
The 'shift-left' idea comes from the fact that bugs cost more the later they're found.
58+
If you can avoid bugs at design time, that's perfect.
59+
If you can avoid them before you ship a product, that's good.
60+
If you can detect them in production and recover, that's okay.
61+
If you don't detect them and they impact customers, that's the worst (just ask CrowdStrike).
62+
Developing for a CHERI target makes it easy to find bugs before you ship them.
63+
It typically costs at least one order of magnitude less to fix them at this point than after deployment.
64+
65+
The 'shift-left' benefits for CHERIoT don't end at catching bugs early.
66+
If you compartmentalise your software, failures in production can become *recoverable* failures in production.
67+
For example, the CHERIoT network stack now [restarts the compartment that contains the FreeRTOS TCP/IP stack if it crashes](https://github.com/CHERIoT-Platform/network-stack/pull/27).
68+
From the perspective of the rest of the system, all connections drop (something that you have to handle anyway because networks are unreliable) and need to be reconnected.
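
The restart pattern can be sketched in a few lines. This is a simplified model under stated assumptions, not the CHERIoT network stack's code: `RestartableComponent` is a hypothetical class, a C++ exception stands in for a fault inside the compartment, and "restart" just means resetting the component's state.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>

// Toy model only: a component whose state is reset if an operation faults.
class RestartableComponent
{
	int connections = 0;
	int restarts    = 0;

	public:
	void connect()
	{
		connections++;
	}

	int open_connections() const
	{
		return connections;
	}

	int restart_count() const
	{
		return restarts;
	}

	// Runs `operation` inside the component. If it faults, the component is
	// reset (all connections drop) and the caller is told the call failed.
	bool call(const std::function<void()> &operation)
	{
		try
		{
			operation();
			return true;
		}
		catch (const std::exception &)
		{
			connections = 0;
			restarts++;
			return false;
		}
	}
};
```

A caller that sees `call` return false treats it like any other dropped connection and reconnects, which is logic it needs anyway.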
69+
70+
All of this makes developing and shipping products cheaper on CHERI systems than on conventional hardware.
