Undefined behavior and optimizations
C++ programmers want compilers to produce fast code without exploiting undefined behavior. But that is impossible, so (optimizing) compilers opt to produce fast code instead. Hence, to formally verify C++ programs are correct, we must prevent programs from having UB.
This page motivates informally the need for some UB, by showing that exploiting UB is necessary even for obvious optimizations. You are welcome to skip this page, but it will give intuition for other content in this guide.
We assume you are familiar with programming in C++.
A minimal example
We show undefined behavior on an example. For clarity, we pick a small example. This example might seem artificial, but it shows issues that affect real programs.
In the snippet below, the function foo
seems equivalent to
foo_opt
1, but the latter can avoid re-reading x
from
memory, so we expect optimizers to transform the former into the latter.
int foo() {
long x = 5;
bar();
return x;
}
int foo_opt() {
bar();
return 5;
}
Indeed, whatever bar()
does, this optimization preserves program behavior, as
defined by the C++ standard, and compilers indeed perform it.
However, that optimization changes the program behavior if bar()
looks like
the following strange example:
void bar() {
long y = 0;
long* px = &y + 5; // "5" is platform-specific
*px = 4;
}
On our test platform2, px
will contain the address of x
,
so without optimizations foo()
will return 4
; but with the optimization
above, foo()
returns 5
.
We claimed our optimization preserves program behavior, yet it appears to change
foo()
’s return value; how is that possible?
At the same time, this optimization is desirable; and if this small optimization
were illegal, would we have a chance to perform more complex optimizations?
As a solution, the C++ standard ensures that bar()
contains undefined
behavior (UB), and allows programs to implement undefined behavior arbitrarily.
In particular, bar()
is UB because it takes a pointer to a variable (y
),
increments it until it has the address of another variable (x
), then writes
through this pointer.
Footnotes
- 1
Based on this example.
- 2
At least on a specific compiler, according to Godbolt.