Undefined behavior and optimizations

C++ programmers want compilers to produce fast code without exploiting undefined behavior. But that is impossible, so (optimizing) compilers opt to produce fast code instead. Hence, to formally verify C++ programs are correct, we must prevent programs from having UB.

This page motivates informally the need for some UB, by showing that exploiting UB is necessary even for obvious optimizations. You are welcome to skip this page, but it will give intuition for other content in this guide.

We assume you are familiar with programming in C++.

A minimal example

We show undefined behavior on an example. For clarity, we pick a small example. This example might seem artificial, but it shows issues that affect real programs.

In the snippet below, the function foo seems equivalent to foo_opt1, but the latter can avoid re-reading x from memory, so we expect optimizers to transform the former into the latter.

int foo() {
  long x = 5;
  bar();
  return x;
}

int foo_opt() {
  bar();
  return 5;
}

Indeed, whatever bar() does, this optimization preserves program behavior, as defined by the C++ standard, and compilers indeed perform it.

However, that optimization changes the program behavior if bar() looks like the following strange example:

void bar() {
  long y = 0;
  long* px = &y + 5; // "5" is platform-specific
  *px = 4;
}

On our test platform2, px will contain the address of x, so without optimizations foo() will return 4; but with the optimization above, foo() returns 5.

We claimed our optimization preserves program behavior, yet it appears to change foo()’s return value; how is that possible? At the same time, this optimization is desirable; and if this small optimization were illegal, would we have a chance to perform more complex optimizations?

As a solution, the C++ standard ensures that bar() contains undefined behavior (UB), and allows programs to implement undefined behavior arbitrarily. In particular, bar() is UB because it takes a pointer to a variable (y), increments it until it has the address of another variable (x), then writes through this pointer.

Footnotes

1

Based on this example.

2

At least on a specific compiler, according to Godbolt.