Is there a problem with how UB code is optimized?

Hi, I came across the following test case:

int val = 1;
int other_val = 1;
int *p = &other_val;

void func_main(void);
void func_bad();

void func_main() {
  int arr[8];
  if (&val == p)
    func_bad();
  else {
    if (arr[0]) {   // read uninitialized memory
      char f;
    }
  }
}
void func_bad() { *p = 0; }

clang-17.0.6 at -O2 generates the following code for func_main:

func_main:
 movl   $0x0,0x2ed6(%rip)        # 4010 <val>
 ret
 nopl   0x0(%rax,%rax,1)

It seems that the compiler found the UB in the else branch and therefore concluded that the if-condition must be true, i.e., &val == p. So when the compiler inlined the call to func_bad, it simply replaced *p with val, even though p actually points to other_val.

Does this optimization cross a line? Should the compiler try to preserve program semantics as far as possible in the presence of undefined behavior?

This is perfectly normal for UB-based optimizations. If you haven’t seen it already, Chris Lattner wrote a good blog post on how the compiler thinks about these things: "What Every C Programmer Should Know About Undefined Behavior #1/3" on The LLVM Project Blog. It is well worth reading.

There are a couple of things worth pointing out for this code specifically though. First, what’s happening…

What the compiler’s actually doing is noticing that if you go down the else branch then you inevitably execute UB, so the only way the program can possibly be valid is if &val == p. It marks the else path as unreachable and then the usual tidy-up optimizations remove it completely.
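As a rough source-level sketch of that reasoning (not LLVM's actual internal representation), the two stages could be written as below. `__builtin_unreachable()` is a GCC/Clang builtin standing in for the else path that always hits UB, and the names func_main_as_seen and func_main_after_tidy_up are made up for illustration.

```c
int val = 1;
int other_val = 1;
int *p = &other_val;

/* Stage 1 (hypothetical name): the else path always reads uninitialized
   memory, so the optimizer may treat it as never taken. */
void func_main_as_seen(void) {
  if (&val == p)
    *p = 0;                   /* inlined body of func_bad */
  else
    __builtin_unreachable();  /* stands in for the UB-only path */
}

/* Stage 2 (hypothetical name): with the else path gone, p must equal &val,
   so the store through p becomes a direct store to val. */
void func_main_after_tidy_up(void) {
  val = 0;
}
```

Calling func_main_as_seen while p still points at other_val is itself undefined, which is exactly the case the optimizer assumes away.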

This kind of tidy-up is important for templated code, where unused paths are designed to be eliminated, and the compiler has no way to tell that situation apart from a path that is only dead because of UB like this one.

Second, the condition isn’t necessarily false anyway. External code could change p before func_main runs. Even if the function is actually main, you can get global constructors to run first.
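A minimal sketch of that second point, assuming GCC or Clang: `__attribute__((constructor))` is a GNU extension that runs a function before main, much as a C++ global constructor would. The name retarget_p is hypothetical, and the else branch of the original test case is left out so the sketch stays well defined.

```c
int val = 1;
int other_val = 1;
int *p = &other_val;

void func_bad(void) { *p = 0; }

/* Hypothetical startup hook: runs before main (GNU extension) and
   repoints p, so the static initializer no longer tells the whole story. */
__attribute__((constructor))
static void retarget_p(void) {
  p = &val;
}

void func_main(void) {
  if (&val == p)   /* true once retarget_p has run */
    func_bad();    /* stores 0 to val through p */
}
```

In a real program the hook would more likely live in another translation unit or library, where the compiler cannot see it while compiling func_main.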

If you make the pointer int * const p = &other_val so that can’t happen, LLVM spots this and makes the entire function unreachable, since both branches are impossible. That is not useful in this small example, but it can lead to further optimizations in code that calls func_main.
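To illustrate the const variant, here is a small sketch with the UB-carrying else branch left out so that it can be run safely. The helper name p_points_to_val is hypothetical. Because p can no longer be reassigned, the comparison has a fixed answer.

```c
int val = 1;
int other_val = 1;
int *const p = &other_val;  /* the pointer is const; the int it points to is not */

/* Hypothetical helper: p is fixed at &other_val, and val and other_val are
   distinct objects, so this comparison is always false (returns 0). */
int p_points_to_val(void) {
  return &val == p;
}
```

With the if path ruled out by the constant condition and the else path ruled out by the UB, nothing in the original func_main can legally execute.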