bug or expected behaviour?

I've been chasing down a bug for a few days in some RTOS code I'm building with clang. In the end it comes down to this code:

  void external(void);
  void test(char *x)
  {
      x--;
      if (!x)
      {
          external();
      }
  }

When compiled for ARM with TOT clang (r183249) at -O1 or greater, no calls to 'external()' appear in the output code at all. I have an example project I can send out if anyone's interested, but it's just the above source file and this makefile:

  .PHONY: all optimized unoptimized clean

  all: optimized unoptimized
    grep external *.s

  optimized:
    ./clang -S -O1 example.c -o example-optimized.s

  unoptimized:
    ./clang -S -O0 example.c -o example-unoptimized.s

  clean:
    rm -f *.s

When run, I get this output:

  $ make
  ./clang -S -O1 example.c -o example-optimized.s
  ./clang -S -O0 example.c -o example-unoptimized.s
  grep external *.s
  example-unoptimized.s: bl external

As you can see, 'example-optimized.s' contains no references to the 'external()' function. Am I missing something subtle here standard-wise? Or is this an optimizer bug? If it's a bug, what's the right component to file it in?

I've attached the complete assembly output (both example-optimized.s and example-unoptimized.s) in case that's helpful. Thanks for any insight!

Oh - almost forgot - version information from the compiler:

  $ ./clang --version
  clang version 3.4 (trunk 183249)
  Target: arm-none--eabi
  Thread model: posix

example-optimized.s (273 Bytes)

example-unoptimized.s (479 Bytes)

If this were a problem with an omitted statement involving a normal variable, I’d guess you’re missing a volatile qualifier. I’m not 100% sure volatile is a valid qualifier for functions, but try it.
If RTOS stands for real time OS, then reading up on volatile would be a really good idea.

P.S. Sorry Carl, you’re going to receive this twice. I forget to CC the list.

If this were a problem with an omitted statement involving a normal variable, I'd guess you're missing a volatile qualifier. I'm not 100% sure volatile is a valid qualifier for functions, but try it.

Well, yes, if I change the signature to:

  void test(char * volatile x)

It works, but that's because I'm hamstringing the optimizers. I don't really see how that has anything to do with the question, though. If I change the signature to:

        void test(int x)

It works too... what's special about 'char *'?

If RTOS stands for real time OS, then reading up on volatile would be a really good idea.

I'm familiar with 'volatile' semantics, thanks.

P.S. Sorry Carl, you're going to receive this twice. I forget to CC the list.

No problem.

-- Carl

I was suggesting to add it to the function, like
volatile void func(…);
Theoretically, this would tell the compiler not to omit seemingly superfluous calls to func.

If this were a problem with an omitted statement involving a normal variable, I’d guess you’re missing a volatile qualifier. I’m not 100% sure volatile is a valid qualifier for functions, but try it.

Well, yes, if I change the signature to:

void test(char * volatile x)

It works, but that’s because I’m hamstringing the optimizers. I don’t really see how that has anything to do with the question, though. If I change the signature to:

void test(int x)

It works too… what’s special about ‘char *’?

If RTOS stands for real time OS, then reading up on volatile would be a really good idea.

I’m familiar with ‘volatile’ semantics, thanks.

P.S. Sorry Carl, you’re going to receive this twice. I forget to CC the list.

No problem.

– Carl

'volatile' can't apply to a function, so I'm not sure what you mean. In your example, 'volatile' modifies the 'void'. So I think "theoretically", it doesn't do anything at all.

-- Carl

I tried the “extern” specifier, which (I guess) you should use if the definition isn’t in the file; and it worked with -O3.

Input code:
extern void external(void);
void test(char *x)
{
x–;
if (!x)
{
external();
}
}

Command:
clang -S -o volatile-test.s -O3 test.c

Output attached.

The only line of interest is probably:
jmp external # TAILCALL

extern-test.s (326 Bytes)

I tried the "extern" specifier, which (I guess) you should use if the definition isn't in the file; and it worked with -O3.

That 'extern' doesn't do anything - it's implicit. Did you try without it and get different results? In my test here with the Mac OS X dev tools version of clang, I got the expected results, both for x86 and for ARM. I guess it's possible it's ARM related. What version of clang/llvm did you use for your test?

-- Carl

Hi Carl,

I don't know much about the specifics of any given optimisation, but if you do something like the following, you can at least see which optimisation pass is responsible.
At the very least perhaps that can point to the next place to investigate. (also interesting to note Apple's shipped clang doesn't exhibit this behaviour, I had to use my own recent svn tree)

$ clang -arch arm -emit-llvm -S -O0 example.c -o arm.bc
$ opt -print-after-all -O1 arm.bc

I can see where the branch test is whittled down to an 'if true' and then eliminated, but cannot comment as to why this might be.

Cheers,
-DavidM

From reading C11, I can posit a potential spec explanation:

Start with x--. Per C11:
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

The optimizer can therefore conclude that if this program has well-defined behavior, then x can never point to the null pointer constant (since the null pointer constant is not part of any array object). As a result, the "if (!x)" branch would never trigger, and is dead code.

So this doesn't look like an invalid optimization per C11, but that doesn't mean that it's necessarily desired behavior in the optimizer.

Nope - rebuilding clang/llvm to get intel support shows the same behaviour with both '-arch i386' and '-arch x86_64', too.

-- Carl

Hi Carl,

I don't know much about the specifics of any given optimisation, but if you do something like the following, you can at least see which optimisation pass is responsible.

Thanks for the example; it's going to take me a while to figure out what the output really means, though!

At the very least perhaps that can point to the next place to investigate. (also interesting to note Apple's shipped clang doesn't exhibit this behaviour, I had to use my own recent svn tree)

Yeah, I noticed that as well, so it's definitely a behaviour change.

$ clang -arch arm -emit-llvm -S -O0 example.c -o arm.bc
$ opt -print-after-all -O1 arm.bc

I can see where the branch test is whittled down to an 'if true' and then eliminated, but cannot comment as to why this might be.

I think I see that too. I'll have to stare at that output some more to see if it yields any insight.

-- Carl

Thanks Joshua,

A bit of spec reading made me think something like that was going on. In the land of embedded systems, zero is often a valid pointer, though, so being able to avoid this kind of optimization would be nice. That said, if that's the way clang is going to be, it's certainly possible to work around.

In this specific case, the OS vendor is doing something (IMHO) terrible - aliasing a 'char *' field of a queue structure via a #define statement to act as a recursive mutex counter. Something like:

    #define recursive_mutex_counter some_field_not_used_by_a_mutex
    struct queue
    {
       char *some_field_not_used_by_a_mutex;
    };

    void mutexReleaseRecursive(struct queue *q)
    {
        q->recursive_mutex_counter--;

        if (q->recursive_mutex_counter == 0)
            queue_push_back(q, some_dummy_variable);
    }

Which, as I'm sure we can all agree here is heinous. It's easily worked around without changing the vendor's funny-business overloading:

    q->recursive_mutex_counter = (char *)((uint32_t)q->recursive_mutex_counter - 1);

And I can send them a patch to that effect. I guess what I'm looking for is either for clang to make it possible to avoid this kind of optimization, to change the optimizer so it doesn't happen in the first place, or for someone on the list to give me a canonical answer that clang is going to continue behaving this way in the future, so I have something to point to when the OS guys try to push back (I've found other standard-violating behaviour that they've declined to fix).

Any input or suggestions from the list are welcome and appreciated!

The optimizer can therefore conclude that if this program has well-defined behavior, then x can never point to the null pointer constant (since the null pointer constant is not part of any array object). As a result, the "if (!x)" branch would never trigger, and is dead code.

This is correct: in C you can't create a null pointer by decrementing a valid pointer. The code in question is dangerous and wrong, and needs to be reviewed to look for other similar problems.

John

OK, cool, thanks.

Why no warning or static analyzer error? Other "this comparison is always true" or "this comparison is always false" warnings exist, right?

-- Carl

There’s no warning in the frontend, because this is not “locally” obvious – we would need (very simple, in this case) reasoning about control flow to see this, and we don’t have many such checks, because they’re significantly more expensive than local checks.

There’s no warning from the middle-end optimizers which remove the check, because we don’t like the optimizer to produce warnings (they have stability issues and generally give a very poor diagnostic experience).

The static analyzer would be a good place to warn on this, but presumably it’s just not been taught to issue this warning yet.

If anyone cares to back me up, I filed a bug and submitted a patch to the OS in question, found here:

  FreeRTOS Real Time Kernel (RTOS) / Bugs / #66 Fix pointer arithmetic misbehaviour in queue.c

-- Carl