Thumb-2 code generation error in Apple LLVM at all optimization levels

This would be best reported to Apple's Radar bug database at
http://bugreport.apple.com/ but its whole website has been down for a
while.

I have a 100% reproducible Thumb-2 code generation error that occurs
at all of the levels of optimization available in the Xcode 4.2 for
Snow Leopard build settings GUI: -O0, -O1, -O2, -O3 and -Os.

However the bad machine code only occurs in Release builds, never in
Debug builds! I tried the Debug builds at all levels of optimization
as well.

   $ xcodebuild -version
   Xcode 4.2
   Build version 4C199

   $ /Developer/Platforms/iPhoneOS.platform/Developer/usr/bin/clang --version
   Apple clang version 3.0 (tags/Apple/clang-211.9) (based on LLVM 3.0svn)
   Target: i386-apple-darwin10.8.0
   Thread model: posix

I'm not real clear where to find the part of the toolchain that emits
the Thumb-2 assembly, so I can't tell you that tool's precise version.

   $ uname -a
   Darwin frylock.local 10.8.0 Darwin Kernel Version 10.8.0:
   Tue Jun 7 16:33:36 PDT 2011;
   root:xnu-1504.15.3~1/RELEASE_I386 i386

Xcode's iPhone and iPad Simulators run iOS Apps that on my 32-bit
MacBook Pro are built as i386 code. The iOS frameworks (shared
libraries, sort of) that simulated Apps link to are actually shims
that interface to Mac OS X's frameworks.

The i386 code for my simulated App is generated correctly at -Os for
both Release and Debug builds. That suggests that the problem is in
the Thumb-2 code generation back-end, and not in the LLVM IR.

I've seen lots of reports that the Thumb code that the Apple LLVM
compiler generates for ARMv6 is quite buggy, so that one must disable
Thumb code generation for ARMv6 targets. However my first-generation
iPad has a Cortex-A8 CPU, which is ARMv7, as does my iPhone 4.

It's quite possible that disabling Thumb code generation for at least
this one source file will correct the bad machine code, but Google has
not blessed me with the insight as to how to do that. It's not done
the same way for LLVM as for GCC. Have any of you this insight to
spare?

It's going to take me a little while to cook up a minimal test case as
I was up all night <strike>trolling the Internet</strike> working on
my iOS App, so I'm pretty beat. But when I have more details for you,
I will post a more detailed report as well as a minimal test case that
builds as a complete iOS App at what is now just a placeholder page:

   Apple Xcode 4.2 LLVM Compiler Bug Reports
   http://www.dulcineatech.com/bug-reports/xcode/4.2/llvm/

My App Warp Life is so named because it goes very, very fast, with
many more optimizations coming soon. The UI has a speed control
slider whose value is scaled, then passed to the usleep() iOS system
call. usleep() suspends the process for the given number of
microseconds.

I realized just recently that calling usleep with delays that
themselves are insignificant might actually slow my App down quite a
bit, because there is all manner of overhead to making and returning
from even the most trivial system calls. After measuring my game's
frame rate at the best optimizations I could find, for various kinds
of test data, I set a threshold of 1/250th of a second. I never call
usleep() if the configured delay setting is less than that.

The full source of the entire method, and the Release and Debug build
assembly codes are at the end of this mail. For clarity I show only
the pertinent lines of code right here:

   useconds_t usecs = (useconds_t)( self.delay * (float)500000 );
    
   if ( usecs >= 4000 ){ // ~ 1/250 sec
      usleep( usecs ); // usecs is ZERO!!!!
   }

self.delay is an Objective-C 2.0 property that holds the current value
of the speed slider. When set to maximum speed, usecs will always be
zero. Even so, the branch is ALWAYS taken, despite the source code
ensuring that the branch is only taken when usecs is greater than or
equal to four thousand.
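
As a rough sketch of the intent (not the App's actual code), the scaling and the threshold test can be written in plain C. The function and macro names here are made up for illustration, and the sketch assumes the slider yields a value between 0.0 and 1.0, which the post does not state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the scale factor and threshold come from the post. */
#define DELAY_SCALE_USECS 500000.0f /* assumed: slider value 1.0 maps to 0.5 s */
#define MIN_SLEEP_USECS   4000u     /* 1,000,000 us / 250 = 4000 us */

/* Scale a slider value to a delay in microseconds, truncating toward zero. */
uint32_t delay_to_usecs(float delay)
{
    return (uint32_t)(delay * DELAY_SCALE_USECS);
}

/* True when the delay is long enough to be worth a usleep() call. */
bool should_sleep(float delay)
{
    return delay_to_usecs(delay) >= MIN_SLEEP_USECS;
}
```

With the slider at maximum speed (a delay of 0.0), delay_to_usecs() gives 0 and should_sleep() is false, so a correct build should never reach usleep().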

Here is the Thumb-2 assembly for the Release build.

I think the (float)500000 delay scaling factor is meant to be held in
floating point register d8. I thought at first it might not be
initialized at all, but upon closer examination I think it may
actually be initialized from a program counter-relative 32-bit .long
constant immediately following my method's code.

  .loc 1 388 3
  ldr r0, [r5]
  ldr r1, [r4, r0]
  adds r1, #1
  str r1, [r4, r0]
  .loc 1 390 64
  mov r0, r4
  ldr r1, [r6]
  blx _objc_msgSend
  vmov s0, r0
  vmul.f32 d0, d0, d8
  vcvt.u32.f32 d0, d0
  vmov r0, s0
Ltmp272:
  .loc 1 392 9
  cmp.w r0, #4000
Ltmp273:
  .loc 1 393 13
  it hs
  blxhs _usleep
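
The guess about the .long constant can be checked on any host by reinterpreting its bits as a single-precision float. This is a minimal sketch; float_from_bits is a hypothetical helper name, and it assumes the host uses 32-bit IEEE 754 floats.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a single-precision float.
   Assumes float is a 32-bit IEEE 754 type on the host. */
float float_from_bits(uint32_t bits)
{
    float value;
    memcpy(&value, &bits, sizeof value); /* well-defined way to pun types in C */
    return value;
}
```

Passing 1223959552 (0x48F42400), the value in the .long that follows the method in the full listing, gives exactly 500000.0f. That matches the scale factor in the source, so the register is loaded with the intended constant.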

cmp.w *looks* like a 16-bit comparison with an immediate constant, but
in reality the constant is twelve bits. The ARM and Thumb instruction
sets have quite severe restrictions on the allowed ranges of immediate
values because the richness of the ARM and Thumb instruction set makes
it hard to find enough bits in the instruction words to express a
wider range of immediate values than is presently possible.

I don't know what the "it hs" instruction does. I suspect that's
where the problem lies, but "it" is a very common word, and "hs" is
quite common as well, as it is a frequent misspelling for "has".
Perhaps someone who knows Thumb-2 assembly better than I do could
comment.

The assembly for my Debug build is quite unlike that for the Release
build, for every single one of the available optimization levels.
There are quite a few instructions separating the load of the #4000
immediate into r0 and the call to usleep().

I have not yet ensured that there aren't build configuration
differences between my Debug and Release builds, but I don't recall
setting any. My guess is that the totally different machine code in
Debug is there to make source code debugging work better.

Here is my method's full Objective-C source:

- (void) cycleContinuously
{
  startDate = [[NSDate alloc] init];
  generation = 0;
  
  while ( mRunning ){
    [self cycle];
    
    ++generation;
    
    useconds_t usecs = (useconds_t)( self.delay * (float)500000 );
    
    if ( usecs >= 4000 ){ // ~ 1/250 sec
      usleep( usecs );
    }
  }
    
  NSDate *endDate = [[NSDate alloc] init];
  
  NSTimeInterval elapsed = [endDate timeIntervalSinceDate: startDate];
  
  [startDate release];
  [endDate release];
  
  printf( "Speed: %f gen/sec\n", ( (float)generation ) / elapsed );

  return;
}

The assembly for the problem area of my code is completely identical
for each available optimization setting for Release builds. I haven't
made such detailed comparisons for the Debug builds yet.

Here is the Release assembly at -Os:

  .align 2
  .code 16
  .thumb_func "-[LifeGrid cycleContinuously]"
"-[LifeGrid cycleContinuously]":
Ltmp265:
Lfunc_begin24:
  .loc 1 380 0
  .loc 1 380 1 prologue_end
  push {r4, r5, r6, r7, lr}
  add r7, sp, #12
  push.w {r8, r10, r11}
  vpush {d8}
  sub sp, #4
  .loc 1 382 2
Ltmp266:
  movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_0+4))
Ltmp267:
  mov r4, r0
Ltmp268:
  movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_0+4))
  movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_1+4))
  movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_1+4))
LPC24_0:
  add r1, pc
LPC24_1:
  add r0, pc
  ldr r1, [r1]
  ldr r0, [r0]
  blx _objc_msgSend
  movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_2+4))
  movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_2+4))
LPC24_2:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  movw r11, :lower16:(_OBJC_IVAR_$_LifeGrid.startDate-(LPC24_3+4))
  movt r11, :upper16:(_OBJC_IVAR_$_LifeGrid.startDate-(LPC24_3+4))
LPC24_3:
  add r11, pc
  ldr.w r1, [r11]
  .loc 1 383 2
  movw r5, :lower16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_4+4))
  movt r5, :upper16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_4+4))
LPC24_4:
  add r5, pc
  .loc 1 382 2
  str r0, [r4, r1]
  movs r1, #0
  .loc 1 383 2
  ldr r0, [r5]
  .loc 1 385 2
  movw r8, :lower16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_5+4))
  movt r8, :upper16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_5+4))
LPC24_5:
  add r8, pc
  .loc 1 383 2
  str r1, [r4, r0]
  .loc 1 385 2
  ldr.w r0, [r8]
  ldrb r0, [r4, r0]
  cbz r0, LBB24_3
Ltmp269:
  .loc 1 386 3
  movw r10, :lower16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_6+4))
  vldr.32 s16, LCPI24_0
  movt r10, :upper16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_6+4))
  .loc 1 390 64
  movw r6, :lower16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_7+4))
  movt r6, :upper16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_7+4))
  .loc 1 386 3
LPC24_6:
  add r10, pc
  .loc 1 390 64
LPC24_7:
  add r6, pc
LBB24_2:
Ltmp270:
  .loc 1 386 3
  ldr.w r1, [r10]
Ltmp271:
  mov r0, r4
  blx _objc_msgSend
  .loc 1 388 3
  ldr r0, [r5]
  ldr r1, [r4, r0]
  adds r1, #1
  str r1, [r4, r0]
  .loc 1 390 64
  mov r0, r4
  ldr r1, [r6]
  blx _objc_msgSend
  vmov s0, r0
  vmul.f32 d0, d0, d8
  vcvt.u32.f32 d0, d0
  vmov r0, s0
Ltmp272:
  .loc 1 392 9
  cmp.w r0, #4000
Ltmp273:
  .loc 1 393 13
  it hs
  blxhs _usleep
Ltmp274:
  .loc 1 385 2
  ldr.w r0, [r8]
  ldrb r0, [r4, r0]
  cmp r0, #0
  bne LBB24_2
LBB24_3:
Ltmp275:
  .loc 1 382 2
  movw r0, :lower16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_8+4))
  movt r0, :upper16:(L_OBJC_SELECTOR_REFERENCES_7-(LPC24_8+4))
LPC24_8:
  add r0, pc
  .loc 1 397 41
  ldr r1, [r0]
Ltmp276:
  .loc 1 382 2
  movw r0, :lower16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_9+4))
  movt r0, :upper16:(L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_9+4))
LPC24_9:
  add r0, pc
  .loc 1 397 41
  ldr r0, [r0]
  blx _objc_msgSend
  .loc 1 382 2
  movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_10+4))
  movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_-(LPC24_10+4))
LPC24_10:
  add r1, pc
  .loc 1 397 41
  ldr r1, [r1]
  blx _objc_msgSend
  .loc 1 399 69
  movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_66-(LPC24_11+4))
  movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_66-(LPC24_11+4))
  .loc 1 397 41
  mov r6, r0
  .loc 1 399 69
  ldr.w r0, [r11]
LPC24_11:
  add r1, pc
  ldr r1, [r1]
  ldr r2, [r4, r0]
  mov r0, r6
  blx _objc_msgSend
  str r0, [sp]
  .loc 1 401 2
  movw r8, :lower16:(L_OBJC_SELECTOR_REFERENCES_68-(LPC24_12+4))
  movt r8, :upper16:(L_OBJC_SELECTOR_REFERENCES_68-(LPC24_12+4))
  ldr.w r0, [r11]
LPC24_12:
  add r8, pc
  .loc 1 399 69
  mov r10, r1
  .loc 1 401 2
  ldr.w r1, [r8]
  ldr r0, [r4, r0]
  blx _objc_msgSend
  .loc 1 402 2
  ldr.w r1, [r8]
  mov r0, r6
  blx _objc_msgSend
  .loc 1 404 2
  ldr r0, [r5]
  add r0, r4
  vldr.32 s0, [r0]
  vcvt.f32.s32 d0, d0
  .loc 1 399 69
  ldr r0, [sp]
  vmov d17, r0, r10
Ltmp277:
  .loc 1 404 2
  movw r0, :lower16:(L_.str69-(LPC24_13+4))
  movt r0, :upper16:(L_.str69-(LPC24_13+4))
  vcvt.f64.f32 d16, s0
LPC24_13:
  add r0, pc
  vdiv.f64 d16, d16, d17
  vmov r1, r2, d16
  blx _printf
Ltmp278:
  .loc 1 407 1
  add sp, #4
  vpop {d8}
  pop.w {r8, r10, r11}
  pop {r4, r5, r6, r7, pc}
Ltmp279:
  .align 2
LCPI24_0:
  .long 1223959552
Ltmp280:
Lfunc_end24:
Ltmp281:
Leh_func_end24:

Here is the Debug assembly at -Os:

  .align 2
  .code 16
  .thumb_func "-[LifeGrid cycleContinuously]"
"-[LifeGrid cycleContinuously]":
Ltmp112:
Lfunc_begin24:
  .loc 1 380 0
  push {r4, r7, lr}
  add r7, sp, #4
  sub sp, #44
  mov r4, sp
  bic r4, r4, #7
  mov sp, r4
  movs r2, #0
  movt r2, #0
  str r0, [sp, #40]
  str r1, [sp, #36]
  .loc 1 382 2 prologue_end
Ltmp113:
  ldr.n r0, LCPI24_4
LPC24_4:
  add r0, pc
  ldr r0, [r0]
  ldr.n r1, LCPI24_3
LPC24_3:
  add r1, pc
  ldr r1, [r1]
  str r2, [sp, #12]
  blx _objc_msgSend
  ldr.n r1, LCPI24_2
LPC24_2:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  ldr r1, [sp, #40]
  ldr.n r2, LCPI24_1
LPC24_1:
  add r2, pc
  ldr r2, [r2]
  add r1, r2
  str r0, [r1]
  .loc 1 383 2
  ldr r0, [sp, #40]
  ldr.n r1, LCPI24_0
LPC24_0:
  add r1, pc
  ldr r1, [r1]
  add r0, r1
  ldr r1, [sp, #12]
  str r1, [r0]
LBB24_1:
  .loc 1 385 2
  ldr r0, [sp, #40]
  movw r1, :lower16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_14+4))
  movt r1, :upper16:(_OBJC_IVAR_$_LifeGrid.mRunning-(LPC24_14+4))
LPC24_14:
  add r1, pc
  ldr r1, [r1]
  ldrb r0, [r0, r1]
  movs r1, #0
  cmp r0, #0
  it ne
  movne r1, #1
  tst.w r1, #1
  beq LBB24_5
  movw r0, #4000
  movt r0, #0
  .loc 1 386 3
Ltmp114:
  ldr r1, [sp, #40]
  movw r2, :lower16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_15+4))
  movt r2, :upper16:(L_OBJC_SELECTOR_REFERENCES_59-(LPC24_15+4))
LPC24_15:
  add r2, pc
  ldr r2, [r2]
  str r0, [sp, #8]
  mov r0, r1
  mov r1, r2
  blx _objc_msgSend
  .loc 1 388 3
  ldr r0, [sp, #40]
  movw r1, :lower16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_16+4))
  movt r1, :upper16:(_OBJC_IVAR_$_LifeGrid.generation-(LPC24_16+4))
LPC24_16:
  add r1, pc
  ldr r1, [r1]
  mov r2, r1
  ldr r2, [r0, r2]
  adds r2, #1
  str r2, [r0, r1]
  .loc 1 390 64
  ldr r0, [sp, #40]
Ltmp115:
  movw r1, :lower16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_17+4))
  movt r1, :upper16:(L_OBJC_SELECTOR_REFERENCES_64-(LPC24_17+4))
LPC24_17:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  vmov s0, r0
  vmov.f64 d1, d16
  vldr.32 s1, LCPI24_14
  vmov.f64 d2, d1
  vmov.f32 s4, s1
  vmov.f64 d3, d1
  vmov.f32 s6, s0
  vmul.f32 d16, d3, d2
  vmov.f64 d2, d16
  vmov.f32 s0, s4
  vmov.f32 s2, s0
  vcvt.u32.f32 d16, d1
  vmov.f64 d1, d16
  vmov.f32 s0, s2
  vmov r0, s0
  str r0, [sp, #32]
  .loc 1 392 9
  ldr r0, [sp, #32]
  ldr r1, [sp, #8]
  cmp r0, r1
  blo LBB24_4
  .loc 1 393 13
Ltmp116:
  ldr r0, [sp, #32]
  bl _usleep
  str r0, [sp, #4]
Ltmp117:
LBB24_4:
  .loc 1 395 2
  b LBB24_1
Ltmp118:
LBB24_5:
  .loc 1 397 41
  ldr.n r0, LCPI24_13
LPC24_13:
  add r0, pc
  ldr r0, [r0]
  ldr.n r1, LCPI24_12
LPC24_12:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  ldr.n r1, LCPI24_11
LPC24_11:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  str r0, [sp, #28]
  .loc 1 399 69
  ldr r0, [sp, #28]
  ldr r1, [sp, #40]
  ldr.n r2, LCPI24_10
LPC24_10:
  add r2, pc
  ldr r2, [r2]
  add r1, r2
  ldr r2, [r1]
  ldr.n r1, LCPI24_9
LPC24_9:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  vmov d16, r0, r1
  vstr.64 d16, [sp, #16]
  .loc 1 401 2
  ldr r0, [sp, #40]
  ldr.n r1, LCPI24_8
LPC24_8:
  add r1, pc
  ldr r1, [r1]
  add r0, r1
  ldr r0, [r0]
  ldr.n r1, LCPI24_7
LPC24_7:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  .loc 1 402 2
  ldr r0, [sp, #28]
  ldr.n r1, LCPI24_6
LPC24_6:
  add r1, pc
  ldr r1, [r1]
  blx _objc_msgSend
  .loc 1 404 2
  ldr r0, [sp, #40]
  ldr.n r1, LCPI24_5
LPC24_5:
  add r1, pc
  ldr r1, [r1]
  add r0, r1
  ldr r0, [r0]
  vmov s0, r0
  vcvt.f32.s32 s0, s0
  vcvt.f64.f32 d16, s0
  vldr.64 d17, [sp, #16]
  vdiv.f64 d16, d16, d17
  vmov r1, r2, d16
  movw r0, :lower16:(L_.str69-(LPC24_18+4))
  movt r0, :upper16:(L_.str69-(LPC24_18+4))
LPC24_18:
  add r0, pc
  blx _printf
  .loc 1 407 1
  str r0, [sp]
  subs r4, r7, #4
  mov sp, r4
  pop {r4, r7, pc}
  .align 2
LCPI24_0:
  .long _OBJC_IVAR_$_LifeGrid.generation-(LPC24_0+4)
  .align 2
LCPI24_1:
  .long _OBJC_IVAR_$_LifeGrid.startDate-(LPC24_1+4)
  .align 2
LCPI24_2:
  .long L_OBJC_SELECTOR_REFERENCES_-(LPC24_2+4)
  .align 2
LCPI24_3:
  .long L_OBJC_SELECTOR_REFERENCES_7-(LPC24_3+4)
  .align 2
LCPI24_4:
  .long L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_4+4)
  .align 2
LCPI24_5:
  .long _OBJC_IVAR_$_LifeGrid.generation-(LPC24_5+4)
  .align 2
LCPI24_6:
  .long L_OBJC_SELECTOR_REFERENCES_68-(LPC24_6+4)
  .align 2
LCPI24_7:
  .long L_OBJC_SELECTOR_REFERENCES_68-(LPC24_7+4)
  .align 2
LCPI24_8:
  .long _OBJC_IVAR_$_LifeGrid.startDate-(LPC24_8+4)
  .align 2
LCPI24_9:
  .long L_OBJC_SELECTOR_REFERENCES_66-(LPC24_9+4)
  .align 2
LCPI24_10:
  .long _OBJC_IVAR_$_LifeGrid.startDate-(LPC24_10+4)
  .align 2
LCPI24_11:
  .long L_OBJC_SELECTOR_REFERENCES_-(LPC24_11+4)
  .align 2
LCPI24_12:
  .long L_OBJC_SELECTOR_REFERENCES_7-(LPC24_12+4)
  .align 2
LCPI24_13:
  .long L_OBJC_CLASSLIST_REFERENCES_$_62-(LPC24_13+4)
  .align 2
LCPI24_14:
  .long 1223959552
Ltmp119:
Lfunc_end24:
Ltmp120:
Leh_func_end24:

Man I gotta catch some ZZZs, I'm totally thrashed. I'll do my best
just to take a little nap, but the chances are pretty good I won't get
outta bed until Monday!

> cmp.w *looks* like a 16-bit comparison with an immediate constant, but
> in reality the constant is twelve bits. The ARM and Thumb instruction
> sets have quite severe restrictions on the allowed ranges of immediate
> values because the richness of the ARM and Thumb instruction set makes
> it hard to find enough bits in the instruction words to express a
> wider range of immediate values than is presently possible.

This is not quite right. It does have a 12-bit immediate field, but it is decomposed into an 8-bit base immediate and a 4-bit right-rotate value. Your example of #4000 is encoded as a base value of 0xfa and a rotate of 0xe, which is correct.
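
The arithmetic behind that encoding can be sketched in C. This assumes the classic ARM scheme, in which the rotation amount is twice the 4-bit field; the Thumb-2 encoding of cmp.w arranges its bits differently, so treat it as an illustration of the base-and-rotate idea rather than a decoder for the instruction above. The function names are hypothetical.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is taken modulo 32). */
uint32_t ror32(uint32_t value, unsigned n)
{
    n &= 31u;
    return n ? (value >> n) | (value << (32u - n)) : value;
}

/* Expand an 8-bit base and a 4-bit rotate field into a 32-bit constant.
   Assumption: the rotation is 2 * rot4 bits, as in ARM-mode encodings. */
uint32_t expand_imm(uint32_t base8, uint32_t rot4)
{
    return ror32(base8 & 0xFFu, 2u * (rot4 & 0xFu));
}
```

expand_imm(0xfa, 0xe) rotates 0xfa right by 28 bits, which is the same as shifting it left by 4, giving 0xfa0, or 4000.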

> I don't know what the "it hs" instruction does. I suspect that's
> where the problem lies, but "it" is a very common word, and "hs" is
> quite common as well, as it is a frequent misspelling for "has".
> Perhaps someone who knows Thumb-2 assembly better than I do could
> comment.

The IT instruction is how you express predication in Thumb2. Unlike ARM instructions, where the predicate is part of the instruction, Thumb2 instructions use IT to set the predicates for following instructions. In this case, it applies the "hs" predicate to the subsequent call to _usleep. I'd have to double check, but I'm fairly confident that the hs condition code is equivalent to >= for integers.
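
To make the flag logic concrete, here is a small C model of the cmp / it hs / blxhs sequence. The function names are invented for illustration. It relies on two facts about ARM: cmp computes Rn + NOT(operand) + 1 and sets the carry flag C on carry-out, and HS ("higher or same") passes when C is set, which is an unsigned >= test.

```c
#include <stdbool.h>
#include <stdint.h>

/* Carry flag after "cmp rn, operand": the subtraction is performed as
   rn + ~operand + 1, and C is the carry out of bit 31. */
bool carry_after_cmp(uint32_t rn, uint32_t operand)
{
    uint64_t sum = (uint64_t)rn + (uint64_t)(uint32_t)~operand + 1u;
    return (sum >> 32) != 0;
}

/* The HS condition passes when C is set, so an instruction predicated
   with HS (such as blxhs) executes only in that case. */
bool hs_predicate_passes(uint32_t rn, uint32_t operand)
{
    return carry_after_cmp(rn, operand);
}
```

For r0 = 0 compared against 4000 there is no carry-out, so the predicate fails and the blxhs would be skipped by the processor. For 4000 and above the predicate passes.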

--Owen

All of my regression testing so far has had my speed slider set to its
maximum, so the useconds_t has always been precisely zero.

Maybe there's something special about zero that would not be the case
for an integer ranging from 1 to 3999.

I'll check that out, but not right now, I'm about to pass right out,
but I don't want to because I am even more hungry than I am tired.

There is a pizza joint within walking distance of my apartment. I'm
going to go stuff myself silly.

I am thinking now that the Thumb-2 machine code generated by Apple's
LLVM 3.0+svn compiler is correct, but that when a Release build is
generated, the use of conditional machine instructions confuses GDB.

I boiled my apparently erroneous source down to:

- (void) mySleep: (int)sleepTime
{
    if ( sleepTime >= 4000 ){
        usleep( sleepTime );
    }

    return;
}

If I set a breakpoint on the usleep call, the breakpoint will be hit
but usleep() will not actually be called.

I verified this by passing in 5000. When I do, I can step down into
the shared libraries that lead to the actual system call.

I have also tried calling a regular subroutine of my own instead of a
system call. When sleepTime is zero, my subroutine is not called even
though GDB shows my program stepping over the subroutine call.

I would say that this is a bug in GDB's source code debugger, in that
it ought to consider the basic block of the if only to be entered if
the test succeeds. GDB's assembly code debugger does the right thing.

In any case this is not a bug in LLVM. If it is to be considered a
bug in GDB it would be really hard to fix.

> I am thinking now that the Thumb-2 machine code generated by Apple's
> LLVM 3.0+svn compiler is correct, but that when a Release build is
> generated, the use of conditional machine instructions confuses GDB.
>
> I boiled my apparently erroneous source down to:
>
> - (void) mySleep: (int)sleepTime
> {
>    if ( sleepTime >= 4000 ){
>        usleep( sleepTime );
>    }
>
>    return;
> }
>
> If I set a breakpoint on the usleep call, the breakpoint will be hit
> but usleep() will not actually be called.

Debugging info in optimized builds is not reliable.
The compiler reorders instructions, changes the program flow, and performs many other transformations that alter it.

> I verified this by passing in 5000. When I do, I can step down into
> the shared libraries that lead to the actual system call.
>
> I have also tried calling a regular subroutine of my own instead of a
> system call. When sleepTime is zero, my subroutine is not called even
> though GDB shows my program stepping over the subroutine call.
>
> I would say that this is a bug in GDB's source code debugger, in that
> it ought to consider the basic block of the if only to be entered if
> the test succeeds. GDB's assembly code debugger does the right thing.
>
> In any case this is not a bug in LLVM.

Why not? Generating valid debug info is the role of the compiler, not the debugger.

> If it is to be considered a bug in GDB it would be really hard to fix.

-- Jean-Daniel