I’ll be happy to run it for you. Do you want Intel64, x86, or both? The Intel compiler doesn’t have a -Oz option. It has -Os and -O[123].

Also, FWIW, one of the Intel compiler experts on BT will comment on this thread, and on our rules for BT usage later this afternoon.

Kevin B. Smith

Thanks, Kevin. Sanjay sent me the icc output.

icc generates testq for 0-30 and btq for 31-63.
That seems like a small bug in the bit 31 case.

Why is it generating testq rather than test for bits 0-30?
Does the assembler relax that into test, with no REX prefix?

I'm mostly interested in the X86_64 (Intel64?) case.
Reading the man pages, icc and llvm -Os are similar.

icc generates testq for 0-30 and btq for 31-63.
That seems like a small bug in the bit 31 case.

You can’t use testq for bit 31, because the immediate gets sign-extended. You *can* use the 32b form, of course.
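
To make the sign-extension point concrete, here is a minimal C sketch that models the mask each form ends up applying. It is an illustration, not compiler code, and the function names are made up for this example. It assumes only that the 64-bit TEST widens its 32-bit immediate by sign extension, as stated above.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a 32-bit immediate the way a 64-bit TEST does: by sign extension.
   (Hypothetical helper name, for illustration only.) */
uint64_t sign_extend_imm32(uint32_t imm32) {
    uint64_t high = (imm32 & 0x80000000u) ? 0xFFFFFFFF00000000ull : 0;
    return high | imm32;
}

/* Models "testq $imm32, reg": nonzero result iff any masked bit is set. */
int testq_imm32_nonzero(uint64_t reg, uint32_t imm32) {
    return (reg & sign_extend_imm32(imm32)) != 0;
}

/* Models "testl $imm32, reg32": only the low 32 bits take part. */
int testl_imm32_nonzero(uint64_t reg, uint32_t imm32) {
    return ((uint32_t)reg & imm32) != 0;
}
```

With an immediate of 0x80000000, the 64-bit model masks bits 31-63 at once, so a value with only bit 40 set reports "nonzero" even though bit 31 is clear. The 32-bit model looks at bit 31 alone.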

Is that a bug in the current clang then?

.globl _IsBitSet31
.align 4, 0x90
_IsBitSet31: ## @IsBitSet31
## BB#0:
pushq %rbp
.cfi_def_cfa_offset 16
.cfi_offset %rbp, -16
movq %rsp, %rbp
.cfi_def_cfa_register %rbp
movl $42, %ecx
subq %rdi, %rcx
leaq 47(%rdi), %rax
testl $-2147483648, %edi ## imm = 0xFFFFFFFF80000000
cmoveq %rcx, %rax
popq %rbp

No, because it uses the 32b form “testl”.

– Steve

‘Bug’ may be too strong a word. Their emitted code is correct;
otherwise it would have failed one of their test suites.

They could use TESTL as well for bit 31.
So I’ll be curious to see what Intel’s guidelines are.
They don’t mention this in their Optimization manual.

Right now, I’m convinced that TEST should be used when it’s available.

But even with partial flag hazards, BT is not worth avoiding for 32-63.
For the -Oz case it’s worth emitting for bits 8-63.

When targeting modern processors, it will usually be best to use “BT reg, imm” for testing bits 32-63. The alternative implementations will all use multiple instructions, will encode larger, and will usually run slower. For testing bits 0-31, “TEST reg, imm” is preferred unless you are looking to minimize code size at the expense of performance, in which case you would still want to use BT in the cases where it encodes smaller. As Fiona pointed out, there are some processors where TEST has slightly better low level performance properties than BT.
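
The guidance above can be summarized as a small selection function. This is a sketch only: the enum, the function name, and the `minimize_size` flag are hypothetical and do not come from any compiler. The bit-8 threshold for the size-optimized case is taken from the -Oz suggestion earlier in the thread, not from measured encodings.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum bit_test_insn { INSN_TEST, INSN_BT };

/* Pick an instruction for testing a single constant bit of a 64-bit register. */
enum bit_test_insn choose_bit_test(unsigned bit, bool minimize_size) {
    assert(bit < 64);
    if (bit >= 32)
        return INSN_BT;    /* a sign-extended imm32 cannot isolate bits 32-63 */
    if (minimize_size && bit >= 8)
        return INSN_BT;    /* size-first case: thread suggests BT from bit 8 up */
    return INSN_TEST;      /* bits 0-31 when optimizing for speed */
}
```

For example, `choose_bit_test(31, false)` selects TEST, while `choose_bit_test(31, true)` and `choose_bit_test(32, false)` both select BT.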

Regarding the partial EFLAGS write, modern OOO processors independently rename the carry flag, et al, so this is no longer a problem. I would have to check with the processor architects to figure out the exact processor generation where this problem was first fixed, but it was roughly a decade ago. Steve’s Agner Fog quote, “BT, BTC, BTR, and BTS change the carry flag but leave the other flags unchanged. This causes a false dependence on the previous value of the flags and costs an extra μop. Use TEST, AND, OR and XOR instead of these instructions.”, was in reference to the Pentium 4.

FWIW, the Intel compiler itself doesn’t quite get all the “test bit” sequences right either. We will use a 64-bit TEST for testing bits 0-30, but we ought to be using a 32-bit test to avoid the REX byte. Similarly, for testing bit 31, we should use a 32-bit TEST rather than the “BT reg, 31” that we will currently generate. I intend to get those cases fixed.

Also, this thread hasn’t been focusing on the “BT reg, reg”, and “BT[CSR] reg, reg” instruction forms, but these are good instructions to use where possible. The only instructions in the BT family that you really want to avoid at all costs are the memory forms. You never want to generate those. The multi-instruction expansions will almost always be faster.
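
For readers unfamiliar with what a "multi-instruction expansion" of a memory-form bit test looks like, here is one plausible shape in C rather than assembly. It is a sketch under stated assumptions: the bit string is stored as an array of 64-bit words, the index is non-negative, and the function name is invented for this example. A compiler would typically lower this to a load plus register-only operations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Test bit `index` of a bit string stored as 64-bit words.
   (Hypothetical helper, for illustration only.) */
int bitstring_test(const uint64_t *words, size_t index) {
    uint64_t word = words[index >> 6];         /* load the word holding the bit */
    return (int)((word >> (index & 63)) & 1);  /* isolate the bit in a register */
}
```

The explicit load followed by a register-only test is the kind of sequence the paragraph above recommends over the memory forms.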

David Kreitzer

IA-32/Intel64 Code Generation

Intel Compilers