Determining if it is a bug in Clang or my code

Hello,

I’ve been working on an operating system as a hobby project for a few months now, and finally tried converting the codebase to Clang. While the code compiles fine, I now get a surprising Interrupt 0x06 “Invalid Opcode” being fired when executing some C++ code. (This happens under both Bochs and QEMU.)

The same codebase works fine when compiled with GCC; the fault(?) only happens when built with Clang. This part of the code isn’t invoking any assembly (inline or otherwise), and the C++ itself is fairly straightforward. (See below.)

My questions are:

  • Is the fact that this interrupt fires while executing pure C++ code proof of a compiler bug? Or is it possible to generate invalid opcodes through undefined C++ behavior, etc.?

  • How likely is it that this is actually a Clang codegen bug? I worked on the F# compiler at Microsoft, and know quite well that “I found a bug in the compiler” is Latin for “I don’t understand how this language works”, though the fact that the code is triggering a CPU interrupt is concerning.

  • Would it be worthwhile to distill my OS project down and try to produce a minimal repro? If so, where should I send it?

As for the code itself, the problem seems to be occurring in my implementation of printf. I’m using variadic template arguments to do it in a typesafe way. Is “variadic template codegen for 32-bit” a particularly rough area of the Clang/LLVM codebase?

Any insight would be appreciated.

Thanks,
-Chris

> Hello,
>
> I've been working on an operating system as a hobby project for a few
> months now, and finally tried converting the codebase to Clang. While
> the code compiles fine, I now get a surprising Interrupt 0x06 "Invalid
> Opcode" being fired when executing some C++ code. (This happens under
> both Bochs and QEMU.)
>
> The same codebase works fine when compiled with GCC; the fault(?) only
> happens when built with Clang. This part of the code isn't invoking
> any assembly (inline or otherwise), and the C++ itself is fairly
> straightforward. (See below.)
>
> My questions are:
>
> - Is the fact that this interrupt fires while executing pure C++ code
> proof of a compiler bug? Or is it possible to generate invalid opcodes
> through undefined C++ behavior, etc.?

A call through an uninitialized function pointer might do that.

> - How likely is it that this is actually a Clang codegen bug? I worked
> on the F# compiler at Microsoft, and know quite well that "I found a bug
> in the compiler" is Latin for "I don't understand how this language
> works", though the fact that the code is triggering a CPU interrupt is
> concerning.

Heh.

> - Would it be worthwhile to distill my OS project down and try to
> produce a minimal repro? If so, where should I send it?

Please do. Here, and llvm.org/bugs are the right places to send it.

> As for the code itself, the problem seems to be occurring in my
> implementation of printf. I'm using variadic template arguments to do it
> in a typesafe way. Is "variadic template codegen for 32-bit" a
> particularly rough area of the Clang/LLVM codebase?

Not really.

Jon

Can you see if the invalid opcode is ud2a? Clang sometimes emits one after encountering certain kinds of UB.

I think the most common is falling off the end of a function that is supposed to return a value. If you compile your code with -Wreturn-type (it should be on by default), you should see a warning for it, but not if the code is in a system header. There are other more obscure ways to trigger it, like passing non-POD objects through a vararg pack, but my money is on -Wreturn-type.
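To make the -Wreturn-type case concrete, here is a minimal sketch. The function names and the SHOW_BUG macro are made up for illustration and are not taken from the kernel discussed in this thread. The buggy variant is compiled only when SHOW_BUG is defined, so the snippet stays well-defined by default.

```cpp
// Hypothetical helpers, not taken from the kernel in this thread.

#ifdef SHOW_BUG
// Buggy: there is no return statement on the path where value > 9.
// Clang reports this under -Wreturn-type.
char digit_to_char_buggy(unsigned value) {
    if (value <= 9) {
        return static_cast<char>('0' + value);
    }
    // Control reaches the end of a non-void function here, which is
    // undefined behavior in C++.
}
#endif

// Fixed: every path returns a value.
char digit_to_char(unsigned value) {
    if (value <= 9) {
        return static_cast<char>('0' + value);
    }
    return '?';  // explicit fallback for out-of-range input
}
```

Building with -Werror=return-type turns the warning into a hard error, which may be worth doing in a kernel where a stray trap is hard to diagnose.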

It seems rather more likely that the problem is in your code than in the compiler, although it’s not IMPOSSIBLE that it’s the compiler. The reason I say it’s unlikely is that the compiler would have to actually generate an invalid opcode for your processor [aside from UD2A as described by Reid]. One possibility, of course, is that your compiled code is built for some fancy processor, and your emulated processor is “not quite so advanced” [e.g. SSE or AVX instructions are generated, but the emulation does not support them]. Build for the “most basic” processor model that is supported (e.g. i486 or x86-64) to see if that is the case.

Of course, getting things messed up or using uninitialized data as code pointers (vtables, function pointers, etc.), returning on an overwritten stack, and/or getting the memory mapping wrong can easily lead to this exact scenario (and the exact effect will depend on the exact code generated, meaning different compilers will also give different results). I have worked with operating systems long enough to have seen just about every one of these. My “favourite” is when I’ve messed up the arguments to memset, so the code starts to fill from len for destination bytes [and destination is on the other side of the code from len], and eventually you overwrite the current code with the fill value, which usually leads to REALLY bizarre crashes. Another is forgetting to enable the A20 gate on x86 machines, so that your first MB overlaps/shadows the second MB (and if you cleverly put your code at 0…1MB, and the stack, heap, etc. at 1…2MB, it gives really nice undecipherable crashes when you execute the just-allocated memory or just-allocated stack…).
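As a sketch of one way to make the memset mix-up harder to write, the arguments can be wrapped in distinct types so that swapping them fails to compile. ByteCount, FillValue and fill_bytes are hypothetical names invented for this example, and std::memset from a hosted standard library stands in for whatever memset a freestanding kernel provides.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wrapper types: because they are distinct, passing the length
// where the fill value belongs (or the other way around) is a compile error.
struct ByteCount { std::size_t n; };
struct FillValue { std::uint8_t v; };

// Fills count.n bytes starting at dest with value.v.
inline void fill_bytes(void* dest, ByteCount count, FillValue value) {
    // std::memset takes (destination, fill value, byte count), in that order.
    std::memset(dest, value.v, count.n);
}
```

A call then reads as fill_bytes(buffer, ByteCount{16}, FillValue{0}), which spells out which number is which at the call site.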

Figuring out “where you came from” (call-stack) as well as “what is this instruction” would be a good place to start looking at “why is this going wrong”.
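For the “what is this instruction” part, a small check in the Interrupt 0x06 handler can at least separate a deliberate UD2 trap from everything else. This is only a sketch: is_ud2 is a hypothetical helper, and it assumes the handler can read the saved instruction pointer, which for an invalid-opcode fault points at the faulting instruction itself.

```cpp
#include <cstdint>

// Hypothetical helper for an invalid-opcode (interrupt 0x06) handler.
// ip is assumed to point at the faulting instruction, i.e. the instruction
// pointer the CPU saved when it raised the fault.
// UD2 (spelled ud2a by some assemblers) is encoded as the two bytes 0F 0B.
inline bool is_ud2(const std::uint8_t* ip) {
    return ip[0] == 0x0F && ip[1] == 0x0B;
}
```

If the bytes are not a UD2, dumping the first few of them and running them through a disassembler shows whether the compiler emitted something the emulated CPU does not support.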

Thank you for the responses. I spent a lot of time trying to distill the kernel into a simple repro, but the problem didn’t turn out to be so complicated after all. Mats had it right with the emulated processor being “not quite so advanced”. The trick was adding -mno-mmx -mno-sse to the build. Now the clang-backed compiles work just fine.
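For anyone hitting the same thing, a compile-time guard along these lines might catch a build that drops those flags. It relies on the __MMX__ and __SSE__ macros that GCC and Clang predefine when those instruction sets are enabled for the target. KERNEL_NO_SIMD is a hypothetical macro that the kernel's build would have to define itself (for example with -DKERNEL_NO_SIMD), and the other names are made up for the sketch.

```cpp
// GCC and Clang predefine __MMX__ and __SSE__ when the corresponding
// instruction sets are enabled for the compilation target.
#if defined(__MMX__)
constexpr bool kTargetsMMX = true;
#else
constexpr bool kTargetsMMX = false;
#endif

#if defined(__SSE__)
constexpr bool kTargetsSSE = true;
#else
constexpr bool kTargetsSSE = false;
#endif

// KERNEL_NO_SIMD is a hypothetical macro, assumed to be defined by the
// kernel's build alongside -mno-mmx -mno-sse.
#ifdef KERNEL_NO_SIMD
static_assert(!kTargetsMMX && !kTargetsSSE,
              "kernel must be built with -mno-mmx -mno-sse");
#endif

// True if the compiler is allowed to emit MMX or SSE instructions.
constexpr bool target_uses_simd() {
    return kTargetsMMX || kTargetsSSE;
}
```

On an ordinary hosted x86-64 build the guard is inactive because KERNEL_NO_SIMD is not defined; it only has teeth in the kernel build that opts in.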