Mapping field names to GEP indices in clang-compiled C

I'm using clang to compile functions and types written in C into LLVM
IR so that I can inline calls and avoid hand-writing the StructType
definitions. The types clang generates are packed structs instead of
ordinary structs. So for

struct Foo {
  char x;
  int* y;
};

clang produces (on x86-32) the type <{i8, i8, i8, i8, i32*}> instead
of {i8, i32*}. To extract the 'y' field from the unpacked struct, I'd
use "GEP ..., 1", and it'd be right on all architectures. To extract
'y' from the packed struct that clang produces, I have to use
"GEP ..., 4" on x86-32, "GEP ..., 8" on x86-64, and who knows what on
other architectures.
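
To make the architecture dependence concrete: in the packed form, every byte ahead of y (x itself plus each padding byte) is its own i8 element, so the GEP index of y is just its byte offset. A minimal C++ sketch, assuming the usual ABI rule that a field lands on the next multiple of its alignment; packed_index_of_y is a hypothetical helper for illustration, not something clang provides:

```cpp
#include <cassert>
#include <cstddef>

struct Foo {
  char x;
  int* y;
};

// Hypothetical helper: element index of y in the all-i8-padding packed
// struct described above. One i8 per byte before y means the index
// equals the byte offset, which varies with the target's pointer alignment.
std::size_t packed_index_of_y() {
  return offsetof(Foo, y);
}

// Index of y in the ordinary (unpacked) struct {i8, i32*}: always 1,
// because LLVM inserts the padding implicitly.
std::size_t unpacked_index_of_y() {
  return 1;
}
```

On a target with 4-byte pointers packed_index_of_y() would be 4, and with 8-byte pointers 8, while the unpacked index stays at 1.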

Is there a way to get clang to emit a C++-readable mapping from field
names to GEP offsets? Or some other way to avoid special-casing these
offsets for each architecture? If not, what would be the easiest way
to add such an ability?

Thanks,
Jeffrey (and the Unladen Swallow team)

First off, this sort of question is more appropriate for cfe-dev;
please direct any follow-up questions there.

clang's current behavior here is a bug, but it's a low priority to fix
because the generated IR isn't incorrect, just somewhat difficult to
read.

The only completely reliable solution I can think of is generating
something like "int Foo_offsets[] = { offsetof(struct Foo, x),
offsetof(struct Foo, y)};", then using some bitcasting to do the
arithmetic.
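
A rough sketch of what that suggestion could look like on the C++ side, written without LLVM so it stands alone. The names FooField, field_address, and load_y are hypothetical; the byte-pointer arithmetic here mirrors what the IR would do with a bitcast to i8*, a GEP by the byte offset, and a bitcast back to the field's pointer type:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Foo {
  char x;
  int* y;
};

// Offset table computed by the C/C++ compiler for the current target.
const std::size_t Foo_offsets[] = { offsetof(Foo, x), offsetof(Foo, y) };

// Hypothetical field identifiers indexing into Foo_offsets.
enum FooField { kFooX = 0, kFooY = 1 };

// Address of a field, found by byte arithmetic instead of a typed GEP.
void* field_address(void* base, FooField field) {
  return static_cast<char*>(base) + Foo_offsets[field];
}

// Reads y through the offset table; memcpy avoids aliasing concerns.
int* load_y(Foo* foo) {
  int* result;
  std::memcpy(&result, field_address(foo, kFooY), sizeof result);
  return result;
}
```

The trade-off is that nothing checks the pointer type at the far end of the arithmetic.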

-Eli

> Is there a way to get clang to emit a C++-readable mapping from field
> names to GEP offsets? Or some other way to avoid special-casing these
> offsets for each architecture? If not, what would be the easiest way
> to add such an ability?

> First off, this sort of question is more appropriate for cfe-dev;
> please direct any follow-up questions there.

Sorry for the mis-directed email.

> clang's current behavior here is a bug, but it's a low priority to fix
> because the generated IR isn't incorrect, just somewhat difficult to
> read.

Oh, good. Is there a bug number for that? I'd assumed that it was
intentional to give clang more control over struct layout, but if it's
accidental I'll look forward to a fix. If I get annoyed enough at it,
I may even try to fix it myself. Do you have any pointers to code I
should look at to fix it?

> The only completely reliable solution I can think of is generating
> something like "int Foo_offsets[] = { offsetof(struct Foo, x),
> offsetof(struct Foo, y)};", then using some bitcasting to do the
> arithmetic.

Yeah, that's a good point. I'd lose LLVM's type checking but at least
the code would work.

I don't think there's a bug filed. The relevant code is
RecordOrganizer::layoutStructFields in lib/CodeGen/CodeGenTypes.cpp.

If I recall correctly, I originally wrote the code in question. I
wanted to make the first implementation as simple as possible, and
therefore I only wrote the general case. The way to fix this is
basically to add detection for structs where the amount of padding
LLVM would insert is never more than the necessary amount, and use an
unpacked struct in those cases.
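
The detection described above might be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in CodeGenTypes.cpp: the Field record and the function name are hypothetical, each field's LLVM alignment is taken as given, and tail padding is ignored.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-field description.
struct Field {
  std::size_t c_offset;    // byte offset the C layout requires
  std::size_t size;        // size in bytes of the LLVM element type
  std::size_t llvm_align;  // alignment LLVM uses in an unpacked struct
};

// Returns true when LLVM's implicit padding never pushes a field past the
// offset the C layout requires. A field that would land early could still
// be fixed up with explicit padding elements, so it does not fail the check.
bool llvm_padding_never_exceeds_c_layout(const std::vector<Field>& fields) {
  std::size_t next = 0;  // first free byte after the previous field
  for (const Field& f : fields) {
    // Where an unpacked struct would place this field on its own.
    std::size_t natural =
        (next + f.llvm_align - 1) / f.llvm_align * f.llvm_align;
    if (natural > f.c_offset) return false;  // LLVM would over-pad
    next = f.c_offset + f.size;
  }
  return true;
}
```

For struct Foo with 8-byte pointers the check passes, while a C struct declared packed (an int at offset 1, say) fails it and would keep needing the packed form.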

-Eli

Instead of fixing this in clang, I took your and Chris's other advice
and used offsetof. The patch, hosted on Google Code, looks like the
following. Sorry for the Python-specific code in there; hopefully the
meaning is clear for other projects.

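// Converts a byte offset within a struct (e.g. from offsetof) into the
// index of the LLVM struct element that starts at that offset.
unsigned int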
_PyTypeBuilder_GetFieldIndexFromOffset(
    const llvm::StructType *type, size_t offset)
{
    static const llvm::TargetData *const target_data =
        PyGlobalLlvmData::Get()->getExecutionEngine()->getTargetData();
    const llvm::StructLayout *layout = target_data->getStructLayout(type);
    unsigned int index = layout->getElementContainingOffset(offset);
    assert(layout->getElementOffset(index) == offset &&
           "offset must be at start of element");
    return index;
}

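// Defines an accessor that emits a struct GEP to TYPE::FIELD_NAME. The
// element index is looked up from offsetof() once and cached.
#define DEFINE_FIELD(TYPE, FIELD_NAME) \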
    static Value *FIELD_NAME(IRBuilder<> &builder, Value *ptr) { \
        assert(ptr->getType() == PyTypeBuilder<TYPE*>::get() && \
               "*ptr must be of type " #TYPE); \
        static const unsigned int index = \
            _PyTypeBuilder_GetFieldIndexFromOffset( \
                PyTypeBuilder<TYPE>::get(), \
                offsetof(TYPE, FIELD_NAME)); \
        return builder.CreateStructGEP(ptr, index, #FIELD_NAME); \
    }
...
    DEFINE_FIELD(PyListObject, ob_size)
    DEFINE_FIELD(PyListObject, ob_item)
    DEFINE_FIELD(PyListObject, allocated)
...

where PyListObject was defined as:

typedef struct {
    PyObject_VAR_HEAD /* includes ob_size along with other common fields */
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;
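
For readers without the LLVM headers at hand, the offset-to-index lookup can be pictured with a small stand-in. This is a sketch only: element_containing_offset imitates the role getElementContainingOffset plays in the patch, using a plain sorted vector of element start offsets in place of llvm::StructLayout, and LLVM's actual implementation may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in lookup: index of the last element starting at or before
// `offset`. `element_offsets` must be sorted and begin with 0.
unsigned element_containing_offset(
    const std::vector<std::size_t>& element_offsets, std::size_t offset) {
  auto it = std::upper_bound(element_offsets.begin(),
                             element_offsets.end(), offset);
  assert(it != element_offsets.begin() && "offset precedes first element");
  return static_cast<unsigned>(it - element_offsets.begin()) - 1;
}

// Same contract as the helper in the patch: the offset must be the exact
// start of an element, otherwise the field mapping is wrong.
unsigned field_index_from_offset(
    const std::vector<std::size_t>& element_offsets, std::size_t offset) {
  unsigned index = element_containing_offset(element_offsets, offset);
  assert(element_offsets[index] == offset &&
         "offset must be at start of element");
  return index;
}
```

With the packed x86-64 layout of struct Foo (nine elements at offsets 0 through 8), offset 8 maps to index 8; with the unpacked layout (offsets 0 and 8) the same offset maps to index 1. That is why the accessor works with either struct form.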

Thanks again for the help!
Jeffrey