Obj-C type encoding for bool larger than expected

Hi,

I’m building an Objective-C class to generate method type encodings at runtime. I’m using the Clang source (specifically ASTContext::getObjCEncodingForMethodDecl) as documentation for what those encodings mean.

The Obj-C method encoding is of the format:

{ return type } { total size of all arguments } { argument type } { argument offset }

The argument sizes are not aligned. If I have a function which takes a packed struct { uint32, uint16, uint8 }, I can get a signature like this:

v23@0:8{MyStruct=ISC}16

Where the total size of all arguments is 23. The implicit self and _cmd parameters take 16 bytes (on a 64-bit system), so the struct accounts for the remaining 7 bytes, as expected.

The problem I’m having is with the C bool type (type encoding: “B”). As far as I can tell, it should take 1 byte. The ASTContext gets the bool width from the TargetInfo class, which defines single-byte booleans on everything but PPC. So, if I have a function:

- (void)doSomething: (bool)arg;

I would expect an encoding string like this: v17@0:8B16. The self and _cmd parameters take 16 bytes, and the bool takes 1 byte.
However, what I’m actually getting is this: v20@0:8B16. The bool seems to be taking up 4 bytes, as if it were a 32-bit int!

I can’t find an explanation for this in the Clang source. sizeof(bool) returns 1, and Apple's NSGetSizeAndAlignment("B", ...) also tells me that an encoded bool should take 1 byte. So why is Clang’s type encoding claiming a bool takes 4 bytes? The only thing I could think of was some kind of padding to provide alignment, but type encodings don’t seem to need alignment, and I can’t find that happening in the Clang source, either.

Karl

The numbers don't mean what you think they mean. The first number is the size of the argframe, the remaining numbers are the offsets in the argframe where the values are stored. So, v20@0:8B16 means:

20 byte argframe
id parameter at offset 0
SEL parameter at offset 8 (no padding after the id)
BOOL parameter at offset 16 (no padding after the SEL)
4 extra bytes because your architecture requires 4-byte aligned arg frames.

These numbers are largely meaningless on a modern architecture that passes most parameters in registers. They originate with the original Objective-C runtime for the m68k, where all arguments were passed on the stack. The Apple legacy runtime has a few forwarding hooks that deal with an argframe_t, as does the GCC runtime. Modern runtimes remove these, because they haven't done what developers expect for quite a while.

David

David,

I think you’re slightly mistaken - or Clang isn’t exactly doing what you say. I’ve been investigating a bit more; this is where clang encodes the argframe size:

CharUnits PtrSize = getTypeSizeInChars(VoidPtrTy);

// The first two arguments (self and _cmd) are pointers; account for
// their size.
CharUnits ParmOffset = 2 * PtrSize;

for (ObjCMethodDecl::param_const_iterator PI = Decl->param_begin(),
     E = Decl->sel_param_end(); PI != E; ++PI) {
  QualType PType = (*PI)->getType();
  CharUnits sz = getObjCEncodingTypeSize(PType);
  if (sz.isZero())
    continue;

  assert(sz.isPositive() &&
         "getObjCEncodingForMethodDecl - Incomplete param type");
  ParmOffset += sz;
}
S += charUnitsToString(ParmOffset);
S += "@0:";
S += charUnitsToString(PtrSize);

As you can see, there is no consideration for argframe alignments. As I noted before:

// 7-byte struct
struct MyStruct {
  uint32_t a;
  uint16_t b;
  uint8_t  c;
} __attribute__((packed));

- (void)doSomething:(struct MyStruct)arg;

is encoded by Clang as v23@0:8{MyStruct=ISC}16.

23 is a prime number, so clearly there’s no alignment rounding going on. There’s nothing in the source to indicate that it’s dependent on my target (Intel 64-bit), either.

So having ruled out alignment padding, my next port of call was getObjCEncodingTypeSize(); that’s clearly where we must be getting a 4-byte size for a bool:

CharUnits ASTContext::getObjCEncodingTypeSize(QualType type) const {
  if (!type->isIncompleteArrayType() && type->isIncompleteType())
    return CharUnits::Zero();

  CharUnits sz = getTypeSizeInChars(type);

  // Make all integer and enum types at least as large as an int
  if (sz.isPositive() && type->isIntegralOrEnumerationType())
    sz = std::max(sz, getTypeSizeInChars(IntTy));
  // Treat arrays as pointers, since that's how they're passed in.
  else if (type->isArrayType())
    sz = getTypeSizeInChars(VoidPtrTy);
  return sz;
}

So my first thought was that bool was being treated as an integer type and its size was getting expanded to the size of an int. After some digging (I’ve never worked with the Clang source before), I found this in Type.h:

inline bool Type::isIntegralOrEnumerationType() const {
  if (const BuiltinType *BT = dyn_cast<BuiltinType>(CanonicalType))
    return BT->getKind() >= BuiltinType::Bool &&
           BT->getKind() <= BuiltinType::Int128;

  // Check for a complete enum type; incomplete enum types are not properly an
  // enumeration type in the sense required here.
  if (const EnumType *ET = dyn_cast<EnumType>(CanonicalType))
    return IsEnumDeclComplete(ET->getDecl());

  return false;
}

So, it appears that bool is being treated as an integer type, and hence will be expanded to the size of an int (i.e. 4 bytes on ILP32/LP64/LLP64 targets) when encoded. That explains the odd size of bool.

Now, my inclination is that this is a bug in clang, or some kind of legacy thing which isn’t documented. Apple’s Foundation framework has a function NSGetSizeAndAlignment which returns the size of a type encoding, and it reckons that an encoded bool should take 1 byte. So:

- (void)doSomething:(BOOL)arg;

should be encoded as v17@0:8B16.

Is this a bug? Should Clang really be treating bools as ints?

Karl

Yes, because that's how the calling convention requires bools to be passed: promoted to int. That promoted size is the space they occupy in the argframe. Your packed struct, in contrast, will not be promoted.

As I said, these numbers tell you the offset in a legacy struct that is an ad-hoc variation of va_args originally designed for the m68k. If you are using them for anything after about 1995, then you're probably doing it wrong.

David