[RFC] Optimizing Code Size of objc_direct by Exposing Function Symbols and Moving Nil Checks to Thunks

Motivation

The initial implementation of __attribute__((objc_direct)) (hereafter objc_direct) was designed with an emphasis on ABI stability. While this design achieved its ABI goal, it has led to drawbacks, primarily related to code size and linkage. Currently, we have the following three main issues:

  1. Code Bloat and Poor Optimization: The nil-checking logic (and, for class methods, the class realization logic) is duplicated in every objc_direct method implementation, which increases binary size. For instance methods, the callee always performs a nil check, even when the receiver at the call site is known to be non-null (like self), adding unnecessary code.

  2. Poor Linkage: The hidden symbol (internal linkage) prevents developers from calling the direct method from other link units, forcing them to write manual wrapper thunks, which further increases code size.

  3. Complexity of Swift Interop: We intend to implement @objcDirect on the Swift side as well. However, requiring a direct method to perform its own nil check pushes a lot of implementation difficulty onto SILGen (e.g., the exposed API has type Optional<MyClass>, so the Swift version of the nil check needs to emit code to unwrap that Optional before calling the actual Swift function).

This proposal aims to change the implementation strategy and resolve the three issues above by exposing the true implementation symbol and moving the responsibility for nil-checking to a caller-generated thunk. This way, we can reduce code size by eliminating unnecessary nil checks and make bridging @objcDirect to Swift easier.

Design

The core of this design is to split the responsibilities for objc_direct calls. The callee will emit a public-facing, external implementation, and the caller will decide whether to call it directly or via a newly generated thunk. We plan to initially gate the feature behind a new compiler flag, -fobjc-direct-caller-thunks, for experimentation before making it the default behavior.

This design differs slightly for instance methods and class methods.

Instance Methods

Current Design

A method -[MyClass myMethod] is emitted with a hidden symbol (e.g., @"\01-[MyClass myMethod]"). This function contains the self == nil check. All call sites call this hidden symbol, and the nil check is executed every time.

Proposed Design

The responsibility for the nil check moves out of the true implementation and into a caller-side thunk.

1. True Implementation (Callee):

  • Symbol: Emitted with its public, mangled name (e.g., @"-[MyClass myMethod]").

  • Linkage: external.

  • Logic: Contains only the method’s implementation. It performs no nil check.

2. Call Site (Caller): When Clang encounters [receiver myMethod]:

  • Case 1: Receiver is Non-Null. If static analysis proves the receiver is non-null (e.g., self), Clang emits a direct call to the public symbol @"-[MyClass myMethod]".

  • Case 2: Receiver is Nullable. If the receiver may be nil, Clang emits a call to a caller-side thunk.

3. The Thunk (Generated by Caller):

  • Symbol: Generated with a suffix (e.g., @"-[MyClass myMethod]_thunk").

  • Linkage: linkonce_odr, so that the linker does not complain when multiple callers in different link units generate identical thunks.

  • Logic:

    1. Performs the self == nil check. If nil, returns a zero-initialized value.

    2. If non-nil, performs a musttail call to the true implementation (@"-[MyClass myMethod]").

This musttail call is critical for ARC correctness, as it makes the thunk “invisible” to the ARC contract.
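
The split above can be modeled in plain C as a rough sketch. This is not what Clang emits: MyClass, MyClass_myMethod, MyClass_myMethod_thunk, and the two call-site helpers are hypothetical stand-ins for the symbols described above, and the MUSTTAIL macro expands to Clang's musttail statement attribute only when the compiler provides it.

```c
#include <assert.h>
#include <stddef.h>

/* MUSTTAIL is Clang's musttail statement attribute when available;
   otherwise it is empty and the call is an ordinary call. */
#if defined(__clang__)
#  if __has_attribute(musttail)
#    define MUSTTAIL __attribute__((musttail))
#  endif
#endif
#ifndef MUSTTAIL
#  define MUSTTAIL
#endif

/* Hypothetical stand-in for an Objective-C object. */
typedef struct MyClass { int value; } MyClass;

/* "True implementation": external linkage, no nil check.
   Precondition: self is non-null. */
int MyClass_myMethod(MyClass *self, int delta) {
    assert(self != NULL);
    return self->value + delta;
}

/* Caller-generated thunk: nil check, then tail call with an identical
   signature. In LLVM IR it would be linkonce_odr; `static` is used here
   only to keep the sketch in a single file. */
static int MyClass_myMethod_thunk(MyClass *self, int delta) {
    if (self == NULL)
        return 0; /* zero-initialized result for a nil receiver */
    MUSTTAIL return MyClass_myMethod(self, delta);
}

/* Case 1: receiver is known to be non-null, so call the implementation. */
int call_on_known_nonnull(MyClass *self) {
    return MyClass_myMethod(self, 1);
}

/* Case 2: receiver may be nil, so go through the thunk. */
int call_on_nullable(MyClass *maybe_nil) {
    return MyClass_myMethod_thunk(maybe_nil, 1);
}
```

A module that only ever uses the first call pattern never needs the thunk at all, which is where the code size saving comes from.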

Corner Case: Variadic Methods (va_arg)

Variadic methods are excluded from this change. This is because forwarding their arguments is fundamentally incompatible with the thunk’s design: our design requires a musttail call for ARC correctness, which forbids any stack management (like va_start/va_end) in the thunk.

To maintain 100% backward compatibility, objc_direct can still be attached to a variadic method, but it will have the old, hidden ABI (hidden \01 symbol with internal nil check).
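
C has the same restriction: a function cannot pass its own `...` arguments along to another variadic function. The sketch below therefore shows the old shape described above, where the variadic callee keeps its own nil check. Summer and Summer_sum are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical stand-in for an Objective-C object. */
typedef struct Summer { int base; } Summer;

/* Old-ABI shape: the variadic callee performs the nil check itself,
   because no thunk could forward the `...` arguments to it. */
int Summer_sum(Summer *self, int count, ...) {
    assert(count >= 0);
    if (self == NULL)
        return 0; /* nil receiver: zero-initialized result */

    va_list args;
    va_start(args, count);
    int total = self->base;
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```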

Class Methods

Class methods introduce the separate problem of class realization.

Current Design

A method +[MyClass myMethod] is emitted with a hidden symbol (e.g., @"\01+[MyClass myMethod]"). This function performs:

  1. Class Realization: It calls [self self] to ensure the class object is loaded.

  2. Nil Check: If the class object is weakly linked, it also checks whether the class object exists at runtime, i.e., it performs a nil check.

Proposed Design

We will follow a similar pattern, preserving the existing logic for when to nil-check.

1. True Implementation (Callee):

  • Symbol: Emitted with its public, mangled name (e.g., @"+[MyClass myMethod]").

  • Linkage: external

  • Logic: The function performs neither the nil check nor class realization. It contains only the method’s true implementation.

2. The Thunk (Generated by Caller): The logic is the same as for instance methods, except that the thunk performs class realization before the nil check. The nil check is emitted only when we cannot prove that the class object is non-null, which is only the case when the class is weakly linked (isWeakLinkedClass(OID)).

3. Call Site (Caller): When Clang encounters a call to +[MyClass myMethod], the caller needs to reason about whether the class object can be null (isWeakLinkedClass(OID)) and whether the class has already been realized. If the class object is known to be non-null and known to be realized, the caller dispatches to the true implementation directly; otherwise it calls the thunk.

We need some static analysis heuristics to determine whether a class object has been realized. A simple heuristic: if a call to a method of the same class dominates the current call, the class object must have been realized by the earlier call. Extra care is needed here: even if a call to [Parent foo] dominates a call to [Child foo], the call to [Child foo] still needs to go through class realization to make sure Child is realized. While static types are easy to reason about, things are not trivial when the type is id.
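
The class-method dispatch rule can be modeled in plain C as a rough sketch. Everything here is a hypothetical stand-in: ClassObject models a class object with a realization flag, realize_class models [self self], and the two boolean parameters of dispatch_class_call model facts the compiler would know statically (weak linkage and proven realization) rather than values that exist at run time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a class object. */
typedef struct ClassObject {
    bool realized;
    int realize_count; /* instrumentation for this sketch only */
} ClassObject;

/* Stand-in for [self self]; like messaging nil, it is a no-op on NULL. */
static void realize_class(ClassObject *cls) {
    if (cls != NULL && !cls->realized) {
        cls->realized = true;
        cls->realize_count++;
    }
}

/* True implementation. Precondition: class is non-nil and realized. */
int Foo_classMethod(ClassObject *cls, int x) {
    assert(cls != NULL && cls->realized);
    return x * 2;
}

/* Thunk: realize first, then nil check, then call the implementation.
   The nil check would be emitted only for weakly linked classes, and the
   final call would be a musttail call. */
static int Foo_classMethod_thunk(ClassObject *cls, int x) {
    realize_class(cls);
    if (cls == NULL)
        return 0;
    return Foo_classMethod(cls, x);
}

/* Call site: go direct only when both facts are known statically. */
int dispatch_class_call(ClassObject *cls, int x,
                        bool may_be_nil, bool known_realized) {
    if (!may_be_nil && known_realized)
        return Foo_classMethod(cls, x);
    return Foo_classMethod_thunk(cls, x);
}
```

In this model the first call through the thunk realizes the class, and a later call that is dominated by it can take the direct path without realizing again.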

Previous Approaches and Why This is Better

Previous attempts (like #126639) explored a “two-symbol” approach where the callee module would emit both the old hidden symbol (for ABI) and a new exposed symbol.

This “two-symbol” design is more complex:

  • It requires the callee to emit two versions, bloating the callee module,

  • The caller must be “smart” enough to know which of the two symbols to call,

  • The Swift frontend still needs to emit a thunk.

The design proposed here improves on it in these ways:

  • Single Implementation Source: The callee only emits one function: the “true implementation” (with external linkage). This is simple and clean.

  • Caller-Side Generation: The nil-checking thunk is generated by the caller and only if needed. A module that only makes non-null calls will generate zero thunks, achieving maximum optimization.

  • Better Code Size: We trade N (N = number of objc_direct methods) duplicated nil checks for M (M = number of objc_direct methods that are actually called on a possibly-nil receiver) thunks. Since M is at most N, and smaller whenever some methods are only called on non-null receivers, this improves code size and efficiency.

  • Better Swift Interoperability: With this change, a Swift @objcDirect attribute becomes easier to implement: the existing thunk generated for @objc can be reused with little change.

cc @rjmccall @sharonxu @AdamCmiel

Edit: per discussion with @rjmccall , the linkage of the true implementation doesn’t need to be linkonce_odr, only the thunk needs to be linkonce_odr
Edit 2: Update class method’s thunk and dispatch logic after discussion with John

Objective-C methods are not normally allowed to be implemented in headers. Is there a reason you’re changing that?

not normally allowed

I understand that this is bad practice, but I don’t think there is a hard compiler error to guard against it. There could be code out in the wild that defines methods in headers. If we use external linkage, that code will no longer build, which breaks backward compatibility.

We could make it external, and use the trial period to figure out if it will break old code.

It’s not diagnosed by the compiler, but it is diagnosed by the linker by virtue of the function being given strong linkage. (Hidden visibility doesn’t change this — you can still only have one strong definition within a linkage unit.) This is the exact same mechanism that C uses to enforce its language rule that you can’t define functions and objects multiple times. Objective-C is not formally standardized, but it effectively has and has always had a similar rule about not defining an @implementation multiple times.

C++ does permit functions to be defined in headers, but you have to mark them explicitly inline unless they’re defined in a class definition.

Oh nice! I’ll change the design to external then.

Okay, this is looking pretty good. I have a couple more minor suggestions.

The first is that I wouldn’t make variadic methods an exception to the ABI; I would just inline the nil check into the caller. We already have to do this sometimes for a variety of reasons, like consumed arguments under ARC and methods with results that are returned indirectly.

The second is that I think class realization should be part of the thunk for class methods. This leaves us with room for optimizations later. (As above, we’ll have to inline this into the caller when calling a variadic class method.)

I am open to that.

The second is that I think class realization should be part of the thunk for class methods.

The caller can infer that the class object is non-null in most cases, so the thunk will most likely be skipped. But the caller can’t infer whether the class object is realized. If we move the realization to the thunk, the true implementation still has to realize it just to be safe. Do you think that’s acceptable?

I’m not following why the main implementation would have to realize the class “just to be safe”. It would have a precondition of the class being realized and non-nil.

We could immediately take advantage of that when calling a method on self in a class method, and in the long run we could have a hoisting optimization when e.g. you have a class method call in a loop.

I realized where the misunderstanding is coming from. Let me clarify to see if we are on the same page:

At compile time we can fairly easily prove that a class object is non-nil (i.e., as long as the class is not weakly linked), but it’s not trivial to prove that a class object has been realized, because realization is lazy.

Because of this, in the current design the precondition check only validates that the class object is non-nil, which it is in most cases, so the caller can call the true implementation directly. This design disregards whether the class is realized and asks the true implementation to “safely” realize it every time. Below is a sketch of what I am proposing (proposal A):

// Nil check is most likely skipped and removed
+[Foo foo]_thunk(...) {
    if (self == nil) return 0;
    else return tailcall [Foo foo](...);
}
// True implementation with class realization
// Because we can usually prove the class object is non-nil, most callers call this directly
+[Foo foo](...) {
    [self self];
    // True implementation
}

IIUC, what you are proposing is that we move both class realization and nil check to the precondition thunk (proposal B):

+[Foo foo]_thunk(...) {
    // Class realization
    [self self];
#ifdef CLASS_IS_WEAKLY_LINKED
    // Nil check is most likely skipped and removed
    if (self == nil) return 0;
    else
#endif
    // Class object is guaranteed to be non-nil at this point
    return tailcall [Foo foo](...);
}
// Because we cannot prove that a class object is realized, most callers have to call [Foo foo]_thunk instead
+[Foo foo](...) {
    // True implementation
}

Because we can’t easily reason statically about whether a class object is realized, most callers have to call _thunk instead. But with the tail call, both proposals should produce similar code at the assembly level.

The pro of Proposal B is that it opens up room to remove repeated realization if we have good enough static reasoning (e.g., in a loop, or for consecutive class method calls in the same branch).

My main concern is that this would make the dispatch logic a bit complicated and fragmented: for instance methods, the caller dispatches to the true implementation as long as we can prove self != nil; but for class methods, the caller dispatches to the true implementation only if we can prove self != nil && self->isRealized().

I think both proposals are good options. I can go with B if you believe that’s better, but let me know if we are on the same page first.
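
As a rough illustration of that optimization, here is a plain C model in which realization is hoisted out of a loop. ClassObject, realize_class, Foo_next, and Foo_next_thunk are hypothetical stand-ins, and realize_calls simply counts how often the realization entry point is reached; this is a sketch of the idea, not of what the compiler would emit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a class object. */
typedef struct ClassObject {
    bool realized;
    int realize_calls; /* counts every trip through realization */
} ClassObject;

/* Stand-in for [self self]. */
static void realize_class(ClassObject *cls) {
    cls->realize_calls++;
    cls->realized = true;
}

/* True implementation. Precondition: class is non-nil and realized. */
static int Foo_next(ClassObject *cls, int x) {
    assert(cls != NULL && cls->realized);
    return x + 1;
}

/* Proposal B thunk for a class that is not weakly linked. */
static int Foo_next_thunk(ClassObject *cls, int x) {
    realize_class(cls);
    return Foo_next(cls, x);
}

/* Without static reasoning: every iteration pays for realization. */
int loop_via_thunk(ClassObject *cls, int n) {
    int acc = 0;
    for (int i = 0; i < n; i++)
        acc = Foo_next_thunk(cls, acc);
    return acc;
}

/* With hoisting: realize once if the loop runs, then call directly. */
int loop_hoisted(ClassObject *cls, int n) {
    int acc = 0;
    if (n > 0)
        realize_class(cls);
    for (int i = 0; i < n; i++)
        acc = Foo_next(cls, acc);
    return acc;
}
```

Both loops compute the same result; the hoisted version reaches realization once instead of once per iteration.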

Yes, I think that’s a correct summary. I do think that B is the better choice and leaves us more room for optimization in the future.

For people who are interested in the feature and would like to help with the review, I’ve sent a stack of PRs to make code review easier:

#170616 Flags set up
#170617 Code refactoring to ease later reviews
#170618 Thunk generation
#170619 Optimizations, some class objects can be known to be realized