[Proposal]: Devirtualizing local methods (messages) in final class in Objective-C

Local methods (messages sent to ‘self’) of a final class can be safely devirtualized. In Objective-C, a method can be defined in a translation unit without being declared in the interface in a header file (see localMethod and NoDecl_localMethod in the example). Local methods are usually written to abstract away some functionality and are only invoked via self.

It is possible to ‘devirtualize’ calls to local methods. Devirtualization is a compiler technique in which a dynamic dispatch (a message sent to an object) is converted into a direct function call when the compiler can determine the callee for a specific call site at compile time.
In the example shown, calls to ‘NoDecl_localMethod’ and ‘localMethod’ may be safely devirtualized.
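To make the transformation concrete, here is a toy model in plain C. It is not how the Objective-C runtime is implemented: `Object`, `ClassDesc`, `msg_send` and the other names are hypothetical, selectors are compared as strings rather than interned, and there is no method cache. It only contrasts looking a method up at the call site with calling a known implementation directly.

```c
#include <stddef.h>
#include <string.h>

/* Toy model (hypothetical names): an object points at its class descriptor,
   which maps selector names to implementation functions. */
typedef struct Object Object;
typedef int (*MethodFn)(Object *self);

typedef struct {
    const char *selector;
    MethodFn impl;
} MethodEntry;

typedef struct {
    const MethodEntry *methods;
    size_t count;
} ClassDesc;

struct Object {
    const ClassDesc *isa;
};

/* Dynamic dispatch: find the implementation at run time, loosely like objc_msgSend. */
static int msg_send(Object *receiver, const char *selector) {
    if (receiver == NULL)
        return 0; /* a message to nil returns zero */
    for (size_t i = 0; i < receiver->isa->count; i++) {
        if (strcmp(receiver->isa->methods[i].selector, selector) == 0)
            return receiver->isa->methods[i].impl(receiver);
    }
    return -1; /* placeholder for "unrecognized selector" */
}

static int MyObject_localMethod(Object *self) {
    (void)self;
    return 42;
}

static const MethodEntry MyObject_methods[] = {
    {"localMethod", MyObject_localMethod},
};
static const ClassDesc MyObject_class = {MyObject_methods, 1};

/* [self localMethod] as a message send... */
static int call_dynamic(Object *self) { return msg_send(self, "localMethod"); }

/* ...and after devirtualization: the callee is known, so call it directly. */
static int call_direct(Object *self) { return MyObject_localMethod(self); }
```

Both calls reach the same implementation; the devirtualized one simply skips the lookup.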

//------------------------------------ Header file (a.h)
#import <Foundation/Foundation.h>

__attribute__((objc_subclassing_restricted))
@interface MyObject : NSObject
- (void)publicMethod:(int)i idType:(id)ExtObject;
@end

//------------------------------------ Source file (a.mm)
#import "a.h"

// Class extension: localMethod is declared only in this translation unit.
@interface MyObject ()
- (void)localMethod;
@end

@implementation MyObject
// No declaration required for NoDecl_localMethod.
- (void)NoDecl_localMethod {
  NSLog(@"NoDecl_localMethod");
}

- (void)localMethod {
  NSLog(@"localMethod");
}

- (void)publicMethod:(int)i idType:(id)ExtObject {
  [self NoDecl_localMethod];      // devirtualize to -[MyObject NoDecl_localMethod]
  [self localMethod];             // devirtualize to -[MyObject localMethod]
  [ExtObject NoDecl_localMethod]; // don't devirtualize: the receiver is not self
}
@end
//------------------------------------

Pass organization and algorithm:
Overview of the steps:

clang front-end

  • Find all the local methods (methods that are defined only in the .m/.mm file).
  • Set an attribute on local methods in the front end, something like Attribute::ObjCLocalMethod, so that they can be identified in LLVM IR.

llvm middle-end (IPO Module level pass)

  • Get the list of local methods in the module by iterating over the method lists.
  • Get the method name (at an objc_msgSend call site) by parsing the selector name and inspecting the global data structures that track the function pointer for each method.
  • Devirtualize only those call sites that are messages to ‘self’.
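The middle-end filter can be sketched as a predicate over call sites. This is a minimal sketch under stated assumptions: the facts a real pass would read out of LLVM IR (the caller's class, whether the receiver operand is the caller's self argument, and the selector recovered from the selector reference) are modeled as plain fields of a hypothetical `MsgSendSite` struct, and the front end's proposed local-method marking (Attribute::ObjCLocalMethod) is modeled as a hypothetical `LocalMethod` table.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of one objc_msgSend call site. In a real pass these
   facts come from the call's operands and the globals clang emits. */
typedef struct {
    const char *caller_class; /* class whose method contains the call */
    bool receiver_is_self;    /* receiver operand is the caller's self argument */
    const char *selector;     /* selector name recovered from the selector ref */
} MsgSendSite;

/* Hypothetical stand-in for methods the front end tagged as local. */
typedef struct {
    const char *class_name;
    const char *selector;
} LocalMethod;

static bool is_local_method(const LocalMethod *locals, size_t n,
                            const char *cls, const char *sel) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(locals[i].class_name, cls) == 0 &&
            strcmp(locals[i].selector, sel) == 0)
            return true;
    return false;
}

/* Devirtualize only messages to self whose selector is a local method of the
   caller's own (final) class. */
static bool should_devirtualize(const MsgSendSite *site,
                                const LocalMethod *locals, size_t n) {
    return site->receiver_is_self &&
           is_local_method(locals, n, site->caller_class, site->selector);
}
```

With `localMethod` and `NoDecl_localMethod` registered as local methods of `MyObject`, `[self NoDecl_localMethod]` qualifies while `[ExtObject NoDecl_localMethod]` does not, matching the example.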

How to track messages to self:

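  • Verify that the caller is an Objective-C method (its symbol name is prefixed with “\01”). This seems like a hack, but it is consistent with how the clang frontend names Objective-C method definitions.
  • In a method, the first argument is the pointer to self, unless the method returns a struct through a hidden sret (StRet) pointer, in which case self is the second argument.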
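The two checks can be sketched as follows, assuming the caller's IR function name has the form clang uses for method definitions, e.g. `"\01-[MyObject localMethod]"`. `parse_objc_method_name` and `self_arg_index` are hypothetical helpers; category methods (`-[MyObject(Cat) m]`) and other corner cases are not handled.

```c
#include <stdbool.h>
#include <string.h>

/* Split "\1-[Class selector]" (or "\1+[...]") into class and selector.
   Returns false for names that do not look like Objective-C methods. */
static bool parse_objc_method_name(const char *name, char *cls, size_t cls_len,
                                   char *sel, size_t sel_len) {
    if (name[0] != '\1' || (name[1] != '-' && name[1] != '+') || name[2] != '[')
        return false;
    const char *start = name + 3;
    const char *space = strchr(start, ' ');
    const char *close = strrchr(start, ']');
    if (space == NULL || close == NULL || close < space)
        return false;
    size_t cls_n = (size_t)(space - start);
    size_t sel_n = (size_t)(close - space - 1);
    if (cls_n == 0 || sel_n == 0 || cls_n >= cls_len || sel_n >= sel_len)
        return false;
    memcpy(cls, start, cls_n);
    cls[cls_n] = '\0';
    memcpy(sel, space + 1, sel_n);
    sel[sel_n] = '\0';
    return true;
}

/* self is IR argument 0, unless a hidden sret pointer occupies that slot. */
static unsigned self_arg_index(bool has_sret) { return has_sret ? 1u : 0u; }
```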

This is a performance optimization in the sense that it avoids a call to ‘objc_msgSend’, and in some cases it also saves a little code size. It is possible to devirtualize even more messages for ‘final’ classes, but that would require more engineering effort, e.g., statically inferring the type of the receiver object, synthesizing the declaration of the callee in the caller’s translation unit, etc.

The current implementation only devirtualizes local messages sent to self, because no type inference of the receiver object is required and both the declaration and the definition are readily available in the same translation unit. Moreover, if we can guarantee (via a design decision) that a subclass never overrides a local method of its parent class, then all local methods invoked via self can still be devirtualized with this approach.

I would like Objective-C experts to point out any gotchas with this approach or to give feedback for further improvement.

Note: final classes were introduced in Objective-C in https://reviews.llvm.org/D25993

Objective-C allows method implementations to be replaced dynamically. It is not obvious that objc_subclassing_restricted in any way prevents either this or similar techniques where a dynamic subclass is formed in order to change the implementation of a method on a particular instance.
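The hazard can be seen in a toy C model: a mutable method table stands in for the runtime's (where functions such as `method_setImplementation` perform the replacement), and a devirtualized call site has been bound to the original implementation at compile time. All names in the sketch are hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*MethodFn)(void *self);

typedef struct {
    const char *selector;
    MethodFn impl;
} MethodEntry;

static int original_impl(void *self) { (void)self; return 1; }
static int replacement_impl(void *self) { (void)self; return 2; }

/* One-entry method table that can be mutated at run time. */
static MethodEntry table[] = {{"localMethod", original_impl}};

static MethodFn lookup(const char *selector) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].selector, selector) == 0)
            return table[i].impl;
    return NULL;
}

/* Models replacing an implementation at run time ("swizzling"). */
static void replace_impl(const char *selector, MethodFn impl) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strcmp(table[i].selector, selector) == 0)
            table[i].impl = impl;
}

/* Message send: sees whatever implementation is installed now. */
static int dynamic_call(void *self) { return lookup("localMethod")(self); }

/* Devirtualized call: bound to original_impl when it was compiled. */
static int devirtualized_call(void *self) { return original_impl(self); }
```

After `replace_impl`, the dynamic call observes the new implementation while the devirtualized call keeps running the old one, which is the behavior change under discussion.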

Objective-C also allows the class of a particular instance to be changed, although I think it would be fair to restrict this to ensure that the new class is related in some reasonable way to the old class — probably that it must have the old class (ignoring dynamic subclasses) as a superclass.

There are some other concerns that apply to devirtualization in general, but not to your restricted case:

  • The receiver of a message send is generally allowed to be nil.
  • Objective-C is not strongly typed. Instance-variable accesses have C-like restrictions, but otherwise static type information is defined to only be meaningful for determining the type signature for message sends, not for determining the actual dynamic type. For example, several major libraries rely on the ability to create “proxy” objects that transparently forward most messages to another object.
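Why a nil receiver matters can be shown with a small C sketch (hypothetical names): sending a message to nil yields zero for scalar and pointer return types, so a devirtualized call whose receiver may be nil has to keep an explicit null check, whereas a call on `self` can drop it under the assumption, discussed later in the thread, that `self` is never nil.

```c
#include <stddef.h>

typedef struct {
    int value;
} Counter;

/* Direct implementation: dereferences self, so it must never see NULL. */
static int Counter_value(Counter *self) { return self->value; }

/* Unguarded devirtualization: only valid when the receiver is known non-nil. */
static int direct_call(Counter *receiver) { return Counter_value(receiver); }

/* Guarded devirtualization: preserves "message to nil returns zero". */
static int devirtualized_guarded(Counter *receiver) {
    return receiver != NULL ? Counter_value(receiver) : 0;
}
```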

Ultimately, while I think devirtualization would be very powerful in Objective-C, I think it really needs to be opted-in in some more explicit way.

John.

> Objective-C allows method implementations to be replaced dynamically. It is not obvious that `objc_subclassing_restricted` in any way prevents either this or similar techniques where a dynamic subclass is formed in order to change the implementation of a method on a particular instance.

Can we revise the semantics of `objc_subclassing_restricted` to disallow subclassing dynamically?

> Objective-C also allows the class of a particular instance to be changed, although I think it would be fair to restrict this to ensure that the new class is related in some reasonable way to the old class — probably that it must have the old class (ignoring dynamic subclasses) as a superclass.

Yes, I guess that when using some of the super-dynamic behavior of Objective-C, it may be tricky to take advantage of devirtualization. We may need proper documentation etc. to make programmers aware of the expected behavior resulting from this optimization.

> There are some other concerns that apply to devirtualization in general, but not to your restricted case:
> - The receiver of a message send is generally allowed to be `nil`.

That was the reason why I restricted the optimization to work only for messages passed to self because, IIUC, self cannot be nil.

> - Objective-C is not strongly typed. Instance-variable accesses have C-like restrictions, but otherwise static type information is defined to only be meaningful for determining the type signature for message sends, not for determining the actual dynamic type. For example, several major libraries rely on the ability to create "proxy" objects that transparently forward most messages to another object.

> Ultimately, while I think devirtualization would be very powerful in Objective-C, I think it really needs to be opted-in in some more explicit way.

We can have a compiler flag to enable Objective-C devirtualization. Additionally, we can introduce an __attribute__((objc_local_method)) to enable finer-grained control of devirtualization.

Thanks for the review, I can start putting the patches for review if you think the overall approach seems reasonable.
-Aditya

> > Objective-C allows method implementations to be replaced dynamically. It is not obvious that objc_subclassing_restricted in any way prevents either this or similar techniques where a dynamic subclass is formed in order to change the implementation of a method on a particular instance.

> Can we revise the semantics of objc_subclassing_restricted to disallow subclassing dynamically?

No. In general, we can’t take existing attributes and make them mean something much stronger.

> > Objective-C also allows the class of a particular instance to be changed, although I think it would be fair to restrict this to ensure that the new class is related in some reasonable way to the old class — probably that it must have the old class (ignoring dynamic subclasses) as a superclass.

> Yes, I guess that when using some of the super-dynamic behavior of Objective-C, it may be tricky to take advantage of devirtualization. We may need proper documentation etc. to make programmers aware of the expected behavior resulting from this optimization.

It needs to be opt-in. You can’t really fix “we changed the basic semantics of the language” with documentation.

> > There are some other concerns that apply to devirtualization in general, but not to your restricted case:
> > • The receiver of a message send is generally allowed to be nil.

> That was the reason why I restricted the optimization to work only for messages passed to self because, IIUC, self cannot be nil.

That seems reasonable. It also prevents proxy objects.

> > • Objective-C is not strongly typed. Instance-variable accesses have C-like restrictions, but otherwise static type information is defined to only be meaningful for determining the type signature for message sends, not for determining the actual dynamic type. For example, several major libraries rely on the ability to create “proxy” objects that transparently forward most messages to another object.

> > Ultimately, while I think devirtualization would be very powerful in Objective-C, I think it really needs to be opted-in in some more explicit way.

> We can have a compiler flag to enable Objective-C devirtualization. Additionally, we can introduce an __attribute__((objc_local_method)) to enable finer-grained control of devirtualization.

I don’t think an optimization that globally miscompiles Objective-C’s dynamic features unless an attribute is used to suppress it is ever going to be acceptable, even if it’s an opt-in flag. So given that the optimization has to be opt-in with some new attribute, I don’t think the flag adds anything.

> Thanks for the review, I can start putting the patches for review if you think the overall approach seems reasonable.

I haven’t really thought about the technical approach. Assorted thoughts:

  • The attribute could maybe just be the “inline” keyword, which both seems appropriate and is not otherwise used on Objective-C methods.

  • You need some way to make synthesized property accessors inlinable.
  • For the specific purpose of inlining privately-declared methods of the same class from the @implementation, do you even need a middle-end pass? Can’t this just be straight-up done by the frontend when it sees a use of the method? (Incidentally, there’s yet another reason you really need this to be opt-in for it to work — Objective-C overriding is defined by whether the selectors match, so a method can be legally overridden even if it’s not declared outside the @implementation.)
  • If we can reliably inline all the callers of the method, should we even add it to the method table? Maybe it should only be in the method table if it’s declared outside the @implementation (or overrides such a method) — users can always add such a declaration in a “private” class extension if they need it in the method table for dynamic reasons but don’t want to make the method easily usable externally.
  • If a method is declared outside the @implementation, is it actually reasonable to assume that this attribute on its definition means it isn’t ever overridden?
  • Is there a viable path from this to a more traditional devirtualization optimization? Is such an optimization ever going to be feasible in Objective-C without a slew of new attributes?

Also, regardless of the technical approach, while you are welcome to work on an implementation, nobody here can actually approve adding it to Clang on our own. Clang has a policy for language extensions that you can read here:
https://clang.llvm.org/get_involved.html
It is the longstanding practice of the Clang project to treat Apple as Objective-C’s effective governing body; Objective-C is not an open language. So to add a major feature like this to Clang, you will need to propose it to Apple’s internal language committee for Objective-C. Assuming we can get this into a satisfactory shape, I would be happy to serve as your proxy on this, but I do have to warn you that it can be a slow and frustratingly opaque process.

But I do think it’s an interesting feature that would solve a significant problem for Objective-C programmers. Currently, Objective-C programmers who need inlining for performance have to rewrite their methods as static functions, which requires them to rewrite the method calls as well. Being able to easily turn those message sends into direct function calls would be great.
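For reference, the manual workaround looks roughly like this, sketched in plain C with a hypothetical struct standing in for the object's instance variables: the method body becomes a static function that takes `self` explicitly, and every `[self helper:i]` send has to be rewritten by hand as a function call.

```c
/* Hypothetical stand-in for MyObject's instance variables. */
typedef struct {
    int count;
} MyObjectIvars;

/* Formerly -[MyObject helper:]; now a static function with an explicit self. */
static inline int helper(MyObjectIvars *self, int i) {
    return self->count + i;
}

static int publicMethod(MyObjectIvars *self, int i) {
    /* was: return [self helper:i] * 2; */
    return helper(self, i) * 2;
}
```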

John.

I don’t think I agree here. One of the core design principles of Objective-C (somewhat violated by the property notation) is that new semantics must introduce new syntax. Currently if you see foo(x), then you know without looking at the definition of foo that it’s a direct C-style function call. If you see [x foo] then you know that it’s late-bound dynamic dispatch that can be overridden with subclassing or reflection, can be proxied, and so on.

That said, I have two comments on the proposal:

1. The GNUstep runtime has a mechanism that allows safe inlining and inline caching, with a cheap check that the inlined version is still valid. I prototyped an LLVM optimisation that did this automatically about 10 years ago, but given the lack of open source Objective-C codebases where message dispatch is actually the bottleneck, I didn’t work on it much. If Apple wants this feature, then it’s not tremendously difficult to add to the runtime and there’s a good 40 years of research into efficient dispatch mechanisms for Smalltalk-family languages that can be used to create a design. I picked one point in a very large trade-off space, which may or may not be the correct one.
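The general shape of inline caching with a validity check can be sketched in C. This is a generic version-guarded cache assumed for illustration; it is not the GNUstep runtime's actual data structures or API, and all names are hypothetical. Each class carries a version number that is bumped whenever one of its methods is replaced, and a call site reuses its cached implementation only while the class and version still match.

```c
#include <stddef.h>
#include <string.h>

typedef int (*MethodFn)(void *self);

typedef struct {
    const char *selector;
    MethodFn impl;
} MethodEntry;

/* Hypothetical class descriptor whose version changes on every mutation. */
typedef struct {
    MethodEntry methods[4];
    size_t count;
    unsigned version;
} ClassDesc;

static MethodFn lookup_impl(const ClassDesc *cls, const char *selector) {
    for (size_t i = 0; i < cls->count; i++)
        if (strcmp(cls->methods[i].selector, selector) == 0)
            return cls->methods[i].impl;
    return NULL;
}

/* Replacing an implementation invalidates caches keyed on the old version. */
static void replace_impl(ClassDesc *cls, const char *selector, MethodFn impl) {
    for (size_t i = 0; i < cls->count; i++)
        if (strcmp(cls->methods[i].selector, selector) == 0)
            cls->methods[i].impl = impl;
    cls->version++;
}

/* One cache per call site; the selector is fixed at that site. */
typedef struct {
    const ClassDesc *cls;
    unsigned version;
    MethodFn impl;
    unsigned misses; /* counts slow-path lookups, for illustration */
} InlineCache;

static int cached_send(InlineCache *cache, const ClassDesc *cls, void *self,
                       const char *selector) {
    /* Cheap guard: same class and unchanged version means the cache is valid. */
    if (cache->cls != cls || cache->version != cls->version) {
        cache->cls = cls;
        cache->version = cls->version;
        cache->impl = lookup_impl(cls, selector);
        cache->misses++;
    }
    return cache->impl != NULL ? cache->impl(self) : 0;
}

static int impl_a(void *self) { (void)self; return 1; }
static int impl_b(void *self) { (void)self; return 2; }
```

A compiler could go one step further and inline the cached implementation's body behind the same guard, falling back to a full lookup when the check fails.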

2. A feature like this would probably make more sense as an attribute on the caller, rather than the callee. For example, something like [+x foo] or [@direct(x) foo]. That would allow callers to explicitly declare that they don’t mind being broken in the presence of reflection, without requiring the same breakage at every call site.

David