Additional annotations for static analysis (Objective C designated initializers)

It has been a while (4 years?) since I have submitted any code to
LLVM, but I like to keep up with you guys. I am constantly using
checker, and it has occurred to me there were some very valuable
checks that could be done with a small amount of additional source
code annotation. GCC already has limited support for some things, like
_attribute__((sentinel(0,1))), but the combination of clang and the
analysis framework could allow for much more interesting things. For
example, if we added an __attribute__((designated_initializer)) we
could mark the designated initializer of an objective C class in its
header in order to make sure subclasses flow through the initializers
properly.

Now, normally before I do any real work I like to get people's
opinions, but clang is such a pleasure to work with I really have no
reason not to show up to the table with something. Attached is a small
patch to clang that adds the designated_initializer attribute, extends
the ObjCInterfaceDecl to know about designated initializers, and
implements a error if a class declares more than one designated
initializer. I am also attaching quick test case I wrote for the
error, though I have not actually managed to rig it up in the test
harness yet.

Assuming people are okay with this, it seems like it should be pretty
trivial to extend checker so that it actually can evaluate the rules
for designated initializers
<http://developer.apple.com/documentation/Cocoa/Conceptual/ObjectiveC/Articles/chapter_13_section_3.html&gt;\.
While it does require annotating headers, marking one method per class
in my framework headers does not seem prohibitive, and to me this
feels like the sort of check that would like find lots and lots of
errors if the system headers were annotated;

Anyway, just thought I would check and see what people thought.

Louis

clang-designated_initializer.patch (5.62 KB)

designated-initializer-annotation.m (414 Bytes)

Designated initializer check is something I've been missing for some time, and I think this is a great addition. While marking designated initializer with an attribute is a nice, explicit way which would serve also as documentation of the code, would it be useful to infer "designatedness" of init methods in classes which do not use this attribute? Any init... method which contains a call to [super init...] could be assumed to be a designated initializer. Then same checks could be apply to those cases as well (i.e. there must be only one designated initializer, it must be invoked by every other init... method via 'self' etc).

- Nikita

Hi Louis,

This sounds like a great thing to check, but it isn't an error for a class to have multiple designated initializers. NSView's designated initializers are -initWithFrame: and -initWithCoder:, for example.

-Ken

Thanks for the various bits of feedback everyone gave me, a lot of
people seemed to like the basic idea of the patch, no one had much in
the way of implementation thoughts, I will take that either to mean my
code is perfect, or more likely most of the people who can critique
the code are busy trying to push out the next LLVM release :wink:

Anyway, having though through a few of the semantic issues:

1) Yes, multiple designated initializers are allowed, though it is
uncommon, but making it an error is wrong
2) Inference is not feasible. If you have all the source it might be,
but the analyzer has to do it against headers for libraries it does
not have the sources for. For instance, if I compile a subclass of
NSView I do not have the implementation of NSView, which means I
cannot infer NSView's designated initializers, which means I can't
make sure my subclass calls them. That makes me think annotation is
definitely the correct way to go.

Okay, so, the final rules seem to boil down to this:

1) If a class declares any designated initializers make sure all
methods named either or one of the designated initializers, or call
one of the designated initializers
2) if a class and its superclass both declare designated initializers
make sure that the classes designated initializers call the
superclasses designated initializers

Okay, so having said that, here is a patch to add the attribute. I
removed the patch to the ObjCInterfaceDecl, a dynamically generate the
the info during analysis. I could save a little work by adding one
bool to the ObjCInterfaceDecl, but something does not feel totally
clean to me about adding info that is only useful during static
analysis during all compiles. On the other hand the attribute is
always there, so that might just be in my head.

I am also attaching a patch to the analyzer to use that info and apply
the above rules. The patch simply makes sure the appropriate msg sends
appear in the appropriate methods, it does not perform any path
sensitive analysis to guarantee control flows through them, so it
might miss some potential errors even if annotated. I figured this was
a good enough start to catch some stuff, it lets us debate if adding
an attribute is okay, and I don't have to figure out how the all those
bits of the analyzer work right off the bat.

Also, since there is no path sensitive analysis I just mark the init
method that is wrong, not the particular call site that is in
violation.

Attached is a test file that correctly generates the following errors:

$ ./Debug/bin/clang -warn-objc-designated-initializer -verify
designated-initializer-annotation.m
Warnings seen but not expected:
  Line 14: The initializer method '-[TestObject init]' does not call a
designated initializer
  Line 22: The initializer method '-[TestObject init3]' does not call
a designated initializer
  Line 38: The designated initializer method '-[TestSubclass init4]'
does not call a designated initializer for superclass 'TestObject'

Anyway, let me know what you think. I think it should be stylistically
consistent with the rest of the codebase, but C++ is a little rusty
(Imagine that, someone trying to enhance the Objective C analyzer
doesn't write much C++ :wink:

Louis

designated_init_attribute.patch (3.02 KB)

designated_init_analyzer.patch (7.26 KB)

designated-initializer-annotation.m (523 Bytes)

Here's some code review! First, designated_initializer_attribute.patch:

+static void HandleDesignatedInitializerAttr(Decl *d, const AttributeList &Attr, Sema &S) {
+ // check the attribute arguments.
+ if (Attr.getNumArgs() != 0) {
+ S.Diag(Attr.getLoc(), diag::err_attribute_wrong_number_arguments,
+ std::string("0"));
+ return;
+ }

Hi Louis,

I'm having difficult parsing this sentence. Could you extrapolate a little more what you meant? I'm just trying to clearly understand the rules you wish to implement.

Ted

Here's some comments on the actual check. Comments are inline. One meta comment: the LLVM coding convention is to stay within 80 columns; there are a few cases in the patch where the lines are slightly too long.

Index: Driver/AnalysisConsumer.cpp

I should have time to make the changes in next couple of days, some
quick thoughts

Anyway, let me know what you think. I think it should be stylistically
consistent with the rest of the codebase, but C++ is a little rusty
(Imagine that, someone trying to enhance the Objective C analyzer
doesn't write much C++ :wink:

Here's some comments on the actual check. Comments are inline. One meta
comment: the LLVM coding convention is to stay within 80 columns; there are
a few cases in the patch where the lines are slightly too long.

I need to write a script to check my patches for that, I have been
spoiled by large monitors for far too long.

+ ObjCInterfaceDecl* D = ID->getClassInterface();
+ ObjCInterfaceDecl* C = D->getSuperClass();

Looks good.

+ bool hasDI = false;
+ bool superHasDI = false;

Looks good.
+
+ // Check if self or super has declared designated initializers
+ for (ObjCInterfaceDecl::instmeth_iterator I=D->instmeth_begin(),
+ E=D->instmeth_end(); I!=E; ++I) {
+
+ ObjCMethodDecl* M = *I;
+ if (M->getAttr<DesignatedInitializerAttr>()) {
+ hasDI = true;
+ break;
+ }
+ }

Looks great.

+
+ if (C) {
+ for (ObjCInterfaceDecl::instmeth_iterator I=C->instmeth_begin(),
+ E=C->instmeth_end(); I!=E; ++I) {
+
+ ObjCMethodDecl* M = *I;
+ if (M->getAttr<DesignatedInitializerAttr>()) {
+ superHasDI = true;
+ break;
+ }
+ }
+ }

I actually just threw that in at the last minute because I was trying
to be as unobtrusive as possible in my AST modifications. I think
ultimately it would be better to just have a flag in the
ObjCInterfaceDecl, and to set it if/when we encounter the attribute on
one of its methods, and just call the getter here. If I am going to
factor this out anyway I would rather switch to doing that, unless you
think it is a bad idea.

+
+ if (C) {
+ for (ObjCInterfaceDecl::instmeth_iterator I=C->instmeth_begin(),
+ E=C->instmeth_end(); I!=E; ++I) {
+
+ ObjCMethodDecl* M = *I;
+ if (M->getAttr<DesignatedInitializerAttr>()) {
+ superHasDI = true;
+ break;
+ }
+ }
+ }

I actually just threw that in at the last minute because I was trying
to be as unobtrusive as possible in my AST modifications.

Hi Louis,

I'm actually not certain what you meant by "that" in this last sentence. Could you elucidate a little more? (I'm a little tired right now, so it isn't obvious to me).

I think
ultimately it would be better to just have a flag in the
ObjCInterfaceDecl,

I'm a little confused. Why would you need a flag in ObjCInterfaceDecl? It really does seem to me that all the information you need can be stored as an attribute. The nice thing about the way we do attributes is that we only only pay a cost for them when we use them. By storing extra flags in the Decl objects that are unused 99% of the time we bloat the ASTs. There will always be other things to tack on to the ASTs; we like to keep the core data structures lean (when possible) and when we need auxiliary data we just record them in maps. The static analyzer itself assumes that the ASTs are immutable, and we should probably be passing Decl* and Stmt* values around with a 'const' qualifier (i.e., the static analyzer will never modify the ASTs, only build other data structures on top of them).

and to set it if/when we encounter the attribute on
one of its methods, and just call the getter here. If I am going to
factor this out anyway I would rather switch to doing that, unless you
think it is a bad idea.

I'm honestly a little unclear about the shape of your idea (please just clarify; again I'm a little tired right now). If the idea is to store flags in ObjCInterfaceDecl that would be twiddled by an analysis pass, my belief is that really contrary to the clean design we're going for, not only in the static analyzer, but Clang in general. Having the ASTs store information specific to an analyzer pass is a layering violation; the ASTs should have no knowledge of their consumers, and the ASTs are meant to serve many masters.

I honestly think the patch that you have has a nice clean design. What problem are you trying to solve by changing it?

Cheers,
Ted

Hi Louis,

I can see now that you want to use memoization (dynamic programming) to reduce the cost of looking at a class or a superclass, particularly if a superclass is inspected repeatedly.

That said, I don't believe that adding an attribute or bit to ObjCInterfaceDecl is the right way to go. Here's why.

The "designated_initializer" attribute, as a syntactic and semantic construct, only makes sense for method declarations. Adding it to ObjCInterfaceDecl conceptually makes little sense. In your proposed patch, it would mean that "this Objective-C class contains a method with the "designated_initializer" attribute.

Personally I don't think this is very clean; this information can easily be reconstructed later (lazily) and stored in a side map for use by clients that want it. It also makes things easier for other clients of the ASTs, e.g. refactoring clients or editors, that want to modify the existing ASTs. For example, it's not conceptually difficult to imagine a refactoring operation that removes/adds methods, and that these methods can have attributes affixed to them. By affixing the "designated_initializer" attribute to an ObjCInterfaceDecl (or by having an extra bit in the object itself) this means that all of these clients would have to update this information as well if they happen to remove or add a designated initializer. By keeping things like this lazy, we obviate these problems by design.

I think if you want to do this memoization, we should put it in the checker itself. Right now the checker interface is not designed to be stateful, but we certainly change that.

Another thing to keep in mind is the nature of your optimization and whether or not it really matters. Your particular check will be applied to every @implementation that occurs within a .m file. How many classes will that be? One? Two? Ten? For each one of those class implementations, you will scan its methods and the methods of its superclass. The cost is really insignificant.

The only way you would get any benefits of memoization is by caching this information across multiple .m files. Since we don't perform cross-translation unit analysis (yet) this just isn't possible yet, and even if we could it's not clear that it would matter.

Thoughts?

Ted