Geeze, after I sent my message I came up with a few more...
Here are the -W warnings I usually try to run with:
-Wmissing-prototypes -Wreturn-type -Wformat -Wunused-parameter -Wunused-variable -Wunused-value -Wuninitialized -Wshadow -Wsign-compare -Wshorten-64-to-32 -Wextra -Winit-self -Wsequence-point -Wswitch-default -Wstrict-aliasing=2 -Wundef -Wpointer-arith -Wbad-function-cast -Wstrict-prototypes -Wmissing-declarations -Wredundant-decls -Wunreachable-code -Wcast-align -Wdiscard-qual -Wcast-qual -Wstrict-overflow=5
"32 <-> 64 bit issues / potential problems".
The warning flag '-Wshorten-64-to-32' is a good start. '-Wconversion' is also useful when switching to 64 bits. clang can leverage 'meta-information' against the problem too, like using the knowledge that NS*Integer can switch sizes between 32 and 64 bits, but a 32-bit int assigned/passed to an NSInteger won't scale the same way. clang can help catch these potential gotchas early, before they become difficult to undo/fix problems, as these types of things can go silently undetected at default compiler warning levels. '-Wcast-align' is also something that can be checked/validated.
64 -> 32 bit issues are definitely a good source of checks, especially for code that must run on different archs. Your example about NSInteger is an especially good one.
I actually recently implemented a simple check on this topic, relating to the use of CFNumberCreate. If one isn't careful with the use of this function, on a 64-bit architecture CFNumberCreate can actually fail to initialize some of the bits of the freshly created CFNumber because the requested integer size is greater than the size of the integer provided by the programmer. I think a lot of little simple checks like these would be both (a) relatively easy to implement and (b) potentially catch a lot of subtle bugs.
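To make the 64 -> 32 bit concern concrete, here is a minimal sketch in plain C. The function names are made up for illustration, and unsigned types are used so that the truncation is well defined by the language rather than implementation-defined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Silent narrowing: the upper 32 bits are simply dropped. Written with an
   explicit cast here; without it, this is the kind of conversion that
   -Wshorten-64-to-32 is meant to report. */
static uint32_t narrow_unchecked(uint64_t value)
{
    return (uint32_t)value;
}

/* Checked narrowing: returns 1 and stores the value only if it fits,
   otherwise returns 0 and leaves *out untouched. */
static int narrow_checked(uint64_t value, uint32_t *out)
{
    assert(out != NULL);
    if (value > UINT32_MAX)
        return 0;
    *out = (uint32_t)value;
    return 1;
}
```

Passing 0x100000001 to narrow_unchecked() yields 1, with no indication that anything was lost, while narrow_checked() reports the failure to the caller.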
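The general shape of this kind of bug can be modelled in plain C. The sketch below is not the CoreFoundation API and not the checker itself: number_kind, number_create and the other names are hypothetical stand-ins for "a type tag tells the callee how many bytes to read through a void pointer".

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical type tags; the callee trusts the tag, not the storage. */
enum number_kind { KIND_INT32, KIND_INT64 };

static size_t kind_size(enum number_kind kind)
{
    return kind == KIND_INT32 ? sizeof(int32_t) : sizeof(int64_t);
}

/* Reads kind_size(kind) bytes from value_ptr and widens to 64 bits. */
static int64_t number_create(enum number_kind kind, const void *value_ptr)
{
    assert(value_ptr != NULL);
    if (kind == KIND_INT32) {
        int32_t v;
        memcpy(&v, value_ptr, sizeof v);
        return v;
    }
    int64_t v;
    memcpy(&v, value_ptr, sizeof v);
    return v;
}

/* The check itself: the storage behind value_ptr must be at least as
   large as what the tag asks the callee to read. */
static int kind_matches_storage(enum number_kind kind, size_t storage_size)
{
    return storage_size >= kind_size(kind);
}

/* Simulates the mismatch safely: a 4-byte integer followed by 4 bytes of
   unrelated data, read back with a tag that asks for 8 bytes. */
static int64_t demo_mismatched_read(int32_t value)
{
    unsigned char storage[sizeof(int64_t)];
    memset(storage, 0xFF, sizeof storage);     /* stand-in for neighbouring memory */
    memcpy(storage, &value, sizeof value);     /* the programmer's 32-bit integer */
    return number_create(KIND_INT64, storage); /* tag claims 64 bits */
}
```

A checker only needs the information used by kind_matches_storage(): the tag passed at the call site and the size of the pointed-to type, both of which are visible without any deep analysis.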
The design of the static analysis library is to help make the implementation of these checks relatively straightforward without any deep program analysis knowledge. I myself won't be able to implement all of these checks, and hopefully as the tool evolves others will feel comfortable in implementing some of these checks as well.
"Cross architecture issues"
I can't think of any off the top of my head, but collecting patterns of possible cross-architecture issues would be helpful.
I think this basically relates to the previous issue: APIs and type definitions can have different invariants or properties on different archs. Some of these invariants could be checked readily with static analysis.
"Possible restrict and const qualification recommendations (and validation)"
This really requires deep interprocedural analysis, but if it's available, then clang might be able to reason about the interprocedural effects and possibilities. Const can sometimes lead to better code generation, but the real wins are usually possible with restrict. If deep inspection is possible, then some degree of validation of the use of a restrict-qualified pointer is probably possible as well.
I don't have a reference off the top of my head, but I do know there was some research on doing exactly what you suggest. Accurately inferring const and restrict may require a fairly precise points-to analysis, which gets tricky with all the messiness of C. That said, this is something that could potentially be done, at least in some localized cases.
Actually, since you're obviously deep down in the guts of the grammar and compiler interactions, maybe you can offer an opinion on the following: Historically Objective-C was effectively nothing more than a fancy pre-processor front end to the C compiler. In fact, there was often a trivial one-to-one mapping from an Objective-C statement to a plain C statement.
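As a small illustration of what restrict promises and what validating it could mean, here is a sketch with a run-time stand-in for the check. The names are invented for this example, and comparing addresses through uintptr_t assumes a flat address space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero when [a, a+n) and [b, b+n) share no bytes.
   Assumption: pointer values converted to uintptr_t order like addresses. */
static int ranges_disjoint(const float *a, const float *b, size_t n)
{
    uintptr_t pa = (uintptr_t)a, pb = (uintptr_t)b;
    uintptr_t len = (uintptr_t)(n * sizeof(float));
    return pa + len <= pb || pb + len <= pa;
}

/* restrict tells the compiler that dst and src never overlap, so it may
   keep src values in registers or vectorize without reloading them. */
static void add_scaled(float *restrict dst, const float *restrict src,
                       float k, size_t n)
{
    assert(ranges_disjoint(dst, src, n)); /* debug-time validation of the promise */
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];
}
```

A static analysis would try to establish the same disjointness at each call site instead of asserting it at run time, which is where the points-to information comes in.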
@interface MyClass : ParentClass { char *buffer; } @end
becomes something like
typedef struct { /* ParentClass definitions */ char *buffer; } MyClass;
When you get right down to it, there's nothing special about a 'class'; it's literally nothing more than a struct.
Now, object oriented programming is built on polymorphic abilities: each class inherits all of its superclasses' methods/ivars/etc. So if we have the following:
@interface MyClass : ParentClass { char *buffer; } @end
@interface MyMutableClass : MyClass { int mutationCount; } @end
In code, we refer to an instantiation of one of these objects with:
MyClass *myClassObj;
MyMutableClass *myMutableClassObj;
OO programming (and objc) allows for the following to take place:
myClassObj = myMutableClassObj;
because MyMutableClass is a subclass of MyClass. No problem, right?
I'm of the opinion that this is actually a problem. The problem has nothing to do with the (correct) OO design paradigm or any particular conceptual fault, but it has to do with C.
Objective-C was designed a long time ago, in the pre-ANSI K&R days as a matter of fact. Such assignments were possible under older K&R and (I think, but may be wrong) ANSI rules. It was frowned upon, wasn't terribly good style, but you could do it and for most architectures this isn't a problem because the compiler essentially treated all pointers as equivalents. Of course, the compiler is free to perform pointer alignment due to the assignment, but this never happened in practice (at least not for any of the main architectures that are still with us today).
The @interface definitions are literally like the following statements:
typedef struct { char *buffer; } MyClass;
typedef struct { char *buffer; int mutationCount; } MyMutableClass;
Or, if we really wanted to, we could drop the typedef and declare it as any other struct. Pointers to 'instantiated objects' in code are either identical to their Objective-C counterparts if typedefs are used, or something like the following if structs are used:
struct MyClass *myClassObj;
struct MyMutableClass *myMutableClassObj;
Fast forward to C99 and consider the same statement:
myClassObj = myMutableClassObj;
In C99, this statement is expressly forbidden as 'pointers of one type may not point to a different type (except void)'. Only pointers of the same type may alias each other. This is the 'strict aliasing' rule(s).
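For the plain-C struct view used above, C99 does guarantee one conversion between distinct struct pointer types: a pointer to a struct, suitably converted, points to its first member. A minimal sketch, assuming the 'subclass' embeds the parent struct as its first member rather than repeating its fields (a different layout convention from the typedefs above; the helper names are illustrative only):

```c
#include <assert.h>
#include <stddef.h>

struct MyClass {
    char *buffer;
};

struct MyMutableClass {
    struct MyClass super;   /* parent embedded as the first member */
    int mutationCount;
};

/* Upcast by taking the address of the embedded parent, so no cast
   between unrelated struct pointer types is needed. */
static struct MyClass *as_my_class(struct MyMutableClass *obj)
{
    return &obj->super;
}

/* Code written against the parent type works on either kind of object. */
static char *buffer_of(struct MyClass *obj)
{
    assert(obj != NULL);
    return obj->buffer;
}
```

With this layout the assignment becomes myClassObj = as_my_class(myMutableClassObj), and both pointers hold the same address. Whether an Objective-C compiler must follow the same rules for object references is a separate question.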
So... there's a bit of a conflict. Such pointer aliasing is permitted under the concepts of object oriented programming, but it is expressly forbidden under C99 rules. From a purely compiler perspective, when you prototype a method as
- (NSArray *)someMethod;
you literally mean that you are returning a type of NSArray *, and not any of its subclasses.
I'm not certain that the C99 rules apply in this way to Objective-C types, since the Objective-C type system is completely outside the scope of C99. The fact that Objective-C was originally implemented as a layer above C just means the compiler had less information to go on. One can easily get around the problem you mentioned by having the C implementation of Objective-C just use void* for all Objective-C object references (or, as you point out later, simply disable strict aliasing rules for Objective-C code).
It is, in fact, an error to return an NSMutableArray in a method that's prototyped to return an NSArray due to C pointer aliasing rules. The 'id' type is the closest thing that Objective-C has to a 'generic object pointer type', so if a method wants to return a pointer to an object of more than one type, it really should declare the return type as 'id'. Again, this is due to the C pointer aliasing rules rather than any OO conceptual rules.
Again, I'm not certain how much C99's aliasing rules apply to Objective-C object references. Objective-C doesn't have a formal specification akin to C99, so the specification (if you want to call it that) is whatever the current compiler implementation allows.
There are others on this list that can comment on this particular issue with much more authority than myself.
It really starts to become a problem when you turn on the optimizer and it begins to do optimizations that are dependent on this aliasing invariant. When I realized that this could actually be a serious, very subtle problem, and started digging, I found evidence to support it. For example, '-fstrict-aliasing' is disabled on Apple's GCC for ObjC code.
Interesting. I think this illustrates my point that the strict aliasing rules in C99 don't really apply to Objective-C, at least in the implementation provided by GCC. This is clearly a deliberate choice, likely to avoid the issues you mentioned.
Using '-fast' on .m files causes the compiler to emit 'cc1obj: warning: command line option "-fast" is valid for C/C++ but not for ObjC'
I'm of the opinion that Objective-C and C are so closely linked together that one cannot simply say 'Pointers cannot alias different types, except for ObjC class type pointers, which can alias any of their subclasses.'
It gets even more interesting when one considers categories, which allow one to implement essentially "open types" in Objective-C. The highly dynamic nature of Objective-C allows one to change the methods implemented by a class at runtime, which can essentially change the subtyping relationships between objects at runtime. In that sense, the class hierarchy is only a set of guidelines for subtyping relationships between Objective-C objects. From that observation, I'm not certain that any conservative strict aliasing assumptions could be made by the compiler concerning Objective-C objects.
It's just not possible from a practical standpoint, ESPECIALLY in something like GCC where it's pragmatically impossible to separate out the two languages.
I'm not an expert on the GCC IR where the optimizer does much of its work, but the GCC frontend has a notion of the Objective-C type system, and uses that information to issue warnings in some cases. For example:
#include <Cocoa/Cocoa.h>
void foo() {
  NSString* s;
  NSObject* o;
  o = s;
  s = o;
}
gcc emits a warning for the assignment of 'o' to 's' because the object referred to by 'o' may not be a subclass of NSString:
/tmp/t.m:8: warning: assignment from distinct Objective-C type
If one could use the Objective-C class hierarchy information to make conservative assumptions for use with strict aliasing optimizations, I'm not certain why you think gcc couldn't use that information. The point I made above, however, means that even having the class hierarchy information available may not be enough to make such assumptions.
- Ted