Libclang and Objective-C headers

Hi,

I'm trying to use libclang to gather information about Objective-C headers. Unfortunately, clang appears to default to C for .h files (probably sensible) and to ignore -x options passed in to clang_createTranslationUnitFromSourceFile() and clang_parseTranslationUnit().

Is there an existing way of overriding the default language choice when parsing headers (I assume that the Xcode developers needed this for editing Objective-C headers?), and if so, what is it?

Alternatively, would it be possible for libclang to automatically detect the language of header files? It's relatively easy to tell C[++] and Objective-C[++] apart (spot an #import or an @-keyword anywhere), but telling C and C++ apart is probably a bit harder.
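As a rough illustration of the heuristic described above, here is a minimal C sketch that flags a header as Objective-C[++] if it finds an #import directive or a known @-keyword. The function name looks_like_objc and the keyword list are hypothetical choices for this example, not part of libclang. It does not skip comments or string literals, and it makes no attempt to tell C from C++ (or Objective-C from Objective-C++), so treat its result as a hint rather than an answer.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical, incomplete list of Objective-C-only @-keywords. */
static const char *const objc_keywords[] = {
    "interface", "implementation", "protocol", "class", "end",
    "property", "synthesize", "selector", "encode", "autoreleasepool",
};

/* True if the identifier starting at p is exactly kw,
   not merely a longer identifier that begins with kw. */
static bool matches_keyword(const char *p, const char *kw)
{
    size_t len = strlen(kw);
    return strncmp(p, kw, len) == 0 &&
           !(isalnum((unsigned char)p[len]) || p[len] == '_');
}

/* Heuristic: does this header text look like Objective-C[++]?
   Scans for "#import" (allowing blanks after '#') or an @-keyword.
   Comments and string literals are NOT skipped, so it can be fooled. */
bool looks_like_objc(const char *text)
{
    for (const char *p = text; *p != '\0'; p++) {
        if (*p == '#') {
            const char *q = p + 1;
            while (*q == ' ' || *q == '\t')
                q++;
            if (matches_keyword(q, "import"))
                return true;
        } else if (*p == '@') {
            size_t n = sizeof objc_keywords / sizeof objc_keywords[0];
            for (size_t i = 0; i < n; i++) {
                if (matches_keyword(p + 1, objc_keywords[i]))
                    return true;
            }
        }
    }
    return false;
}
```

A caller could use the result to decide whether to pass -x objective-c or -x c when parsing the header.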

David

Header files inherently have less meaning until they are actually included. How about creating a fake t.m that includes the header, and analyzing that?

For one thing, I want to perform syntax highlighting on the header - including it would make that vastly more complicated because I'd have to create two translation units per header and track them independently.

I'm not sure what you mean by 'less meaning' either. As a compilation unit, there is no semantic difference between a header and a one-line source file that just includes that header. Neither C, C++, nor Objective-C differentiates between source files and header files; the distinction is purely a programmer convention, as is the file extension given to source files, so libclang should not really be treating them differently. If someone chooses to name their source files .cplusplus or .objectivec then this should work as well, as long as a language is specified (although, going back to my original point, libclang apparently refuses to accept -x flags, and I don't know why).
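To make "as long as a language is specified" concrete, here is a small sketch mapping conventional extensions onto values accepted by clang's -x option (c, c++, objective-c, objective-c++). The helper language_for_extension and its table are hypothetical and deliberately incomplete. It returns NULL for .h and for unconventional names such as .objectivec, where the extension alone does not settle the language and the caller has to pick one explicitly.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: maps a filename's extension to a value for
   clang's -x option. Returns NULL when the extension does not determine
   the language (e.g. ".h", or ".objectivec"); the caller must then
   choose a language explicitly. */
const char *language_for_extension(const char *filename)
{
    static const struct {
        const char *ext;
        const char *lang;
    } table[] = {
        { ".c",   "c" },
        { ".m",   "objective-c" },
        { ".mm",  "objective-c++" },
        { ".cc",  "c++" },
        { ".cpp", "c++" },
        { ".cxx", "c++" },
    };

    const char *dot = strrchr(filename, '.');
    if (dot == NULL)
        return NULL; /* no extension at all */

    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        if (strcmp(dot, table[i].ext) == 0)
            return table[i].lang;
    }
    return NULL; /* ambiguous or unknown: ask the caller */
}
```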

David

I'm not certain why this would be vastly more complicated, and why you would need two translation units per header, etc. I'm not interested in arguing with you on this point; you likely have your own design requirements that I'm not aware of.

Sorry David, my point was too terse. I am fully aware that there is no semantic difference from the parser's perspective. My point was more that there may be a semantic difference because headers can have different semantic meaning depending on the context in which they are used.

For example, consider the header "iostream":

  #include <iostream>

How do we know how to interpret this header? Since it is included in a C++ translation unit, the compiler interprets the text as C++ code, but in the absence of this context the compiler has no knowledge of why this file should be interpreted in this way. Thus context is critical to establishing the semantics of the header file.

C++ aside, consider a header file that looks like this:

  #ifdef FOO
  ...
  #else
  ...
  #endif

Suppose FOO is defined by the including source file, or by *another header* that includes that header. Sometimes headers aren't fully self-contained; they are intended to be used in the context of other headers. One can argue whether that is good or bad practice, but it does happen. Thus context matters here as well; in this case, it can change what is actually valid source code defined by the header, and what isn't.

Another example is a header that can be included by either an Objective-C or an Objective-C++ source file. In one case it is Objective-C code, and in the other it is Objective-C++ code. The difference can really matter in some cases.
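A minimal illustration of why the including file matters: the same header text can select different code depending on the language it is compiled as. __OBJC__ and __cplusplus are predefined macros (the former in Objective-C and Objective-C++ modes, the latter in C++ and Objective-C++ modes); the function name including_language is hypothetical.

```c
/* Could live in a shared header: which branch survives preprocessing
   depends entirely on the language of the translation unit that
   includes it, not on anything in the header itself. */
static inline const char *including_language(void)
{
#if defined(__OBJC__) && defined(__cplusplus)
    return "Objective-C++";
#elif defined(__OBJC__)
    return "Objective-C";
#elif defined(__cplusplus)
    return "C++";
#else
    return "C";
#endif
}
```

Compiled as plain C, this returns "C"; the identical text included from a .mm file would return "Objective-C++".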

Ultimately, no source file, whether it is a header or a vanilla .c file, has any intrinsic semantics until it is parsed, in the full and proper context, by the compiler. That context includes all the -I and -D flags, etc. Since headers are inherently tied to the preprocessor, they can be used in all sorts of ways, so their semantics are ultimately determined by the file that includes them. That said, you can often skirt the issue with approximations, but if you want to replicate the semantics of a header as the developer actually sees them in their project, headers need to be analyzed in the context of how they are actually used.
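One way to carry that context into libclang is to assemble it as ordinary compiler arguments. The sketch below, using a hypothetical helper build_parse_args, collects an -x language plus -D and -I flags into an argv-style array. It assumes each flag and its value are passed as separate arguments (for example "-D" "FOO"), a form clang's driver accepts.

```c
#include <stddef.h>

/* Hypothetical helper: writes "-x <lang>", then "-D <def>" for each
   define and "-I <dir>" for each include directory, into out.
   Returns the number of arguments written, or -1 if cap is too small. */
int build_parse_args(const char **out, size_t cap, const char *lang,
                     const char *const *defines, size_t ndefines,
                     const char *const *include_dirs, size_t nincludes)
{
    size_t needed = 2 + 2 * ndefines + 2 * nincludes;
    size_t n = 0;

    if (needed > cap)
        return -1;

    out[n++] = "-x";
    out[n++] = lang;                 /* e.g. "objective-c" */
    for (size_t i = 0; i < ndefines; i++) {
        out[n++] = "-D";
        out[n++] = defines[i];       /* e.g. "FOO" or "FOO=1" */
    }
    for (size_t i = 0; i < nincludes; i++) {
        out[n++] = "-I";
        out[n++] = include_dirs[i];  /* e.g. a project include directory */
    }
    return (int)n;
}
```

The array and count would then be passed as the command_line_args and num_command_line_args parameters of clang_parseTranslationUnit(). Whether libclang honored the -x entry at the time is exactly the open question in this thread.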

Despite my other comments, libclang ignoring the -x flags is likely a bug in the driver.

David,

Backpedaling a bit, I'll freely admit that my second sentence here wasn't a very useful suggestion on its own. In the back of my mind were the other issues I mentioned with regard to header files, and I wanted to draw them out. The main problem I can see with creating a fake t.m is that if the header relies on being #import'ed (for automatic inclusion guards) or otherwise included in a specific way, this may subtly affect how the header is processed. At any rate, the most accurate syntax highlighting will come from seeing how a header is actually used within a project by real source files. That, however, might not be necessary for your application.
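For the fake-t.m approach, here is a sketch of generating the wrapper's one-line contents. The helper make_wrapper_source is hypothetical. It lets the caller choose #import or #include, because the way a header is pulled in can affect how it is processed.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes the text of a one-line wrapper translation
   unit that pulls in header_path, using #import when use_import is
   nonzero and #include otherwise. Like snprintf, returns the length the
   full text needs (excluding the NUL); a value >= size means the
   output in buf was truncated. */
int make_wrapper_source(char *buf, size_t size, const char *header_path,
                        int use_import)
{
    return snprintf(buf, size, "#%s \"%s\"\n",
                    use_import ? "import" : "include", header_path);
}
```

Rather than writing the wrapper to disk, the buffer could in principle be supplied to clang_parseTranslationUnit() as the Contents of a CXUnsavedFile whose Filename is a made-up name such as t.m. That pairing is an untested assumption here.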

At least for clang proper, the -x option seems to work as expected:

% cat t.h
@interface I
- m;
@end

% cat t.c
#include "t.h"

@implementation I
- m { return 0; }
@end

% clang -x objective-c -fsyntax-only t.c
%

If it does not work when using libclang, it is probably a bug (unless it is intentional for some reason I'm not aware of).

- fariborz