Proposed minor refactoring of the diag subsystem

Hi,

the TextDiagPrinter in libDriver causes a crash if one forgets to (or doesn't know that one has to) call setHeaderSearch() on it. A simple fix is to check if headersearch is valid in TextDiagnostics::isInSystemHeader(). However, this really is a hack: A DiagClient without a HeaderSearch is always a rogue object, since it doesn't implement isInSystemHeader() correctly (case in point: Running clang with `-html-diags=html` produces a different set of warnings as running clang without that option.

So I propose a HeaderSearch object should always be mandatory for a DiagClient (which means HeaderSearch, HeaderMap and DirectoryLookup move from Lex to Basic). Then, isInSystemHeader() can be implemented in the base DiagClient. FormatDiagnostic() should also be there; most clients need this anyways.

In return, DiagClient should be optional for Diagnostic. If the client is 0, then no diagnostics are reported (the ErrorOccurred flag is of cause still set). That this is the "right thing" is already hinted at by the change Ted landed today that change DiagClient from a reference to a pointer. This also makes constructing a Preprocessor object easier.

The attached patch implements the proposed changes. This removes some code duplication from PathDiagnostic, and with this `-html-diags` outputs the same warnings as command-line clang. If clients _want_ to see warnings in system headers, they can still override isInSystemHeader() to always return false.

Comments? :slight_smile:

Nico

"Why change one line when you can change a thousand?" :stuck_out_tongue:

diag_refactor.patch (66.5 KB)

Hi Nico,

I like the idea of where this is going. A few comments:

(a) HeaderSearch doesn't belong in libBasic.

HeaderSearch is tied to the Preprocessor, and hence belongs in libLex, not libBasic. This seems like a good design abstraction that I would like to preserve.

(b) DiagnosticClient should not always require a Preprocessor/HeaderSearch.

Consider the case where we use serialized ASTs as opposed to the original source. While all parsing/Sema related diagnostics will be emitted when we have a Preprocessor object, later tasks that operate equally well on serialized ASTs or freshly parsed source (e.g., code generation) may not have a Preprocessor or a HeaderSearch available. I'm not certain what the operating model should be here, but it seems to be that DiagnosticClients should be able to provide some functionality in the absence of HeaderSearch.

I do have a couple suggestions to address (a) and (b), and you and others are free to disagree with them. (I'm not so sure of them myself)

To address (a), we can create a new libDiagnostic that layers on top of libBasic and libLex. This would be nice because we could also move the PathDiagnostic code out of libAnalysis into libDiagnostic, as opposed to having Diagnostic and PathDiagnostic in two libraries. The potential negative here is that we get another library. I'm not certain if that is a problem; some may be concerned about a proliferation of too many libraries.

To address (b), we can have HeaderSearch be passed as an ctor argument to DiagnosticClient, but also allow for HeaderSearch to be NULL. This complicates some of the code for DiagnosticClient, but it allows DiagnosticClients to be used when such information isn't available. This means that methods like getHeaderSearch() should return a HeaderSearch* (which can be NULL) instead of a HeaderSearch&.

Thoughts?

Ted

Hi,

Hi,

the TextDiagPrinter in libDriver causes a crash if one forgets to (or doesn't know that one has to) call setHeaderSearch() on it. A simple fix is to check if headersearch is valid in TextDiagnostics::isInSystemHeader(). However, this really is a hack: A DiagClient without a HeaderSearch is always a rogue object, since it doesn't implement isInSystemHeader() correctly (case in point: Running clang with `-html-diags=html` produces a different set of warnings as running clang without that option.

So I propose a HeaderSearch object should always be mandatory for a DiagClient (which means HeaderSearch, HeaderMap and DirectoryLookup move from Lex to Basic). Then, isInSystemHeader() can be implemented in the base DiagClient. FormatDiagnostic() should also be there; most clients need this anyways.

In return, DiagClient should be optional for Diagnostic. If the client is 0, then no diagnostics are reported (the ErrorOccurred flag is of cause still set). That this is the "right thing" is already hinted at by the change Ted landed today that change DiagClient from a reference to a pointer. This also makes constructing a Preprocessor object easier.

The attached patch implements the proposed changes. This removes some code duplication from PathDiagnostic, and with this `-html-diags` outputs the same warnings as command-line clang. If clients _want_ to see warnings in system headers, they can still override isInSystemHeader() to always return false.

Comments? :slight_smile:

Hi Nico,

I like the idea of where this is going. A few comments:

(a) HeaderSearch doesn't belong in libBasic.

HeaderSearch is tied to the Preprocessor, and hence belongs in libLex, not libBasic.

Why? HeaderSearch does not use anything from the Preprocessor. To me, it's not more Lex-y than IdentifierInfo, which is in Lex, too.

(b) DiagnosticClient should not always require a Preprocessor/HeaderSearch.

Consider the case where we use serialized ASTs as opposed to the original source. While all parsing/Sema related diagnostics will be emitted when we have a Preprocessor object, later tasks that operate equally well on serialized ASTs or freshly parsed source (e.g., code generation) may not have a Preprocessor or a HeaderSearch available. I'm not certain what the operating model should be here, but it seems to be that DiagnosticClients should be able to provide some functionality in the absence of HeaderSearch.

If isInPrivateHeader() is not meaningful for all DiagClients, it should be removed from DiagClient. Perhaps Diag should have two methods, ReportWithFile(), that takes a HeaderSearch, and Report(). Then Lex, Parser, and Sema could call this method, and other clients could call the ordinary Report() function.

Then, the Report() function would not need to get a FullSourceLoc and SourceRanges either; they only make sense if a HeaderSearch is available.

With this change, DiagClient doesn't need to know about HeaderSearch at all (but Diagnotic must know about it, hence HeaderSearch would have to stay in Basic).

How does that sound?

Nico

If isInPrivateHeader() is not meaningful for all DiagClients, it
should be removed from DiagClient. Perhaps Diag should have two
methods, ReportWithFile(), that takes a HeaderSearch, and Report().
Then Lex, Parser, and Sema could call this method, and other clients
could call the ordinary Report() function.

The attached patch does the following changes:

* Remove isInSystemHeader() from DiagClient
* Instead, give Diagnostic a few more Report() overloads that take a HeaderSearch as parameter, so that clients calling Report() decide if a HeaderSearch should be used (and, in turn, warning from system headers should be ignored)
* To make this possible, move HeaderSearch, HeaderMap and DirectoryLookup to Basic
* Change Lex, Parse, and Sema so that they pass a HeaderSearch object to Report()
* Move FormatError() from TextDiagnostic up to DiagClient, remove now empty class TextDiagnostic

This fixes the following problems:

* -html-diags (and probably others) does now output the same set of warnings as console clang does
* nothing crashes if one forgets to call setHeaderSearch() on TextDiagnostic
* some code duplication is removed

Another approach would be to make HeaderSearch a (possibly NULL) member of Diagnostic instead of passing it to Report(), but then all clients of Diagnostic have to remember to set this. Now, only the clients of Report() have to do this. As Report() is mostly clang-internal, but Diagnostic is used by clients linking clang's libraries, this makes more sense to me.

Nico

diag_refactor_v2_2.patch (80.4 KB)

Hi,

HeaderSearch is tied to the Preprocessor, and hence belongs in libLex, not libBasic.

Why? HeaderSearch does not use anything from the Preprocessor. To me, it's not more Lex-y than IdentifierInfo, which is in Lex, too.

HeaderSearch implements functionality of the Preprocessor. It provides the logic to help resolve what a #include and #import does. It doesn't use the Preprocessor, but it is a piece of the preprocessor component. The whole notion of a system header, etc., is all tied to the preprocessor. That's why I don't believe HeaderSearch belongs in libBasic.

If isInPrivateHeader() is not meaningful for all DiagClients, it should be removed from DiagClient. Perhaps Diag should have two methods, ReportWithFile(), that takes a HeaderSearch, and Report(). Then Lex, Parser, and Sema could call this method, and other clients could call the ordinary Report() function.

I don't see the advantage of having Report methods that are overloaded in this manner. This would also place a burden on clients of Diagnositcs to use either method, which seems to shift some of the burden of responsibility from DiagnosticClient to the clients of Diagnostics. The clients of Diagnostic should *not* decide the policy of how diagnostics are rendered; that's the task of the DiagnosticClient. Clients should typically only care about SourceLocations; that's the beauty of the current interface. Specific instances of DiagnosticClient can then decide how diagnostics are presented by using information from HeaderSearch, etc.

With this change, DiagClient doesn't need to know about HeaderSearch at all (but Diagnotic must know about it, hence HeaderSearch would have to stay in Basic).

I agree that methods such as TextDiagnosticClient::isInSystemHeader() should be refactored for use by multiple DiagnosticClients, but I'm not convinced (yet) that putting it into Diagnostics is the answer. They current layering allows the Diagnostics class to be useful even in the complete absence of preprocessor information.

If isInPrivateHeader() is not meaningful for all DiagClients, it
should be removed from DiagClient. Perhaps Diag should have two
methods, ReportWithFile(), that takes a HeaderSearch, and Report().
Then Lex, Parser, and Sema could call this method, and other clients
could call the ordinary Report() function.

The attached patch does the following changes:

Hi Nico!

I like some parts of this patch, but others are more questionable. The nice thing about libbasic right now is that it is C-indepedent. The components in it are reusable for other languages, e.g. if you did a Java or Ada frontend. HeaderSearch is quite C specific, so I'm somewhat uncomfortable with moving it into basic.

The IdentifierTable.h comment change is fine, plz commit, the SourceManager.h change also looks fine.

* Remove isInSystemHeader() from DiagClient

isInSystemHeader really is a hack and it's been on my todo list to fix for quite awhile. However, I think it is best to turn it into a property managed by SourceManager on a per-file-id basis. I think that when the preprocessor ends up calling "SourceManager::create*FileID", that it should pass in the system-headerness into SourceManager. Then we'd have methods on SourceManager that tell you if a SourceLocation is in a system header or not.

Does this seem reasonable?

* Instead, give Diagnostic a few more Report() overloads that take a HeaderSearch as parameter, so that clients calling Report() decide if a HeaderSearch should be used (and, in turn, warning from system headers should be ignored)

If SourceManager tracked this, we wouldn't need this.

This fixes the following problems:

* -html-diags (and probably others) does now output the same set of warnings as console clang does
* nothing crashes if one forgets to call setHeaderSearch() on TextDiagnostic
* some code duplication is removed

I think that making SourceManager handle system-headerness (and having the PP pass this info down from headersearch when it is entering #includes, etc) would solve this problem in a less invasive way. Do you think this approach would work?

Thanks for working on this, this is a very needed improvement!

-Chris

Hi Chris,

I like some parts of this patch, but others are more questionable. The nice thing about libbasic right now is that it is C-indepedent. The components in it are reusable for other languages, e.g. if you did a Java or Ada frontend. HeaderSearch is quite C specific, so I'm somewhat uncomfortable with moving it into basic.

I don't _quite_ buy that, as SourceLocation can contain a MacroID, FileIDInfo has an IncludeLoc, etc -- there's C-dependence in Basic everywhere :wink:

* Remove isInSystemHeader() from DiagClient

isInSystemHeader really is a hack and it's been on my todo list to fix for quite awhile. However, I think it is best to turn it into a property managed by SourceManager on a per-file-id basis. I think that when the preprocessor ends up calling "SourceManager::create*FileID", that it should pass in the system-headerness into SourceManager. Then we'd have methods on SourceManager that tell you if a SourceLocation is in a system header or not.

Does this seem reasonable?

That's a great idea. I've done that.

This fixes the following problems:

* -html-diags (and probably others) does now output the same set of warnings as console clang does
* nothing crashes if one forgets to call setHeaderSearch() on TextDiagnostic
* some code duplication is removed

I think that making SourceManager handle system-headerness (and having the PP pass this info down from headersearch when it is entering #includes, etc) would solve this problem in a less invasive way. Do you think this approach would work?

Seems to work fine; attached.

Nico

diag_refactor_v3.patch (22.1 KB)

Hi Chris,

I like some parts of this patch, but others are more questionable. The nice thing about libbasic right now is that it is C-indepedent. The components in it are reusable for other languages, e.g. if you did a Java or Ada frontend. HeaderSearch is quite C specific, so I'm somewhat uncomfortable with moving it into basic.

I don't _quite_ buy that, as SourceLocation can contain a MacroID, FileIDInfo has an IncludeLoc, etc -- there's C-dependence in Basic everywhere :wink:

Fair enough, you caught my lie/exageration :wink:

* Remove isInSystemHeader() from DiagClient

isInSystemHeader really is a hack and it's been on my todo list to fix for quite awhile. However, I think it is best to turn it into a property managed by SourceManager on a per-file-id basis. I think that when the preprocessor ends up calling "SourceManager::create*FileID", that it should pass in the system-headerness into SourceManager. Then we'd have methods on SourceManager that tell you if a SourceLocation is in a system header or not.

Does this seem reasonable?

That's a great idea. I've done that.

Excellent! Some minor tweaks:

  unsigned SourceManager::createFileID(const ContentCache *File,
+ SourceLocation IncludePos,
+ bool isSysHeader
+ ) {

Please put the ") {" on the previous line for consistency with the rest of the code.