EXTENSION for some diags

Make some diags EXTENSION so they are errors with pedantic errors.

Downside is that non-pedantic compiles are silent.

We really need a slightly different diagnostic policy.

Neil.

diff (1.03 KB)

Make some diags EXTENSION so they are errors with pedantic errors.

Patch looks good, please commit.

Downside is that non-pedantic compiles are silent.

We really need a slightly different diagnostic policy.

You're right, we do. What do you suggest? GCC has a class of extension warnings that are emitted by default. Is this what you mean? That would give us:

1. Error.
2. Warning always (eventually enabled or disabled by other -W options)
3. Extension, defaults to a warning
4. Extension, defaults to silent

-pedantic would then map #4 to a warning. -pedantic-errors would make #3/#4 to error.

Sound reasonable?

-Chris

Chris Lattner wrote:-

You're right, we do. What do you suggest? GCC has a class of
extension warnings that are emitted by default. Is this what you
mean? That would give us:

1. Error.
2. Warning always (eventually enabled or disabled by other -W options)
3. Extension, defaults to a warning
4. Extension, defaults to silent

-pedantic would then map #4 to a warning. -pedantic-errors would
make #3/#4 to error.

Sound reasonable?

I implemented something like the following myself; I'm quite fond
of this approach and it's flexible:

a) Each diagnostic tag has a default severity suitable for normal
   compiles. Severities are, in order, suppress, remark, warning,
   soft error, hard error, fatal and maybe ice.
b) User can change the severity of any diagnostic with default
   severity <= soft error to any severity <= soft error, either
   via #pragma or on the command line.
c) In C90 and C99 pedantic mode, a table of tags that must
   be diagnosed according to the standard is scanned and the
   severity of said tag "upgraded" to warning (-pedantic or
   equivalent) or error (-pedantic-errors or equivalent) if it
   isn't already.
d) Some severities need to be different depending on compile-
   time context anyway (e.g. integer division by zero is a
   run-time error, compile-time warning except in constant
   expressions where it is compile-time error). So when
   raising a diagnostic it should be possible for the front
   end to specify a severity override anyway.

You probably feel something slightly different is appropriate
for LLVM, but there may be ideas worth using here. It avoids
the need for duplicate tags too, which is good from a user-
interface point of view if they are exposed to the user (they
probably should be for the fine-grained control of b)).

Something I've not implemented yet is diagnostic groups, so
you could give groups of tags a name and manipulate them
en-masse, e.g. portability warnings, unused warnings, etc.

Neil.

Neil Booth wrote:-

I implemented something like the following myself; I'm quite fond
of this approach and it's flexible:

[snip]

Any thoughts?

Neil.

You're right, we do. What do you suggest? GCC has a class of
extension warnings that are emitted by default. Is this what you
mean? That would give us:

1. Error.
2. Warning always (eventually enabled or disabled by other -W options)
3. Extension, defaults to a warning
4. Extension, defaults to silent

-pedantic would then map #4 to a warning. -pedantic-errors would
make #3/#4 to error.

Sound reasonable?

I implemented something like the following myself; I'm quite fond
of this approach and it's flexible:

Sorry for dropping the ball on this, it got buried in some other mail when I was playing with filtering rules. :frowning:

a) Each diagnostic tag has a default severity suitable for normal
   compiles. Severities are, in order, suppress, remark, warning,
   soft error, hard error, fatal and maybe ice.
b) User can change the severity of any diagnostic with default
   severity <= soft error to any severity <= soft error, either
   via #pragma or on the command line.
c) In C90 and C99 pedantic mode, a table of tags that must
   be diagnosed according to the standard is scanned and the
   severity of said tag "upgraded" to warning (-pedantic or
   equivalent) or error (-pedantic-errors or equivalent) if it
   isn't already.
d) Some severities need to be different depending on compile-
   time context anyway (e.g. integer division by zero is a
   run-time error, compile-time warning except in constant
   expressions where it is compile-time error). So when
   raising a diagnostic it should be possible for the front
   end to specify a severity override anyway.

You probably feel something slightly different is appropriate
for LLVM, but there may be ideas worth using here. It avoids
the need for duplicate tags too, which is good from a user-
interface point of view if they are exposed to the user (they
probably should be for the fine-grained control of b)).

I think this makes sense. You basically have a table mapping diagnostics to severity, which can be manipulated based on a variety of things.

clang currently has two distinct notions: we distinguish between what class a particular diagnostic falls into from how it is reported. These two categorizations are used by different pieces of the diagnostic handling machinery.

The first part of the diagnostics machinery is the piece that is generating the diagnostics: the parser, lexer, sema, checking system, etc. Each diagnostic itself is tagged (in DiagnosticKinds.def) with a class, which is a member of the set {NOTE,WARNING,EXTENSION,ERROR}. The important part of the classification at this level is that it is completely independent of how the diagnostic is reported and they have obvious meanings:

1. NOTEs (which aren't used and may be removed in the future) are informational messages. For example, perhaps the optimizer could emit information about which loops are vectorized etc.

2. WARNINGs are emitted for legal code that is questionable somehow.

3. EXTENSIONs are emitted for illegal code that is accepted anyway as an extension to the language.

4. ERRORs flag a malformed program where error recovery techniques must be used to continue parsing.

I believe that each of these categories is well described and simple. It is easy to determine which class any particular diagnostic falls into.

a) Each diagnostic tag has a default severity suitable for normal
   compiles. Severities are, in order, suppress, remark, warning,
   soft error, hard error, fatal and maybe ice.

In contrast to your system, we don't specify a "default mapping": that is completely client-specific. Because the library generating the diagnostic doesn't know/care about how these diagnostics are emitted, we don't need to talk about suppressed diagnostics. I'm not sure the distinction between soft/hard error in your scheme. Fatal/ICE are also interesting: we don't emit a diagnostic for these cases: these are presumably internal consistency failures, where we prefer to have the library assert/abort and die.

When diagnostics are reported by the parser (etc), they get routed through the Diagnostic class. This class is designed to do various types of mapping, which turns the class above into a concrete diagnostic level. The level is completely different from the class, but they have similar names (and are thus somewhat confusing). The Diagnostic class maps each diagnostic onto the level set, which is { Ignored, Note, Warning, Error, Fatal }.

These levels are what we expect the tool to report to the user. As you might expect, the {note,warning,extension} classes can be mapped onto any of these levels, and the {error} class can only be mapped onto {error,fatal}. Most of these are self-explanatory, but here are some interesting pieces:

1. The warning class can be mapped onto the note level if desired (or completely ignored) at the discretion of the client.

2. Fatal errors aren't property of the diagnostic, they are a level that can be mapped onto: we expect all errors to be recovered from when the diagnostic generator produces them. The Fatal level is useful for clients that want the parser to stop as soon as possible after some set of diagnostics are emitted (for example an error).

I believe that this system is fully general: any interesting policy the client wants can be implemented with this scheme (Diagnostic::setDiagnosticMapping can be used to specify a level on a per-diagnostic basis), and higher level policies (e.g. -Werror, -pedantic, -pedantic-errors etc) can be easily handled by mapping entire classes to levels.

Something I've not implemented yet is diagnostic groups, so
you could give groups of tags a name and manipulate them
en-masse, e.g. portability warnings, unused warnings, etc.

This is also something that clang is missing. In particular, the client would like to be able to manipulate groups at a coarse grain to implement controls such as the GCC options -Wextra, -Wall, -W, etc. These are basically controls that modify a whole set of diagnostics. However, it is important to note that the groupings are client specific, so this should be implemented in the driver, not in the libraries.

The original topic of discussion was what "extensions" should be warned about even when -pedantic is not specified. In contrast to my original proposal, I now don't think this should be another class of warning, I think this should be controlled with the (currently missing) diagnostic grouping mechanism in the driver.

Is this design reasonable?

-Chris

Chris Lattner wrote:-

clang currently has two distinct notions: we distinguish between what
class a particular diagnostic falls into from how it is reported.
These two categorizations are used by different pieces of the
diagnostic handling machinery.

The first part of the diagnostics machinery is the piece that is
generating the diagnostics: the parser, lexer, sema, checking system,
etc. Each diagnostic itself is tagged (in DiagnosticKinds.def) with
a class, which is a member of the set
{NOTE,WARNING,EXTENSION,ERROR}. The important part of the
classification at this level is that it is completely independent of
how the diagnostic is reported and they have obvious meanings:

1. NOTEs (which aren't used and may be removed in the future) are
informational messages. For example, perhaps the optimizer could
emit information about which loops are vectorized etc.

2. WARNINGs are emitted for legal code that is questionable somehow.

3. EXTENSIONs are emitted for illegal code that is accepted anyway as
an extension to the language.

4. ERRORs flag a malformed program where error recovery techniques
must be used to continue parsing.

I believe that each of these categories is well described and
simple. It is easy to determine which class any particular
diagnostic falls into.

So client is just gets one of these severities and it's up to
them how they handle it?

>a) Each diagnostic tag has a default severity suitable for normal
> compiles. Severities are, in order, suppress, remark, warning,
> soft error, hard error, fatal and maybe ice.

In contrast to your system, we don't specify a "default mapping":
that is completely client-specific. Because the library generating
the diagnostic doesn't know/care about how these diagnostics are
emitted, we don't need to talk about suppressed diagnostics. I'm not
sure the distinction between soft/hard error in your scheme.

To the client they're both errors. Only soft errors can have their
severities changed by the user; hard errors are always hard errors.
This was buried in my mail and probably not clear.

Fatal/ ICE are also interesting: we don't emit a diagnostic for these cases:
these are presumably internal consistency failures, where we prefer
to have the library assert/abort and die.

Yeah, ICE should probably become fatal. #error is a good example
of a fatal error - the std specifies translation should halt (something
GCC gets wrong but it's hard to fix in GCC because of its architecture).

When diagnostics are reported by the parser (etc), they get routed
through the Diagnostic class. This class is designed to do various
types of mapping, which turns the class above into a concrete
diagnostic level. The level is completely different from the class,
but they have similar names (and are thus somewhat confusing). The
Diagnostic class maps each diagnostic onto the level set, which is
{ Ignored, Note, Warning, Error, Fatal }.

These levels are what we expect the tool to report to the user. As
you might expect, the {note,warning,extension} classes can be mapped
onto any of these levels, and the {error} class can only be mapped
onto {error,fatal}.

OK. But is this mapping per-tag, or blanket for all tags? If the
former, it seems unduly inflexible, and if the latter then why
bother having a mapping (i.e. a severity at the raising side) at
all? Why not just provide a diagnostic tag and leave it to the
client what they want to do with it? I'm struggling to understand
the rationale for the two-level nature of it.

I believe that this system is fully general: any interesting

policy

the client wants can be implemented with this scheme
(Diagnostic::setDiagnosticMapping can be used to specify a level on a
per-diagnostic basis), and higher level policies (e.g. -Werror, -
pedantic, -pedantic-errors etc) can be easily handled by mapping
entire classes to levels.

This kind of answers my question above I think, but my confusion
remains.

What about diagnostics, such as for line comments, that is an
extension in one dialect but not another? Either you're duplicating
the diagnostic tag, which means presenting the user with two tags
for the same diagnostic, or you're requiring clients to know what
is and isn't an extension in various dialects.

What about integer division by zero? It should be a hard error in
some contexts and not others, even for a single dialect. How does
the client deal with that? It seems the only solution is two tags.
I can imagine situations where essentially the same diagnostic
needs to become three or four tags with multiple dialect support.

>Something I've not implemented yet is diagnostic groups, so
>you could give groups of tags a name and manipulate them
>en-masse, e.g. portability warnings, unused warnings, etc.

This is also something that clang is missing. In particular, the
client would like to be able to manipulate groups at a coarse grain
to implement controls such as the GCC options -Wextra, -Wall, -W,
etc. These are basically controls that modify a whole set of
diagnostics. However, it is important to note that the groupings are
client specific, so this should be implemented in the driver, not in
the libraries.

The original topic of discussion was what "extensions" should be
warned about even when -pedantic is not specified. In contrast to my
original proposal, I now don't think this should be another class of
warning, I think this should be controlled with the (currently
missing) diagnostic grouping mechanism in the driver.

Is this design reasonable?

I can see the reasoning behind it, but I'm not convinced it's the
best approach for the reasons above. I may be misunderstanding
how it works.

One other thing - raising a diagnostic only for it to be suppressed
at the client side can be expensive when various other information
has to be calculated just to raise it. For example, a diagnostic
indicating that one decl hides another. Depending on your symbol
table, even determining that something hides something else, or
what kind of entity it's hiding (nice to have that in the diagnostic)
can take time. Another example is coming up with the English
reprentation of types for diagnostics. You don't want to do all
that work only to find the diagnostic is suppressed anyway. How
do you intend to handle such things?

Neil.

Each diagnostic itself is tagged (in DiagnosticKinds.def) with
a class, which is a member of the set
{NOTE,WARNING,EXTENSION,ERROR}. The important part of the
classification at this level is that it is completely independent of
how the diagnostic is reported and they have obvious meanings:

So client is just gets one of these severities and it's up to
them how they handle it?

These are classifications, not severities. The Diagnostic class maps this onto a "level" which is basically a severity. The client implements the DiagnosticClient interface which is passed a level.

a) Each diagnostic tag has a default severity suitable for normal
  compiles. Severities are, in order, suppress, remark, warning,
  soft error, hard error, fatal and maybe ice.

In contrast to your system, we don't specify a "default mapping":
that is completely client-specific. Because the library generating
the diagnostic doesn't know/care about how these diagnostics are
emitted, we don't need to talk about suppressed diagnostics. I'm not
sure the distinction between soft/hard error in your scheme.

To the client they're both errors. Only soft errors can have their
severities changed by the user; hard errors are always hard errors.
This was buried in my mail and probably not clear.

Ok, so soft errors don't require recovery I assume.

Fatal/ ICE are also interesting: we don't emit a diagnostic for these cases:
these are presumably internal consistency failures, where we prefer
to have the library assert/abort and die.

Yeah, ICE should probably become fatal. #error is a good example
of a fatal error - the std specifies translation should halt (something
GCC gets wrong but it's hard to fix in GCC because of its architecture).

Interesting, good point. In clang, the client can choose to map it either to error or fatal.

When diagnostics are reported by the parser (etc), they get routed
through the Diagnostic class. This class is designed to do various
types of mapping, which turns the class above into a concrete
diagnostic level. The level is completely different from the class,
but they have similar names (and are thus somewhat confusing). The
Diagnostic class maps each diagnostic onto the level set, which is
{ Ignored, Note, Warning, Error, Fatal }.

These levels are what we expect the tool to report to the user. As
you might expect, the {note,warning,extension} classes can be mapped
onto any of these levels, and the {error} class can only be mapped
onto {error,fatal}.

OK. But is this mapping per-tag, or blanket for all tags?

The mapping is per-diagnostic.

If the
former, it seems unduly inflexible, and if the latter then why
bother having a mapping (i.e. a severity at the raising side) at
all? Why not just provide a diagnostic tag and leave it to the
client what they want to do with it? I'm struggling to understand
the rationale for the two-level nature of it.

There are three separate stages here: producing diagnostics, mapping them, and reporting them somehow. In general, the parser just unconditionally emits diagnostics in some cases without care for how it will be reported.

In other cases, computing whether the diagnostic should be emitted is expensive. For example "diag::pp_macro_not_used" requires looping over all macro definitions to determine whether any of them are unused. This warning is usually not enabled, so doing this computation is wasted work. To handle this, the parser makes a query the mapping code to see what level it will be mapped onto. If it is mapped onto ignore, the macro table isn't walked (the code is at Preprocessor.cpp:1112).

Separating the mapping from reporting thus provides two features: 1) the parser can see how diagnostics will get mapped. 2) the mapping logic is shared among all the clients.

I believe that this system is fully general: any interesting

policy

the client wants can be implemented with this scheme
(Diagnostic::setDiagnosticMapping can be used to specify a level on a
per-diagnostic basis), and higher level policies (e.g. -Werror, -
pedantic, -pedantic-errors etc) can be easily handled by mapping
entire classes to levels.

This kind of answers my question above I think, but my confusion
remains.

What about diagnostics, such as for line comments, that is an
extension in one dialect but not another?

I assume you mean "//" comments? In this case, we have logic like this (Lexer.cpp):

   if (!Features.BCPLComment) {
     Diag(BufferPtr, diag::ext_bcpl_comment);

     // Mark them enabled so we only emit one warning for this translation
     // unit.
     Features.BCPLComment = true;
   }

If // comments aren't an extension in this dialect, then the warning is never produced, so it doesn't matter how it is mapped.

Either you're duplicating
the diagnostic tag, which means presenting the user with two tags
for the same diagnostic, or you're requiring clients to know what
is and isn't an extension in various dialects.

We do sometimes have to duplicate the tag, for example if it's an extension in one dialect but a warning in another. This is actually ok though IMO, because you want the message to be different for the different cases.

What about integer division by zero? It should be a hard error in
some contexts and not others, even for a single dialect. How does
the client deal with that? It seems the only solution is two tags.
I can imagine situations where essentially the same diagnostic
needs to become three or four tags with multiple dialect support.

We'd use two tags: this makes sense because the message should be different. In general, I'd like to have many fine-grained diagnostics with extremely helpful messages than trying to squish down the number we have.

Is this design reasonable?

I can see the reasoning behind it, but I'm not convinced it's the
best approach for the reasons above. I may be misunderstanding
how it works.

Please let me know if the above helps :slight_smile:

One other thing - raising a diagnostic only for it to be suppressed
at the client side can be expensive when various other information
has to be calculated just to raise it. For example, a diagnostic
indicating that one decl hides another. Depending on your symbol
table, even determining that something hides something else, or
what kind of entity it's hiding (nice to have that in the diagnostic)
can take time. Another example is coming up with the English
reprentation of types for diagnostics. You don't want to do all
that work only to find the diagnostic is suppressed anyway. How
do you intend to handle such things?

I covered this above.

-Chris

Chris Lattner wrote:-

Please let me know if the above helps :slight_smile:

I think so. The need for classification seems weakened though,
but lets see how it goes. It'll probably be clear once more stuff
is in-place.

Neil.