[RFC] Add warning capabilities in LLVM.

Hi,

I would like to start a discussion about error/warning reporting in LLVM and how we can extend the current mechanism to take advantage of clang capabilities.

** Motivation **

Currently LLVM provides a way to report error either directly (print to stderr) or by using a user defined error handler. For instance, in inline asm parsing, we can specify the diagnostic handler to report the errors in clang.

The basic idea would be to be able to do that for warnings too (and for other kind of errors?).
A motivating example can be found with the following link where we want LLVM to be able to warn on the stack size to help developing kernels:
http://llvm.org/bugs/show_bug.cgi?id=4072

By adding this capability, we would be able to have access to all the nice features clang provides with warnings:

  • Promote it to an error.
  • Ignore it.

** Challenge **

To be able to take advantage of clang framework for warning/error reporting, warnings have to be associated with warning groups.
Thus, we need a way for the backend to specify a front-end warning type.

The challenge is, AFAICT (which is not much, I admit), that front-end warning types are statically handled using tablegen representation.

** Advices Needed **

  1. Decide whether or not we want such capabilities (if we do not we may just add sporadically the support for a new warning/group of warning/error).
  2. Come up with a plan to implement that (assuming we want it).

Thanks for the feedbacks.

Cheers,

-Quentin

Hi,

I would like to start a discussion about error/warning reporting in LLVM and
how we can extend the current mechanism to take advantage of clang
capabilities.

** Motivation **

Currently LLVM provides a way to report error either directly (print to
stderr) or by using a user defined error handler. For instance, in inline
asm parsing, we can specify the diagnostic handler to report the errors in
clang.

The basic idea would be to be able to do that for warnings too (and for
other kind of errors?).
A motivating example can be found with the following link where we want LLVM
to be able to warn on the stack size to help developing kernels:
http://llvm.org/bugs/show_bug.cgi?id=4072

By adding this capability, we would be able to have access to all the nice
features clang provides with warnings:
- Promote it to an error.
- Ignore it.

** Challenge **

To be able to take advantage of clang framework for warning/error reporting,
warnings have to be associated with warning groups.
Thus, we need a way for the backend to specify a front-end warning type.

The challenge is, AFAICT (which is not much, I admit), that front-end
warning types are statically handled using tablegen representation.

** Advices Needed **

1. Decide whether or not we want such capabilities (if we do not we may just
add sporadically the support for a new warning/group of warning/error).
2. Come up with a plan to implement that (assuming we want it).

The frontend should be presenting warnings, not the backend; adding a
hook which provides the appropriate information shouldn't be too hard.
Warnings coming out of the backend are very difficult to design well,
so I don't expect we will add many. Also, keep in mind that the
information coming out of the backend could be used in other ways; it
might not make sense for the backend to decide that some piece of
information should be presented as a warning. (Consider, for example,
IDE integration to provide additional information about functions and
loops on demand.)

-Eli

I really like this design, where essentially the frontend can query the
backend for very simple information using a nice API, and then emit the
warning itself. I'm happy for the warning to be in the CodeGen layer of the
frontend, and only be reachable when generating code for that function body.

They can also be added dynamically if necessary. See
DiagnosticsEngine::getCustomDiagID

-- Sean Silva

I think we definitely need this. In fact, I tried adding something simple earlier this year but gave up when I realized that the task was bigger than I expected. We already have a hook for diagnostics that can be easily extended to handle warnings as well as errors (which is what I tried earlier), but the problem is that it is hardwired for inline assembly errors. To do this right, new warnings really need to be associated with warning groups so that can be controlled from the front-end.

I agree with Eli that there probably won’t be too many of these. Adding a few new entries to clang’s diagnostic .td files would be fine, except that the backend doesn’t see those. It seems like we will need something in llvm that defines a set of “backend diagnostics”, along with a table in the frontend to correlate those with the corresponding clang diagnostics. That seems awkward at best but maybe it’s tolerable as long as there aren’t many of them.

—Bob

As I recall, after using these from experience, this approach doesn’t actually properly handle things like -Werror “automatically.”

I actually think this is the wrong approach, and I don't think it's quite
what Eli or I am suggestion (of course, Eli may want to clarify, I'm only
really clarifying what *I'm* suggesting.

I think all of the warnings should be in the frontend, using the standard
and existing machinery for generating, controlling, and displaying a
warning. We already know how to do that well. The difference is that these
warnings will need to query the LLVM layer for detailed information through
some defined API, and base the warning on this information. This
accomplishes two things:

1) It ensures the warning machinery is simple, predictable, and integrates
cleanly with everything else in Clang. It does so in the best way by simply
being the existing machinery.

2) It forces us to design reasonable APIs in LLVM to expose to a FE for
this information. A consequence of this will be to sort out the layering
issues, etc. Another consequence will be a strong chance of finding general
purpose APIs in LLVM that can serve many purposes, not just a warning.
Consider JITs and other systems that might benefit from having good APIs
for querying the size and makeup (at a high level) of a generated function.

A nice side-effect is that it simplifies the complexity involved for simple
warnings -- now it merely is the complexity of exposing the commensurately
simple API in LLVM. If instead we go the route of threading a FE interface
for *reporting* warnings into LLVM, we have to thread an interface with
sufficient power to express many different concepts.

-Chandler

I don’t understand what you are proposing.

First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. For the record, I had started down that path in svn commits 171041 and 171047, but I reverted those changes in 174748 and 174860, since they didn’t go far enough to make it work properly and it wasn’t clear at the time whether we really needed it.

Now let me try to understand what you’re suggesting…. You refer several times to having clang query the LLVM layer. Is this to determine whether to emit a diagnostic for some condition? How would this work? Would you have clang insert extra passes to check for various conditions that might require diagnostics? I don’t see how else you would do it, since clang’s interface to the backend just sets up the PerFunctionPasses, PerModulePasses and CodeGenPasses pass managers and then runs them. Assuming you did add some special passes to check for problems, wouldn’t those passes have to duplicate a lot of effort in some cases to find the answers? Take for example the existing warnings in IntrinsicLowering::LowerIntrinsicCall. Those badly need to be cleaned up. Would clang run a special pass to check for intrinsics that are not supported by the target? That pass would need to be implemented as part of clang so that it would have access to clang’s diagnostic machinery, but it would also need to know details about what intrinsics are supported by the target. Interesting layering problems there…. Apologies if I’m misinterpreting your proposal.

We had a similar problem with using LLVM on the GPU @ AMD as many times errors were not known until post-ISel/Resource Allocation. Our solution was to embed the errors in the resulting ISA and have the assembler/loader emit/error at that time.

So this kind of API would be useful for more than just kernel development.

Micah

I don’t understand what you are proposing.

First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang. For the record, I had started down that path in svn commits 171041 and 171047, but I reverted those changes in 174748 and 174860, since they didn’t go far enough to make it work properly and it wasn’t clear at the time whether we really needed it.

Now let me try to understand what you’re suggesting…. You refer several times to having clang query the LLVM layer. Is this to determine whether to emit a diagnostic for some condition? How would this work? Would you have clang insert extra passes to check for various conditions that might require diagnostics? I don’t see how else you would do it, since clang’s interface to the backend just sets up the PerFunctionPasses, PerModulePasses and CodeGenPasses pass managers and then runs them. Assuming you did add some special passes to check for problems, wouldn’t those passes have to duplicate a lot of effort in some cases to find the answers? Take for example the existing warnings in IntrinsicLowering::LowerIntrinsicCall. Those badly need to be cleaned up. Would clang run a special pass to check for intrinsics that are not supported by the target? That pass would need to be implemented as part of clang so that it would have access to clang’s diagnostic machinery, but it would also need to know details about what intrinsics are supported by the target. Interesting layering problems there…. Apologies if I’m misinterpreting your proposal.

We can’t assume clang is the frontend or design a system that only works with clang. There are many systems that use llvm which are not even c compilers.

Evan

Hi,

Thanks all for the insight comments.

Let me sum up at a high level what proposals we actually have (sorry if I misinterpreted or missed something, do not hesitate to correct me):

  1. Make LLVM defines some APIs that exposes some internal information so that a front-end can use them to build-up some diagnostics.

  2. Make LLVM builds up a diagnostic and let a front-end maps this diagnostic to the right warning/error group.

In my opinion, both approaches are orthogonal and have different advantages based on what goal we are pursuing.

To be more specific, with the first approach, front-end people can come up with new warnings/errors and diagnostics without having to modify the back-end. On the other hand, with the second approach, back-end people can emit new warnings/errors without having to modify the front-end (BTW, it would be nice if we come up at least in clang with a consistent way to pass down options for emitting back-end warning/error when applicable, i.e., without -mllvm but with -W).

What people are thinking?

Thanks again for sharing your thoughts, I appreciate.

Cheers,

-Quentin

Hi,

As no new suggestions came in, I would like to share my plans to move forward.

— Feedbacks are very welcome! –

Like I summed up in my previous email (see below), there are two orthogonal approaches to this problem.
I have chosen the second one (extend the reporting already used for inline asm).

Regarding the first approach (front-end querying the back-end to build up diagnostic), it needs further discussion if anyone wants to go in that direction. Although I think this is interesting, I leave that for someone else.

** Proposed Plan **

Extend what already exists: back-end reports to the front-end through hooks.

The detail plan is the following (still hand wavy, I did not look into the actual code base, just had some high level discussions):

  1. LLVM: Extend the current diagnostic hook to pass:
    1.1. A kind of diagnostic (InlineAsm, StackSize, Other ).
    1.2. A boolean hinting whether the diagnostic is an error or a warning.
  2. Clang: Map the diagnostic to the proper (pre-defined) group based on the passed kind.

** Remarks on **

1.1. This is supposed to change rarely, that is why a static approach should do the trick. The ‘Other’ kind represents a fall-back cases for thing we did not anticipate (do not get too hung up on the names, they have to be defined (name, which kind of back-end diagnostic we want to expose, etc.)).
1.2. Each front-end does not have clang capabilities, thus this boolean will help the “front-end” to figure out what it should do with that diagnostic (e.g., keep going or abort). Clang may ignore them.

  1. Obviously, the kind of diagnostic has to be shared between Clang and LLVM, this is the awkward part, but again, this should rarely change.

Thanks again for all the contributions made so far and for the future ones as well!

Cheers,

-Quentin

Sorry for the delays, just haven't been able to get back around to this. My
silence isn't about not caring.

Like I summed up in my previous email (see below), there are two
orthogonal approaches to this problem.
I have chosen the second one (extend the reporting already used for inline
asm).

Regarding the first approach (front-end querying the back-end to build up
diagnostic), it needs further discussion if anyone wants to go in that
direction. Although I think this is interesting, I leave that for someone
else.

I really don't like the second approach.

Fundamentally, it increasingly couples LLVM to Clang's warning
infrastructure and I think that's a bad thing. I think we are better served
by designing good interfaces for the frontend and the backend to
communicate in order for different frontends to provide different types and
kinds of information to users based on their needs.

Bob described one potential API for this, but I think there are much better
ones. I don't think we have to be constrained by the idea of using a pass
to compute and communicate this information -- there are other approaches.
I haven't thought about them because I haven't designed such an interface.
If I had, I would just contribute it. I do think that this is the correct
design direction though, as I don't think that there is one unified way to
represent diagnostics in the backend, or if there is I don't tihnk we have
any idea what it looks like.

Inline assembly is more of a special case than anything else. I don't
really see why the right path forward is to design a generic framework
around what was originally built for its specific purpose.

If anything, I think the design you are pursuing is strictly more complex
and invasive, and without any tangible benefits as a consequence.
-Chandler

Hi Chandler,

We can write a default diagnostic handler suitable for something like
llc, where there isn't really any source code to point at.

-Eli

I disagree. The interface that Quentin sketched out isn’t coupling LLVM’s warnings to Clang at all. When used with Clang, it allows all of Clang’s nice diagnostic machinery to work well, but it isn’t tied to that.

The proposal has a generic hook that any frontend or driver can use to be notified of warnings from LLVM. That hook would provide a simple and clean interface with just the basic information:

  • severity level (warning or error)
  • an enum to indicate the kind of diagnostic (InlineAsm, StackSize, etc.)

How is that coupling us to Clang’s warning infrastructure? It’s just a hook, and it’s very simple.

I wasn’t trying to describe a potential API. I was trying to figure out what you were proposing, and I don’t think adding new passes would work well at all. I can’t think of any reasonable way to design such an interface. Can you at least sketch out an example of what you are proposing? Without that, I think there’s really only one proposal on the table.

If we’re going to add a generic framework, shouldn’t the minimum requirement be that it support the current functionality?

What complexity do you see? It seems very, very simple to me.

Hi Quentin,

I have a particular use case that I'd like this infrastructure to support, and I'm not sure if your proposal captures this or not: hints from the autovectorization passes (and other loop optimizations). Specifically, I'd like to be able to relay messages like this:

- "*This loop* cannot be optimized because the induction variable *i* is unsigned, and cannot be proved not to wrap"

- "*This loop* cannot be vectorized because the compiler cannot prove that memory read from *a* does not overlap with memory written to through *b*"

- "*This loop* cannot be vectorized because the compiler cannot prove that it is unconditionally safe to dereference the pointer *a*.

and I'd like the frontend to get enough information through the interface to turn the starred items back into higher-level-language constructs (that way they can be turned into hyperlinks, underlined, or whatever). I think this might be possible by providing references to any associated debug metadata. Maybe we can define some simple markup, and have each marked item correspond to an entry in an array of metadata poitners? What do you think?

Thanks,
Hal

Sorry, just getting caught up on an old thread. I haven't been involved in discussions of this.

I've not talked with Chandler about this, but to sketch out the way
I'd do it (which is similar):

Have the backend vend diagnostics, this can be done either with a set
of enums and messages like you mentioned, or just have a message and
location struct ala:

struct Msg {
   const char *Message;
   Location Loc;
};

that the consumer of the message can use via a handler.

Alternately a handler (and we should have a default handler) can be
passed in from the printer of the message (the frontend in the case
provided) and it can be called on the error message. Absolutely this
should be done via the LLVMContext to deal with the case of parallel
function passes.

class Handler {
   void printDiagnostic(const char *Message, Location Loc);
};

(Note that I didn't say this was a fleshed out design :wink: I think I
prefer the latter to the former and we'd just need an "diagnostic
callback handler" on the context. Though we would need to keep a set
of diagnostics that the backend handles. That said, that it provides
diagnostics based on its input language seems to make the most sense.
It can use the location metadata if it has it to produce a location,
otherwise you get the function, etc. in some sort of nicely degraded
quality.

I think this scheme could also work as a way of dealing with the
"Optimization Diary" sort of use that Hal is envisioning as well.

Keeping the separation of concerns around where the front end handles
diagnostics on what we'll call "source locations" is pretty important,
however, I agree that not every warning can be expressed this way,
i.e. the stack size diagnostic. However, leaving the printing up to
the front end is the best way to deal with this and a generic
diagnostic engine would probably help for things like llc/opt where
the backend just deals with its input language - IR.

The existing inline asm diagnostics are ... problematic and it would
definitely be nice to get a generic interface for them. Though they're
actually separated into two separate cases where, I think, we end up
with our confusion:

a) Front end diagnostics - This is an area that needs some work to be
decent, but it involves the front end querying the back end for things
like register size, valid immediates, etc and should be implemented by
the front end with an expanded set of target queries. We could use
this as a way to solidify the backend api for the MS inline asm
support as well and use some of that when parsing GNU style inline
asm.

b) Back end diagnostics - This is the stuff that the front end has no
hope of diagnosing. i.e. "ran out of registers", or "can't figure out
how to split this up into this kind of vector register". The latter
has always been a bit funny and I'm always unhappy with it, but I
don't have any better ideas. A unified scheme of communicating "help
help I'm being oppressed by the front end" in the backend would be, at
the very least, a step forward.

Thoughts?

-eric

Sorry, just getting caught up on an old thread. I haven't been involved in discussions of this.

First, let me try to clarify my proposal, in case there was any confusion about that. LLVMContext already has a hook for diagnostics, setInlineAsmDiagnosticHandler() et al. I was suggesting that we rename those interfaces to be more generic, add a simple enumeration of whatever diagnostics can be produced from the backend, and add support in clang for mapping those enumeration values to the corresponding clang diagnostics. This would be a small amount of work and would also be consistent with everything you wrote above about reusing the standard and existing machinery for diagnostics in clang.

Of all of the proposals discussed, I like this the best:

1) This is a really simple extension of what we already have.

2) The backend providing a set of enumerations for the classes of diagnostics it produces doesn't tie it to clang, and doesn't make it language specific. Clients should be able to completely ignore the enum if they want the current (unclassified) behavior, and if an unknown enum value comes through, it is easy to handle.

3) I don't see how something like the stack size diagnostic can be implemented by clang calling into the backend. First, the MachineFunction (and thus, MachineFrameInfo) is a transient datastructure used by the backend when a function is compiled. There is nothing persistent for clang to query. Second, clang would have to know about all of the LLVM IR functions generated, which is possible, but impractical to track for things like thunks and other implicitly generated entrypoints.

What is the specific concern with this approach? I don't see how this couples the backend to the frontend or causes layering violation problems.

I've not talked with Chandler about this, but to sketch out the way
I'd do it (which is similar):

Have the backend vend diagnostics, this can be done either with a set
of enums and messages like you mentioned, or just have a message and
location struct ala:

struct Msg {
  const char *Message;
  Location Loc;
};

that the consumer of the message can use via a handler.

When the consumer is clang, it's important that we have diagnostic groups to control the warnings, so the enum is important. We don't want to be comparing the message strings to decide whether a particular warning is an instance of -Wstack-size (for example).

Alternately a handler (and we should have a default handler) can be
passed in from the printer of the message (the frontend in the case
provided) and it can be called on the error message. Absolutely this
should be done via the LLVMContext to deal with the case of parallel
function passes.

class Handler {
  void printDiagnostic(const char *Message, Location Loc);
};

(Note that I didn't say this was a fleshed out design :wink: I think I
prefer the latter to the former and we'd just need an "diagnostic
callback handler" on the context. Though we would need to keep a set
of diagnostics that the backend handles. That said, that it provides
diagnostics based on its input language seems to make the most sense.
It can use the location metadata if it has it to produce a location,
otherwise you get the function, etc. in some sort of nicely degraded
quality.

This is pretty much the same as what Quentin proposed (with the addition of the enum), isn't it?

I think this scheme could also work as a way of dealing with the
"Optimization Diary" sort of use that Hal is envisioning as well.

Yes, I agree.

Keeping the separation of concerns around where the front end handles
diagnostics on what we'll call "source locations" is pretty important,
however, I agree that not every warning can be expressed this way,
i.e. the stack size diagnostic. However, leaving the printing up to
the front end is the best way to deal with this and a generic
diagnostic engine would probably help for things like llc/opt where
the backend just deals with its input language - IR.

The existing inline asm diagnostics are ... problematic and it would
definitely be nice to get a generic interface for them. Though they're
actually separated into two separate cases where, I think, we end up
with our confusion:

a) Front end diagnostics - This is an area that needs some work to be
decent, but it involves the front end querying the back end for things
like register size, valid immediates, etc and should be implemented by
the front end with an expanded set of target queries. We could use
this as a way to solidify the backend api for the MS inline asm
support as well and use some of that when parsing GNU style inline
asm.

Yes, I agree about this, too. As an example, we had a discussion a few months ago about warning about over-aligned stack variables when dynamic stack realignment is disabled, and we agreed to move that warning into the frontend. I keep poking at the frontend team every once in a while to get some help implementing that, but I haven't forgotten it.

b) Back end diagnostics - This is the stuff that the front end has no
hope of diagnosing. i.e. "ran out of registers", or "can't figure out
how to split this up into this kind of vector register". The latter
has always been a bit funny and I'm always unhappy with it, but I
don't have any better ideas. A unified scheme of communicating "help
help I'm being oppressed by the front end" in the backend would be, at
the very least, a step forward.

Thoughts?

I don't have any better ideas for this. At least if we generalize the existing inline asm diagnostic handler, it will make it less of a special case.