[RFC] New diagnostic handler for llc

Hi all,

I'd like to add a new diagnostic handler to llc. Right now, llc
doesn't have one at all, and instead just exits after the first error
that it encounters. This is very different from the behaviour of clang
and other front ends, who try to report as many errors as possible
before exiting.

I think this is very important for testing LLVM's CodeGen in a more
robust fashion. For instance, PR24071, for which I have submitted a
patch recently [1], would've been uncovered by our current tests,
instead of being reported by a clang user through a .c file.

I've uploaded a sample patch for the new diagnostic handler on
Phabricator [2]. With this patch, we currently get 27 failures in
check-all. 9 of these are PR24071 and would be fixed by patch [1]. The
other 18 [3] seem to have pretty diverse causes and I don't know if I
could propose patches for all of them.

What would be a good way forward for this?

I was thinking about leaving the old behaviour, i.e. no diagnostic
handler, under a flag and running the remaining 18 failures with that
flag enabled. This would leave all bots green and help us write better
tests by default. Then we could progressively fix these bugs and get
rid of the flag. I know people here are a bit wary of such transitions
that could leave things in between. Is there a better option?

Thanks,
Diana

[1] http://reviews.llvm.org/D20156
[2] http://reviews.llvm.org/D20202 - Note that this isn't really a
review yet, I just thought it looks better there than as a plain
attachment.
[3] LLVM :: CodeGen/AMDGPU/call.ll
    LLVM :: CodeGen/AMDGPU/dynamic_stackalloc.ll
    LLVM :: CodeGen/AMDGPU/no-hsa-graphics-shaders.ll
    LLVM :: CodeGen/AMDGPU/private-memory-broken.ll
    LLVM :: CodeGen/AMDGPU/promote-alloca-bitcast-function.ll
    LLVM :: CodeGen/ARM/2012-09-25-InlineAsmScalarToVectorConv2.ll
    LLVM :: CodeGen/BPF/many_args1.ll
    LLVM :: CodeGen/BPF/many_args2.ll
    LLVM :: CodeGen/BPF/struct_ret1.ll
    LLVM :: CodeGen/BPF/struct_ret2.ll
    LLVM :: CodeGen/MIR/Generic/invalid-jump-table-kind.mir
    LLVM :: CodeGen/MIR/Generic/llvm-ir-error-reported.mir
    LLVM :: CodeGen/MIR/Generic/machine-function-missing-function.mir
    LLVM :: CodeGen/MIR/Generic/machine-function-missing-name.mir
    LLVM :: CodeGen/MIR/Generic/machine-function-redefinition-error.mir
    LLVM :: CodeGen/MIR/X86/spill-slot-fixed-stack-object-aliased.mir
    LLVM :: CodeGen/MIR/X86/spill-slot-fixed-stack-object-immutable.mir
    LLVM :: CodeGen/MIR/X86/variable-sized-stack-object-size-error.mir

The errors that clang reports are mostly related to invalid inputs, so it helps the user to report as many of them to reduce the number of recompilations needed to find all problems.
In case of llc, on the other hand, errors usually indicate some sort of a problem with the compiler itself.

The problem is that when we try to recover from an error in the backend, the recovery itself could potentially cause further erroneous situations. For example, if you generate an "undef" value for something you cannot meaningfully create otherwise, some code down the road may assert claiming that it did not expect an "undef" at this point. That may be a completely reasonable assertion, given an assumption that invalid input should have been rejected earlier on.

The problem in PR24071 seemed to be that clang proceeded with compilation even though the inline asm was not valid. I'm not sure that there is value in trying to make the backend continue compiling code that most likely has no meaning.

-Krzysztof

The problem is that when we try to recover from an error in the backend, the
recovery itself could potentially cause further erroneous situations. For
example, if you generate an "undef" value for something you cannot
meaningfully create otherwise, some code down the road may assert claiming
that it did not expect an "undef" at this point. That may be a completely
reasonable assertion, given an assumption that invalid input should have
been rejected earlier on.

I don't think the proposal is about transforming every llc error
handling into using the Diagnostics, only those that the back-ends
already use, but llc has no way to capture today.

The fact that diagnostics don't fail immediately is a consequence of
the type of the error, not the error handling.

The problem in PR24071 seemed to be that clang proceeded with compilation
even though the inline asm was not valid. I'm not sure that there is value
in trying to make the backend continue compiling code that most likely has
no meaning.

For the particular cases we're talking about (inline asm), it's not
about continuing or not, but about getting the right context.

Errors in the inline asm can be exposed by errors in the program
itself, and having all errors printed at once, even in llc, can be
very helpful in identifying the underlying problem.

This is also very helpful when creating tests (which is the primary
case for llc), to make sure the inline asm is doing what you think it
is.

cheers,
--renato

I'm not 100% convinced that is the case here. The inline asm errors
are reported via LLVMContext::emitError, which has this documentation:

  /// emitError - Emit an error message to the currently installed error handler
  /// with optional location information. This function returns, so code should
  /// be prepared to drop the erroneous construct on the floor and "not crash".
  /// The generated code need not be correct.

Since the backend is the one calling emitError, the backend is the one
that shouldn't crash, regardless of what diagnostic handler is
installed. I think llc should help us test that this is the case. At
least that's my interpretation of things.

I agree with your interpretation. We should give llc a diagnostic
handler so that 'emitError' doesn't exit immediately when we're
testing with llc.

As is probably obvious, I think this is a good idea. llc should be
able to test the full range of backend behaviour seen by other
front-ends, and part of that is the decision to not exit immediately
on an error.

As to how to do it, I'd be OK with a temporary option personally. I'll
see if I can do anything about some of the other problems once it's
in.

Cheers.

Tim.

I don't think we should be calling emitError from the backend, except for the MachineVerifier. The verifier could ascertain the validity of the input IR, and could report as many problems as it is able to find.

After that point I'd rather have an ICE when a problem is encountered.

-Krzysztof

Ok, I agree that emitError should not cause llc to crash.

-Krzysztof