[RFC] Call ParseLangArgs for all inputs

Hi,

clang/lib/CodeGen/BackendUtil.cpp contains multiple checks for attributes of LangOpts. The option processing that sets up LangOpts is mostly done by ParseLangArgs, which is skipped for LLVM IR files. Because of this, there are code generation differences when the -save-temps command line option is used: that command line option causes LLVM IR to be emitted to file and a new process to be spawned to process that file, which does not process options the same way.

An example of this is

   typedef float __attribute__((ext_vector_type(2))) float2;
   float2 foo (float2 a, float2 b, float2 c) {
     return a * b + c;
   }

This would generate different code with clang --target=mips -mcpu=mips32r5 -mfp64 -mmsa -O3 -ffp-contract=fast depending on whether -save-temps was also present, because the -ffp-contract=fast option affects instruction selection but was ignored for LLVM IR input files.

While CompilerInvocation::CreateFromArgs contained special exceptions for a few options that were handled by ParseLangArgs that also need to be handled for LLVM IR, there are many more, and just allowing ParseLangArgs to be called seems like the simpler fix.

I have created a patch for this and uploaded it as <https://reviews.llvm.org/D86695> to get feedback. Does this look like a good approach, or should instead the individual options that would have an effect even for LLVM IR inputs continue to be listed in CompilerInvocation::CreateFromArgs?

If the approach I took is a good one, how should -std=* options behave? I decided to continue silently ignoring them for InputKind::Precompiled as I feel it is reasonable to include them, but generate an error for them for LLVM IR inputs as I feel it is unreasonable to include them. Does that make sense too, or should they continue to be silently ignored for LLVM IR inputs as well?

The patch I uploaded does not yet include tests as I am not sure what sort of tests should be included. Would a test that the -ffp-contract=fast option is now honored for LLVM IR input files suffice, or should I find more options that were also previously silently ignored, and figure out how to construct test cases for those too?

Cheers,
Harald van Dijk

ping

ping 2

From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Harald van
Dijk via cfe-dev
Sent: Thursday, September 10, 2020 2:31 PM
To: cfe-dev@lists.llvm.org
Subject: Re: [cfe-dev] [RFC] Call ParseLangArgs for all inputs

ping 2

> ping
>
>> Hi,
>>
>> clang/lib/CodeGen/BackendUtil.cpp contains multiple checks for
>> attributes of LangOpts. The option processing that sets up LangOpts is
>> mostly done by ParseLangArgs, which is skipped for LLVM IR files.
>> Because of this, there are code generation differences when the
>> -save-temps command line option is used: that command line option
>> causes LLVM IR to be emitted to file and a new process to be spawned
>> to process that file, which does not process options the same way.

I have to wonder whether some of those options really belong in the CodeGen
category, rather than the Lang category. FP handling and sanitizers (the
two groups that first caught my eye) really don't seem like LangOpts to me.

Moving them would be more typing, of course, but I suspect the original
reason ParseLangArgs isn't used on that path is that language-relevant
actions should already have been taken by the time we're reading IR back in.

Just a thought. Offhand I don't know who would be super invested in this.
--paulr

From: cfe-dev <cfe-dev-bounces@lists.llvm.org> On Behalf Of Harald van
Dijk via cfe-dev
Sent: Thursday, September 10, 2020 2:31 PM
To: cfe-dev@lists.llvm.org
Subject: Re: [cfe-dev] [RFC] Call ParseLangArgs for all inputs

ping 2

ping

Hi,

clang/lib/CodeGen/BackendUtil.cpp contains multiple checks for
attributes of LangOpts. The option processing that sets up LangOpts is
mostly done by ParseLangArgs, which is skipped for LLVM IR files.
Because of this, there are code generation differences when the
-save-temps command line option is used: that command line option
causes LLVM IR to be emitted to file and a new process to be spawned
to process that file, which does not process options the same way.

I have to wonder whether some of those options really belong in the CodeGen
category, rather than the Lang category. FP handling and sanitizers (the
two groups that first caught my eye) really don't seem like LangOpts to me.

Moving them would be more typing, of course, but I suspect the original
reason ParseLangArgs isn't used on that path is that language-relevant
actions should already have been taken by the time we're reading IR back in.

This is true. That may be a misunderstanding of the intent of the function, as the name can be read two ways: it can be read as processing the options that are relevant for the language handling, and it can be read as processing the options that are *only* relevant for the language handling. The FP options are definitely relevant for the language handling and later codegen both. The sanitizers I'm not sure about, they are used during language handling, but I do not really know how they are used during later codegen. I had thought it would not be worth the effort to move the options, but if that would be the preferred route, I can submit that as a patch instead. It would have to be limited to the options where I can actually verify they affect codegen though, so it may be less complete.

Just a thought. Offhand I don't know who would be super invested in this.

The reason I was looking at this was to fix an issue reported by a customer who was trying to reproduce a bug reported to them. Bug handling by them is often done on LLVM IR files, so having different behaviour between -save-temps and not -save-temps, and the possibility of bugs that only show up when -save-temps is not used, was a problem for them. I do not know how common this is in other organisations using LLVM, I would think they are not the only one doing this, but it may be true that it is only a problem for a minority of users.

Regardless, I'm willing to do the work in getting this fixed, I just need some guidance on the preferred way of fixing it.

Cheers,
Harald van Dijk

I tried to move this into CodeGenOpts as suggested but that does not work: the default FP contract mode is only the default FP contract mode, copied into FPOptions [1]. It can be overridden at the operator level with #pragma float_control(precise, on), which ends up calling setFPPreciseEnabled [2] to disallow FP contractions for individual operations. The codegen options are not available here, nor do I think they should they be.

The FP options information is well gone at instruction selection time. I think this means that the fact that MIPS uses this at instruction selection time is a bug that results in what looks like wrong code: fmadd is generated even when that pragma is specified. I now think it is correct that fp-contract is a language option, and it is a bug for targets to perform FP contraction themselves. Does that sound correct, or am I missing something about how this pragma is supposed to work?

If this is correct, I do not expect that simply removing this from the backends is going to be agreeable, as that is obviously going to hurt performance of code that does not use this pragma.

My primary concern is ensuring that -save-temps does not change the generated code, but it is now looking like there is no simple way to achieve that except with changes that are not good in the long run.

Cheers,
Harald van Dijk

[1]: https://clang.llvm.org/doxygen/LangOptions_8h_source.html#l00400
[2]: https://clang.llvm.org/doxygen/LangOptions_8h_source.html#l00483