How to disable simplifying function parameters in llvm-g++

Hello everybody,

It seems to me that llvm-g++ performs a kind of default optimization
that simplifies function parameters in certain cases. Consider the
following example:

Given an iterator in the sense of the C++ STL (i.e. a class containing
a pointer to another class):

  %"struct.std::_List_const_iterator<int>" = type { %"struct.std::_List_node_base"* }
  (shown here in textual LLVM IR)

and a function definition:

  void _M_insert(iterator __position, const value_type& __x) { ... }

By issuing this command:

  llvm-g++ -fno-exceptions -fno-inline -emit-llvm -c ...

The compilation substitutes "__position.0" for "__position", as shown below:

  define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"struct.std::list<int,std::allocator<int> >"* %this, i64 %__position.0, i32* %__x) nounwind ssp { ... }
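A minimal C++ sketch of the type in question may make the layout concrete. `NodeBase`, `Iter`, and `position_of` are hypothetical stand-ins, not the real libstdc++ names. The static assertions check the two properties that make such a parameter a candidate for being passed as a single pointer-sized value: it is exactly one pointer wide, and it is trivially copyable.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-in for std::_List_node_base (two link pointers).
struct NodeBase {
    NodeBase* next;
    NodeBase* prev;
};

// Hypothetical stand-in for the list iterator: a class whose only data
// member is a pointer to a node.
struct Iter {
    NodeBase* node;
};

// Takes the iterator by value, as _M_insert(iterator __position, ...) does.
NodeBase* position_of(Iter pos) {
    assert(pos.node != nullptr && "caller must pass a valid position");
    return pos.node;
}

// The aggregate is exactly one pointer wide and can be copied bit for bit,
// so nothing is lost if it travels as a single scalar argument.
static_assert(sizeof(Iter) == sizeof(NodeBase*), "Iter is pointer-sized");
static_assert(std::is_trivially_copyable<Iter>::value,
              "Iter can be copied bit for bit");
```

In a call such as `position_of(Iter{&n})`, the only data that must reach the callee is the one pointer, which is consistent with the single `i64 %__position.0` parameter above.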

My question is: can this transformation be disabled in llvm-g++, and if
so, how?

P.S. Without any optimizations, g++ does NOT do this.

Best,
Xiaolong

Hi Xiaolong,

> The compilation substitutes "__position.0" for "__position", as shown below:
>
>   define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"struct.std::list<int,std::allocator<int> >"* %this, i64 %__position.0, i32* %__x) nounwind ssp { ... }

Names like this only exist to make the LLVM IR more readable and have no
effect on the final assembler. If you want to find out the original parameter
names, you need to use debug info.

> P.S. Without any optimizations, g++ does NOT do this.

It's unclear to me what you mean here, since there are no names of this kind in
the assembly files produced by g++. Are you perhaps referring to the name g++
prints in tree dumps?

Ciao,

Duncan.

Thanks, Duncan.

>> The compilation substitutes "__position.0" for "__position", as shown below:
>>
>>   define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"struct.std::list<int,std::allocator<int> >"* %this, i64 %__position.0, i32* %__x) nounwind ssp { ... }
>
> Names like this only exist to make the LLVM IR more readable and have no
> effect on the final assembler. If you want to find out the original parameter
> names, you need to use debug info.

Note that the original parameter (of the function in question) is of
type "struct.std::_List_const_iterator<int>", whereas the parameter
after compilation is of type
"struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
original parameter with its sole field. This is understandable, and the
LLVM output indicates it by using "__position.0" rather than
"__position". Further, llvm-g++ represents the parameter
type "struct.std::_List_node_base"* as an i64. Although this may
be decoded by analyzing the metadata attached to the function, I believe
that llvm-g++ has performed some transformation here. To me, the
transformation looks like scalar replacement.

To understand this behavior of llvm-g++ better, let's imagine that we
augment the above type "struct.std::_List_const_iterator<int>" with
one more dummy field. In that case, llvm-g++ replaces the original
parameter with two individual parameters, one for each of the two
fields of the original parameter, as illustrated below:

  (..., i64 %__position.0, i64 %__position.1, i32 %data) nounwind ssp { ... }
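The two-field case can be modelled in C++ as well. This sketch assumes a 64-bit target and uses the hypothetical names `Iter2`, `Coerced`, `coerce`, and `uncoerce`; the thread does not say what type the dummy field had, so `std::int64_t` is a guess. It shows that splitting the struct into two 64-bit integers and reassembling it loses nothing, which is why passing `%__position.0` and `%__position.1` can stand in for passing the struct itself.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-in for the list node; only pointers to it matter here.
struct NodeBase {
    NodeBase* next;
    NodeBase* prev;
};

// The iterator augmented with one extra field (field type is an assumption).
struct Iter2 {
    NodeBase* node;
    std::int64_t dummy;
};

// Two 64-bit integers, one per argument, analogous to
// %__position.0 and %__position.1.
struct Coerced {
    std::uint64_t part0;
    std::uint64_t part1;
};

static_assert(sizeof(void*) == 8, "sketch assumes 64-bit pointers");
static_assert(sizeof(Iter2) == sizeof(Coerced), "exactly two eightbytes, no padding");
static_assert(std::is_trivially_copyable<Iter2>::value, "bytewise copy is allowed");

// Reinterpret the struct's bytes as two 64-bit integers (the caller's side).
Coerced coerce(const Iter2& it) {
    Coerced c;
    std::memcpy(&c, &it, sizeof c);
    return c;
}

// Rebuild the struct from the two integers (roughly what the callee must do).
Iter2 uncoerce(const Coerced& c) {
    Iter2 it;
    std::memcpy(&it, &c, sizeof it);
    return it;
}
```

With only the `node` field, the same round trip through a single `std::uint64_t` corresponds to the lone `i64 %__position.0` in the original example.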

>> P.S. Without any optimizations, g++ does NOT do this.
>
> It's unclear to me what you mean here, since there are no names of this kind in
> the assembly files produced by g++. Are you perhaps referring to the name g++
> prints in tree dumps?

Yes. My discussion focuses only on the front-end output. In the
example in question, g++ preserves the original function parameter
rather than replacing it with its sole field. Is that clear
enough?

Best,
Xiaolong

Hi Xiaolong,

> Note that the original parameter (of the function in question) is of
> type "struct.std::_List_const_iterator<int>", whereas the parameter
> after compilation is of type
> "struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
> original parameter with its sole field. This is understandable, and the
> LLVM output indicates it by using "__position.0" rather than
> "__position". Further, llvm-g++ represents the parameter
> type "struct.std::_List_node_base"* as an i64. Although this may
> be decoded by analyzing the metadata attached to the function, I believe
> that llvm-g++ has performed some transformation here. To me, the
> transformation looks like scalar replacement.

On the contrary, I think this is llvm-g++ trying to achieve ABI conformance.
The rules on how parameters should be passed to functions (in registers, on
the stack, partly in registers and partly on the stack) can be quite complicated.
Rather than pushing this complexity into LLVM, front-ends are required to
ensure ABI conformance when they generate the LLVM IR. The kind of
transform you see looks typical of llvm-g++ handling an ABI rule which
says that the initial fields of a struct should be passed in registers. In short,
I don't think this is an LLVM optimization; it is llvm-g++ trying to produce
correct, ABI-conformant code.
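To illustrate the kind of rule meant here, the following is a deliberately simplified sketch of one such ABI rule. It assumes an x86-64 System V target (the thread never names the target, although the `i64` parameters are consistent with it), covers only structs made of integer and pointer fields, and uses the hypothetical names `PassKind`, `Lowering`, and `classify_int_aggregate`. It is not how llvm-g++ implements the rule.

```cpp
#include <cassert>
#include <cstddef>

// Where a by-value aggregate ends up under this simplified model.
enum class PassKind { InRegisters, InMemory };

struct Lowering {
    PassKind kind;
    std::size_t int_regs;  // general-purpose registers consumed
};

// Simplified model for a trivially copyable struct whose fields are all
// integers or pointers. Not modelled: float/double fields (class SSE),
// long double, non-trivial copy constructors or destructors (passed by
// invisible reference), unaligned fields, and the fallback to memory when
// the six integer argument registers are already used up.
Lowering classify_int_aggregate(std::size_t size_in_bytes) {
    const std::size_t eightbyte = 8;
    if (size_in_bytes > 2 * eightbyte) {
        // Larger than two eightbytes: class MEMORY, passed on the stack.
        return {PassKind::InMemory, 0};
    }
    // Otherwise each (possibly partial) eightbyte is class INTEGER and
    // travels in its own general-purpose register.
    std::size_t regs = (size_in_bytes + eightbyte - 1) / eightbyte;
    assert(regs <= 2);
    return {PassKind::InRegisters, regs};
}
```

Under this model the one-pointer iterator (8 bytes) needs one register, matching the single `i64 %__position.0`, and the two-field variant (16 bytes) needs two, matching `%__position.0` and `%__position.1`; a third 8-byte field would push the whole struct onto the stack.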

>>> P.S. Without any optimizations, g++ does NOT do this.
>>
>> It's unclear to me what you mean here, since there are no names of this kind in
>> the assembly files produced by g++. Are you perhaps referring to the name g++
>> prints in tree dumps?
>
> Yes. My discussion focuses only on the front-end output. In the
> example in question, g++ preserves the original function parameter
> rather than replacing it with its sole field. Is that clear
> enough?

I think you will find that this is not really the case; g++ simply does
this later, when transforming GIMPLE to RTL (llvm-g++ does it when transforming
GIMPLE to LLVM IR, which is in fact quite analogous). If you look at the final
assembler generated by g++, it should be equivalent to what llvm-g++ produces.

Ciao,

Duncan.

Hi Duncan,

>> Note that the original parameter (of the function in question) is of
>> type "struct.std::_List_const_iterator<int>", whereas the parameter
>> after compilation is of type
>> "struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
>> original parameter with its sole field. This is understandable, and the
>> LLVM output indicates it by using "__position.0" rather than
>> "__position". Further, llvm-g++ represents the parameter
>> type "struct.std::_List_node_base"* as an i64. Although this may
>> be decoded by analyzing the metadata attached to the function, I believe
>> that llvm-g++ has performed some transformation here. To me, the
>> transformation looks like scalar replacement.
>
> On the contrary, I think this is llvm-g++ trying to achieve ABI conformance.
> The rules on how parameters should be passed to functions (in registers, on
> the stack, partly in registers and partly on the stack) can be quite complicated.
> Rather than pushing this complexity into LLVM, front-ends are required to
> ensure ABI conformance when they generate the LLVM IR. The kind of
> transform you see looks typical of llvm-g++ handling an ABI rule which
> says that the initial fields of a struct should be passed in registers. In short,
> I don't think this is an LLVM optimization; it is llvm-g++ trying to produce
> correct, ABI-conformant code.

As far as you know, is there any way to figure out the original type
of the function parameter? In the example in question, is it
possible to find out, perhaps starting from the i64, that the function
parameter originally had type "struct.std::_List_node_base"*? The debug
information (generated by -g) seems to contain no such information.

Thanks,
Xiaolong

Hi Duncan,

>> Note that the original parameter (of the function in question) is of
>> type "struct.std::_List_const_iterator<int>", whereas the parameter
>> after compilation is of type
>> "struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
>> original parameter with its sole field. This is understandable, and the
>> LLVM output indicates it by using "__position.0" rather than
>> "__position". Further, llvm-g++ represents the parameter
>> type "struct.std::_List_node_base"* as an i64. Although this may
>> be decoded by analyzing the metadata attached to the function, I believe
>> that llvm-g++ has performed some transformation here. To me, the
>> transformation looks like scalar replacement.
>
> On the contrary, I think this is llvm-g++ trying to achieve ABI conformance.
> The rules on how parameters should be passed to functions (in registers, on
> the stack, partly in registers and partly on the stack) can be quite complicated.
> Rather than pushing this complexity into LLVM, front-ends are required to
> ensure ABI conformance when they generate the LLVM IR. The kind of
> transform you see looks typical of llvm-g++ handling an ABI rule which
> says that the initial fields of a struct should be passed in registers. In short,
> I don't think this is an LLVM optimization; it is llvm-g++ trying to produce
> correct, ABI-conformant code.

I checked the result spit out by Clang (I had not found a good reason
to switch to Clang until today :). It turns out that Clang preserves
more accurate type information than llvm-g++, at least in my test
cases. Back to the example in our discussion, the compiled function
prototype is:

  define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"class.std::list"* %this, %"struct.std::_List_node_base"* %__position.coerce, i32* %__x) ssp align 2 { ... }

Unlike with llvm-g++, the second parameter has type
%"struct.std::_List_node_base"* rather than a plain i64. This does
not weaken your argument about ABI conformance, but llvm-g++ could
do better. :)

Best,
Xiaolong

Hi Xiaolong,

> I checked the result spit out by Clang (I had not found a good reason
> to switch to Clang until today :). It turns out that Clang preserves
> more accurate type information than llvm-g++, at least in my test
> cases. Back to the example in our discussion, the compiled function
> prototype is:
>
>   define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"class.std::list"* %this, %"struct.std::_List_node_base"* %__position.coerce, i32* %__x) ssp align 2 { ... }
>
> Unlike with llvm-g++, the second parameter has type
> %"struct.std::_List_node_base"* rather than a plain i64. This does
> not weaken your argument about ABI conformance, but llvm-g++ could
> do better. :)

Since the final assembler is the same, both of these approaches are
correct. However, the code produced by Clang is likely to be easier
to optimize.

Ciao,

Duncan.