How to disable simplifying function parameters in llvm-g++

XiaoLong_Tang · July 26, 2010, 9:48am

Hello everybody,

It seems to me that llvm-g++ performs a default optimization that
simplifies function parameters in certain cases. Consider the
following example:

Given an iterator (in the C++ STL sense), i.e. a class containing
a pointer to another class:

%"struct.std::_List_const_iterator<int>" = type { %"struct.std::_List_node_base"* }
(shown in human-readable LLVM IR)

and a function definition (body elided):

void _M_insert(iterator __position, const value_type& __x) { ... }

By issuing this command:

llvm-g++ -fno-exceptions -fno-inline -emit-llvm -c ...

The compilation substitutes "__position.0" for "__position", as shown below:

define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"struct.std::list<int,std::allocator<int> >"* %this, i64 %__position.0, i32* %__x) nounwind ssp { ... }

My questions are: can this transformation be disabled in llvm-g++,
and if so, how?

P.S. Without any optimizations, g++ does NOT do this.

Best,
Xiaolong

Duncan_Sands · July 26, 2010, 10:35am

Hi Xiaolong,

> The compilation substitutes "__position.0" for "__position", as shown below:
>
> define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"struct.std::list<int,std::allocator<int> >"* %this, i64 %__position.0, i32* %__x) nounwind ssp { ... }

Names like this exist only to make the LLVM IR more readable and have no
effect on the final assembly. If you want to find out the original parameter
names, you need to use debug info.

> P.S. Without any optimizations, g++ does NOT do this.

It's unclear to me what you mean here, since there are no names of this kind in
the assembly files produced by g++. Are you perhaps referring to the name g++
prints in tree dumps?

Ciao,

Duncan.

XiaoLong_Tang · July 26, 2010, 2:12pm

Thanks, Duncan.

> The compilation substitutes "__position.0" for "__position", as shown below:
>
> define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"struct.std::list<int,std::allocator<int> >"* %this, i64 %__position.0, i32* %__x) nounwind ssp { ... }

> Names like this exist only to make the LLVM IR more readable and have no
> effect on the final assembly. If you want to find out the original parameter
> names, you need to use debug info.

Note that the original parameter (of the function in question) is of
type "struct.std::_List_const_iterator<int>", whereas the parameter
(after compilation) is of type
"struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
original parameter with its sole field. This is understandable, and the
LLVM output indicates it by using "__position.0" rather than
"__position". Further, llvm-g++ represents the parameter type
"struct.std::_List_node_base"* as i64 (i.e. bitcasts it into i64).
Though this might be decoded by analyzing the function's metadata, I
believe that llvm-g++ has performed some transformation here. To me,
the transformation looks like scalar replacement.
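
A rough C++ sketch of why the i64 form loses nothing at the machine level, assuming a 64-bit target such as x86-64 (the thread does not name one): a struct whose only member is a pointer occupies exactly the pointer's bits, so its value can be carried as a 64-bit integer and copied back into struct storage on the other side. NodeBase, ListIter, to_bits and from_bits are hypothetical names for illustration, not llvm-g++ internals or the real libstdc++ classes.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for std::_List_node_base and std::_List_iterator<int>.
// The real libstdc++ classes have more members and methods, but the
// iterator's only data member is likewise a single node pointer.
struct NodeBase {
    NodeBase* next;
    NodeBase* prev;
};

struct ListIter {
    NodeBase* node;
};

// One pointer wide and bitwise copyable, so the whole value fits in a
// 64-bit integer (assumes 64-bit pointers).
static_assert(sizeof(ListIter) == sizeof(std::uint64_t), "assumes 64-bit pointers");
static_assert(std::is_trivially_copyable<ListIter>::value, "must be bitwise copyable");

// Caller side: view the struct's bytes as an integer, like an i64 argument.
inline std::uint64_t to_bits(ListIter it) {
    std::uint64_t bits;
    std::memcpy(&bits, &it, sizeof bits);
    return bits;
}

// Callee side: copy the integer back into struct storage before using it.
inline ListIter from_bits(std::uint64_t bits) {
    ListIter it;
    std::memcpy(&it, &bits, sizeof it);
    return it;
}
```

For any iterator `it`, `from_bits(to_bits(it)).node == it.node` holds. The integer itself carries no type information, though; which struct it stands for is known only to the front end that produced it.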

To understand this behavior of llvm-g++ further, imagine that we
add one more dummy field to the above type
"struct.std::_List_const_iterator<int>". llvm-g++ then replaces the
original parameter with two separate parameters, one for each field
of the original parameter, as illustrated below.

(..., i64 %__position.0, i64 %__position.1, i32 %data) nounwind ssp { ... }
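
One way to read the two-parameter result, as a sketch assuming a 64-bit target: adding a second 8-byte field makes the struct two 8-byte chunks wide, and each chunk becomes its own i64 parameter. IterWithDummy and eightbytes are hypothetical names introduced here, and the dummy field's type is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct NodeBase;  // only pointers to it are used, so it can stay incomplete

// Hypothetical: the iterator from the example plus one extra dummy field.
struct IterWithDummy {
    NodeBase* node;      // original field, cf. %__position.0
    std::int64_t dummy;  // added field,    cf. %__position.1
};

// Number of 8-byte chunks ("eightbytes") that an object of type T spans.
template <typename T>
constexpr std::size_t eightbytes() {
    return (sizeof(T) + 7) / 8;
}

// On a 64-bit target: 8-byte pointer + 8-byte integer = 16 bytes = 2 chunks.
static_assert(eightbytes<IterWithDummy>() == 2, "expected two eightbytes");
```

The one-field iterator spans a single chunk, matching the lone `%__position.0`; the two-field variant spans two, matching `%__position.0` and `%__position.1`.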

> P.S. Without any optimizations, g++ does NOT do this.

> It's unclear to me what you mean here, since there are no names of this kind in
> the assembly files produced by g++. Are you perhaps referring to the name g++
> prints in tree dumps?

Yes. My discussion focuses only on the front-end output. In the
example in question, g++ preserves the original function parameter
rather than replacing it with its sole field. Is that clear
enough?

Best,
Xiaolong

Duncan_Sands · July 26, 2010, 2:31pm

Hi Xiaolong,

> Note that the original parameter (of the function in question) is of
> type "struct.std::_List_const_iterator<int>", whereas the parameter
> (after compilation) is of type
> "struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
> original parameter with its sole field. This is understandable, and the
> LLVM output indicates it by using "__position.0" rather than
> "__position". Further, llvm-g++ represents the parameter type
> "struct.std::_List_node_base"* as i64 (i.e. bitcasts it into i64).
> Though this might be decoded by analyzing the function's metadata, I
> believe that llvm-g++ has performed some transformation here. To me,
> the transformation looks like scalar replacement.

I think, on the contrary, this is llvm-g++ trying to achieve ABI conformance.
The rules on how parameters should be passed to functions (in registers, on
the stack, partly in registers, partly on the stack) can be quite complicated.
Rather than pushing this complexity into LLVM, the design makes front-ends
responsible for ensuring ABI conformance when they generate the LLVM IR. The
kind of transform you see looks typical of llvm-g++ handling an ABI rule which
says that the initial fields of a struct should be passed in registers. In
short, I don't think this is LLVM doing an optimization; it is LLVM trying to
produce correct, ABI-conformant code.
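
To make the ABI point concrete, here is a deliberately simplified C++ model of the x86-64 System V rule for passing a by-value aggregate whose fields are all integers or pointers: up to 16 bytes, each 8-byte chunk goes in its own general-purpose register; anything larger goes in memory. The target is an assumption (the thread does not name one), PassKind, ArgPlacement, classify_integer_aggregate and classify are hypothetical names, and real front-ends handle many more cases than this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Simplified model (assumption: x86-64 System V) of how a by-value aggregate
// whose fields are all integers or pointers is passed. Not modeled:
// floating-point fields (SSE classes), packed/unaligned fields, running out
// of the six integer argument registers, and C++ classes with non-trivial
// copy/move constructors or destructors (passed via a hidden reference).
enum class PassKind { InRegisters, InMemory };

struct ArgPlacement {
    PassKind kind;
    std::size_t gprs;  // general-purpose registers used (0 if in memory)
};

constexpr ArgPlacement classify_integer_aggregate(std::size_t size_in_bytes) {
    // Larger than two eightbytes (16 bytes): passed in memory on the stack.
    // Otherwise: one INTEGER-class register per 8-byte chunk.
    return size_in_bytes > 16
               ? ArgPlacement{PassKind::InMemory, 0}
               : ArgPlacement{PassKind::InRegisters, (size_in_bytes + 7) / 8};
}

template <typename T>
constexpr ArgPlacement classify() {
    // Rough stand-in for "trivial for the purposes of calls".
    static_assert(std::is_trivially_copyable<T>::value,
                  "non-trivial classes are passed by hidden reference");
    return classify_integer_aggregate(sizeof(T));
}
```

Under this model a one-pointer iterator needs one register and a two-field variant needs two, which lines up with the single `%__position.0` and the `%__position.0`/`%__position.1` pair seen in the IR earlier in the thread.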

> P.S. Without any optimizations, g++ does NOT do this.
>
> It's unclear to me what you mean here, since there are no names of this kind in
> the assembly files produced by g++. Are you perhaps referring to the name g++
> prints in tree dumps?
>
> Yes. My discussion focuses only on the front-end output. In the
> example in question, g++ preserves the original function parameter
> rather than replacing it with its sole field. Is that clear
> enough?

I think you will find that this is not really the case: g++ simply does
this when transforming GIMPLE to RTL (llvm-g++ does it when transforming
GIMPLE to LLVM IR, which is in fact quite analogous). If you look at the final
assembly generated by g++, it should be equivalent to what llvm-g++ produces.

Ciao,

Duncan.

XiaoLong_Tang · July 26, 2010, 4:40pm

Hi Duncan,

> Note that the original parameter (of the function in question) is of
> type "struct.std::_List_const_iterator<int>", whereas the parameter
> (after compilation) is of type
> "struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
> original parameter with its sole field. This is understandable, and the
> LLVM output indicates it by using "__position.0" rather than
> "__position". Further, llvm-g++ represents the parameter type
> "struct.std::_List_node_base"* as i64 (i.e. bitcasts it into i64).
> Though this might be decoded by analyzing the function's metadata, I
> believe that llvm-g++ has performed some transformation here. To me,
> the transformation looks like scalar replacement.

> I think, on the contrary, this is llvm-g++ trying to achieve ABI conformance.
> The rules on how parameters should be passed to functions (in registers, on
> the stack, partly in registers, partly on the stack) can be quite complicated.
> Rather than pushing this complexity into LLVM, the design makes front-ends
> responsible for ensuring ABI conformance when they generate the LLVM IR. The
> kind of transform you see looks typical of llvm-g++ handling an ABI rule which
> says that the initial fields of a struct should be passed in registers. In
> short, I don't think this is LLVM doing an optimization; it is LLVM trying to
> produce correct, ABI-conformant code.

As far as you know, is there any way to figure out the original type
of the function parameter? In the example in question, is it possible
to find out, perhaps from the i64, that the function parameter has the
original type "struct.std::_List_node_base"*? The debug information
(generated with -g) seems to contain no such information.

Thanks,
Xiaolong

XiaoLong_Tang · July 26, 2010, 8:35pm

Hi Duncan,

> Note that the original parameter (of the function in question) is of
> type "struct.std::_List_const_iterator<int>", whereas the parameter
> (after compilation) is of type
> "struct.std::_List_node_base"*. Evidently, llvm-g++ replaces the
> original parameter with its sole field. This is understandable, and the
> LLVM output indicates it by using "__position.0" rather than
> "__position". Further, llvm-g++ represents the parameter type
> "struct.std::_List_node_base"* as i64 (i.e. bitcasts it into i64).
> Though this might be decoded by analyzing the function's metadata, I
> believe that llvm-g++ has performed some transformation here. To me,
> the transformation looks like scalar replacement.

> I think, on the contrary, this is llvm-g++ trying to achieve ABI conformance.
> The rules on how parameters should be passed to functions (in registers, on
> the stack, partly in registers, partly on the stack) can be quite complicated.
> Rather than pushing this complexity into LLVM, the design makes front-ends
> responsible for ensuring ABI conformance when they generate the LLVM IR. The
> kind of transform you see looks typical of llvm-g++ handling an ABI rule which
> says that the initial fields of a struct should be passed in registers. In
> short, I don't think this is LLVM doing an optimization; it is LLVM trying to
> produce correct, ABI-conformant code.

I checked the output of Clang (until today I had not found a good
reason to switch to Clang :). It turns out that Clang preserves
more accurate type information than llvm-g++, at least in my test
cases. Returning to our example, the compiled function
prototype is:

define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"class.std::list"* %this, %"struct.std::_List_node_base"* %__position.coerce, i32* %__x) ssp align 2 { ... }

Compared with llvm-g++, the second parameter is of type
%"struct.std::_List_node_base"* rather than a plain i64. Note that
this does not weaken your argument about ABI conformance.
Still, llvm-g++ could do better.

Best,
Xiaolong

Duncan_Sands · July 27, 2010, 6:55am

Hi Xiaolong,

> I checked the output of Clang (until today I had not found a good
> reason to switch to Clang :). It turns out that Clang preserves
> more accurate type information than llvm-g++, at least in my test
> cases. Returning to our example, the compiled function
> prototype is:
>
> define linkonce_odr void @_ZNSt4listIiSaIiEE9_M_insertESt14_List_iteratorIiERKi(%"class.std::list"* %this, %"struct.std::_List_node_base"* %__position.coerce, i32* %__x) ssp align 2 { ... }
>
> Compared with llvm-g++, the second parameter is of type
> %"struct.std::_List_node_base"* rather than a plain i64. Note that
> this does not weaken your argument about ABI conformance.
> Still, llvm-g++ could do better.

Since the final assembly is the same, both of these approaches are
correct. However, the code produced by Clang is likely to be easier
to optimize.

Ciao,

Duncan.
