Structure transformations (splitting/peeling, field reordering) in LLVM

Hello!
I want to implement data layout optimizations, such as structure splitting/peeling, field reordering, etc. I know that such optimizations have not been implemented in LLVM (despite several research works in this field). But maybe somebody knows, whether at least transformations themselves (I mean splitting/peeling/reordering) are implemented in LLVM? I feel that some time ago I saw somewhere these transformations were implemented, maybe in some experimental part of LLVM, but now I can’t find it despite my best efforts.

Hi,

AFAIK, there is SROA pass in LLVM to split structure. But I don’t remember there are pass to reorder fields.

Thanks,
Chuanqi

1 Like

FWIW, SROA can do alloca splitting, and GlobalOpt can to global
constant splitting.
The problem with doing that on a larger scale, i.e. on the whole structure,
is basically identical to the problem of Struct-of-Arrays <-> Array-of-Structs,
i.e. you need to be sure to really get every single instance, i.e. you
basically need LTO.

That being said i'm currently tracking towards allowing SROA in presence of
pointer escapes through calls, and, more interestingly, performing
alloca splitting
in presence of variable GEP's.

Roman

1 Like

@ChuanqiXu @LebedevRI Thanks you a lot for the replies!

However, I looked into SROA and (maybe I’m wrong) seems it is not what we need. So, we are trying now to implement the aforementioned passes from scratch.
From this, I have a question. I’m thinking about whether we really need to implement the pass as an LLVM pass or it’s worth to do it on front-end level, through Clang transformation pass?

LLVM pass would be more universal, of course and would be more logically.
But I faced with a problem with passing arguments into functions.

For example, I want to peel a structure

struct A {
    double x;                                                                                         
    int y;                                                                                            
};

int f(struct A a) {                                                                                   
    return a.x + a.y;                                                                                 
}

into

struct A1 {
    double x;
};

struct A2 {
    int y;
};

int f(struct A1 a1, struct A2 a2) {
    return a1.x + a2.y;
}

To do this, I need among others to process all function and function calls to which we pass the original structure. And now we have a problem described here

1 and also here 2, 3, 4

E.g. on my machine, two snippets above are compiled (with -O0) into

%struct.A = type { double, i32 }

; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @f([2 x i64] %0) #0 {
  %2 = alloca %struct.A, align 8
  %3 = bitcast %struct.A* %2 to [2 x i64]*
  store [2 x i64] %0, [2 x i64]* %3, align 8
  %4 = getelementptr inbounds %struct.A, %struct.A* %2, i32 0, i32 0
  %5 = load double, double* %4, align 8
  %6 = getelementptr inbounds %struct.A, %struct.A* %2, i32 0, i32 1
  %7 = load i32, i32* %6, align 8
  %8 = sitofp i32 %7 to double
  %9 = fadd double %5, %8
  %10 = fptosi double %9 to i32
  ret i32 %10
}

and

%struct.A1 = type { double }
%struct.A2 = type { i32 }

; Function Attrs: noinline nounwind optnone ssp uwtable
define i32 @f([1 x double] %0, i64 %1) #0 {
  %3 = alloca %struct.A1, align 8
  %4 = alloca %struct.A2, align 4
  %5 = getelementptr inbounds %struct.A1, %struct.A1* %3, i32 0, i32 0
  %6 = bitcast double* %5 to [1 x double]*
  store [1 x double] %0, [1 x double]* %6, align 8
  %7 = getelementptr inbounds %struct.A2, %struct.A2* %4, i32 0, i32 0
  %8 = trunc i64 %1 to i32
  store i32 %8, i32* %7, align 4
  %9 = getelementptr inbounds %struct.A1, %struct.A1* %3, i32 0, i32 0
  %10 = load double, double* %9, align 8
  %11 = getelementptr inbounds %struct.A2, %struct.A2* %4, i32 0, i32 0
  %12 = load i32, i32* %11, align 4
  %13 = sitofp i32 %12 to double
  %14 = fadd double %10, %13
  %15 = fptosi double %14 to i32
  ret i32 %15
}

And as for me, it seems extremely time-consuming to implement such transformation in LLVM level. Because we have to do:

  1. Get the original parameter type. That’s possible, e.g. through the metadata, or through the analysis of the sequences of alloca-bitcast-store.
  2. Guarantee each ABI support of LLVM pass. Here we need to implement the full set of schemes of function parameter passing for every architecture’s ABI, for every calling convention. And that’s really bad news for us (due to limited human resources).

And I thought, maybe it’s really not bad idea to implement such transformations on front-end level, using Clang plugin, for example.

If somebody have an opinion about that, I’d be very appreciate.

What is an “Clang transformation pass”? AFAIC that is not a thing.

<…>

Yes, if you want to deal with a more complex case than a local non-escaping alloca,
then it will be complicated. It’s not really a bug, but a feature.

As far as i’m concerned that thing is for writing refactoring utils,
not performing optimizations during compilation.

1 Like

If you’re specifically interested in optimizing argument passing, take a look at ArgumentPromotion, which does similar transformations.

1 Like

I mean just source-to-source or AST-to-AST transformation, implemented as a library (Clang plugin)

Yeah, I see… But I’m looking for the short cut to achieve the goal.

Aha, already got it, Transformer is not a thing needed. However, maybe there are other useful things in Clang’s toolbox…


@efriedma-quic Thanks! Already looking into it.