IR with no optimization

Hi all,

I’m compiling the Linux kernel with clang. I want to generate IR with no optimization. However, the kernel can only be compiled with -O2, not -O0.

Here is the source code snippet:

struct zone *next_zone(struct zone *zone)
{
    pg_data_t *pgdat = zone->zone_pgdat;
}

I want to detect that there is an assignment from “zone” to “pgdat”, so I’m trying to iterate over the “store” instructions in the IR.

When I compile with -O2, I have the following IR:

define %struct.zone* @next_zone(%struct.zone* readonly %zone) #0 !dbg !214 {
  call void @llvm.dbg.value(metadata %struct.zone* %zone, i64 0, metadata !218, metadata !305), !dbg !326
  %1 = getelementptr inbounds %struct.zone, %struct.zone* %zone, i64 0, i32 5, !dbg !327
  %2 = load %struct.pglist_data*, %struct.pglist_data** %1, align 8, !dbg !327
  call void @llvm.dbg.value(metadata %struct.pglist_data* %2, i64 0, metadata !219, metadata !305), !dbg !328
}

The store instruction has been optimized away, and there are no variable names in the IR.

When I compile with -O0, I have the following IR:

define %struct.zone* @next_zone(%struct.zone* %zone) #0 !dbg !211 {
  %1 = alloca %struct.zone*, align 8
  %pgdat = alloca %struct.pglist_data*, align 8
  store %struct.zone* %zone, %struct.zone** %1, align 8
  call void @llvm.dbg.declare(metadata %struct.zone** %1, metadata !297, metadata !265), !dbg !298
  call void @llvm.dbg.declare(metadata %struct.pglist_data** %pgdat, metadata !299, metadata !265), !dbg !302
  %2 = load %struct.zone*, %struct.zone** %1, align 8, !dbg !303
  %3 = getelementptr inbounds %struct.zone, %struct.zone* %2, i32 0, i32 5, !dbg !304
  %4 = load %struct.pglist_data*, %struct.pglist_data** %3, align 8, !dbg !304
  store %struct.pglist_data* %4, %struct.pglist_data** %pgdat, align 8, !dbg !302

Here there is a store instruction, so I know there is an assignment. From this store, I traverse backward until I find a variable.
For example, I go through %4->%3->%2->%1->struct.zone. The variable name pgdat appears in the IR as well.
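
For illustration, the backward walk described above can be sketched directly over the textual -O0 IR. This is only a rough sketch under several assumptions: it works on text rather than through the LLVM API, it treats any `%struct.`/`%union.` token as a type rather than a value, it follows only the first operand of each instruction, and it assumes each stack slot is stored to once. The helper names `parse` and `trace_back` are made up for this example.

```python
import re

# The -O0 IR from the post, with debug intrinsics and !dbg suffixes dropped.
IR_O0 = """\
%1 = alloca %struct.zone*, align 8
%pgdat = alloca %struct.pglist_data*, align 8
store %struct.zone* %zone, %struct.zone** %1, align 8
%2 = load %struct.zone*, %struct.zone** %1, align 8
%3 = getelementptr inbounds %struct.zone, %struct.zone* %2, i32 0, i32 5
%4 = load %struct.pglist_data*, %struct.pglist_data** %3, align 8
store %struct.pglist_data* %4, %struct.pglist_data** %pgdat, align 8
"""

# Assumption: tokens such as %struct.zone are type names, everything else is a value.
VALUE = re.compile(r"%(?!struct\.|union\.)[\w.]+")


def parse(ir):
    """Return (defs, stores): defs maps an SSA name to (opcode, operand values),
    stores lists (stored value, destination pointer) in program order."""
    defs, stores = {}, []
    for line in ir.splitlines():
        line = line.strip()
        m = re.match(r"(%[\w.]+) = (\w+) (.*)", line)
        if m:
            defs[m.group(1)] = (m.group(2), VALUE.findall(m.group(3)))
        elif line.startswith("store "):
            values = VALUE.findall(line)
            if len(values) == 2:  # skip stores of constants
                stores.append((values[0], values[1]))
    return defs, stores


def trace_back(value, defs, stores):
    """Follow definitions backward until a value with no definition (an argument)."""
    chain, seen = [value], {value}
    while value in defs:
        opcode, operands = defs[value]
        if opcode == "alloca":
            # A stack slot: continue from the last value stored into it.
            sources = [v for v, p in stores if p == value]
            if not sources:
                break
            value = sources[-1]
        elif operands:
            value = operands[0]
        else:
            break
        if value in seen:  # guard against cycles
            break
        seen.add(value)
        chain.append(value)
    return chain


defs, stores = parse(IR_O0)
value, pointer = stores[-1]  # the store into %pgdat
chain = trace_back(value, defs, stores)
print(pointer, "<-", " -> ".join(chain))  # %pgdat <- %4 -> %3 -> %2 -> %1 -> %zone
```

The walk ends at `%zone` because it has no defining instruction in the function body, which is how a function argument shows up in this simplified model.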

Since the kernel can only be compiled with -O2, the IR has been optimized a lot.
Is there any way I can recover the variable names and find out that there is an assignment from “zone” to “pgdat”?

Thank you!

If you’re trying to do source-level analysis (questions like “is there an assignment to a variable of this name”), it may be better to work up in Clang than down in LLVM. LLVM makes no guarantees about names (indeed, names on instructions are a compiler debugging feature, not something any optimization or analysis should rely on) or about the preservation of things like loads and stores.

You could dump the unoptimized IR if you’re just trying to do some static analysis rather than an optimization. It probably doesn’t matter to you if it doesn’t produce a valid kernel in the end.

Since the kernel can only be compiled with -O2, the IR has been optimized a lot.

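The former is mostly irrelevant unless the sources depend on
optimizer-specific preprocessor macros (e.g. __OPTIMIZE__). The initial IR
optimisations can be disabled with -disable-llvm-optzns (a cc1 option, so
pass it to the driver as -Xclang -disable-llvm-optzns).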

Is there any way I can recover the variable names and find out that there
is an assignment from "zone" to "pgdat"?

Variable names in IR are syntactic sugar. They serve no real purpose
except making the IR more human-readable. What you want should be derived
from the debug metadata. Along the same lines, assignment is a C language
concept that doesn't really map well to IR, where subexpressions can
be indistinguishable from assignments.

Joerg
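
As a rough illustration of deriving names from debug metadata, the sketch below pairs each `llvm.dbg.value`/`llvm.dbg.declare` call in the textual IR with its `!DILocalVariable` node. Note the assumptions: the two `!DILocalVariable` lines are not shown anywhere in this thread and are written here only as plausible stand-ins for !218 and !219; the function name `names_for_values` is made up; and a real tool would use the LLVM API rather than regular expressions.

```python
import re

# -O2 IR from the original post, plus two ASSUMED metadata nodes (!218, !219)
# that the post does not show.
IR_O2 = '''\
call void @llvm.dbg.value(metadata %struct.zone* %zone, i64 0, metadata !218, metadata !305), !dbg !326
%1 = getelementptr inbounds %struct.zone, %struct.zone* %zone, i64 0, i32 5, !dbg !327
%2 = load %struct.pglist_data*, %struct.pglist_data** %1, align 8, !dbg !327
call void @llvm.dbg.value(metadata %struct.pglist_data* %2, i64 0, metadata !219, metadata !305), !dbg !328
!218 = !DILocalVariable(name: "zone", arg: 1, scope: !214)
!219 = !DILocalVariable(name: "pgdat", scope: !214)
'''

# First metadata operand: the described value. Next metadata operand: the variable node.
DBG_CALL = re.compile(
    r"@llvm\.dbg\.(?:value|declare)\(metadata \S+ (%[\w.]+),.*?metadata (!\d+)"
)
LOCAL_VAR = re.compile(r'^(!\d+) = !DILocalVariable\(name: "([^"]+)"', re.M)


def names_for_values(ir):
    """Map each SSA value described by a debug intrinsic to its source-level name."""
    var_names = dict(LOCAL_VAR.findall(ir))
    return {
        value: var_names[node]
        for value, node in DBG_CALL.findall(ir)
        if node in var_names
    }


print(names_for_values(IR_O2))  # {'%zone': 'zone', '%2': 'pgdat'}
```

Even at -O2, the mapping shows that `%2` (the value loaded from field 5 of `%zone`) is what the debug info calls `pgdat`, although no store instruction survives.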

Thanks for the reply.

Yes I’m doing static analysis. I’m trying to do points-to analysis actually. I care about whether pointer values point to the same memory location. I’m not sure if this is better to be done by Clang or LLVM?

How do I dump the unoptimized IR? By compiling with -O0?
Thank you.

Thanks for the reply.

I’m trying to do points-to analysis actually. I care about whether pointer values point to the same memory location.
I’m not sure if this is better to be done by Clang or LLVM?

I guess I can get the variable names “zone” and “pgdat” from the debug metadata?
But how can I know whether “zone” and “pgdat” point to the same memory location?
Thank you.

Thanks for the reply.

Yes I'm doing static analysis. I'm trying to do points-to analysis
actually. I care about whether pointer values point to the same memory
location. I'm not sure if this is better to be done by Clang or LLVM?

That depends on what interface your analysis is going to expose. If the
users of your pointer analysis are other IR passes, LLVM IR sounds like a
better place, and you don't need to care about variable names or assignments.

Are you doing this to enable optimizations, or to detect bugs? If the former, dealing with LLVM IR is probably your best bet, and you may find this helpful: http://llvm.org/docs/AliasAnalysis.html. If the latter, you probably want to hook into Clang’s Static Analyzer. I’m not sure whether you’d be able to do alias analysis (AA) with it, but AFAIK it retains the full C/C++ AST (LLVM IR doesn’t), so your diagnostics would likely be a lot better with that. :)

What George said.

If you are trying to do pointer analysis in terms of original program variables (i.e. to issue error messages later or whatever), you should do it at the Clang level.

If you are trying to do it for optimization, you don’t need original program variables and they don’t help you :)
You should do it at the LLVM IR level.

CFL-AA is one such pointer analysis already implemented in LLVM. There are also other implementations, such as Andersen’s, that you can find.
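
To give a feel for what an Andersen-style (inclusion-based) analysis computes, here is a toy, field-insensitive sketch. It is not how CFL-AA or any LLVM implementation works; it only shows the idea on a made-up constraint set modeled on the example in this thread. The names `zone_obj`, `node_obj`, `tmp`, and `other` are hypothetical, and `zone->zone_pgdat` is approximated as `*zone` because fields are ignored.

```python
from collections import defaultdict


def andersen(constraints):
    """Naive fixed-point solver for Andersen-style points-to constraints.

    Each constraint is (kind, lhs, rhs):
      ("addr",  p, x)  models  p = &x
      ("copy",  p, q)  models  p = q
      ("load",  p, q)  models  p = *q
      ("store", p, q)  models  *p = q
    """
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, lhs, rhs in constraints:
            if kind == "addr":
                new, targets = {rhs}, [lhs]
            elif kind == "copy":
                new, targets = set(pts[rhs]), [lhs]
            elif kind == "load":
                new = set()
                for obj in list(pts[rhs]):
                    new |= pts[obj]
                targets = [lhs]
            elif kind == "store":
                new, targets = set(pts[rhs]), list(pts[lhs])
            else:
                raise ValueError(f"unknown constraint kind: {kind}")
            for target in targets:
                if not new <= pts[target]:
                    pts[target] |= new
                    changed = True
    return pts


def may_alias(pts, a, b):
    """Two pointers may alias if their points-to sets intersect."""
    return bool(pts[a] & pts[b])


# Hypothetical program, field-insensitive:
#   zone = &zone_obj; tmp = &node_obj; *zone = tmp;
#   pgdat = *zone;   (stands in for pgdat = zone->zone_pgdat)
#   other = pgdat;
constraints = [
    ("addr", "zone", "zone_obj"),
    ("addr", "tmp", "node_obj"),
    ("store", "zone", "tmp"),
    ("load", "pgdat", "zone"),
    ("copy", "other", "pgdat"),
]

pts = andersen(constraints)
print(sorted(pts["pgdat"]))            # ['node_obj']
print(may_alias(pts, "pgdat", "tmp"))   # True
print(may_alias(pts, "pgdat", "zone"))  # False
```

In this model "pgdat" and "zone" do not point to the same location: "pgdat" points to whatever the object behind "zone" holds, which matches the load seen in the IR earlier in the thread.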

Thank you all!
I’m trying to do pointer analysis in terms of original program variables.

I’m planning to look into Clang.