vmkit variables internal representation

Hello everyone!

I am quite new to Java under LLVM. I have the following Java code:

class MYGL {
    public static int P;
    public static int balance;
}

// Inside a method of Main:
int Q;
MYGL.P = 5984;
Q = 4597;
MYGL.balance = Q + 6094;

For the local variable Q, it seems that the compiler optimizes it away and emits store i32 10691, i32* … (into balance), that is, the folded value 4597 + 6094. Do you know how I can compile the code so that Q is preserved?
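The folding described above can be reproduced in miniature. The following is only an illustrative sketch of constant propagation followed by removal of unused locals, not the actual pass pipeline that vmjc or LLVM runs; the statement encoding and the names fold_constants and remove_unused_locals are made up for this example.

```python
def fold_constants(statements):
    """Propagate known constants through a list of (target, expr) assignments.

    expr is an int, a variable name, or a tuple ('+', a, b).
    This encoding is hypothetical and exists only for the illustration.
    """
    known = {}

    def value(operand):
        # Replace a variable name by its constant value when one is known.
        if isinstance(operand, str):
            return known.get(operand, operand)
        return operand

    out = []
    for target, expr in statements:
        if isinstance(expr, tuple):
            op, a, b = expr
            assert op == "+", "only addition is modelled in this sketch"
            a, b = value(a), value(b)
            expr = a + b if isinstance(a, int) and isinstance(b, int) else (op, a, b)
        else:
            expr = value(expr)
        if isinstance(expr, int):
            known[target] = expr
        else:
            known.pop(target, None)
        out.append((target, expr))
    return out


def remove_unused_locals(statements, local_names):
    """Drop assignments to locals that no remaining expression reads."""
    used = set()
    for _, expr in statements:
        if isinstance(expr, str):
            used.add(expr)
        elif isinstance(expr, tuple):
            used.update(x for x in expr[1:] if isinstance(x, str))
    return [(t, e) for t, e in statements
            if not (t in local_names and t not in used)]


program = [("MYGL.P", 5984), ("Q", 4597), ("MYGL.balance", ("+", "Q", 6094))]
optimized = remove_unused_locals(fold_constants(program), {"Q"})
```

After both steps only the two stores to the static fields remain, and the store to balance carries the constant 10691, which matches what shows up in the disassembly.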

I have the commands:
javac -Xlint -g -O Main.java
…vmjc Main
…j3 Main
…llvm-dis < Main.bc > Main_assembly

For the global variables, I have:
P = ({ i32, i32 }* @MYGL_static, i32 0, i32 1)
balance = ({ i32, i32 }* @MYGL_static, i32 0, i32 0)

Ok, in the bytecode there is no string “P”, “balance” etc. I assume they are preserved in internal globals. The most interesting looks:

@17 = internal global [82 x i32] [i32 0, i32 1179694, i32 47, i32 131118, i32 48, i32 131121, i32 3276851, i32 3276852, i32 53, i32 3538999, i32 56, i32 655406, i32 57, i32 655418, i32 524347, i32 655420, i32 3997758, i32 63, i32 64, i32 87, i32 114, i32 118, i32 33, i32 139, i32 148, i32 154, i32 161, i32 179, i32 200, i32 207, i32 216, i32 223, i32 248, i32 257, i32 282, i32 286, i32 292, i32 316, i32 323, i32 345, i32 47, i32 53, i32 367, i32 65, i32 383, i32 396, i32 1507352, i32 413, i32 436, i32 4325443, i32 68, i32 4522004, i32 4587540, i32 466, i32 71, i32 4718665, i32 496, i32 522, i32 4849739, i32 4980813, i32 5111885, i32 79, i32 5242961, i32 581, i32 588, i32 607, i32 629, i32 637, i32 660, i32 667, i32 671, i32 681, i32 700, i32 706, i32 730, i32 739, i32 787, i32 800, i32 823, i32 834, i32 856, i32 866]

I tried looking the values up in an ASCII table to see whether “P”, “Q” or “balance” are kept, but I cannot find them. Do you know where the references to the variable names are kept?

Thank you for your help!
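The ASCII lookup can be automated rather than done by hand. The sketch below is an assumption about how one might search such an i32 array: it views each value either as one character code or as two 16-bit halves, and looks for the code points of a name in sequence. Nothing in the thread establishes that @17 holds text at all, the helper names are hypothetical, and only ASCII names are considered.

```python
def whole_words(words):
    """View each i32 as a single character code."""
    return [w & 0xFFFFFFFF for w in words]


def half_words(words):
    """View each i32 as two 16-bit units, high half first."""
    units = []
    for w in words:
        w &= 0xFFFFFFFF
        units.extend((w >> 16, w & 0xFFFF))
    return units


def find_name(words, name):
    """Return the start offsets at which name occurs in each view."""
    target = [ord(ch) for ch in name]
    hits = {}
    for label, units in (("whole", whole_words(words)),
                         ("halves", half_words(words))):
        hits[label] = [i for i in range(len(units) - len(target) + 1)
                       if units[i:i + len(target)] == target]
    return hits
```

A word of caution on short names: the entry 5242961 in @17 splits into the halves 80 and 81, which happen to be the ASCII codes of P and Q. A match on one or two characters is therefore weak evidence, whereas a longer name such as balance is much less likely to match by coincidence.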

Hi Alexandru,

> For the local variable Q, it seems that the compiler is optimizing and
> considering store i32 10691, i32* .... (into balance). Do you know how I can
> compile the code for preserving Q?

You've probably got to convince the compiler not to optimise since
eliminating those variables is probably one of the simpler things that
goes on during optimisation. I'd expect the unoptimised code to have
"alloca" instructions inside the function which represent those
variables.

I'm afraid I don't know the javac command-line option to do that
though. Your questions may be better answered on a Java list dealing
with that compiler.

> Ok, in the bytecode there is no string "P", "balance" etc. I assume they are
> preserved in internal globals.

Could be. Doesn't Java use UTF-16 for its strings? If so, I'd be
looking for arrays of i16 for my names. Assuming they're there at all,
of course (depends on Java API, ABI and possibly optimisations LLVM
was able to perform).

If you post a full .ll file we may be able to say more. Or someone may
be along who knows the Java LLVM compiler off the top of their head.

Cheers.

Tim.

Hi Tim,

Thank you for your answer. Tomorrow morning I will update my question after further investigation based on your advice and with the .ll.

Hello Tim,

I have attached the assembly file, the Java file and the script used to run everything. I hope the variable names can be identified.
Thanks

Main.bc (11.9 KB)

Main.java (1.9 KB)

Main_assembly (57 KB)

run.sh (516 Bytes)

> I attached the assembly file, the java file and the running script file. I
> hope the variable names can be identified.

Well, I'm afraid I know no more about the Java ABI than you, but Java
strings can be identified by the type {i32, [N x i16]} where the first
entry is the length and the array is UTF-16. Running the attached
hacked-together script on your bitcode file gives the following:

@41 contains string "MYGL"
@42 contains string "P"
@43 contains string "balance"

How these fit into the structures that are defined and used is left as
an exercise to the interested reader. ;)

Tim.

tmp.py (672 Bytes)
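The contents of tmp.py are not reproduced in the thread. As a stand-in, here is a minimal sketch of the approach Tim describes: scan the llvm-dis output for globals whose initializer contains an [N x i16] array and decode it as UTF-16. The exact textual layout of the initializer is an assumption, and find_utf16_strings is a hypothetical name.

```python
import re
import struct

GLOBAL_NAME = re.compile(r'^(@[^\s=]+)\s*=')
# An array type immediately followed by its element list, for example
# "[7 x i16] [i16 98, i16 97, ...]".
I16_ARRAY = re.compile(r'\[(\d+) x i16\]\s*\[((?:\s*i16\s+-?\d+\s*,?)+)\]')
I16_VALUE = re.compile(r'i16\s+(-?\d+)')


def find_utf16_strings(ll_text):
    """Return (global name, decoded text) for each global holding an i16 array."""
    found = []
    for line in ll_text.splitlines():
        name = GLOBAL_NAME.match(line)
        array = I16_ARRAY.search(line)
        if not (name and array):
            continue
        # Values may be printed as signed numbers, so mask them to 16 bits.
        units = [int(v) & 0xFFFF for v in I16_VALUE.findall(array.group(2))]
        if len(units) != int(array.group(1)):
            continue
        raw = struct.pack('<%dH' % len(units), *units)
        found.append((name.group(1), raw.decode('utf-16-le', errors='replace')))
    return found
```

Calling find_utf16_strings(open("Main_assembly").read()) would list candidate strings from the disassembly produced by the llvm-dis command earlier in the thread. The sketch does not check the leading i32 length field and skips zero-initialized arrays.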

Yes! Thanks a lot :)

It seems that vmkit bitcode is quite close to classic LLVM, but the passes still have to be adapted.

Hi Alexandru,

The Java bytecode does not provide or use the names of local variables. They can be found in an attribute used for debugging, but vmkit currently does not use this attribute. This means that the LLVM bitcode that vmjc emits does not contain these names. Recovering local variable names could be useful, but we don’t plan to implement this feature for the moment. If you are interested, we can help you implement it; it should not take too much time.

Gaël

PS: Also, even if you could use the Java names for locals, you would have to disable some of the compilation passes, because they can promote these variables into machine registers.
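For reference, the debugging attribute in question is, as far as the class file format goes, LocalVariableTable, which javac writes inside each method's Code attribute when invoked with -g. The sketch below reads it straight from a .class file, independently of vmkit. It is a simplified reader written for this thread, with a hypothetical function name and no validation of the input.

```python
import struct


def local_variable_names(class_bytes):
    """Return {method name: [(slot, variable name), ...]} from a class file."""
    pos = 8  # skip magic, minor_version and major_version

    def u(fmt):
        nonlocal pos
        (v,) = struct.unpack_from(fmt, class_bytes, pos)
        pos += struct.calcsize(fmt)
        return v

    count = u('>H')
    pool = [None] * count
    i = 1
    while i < count:
        tag = u('>B')
        if tag == 1:  # CONSTANT_Utf8: the only entries needed here
            n = u('>H')
            pool[i] = class_bytes[pos:pos + n].decode('utf-8', errors='replace')
            pos += n
        elif tag in (5, 6):  # long and double occupy two pool slots
            pos += 8
            i += 1
        elif tag in (3, 4, 9, 10, 11, 12, 17, 18):
            pos += 4
        elif tag == 15:
            pos += 3
        elif tag in (7, 8, 16, 19, 20):
            pos += 2
        else:
            raise ValueError('unknown constant pool tag %d' % tag)
        i += 1

    def attributes():
        nonlocal pos
        attrs = []
        for _ in range(u('>H')):
            name = pool[u('>H')]
            length = u('>I')
            attrs.append((name, pos))
            pos += length
        return attrs

    def members():
        nonlocal pos
        out = []
        for _ in range(u('>H')):
            pos += 2  # access_flags
            name = pool[u('>H')]
            pos += 2  # descriptor_index
            out.append((name, attributes()))
        return out

    pos += 6  # access_flags, this_class, super_class
    interfaces = u('>H')
    pos += 2 * interfaces
    members()  # fields are parsed only to step over them
    result = {}
    for method_name, attrs in members():
        for attr_name, start in attrs:
            if attr_name != 'Code':
                continue
            pos = start + 4  # skip max_stack and max_locals
            code_length = u('>I')
            pos += code_length
            handlers = u('>H')
            pos += 8 * handlers
            for sub_name, sub_start in attributes():
                if sub_name != 'LocalVariableTable':
                    continue
                pos = sub_start
                for _ in range(u('>H')):
                    pos += 4  # start_pc and length
                    name_index = u('>H')
                    pos += 2  # descriptor_index
                    slot = u('>H')
                    result.setdefault(method_name, []).append((slot, pool[name_index]))
    return result
```

For a class compiled with javac -g, local_variable_names(open("Main.class", "rb").read()) should list Q under the method that declares it. Carrying those names into the bitcode would still require the vmkit changes Gaël describes.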

Hi Gaël,

Thank you for clearing that up! If I have time in the future, I will contact you to try to implement it. However, how can I identify the local variables in the internal representation? I identified the global variables (and Tim showed me how to get their names), but how can I identify the local variables, not by name but by structure (the internal vmkit representation)? In the example I gave, I assume that javac is optimizing the code, so “Q” is lost and not even replaced by a constant. Yet that constant is used in other computations; is it not kept anywhere?

Thanks!
Alexandru