VMKit as Java frontend: fix for "Should have found a JavaMethod"

Hello,

I want to use VMKit as a Java "middle-frontend" for LLVM, for AOT compilation (Java source -> native executable). (More broadly: I want *any* Java frontend for LLVM; VMKit seems like the best option.) As a first olive branch to see if there's any interest in the topic of VMKit for Java AOT, here's my first "fix" for the "Should have found a JavaMethod" problem reported on this list earlier (08 Mar 2014), for which I didn't see any previous resolution -- comments or redirection kindly requested.

BACKGROUND:
VMKit doesn't work (for me) for this application out of the box, and I'd like to fix it. I have made some progress and have some remaining problems with it; I'm most fundamentally interested in the following:
* am I doing something very wrong?
* is anyone else interested in VMKit for Java AOT (Java source -> native executable)?
* is anyone willing to help me fix it? :)

I've made some tweaks to VMKit sources and build files, and I can now build a native object from Java source using `javac` and VMKit. I'm now having trouble with (dreaded) linking and/or runtime errors, and I'm not sure how to make more progress.

Here's detail on the first fix for "Should have found a JavaMethod" as a first pay-it-forward.

SCENARIO:
I `configure`d VMKit for x86_64 using LLVM 3.3 and OpenJDK 1.6.0, from the current VMKit repository. I'm using the example at vmkit/tools/trainer/HelloWorld.java and doing the following to produce native assembly:
  javac HelloWorld.java
  /path/to/vmkit/Debug+Asserts/bin/vmjc -print-aot-stats HelloWorld.class
  /path/to/llvm33/bin/llc -load=/path/to/vmkit/Debug+Asserts/lib/static-gc-printer.so HelloWorld.class.bc

The above `llc` command fails (for me) with code directly from the repository. ("Should have found a JavaMethod"; Dave Brazdil reported the same problem on 08 Mar 2014; see http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-March/070995.html for more). Please tell me if I've got something botched already.

"FIX":
I figured out what's happening and found a workaround (though I don't know exactly *why* it happens).

The essence of the problem is that FindMetadata() in VmkitGCPrinter.cpp tries to locate a symbol (one that matches "JnJVM_HelloWorld_main___3Ljava_lang_String_2") and fails, because its tokenizing/parsing of symbol names is, for some reason, not compatible with the name-mangling done by the corresponding code generator in `vmjc` (or something like that). Essentially, FindMetadata() expects two double-underscores in the symbol and trims at the last one; it should be looking for "main___3Ljava_lang_String_2" (which exists), but after trimming it looks for "main_" (which is incorrect).

So, I made a change in static-gc-printer.so (vmkit/lib/static-gc-printer/VmkitGCPrinter.cpp) so that the trimming only happens when the string has two separate double-underscores, a heuristic that works for all of the situations I can find. I just wrapped the last substr() call (VmkitGCPrinter.cpp, line 243) in an if() statement:
    if (methodName.rfind("__") - methodName.find("__") > 2) {
        methodName = methodName.substr(0, methodName.rfind("__"));
    }

This hack^H^H^H^Hchange works for everything I've tried (but, clearly, that doesn't make it "right"). It's not clear to me why the name-mangling doesn't match, and this modification is clearly not "good", even aside from the fact that it's inefficient.

Any ideas as to this apparent incompatibility or why it's happening? Perhaps it's `vmjc` that's not mangling the symbol name properly, or perhaps there's some other library that I should be using (instead of static-gc-printer.so), or perhaps it's just stale code. This *does* seem to work out of the box using `j3`, but I haven't looked into this. More detail after the fold.

Thanks,
Brian

MORE DETAIL:

`vmjc` creates LLVM bitcode from a Java class. FindMetadata (rolled into `llc` via "-load=...") seems to be locating and associating symbols, "linker-style", between the generic bootstrap code and the bitcode from `vmjc`. For each symbol, FindMetadata looks for some separators ("__" and "_") and trims the symbol at certain points. It performs this symbol-mangling on the candidate symbols, and then compares the resulting strings to each other. It is trying to take something like the format of
    JnJVM_(CLASSNAME)_(METHODNAME)__(MANGLEDARGLIST)__(OTHERSTUFF)
and associate this with something like the format
    (METHODNAME)__(MANGLEDARGLIST)

It constructs the second string by the following sequence, basically:
* finds the first double-underscore;
* finds the previous underscore, and begins the string there;
* finds the final double-underscore, then trims off everything after (and including) that final "__".
The remainder is the symbol name, which is compared to the candidate.
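
To make the sequence concrete, here is a minimal, self-contained sketch of the trimming as I understand it. This is not the actual VMKit code: the function name trimSymbol and its early-return handling of symbols without any "__" are my own assumptions for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not VMKit's FindMetadata): applies the three
// trimming steps described above to a mangled symbol name.
std::string trimSymbol(const std::string& symbol) {
    // Step 1: find the first double-underscore.
    const std::size_t firstDouble = symbol.find("__");
    if (firstDouble == std::string::npos || firstDouble == 0) {
        return symbol;  // assumption: leave such symbols untouched
    }

    // Step 2: find the previous single underscore; start just after it.
    const std::size_t prev = symbol.rfind('_', firstDouble - 1);
    const std::size_t start = (prev == std::string::npos) ? 0 : prev + 1;
    std::string methodName = symbol.substr(start);

    // The remaining string still contains the "__" found in step 1.
    assert(methodName.find("__") != std::string::npos);

    // Step 3: trim everything from the final double-underscore onward.
    return methodName.substr(0, methodName.rfind("__"));
}
```

Calling trimSymbol() on the ArrayList symbol shown further down gives "add__Ljava_lang_Object_2", while the HelloWorld main symbol collapses to "main_", because the only "__" candidates sit inside the "___" run.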

This sequence will fail in certain cases, including:
* if METHODNAME contains an underscore;
* if it doesn't have the second double-underscore adornment "__(OTHERSTUFF)".

The latter is the problem in the HelloWorld.java case for the "main()" function.

For example, when processing vmkit/tools/trainer/Debug+Asserts/BootstrapClasses-gc.bc, it seeks a symbol to match
    JnJVM_java_util_ArrayList_add__Ljava_lang_Object_2__java_util_ArrayList_Customized
It does the trimming, yielding
    add__Ljava_lang_Object_2
It then searches each symbol in the bitcode for a match.

In the HelloWorld.java case, it seeks the symbol to match
    JnJVM_HelloWorld_main___3Ljava_lang_String_2
It does the trimming, yielding
    main_
which is wrong. There DOES exist a function in the bitcode named
    main___3Ljava_lang_String_2
So I added the heuristic to make sure there are at least two double-underscores before trimming, which is probably not harmful based on the symbol-mangling rules.
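
As a rough, standalone illustration of the guarded trimming (again, not the real VMKit source: trimSymbolGuarded is a hypothetical name, and the explicit npos checks are my additions), the heuristic can be sketched like this:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not VMKit's FindMetadata): same trimming steps,
// but the final cut only happens when two separate "__" markers exist.
std::string trimSymbolGuarded(const std::string& symbol) {
    const std::size_t firstDouble = symbol.find("__");
    if (firstDouble == std::string::npos || firstDouble == 0) {
        return symbol;  // assumption: leave such symbols untouched
    }

    // Start just after the single underscore preceding the first "__".
    const std::size_t prev = symbol.rfind('_', firstDouble - 1);
    const std::size_t start = (prev == std::string::npos) ? 0 : prev + 1;
    std::string methodName = symbol.substr(start);

    const std::size_t first = methodName.find("__");
    const std::size_t last = methodName.rfind("__");
    assert(first != std::string::npos);  // the "__" from above is still there

    // Heuristic from the fix: a gap of more than 2 means the last "__" is
    // a separate marker, not just the tail of a "___" run.
    if (last - first > 2) {
        methodName = methodName.substr(0, last);
    }
    return methodName;
}
```

With this guard, the HelloWorld symbol yields "main___3Ljava_lang_String_2" and the ArrayList symbol still yields "add__Ljava_lang_Object_2". It only addresses the missing "__(OTHERSTUFF)" case; a METHODNAME containing an underscore would still be cut at the wrong place.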

Hi Brian,

I can help you, but I'm on holiday and cannot work on (or even see) the code before next week :) I'll take a look as soon as I can.

See you,
Gaël
