clang via JNI + libclang-c

Hey.

I’m seeing very strange behaviour with clang via JNI.
I have a Java class whose native methods call libclang-c functions.

So I have to keep the CXIndex and CXTranslationUnit pointers alive in native code
and hold references to them in the Java objects.

If I do everything in the same native call (create index, parse, tokenize, dispose), it’s okay.
But if I spread the work across several native function invocations, the CXIndex and CXTranslationUnit are corrupted (even though the CXIndex and CXTranslationUnit pointers are the same).

I can provide additional info and full source code if necessary.
Are there any obvious limitations, or does anybody have experience with this?

Regards, Anton.

Yes, JNI.

JNI is a poor implementation of C code, trying to deal with the poor
definitions of Java code, at the same time as trying to be 100%
cross-platform. It fails on all three levels.

Stack corruption in JNI is surprisingly common, especially on Windows (but
also on Linux), and a few tricks must be used if you pass more than just a
char pointer. Off the top of my head I remember I had to define a large
array on the stack, so that the stack pointer wouldn't segfault.

JNI is also surprisingly poor in propagating errors from the C layer to the
Java layer (if at all), so catching errors and debugging the interface is
a downright nightmare.

You would hope that, after all these years, someone would come up with a
fix, or a better infrastructure for inter-operating C code with Java...

This is a long shot, but have you tried using VMKit and sharing C and
Java code via LLVM's Execution Engine? I heard of a project that managed to
do that with Python+C, including exception handling, so it should be
possible to do that with Java.

cheers,
--renato

Well, that sounds like it will not work because of JNI?

In a few words: I’m trying to invoke indexing, parsing and tokenizing from Java using libclang.

// create index
native public static Index createIndex(boolean excludeDeclarationsFromPCH);

// parse translation unit from file
native public static TranslationUnit parseTranslationUnit(Index index, String filename, String[] commandLineArgs, UnsavedFile[] unsavedFiles);

// tokenize
native public static Token[] tokenize(TranslationUnit translationUnit, String filename, int filesize);

// dispose translation unit
native public static void dispose(TranslationUnit translationUnit);

// dispose index
native public static void dispose(Index index);

For these methods there is native code with the corresponding functions, and Java dispatches the calls from Java code to native code:

/*
 * Class:     name_antonsmirnov_clang_clang_wrapper
 * Method:    createIndex
 * Signature: (Z)Lname/antonsmirnov/clang/dto/Index;
 */
JNIEXPORT jobject JNICALL Java_name_antonsmirnov_clang_clang_1wrapper_createIndex
(JNIEnv *env, jobject obj, jboolean excludeDeclarationsFromPCH)
{


}

/*
 * Class:     name_antonsmirnov_clang_clang_wrapper
 * Method:    parseTranslationUnit
 * Signature: (Lname/antonsmirnov/clang/dto/Index;Ljava/lang/String;[Lname/antonsmirnov/clang/dto/UnsavedFile;[Ljava/lang/String;)Lname/antonsmirnov/clang/dto/TranslationUnit;
 */
JNIEXPORT jobject JNICALL Java_name_antonsmirnov_clang_clang_1wrapper_parseTranslationUnit
(JNIEnv *env, jobject obj, jobject jindex, jstring jfilename, jobjectArray commandLineArgs, jobjectArray unsavedFiles)
{


}

So I map classes from Java to native code and vice versa.
In the native methods I invoke libclang-c functions.

The problem is that the CXIndex and CXTranslationUnit have to be passed to the subsequent methods, so I have to keep them in memory in native code,
pass the pointer to Java as a long and then get it back by casting the long to a pointer.
I’m absolutely sure that the pointers are packed/unpacked (long <–> pointer) correctly.
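
For reference, a minimal sketch of the long <-> pointer round trip described here. It is not the code from this thread: all names in it are hypothetical, `int64_t` stands in for `jlong` and a plain `void *` stands in for `CXIndex` (which libclang declares as `void *`), so it compiles without jni.h or the clang-c headers. The one thing it tries to illustrate is that the value packed into the long should be the handle itself. Packing the address of a local variable that holds the handle would yield a number that prints identically in every later call but no longer refers to live memory once the first native function has returned.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins (assumptions) so the sketch builds without JNI or libclang. */
typedef void *opaque_handle;   /* plays the role of CXIndex            */
typedef int64_t packed_long;   /* plays the role of jlong (64-bit int) */

/* Pack the handle VALUE into a 64-bit integer for the Java side. */
static packed_long pack_handle(opaque_handle h) {
    /* A pointer must fit into the integer type used for transport. */
    assert(sizeof(intptr_t) <= sizeof(packed_long));
    return (packed_long)(intptr_t)h;
}

/* Recover the handle from the integer handed back by the Java side. */
static opaque_handle unpack_handle(packed_long packed) {
    return (opaque_handle)(intptr_t)packed;
}

/* Hypothetical "create" native call: obtains a handle, returns it packed. */
static packed_long create_and_pack(opaque_handle freshly_created) {
    opaque_handle index = freshly_created;   /* local, dies on return */
    /* Wrong: (packed_long)(intptr_t)&index would be the address of the
     * local variable, which dangles as soon as this function returns. */
    return pack_handle(index);               /* the value stays valid */
}
```

If the native side logs both the handle value and the address of the variable holding it, comparing the two against the stack range of the process may help tell the two cases apart.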

The problem is that if I do everything in the same invocation from Java (e.g. in a single Java_name_antonsmirnov_clang_clang_1wrapper_tokenize() function), it works fine.
For example, if I just remember the parameters and then create the index, parse and tokenize (the clang-c invocations) within the tokenize() method, it’s okay.

If I do everything in separate native methods (clang-c invocations in separate native calls from Java), as it’s designed to be used, the pointers are corrupted.
So the unpacked CXIndex in the parse() invocation is corrupted even though it was packed/unpacked correctly and the pointer is the same.

Any thoughts? Is it a libclang problem, a JNI problem, or am I doing something wrong?

It seems to be a libclang or libclang+JNI problem, as I’m experienced with JNI and I checked the packing/unpacking using my concrete class.

This looks a lot like standard JNI stack corruption to me. This is the
exact same behaviour I saw in JNI 10 years ago, and I'm not surprised
it's still there...

cheers,
--renato

I tried holding the CXIndex in a static variable in native code (an array of pointers, to be more precise), passing just the array index back to Java and then passing that index to native code again, so that the variable stays in native code and is never passed around.

Still the same result.
I can check it easily using clang_getTranslationUnitSpelling().
Before passing, it returns the filename; after packing/unpacking (even using the index) it returns nothing.
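
The static-array variant could look roughly like the sketch below. All names (`handle_table`, `hold_handle`, `get_handle`, `release_handle`, `MAX_HANDLES`) are made up for illustration, and plain `void *` stands in for `CXIndex` / `CXTranslationUnit`. Such a table only keeps the pointer value alive between native calls; whatever the pointer refers to still has to outlive the call that created it.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_HANDLES 16

/* Static storage: lives for the whole process, so entries survive
 * across separate native calls. NULL marks a free slot. */
static void *handle_table[MAX_HANDLES];

/* Store a handle value; return its slot, or -1 if the table is full. */
static int hold_handle(void *handle) {
    assert(handle != NULL);   /* NULL is reserved for "free slot" */
    for (int i = 0; i < MAX_HANDLES; ++i) {
        if (handle_table[i] == NULL) {
            handle_table[i] = handle;
            return i;
        }
    }
    return -1;
}

/* Look up a handle by the slot number that was handed to Java. */
static void *get_handle(int slot) {
    if (slot < 0 || slot >= MAX_HANDLES) {
        return NULL;
    }
    return handle_table[slot];
}

/* Free the slot once the underlying object has been disposed. */
static void release_handle(int slot) {
    if (slot >= 0 && slot < MAX_HANDLES) {
        handle_table[slot] = NULL;
    }
}
```

Only the small slot number crosses the JNI boundary in this scheme, which removes the long <-> pointer cast from the picture when narrowing down where the corruption happens.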

So, IIRC, it's not the stack itself (stack variables), but the return
value, that on Intel is at the bottom of the stack. So it doesn't matter
where your native objects live, if you try to return them (or a pointer to
them), and the stack itself is too small, it'd subtract too much from the
stack pointer and then wouldn't find the correct return value because the
offset would be negative, and you get corrupted pointers.

Adding an "int a[1024]" to each function fixed the problem because even
subtracting a lot, it'd never be negative when compared to the return
value, and going back to the return address was still possible. Give it a
try, at least to see if that's the problem we're looking for.

Be wary that that was 10 years ago, JNI may have bred some new bugs since
then...

cheers,
--renato

I’m not sure that it’s a JNI stack problem.
Let me show you the log with comments:

// invoke index() from java to native

10-11 15:35:15.463: ERROR/CLANG_DEBUG(2491): env: [0x40194ce8]
10-11 15:35:15.463: ERROR/CLANG_DEBUG(2491): createIndex();

// &index = 0xbeb06704
10-11 15:35:15.463: ERROR/CLANG_DEBUG(2491): call mapIndex [0xbeb06704]
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): found IndexClass
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): found IndexConstructor
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): found IndexPointerField

// hold CXIndex in array in [0] in native code
// return 0 to java
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): <0x6564805c> hold to index=0 (0x6564805c) → 0xbeb06704
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): env (parse): [0x40194ce8]

// invoke parse from java and pass 0 as index
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): unmapIndex()
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): <0x6564805c> get from index=0 (0x6564805c) → 0xbeb06704

// unmapped correctly as index pointer is still 0xbeb06704
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): restored index 0xbeb06704

// now invoking clang parse
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): argument filename: [./testfile.cpp]
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): args = 0
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): unsaved files = 1
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): found UnsavedFileClass
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): found UnsavedFileConstructor
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): found UnsavedFileFilenameField
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): found UnsavedFileSourceField
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): unsaved file:
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): ./testfile.cpp
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): unsaved file content:
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): { int a = 10; }
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): parsing: index=0xbeb06704 filename=./testfile.cpp args_count=0 files_count=1
file=./testfile.cpp

// oops, where is filename ? (probably CXIndex at 0xbeb06704 is corrupted for some reason)
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): after parse filename: []
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): mapTranslationUnit()
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): found TranslationUnitClass
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): found TranslationUnitConstructor
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): found TranslationUnitPointerField

// pack translation unit and return index to java
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): hold translationUnit to index=0 (0x656480dc) → 0xbeb066c8

// invoke tokenize() from java with tu index = 0
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): tokenize: length = 17
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): get tu from index=0 (0x656480dc) → 0xbeb066c8
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): unmapped tu 0xbeb066c8
// unpacked ok (the same tu pointer 0xbeb066c8)

10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): checking unboxed…
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): before tokenize filename: []
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): <0> tokens

It seems that the CXIndex instance was somehow corrupted (between the return and the next invocation), because if I invoke the clang methods within one native method it’s okay.

As I said, I can pack and unpack within one native method and it’s still okay, so the packing/unpacking does no harm.

The JNI env pointer is the same in all Java->native invocations.

I’m stuck.

10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): parsing: index=0xbeb06704
filename=./testfile.cpp args_count=0 files_count=1
        file=./testfile.cpp

So, here, the object address was correct, in the array.

// oops, where is filename ? (probably CXIndex at 0xbeb06704 is corrupted
for some reason)
10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): after parse filename: []

Is this from another call? Can you print the address/contents of the array?
It might be that the array is pointing to the wrong place, or the array's
own address has somehow changed, thus it's not CXIndex that is corrupted,
but the array representation.

// pack translation unit and return index to java

10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): hold translationUnit to
index=0 (0x656480dc) -> 0xbeb066c8

Is this still the array? The address is different from the one you had
earlier.

cheers,
--renato

Is this from another call? Can you print the address/contents of the array?

Yes, the index instance address is within <>:
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): <0x6564805c> hold to index=0
(0x6564805c) -> 0xbeb06704

It might be that the array is pointing to the wrong place, or the array's
own address has somehow changed, thus it's not CXIndex that is corrupted,
but the array representation.

No, it's the same array pointer after unmap (after passing the long from Java to
native code):

10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): <0x6564805c> get from index=0
(0x6564805c) -> 0xbeb06704

If the array pointer were different, then array[0] would most likely be different too,
but it's exactly the same:
0xbeb06704

// pack translation unit and return index to java

10-11 15:35:15.503: ERROR/CLANG_DEBUG(2491): hold translationUnit to
index=0 (0x656480dc) -> 0xbeb066c8

Is this still the array? The address is different from the one you had
earlier.

0xbeb066c8 is the CXTranslationUnit instance address, not the CXIndex instance
address. During parsing, the right index instance address is passed
(index=0xbeb06704):
10-11 15:35:15.473: ERROR/CLANG_DEBUG(2491): parsing: index=0xbeb06704
filename=./testfile.cpp args_count=0 files_count=1
        file=./testfile.cpp


My idea is that it's not the CXIndex instance pointer address that is corrupted
(it's exactly the same), but the memory at that address.

I can provide code output where all the clang invocations are done with one
single native invocation and it works (tokens are found).

My idea is that it's not the CXIndex instance pointer address that is corrupted
(it's exactly the same), but the memory at that address.

This is also possible, JNI is *also* famous for heap corruption.

I can provide code output where all the clang invocations are done with one
single native invocation and it works (tokens are found).

If it works from C and works on pure-C from Java, then the only answer is
that JNI is corrupting the memory. It's either that, or your C compiler has
a serious bug. To make sure it's not the compiler, try different versions.
Search Google for "jni heap corruption" or "jni stack corruption" and
you'll see what I mean.

Something similar to the problem I had, with a similar fix:

http://stackoverflow.com/questions/5305079/how-to-debug-jni-heap-corruption-problems

cheers,
--renato

Well, complicated things can work in unexpected ways, but sometimes it’s the developer who just does not know
how to make them work correctly ;)

Are there any obvious reasons for it not to work (e.g. thread-safety issues, some limitation or something else)?

Anyway, I will definitely search for JNI problems.
BTW, it’s working on my Mac without any problems, but I’m trying to get it working on Android.

Thanks for the help, Renato!