Module::getOrInsertFunction determinism

Hi All,

I’ve got a question about Module::getOrInsertFunction().
I got an impression that it is not deterministic where exactly in the bit code module the new function will be inserted.
Is there a way to make it deterministic by explicitly specifying some offset at which the function should be inserted?
Please correct me if I’m wrong.

Thank you very much in advance for help,

Best regards,
Tomek

Hi Tomek,

I’ve got a question about Module::getOrInsertFunction().
I got an impression that it is not deterministic where exactly in the bit
code module the new function will be inserted.

Looking at the code (not exhaustively), it seems a new function will
always be added to the end of a module.

Documenting that probably wouldn't be a terrible idea, but it
shouldn't affect anything except the human-readability of LLVM IR. Are
you trying to do something where it is actually causing problems?

Cheers.

Tim.

Hi Tim,

Thank you very much for quick response. What happens in my case is that
mixing calls to getOrInsertFunction
with linking another bit code module to the current module seems to be
sometimes producing different outputs.
Difference might be with the ordering of functions in the file (I¹m
looking at LLVM IR representation), with numbers
used for constant variables, numbers used in the ³dbg² metadata. Since
sometimes the diff is not trivial to analyse I
just wasn¹t sure if I only get these sort of differences and not
differences like actual instructions being reordered.
Do you know what might be the cause of these differences?

Thank you,

Tomek

I would argue in favor of leaving this explicitly undocumented. I don't see the use case in knowing where in a module it got added, and it restricts future implementations in ways we can't predict. Documenting that it must be deterministic is fine. Documenting where it decides to place it is not.

Tomek, could you spell out why you need the position? Maybe there's another option here.

Philip

Hi Tomek,

Thank you very much for quick response. What happens in my case is that
mixing calls to getOrInsertFunction with linking another bit code module to
the current module seems to be sometimes producing different outputs.

Even with exactly the same call sequence in each case? That's probably
not intended.

As you say, it makes looking for actual changes difficult; and for
purely Clang-related reasons we want link-time optimized binaries to
be comparable with "diff" to verify staging works.

The usual cause of these issues is something being sorted by pointer
or hash rather than name or some other prior order. I can't see any
obvious data structures like that in llvm::Module though.

Difference might be with the ordering of functions in the file (I¹m
looking at LLVM IR representation), with numbers
used for constant variables, numbers used in the ³dbg² metadata.

Do you mean you're not sure where the difference is there, or that
you've seen all of those differences in practice?

Cheers.

Tim.

Hi Philip,

Thank you very much for the reply. I need to add one important detail -
currently I¹m working on an outdated version of
LLVM, so I will need to first make sure that it also appears on the
current one.

The use case that I have in mind is to be able to uniquely identify
instruction in the bit code.
For example let¹s say we have an ³add² instruction in some function and I
want to know that this is the same
instruction in two separate runs. If the function is put in some different
place as a result of inserting function and linking in the second run,
how would I identify that instruction? It seems to me that cannot rely on
the offset of the instruction within the binary and I cannot rely on the
instruction count.
The problem in my case was that for the same source code base, I was
getting different resulting LLVM IR when doing the same
sequence of getOrInsertFunction and linking with other bit code modules on
each of runs.

Thank you,
Tomek

Hi Tim,

Hi Tomek,

Thank you very much for quick response. What happens in my case is that
mixing calls to getOrInsertFunction with linking another bit code
module to
the current module seems to be sometimes producing different outputs.

Even with exactly the same call sequence in each case? That's probably
not intended.

Yes, that was for the same code base. However, as I mentioned in my e-mail
to Philip, this might also
be related to my outdated version of LLVM - I should check it on some
newer one to make sure.

As you say, it makes looking for actual changes difficult; and for
purely Clang-related reasons we want link-time optimized binaries to
be comparable with "diff" to verify staging works.

The usual cause of these issues is something being sorted by pointer
or hash rather than name or some other prior order. I can't see any
obvious data structures like that in llvm::Module though.

Difference might be with the ordering of functions in the file (I©öm
looking at LLVM IR representation), with numbers
used for constant variables, numbers used in the ©ødbg©÷ metadata.

Do you mean you're not sure where the difference is there, or that
you've seen all of those differences in practice?

I¡¯ve seen different naming for string constants and different ordering of
functions in the LLVM IR.

Cheers.

Tim.

Thanks a lot,
Tomek

Hi Tim,

You¡¯ve mentioned that sometimes the root cause of the problem is that a
pointer or a hash
is used instead of the name in sorting. I thought I will give it a try
with ASLR turned off.
Surprisingly now the output seems to be consistent. As far as I understand
ASLR runs on OS level rather than on the
compiler level, which may suggest that probably somewhere this order of
pointers matters.
I¡¯m using LLVM 2.9 with KLEE symbolic execution engine.

I¡¯m not sure how much this problem is specific to my particular use or
LLVM version though.

Thank you,
Tomek

Hi Philip,

I would like to ask a follow-up question about code generation.
Do you know if is expected that if we take the same bit code modules and
link them together in the same order (programatically), but on different
machines (assuming the same version of LLVM and
roughly the same OS), the output may differ with respect to order of
function definitions inside module?

Thank you very much in advance for help,

Best regards
Tomasz Kuchta

Hi Philip,

I would like to ask a follow-up question about code generation.
Do you know if is expected that if we take the same bit code modules and
link them together in the same order (programatically), but on different
machines (assuming the same version of LLVM and
roughly the same OS), the output may differ with respect to order of
function definitions inside module?

This falls in to the category of "not really surprising". There's enough
variation between machines that it's not uncommon for an otherwise
mostly-deterministic process to behave slightly differently. I wouldn't
call the change in order "expected", but it also doesn't sound like an
obvious bug.

Also, from what you said previously, you're working with a very old
version of LLVM. Most of my own experience is with newer versions so it
may be my expectations are badly off. I'd really recommend you upgrade.
You'll get much more applicable advice. :slight_smile:

Thank you very much in advance for help,

Best regards
Tomasz Kuchta

Hi Philip,

Thank you very much for the reply. I need to add one important detail -
currently I©öm working on an outdated version of
LLVM, so I will need to first make sure that it also appears on the
current one.

The use case that I have in mind is to be able to uniquely identify
instruction in the bit code.
For example let©ös say we have an ©øadd©÷ instruction in some function and I
want to know that this is the same
instruction in two separate runs. If the function is put in some different
place as a result of inserting function and linking in the second run,
how would I identify that instruction? It seems to me that cannot rely on
the offset of the instruction within the binary and I cannot rely on the
instruction count.

Have you considered trying either metadata or using an intrinsic
function as a marker for the interesting instruction? Depending on your
use case, either might work. (f.y.i. I believe Metadata has changed
radically since earlier versions.)

Hi Philip,

Thank you very much for your comments.
I think I¡¯ve discovered a root cause. The problem was in linking bit code
archive files with the module.
At some point, std::set<Module*> is used and iterated over. I believe this
was the reason why e.g. It worked consistently with
ASLR turned off and produced non-deterministic output otherwise. I changed
that bit to use vector instead and now it seems to
be working as expected. I wouldn¡¯t be surprised if this is already fixed /
changed in some newer version of LLVM and I agree I should upgrade :slight_smile:

Thank you very much for help,
Best regards
Tomek

Hi Philip,

Thank you very much for your comments.
I think I¡¯ve discovered a root cause. The problem was in linking bit code
archive files with the module.
At some point, std::set<Module*> is used and iterated over. I believe this
was the reason why e.g. It worked consistently with
ASLR turned off and produced non-deterministic output otherwise. I changed
that bit to use vector instead and now it seems to
be working as expected. I wouldn¡¯t be surprised if this is already fixed /
changed in some newer version of LLVM and I agree I should upgrade :slight_smile:

Would you mind checking to see if the current TOT uses the set
representation? If so, please file a bug with an explanation of the
ordering problem involving ASLR. Linking should be stable.

If there is a std::set<Module*> this sounds like the old linking code
(Linker::LinkInFile). That code was removed in LLVM3.3 and it is no
longer possible to link archives of bitcode modules directly using the
LLVM API.

I implemented a hacky replacement[1] in upstream KLEE. So if you
upgrade you might find this works fine Tomek.

However I fundamentally believe KLEE's approach is wrong here because
I think linking and most optimisations should be done outside of KLEE
but I'm not really interested in fixing this right now.

[1] https://github.com/klee/klee/blob/master/lib/Module/ModuleUtil.cpp#L345

Thanks,

Hi Philip, Dan,

Thank you very much for you comments.

The exact place was:
Archive::findModulesDefiningSymbols() function, that is called from
Linker::LinkInArchive.
I am using LLVM version 2.9.

I couldn¹t find either of these two functions in LLVM3.4, but as Dan
suggests,
this functionality was removed in 3.3, so that should be correct now.

Thanks!

Tomek