LLVMContext: Suggestions for API Changes

Dear LLVMers,

LLVM is currently undergoing a change in its programming API, most of
which has to do with the transition to using an LLVMContext. I've been
trying to keep two LLVM projects, poolalloc and SAFECode, up-to-date
with LLVM mainline, but it has become frustrating because:

1) The API changes occur frequently. Each morning I come in and have to
fix something because the API changed.

2) The necessary changes aren't advertised on llvmdev; I have to go
figure out for myself the new way to do whatever it is that my code does
(adding constant NULL pointers, adding constant integers, etc.).

When making API changes, could you please consider doing the following:

1) If technically possible, add the new API first, get it working, email
llvmdev describing the old and new APIs, provide some lead time for
people to change over, and then remove the old APIs. This makes it
easier to plan when I fix problems due to LLVM API changes and when I
can work on our own bugs. :slight_smile:

2) When an API change is going to be made/has been made, please send a
short email to llvmdev stating what has/will be changed and how LLVM
users should update their code. If you can write a script that makes
the change automatically, that would be even better, but I realize that
some changes are complicated enough that automating the change is too
much effort. It saves time when I know what is going to break and how
to fix it; scrounging around the LLVM source and the commit mailing list
is wasted time when a two sentence email will tell me everything I need
to know.

-- John T.

The high-level change was already described and discussed on LLVMdev.

The summary is that the static factory methods on Constant and Type are going away, and becoming instance methods on LLVMContext. All of the instance methods are already in place (though many just forward to the static implementations for the moment). I'm currently in the process of moving the implementations over, removing the static versions, and updating the LLVM tree for it.

For basic constant-fetching API changes, which make up the bulk of the changes, the update should be trivial: switching to the corresponding instance method on LLVMContext. In general, I've tried to make the mapping of static function names to instance method names obvious, so it should be straightforward.
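To make the mechanical nature of the update concrete, here is a before/after sketch. The exact instance-method names are my assumption, inferred from the naming pattern described above, so check doxygen for the real signatures:

  // Before: static factory methods on the Constant subclasses.
  Constant *C    = ConstantInt::get(Type::Int32Ty, 42);
  Constant *Null = ConstantPointerNull::get(PT);  // PT: some PointerType*

  // After: the corresponding instance methods on an LLVMContext
  // (method names assumed to mirror the static ones).
  Constant *C    = Context.getConstantInt(Type::Int32Ty, 42);
  Constant *Null = Context.getConstantPointerNull(PT);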

Slightly trickier are the API changes made incidental to the move to LLVMContext, such as the GlobalValue constructor change. These occurred because it was necessary to pass a context in where it wasn't available before. I'll try to provide more notice about these, but, again, they should be relatively limited in scope.

--Owen

Owen Anderson wrote:

1) If technically possible, add the new API first, get it working,
email
llvmdev describing the old and new APIs, provide some lead time for
people to change over, and then remove the old APIs. This makes it
easier to plan when I fix problems due to LLVM API changes and when I
can work on our own bugs. :slight_smile:

The high-level change was already described and discussed on LLVMdev.

First, did you discuss it or announce it? There's a difference.

With its high traffic volume, I don't read all llvmdev mails anymore. I
pick and choose based on their subject lines. A subject entitled "API
Change: LLVMContext" will probably get my attention while
"Multi-threading support for LLVM" will not.

Second, the only emails I remember on the topic was an email that
provided a short description of LLVMContext and some followup emails on
questions/commit messages from me. The first message announcing
LLVMContext was too vague; it required that the reader think about the
LLVM implementation and logically deduce what was going to change. The
text you've written below is much clearer: you say methods x, y, and
z are moving from class a to b. That information plus doxygen is usually
all I need.

What I'm suggesting is that you announce a change in a way that
makes it easy for LLVM users to understand what they need to do to
update their code. The announcement can't assume that the reader
knows about LLVM internals (like how uniquing is implemented), and it
can't be buried in a larger discussion of how the change should be made.

What you've written below is perfect; I think it would have been better
if you had sent it earlier. Perhaps you did, and I missed it, or
perhaps I'm being too picky. In that case, I apologize.

I think I've said enough on the matter. My points are meant as
constructive criticism; use them in whatever way you feel is best.

-- John T.

How about a subject line "[LLVMdev] MAJOR API CHANGE: LLVMContext"?

--Owen

You mean like this subject?

http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-June/023505.html

Notice the sender line on that email... :wink:

--Owen

Owen Anderson wrote:

Owen Anderson wrote:

1) If technically possible, add the new API first, get it working,
email
llvmdev describing the old and new APIs, provide some lead time for
people to change over, and then remove the old APIs. This makes it
easier to plan when I fix problems due to LLVM API changes and when I
can work on our own bugs. :slight_smile:

The high-level change was already described and discussed on LLVMdev.

First, did you discuss it or announce it? There's a difference.

With its high traffic volume, I don't read all llvmdev mails anymore. I
pick and choose based on their subject lines. A subject entitled "API
Change: LLVMContext" will probably get my attention while
"Multi-threading support for LLVM" will not.

How about a subject line "[LLVMdev] MAJOR API CHANGE: LLVMContext"?

Yes, that particular message fits the first criterion: it announces the
change in the subject line. I thought you might have been referring to
other emails about LLVMContext that I didn't see, but you weren't; you
were referring to the above message.

However, IMHO, the message from June doesn't clearly state what is to
change. It tells me static methods will be removed, but it doesn't tell
me where they are re-implemented. It assumes I can infer that by
understanding how LLVM uniques values.

-- John T.

Wasn't this tip enough?

"To ease this transition, I have added a getGlobalContext() API. If
you're only ever planning to use LLVM on a single thread, it will be
completely safe to simply pass this value to every API that takes an
LLVMContext."


Owen Anderson wrote:

You mean like this subject?

http://lists.cs.uiuc.edu/pipermail/llvmdev/2009-June/023505.html

Notice the sender line on that email... :wink:

Yes, you indeed announced that change, but as John rightfully remarked,
the announcement gave little detail. For LLVM users like me, who just,
well, *use* LLVM, this wasn't enough. Maybe it's not time to document
the changes yet, and maybe everybody but the LLVM developers should stop
using LLVM from svn for a while until the API stabilizes again. That's
fine with me. I can only politely ask that at some point someone please
provide us with the information that we need to port our stuff to the
new API so that we don't have to spend days digging through the LLVM
source code. :wink:

Thanks,
Albert

http://llvm.org/docs/ReleaseNotes-2.6.html#whatsnew

See the "Major Changes and Removed Features" section, which is obviously still a work in progress until 2.6 ships.

--Owen

See Owen's email about docs for the 2.6 release, but it's really not
that hard to keep up with trunk. I recently merged trunk LLVM into
Unladen Swallow, and the changes I needed to make are written up on the
Unladen Swallow Google Code project site. You get
some compiler errors saying that an LLVMContext parameter is missing;
you grep the source for LLVMContext, finding
http://llvm.org/viewvc/llvm-project/llvm/trunk/include/llvm/LLVMContext.h?view=markup;
you read that and find the "/// FOR BACKWARDS COMPATIBILITY" line at
the bottom; and you pass getGlobalContext() all over the place until
the compiler's happy.
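Concretely, the fix usually looks like this (sketched against the 2.6-era API; check LLVMContext.h for the real signatures):

  #include "llvm/LLVMContext.h"
  #include "llvm/Module.h"

  // Before (2.5): no context parameter.
  //   Module *M = new Module("my_module");
  // After: constructors that allocate IR objects now take an
  // LLVMContext; for single-threaded use, pass the global one.
  llvm::Module *M = new llvm::Module("my_module", llvm::getGlobalContext());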

The InitializeNativeTarget() change was harder to discover, but it
wasn't that hard to search the list for.

Albert Graef wrote:

Yes, you indeed announced that change, but as John rightfully remarked,
the announcement gave little detail. For LLVM users like me, who just,
well, *use* LLVM, this wasn't enough. Maybe it's not time to document
the changes yet, and maybe everybody but the LLVM developers should stop
using LLVM from svn for a while until the API stabilizes again. That's
fine with me. I can only politely ask that at some point someone please
provide us with the information that we need to port our stuff to the
new API so that we don't have to spend days digging through the LLVM
source code. :wink:

I think a timeline would help. That is, if the developers made a point of
saying "we expect the API on trunk to be volatile for the next two weeks",
then users would be able to make their own decisions about whether to
try to track the changes. It also makes it easier for developers to document
how to adapt to the new API, because you can make one guide when you
know how it all works instead of maintaining (or failing to maintain) a
constantly-shifting document of momentary kludges and workarounds.

The downside, of course, is that it would encourage users to not stay
current for a few weeks, which hurts all the other active development.

John.

Jeffrey Yasskin wrote:

See Owen's email about docs for the 2.6 release, but it's really not
that hard to keep up with trunk. I recently merged trunk LLVM into
Unladen Swallow, and the changes I needed to make are written up on the
Unladen Swallow Google Code project site.

Thanks Jeffrey, that was really very helpful! I have Pure working with
both the LLVM 2.6 release branch and the trunk now.

One thing I noticed is that writing LLVM assembler code (print()
methods) seems to be horribly slow now (some 4-5 times slower than in
LLVM 2.5). This is a real bummer for me, since Pure's batch compiler
uses those methods to produce output code which then gets fed into llvmc.

Is this a known problem? Will it be fixed by the 2.6 release?

Albert

One thing I noticed is that writing LLVM assembler code (print()
methods) seems to be horribly slow now (some 4-5 times slower than in
LLVM 2.5). This is a real bummer for me, since Pure's batch compiler
uses those methods to produce output code which then gets fed into llvmc.

Is this a known problem?

Are you printing to stderr or errs()? If so, be aware that it's no longer buffered, so it isn't well suited for bulk output. Stdout and normal file output are still buffered though.

Otherwise, no. Can you describe how you're doing the printing, and anything else that might be relevant?

Will it be fixed by the 2.6 release?

It depends on the specifics. Output file performance is important.

Dan

Dan Gohman wrote:

Are you printing to stderr or errs()? If so, be aware that it's no
longer buffered, so it isn't well suited for bulk output. Stdout and
normal file output are still buffered though.

I'm using raw_fd_ostream. It gets initialized like this:

  string error;
  // Write either to a named disk file (F_Force allows overwriting an
  // existing file) or to stdout.
  llvm::raw_ostream *codep = file_target ?
    new llvm::raw_fd_ostream(target.c_str(), error,
                             llvm::raw_fd_ostream::F_Force) :
    new llvm::raw_stdout_ostream();
  if (!error.empty()) {
    std::cerr << "Error opening " << target << ": " << error << '\n';
    exit(1);
  }
  llvm::raw_ostream &code = *codep;

(Yeah, it's a bit convoluted, since I'm allowing output to either stdout
or a disk file here.)

Lateron I then iterate over the global variables and functions of my
module, decide which ones to keep, and output code for these using
something like f->print(code).

Anything wrong with that?

Albert

Albert Graef wrote:

One thing I noticed is that writing LLVM assembler code (print()
methods) seems to be horribly slow now (some 4-5 times slower than in
LLVM 2.5). This is a real bummer for me, since Pure's batch compiler
uses those methods to produce output code which then gets fed into llvmc.

Let me follow up with some concrete figures. Unfortunately, I don't have
a minimal C++ example, but the effect is easy to reproduce with Pure
0.31 from http://pure-lang.googlecode.com/ and the attached little Pure
script. (I'm quite sure that this is not a bug in the Pure interpreter,
as exactly the same code runs more than an order of magnitude faster with
LLVM 2.5 than with LLVM 2.6/2.7svn.)
than with LLVM 2.6/2.7svn.)

The given figures are user CPU times in seconds, as given by time(1), so
they are not really that accurate, but the effect is so prominent that
this really doesn't matter. (See below for the details on how I obtained
these figures.)

             LLVM 2.5    LLVM 2.6    LLVM 2.7(svn)

execute      1.752s      2.272s      2.256s
compile      2.316s      24.458s     24.834s

codegen      0.564s      22.186s     22.578s
(compile ./. execute)

To measure the asmwriting times, I first ran the script without
generating LLVM assembler output code ("execute", pure -x hello.pure)
and then again with LLVM assembler output enabled ("compile", pure -o
hello.ll -c -x hello.pure). The difference between the two figures
("codegen") gives a rough estimate of the net asmwriting times. (That's
really all that pure -c does; at the end of script execution it takes
the IR that's already in memory and just spits it out by iterating over
the global variables and the functions of the IR module and using the
appropriate print() methods.)
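In shell terms, the two measurements per LLVM version were simply:

  $ time pure -x hello.pure                  # "execute": run the script
  $ time pure -o hello.ll -c -x hello.pure   # "compile": run + emit .ll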

The resulting LLVM assembler file hello.ll was some 5.3 MB for LLVM
2.6/2.7svn (4.4 MB for LLVM 2.5; the assembler programs are exactly the
same, though, the size differences are apparently due to formatting
changes in LLVM 2.6/2.7svn). Note that the code size is quite large
because the function definitions compiled from Pure's prelude are all
included.

The tests were performed on an AMD Phenom 9950 4x2.6GHz with 4GB RAM
running Linux x86-64 2.6.27. The following configure options were used
to compile LLVM (all versions) and Pure 0.31, respectively:

LLVM: --enable-optimized --disable-assertions --disable-expensive-checks
--enable-targets=host-only --enable-pic

Pure: --enable-release

So the effect is actually much *more* prominent than I first made it out
to be. This is just one data point, of course, but I get an easily
noticeable slowdown with every Pure script I tried. In fact, it's so much
slower that I consider it unusable.

I'm at a loss here. I'd have to debug the LLVM asmwriter code to see
where exactly the bottleneck is. I haven't done that yet, but I ruled
out an issue with the raw_ostream buffer sizes by trying different sizes
from 256 bytes up to 64K; it doesn't change the results very much.

So my question to fellow frontend developers is: Has anyone else seen
this? Does anyone know a workaround?

Any help will be greatly appreciated.

TIA,
Albert

hello.pure (181 Bytes)

Before we get too far into this, I'd like to point out that there's a ready solution
for the problem of the AsmPrinter being slow: Bitcode. If you want IR
reading and writing to be fast, you should consider bitcode files rather than
assembly (text) files anyway. Bitcode is smaller and faster. And the API is
similar, so it's usually easy to change from assembly to bitcode.
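For the record, switching the raw_fd_ostream snippet from earlier in the thread over to bitcode would look roughly like this (a sketch against the 2.6-era API; F_Binary keeps the stream out of text mode on platforms where that matters):

  #include "llvm/Bitcode/ReaderWriter.h"
  #include "llvm/Support/raw_ostream.h"

  std::string error;
  llvm::raw_fd_ostream out("hello.bc", error,
                           llvm::raw_fd_ostream::F_Binary);
  if (error.empty())
    llvm::WriteBitcodeToFile(M, out);  // M is the llvm::Module*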

That said, I've done testing of the AsmPrinter performance myself and seen
only moderate slowdowns due to the formatting changes; nothing of the
magnitude you're describing. I'm hoping to try out Pure to see if I can
reproduce what you're seeing.

As a first step, would it be possible for you to use strace -etrace=write to
determine if buffering is somehow not happening?

One other question that occurs to me: is Pure dumping the whole Module
at once, or is it manually writing out the IR in pieces?

Dan

Dan Gohman wrote:

One other question that occurs to me: is Pure dumping the whole Module
at once, or is it manually writing out the IR in pieces?

Well, you hit the nail on the head with that one. :wink: In fact, I just
had the same idea. So, instead of selecting and emitting individual
globals and functions on the fly, I rewrote the .ll writer in Pure so
that it just erases unwanted stuff from the module and then emits the
entire module at once. Well, you guessed it, it runs a lot faster now. I
also see the minor slowdowns compared to LLVM 2.5 you mentioned, but
those I can easily live with.
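For reference, the change amounts to replacing the per-definition loop with a single whole-module call (a sketch; `code` is the output stream from my earlier snippet):

  // Before: iterate and print piece by piece.
  //   for (each Function *f to keep) f->print(code);
  // After: erase the unwanted globals/functions from the module,
  // then emit everything in one call (0 = no annotation writer).
  module->print(code, 0);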

Problem solved, thanks! (And sorry for wasting bandwidth with this.)

Thank you also for the hint about bitcode reading/writing. I'm aware of
this, but I actually prefer the .ll output because it's human-readable,
which is great for debugging purposes. I might add a bitcode writer some
time, but it's not a high priority for me right now.

What I'm really looking forward to is the .o writer, though
(http://wiki.llvm.org/Direct_Object_Code_Emission). That will make
things *much* easier for Pure users, as they won't have to install the
entire LLVM suite any more if all they want to do is batch-compile Pure
programs.

Albert

Isn't there a tool to turn bitcode into human-readable text? I have been
reading about the internals of LLVM lately (I'm planning on making a new
backend), and the whole infrastructure reminds me a lot of the Amsterdam
Compiler Kit (ACK). They had a program to turn binary EM code into ASCII,
making it easy to pipe the output through more or less during debugging.

--Ivo

Yes: llvm-dis. It is documented on the LLVM web site. Of course, it does not run in zero time, which may be a consideration.
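For anyone finding this in the archives, the round trip is one command each way:

  $ llvm-as hello.ll -o hello.bc    # text -> bitcode
  $ llvm-dis hello.bc -o hello.ll   # bitcode -> text
  $ llvm-dis < hello.bc | less      # or just page through it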