patch for llc/ARM: added mechanism to move switch tables from .text -> .data; also cleanup and documentation

Hi:

This is my first patch submission. Hopefully, this is the proper the protocol.
Attached is a patch for the llc ARM backend:

Added mechanism to generate switch table in a data section
rather than having it interleaved with the code.
This is controlled by command line flags and off by default.
Also, tried to document and improve the code where I modified it.

Robert

llc.patch.txt (13.3 KB)

This is my first patch submission. Hopefully, this is the proper the protocol.

This is fine, although it's usually better to submit patches to llvm-commits.

Added mechanism to generate switch table in a data section
rather than having it interleaved with the code.
This is controlled by command line flags and off by default.

When would you want the flag on? When would you want the flag off?

-Eli

This is my first patch submission. Hopefully, this is the proper the protocol.

This is fine, although it's usually better to submit patches to llvm-commits.

Added mechanism to generate switch table in a data section
rather than having it interleaved with the code.
This is controlled by command line flags and off by default.

When would you want the flag on? When would you want the flag off?

-Eli

The disadvantage of "outlining" the switch tables is the extra
constant pool load
incurred for loading the start address of the table into the register

The advantages are:
* cleaner code both on the llvm side and the generated side
which may be beneficial for the ARM JIT (though I have not looked at this).
* no PIC problems in the text section as we just move the relocatable
entries elsewhere.
* better instruction density in the text section as jumptables can
grow quite large subsequently and may cause certain offsets to overflow
* also AFAIK jumptables are the only "bigger than wordsize" data item in text.
The constant pools usually just contain 32bit quantities. In a past
life this made
it hard for me to binary rewrite arm executables.

I will have a look at this in a couple of days...

Evan

+cl::optstd::string FlagJumpTableSection(“jumptable-section”,

  • cl::init(“.data.jtab”));

+cl::opt<std::string> FlagJumpTableSection("jumptable-section",
+ cl::init(".data.jtab"));
+

I thought it would be nice to group all the jumptables together.
But as long as it stays configurable, I am fine to change the default
to ".data".

Is this necessary? Why not just put it in the normal data section?
Also "outline" jumptable seems like a strange term. Can you think of
something else?

Yes, that is a tough one. How about "NonInline" instead.

Robert

+cl::opt<std::string> FlagJumpTableSection("jumptable-section",
+ cl::init(".data.jtab"));
+

I thought it would be nice to group all the jumptables together.
But as long as it stays configurable, I am fine to change the default
to ".data".

There is already a TargetAsmInfo::JumpTableDataSection. Why not just use that?

Is this necessary? Why not just put it in the normal data section?
Also "outline" jumptable seems like a strange term. Can you think of
something else?

Yes, that is a tough one. How about "NonInline" instead.

Or "OutOfLine"?

Please add a test case as well. Thanks,

Evan

+cl::opt<std::string> FlagJumpTableSection("jumptable-section",
+ cl::init(".data.jtab"));
+

I thought it would be nice to group all the jumptables together.
But as long as it stays configurable, I am fine to change the default
to ".data".

There is already a TargetAsmInfo::JumpTableDataSection. Why not just
use that?

Nice find. I will use that and possible change the current setting,
".const", if it does not work,

Is this necessary? Why not just put it in the normal data section?
Also "outline" jumptable seems like a strange term. Can you think of
something else?

Yes, that is a tough one. How about "NonInline" instead.

Or "OutOfLine"?

That works for me.

Please add a test case as well. Thanks,

I am not sure how to go about testing.
I have a script that compiles a bunch of test
programs (gnu c torture test, etc.) and then runs the executables
on qemu. I run this script with and without my flags and
make sure that I do not introduce any new problems
-- there are currently plenty of vararg issues.

I was thinking of sending this script to the list
or maybe checking it into the llvm tree.

The key is that whatever tests there are they should just
be run with and without the new flag.
How do you run backend tests?

Robert

+cl::opt<std::string> FlagJumpTableSection("jumptable-section",
+ cl::init(".data.jtab"));
+

I thought it would be nice to group all the jumptables together.
But as long as it stays configurable, I am fine to change the default
to ".data".

There is already a TargetAsmInfo::JumpTableDataSection. Why not just
use that?

Nice find. I will use that and possible change the current setting,
".const", if it does not work,

Is this necessary? Why not just put it in the normal data section?
Also "outline" jumptable seems like a strange term. Can you think of
something else?

Yes, that is a tough one. How about "NonInline" instead.

Or "OutOfLine"?

That works for me.

Please add a test case as well. Thanks,

I am not sure how to go about testing.

For this please just add a test case to the llvm dejagnu tests.

thanks,

Evan

Evan:

Sorry for the late follow up, I was out of town last week.
Enclosed please find the updated patch including all
your suggestions and a dejagnus test.

Robert

llc.patch.2.txt (16 KB)

Hi Robert,

Evan asked me to review this patch, and I have some questions about it. I apologize for not following the discussion earlier and for hitting you with questions after you’ve already gone through several revisions.

LLVM provides some default behavior for handling jump tables, with the tables emitted separately from the code that uses them. The ARM backend provides its own custom handling of jump tables so that it can emit them in-line with the text. It seems to me that your patch changes the ARM backend to optionally handle jump tables in a way that is at least very similar to the default behavior, but you have implemented it without using the existing code. Did you consider implementing it using the LLVM defaults?

Using TargetAsmInfo::JumpTableDataSection is one piece of that, but there is more. As a start, you could change ARMJITInfo::hasCustomJumpTables and the “Custom” argument to setOperationAction for ISD::BR_JT in the ARMTargetLowering constructor to be conditional on your new flag. I’m sure there is more required than that, but maybe that will get you started.

Otherwise, there are some problems with your custom handling for out-of-line jump tables. The most serious of these is in ARMAsmPrinter::printJTBlockOperand: for out-of-line jump tables, you emit a “.text” directive to switch back to the text section. If the current section is not “.text”, however, this will break. I believe you ought to use SwitchToTextSection instead. Earlier in that same method, I think you should also use SwitchToDataSection and EmitAlignment. But, again, rather than tweaking this code, I would prefer to find a way to use the existing LLVM code for this.

Why did you change the default value for JumpTableDataSection? Jump tables really should not go into .data. That is a security hole. They should be read-only.

When you create global symbols for the jump tables, you specify a “.T” name:

new ARMConstantPoolValue(“.T”, Num, ARMCP::CPDataSegmentJumpTable);

Why “.T”?

And, by the way, I noticed that you removed a “blank” comment line in ARMISelLowering.h. I think I have done that myself, but I recently discovered that the extra lines are intentional. Doxygen recognizes C++ comments beginning with 3 slashes but you have to have at least 2 lines of them or doxygen will not recognize them.

Hi Robert,

Evan asked me to review this patch, and I have some questions about it. I apologize for not following the discussion earlier and for hitting you with questions after you’ve already gone through several revisions.

LLVM provides some default behavior for handling jump tables, with the tables emitted separately from the code that uses them. The ARM backend provides its own custom handling of jump tables so that it can emit them in-line with the text. It seems to me that your patch changes the ARM backend to optionally handle jump tables in a way that is at least very similar to the default behavior, but you have implemented it without using the existing code. Did you consider implementing it using the LLVM defaults?

I spend over a day trying to follow your suggestion. In the end I was not successful. Here is what Iearned:

After setting

ARMJITInfo::hasCustomJumpTables → true
setOperationAction for ISD::BR_JT → Expand

I needed to add a “brind” definition to ARMInstrInfo.td
I picked “bx” but to do a proper job one would have to take older architectures
into account…
The next thing was to to set
setOperationAction for ISD::JumpTable → Custom
and implement the corresponding code.
This code is very similar to my “outline” code as it has to materialize
the table start address.

I never quite got this to work, though.

Presumably what the generic code generator
would to is to use this table start multiple the “switch index” by 4 load the target address and jump there. Not a very big saving in my mind.
And it is offset by having to touch ARMInstrInfo.td.
Also note, that the generic code generator must make some assumptions
about the format of each jumptable entry (e.g. 32bit absolute addresses) so this cannot be changed easily afterwards.

Having the default jumtables and the out of line version diverge too much
and have different code path may also not be such a good idea.

Using TargetAsmInfo::JumpTableDataSection is one piece of that, but there is more. As a start, you could change ARMJITInfo::hasCustomJumpTables and the “Custom” argument to setOperationAction for ISD::BR_JT in the ARMTargetLowering constructor to be conditional on your new flag. I’m sure there is more required than that, but maybe that will get you started.

Otherwise, there are some problems with your custom handling for out-of-line jump tables. The most serious of these is in ARMAsmPrinter::printJTBlockOperand: for out-of-line jump tables, you emit a “.text” directive to switch back to the text section. If the current section is not “.text”, however, this will break. I believe you ought to use SwitchToTextSection instead. Earlier in that same method, I think you should also use SwitchToDataSection and EmitAlignment. But, again, rather than tweaking this code, I would prefer to find a way to use the existing LLVM code for this.

I changed all of these as per your request.

Why did you change the default value for JumpTableDataSection? Jump tables really should not go into .data. That is a security hole. They should be read-only.

I changed that too so the data goes into the default .rodata now.
I am not sure where .rodata lives. if it lives in the text segment
you get better protection but the code wont be PIC if .rodata
goes into the data segment it is the other way round.

When you create global symbols for the jump tables, you specify a “.T” name:

new ARMConstantPoolValue(“.T”, Num, ARMCP::CPDataSegmentJumpTable);

Why “.T”?

T as in Table, I do not care much about the naming. If you have a better proposal I will go with that.

And, by the way, I noticed that you removed a “blank” comment line in ARMISelLowering.h. I think I have done that myself, but I recently discovered that the extra lines are intentional. Doxygen recognizes C++ comments beginning with 3 slashes but you have to have at least 2 lines of them or doxygen will not recognize them.

  • fixed and added artificial line break to keep this from happening again.

Happy holidays!
Robert

patch.2009-07-01 (15.4 KB)

I spend over a day trying to follow your suggestion. In the end I was not successful. Here is what Iearned:

After setting

ARMJITInfo::hasCustomJumpTables → true
setOperationAction for ISD::BR_JT → Expand

I needed to add a “brind” definition to ARMInstrInfo.td
I picked “bx” but to do a proper job one would have to take older architectures
into account…
The next thing was to to set
setOperationAction for ISD::JumpTable → Custom
and implement the corresponding code.
This code is very similar to my “outline” code as it has to materialize
the table start address.

I never quite got this to work, though.

Sorry for the slow response. Since it wasn’t clear what problems you ran into, and since I wasn’t very familiar with LLVM’s handling of jump tables, I wanted to have a look at it myself to see what the issues were.

I was able to make it work after fixing one problem: the branch folder removes “dead” jump tables and it doesn’t scan the constant pool for jump table references. I had to add some code to the branch folder and the MachineConstantPoolValue class to fix that. I haven’t yet done much testing of this but it looks like it’s basically working.

Since I had to write the code to try this out, I’m sending my revised patch back to you. There are still a few issues to sort out before committing this:

  • I followed your lead and added a “brind” pattern using “bx”. I think this works on older architectures, too. What issue were you thinking of with regard to supporting older architectures? I didn’t even look at Thumb or Thumb2, but we’ll need something for them.

  • I saw some earlier discussion about the command-line option name. You had “outline-jumptables” and Evan had suggested “OutOfLine”. I don’t particularly like either one (sorry, Evan). How about “no-inline-jumptables”? I prefer that because it captures the sense that this option is disabling an ARM-specific feature (the inline jumptables) and changing back to the default.

  • I didn’t look at the JIT code emitter at all. That will need to be fixed.

  • It would be great to change the testcase to use the new FileCheck format.

I’ve attached my revised patch to show you how I made this work. I’ve also attached a copy of your last patch with embedded comments (search for “bob>”). Let me know what you think and if you can take a look at addressing some of the open issues above.

Presumably what the generic code generator
would to is to use this table start multiple the “switch index” by 4 load the target address and jump there. Not a very big saving in my mind.
And it is offset by having to touch ARMInstrInfo.td.
Also note, that the generic code generator must make some assumptions
about the format of each jumptable entry (e.g. 32bit absolute addresses) so this cannot be changed easily afterwards.

Having the default jumtables and the out of line version diverge too much
and have different code path may also not be such a good idea.

I’m not so much concerned about savings in the quantity of code as I am in keeping the ARM backend from diverging too far from the other LLVM backends. Sometimes we have to do that but it is an ongoing maintenance burden, and in this case, it seems like we can easily avoid it.

It also helps leverage the existing code. For example, I didn’t look too closely, but with my patch, compiling with -relocation-model=pic seemed to do the right thing.

Why did you change the default value for JumpTableDataSection? Jump tables really should not go into .data. That is a security hole. They should be read-only.

I changed that too so the data goes into the default .rodata now.
I am not sure where .rodata lives. if it lives in the text segment
you get better protection but the code wont be PIC if .rodata
goes into the data segment it is the other way round.

If you compile with -relocation-model=pic, then the jump table contains offsets relative to the start of the table. The tables live in .rodata (or even in .text itself, but not inline in the code) which is in the same segment as .text and thus the code is still PIC.

jumptable-bob.diff (12.6 KB)

jumptable-comments (14.6 KB)

I spend over a day trying to follow your suggestion. In the end I was not successful. Here is what Iearned:

After setting

ARMJITInfo::hasCustomJumpTables → true
setOperationAction for ISD::BR_JT → Expand

I needed to add a “brind” definition to ARMInstrInfo.td
I picked “bx” but to do a proper job one would have to take older architectures
into account…
The next thing was to to set
setOperationAction for ISD::JumpTable → Custom
and implement the corresponding code.
This code is very similar to my “outline” code as it has to materialize
the table start address.

I never quite got this to work, though.

Sorry for the slow response. Since it wasn’t clear what problems you ran into, and since I wasn’t very familiar with LLVM’s handling of jump tables, I wanted to have a look at it myself to see what the issues were.

I was able to make it work after fixing one problem: the branch folder removes “dead” jump tables and it doesn’t scan the constant pool for jump table references. I had to add some code to the branch folder and the MachineConstantPoolValue class to fix that. I haven’t yet done much testing of this but it looks like it’s basically working.

Since I had to write the code to try this out, I’m sending my revised patch back to you. There are still a few issues to sort out before committing this:

  • I followed your lead and added a “brind” pattern using “bx”. I think this works on older architectures, too. What issue were you thinking of with regard to supporting older architectures? I didn’t even look at Thumb or Thumb2, but we’ll need something for them.

  • I saw some earlier discussion about the command-line option name. You had “outline-jumptables” and Evan had suggested “OutOfLine”. I don’t particularly like either one (sorry, Evan). How about “no-inline-jumptables”? I prefer that because it captures the sense that this option is disabling an ARM-specific feature (the inline jumptables) and changing back to the default.

  • I didn’t look at the JIT code emitter at all. That will need to be fixed.

  • It would be great to change the testcase to use the new FileCheck format.

I’ve attached my revised patch to show you how I made this work. I’ve also attached a copy of your last patch with embedded comments (search for “bob>”). Let me know what you think and if you can take a look at addressing some of the open issues above.

Presumably what the generic code generator
would to is to use this table start multiple the “switch index” by 4 load the target address and jump there. Not a very big saving in my mind.
And it is offset by having to touch ARMInstrInfo.td.
Also note, that the generic code generator must make some assumptions
about the format of each jumptable entry (e.g. 32bit absolute addresses) so this cannot be changed easily afterwards.

Having the default jumtables and the out of line version diverge too much
and have different code path may also not be such a good idea.

I’m not so much concerned about savings in the quantity of code as I am in keeping the ARM backend from diverging too far from the other LLVM backends. Sometimes we have to do that but it is an ongoing maintenance burden, and in this case, it seems like we can easily avoid it.

It also helps leverage the existing code. For example, I didn’t look too closely, but with my patch, compiling with -relocation-model=pic seemed to do the right thing.

Why did you change the default value for JumpTableDataSection? Jump tables really should not go into .data. That is a security hole. They should be read-only.

I changed that too so the data goes into the default .rodata now.
I am not sure where .rodata lives. if it lives in the text segment
you get better protection but the code wont be PIC if .rodata
goes into the data segment it is the other way round.

If you compile with -relocation-model=pic, then the jump table contains offsets relative to the start of the table. The tables live in .rodata (or even in .text itself, but not inline in the code) which is in the same segment as .text and thus the code is still PIC.

Bob:

Thanks for cleaning this up. I like the new patch much better than the old one.
Teaching the (abstract) ConstantValue class about jumptable indices is a little
bit ugly but I do not see any better solution without massive refactoring.
I have added TODOs here and elsewhere and plan to address them in future
patches. I also added a test and pcleaned up a few things.

The new patch is attached.
I also uploaded it to

http://codereview.appspot.com/96065/show

which is more convenient for browsing.
(you will have to register with an arbitrary pair
of email address and password --if you have a gmail
account you can use that)
Cheers,

Robert

j.patch (16.2 KB)

Maybe I'm missing something but it looks like this is basically the same as the sample code that I sent back to you. Aside from the TODO comments, it looks like all you changed was to move the new code in BranchFolding into a new UpdateLiveSetUsingConstantPoolAndMapIndices function, and I don't particularly like that change -- the new code is not that big and it is essentially the same code as in the preceding loop. If anything, maybe it would be possible to refactor those two loops to share some code? I don't see the point in just moving one of them out to a separate function with a big name.

As I wrote before, there are a few things that need to be addressed before we can take the patch. I'll repeat them here for clarity:

* It needs to work on Thumb and Thumb2. I'm not even sure if the "brind" pattern I added is the right thing for ARM mode.

* The JIT code emitter is definitely broken with this. Unless there is a fundamental reason why this is hard, it should be fixed.

* The testcase should use the new FileCheck format. (I think I phrased this as a suggestion before, but upon looking more closely at the test, it seems like the sort of test that is not well-suited to using grep.)

You didn't respond to my comment about the command-line option name. What do you think of "no-inline-jumptables"?

How do you plan to test this? The simple testcase is a good start. Beyond that, I'd like to see you (temporarily) enable this by default and run through the llvm testsuite.

I also see you added a comment: "TODO: this does not currently work for PIC mode: the code works correctly but the table still ends up in the .text section." I don't remember seeing that behavior, but maybe I missed it. Please investigate why this is happening.

Thanks!

Bob:

a new patch is attached. I also uploaded it to

http://codereview.appspot.com/96065/show

more info inline

Bob:

Thanks for cleaning this up. I like the new patch much better than
the old one.
Teaching the (abstract) ConstantValue class about jumptable indices
is a little
bit ugly but I do not see any better solution without massive
refactoring.
I have added TODOs here and elsewhere and plan to address them in
future
patches. I also added a test and pcleaned up a few things.

Maybe I’m missing something but it looks like this is basically the
same as the sample code that I sent back to you. Aside from the TODO
comments, it looks like all you changed was to move the new code in
BranchFolding into a new UpdateLiveSetUsingConstantPoolAndMapIndices
function, and I don’t particularly like that change – the new code is
not that big and it is essentially the same code as in the preceding
loop. If anything, maybe it would be possible to refactor those two

  • undid my refactoring.

loops to share some code? I don’t see the point in just moving one of
them out to a separate function with a big name.

As I wrote before, there are a few things that need to be addressed
before we can take the patch. I’ll repeat them here for clarity:

  • It needs to work on Thumb and Thumb2. I’m not even sure if the
    “brind” pattern I added is the right thing for ARM mode.

added a brind instruction for thumb, but it seems that jump tables are broken
on thumb even with “inlined tables”

  • The JIT code emitter is definitely broken with this. Unless there
    is a fundamental reason why this is hard, it should be fixed.

as per our private conversation did not address this

  • The testcase should use the new FileCheck format. (I think I
    phrased this as a suggestion before, but upon looking more closely at
    the test, it seems like the sort of test that is not well-suited to
    using grep.)

I made small change to look for “section” rather than “rodata” to make the test more portable.
As per our private conversation did not switch to “FileCheck format” because it does
not seem to support multiple passes over the same file.

You didn’t respond to my comment about the command-line option name.
What do you think of “no-inline-jumptables”?

  • command line naming is following your proposal

How do you plan to test this? The simple testcase is a good start.
Beyond that, I’d like to see you (temporarily) enable this by default
and run through the llvm testsuite.

  • did a lot testing using gnu torture test, overall I ran 3x1000 small c programs on qemu

I also see you added a comment: “TODO: this does not currently work
for PIC mode: the code works correctly but the table still ends up in
the .text section.” I don’t remember seeing that behavior, but maybe
I missed it. Please investigate why this is happening.

  • reworded this but it seems to agree with your initial findings.

llvm-jt.patch (17.5 KB)