Custom assembler subset

Hello all,

I would like to restrict the compiler that I build on my local box so that it emits only a particular set of opcodes. Is there a straightforward way to accomplish this? I’m pretty sure that there is a list of opcode-to-semantics mappings.

In addition, is there a way to look at an associative mapping of LLVM IR to opcode, and/or vice versa?

Hello, this is just a bump, in case no one has seen this or had the time to answer. I’m sure that others have consulted LLVM’s instruction semantics in order to add some of their own. From what I saw, the instruction semantics appeared to live in an extremely convoluted and hard-to-understand file, but I suspect that I simply looked in the wrong place. Is there a file that holds an associative mapping from opcode to semantics, or vice versa, that can be edited to remove certain opcodes from what the target compiler will generate?

-llvmdev@cs.uiuc.edu, that list isn't in use anymore.

> Hello all,
>
> I would like to restrain the compiler that I build on my local box from
> picking all but a particular set of opcodes. Is there a way to accomplish
> this in a straightforward way?

Can you elaborate a bit on what you're really trying to achieve?

One starting point could be to introduce new subtarget features, and
use them as predicates on the instructions you want (or, probably,
those you don't want). See, e.g., http://reviews.llvm.org/D18802.
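
To make the predicate idea concrete, here is a minimal sketch in plain C++
of how a feature predicate gates which instruction can be picked. It is a
toy model, not LLVM code: the Pattern struct, selectInstruction, and the
"allow-imul" feature are hypothetical names invented for illustration, and
the instruction names are only examples.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Toy model only: Pattern and selectInstruction are hypothetical names,
// not LLVM classes or functions.
struct Pattern {
    std::string operation;        // abstract operation, e.g. "mul"
    std::string instruction;      // machine instruction implementing it
    std::string requiredFeature;  // empty string means "no predicate"
};

// Returns the first instruction for `operation` whose predicate is satisfied
// by `enabledFeatures`, or an empty string if none can be selected.
std::string selectInstruction(const std::vector<Pattern>& patterns,
                              const std::set<std::string>& enabledFeatures,
                              const std::string& operation) {
    assert(!operation.empty() && "operation name must not be empty");
    for (const Pattern& p : patterns) {
        if (p.operation != operation)
            continue;
        if (p.requiredFeature.empty() ||
            enabledFeatures.count(p.requiredFeature) > 0)
            return p.instruction;
    }
    return std::string();
}
```

In this model, leaving the hypothetical feature disabled means the gated
instruction can never be chosen, so the operation has to be covered by
another pattern or rewritten before selection.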

Another approach could be to use the (also complex) SelectionDAG
legalization machinery to convert the operations you don't want into
"legal" ones (that'd be in <target>ISelLowering.cpp, and
Legalize*.cpp).
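
As a rough illustration of the legalization idea, the sketch below models
an "expand" action in plain C++: an operation marked Expand is rewritten
into other operations that are allowed. In LLVM itself, targets declare
such actions with setOperationAction calls in <target>ISelLowering.cpp;
everything in this block (ToyLegalizer, its fields, and the sample
expansion) is a hypothetical stand-in, not the real machinery.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model only: these names are hypothetical, not LLVM types.
enum class Action { Legal, Expand };

struct ToyLegalizer {
    // Operations missing from `actions` are treated as Legal.
    std::map<std::string, Action> actions;
    // For each Expand operation, the sequence of operations replacing it.
    std::map<std::string, std::vector<std::string>> expansions;

    // Rewrites `ops` so that no operation marked Expand remains.
    // Assumes expansions are acyclic; a cycle would recurse forever.
    std::vector<std::string> legalize(const std::vector<std::string>& ops) const {
        std::vector<std::string> out;
        for (const std::string& op : ops) {
            auto action = actions.find(op);
            if (action == actions.end() || action->second == Action::Legal) {
                out.push_back(op);
                continue;
            }
            auto expansion = expansions.find(op);
            assert(expansion != expansions.end() &&
                   "an Expand action needs a registered expansion");
            // Expansions may themselves contain operations that need expanding.
            std::vector<std::string> lowered = legalize(expansion->second);
            out.insert(out.end(), lowered.begin(), lowered.end());
        }
        return out;
    }
};
```

The point of the model is that the unwanted operation disappears before
instruction selection, so no pattern for it is ever needed.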

> I'm pretty sure that there is a list of
> opcodes to semantics mappings.

There isn't, because nothing LLVM does needs it, and, more
importantly, because you can't express every ISA in LLVM IR (and vice
versa).

> In addition, is there a way to look at an associative mapping of LLVM IR to
> opcode, and/or vice versa?

There is a mapping of sorts, but, as you saw, it's convoluted:

- SelectionDAG (optimized instruction selection) has its own set of
opcodes. Some are generic and inspired by IR (ISDOpcodes.h), but some
are target-specific lower level constructs (<target>ISelLowering.h).
Instructions are sometimes associated with these opcodes
(ISDOpcodes.h; look for '[(' in the various <target>Instr*.td files)

- alternatively, FastISel (fast instruction selection) does a more
direct mapping from LLVM IR to machine instruction (look for BuildMI
calls in <target>FastISel.cpp files)

Shameless plug: I do have an out-of-tree project that tries to infer
such a mapping from the various .td files, if you do end up needing it.

-Ahmed

> Can you elaborate a bit on what you're really trying to achieve?

Well, I need to construct a corpus. I would like a way to restrict the
output binaries and trust that the generated binaries use an assembler
subset that I can work with, so that I can measure the results of my work
more precisely.
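
Whichever way the compiler ends up being restricted, the "trust" part can
be checked after the fact by scanning the emitted assembly against an
allowlist. The following is a minimal sketch under stated assumptions: the
listing is plain text with one instruction per line and the mnemonic as
the first whitespace-delimited token, labels end in ':', and directives
start with '.'. The function name disallowedMnemonics is hypothetical.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Returns every mnemonic in `listing` that is not in `allowed`.
// Assumed input format (an assumption, not any tool's documented output):
// one instruction per line, mnemonic first, labels ending in ':',
// directives starting with '.', comment lines starting with '#'.
std::set<std::string> disallowedMnemonics(const std::string& listing,
                                          const std::set<std::string>& allowed) {
    assert(!allowed.empty() && "an empty allowlist is almost certainly a mistake");
    std::set<std::string> offenders;
    std::istringstream lines(listing);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string mnemonic;
        if (!(fields >> mnemonic))
            continue;  // blank line
        if (mnemonic.back() == ':')
            continue;  // label
        if (mnemonic[0] == '.' || mnemonic[0] == '#')
            continue;  // directive or comment
        if (allowed.count(mnemonic) == 0)
            offenders.insert(mnemonic);
    }
    return offenders;
}
```

An empty result for every file in the corpus would indicate that only the
intended subset was emitted, at least under the format assumptions above.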

> There isn't, because nothing LLVM does needs it, and, more
> importantly, because you can't express every ISA in LLVM IR (and vice
> versa).

Ok, thank you for your kind reply. I wondered if perhaps I just hadn't
found what I was looking for, because I know for certain that I looked some
time ago.

> Shameless plug: I do have an out-of-tree project that tries to infer
> such a mapping from the various .td files, if you do end up needing it.

Ok, I'm pretty sure those may be the files that I was reading that were so
convoluted. Yes, I would be happy if anybody could point me in the
direction of any tool that will let me look up the semantics of an
opcode. Any IR will do; churning through such a convoluted representation
is not a very effective use of time. I wonder whether the CPU manual
wouldn't be a better way to go about it.

> Well, I need to construct a corpus. I would like a way to restrict the
> output binaries and trust that the generated binaries use an assembler
> subset that I can work with, so that I can measure the results of my work
> more precisely.

Yeah, depending on how restrictive you want to be, that can be a lot
of work; I don't think you can avoid delving into the various pieces
of ISel.

> Ok, I'm pretty sure those may be the files that I was reading that were so
> convoluted. Yes, I would be happy if anybody could point me in the
> direction of any tool that will let me look up the semantics of an
> opcode. Any IR will do; churning through such a convoluted representation
> is not a very effective use of time. I wonder whether the CPU manual
> wouldn't be a better way to go about it.

Here you go: https://github.com/repzret/dagger
Which generates a table (still using those ISD opcodes, and then some)
in, e.g., build/lib/Target/X86/X86GenSema.inc

Let me know if you need help,
-Ahmed

> Yeah, depending on how restrictive you want to be, that can be a lot
> of work; I don't think you can avoid delving into the various pieces
> of ISel.

Well, the instruction subset that is being targeted is pretty substantial.

> Here you go: https://github.com/repzret/dagger
> Which generates a table (still using those ISD opcodes, and then some)
> in, e.g., build/lib/Target/X86/X86GenSema.inc

Thanks! I'll check it out.