Tablegen question

I want to add a set of "properties" to each instruction in my
instruction set, and want to be able to query that in my
machine-specific optimizations. My Insts.td file looks as follows:

class InstProperty;

def propX : InstProperty;
def propY : InstProperty;
def propZ : InstProperty;

class myInst<..., list<InstProperty> props> : Instruction {
  ...
  ...
  list<InstProperty> Properties = props;
}

def i1 : myInst<..., [propX]>;
def i2 : myInst<..., [propX, propZ]>;
def i3 : myInst<..., []>;

I want to add a Tablegen backend that would look for instructions
derived from myInst, read the Properties field, and output an
opcode->bitfield map as follows:

PropMap[myInst::i1] = propX;
PropMap[myInst::i2] = propX | propZ;
PropMap[myInst::i3] = 0;

where propX, propY, and propZ will be defined as appropriate enum values.
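
For concreteness, here is a sketch of the kind of C++ header such a
backend could emit; the flag and table names below are invented for
illustration, not existing LLVM output:

namespace myInst {
  // One bit per InstProperty def.
  enum InstPropertyFlags {
    propX = 1 << 0,
    propY = 1 << 1,
    propZ = 1 << 2
  };

  // Indexed by opcode.
  static const unsigned PropMap[] = {
    propX,          // i1
    propX | propZ,  // i2
    0               // i3
  };
}

A machine-specific pass could then simply test
PropMap[MI.getOpcode()] & myInst::propX.
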
Now, I looked at the current Tablegen backends, and I didn't find any
that are specific to a particular architecture backend, say X86 or
Sparc. That got me wondering whether I am breaking some convention and
not doing things the LLVM way. Is what I am trying to do an intended
use case for Tablegen? Why aren't there Tablegen backends specific to
particular architecture backends? I would be grateful for any advice
from the community.

Manjunath

Manjunath,
I asked this same question recently, but instead of telling you to
search the archive, I'm going to take it as a chance to recall how to
do it myself; I'd have to do so anyway, and even telling tablegen to
use an enum for instructions is not as trivial as you might think :)

I wrote it up in the wiki at
http://wiki.llvm.org/HowTo:_Add_arbitrary_properties_to_instructions

Regards,
Christian

Christian,

Thanks for your reply and the wiki entry. I did search the archives,
but evidently I didn't search for the right thing. My bad. Anyway, I
am still wondering about the other part of my question: why aren't
there Tablegen backends specific to some architecture backends? Let me
describe a different scenario. Suppose my architecture has vector and
scalar units, and suppose I want to write a machine function pass that
takes some work from the vector units and gives it to the scalar
units, for load-balancing purposes. In that pass, I need to convert
some vector instructions into a sequence of scalar instructions. For
this, I need to know which scalar instruction to generate for a given
vector instruction; for example, myInst::AddV4 should become four
myInst::Add instructions. I need an opcode->opcode map that provides
this information for all vector instructions. One way I can think of
is to put this information alongside the vector instruction
description in the .td file:

class vecInst<..., myInst s> : myInst<...> {
  myInst scalarVersion = s;
}

def AddV4 : vecInst<..., Add>;
def SubV2 : vecInst<..., Sub>;
...
def ORV4 : vecInst<..., OR>;

Now, I can write a tablegen backend that outputs an opcode->opcode
map, which can be used by the pass. Is this an intended use case for
Tablegen? Or are tablegen backends supposed to be generic across all
architecture backends?
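
For reference, I imagine the emitter itself would be small. Here is a
rough sketch using tablegen's RecordKeeper API (the emitter name and
the generated table name are invented, and header paths and exact
signatures vary across LLVM versions):

#include "llvm/TableGen/Record.h"
#include "llvm/Support/raw_ostream.h"

using namespace llvm;

// Walk every def derived from vecInst and emit one
// vector-opcode -> scalar-opcode table entry per instruction.
void EmitScalarVersionMap(RecordKeeper &Records, raw_ostream &OS) {
  OS << "static const unsigned ScalarVersion[][2] = {\n";
  for (Record *R : Records.getAllDerivedDefinitions("vecInst")) {
    Record *S = R->getValueAsDef("scalarVersion");
    OS << "  { myInst::" << R->getName()
       << ", myInst::" << S->getName() << " },\n";
  }
  OS << "};\n";
}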

Manjunath

In general, it's a design goal to structure CodeGen features as
target-independent code parameterized with target-specific data.
The degree to which this goal is met in LLVM CodeGen features today
varies, but features that have been implemented using TableGen have
been relatively successful.

Dan

Dan,

> In general, it's a design goal to structure CodeGen features as
> target-independent code parameterized with target-specific data.
> The degree to which this goal is met in LLVM CodeGen features today
> varies, but features that have been implemented using TableGen have
> been relatively successful.

Can you give an example of a relatively successful instance where
Tablegen was used to implement something specific for a target?

Manjunath

All of the tablegen backends work this way. As you mentioned,
there are no target-specific tablegen backends at present.

The underlying observation here is that features are never
fundamentally "specific for a target". For example, a mapping
between vector opcodes and associated scalar opcodes could
reasonably be made on many architectures. Even
load-balancing between functional units on a processor is a
target-independent concept, with details like the number and
nature of the functional units being target-dependent.

Dan

> All of the tablegen backends work this way. As you mentioned,
> there are no target-specific tablegen backends at present.
>
> The underlying observation here is that features are never
> fundamentally "specific for a target". For example, a mapping
> between vector opcodes and associated scalar opcodes could
> reasonably be made on many architectures. Even
> load-balancing between functional units on a processor is a
> target-independent concept, with details like the number and
> nature of the functional units being target-dependent.

Sorry to be such a pest, but I am still trying to understand the usage
model for tablegen. Are you saying it is not a good idea to write a
tablegen backend for something very specific to a target? The examples
I gave happen to be applicable to many targets, but the usage depends
on *an* implementation of codegen for a target, no? I mean, I could
choose to put the related scalar instruction in a field with a
specific name in the myInst class in the .td file, and would want to
populate a data structure with a specific name in my C++ code. The
tablegen backend would have to "know" both the name of the field in
the .td file and the name of the data structure. How can I make this
generic?

Thanks,
Manjunath

> Sorry to be such a pest, but I am still trying to understand the usage
> model for tablegen. Are you saying it is not a good idea to write a
> tablegen backend for something very specific to a target?

The underlying observation here is that features are never
fundamentally "specific for a target".

> The examples I gave happen to be applicable to many targets, but the
> usage depends on *an* implementation of codegen for a target, no? I
> mean, I could choose to put the related scalar instruction in a field
> with a specific name in the myInst class in the .td file, and would
> want to populate a data structure with a specific name in my C++
> code. The tablegen backend would have to "know" both the name of the
> field in the .td file and the name of the data structure. How can I
> make this generic?

It's hard to say without knowing more details, but in general the
way to do this is to make the data-types used in your C++ code
target-independent. Obviously the actual data would be
target-dependent. Then the code that uses the data structures and
the tablegen backend could both be target-independent.

In general, when features are designed in this way, it is easier
to reuse the code for new targets.
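
For example, the load-balancing pass could consume the tablegen'd
table through a target-independent type. A minimal sketch, with all
names invented for illustration:

// A target-independent view of a target-generated
// vector-opcode -> scalar-opcode table.
struct VectorToScalarEntry {
  unsigned VectorOpcode;
  unsigned ScalarOpcode;
};

class VectorToScalarMap {
  const VectorToScalarEntry *Table; // tablegen'd, per target
  unsigned NumEntries;
public:
  VectorToScalarMap(const VectorToScalarEntry *T, unsigned N)
      : Table(T), NumEntries(N) {}

  // Returns the scalar opcode for a vector opcode, or -1 if none.
  int getScalarOpcode(unsigned VecOpc) const {
    for (unsigned i = 0; i != NumEntries; ++i)
      if (Table[i].VectorOpcode == VecOpc)
        return (int)Table[i].ScalarOpcode;
    return -1;
  }
};

The pass and the tablegen backend then both stay target-independent;
only the emitted table contents differ from target to target.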

Dan

Hi all,

Greetings. I'm a Ph.D. student at UIUC. I'm currently working on
SAFECode, a research compiler based on LLVM which inserts the runtime
checks necessary to guarantee memory safety of programs. SAFECode
needs to insert checks into the programs (say, "please check this
load instruction for me").

Currently SAFECode inserts these checks as normal call instructions.
It would be great if LLVM could treat them as first-class intrinsics
(like "llvm.cttz"), which carry additional semantics and could be
lowered to ordinary function calls in subsequent passes.
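
For concreteness, a check is currently inserted roughly like this with
LLVM's C++ API (a sketch only: the runtime function name
"__sc_check_load" is invented, and the exact API details vary across
LLVM versions):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Insert a call to a (made-up) runtime check before a load,
// passing the pointer about to be dereferenced.
static void insertLoadCheck(LoadInst *LI) {
  Module *M = LI->getModule();
  LLVMContext &Ctx = M->getContext();
  Type *PtrTy = Type::getInt8PtrTy(Ctx);
  FunctionType *FT = FunctionType::get(Type::getVoidTy(Ctx),
                                       {PtrTy}, /*isVarArg=*/false);
  FunctionCallee Check = M->getOrInsertFunction("__sc_check_load", FT);
  IRBuilder<> B(LI);
  B.CreateCall(Check,
               {B.CreateBitCast(LI->getPointerOperand(), PtrTy)});
}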

This would be very useful because: 1) it simplifies the analysis
logic; 2) LLVM can apply its out-of-the-box optimizations to these
programs much more easily (for example, SAFECode has special hacks to
teach the LICM pass to understand these runtime checks); and 3) it
completely avoids naming conflicts between the tool and the program.

Based on my observation, a number of research tools might have the
same requirement. For instance, Automatic Pool Allocation (PLDI'05),
KLEE (OSDI'08), and SoftBound (PLDI'09) all insert special functions
into programs to perform domain-specific tasks. Having pluggable,
first-class intrinsics would simplify these tasks a lot.

I'd be glad to dig in and implement it if you guys are interested. It
seems to me that simply making llvm::CallInst inheritable would be
enough.

Comments and suggestions are highly appreciated.

Thanks.

Haohui

> Greetings. I'm a Ph.D. student at UIUC. I'm currently working on
> SAFECode, a research compiler based on LLVM which inserts the runtime
> checks necessary to guarantee memory safety of programs. SAFECode
> needs to insert checks into the programs (say, "please check this
> load instruction for me").

Hi.

> Currently SAFECode inserts these checks as normal call instructions.
> It would be great if LLVM could treat them as first-class intrinsics
> (like "llvm.cttz"), which carry additional semantics and could be
> lowered to ordinary function calls in subsequent passes.

I was just about to recommend using normal function calls :).

> This would be very useful because: 1) it simplifies the analysis
> logic; 2) LLVM can apply its out-of-the-box optimizations to these
> programs much more easily (for example, SAFECode has special hacks to
> teach the LICM pass to understand these runtime checks); and 3) it
> completely avoids naming conflicts between the tool and the program.

I don't follow. Why does it simplify the analysis logic? Also, aren't function attributes like "readonly" enough to teach the optimizer about your functions?
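
For example, a sketch of marking a hypothetical check function (the
function name here is made up):

#include "llvm/IR/Attributes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Module.h"

using namespace llvm;

// Mark the runtime check readonly so optimizers like LICM can
// reason about it. (Only valid if the check truly doesn't write
// memory the program can observe.)
void markCheckReadOnly(Module &M) {
  if (Function *F = M.getFunction("__sc_check_load"))
    F->addFnAttr(Attribute::ReadOnly);
}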

-Chris

> I don't follow. Why does it simplify the analysis logic? Also,
> aren't function attributes like "readonly" enough to teach the
> optimizer about your functions?

In fact, they are not really "readonly" functions: these checking
functions manipulate some metadata. If they are marked readonly, ADCE
will kill them. :)