Hi all,
I wanted to summarise some discussion on llvm-commits [0,1] as an RFC, as
I felt it demanded wider circulation.
Our support for references to absolute symbols is not very good. The
symbol will be resolved accurately in non-PIC code, but suboptimally: the
symbol reference cannot currently appear as the immediate operand of an
instruction, and the code generator cannot make any assumptions about the
value of the symbol (so for example, it could not use a R_X86_64_8
relocation if the value is known to be in the range 0..255).
In PIC mode, if the reference is not known to be DSO-local, the value is
loaded from the GOT (or a synthetic GOT entry), which again means
suboptimal code. If the reference is known to be DSO-local, the symbol will
be referenced with a PC relative relocation and therefore cannot be
resolved properly to an absolute value (cf.
https://reviews.llvm.org/D19844). The latter case in particular would
seem to indicate that a representational change is required for correctness
to distinguish references to absolute symbols from references to regular
symbols.
The specific change I have in mind is to allow !range metadata on
GlobalObjects. This would be similar to existing !range metadata, but it
would apply to the
"address" of the attached GlobalObject, rather than any value loaded from
it. Its presence on a GlobalObject would also imply that the address of the
GlobalObject is "fixed" at link time. Alongside !range we could potentially
use other sources of information, such as the relocation model, code model
and visibility, to identify "fixed" globals, although that can be done
separately.
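To make the intended payoff concrete, here is a minimal sketch of how a known address range could drive the choice of relocation width. It is only an illustration: AddressRange, AbsReloc and smallestAbsReloc are hypothetical names, not existing LLVM APIs, and the sketch assumes a non-wrapping half-open range [Lo, Hi) with Hi > Lo.

```cpp
#include <cstdint>

// Hypothetical model of a half-open address range [Lo, Hi), following the
// [lo, hi) convention of existing !range metadata. Assumes Hi > Lo (the
// sketch does not handle wrapping ranges).
struct AddressRange {
  uint64_t Lo;
  uint64_t Hi; // exclusive upper bound
};

// Unsigned absolute x86-64 relocation widths, named after the ELF
// relocations they stand for in this sketch.
enum class AbsReloc {
  Abs8,  // R_X86_64_8
  Abs16, // R_X86_64_16
  Abs32, // R_X86_64_32
  Abs64  // R_X86_64_64
};

// Pick the narrowest unsigned relocation that can hold every address in R.
inline AbsReloc smallestAbsReloc(AddressRange R) {
  const uint64_t Max = R.Hi - 1; // largest address the symbol may take
  if (Max <= 0xFFu)
    return AbsReloc::Abs8;
  if (Max <= 0xFFFFu)
    return AbsReloc::Abs16;
  if (Max <= 0xFFFFFFFFu)
    return AbsReloc::Abs32;
  return AbsReloc::Abs64;
}
```

With a range of [0, 256), i.e. values 0..255, this selects the 8-bit form, which is the R_X86_64_8 case mentioned above.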
Ok, I think I understand the use-case.
I have been experimenting with a number of approaches to representation in
SDAG, and I have found one that seems to work best, and would be the least
intrusive (unfortunately most approaches to this problem are somewhat
intrusive).
Specifically, I want to:
1) move most of the body of ConstantSDNode to a new class,
ConstantIntSDNode, which would derive from ConstantSDNode. ConstantSDNode
would act as the base class for immediates-post-static-linking. Change
most references to ConstantSDNode in C++ code to refer to
ConstantIntSDNode. However, "imm" in tblgen code would continue to match
ConstantSDNode.
2) introduce a new derived class of ConstantSDNode for references to
globals with !range metadata, and teach SDAG to use this new derived class
for fixed-address references.
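The two steps above can be sketched with toy stand-in classes. This is not the real SelectionDAG code: the classes below only mimic the shape of the proposed hierarchy, FixedGlobalSDNode is a hypothetical name for the new derived class, and asConstantInt is a hand-rolled stand-in for an LLVM-style dyn_cast.

```cpp
#include <cstdint>
#include <string>
#include <utility>

// Toy stand-in for the proposed base class: any immediate whose bytes are
// known once static linking is done.
class ConstantSDNode {
public:
  enum Kind { IntKind, FixedGlobalKind };
  virtual ~ConstantSDNode() = default;
  Kind getKind() const { return K; }

protected:
  explicit ConstantSDNode(Kind K) : K(K) {}

private:
  Kind K;
};

// Step 1: the concrete-integer behaviour moves into a derived class.
class ConstantIntSDNode : public ConstantSDNode {
  uint64_t Value;

public:
  explicit ConstantIntSDNode(uint64_t V)
      : ConstantSDNode(IntKind), Value(V) {}
  uint64_t getZExtValue() const { return Value; }
  static bool classof(const ConstantSDNode *N) {
    return N->getKind() == IntKind;
  }
};

// Step 2 (hypothetical name): a reference to a global whose address is
// fixed at link time and known to lie in [Lo, Hi).
class FixedGlobalSDNode : public ConstantSDNode {
  std::string Symbol;
  uint64_t Lo, Hi;

public:
  FixedGlobalSDNode(std::string Sym, uint64_t Lo, uint64_t Hi)
      : ConstantSDNode(FixedGlobalKind), Symbol(std::move(Sym)), Lo(Lo),
        Hi(Hi) {}
  const std::string &getSymbol() const { return Symbol; }
  uint64_t getLo() const { return Lo; }
  uint64_t getHi() const { return Hi; }
  static bool classof(const ConstantSDNode *N) {
    return N->getKind() == FixedGlobalKind;
  }
};

// C++ code that needs an actual integer value asks for ConstantIntSDNode
// and gets nullptr for a symbol reference.
inline const ConstantIntSDNode *asConstantInt(const ConstantSDNode *N) {
  return ConstantIntSDNode::classof(N)
             ? static_cast<const ConstantIntSDNode *>(N)
             : nullptr;
}
```

The point of the split is that code written against ConstantIntSDNode can never accidentally read a "value" out of a symbol reference, while anything written against the base class accepts both.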
ConstantSDNode is poorly named, and renaming it to ConstantIntSDNode is
probably the right thing to do independently of the other changes.
That said, I don’t understand why you’d keep ConstantSDNode around and
introduce a new derived class of it. This seems like something that a new
“imm” immediate matcher would handle: it would match constants in a certain
range, or a GlobalAddressSDNode known-to-be-small.
To begin with: I'm not sure that GlobalAddressSDNode is the right node to
use for these types of immediates. It seems that we have two broad classes
of globals here: those with a fixed-at-link-time address (e.g. regular
non-PIC symbols, absolute symbols) and those where the address needs to be
computed (e.g. PC-relative addresses, TLS variables). To me it seems like
the first class is much more similar to immediates than to the second
class. That suggested to me that there ought to be two separate
representations for global variables, where the former are "morally"
immediates, and the latter are not (i.e. the existing GlobalAddressSDNode).
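As a rough illustration of that split, the classification could be written as below. The enum and function names are made up for this sketch, and the four cases are just the examples listed above, not an exhaustive taxonomy.

```cpp
// Hypothetical classification of how a global's address is obtained,
// using only the examples from the discussion above.
enum class AddressKind {
  NonPICSymbol,   // regular symbol in non-PIC code
  AbsoluteSymbol, // absolute symbol
  PCRelative,     // address computed relative to the program counter
  ThreadLocal     // TLS variable
};

// True for globals whose address is fixed at link time, i.e. the ones that
// are "morally" immediates. The others still need an address computation.
constexpr bool isFixedAtLinkTime(AddressKind K) {
  return K == AddressKind::NonPICSymbol || K == AddressKind::AbsoluteSymbol;
}
```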
I went over a couple of approaches for representing "moral" immediates in
my llvm-commits post. The first one seems to be more like what you're
suggesting:
- Introduce a new opcode for absolute symbol constants. This intuitively
seemed like the least risky approach, as individual instructions could "opt
in" to the new absolute symbol references. However, this seems hard to fit
into the existing SDAG pattern matching engine, as the engine expects each
"variable" to have a specific opcode. I tried adding special support for
"either of the two constant opcodes" to the matcher, but I could not see a
good way to do it without making fundamental changes to how patterns are
matched.
- Use the ISD::Constant opcode for absolute symbol constants, but
introduce a separate class for them. This also seemed problematic, as there
is a strong assumption (both in existing SDAG code and in generated code)
of a many-to-one mapping from opcodes to classes.
We can solve part of the problem with the second approach with a base class
for ISD::Constant. As I worked on that approach, I found that it did turn
out to be a good fit overall: in many cases we're already adhering to a
principle that an unrestricted immediate maps onto potentially relocatable
bytes in the output file. The X86 and ARM backends illustrate this quite
well: the X86 instruction set generally uses power-of-2 wide immediate
forms that neatly map onto instruction bytes, and ARM generally uses
compressed immediate forms (e.g. "mod_imm") which would naturally match
only real constant integers. Using that principle, we can restrict (e.g.)
ImmLeaf to constant integers (see https://reviews.llvm.org/D25355). In
cases where this mapping isn't quite right, we can use more restrictive
matchers.
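To illustrate why a compressed immediate form can only match a real constant integer, here is a small sketch of an ARM modified-immediate check (an 8-bit value rotated right by an even amount) next to an unrestricted match. Operand, matchImm and matchModImm are hypothetical names for this example and do not correspond to actual TableGen-generated code.

```cpp
#include <cstdint>

// Toy operand: either a concrete integer or a symbol reference whose value
// is not known until link time (hypothetical model).
struct Operand {
  bool IsConstantInt;
  uint32_t Value; // meaningful only when IsConstantInt is true
};

// An ARM modified immediate is an 8-bit value rotated right by an even
// amount. Rotating V left by that amount must leave only the low 8 bits.
inline bool isModImm(uint32_t V) {
  for (unsigned Rot = 0; Rot < 32; Rot += 2) {
    const uint32_t Rotated = Rot == 0 ? V : (V << Rot) | (V >> (32 - Rot));
    if ((Rotated & ~0xFFu) == 0)
      return true;
  }
  return false;
}

// Unrestricted match: the operand maps onto relocatable bytes, so both
// concrete integers and symbol references are acceptable.
inline bool matchImm(const Operand &) { return true; }

// Restricted match: the encoding depends on the actual bits, so only a
// concrete integer can be checked and accepted.
inline bool matchModImm(const Operand &Op) {
  return Op.IsConstantInt && isModImm(Op.Value);
}
```

A symbol reference passes matchImm but never matchModImm, because there is no value to test against the encoding at compile time.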
I'm still a little uneasy about the second approach, and would be
interested in my first approach, but I'm not sure if it would be practical.
Thanks,