Manipulating global address inside GlobalAddress SDNode in (RISCV) LLVM backend

Hello,

Brief background: we are trying to support 64-bit pointers in the 32-bit RISC-V backend:
http://lists.llvm.org/pipermail/llvm-dev/2019-June/132805.html

To pass the legalizer, we plan to break the 64-bit GlobalAddress into a 32-bit GlobalAddress with the other 32 bits glued to the node. We could not find a direct way to convert the 64-bit GlobalAddress node into a 32-bit node.

For a GlobalAddress node say i64 = GlobalAddress<0xHighLow> we want to convert it into i32 = GlobalAddress<0xLow>.

[The part below is specific to the RISC-V LLVM backend]
If there is no direct way to do this, we plan to fall back on a backup plan: converting the GlobalAddress node into the required LUI and ADDI pair. That would require adding two new target flags in the RISCVII namespace.

  • Reshabh

For a GlobalAddress node say i64 = GlobalAddress<0xHighLow> we want to convert it into i32 = GlobalAddress<0xLow>.

I think you'd have to convert it into a custom RISCVGlobalAddressLow
and RISCVGlobalAddressHigh pair because the type of GlobalAddress is
fixed to pointer type in TargetSelectionDAG.td (that's not 100% set in
stone, but I wouldn't violate it lightly). And that might well be
functionally equivalent to making an LUI/ADDI pair directly.

If there is no direct way to do this, we plan to fall back on a backup plan: converting the GlobalAddress node into the required LUI and ADDI pair. That would require adding two new target flags in the RISCVII namespace.

I don't think there's a real shortage of those, but I confess I'm not
sure why that's related. You'd need a representation for the LUI and
ADDI after instruction selection anyway.

Cheers.

Tim.

I don’t think there’s a real shortage of those, but I confess I’m not
sure why that’s related. You’d need a representation for the LUI and
ADDI after instruction selection anyway.

Yeah, in the end we need a representation for LUI and ADDI. We were trying to break the 64-bit address from the GlobalAddress node into two i32 registers. We will add custom load/store instructions that generate the address using the values from the two registers. We thought an LUI and ADDI pair would be a good way to put each value in an i32 register. If we could transform GlobalAddress<0xHighLow> directly to GlobalAddress<0xLow>, we could use the present RISCVII::MO_HI and MO_LO as they only extract the 32 high bits. What do you think?
Many thanks for your reply :)

We thought an LUI and ADDI pair would be a good way to put each value in an i32 register.

With you so far, I think. To be explicit, to materialize a full 64-bit
pointer you'd need 4 instructions:

    lui rLO32, addr:MO_LO32_HI
    addi rLO32, rLO32, addr:MO_LO32_LO
    lui rHI32, addr:MO_HI32_HI
    addi rHI32, rHI32, addr:MO_HI32_LO

or some variation for PIC etc.
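
For what it's worth, here is a rough sketch of the arithmetic those four operands would have to perform, written as plain C++ rather than backend code. The `AddrPart` names are hypothetical stand-ins for the new target flags (they are not upstream RISCVII flags), and `resolve`/`materialize` are invented for illustration. The one detail worth noting is that ADDI sign-extends its 12-bit immediate, so the LUI part has to be rounded up by 0x800 to compensate.

```cpp
#include <cstdint>

// Hypothetical selectors standing in for the four new target flags discussed
// in this thread. They are illustrative only, not upstream RISCVII flags.
enum class AddrPart { Lo32Upper20, Lo32Lower12, Hi32Upper20, Hi32Lower12 };

// Upper 20 bits for LUI, rounded so that adding the sign-extended low 12 bits
// (which is what ADDI does) reproduces the original 32-bit value.
constexpr uint32_t hi20(uint32_t v) { return (v + 0x800u) >> 12; }

// Low 12 bits, sign-extended the way ADDI interprets its immediate.
constexpr int32_t lo12(uint32_t v) {
  int32_t low = static_cast<int32_t>(v & 0xFFFu);
  return low >= 0x800 ? low - 0x1000 : low;
}

// Immediate that an operand tagged with `part` would resolve to.
constexpr int64_t resolve(uint64_t addr, AddrPart part) {
  uint32_t lo32 = static_cast<uint32_t>(addr);
  uint32_t hi32 = static_cast<uint32_t>(addr >> 32);
  switch (part) {
  case AddrPart::Lo32Upper20: return hi20(lo32);
  case AddrPart::Lo32Lower12: return lo12(lo32);
  case AddrPart::Hi32Upper20: return hi20(hi32);
  case AddrPart::Hi32Lower12: return lo12(hi32);
  }
  return 0;
}

// Models `lui rd, hi` followed by `addi rd, rd, lo` on a 32-bit register.
constexpr uint32_t luiAddi(int64_t hi, int64_t lo) {
  return (static_cast<uint32_t>(hi) << 12) + static_cast<uint32_t>(lo);
}

// Rebuilds the 64-bit pointer from the two 32-bit registers.
constexpr uint64_t materialize(uint64_t addr) {
  uint32_t rLO32 = luiAddi(resolve(addr, AddrPart::Lo32Upper20),
                           resolve(addr, AddrPart::Lo32Lower12));
  uint32_t rHI32 = luiAddi(resolve(addr, AddrPart::Hi32Upper20),
                           resolve(addr, AddrPart::Hi32Lower12));
  return (static_cast<uint64_t>(rHI32) << 32) | rLO32;
}

// The low half here has bit 11 set, which exercises the rounding in hi20.
static_assert(materialize(0x00000001FFFFF800ULL) == 0x00000001FFFFF800ULL,
              "round trip must reproduce the address");
```

Each half is handled independently, which is why the same 20/12 split can be applied twice as long as the operand says which 32-bit half it refers to.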

If we could transform GlobalAddress<0xHighLow> directly to GlobalAddress<0xLow>, we could use the present RISCVII::MO_HI and MO_LO as they only extract the 32 high bits. What do you think?

I still recommend against reusing GlobalAddress as-is with an i32
type, but that's probably a minor detail. The only way I can see to
reuse the existing modifiers unambiguously would be to modify the
above sequence to:

    lui rLO32, addr:MO_LO
    addi rLO32, rLO32, addr:MO_LO
    lui rHI32, addr:MO_HI
    addi rHI32, rHI32, addr:MO_HI

It kind of works, but personally I think it's stretching the
understood semantics of MO_LO and MO_HI too far -- I'd add new ones if
it was me. But I'm not an active RISC-V maintainer so take my opinions
with a grain of salt.

Cheers.

Tim.

Ah, now I see it more clearly. I was not sure whether I should add them (MO_LO32_LO and MO_LO32_HI); by the way, this was the backup plan. We are probably going with this for now. I implemented them today and they seem to work well.

Many thanks,
Reshabh

By the way, I probably should have said sooner but most targets with
64-bit pointers don't (at least in the default mode) materialize
64-bit absolute pointers as we've been discussing.

x86 requires all global variables live with code in the low 2GB of
memory, which allows direct use of %rip-relative addressing-modes.
AArch64 requires all globals & code to be within 4GB of each other at
an arbitrary location in memory.

If you adopted similar constraints for RISC-V you could probably use
the existing code virtually unchanged.

Cheers.

Tim.

By the way, I probably should have said sooner but most targets with
64-bit pointers don’t (at least in the default mode) materialize
64-bit absolute pointers as we’ve been discussing.

x86 requires all global variables live with code in the low 2GB of
memory, which allows direct use of %rip-relative addressing-modes.
AArch64 requires all globals & code to be within 4GB of each other at
an arbitrary location in memory.

Correct me if I have misunderstood: the idea is to keep global variables within a 4GB window at some arbitrary location in memory, so that they can be addressed with 32 bits?

If you adopted similar constraints for RISC-V you could probably use
the existing code virtually unchanged.

We are trying to support 4GB+ of memory in address space 1 using 64-bit pointers in that address space, so I guess this might not apply? What do you think?

Correct me if I have misunderstood: the idea is to keep global variables within a 4GB window at some arbitrary location in memory, so that they can be addressed with 32 bits?

Yes, that's right. The concept is called a "code model". You can play
with -mcmodel=small or large to see how it affects codegen on x86 and
AArch64. Both have "small" as the default, which allows all globals
(and other related things like vtables, literal strings etc) to be
addressed pretty much as if in 32-bit mode.
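
To make the "small" constraint a bit more concrete, here is a minimal sketch of the reachability check it implies. The helper names are made up for illustration. The only facts relied on are that x86-64 RIP-relative operands carry a signed 32-bit displacement and that AArch64 ADRP encodes a signed 21-bit count of 4 KiB pages. It assumes the usual two's-complement wraparound when converting the unsigned difference to a signed value.

```cpp
#include <cstdint>

// True if `delta` fits in a signed two's-complement field of `bits` bits.
constexpr bool fitsSigned(int64_t delta, unsigned bits) {
  return delta >= -(int64_t{1} << (bits - 1)) &&
         delta < (int64_t{1} << (bits - 1));
}

// x86-64: a RIP-relative operand holds a signed 32-bit byte displacement,
// so the target must lie within roughly +/-2 GiB of the instruction.
constexpr bool reachableRipRelative(uint64_t pc, uint64_t target) {
  return fitsSigned(static_cast<int64_t>(target - pc), 32);
}

// AArch64: ADRP adds a signed 21-bit page count to the current 4 KiB page,
// giving roughly +/-4 GiB of reach; a following ADD or load supplies the
// low 12 bits of the address.
constexpr bool reachableAdrp(uint64_t pc, uint64_t target) {
  return fitsSigned(static_cast<int64_t>((target >> 12) - (pc >> 12)), 21);
}
```

If every global passes a check like this at link time, the compiler never needs to build a full 64-bit absolute address, which is what lets the small code model reuse 32-bit-style sequences.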

If you adopted similar constraints for RISC-V you could probably use
the existing code virtually unchanged.

We are trying to support 4GB+ of memory in address space 1 using 64-bit pointers in that address space, so I guess this might not apply? What do you think?

That's mostly making sure 64-bit pointers don't get truncated and get
applied correctly (they can still come from something like mmap, or
even malloc). But you still get to choose layout constraints if you
want.

I think the main caveat is that JITs often require support for "large"
(i.e. the full 64-bit addressing you've been implementing up to now),
precisely because they might be handed large pointers by mmap.

BTW, since you're running under 32-bit RISC-V, how do you actually
plan to load from a 64-bit pointer? Isn't the hardware just missing?

Cheers.

Tim.

Correct me if I have misunderstood: the idea is to keep global variables within a 4GB window at some arbitrary location in memory, so that they can be addressed with 32 bits?

Yes, that’s right. The concept is called a “code model”. You can play
with -mcmodel=small or large to see how it affects codegen on x86 and
AArch64. Both have “small” as the default, which allows all globals
(and other related things like vtables, literal strings etc) to be
addressed pretty much as if in 32-bit mode.

Thanks, that’s cool.

BTW, since you’re running under 32-bit RISC-V, how do you actually
plan to load from a 64-bit pointer? Isn’t the hardware just missing?

We are working on an RV32 GPU ISA for an open-source RISC-V-based GPGPU (http://bjump.org/manycore/), so I can easily get the minimal hardware support required for such features :)