[cfe-commits] [PATCH] [llvm+clang] memset for non-8-bit bytes

Please start a thread on llvmdev about this functionality, and outline what other intrinsics will have to change to add non-8-bit byte support.

Well, memset is the only one we have seen so far (our back-end is ~50% finished for an initial release). We also have our own front-end (we are not currently using the clang front-end), and we use few LLVM intrinsics (only llvm.stacksave/llvm.stackrestore). The memset intrinsic is generated by opt.
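For context, here is a minimal sketch of the kind of source loop that opt may turn into an `llvm.memset` call even though the program never calls memset. LLVM's loop idiom recognition can replace a loop that stores the same value to consecutive bytes with a memset. Whether it fires depends on the pipeline and the target, and the function name below is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: a plain fill loop. With optimizations enabled,
 * opt's loop idiom recognition may rewrite this whole loop into a single
 * llvm.memset call. */
void fill_bytes(unsigned char *dst, unsigned char value, size_t n)
{
    assert(dst != NULL || n == 0);
    for (size_t i = 0; i < n; ++i)
        dst[i] = value; /* same value, consecutive addresses */
}
```

The fill-value operand of the `llvm.memset` intrinsic is an `i8`, so this rewrite implicitly assumes 8-bit bytes. That is why memset is the first intrinsic to show up here.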

This isn't the sort of feature that we just add without understanding the full impact.

The large impact we have seen is elsewhere, in code that assumes a byte is 8 bits. You can see a diffstat of our current patch below. However, this patch for non-8-bit bytes takes a cleaner approach to the core changes:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120702/146050.html
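A typical form of the 8-bit assumption is size arithmetic that rounds a bit width up to whole 8-bit bytes, e.g. `(bits + 7) / 8`. The sketch below contrasts that with a version parameterized on the byte width. Both function names are hypothetical and are not the names used in the linked patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hard-coded form: round a bit width up to whole 8-bit bytes. */
static uint64_t store_size_8bit(uint64_t bits)
{
    return (bits + 7) / 8;
}

/* Hypothetical generalization: round up to whole units of byte_bits
 * bits, where byte_bits is the target's minimum addressable unit. */
static uint64_t store_size_units(uint64_t bits, unsigned byte_bits)
{
    assert(byte_bits > 0);
    return (bits + byte_bits - 1) / byte_bits;
}
```

With 16-bit bytes an `i32` occupies 2 units, not 4. An `i24` also occupies 2 units, because any partial unit still costs a whole one.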

It is on my to-do list to merge that patch with our changes. Do you want that kind of change pushed first?

Regards,
Patrik Hägglund

include/llvm/CodeGen/SelectionDAG.h | 2 +-
include/llvm/CodeGen/ValueTypes.h | 43 ++++++++++++++++++++++++++++++++-----------
include/llvm/DataLayout.h | 22 ++++++++++++++++------
include/llvm/IRBuilder.h | 16 ++++++++++++++++
lib/Analysis/ConstantFolding.cpp | 74 ++++++++++++++++++++++++++++++++++++++++----------------------------------
lib/Analysis/ValueTracking.cpp | 22 +++++++++++++---------
lib/CodeGen/AsmPrinter/AsmPrinter.cpp | 50 ++++++++++++++++++++++++++++++++++++++------------
lib/CodeGen/AsmPrinter/AsmPrinterDwarf.cpp | 6 ++++--
lib/CodeGen/AsmPrinter/DIE.cpp | 20 ++++++++++++--------
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | 7 ++++---
lib/CodeGen/MachineFunction.cpp | 2 +-
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 58 +++++++++++++++++++++++++++++++++-------------------------
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | 63 ++++++++++++++++++++++++++++++++++-----------------------------
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp | 4 ++--
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp | 18 +++++++++---------
lib/CodeGen/SelectionDAG/LegalizeTypes.cpp | 2 +-
lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp | 12 ++++++------
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | 14 +++++++-------
lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 89 +++++++++++++++++++++++++++++++++++++++++++----------------------------------------------
lib/CodeGen/SelectionDAG/TargetLowering.cpp | 6 +++---
lib/Transforms/InstCombine/InstCombineCalls.cpp | 6 ++++--
lib/Transforms/Scalar/GVN.cpp | 48 ++++++++++++++++++++++++++++++------------------
lib/Transforms/Scalar/SROA.cpp | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------
lib/Transforms/Scalar/ScalarReplAggregates.cpp | 18 +++++++++++-------
lib/VMCore/DataLayout.cpp | 65 ++++++++++++++++++++++++++++++++++++++++++++++-------------------
lib/VMCore/IRBuilder.cpp | 23 ++++++++++++++++++-----
lib/VMCore/ValueTypes.cpp | 2 ++
lib/VMCore/Verifier.cpp | 2 ++
test/CodeGen/PowerPC/structsinmem.ll | 8 ++++----
test/CodeGen/PowerPC/structsinregs.ll | 14 +++++++-------
test/CodeGen/X86/memcpy-2.ll | 4 ++--
test/CodeGen/X86/pr11985.ll | 6 +++---
test/CodeGen/X86/unaligned-load.ll | 10 +++++++++-
33 files changed, 506 insertions(+), 326 deletions(-)

I'm a bit confused by this concept. I'm aware of the archaic meaning of the word byte, but it has meant 8 bits for the last 30 years. There's even an ISO/IEC standard.

I know of architectures like Texas Instruments' C55x DSPs that address 16 bits at a time, but even their data sheets state:

• 256K Bytes Zero-Wait State On-Chip RAM, Composed of:
  – 64K Bytes of Dual-Access RAM (DARAM), 8 Blocks of 4K x 16-Bit
  – 192K Bytes of Single-Access RAM (SARAM), 24 Blocks of 4K x 16-Bit
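The data-sheet figures add up only if "byte" means an 8-bit octet, even though the memory is organized in 16-bit words. A small check of the arithmetic follows; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define K 1024u

/* Size in 8-bit octets of `blocks` blocks of `words` words,
 * each word being `word_bits` bits wide. */
static uint32_t octets(uint32_t blocks, uint32_t words, uint32_t word_bits)
{
    assert(word_bits % 8 == 0);
    return blocks * words * (word_bits / 8);
}

/* The same memory counted in addressable 16-bit words. */
static uint32_t words16(uint32_t blocks, uint32_t words)
{
    return blocks * words;
}
```

For example, 8 x 4K x 16 bits is 64K octets, and 24 x 4K x 16 bits is 192K octets, for 256K octets in total. The same RAM is 128K addressable 16-bit units.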

Perhaps you could begin by defining more accurately what you're talking about?

/jakob

I'm assuming he means an architecture where CHAR_BIT > 8.

-Eli

AFAIK, CHAR_BIT isn't a property of the architecture, but of the C implementation. One can imagine having two different (non-ABI-compatible) C implementations for the same ISA that define CHAR_BIT differently.
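Because CHAR_BIT comes from the C implementation (via `<limits.h>`), portable C derives bit positions from it rather than from a literal 8. Below is a minimal sketch of a bit array built that way. The macro and function names are invented for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Bits in the object representation of type T (including any padding). */
#define OBJECT_BITS(T) (sizeof(T) * CHAR_BIT)

/* Set bit k in a bit array stored as unsigned chars. Using CHAR_BIT
 * instead of 8 keeps the indexing correct when CHAR_BIT > 8. */
static void bitset_set(unsigned char *bits, size_t k)
{
    assert(bits != NULL);
    bits[k / CHAR_BIT] |= (unsigned char)(1u << (k % CHAR_BIT));
}

/* Return 1 if bit k is set, 0 otherwise. */
static int bitset_test(const unsigned char *bits, size_t k)
{
    return (bits[k / CHAR_BIT] >> (k % CHAR_BIT)) & 1u;
}
```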

--Owen

Fine, then a *target* where CHAR_BIT > 8.

-Eli

I'm a bit confused by this concept.

For the term byte, I use the "archaic" definition in the C (and C++) standard (section 3.6):

  addressable unit of data storage large enough to hold any member of the basic character
  set of the execution environment
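Two consequences of this definition can be checked in plain C. First, `sizeof` counts in bytes and a byte is the storage of one `char`, so `sizeof(char)` is 1 whatever CHAR_BIT is. Second, `unsigned char` has no padding bits, so UCHAR_MAX is 2^CHAR_BIT - 1. A minimal sketch, with hypothetical names:

```c
#include <assert.h>
#include <limits.h>

/* sizeof(char) is 1 by definition on every C implementation. */
enum { CHAR_SIZE = sizeof(char) };

/* Count the value bits of unsigned char by shifting UCHAR_MAX down to
 * zero. Since unsigned char has no padding bits, this equals CHAR_BIT. */
static int unsigned_char_bits(void)
{
    int n = 0;
    for (unsigned long v = UCHAR_MAX; v != 0; v >>= 1)
        ++n;
    return n;
}
```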

/Patrik Hägglund

That definition isn't really relevant to LLVM, though. You can define char to be (say) 16 bits, and your frontend (clang?) just needs to set CHAR_BIT properly, and generate code with i16 whenever you wrote char.
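As a rough illustration of that suggestion, consider a toy frontend that keeps the target's char width in one setting and uses it both for the CHAR_BIT value it predefines and for the IR integer type it emits for `char`. The struct and function are hypothetical and are not clang code.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical per-target setting in a toy frontend. */
struct target_info {
    unsigned char_bits; /* also the value predefined as CHAR_BIT */
};

/* Format the LLVM IR integer type used for `char`, e.g. "i8" or "i16". */
static void ir_type_for_char(const struct target_info *t, char *out, size_t len)
{
    assert(t->char_bits >= 8); /* C requires CHAR_BIT >= 8 */
    snprintf(out, len, "i%u", t->char_bits);
}
```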

I suspect what you want to talk about, and the part that is relevant to LLVM as opposed to clang, is supporting architectures where the minimum addressable unit is not 8 bits in size.
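On such a target, a bit offset no longer maps to an address in octets. It splits into an address in addressable units plus a bit position inside that unit. Below is a minimal sketch; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical location of a bit offset on a target whose minimum
 * addressable unit is unit_bits wide. */
struct unit_location {
    uint64_t unit;  /* address offset, in addressable units */
    unsigned shift; /* bit position within that unit */
};

static struct unit_location locate(uint64_t bit_offset, unsigned unit_bits)
{
    assert(unit_bits > 0);
    struct unit_location loc = { bit_offset / unit_bits,
                                 (unsigned)(bit_offset % unit_bits) };
    return loc;
}
```

With 8-bit units, the second octet of an object is at unit 1 and can be addressed directly. With 16-bit units, it shares unit 0 with the first octet and must be reached with a shift and mask. Code that counts offsets in octets gets this case wrong.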

--Owen

I'm a bit confused by this concept.

For the term byte, I use the "archaic" definition in the C (and C++) standard (section 3.6):

  addressable unit of data storage large enough to hold any member of the basic character
  set of the execution environment

That definition isn't really relevant to LLVM, though. You can define char to be (say) 16 bits, and your frontend (clang?) just needs to set CHAR_BIT properly, and generate code with i16 whenever you wrote char.

That's not true; SimplifyLibCalls, for example, would perform all
sorts of bad optimizations if CHAR_BIT is not 8.
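The memset case from this thread shows the same kind of hazard. C's memset stores `c` converted to `unsigned char` into each byte, so the stored value depends on CHAR_BIT. The sketch below models values only; it calls no LLVM API, and the function names are hypothetical. It computes the repeated ("splatted") value a transform would use when it turns a small memset into one wide store.

```c
#include <assert.h>
#include <stdint.h>

/* Value memset(p, c, n) leaves in each byte: c converted to unsigned
 * char, i.e. reduced modulo 2^byte_bits. */
static uint64_t memset_byte(int c, unsigned byte_bits)
{
    assert(byte_bits > 0 && byte_bits < 64);
    return (uint64_t)c & ((UINT64_C(1) << byte_bits) - 1);
}

/* Repeat that byte value across a total_bits-wide integer, as when a
 * small memset is replaced by a single wide store. */
static uint64_t memset_splat(int c, unsigned byte_bits, unsigned total_bits)
{
    assert(byte_bits > 0 && byte_bits < 64);
    assert(total_bits <= 64 && total_bits % byte_bits == 0);
    uint64_t b = memset_byte(c, byte_bits), v = 0;
    for (unsigned i = 0; i < total_bits; i += byte_bits)
        v = (v << byte_bits) | b;
    return v;
}
```

Take `c = 0x12AB` and a 32-bit store. With 16-bit bytes the correct result is `0x12AB12AB`. A transform that assumes 8-bit bytes truncates the value to `0xAB` and produces `0xABABABAB` instead.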

I suspect what you want to talk about, and the part that is relevant to LLVM as opposed to clang, is supporting architectures where the minimum addressable unit is not 8 bits in size.

There's also this.

-Eli

You can define char to be (say) 16 bits, and your
frontend (clang?) just needs to set CHAR_BIT properly, and
generate code with i16 whenever you wrote char.

Sorry, but this is naive. As a starting point (many more changes are needed), I suggest you take a look at this patch (not written by me), referred to in my original email: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120702/146050.html

/Patrik Hägglund

That definition isn't really relevant to LLVM, though.
[...] the part that is relevant to LLVM as opposed to clang, is
supporting architectures where the minimum addressable unit is
not 8 bits in size.

The C standard doesn't make any distinction between the front-end and back-end parts of the implementation. Therefore, this case is not excluded.

Having a byte size in the C implementation that differs from what the ISA supports is mostly of theoretical value, and not what I intended here.

/Patrik Hägglund