[cfe-commits] [PATCH] [llvm+clang] memset for non-8-bit bytes

Patrik_Hagglund · October 19, 2012, 9:24am

Please start a thread on llvmdev about this functionality, and outline what other intrinsics will have to change to add non-8-bit byte support.

Well, memset is the only one we have seen so far (our back-end is ~50% finished for an initial release). We also have our own front-end (we are not currently using the Clang front-end), and we use very few LLVM intrinsics (only llvm.stacksave/llvm.stackrestore). The memset intrinsic is generated by opt.
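For context on why memset is the first intrinsic to surface: the llvm.memset intrinsic takes its fill value as an i8, and when the backend lowers it to wider stores it replicates that 8-bit value across each store. The sketch below parameterizes that replication on the byte width. `splat_fill_value` and its parameters are hypothetical names for illustration, not LLVM APIs, and the sketch assumes the store width is a multiple of the byte width and at most 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not an LLVM API: build the value a memset lowering
 * would store when it widens a fill value to a word_bits-wide store on a
 * target whose byte (addressable unit) is byte_bits wide. */
static uint64_t splat_fill_value(uint64_t value, unsigned byte_bits,
                                 unsigned word_bits) {
    assert(byte_bits > 0 && byte_bits <= word_bits && word_bits <= 64);
    assert(word_bits % byte_bits == 0);

    /* memset stores its value converted to the byte type, so only the
     * low byte_bits bits survive. */
    uint64_t mask = (byte_bits == 64) ? UINT64_MAX
                                      : ((UINT64_C(1) << byte_bits) - 1);
    uint64_t byte = value & mask;

    /* Put one copy of the byte into every byte_bits-wide lane. */
    uint64_t result = 0;
    for (unsigned shift = 0; shift < word_bits; shift += byte_bits)
        result |= byte << shift;
    return result;
}
```

With 8-bit bytes, a fill value of 0xAB becomes 0xABABABAB in a 32-bit store; with 16-bit bytes the same store holds 0x00AB00AB, and a fill value such as 0x1FF is kept whole, which an i8 operand cannot express.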

This isn't the sort of feature that we just add without understanding the full impact.

The larger impact we have seen is elsewhere, in code that assumes a byte is 8 bits. You can see a diffstat of our current patch below. However, the following patch for non-8-bit bytes is a cleaner approach to the core changes:
http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120702/146050.html

It is on my to-do list to merge that patch with our changes. Do you want that kind of change to be pushed first?

Regards,
Patrik Hägglund

include/llvm/CodeGen/SelectionDAG.h | 2 +-
include/llvm/CodeGen/ValueTypes.h | 43 ++++++++++++++++++++++++++++++++-----------
include/llvm/DataLayout.h | 22 ++++++++++++++++------
include/llvm/IRBuilder.h | 16 ++++++++++++++++
lib/Analysis/ConstantFolding.cpp | 74 ++++++++++++++++++++++++++++++++++++++++----------------------------------
lib/Analysis/ValueTracking.cpp | 22 +++++++++++++---------
lib/CodeGen/AsmPrinter/AsmPrinter.cpp | 50 ++++++++++++++++++++++++++++++++++++++------------
lib/CodeGen/AsmPrinter/AsmPrinterDwarf.cpp | 6 ++++--
lib/CodeGen/AsmPrinter/DIE.cpp | 20 ++++++++++++--------
lib/CodeGen/AsmPrinter/DwarfDebug.cpp | 7 ++++---
lib/CodeGen/MachineFunction.cpp | 2 +-
lib/CodeGen/SelectionDAG/DAGCombiner.cpp | 58 +++++++++++++++++++++++++++++++++-------------------------
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp | 63 ++++++++++++++++++++++++++++++++++-----------------------------
lib/CodeGen/SelectionDAG/LegalizeFloatTypes.cpp | 4 ++--
lib/CodeGen/SelectionDAG/LegalizeIntegerTypes.cpp | 18 +++++++++---------
lib/CodeGen/SelectionDAG/LegalizeTypes.cpp | 2 +-
lib/CodeGen/SelectionDAG/LegalizeTypesGeneric.cpp | 12 ++++++------
lib/CodeGen/SelectionDAG/LegalizeVectorTypes.cpp | 14 +++++++-------
lib/CodeGen/SelectionDAG/SelectionDAG.cpp | 89 +++++++++++++++++++++++++++++++++++++++++++----------------------------------------------
lib/CodeGen/SelectionDAG/TargetLowering.cpp | 6 +++---
lib/Transforms/InstCombine/InstCombineCalls.cpp | 6 ++++--
lib/Transforms/Scalar/GVN.cpp | 48 ++++++++++++++++++++++++++++++------------------
lib/Transforms/Scalar/SROA.cpp | 96 +++++++++++++++++++++++++++++++++++++++++++++++++++++-------------------------------------------
lib/Transforms/Scalar/ScalarReplAggregates.cpp | 18 +++++++++++-------
lib/VMCore/DataLayout.cpp | 65 ++++++++++++++++++++++++++++++++++++++++++++++-------------------
lib/VMCore/IRBuilder.cpp | 23 ++++++++++++++++++-----
lib/VMCore/ValueTypes.cpp | 2 ++
lib/VMCore/Verifier.cpp | 2 ++
test/CodeGen/PowerPC/structsinmem.ll | 8 ++++----
test/CodeGen/PowerPC/structsinregs.ll | 14 +++++++-------
test/CodeGen/X86/memcpy-2.ll | 4 ++--
test/CodeGen/X86/pr11985.ll | 6 +++---
test/CodeGen/X86/unaligned-load.ll | 10 +++++++++-
33 files changed, 506 insertions(+), 326 deletions(-)
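Much of the diffstat above (DataLayout, SelectionDAG, SROA, GVN) presumably comes from bit-to-byte conversions written as (bits + 7) / 8 or as shifts by 3; DataLayout's getTypeStoreSize, for example, computes a store size as (getTypeSizeInBits(Ty) + 7) / 8. A minimal sketch of the parameterized conversion follows; both helper names are hypothetical and are not the DataLayout API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the DataLayout API: convert a type's size in
 * bits to a count of addressable units ("bytes") of byte_bits bits each. */
static uint64_t store_size_in_bytes(uint64_t size_in_bits, unsigned byte_bits) {
    assert(byte_bits > 0);
    /* Round up: a partially used unit still occupies a whole unit. */
    return (size_in_bits + byte_bits - 1) / byte_bits;
}

/* The hard-coded 8-bit form that a non-8-bit target has to hunt down. */
static uint64_t store_size_assuming_8bit(uint64_t size_in_bits) {
    return (size_in_bits + 7) / 8;
}
```

An i32 occupies 4 units with 8-bit bytes but 2 units with 16-bit bytes; the hard-coded form would report 4 in both cases.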

Jakob_Stoklund_Olese · October 19, 2012, 4:27pm

I'm a bit confused by this concept. I'm aware of the archaic meaning of the word byte, but it has meant 8 bits for the last 30 years. There's even an ISO/IEC standard.

I know of architectures like Texas' C55x DSPs that address 16 bits at a time, but even their data sheets state:

• 256K Bytes Zero-Wait State On-Chip RAM, Composed of:
• – 64K Bytes of Dual-Access RAM (DARAM), 8 Blocks of 4K x 16-Bit
• – 192K Bytes of Single-Access RAM (SARAM), 24 Blocks of 4K x 16-Bit

Perhaps you could begin by defining more accurately what you're talking about?

/jakob

Eli_Friedman1 · October 19, 2012, 5:47pm

I'm assuming he means an architecture where CHAR_BIT > 8.

-Eli

resistor · October 19, 2012, 6:04pm

AFAIK, CHAR_BIT isn't a property of the architecture, but of the C implementation. One can imagine having two different (non-ABI-compatible) C implementations for the same ISA that define CHAR_BIT differently.

--Owen

Eli_Friedman1 · October 19, 2012, 6:17pm

Fine, then a *target* where CHAR_BIT > 8.

-Eli

Patrik_Hagglund · October 19, 2012, 6:43pm

I'm a bit confused by this concept.

For the term byte, I use the "archaic" definition in the C (and C++) standard (section 3.6):

addressable unit of data storage large enough to hold any member of the basic character
set of the execution environment

/Patrik Hägglund

resistor · October 19, 2012, 7:45pm

That definition isn't really relevant to LLVM, though. You can define char to be (say) 16 bits, and your frontend (clang?) just needs to set CHAR_BIT properly, and generate code with i16 whenever you wrote char.

I suspect what you want to talk about, and the part that is relevant to LLVM as opposed to clang, is supporting architectures where the minimum addressable unit is not 8 bits in size.

--Owen

Eli_Friedman1 · October 19, 2012, 7:59pm

I'm a bit confused by this concept.

For the term byte, I use the "archaic" definition in the C (and C++) standard (section 3.6):

addressable unit of data storage large enough to hold any member of the basic character
set of the execution environment

That definition isn't really relevant to LLVM, though. You can define char to be (say) 16 bits, and your frontend (clang?) just needs to set CHAR_BIT properly, and generate code with i16 whenever you wrote char.

That's not true; SimplifyLibCalls, for example, would perform all
sorts of bad optimizations if CHAR_BIT is not 8.
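As one concrete case: C converts the character argument of strchr to char (and memset's value to unsigned char), so a library-call simplifier that folds strchr on a constant string has to know CHAR_BIT. The sketch below is a hypothetical constant folder, not SimplifyLibCalls code; strings are modeled as arrays of uint32_t units so that char widths up to 32 bits can be tried.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant folder for strchr(str, c), not LLVM code. The
 * string is an array of char_bits-wide units terminated by 0. Returns the
 * index of the first match, or -1 for a null result. */
static long fold_strchr(const uint32_t *str, uint32_t c, unsigned char_bits) {
    assert(char_bits > 0 && char_bits <= 32);
    /* strchr converts c to char, so only the low char_bits bits count. */
    uint32_t mask = (char_bits == 32) ? UINT32_MAX
                                      : ((UINT32_C(1) << char_bits) - 1);
    uint32_t ch = c & mask;
    for (size_t i = 0;; ++i) {
        if (str[i] == ch)
            return (long)i;   /* also finds the terminator when ch == 0 */
        if (str[i] == 0)
            return -1;
    }
}
```

Searching "AB" for 0x141 folds to index 0 under an 8-bit assumption, but on a target with 16-bit char it must fold to a null result.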

I suspect what you want to talk about, and the part that is relevant to LLVM as opposed to clang, is supporting architectures where the minimum addressable unit is not 8 bits in size.

There's also this.

-Eli

Patrik_Hagglund · October 19, 2012, 8:38pm

You can define char to be (say) 16 bits, and your
frontend (clang?) just needs to set CHAR_BIT properly, and
generate code with i16 whenever you wrote char.

Sorry, but this is naive. As a starting point (many more changes are needed), I suggest you take a look at this patch (not written by me), referred to in my original email: http://lists.cs.uiuc.edu/pipermail/llvm-commits/Week-of-Mon-20120702/146050.html

/Patrik Hägglund

Patrik_Hagglund · October 20, 2012, 5:20am

That definition isn't really relevant to LLVM, though.
[...] the part that is relevant to LLVM as opposed to clang, is
supporting architectures where the minimum addressable unit is
not 8 bits in size.

The C standard doesn't make any distinction between the front-end and back-end parts of the implementation. Therefore, this is not excluded.

Having a different byte size in the C implementation than supported by the ISA is mostly of theoretical value, and not what I intended here.

/Patrik Hägglund
