[PATCH][RFC] HSAIL Target

Hi,

AMD would like to propose including an LLVM backend for the HSAIL target. Patches for review are attached and can also be found at https://github.com/HSAFoundation/HLC-HSAIL-Development-LLVM/ on the hsail-review branch. Most of the recent work is visible on the hsail-1.0f branch, which is based on an LLVM commit approximately 1 month before 3.6 branched. The hsail-review branch is the content of hsail-1.0f with the minimal changes required to port it to trunk in the approximate squashed form I plan on eventually committing to the LLVM repository.

First, background and a description of the backend:

HSAIL is a virtual target machine similar in spirit to AMDIL and NVPTX. A language compiler produces HSAIL text (or the binary format, called BRIG) and an HSA implementation provides an HSAIL finalizer which produces a binary for execution on a physical device. This backend implements the HSA 1.0 standard published at http://www.hsafoundation.com/standards/. The AMD OpenCL 2.0 implementation uses HSAIL via LLVM when targeting any GPU device, supporting Sea Islands GPUs and later. This backend is unrelated to the R600 / AMDGPU target which is already in tree. This HSAIL backend is multivendor, and capable of supporting multiple generations of ISA from any one vendor. About 8 months ago, the HSAIL target was forked off of the internal version and I’ve been working since then to catch it up to LLVM trunk and into a state acceptable for upstreaming. To expedite this process, various features (such as image support, debug info, and a few optimizations) were dropped, so this version is currently not as full featured as the internal version. The intention is to gradually merge that functionality back into this version once upstream, and eventually base AMD’s internal version on the upstream target and be minimally divergent. There is still some work I believe needs to be done before it is ready to be committed, but at this point I think it is ready for others to start providing feedback on areas where I should be focusing to get there.

With the current set of features this HSAIL backend is able to execute C++AMP programs by following the directions here: https://bitbucket.org/multicoreware/cppamp-driver-ng/wiki/HSA%20Support%20Status. Over time, we intend to evolve to a full C++17 compiler on the LLVM HSAIL back end with the ability to target the GPU for parallel regions. Other languages will also use this path over time.

One noteworthy difference from how other targets are structured is there are two code emission paths in the backend. The first path, which is the original used by the internal OpenCL compiler, uses a third party library (libHSAIL) for code emission plugged into the AsmPrinter. The reason for this is partially legacy, and partially because the HSA specification defines its own object format, BRIG, which is unlike any of the supported object formats such as ELF. Supporting BRIG in MC would be a challenge (largely because it is not really streamable), but attempting to emit it using the standard infrastructure may be a consideration in the future. This path supports emitting object files and text output via libHSAIL’s disassembler. The second path which I’ve implemented over the past few months uses a normal AsmPrinter emitting HSAIL text using the standard MC infrastructure, and does not support object output. The two paths have similar pass rates on the C++ AMP conformance test suite, and should produce the same output except for some whitespace and comment differences.

The backend currently has a cmake option to control whether libHSAIL will be used or not, defaulting to using it if libHSAIL is found. Developers building LLVM with or without HSAIL support will not need libHSAIL to build the backend or run tests, all of which pass using either AsmPrinter. libHSAIL and associated assembler / disassembler can be found at https://github.com/HSAFoundation/HSAIL-Tools.

Currently I have left it as an experimental target, not built by default. I have only implemented the build support necessary for cmake. I’m not planning on adding the autotools support, since hopefully that will be gone soon enough.

My highest priority is to get the backend upstreamed as soon as possible, so I would appreciate feedback about any kinds of blocking issues on that.

Thanks

Matt Arsenault

hsail_review.tar.bz2 (205 KB)

I have also posted the main patch to http://reviews.llvm.org/D9751

I haven't had a chance to look at it myself, but assuming that you get someone to review it in detail, it is following best-practices, and has reasonable testcases, I think it is reasonable to accept as an experimental backend.

-Chris

Can you provide some high-level statistics on the amount of code involved in the various pieces? It’s a crude approximation of how complex the pieces (backend, assembler, disassembler) are.

If HSAIL is similar to AMDIL and NVPTX, how does it compare to SPIR-V? Is it more of a backend, with lots of lowering, or more of a portable program representation?

Can you provide some high-level statistics on the amount of code involved in the various pieces? It’s a crude approximation of how complex the pieces (backend, assembler, disassembler) are.

There is no assembler, inline assembly support or disassembler support and no current plan to implement those. The backend is all there is, and I think it’s pretty similar in size to other targets at this time. The parts I’ve spent the most time on and were the most problematic were for the required pretty printing of various components, so most of the complexity of the backend is really for printing purposes.

If HSAIL is similar to AMDIL and NVPTX, how does it compare to SPIR-V? Is it more of a backend, with lots of lowering, or more of a portable program representation?

It’s a replacement for AMDIL. It is a virtual ISA, with physical registers and mostly goes through the normal target code generation paths. It is not similar in design to SPIR-V, although its purpose is pretty similar.

-Matt

One noteworthy difference from how other targets are structured is there are two code emission paths in the backend. The first path, which is the original used by the internal OpenCL compiler, uses a third party library (libHSAIL) for code emission plugged into the AsmPrinter. The reason for this is partially legacy, and partially because the HSA specification defines its own object format, BRIG, which is unlike any of the supported object formats such as ELF. Supporting BRIG in MC would be a challenge (largely because it is not really streamable), but attempting to emit it using the standard infrastructure may be a consideration in the future. This path supports emitting object files and text output via libHSAIL’s disassembler. The second path which I’ve implemented over the past few months uses a normal AsmPrinter emitting HSAIL text using the standard MC infrastructure, and does not support object output. The two paths have similar pass rates on the C++ AMP conformance test suite, and should produce the same output except for some whitespace and comment differences.

This part is scary.

Having a third party library dependency is very undesirable from a testing perspective.

One of the important properties of MC is avoiding the need for two code paths in the code generator.

If MC cannot support the format you need, we should work on fixing that in a way that maintains the property that most code is shared when writing objects or assembly. This is a need that is shared by WebAssembly I think.

My suggestion would be to start with just the assembly printing path and work to figure out what needs to happen in MC.

Cheers,
Rafael
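
As an illustration of the property described in the message above (one code generator, with the output format decided by the streamer behind it), here is a minimal Python sketch. It is not LLVM code: `Streamer`, `TextStreamer`, `ObjectStreamer`, `lower_function`, the toy opcode table and the byte encoding are all invented for this example and only loosely mirror the roles of MC's assembly and object streamers.

```python
from abc import ABC, abstractmethod


class Streamer(ABC):
    """Hypothetical stand-in for an MC-style streamer interface."""

    @abstractmethod
    def emit_instruction(self, opcode, operands):
        ...

    @abstractmethod
    def finish(self):
        ...


class TextStreamer(Streamer):
    """Prints instructions in a toy textual syntax."""

    def __init__(self):
        self.lines = []

    def emit_instruction(self, opcode, operands):
        text = opcode if not operands else f"{opcode} {', '.join(operands)}"
        self.lines.append(f"\t{text};")

    def finish(self):
        return "\n".join(self.lines).encode("ascii")


# Toy opcode numbering, invented for this sketch.
OPCODES = {"add": 1, "ret": 2}


class ObjectStreamer(Streamer):
    """Encodes the same instructions into a toy binary format."""

    def __init__(self):
        self.buf = bytearray()

    def emit_instruction(self, opcode, operands):
        self.buf.append(OPCODES[opcode])
        self.buf.append(len(operands))
        for op in operands:
            data = op.encode("ascii")
            self.buf.append(len(data))  # length-prefixed operand name
            self.buf += data

    def finish(self):
        return bytes(self.buf)


def lower_function(out):
    """The 'code generator': written once, unaware of the output format."""
    out.emit_instruction("add", ["$s0", "$s1", "$s2"])
    out.emit_instruction("ret", [])
    return out.finish()
```

Calling `lower_function(TextStreamer())` and `lower_function(ObjectStreamer())` runs the same lowering code and differs only in the bytes produced, which is the kind of sharing being asked for here.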

One noteworthy difference from how other targets are structured is there are two code emission paths in the backend. The first path, which is the original used by the internal OpenCL compiler, uses a third party library (libHSAIL) for code emission plugged into the AsmPrinter. The reason for this is partially legacy, and partially because the HSA specification defines its own object format, BRIG, which is unlike any of the supported object formats such as ELF. Supporting BRIG in MC would be a challenge (largely because it is not really streamable), but attempting to emit it using the standard infrastructure may be a consideration in the future. This path supports emitting object files and text output via libHSAIL’s disassembler. The second path which I’ve implemented over the past few months uses a normal AsmPrinter emitting HSAIL text using the standard MC infrastructure, and does not support object output. The two paths have similar pass rates on the C++ AMP conformance test suite, and should produce the same output except for some whitespace and comment differences.

This part is scary.

Having a third party library dependency is very undesirable from a testing perspective.

One of the important properties of MC is avoiding the need for two code paths in the code generator.

If MC cannot support the format you need, we should work on fixing that in a way that maintains the property that most code is shared when writing objects or assembly. This is a need that is shared by WebAssembly I think.

My suggestion would be to start with just the assembly printing path and work to figure out what needs to happen in MC.

I wonder if we need a ‘raw’ version of MC which literally only emits bytes to the stream and does nothing else. Of course there would be a ton of details to work out, like whether the raw MC supports sections, symbols, relocations, etc. That might be just as much work as just adding a BRIG emitter.

Just a thought though.

Cheers,
Pete

> > One noteworthy difference from how other targets are structured is there are two code emission paths in the backend. The first path, which is the original used by the internal OpenCL compiler, uses a third party library (libHSAIL) for code emission plugged into the AsmPrinter. The reason for this is partially legacy, and partially because the HSA specification defines its own object format, BRIG, which is unlike any of the supported object formats such as ELF. Supporting BRIG in MC would be a challenge (largely because it is not really streamable), but attempting to emit it using the standard infrastructure may be a consideration in the future. This path supports emitting object files and text output via libHSAIL's disassembler. The second path which I've implemented over the past few months uses a normal AsmPrinter emitting HSAIL text using the standard MC infrastructure, and does not support object output. The two paths have similar pass rates on the C++ AMP conformance test suite, and should produce the same output except for some whitespace and comment differences.
>
> This part is scary.
>
> Having a third party library dependency is very undesirable from a testing perspective.
>
> One of the important properties of MC is avoiding the need for two code paths in the code generator.
>
> If MC cannot support the format you need, we should work on fixing that in a way that maintains the property that most code is shared when writing objects or assembly. This is a need that is shared by WebAssembly I think.
>
> My suggestion would be to start with just the assembly printing path and work to figure out what needs to happen in MC.
>
I wonder if we need a ‘raw’ version of MC which literally only emits bytes to the stream and does nothing else. Of course there would be a ton of details to work out, like whether the raw MC supports sections, symbols, relocations, etc. That might be just as much work as just adding a BRIG emitter.

This used to exist. It was called MCPureStreamer and was removed about a year ago.
We may be able to bring it back if it would be useful.

-Tom

This part is scary.

Having a third party library dependency is very undesirable from a testing perspective.

I agree, but it’s what we are stuck with for now. It’s an optional dependency now, so most people building LLVM won’t need to worry about it.

One of the important properties of MC is avoiding the need for two code paths in the code generator.

If MC cannot support the format you need, we should work on fixing that in a way that maintains the property that most code is shared when writing objects or assembly. This is a need that is shared by WebAssembly I think.

My suggestion would be to start with just the assembly printing path and work to figure out what needs to happen in MC.

It will take a long time to come up with a replacement for emitting BRIG in MC. How blocking of an issue is this to getting this committed? If really necessary, I can strip out the BRIG stuff, but would need to constantly maintain a patch re-adding it on top of trunk which would be a huge hassle so I would rather not.

-Matt

One of the important properties of MC is avoiding the need for two code
paths in the code generator.

If MC cannot support the format you need, we should work on fixing that in
a way that maintains the property that most code is shared when writing
objects or assembly. This is a need that is shared by WebAssembly I think.

My suggestion would be to start with just the assembly printing path and
work to figure out what needs to happen in MC.

It will take a long time to come up with a replacement for emitting BRIG
in MC. How blocking of an issue is this to getting this committed? If
really necessary, I can strip out the BRIG stuff, but would need to
constantly maintain a patch re-adding it on top of trunk which would be a
huge hassle so I would rather not.

Dan and I have talked to a few folks who have given us the same feedback
for WebAssembly: we should consider using MC for virtual ISA emission, even
though it isn't quite suitable as it exists in LLVM today. It's an
opportunity to fix LLVM. Dan and I don't know much about MC, but we are
going to explore this route.

I have no opinion on how this should affect HSAIL's current patch. Dan and
I are willing to take the lead on this, and would be happy to work with
other virtual ISAs to make sure architectural changes to MC make sense for
their use cases.

This part is scary.

Having a third party library dependency is very undesirable from a testing
perspective.

I agree, but it’s what we are stuck with for now. It’s an optional
dependency now, so most people building LLVM won’t need to worry about it.

One of the important properties of MC is avoiding the need for two code
paths in the code generator.

If MC cannot support the format you need, we should work on fixing that in
a way that maintains the property that most code is shared when writing
objects or assembly. This is a need that is shared by WebAssembly I think.

My suggestion would be to start with just the assembly printing path and
work to figure out what needs to happen in MC.

It will take a long time to come up with a replacement for emitting BRIG
in MC. How blocking of an issue is this to getting this committed? If
really necessary, I can strip out the BRIG stuff, but would need to
constantly maintain a patch re-adding it on top of trunk which would be a
huge hassle so I would rather not.

Could you maybe explain a bit more about BRIG and the barriers to using MC
for it?

-- Sean Silva

One of the important properties of MC is avoiding the need for two code
paths in the code generator.

If MC cannot support the format you need, we should work on fixing that
in a way that maintains the property that most code is shared when writing
objects or assembly. This is a need that is shared by WebAssembly I think.

My suggestion would be to start with just the assembly printing path and
work to figure out what needs to happen in MC.

It will take a long time to come up with a replacement for emitting BRIG
in MC. How blocking of an issue is this to getting this committed? If
really necessary, I can strip out the BRIG stuff, but would need to
constantly maintain a patch re-adding it on top of trunk which would be a
huge hassle so I would rather not.

Dan and I have talked to a few folks who have given us the same feedback
for WebAssembly: we should consider using MC for virtual ISA emission, even
though it isn't quite suitable as it exists in LLVM today. It's an
opportunity to fix LLVM. Dan and I don't know much about MC, but we are
going to explore this route.

It might be worth having a discussion on LLVMDev about "virtual ISAs".
With HSAIL, WebAssembly, and SPIR-V off the top of my head, it sounds like
we should have some sort of common discussion about them, since some of the
issues appear to be the same.

-- Sean Silva

The main problem is it isn’t streamable. Everything is split into multiple sections in the binary. For example, instructions have their operands placed in a different section and the instruction encoding includes the offset into the other section. libHSAIL needs to construct the full output for the module in memory and then emit code at the end, which is not how MC expects binary formats to work. This particular problem we’ve thought might be fixable with lots of custom fixups.

There are also issues with debug info. One of the problems is that the text format currently doesn’t have a way of representing DWARF, and BRIG has its own special handling of DWARF in a separate section as well. Binary formats and MC aren’t areas I’m particularly familiar with.

-Matt
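
To make the layout problem described above concrete, here is a rough Python sketch of a module whose instructions live in one section and whose operands live in another, with each instruction record storing byte offsets into the operand section. This is not the real BRIG encoding: the class name `TwoSectionModule`, the record layouts, and the header carrying the two section sizes are all assumptions made up for illustration.

```python
import struct


class TwoSectionModule:
    """Toy module with separate code and operand sections (not real BRIG)."""

    def __init__(self):
        self.code = bytearray()
        self.operands = bytearray()

    def add_operand(self, name):
        """Append a length-prefixed operand; return its offset in the operand section."""
        offset = len(self.operands)
        data = name.encode("ascii")
        self.operands += struct.pack("<H", len(data)) + data
        return offset

    def add_instruction(self, opcode, operand_names):
        """Append an instruction record that refers to its operands by offset."""
        offsets = [self.add_operand(n) for n in operand_names]
        self.code += struct.pack("<HH", opcode, len(offsets))
        for off in offsets:
            self.code += struct.pack("<I", off)

    def serialize(self):
        """Section sizes go in the header, so nothing can be written until the end."""
        header = struct.pack("<II", len(self.code), len(self.operands))
        return header + bytes(self.code) + bytes(self.operands)


module = TwoSectionModule()
module.add_instruction(1, ["$s0", "$s1"])
module.add_instruction(2, ["$s2"])
blob = module.serialize()
```

Both sections grow in parallel as instructions are added, so in this toy format neither can be flushed to a sequential output stream before the whole module has been seen.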

It will take a long time to come up with a replacement for emitting BRIG in
MC. How blocking of an issue is this to getting this committed? If really
necessary, I can strip out the BRIG stuff, but would need to constantly
maintain a patch re-adding it on top of trunk which would be a huge hassle
so I would rather not.

If committed it is the LLVM community that takes the burden, so I
don't think we should do it.

Cheers,
Rafael

The main problem is it isn't streamable. Everything is split into multiple
sections in the binary. For example, instructions have their operands placed
in a different section and the instruction encoding includes the offset into
the other section. libHSAIL needs to construct the full output for the
module in memory and then emit code at the end, which is not how MC expects
binary formats to work. This particular problem we've thought might be
fixable with lots of custom fixups.

That is not that different from other formats. We build fragments for
the entire file as we go and only output everything in the end.

The fact that an instruction

foo bar, zed

creates two fragments instead of one looks fairly minor IMHO.
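
A rough Python sketch of that idea, under stated assumptions: each instruction produces one fragment per section plus a fixup, and offsets are only resolved in a final layout step once every fragment is known. `Fragment`, `Fixup` and `Assembler` are hypothetical names that echo MC's concepts but are not its API, and the 6-byte instruction record is invented for the example.

```python
import struct


class Fragment:
    def __init__(self, section, data):
        self.section = section
        self.data = bytearray(data)
        self.offset = None  # assigned during layout


class Fixup:
    """Patch a 32-bit field in `fragment` at `position` with `target`'s section offset."""

    def __init__(self, fragment, position, target):
        self.fragment = fragment
        self.position = position
        self.target = target


class Assembler:
    def __init__(self):
        self.fragments = []
        self.fixups = []

    def emit_instruction(self, opcode, operand_text):
        # One instruction creates two fragments: the operand data and a code
        # record holding a placeholder for the operand's offset.
        operand = Fragment("operand", operand_text.encode("ascii"))
        code = Fragment("code", struct.pack("<HI", opcode, 0))
        self.fragments += [code, operand]
        self.fixups.append(Fixup(code, 2, operand))

    def layout(self):
        # Assign each fragment an offset within its own section.
        sizes = {"code": 0, "operand": 0}
        for frag in self.fragments:
            frag.offset = sizes[frag.section]
            sizes[frag.section] += len(frag.data)
        # Resolve fixups now that every offset is known.
        for fx in self.fixups:
            struct.pack_into("<I", fx.fragment.data, fx.position, fx.target.offset)
        return {
            name: b"".join(bytes(f.data) for f in self.fragments if f.section == name)
            for name in sizes
        }


asm = Assembler()
asm.emit_instruction(1, "bar, zed")
asm.emit_instruction(2, "qux")
sections = asm.layout()
```

Emission stays streaming from the code generator's point of view; only `layout()` needs the whole module, which matches the observation that output is written at the end in any case.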

There are also issues with debug info. One of the problems is that the text
format currently doesn't have a way of representing DWARF, and BRIG has its
own special handling of DWARF in a separate section as well. Binary formats
and MC aren't areas I'm particularly familiar with.

That is something that has to be fixed in the text format.

Cheers,
Rafael