[RFC][LLVM] Add Support for Target Specific Asm Streamer

Goal:
Currently all assembly emission is performed through the MCAsmStreamer class. However, this class was implemented with only the GNU assembly syntax in mind. While MCTargetStreamers can provide some added platform-specific customization, they are insufficient if we wish to support a new assembly syntax without scattering if/else blocks throughout the code. The current MCAsmStreamer implementation has worked for most platforms, but the official assembler language on the z/OS operating system is HLASM. As a result, we wish to introduce a new SystemZHLASMAsmStreamer to provide assembly generation support for the HLASM syntax.

For more information on HLASM, see the HLASM Language Reference.

PR: [MC][SystemZ] Introduce Target Specific HLASM Streamer for z/OS (llvm/llvm-project Pull Request #130535, by tltao)

Based on the comments in the PR, this approach seems to be favoured. For further justification, below are some simple examples that illustrate the fundamental differences between GNU assembly syntax and HLASM assembly syntax:

  1. MCAsmStreamer::emitBytes/MCAsmStreamer::PrintQuotedString:

Current (GNU):

.ascii  "\224\250\201\204\204"          * Name of Function
…
L#__const.main.x:
  .asciz  "test test\n"

HLASM:

DC    AL2(5)                  Function Name
DC    C'myadd'
…
@@CONST@AREA@@ DS 0D
         DC    XL11'A385A2A340A385A2A31500'

The key difference is that HLASM uses the “DC” instruction to define the constants that must be emitted, which differs from the directive concept used in the current MCAsmStreamer. In addition, PrintQuotedString hardcodes the GNU/ASCII convention of backslash escapes, which is invalid HLASM syntax. Constants in HLASM also require certain fields to be filled out (e.g. XL11 or AL2(5)) that tell the HLASM assembler the type, length, and other details of the data provided.
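As a rough illustration of the DC form, here is a minimal sketch that renders raw bytes as a hexadecimal constant with an explicit length modifier. The helper name formatHLASMHexConstant is hypothetical and is not taken from the PR or from LLVM; it assumes the bytes have already been converted to the target code page.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper, not part of LLVM: renders already-encoded bytes as an
// HLASM hexadecimal constant of the form DC XLn'....'.
std::string formatHLASMHexConstant(const std::vector<std::uint8_t> &Bytes) {
  assert(!Bytes.empty() && "a DC constant needs at least one byte");
  // The length modifier (Ln) states how many bytes the constant occupies.
  std::string Out = "DC    XL" + std::to_string(Bytes.size()) + "'";
  char Buf[3];
  for (std::uint8_t B : Bytes) {
    // Two uppercase hex digits per byte; no escapes or quoting are needed.
    std::snprintf(Buf, sizeof(Buf), "%02X", static_cast<unsigned>(B));
    Out += Buf;
  }
  Out += "'";
  return Out;
}
```

Passing the eleven EBCDIC bytes of "test test\n" plus the terminating zero should reproduce the XL11 constant shown in the HLASM example above.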

  2. MCAsmStreamer::emitAlignmentDirective:

Current (GNU):

.globl  main 
.p2align  4

HLASM:

ENTRY MAIN
DC    0FD

Here is another example of hardcoded GNU syntax when emitting alignment information. The hardcoded "\t.p2align" directive is not valid HLASM syntax. Instead, HLASM again uses the “DC” instruction (or the “DS” instruction, as will be shown later) to indicate alignment.
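To make the alignment idiom concrete, here is a small sketch that maps a byte alignment to a DS statement with a duplication factor of zero, which reserves no storage but still forces the alignment implied by the type. The function name is hypothetical, and only halfword, fullword, and doubleword alignment are handled; larger alignments are deliberately left out of this sketch.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not part of LLVM: returns the HLASM statement that
// forces the given byte alignment, or an empty string if none is needed.
std::string formatHLASMAlignment(unsigned ByteAlignment) {
  switch (ByteAlignment) {
  case 1:
    return ""; // byte alignment needs no statement
  case 2:
    return "DS    0H"; // halfword
  case 4:
    return "DS    0F"; // fullword
  case 8:
    return "DS    0FD"; // doubleword
  default:
    // Assumption: anything else is outside the scope of this sketch.
    assert(false && "alignment not handled by this sketch");
    return "";
  }
}
```

A real streamer would also have to decide between DC and DS and handle alignments above a doubleword, which this mapping does not attempt.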

  3. MCAsmStreamer::emitLabel:

Current (GNU):

main:

HLASM:

MAIN     DS    0FD

When emitting labels, HLASM can also emit alignment information together with the label using the DS instruction, which is not possible with the current MCAsmStreamer implementation. In addition, when AsmPrinter emits function labels via emitFunctionEntryLabel, it first emits the alignment via emitAlignment. Therefore, for HLASM, we will potentially need to keep track of some state within the streamer, which would be quite messy to do within the current MCAsmStreamer.
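The kind of state tracking described here could look roughly like the following sketch. It is a standalone model, not the class from the PR: the class name LabelAlignFuser and its methods are made up for illustration, and the choice of "0H" for a label with no pending alignment is an assumption.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model, not the class from the PR: it only shows the state
// needed to fold a preceding alignment request into the label statement.
class LabelAlignFuser {
  std::vector<std::string> Lines;
  std::string PendingAlign; // e.g. "0FD"; empty when nothing is pending

  // Name field followed by at least one blank, padded so that the operation
  // field starts in column 10 when the name is short enough.
  static std::string nameField(const std::string &Name) {
    std::string Field = Name;
    if (Field.size() < 9)
      Field.resize(9, ' ');
    else
      Field += ' ';
    return Field;
  }

  // An alignment never claimed by a label becomes its own statement.
  void flushPending() {
    if (PendingAlign.empty())
      return;
    Lines.push_back(nameField("") + "DS    " + PendingAlign);
    PendingAlign.clear();
  }

public:
  // Remember the alignment instead of printing it immediately.
  void emitAlignment(const std::string &Operand) {
    assert(!Operand.empty() && "expected an operand such as 0FD");
    flushPending();
    PendingAlign = Operand;
  }

  // Emit the label and any pending alignment as one DS statement.
  void emitLabel(const std::string &Name) {
    // Assumption: a label with no pending alignment is halfword aligned.
    const std::string Operand = PendingAlign.empty() ? "0H" : PendingAlign;
    PendingAlign.clear();
    Lines.push_back(nameField(Name) + "DS    " + Operand);
  }

  // Any other statement forces a pending alignment out first.
  void emitStatement(const std::string &Stmt) {
    flushPending();
    Lines.push_back(nameField("") + Stmt);
  }

  const std::vector<std::string> &lines() const { return Lines; }
};
```

Calling emitAlignment("0FD") followed by emitLabel("MAIN") yields the single statement from the HLASM example above, whereas an alignment followed by an ordinary statement is printed on its own line.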

  4. MCAsmStreamer::emitFileDirective:

Current (GNU):

.file "a.c"

HLASM:

TITLE '5650ZOS V2.4 z/OS XL C ./a.c'

This is another example of hardcoded GNU syntax: “\t.file” is not valid HLASM syntax.

  5. MCAsmStreamer::emitValueImpl:

Current (GNU):

.byte 128                             * PPA1 Flags 1
                                      *   Bit 0: 1 = 64-bit DSA
.byte 128                             * PPA1 Flags 2
                                      *   Bit 0: 1 = External procedure
                                      *   Bit 3: 0 = STACKPROTECT is not enabled
.byte 0                               * PPA1 Flags 3
.byte 129                             * PPA1 Flags 4
                                      *   Bit 7: 1 = Name Length and Name
.short  0                               * Length/4 of Parms

HLASM:

DC    BL1'10000000'           Flag Set 1
DC    BL1'10000001'           Flag Set 2
DC    BL1'00000000'           Flag Set 3
DC    BL1'00000001'           Flag Set 4

EmitValue is used to print many of the control sections and flags needed by the object format. Currently it only understands the GNU syntax and its size-related directives (such as .byte and .short). For HLASM we will once again need to use DC instead, with additional HLASM-specific modifiers (BL1 in this case).
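For comparison with the .byte/.short approach, here is a minimal sketch of how a sized integer value could be printed as a binary DC constant. The helper formatHLASMBinaryConstant is hypothetical; a real implementation would also have to choose between binary, hexadecimal, and address constants depending on the value being emitted.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper, not part of LLVM: prints the low SizeInBytes bytes of
// Value as an HLASM binary constant of the form DC BLn'....'.
std::string formatHLASMBinaryConstant(std::uint64_t Value,
                                      unsigned SizeInBytes) {
  assert(SizeInBytes >= 1 && SizeInBytes <= 8 && "size must be 1..8 bytes");
  // Render all 64 bits, then keep only the requested low-order bits.
  const std::string Bits = std::bitset<64>(Value).to_string();
  return "DC    BL" + std::to_string(SizeInBytes) + "'" +
         Bits.substr(64 - 8 * SizeInBytes) + "'";
}
```

With this, a value of 128 emitted as one byte becomes DC BL1'10000000' rather than .byte 128.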

  6. EmitEOL:

HLASM has a character limit of 80 per line. It’s possible to extend the line past this limit, but it requires the use of specific characters on the continuation column. In addition, the following line must begin on another specific column for it to be valid. This means we cannot simply use the current EmitEOL implementation and will require additional logic to handle this complexity.
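As a sketch of the extra logic involved, the function below splits an over-long statement using the standard HLASM statement layout: statement text ends in column 71, a non-blank character in column 72 marks continuation, and continued text resumes in column 16. The function name is hypothetical, and the split is deliberately naive: it ignores remarks, quoted strings, and any limit on the number of continuation lines.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of LLVM: naive splitting of one statement
// into HLASM continuation lines.
std::vector<std::string> splitForContinuation(const std::string &Stmt) {
  assert(Stmt.find('\n') == std::string::npos && "expected a single line");
  constexpr std::size_t EndCol = 71;      // last column of statement text
  constexpr std::size_t ContinueCol = 16; // first column of continued text
  constexpr std::size_t Chunk = EndCol - (ContinueCol - 1);

  std::vector<std::string> Lines;
  if (Stmt.size() <= EndCol) {
    Lines.push_back(Stmt);
    return Lines;
  }

  // First line: 71 characters of text plus the indicator in column 72.
  Lines.push_back(Stmt.substr(0, EndCol) + "X");
  std::size_t Pos = EndCol;
  while (Pos < Stmt.size()) {
    // Continuation lines are blank up to column 15.
    std::string Line(ContinueCol - 1, ' ');
    Line += Stmt.substr(Pos, Chunk);
    Pos += Chunk;
    if (Pos < Stmt.size()) {
      // More text follows, so this line needs its own indicator.
      Line.resize(EndCol, ' ');
      Line += 'X';
    }
    Lines.push_back(Line);
  }
  return Lines;
}
```

Even in this simplified form, the end-of-line handling has to know the current column and the remaining text, which is more than the current EmitEOL tracks.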

  7. Uses on other platforms:

There are functions in MCAsmStreamer that are specific to certain object formats, e.g. COFF/XCOFF/ELF. Introducing the additional framework in TargetRegistry allows other platforms to also have their own platform-specific AsmStreamer, which could be beneficial for multiple platforms.

I have also seen comments wishing for additional MCAsmStreamer variants for other .s formats, e.g. from NVPTXAsmPrinter.h:

// The ptx syntax and format is very different from that usually seem in a .s
// file,
// therefore we are not able to use the MCAsmStreamer interface here.
//
// We are handcrafting the output method here.
//
// A better approach is to clone the MCAsmStreamer to a MCPTXAsmStreamer
// (subclass of MCStreamer).
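The registration idea under "Uses on other platforms" can be sketched in isolation as a factory lookup with a GNU-style default. Everything below is a standalone mock: StreamerRegistry, registerAsmStreamer, and the two streamer classes are hypothetical names for illustration and do not reflect the actual TargetRegistry interface or the changes in the PR.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-in for a streamer interface; not LLVM's MCStreamer.
struct AsmStreamerBase {
  virtual ~AsmStreamerBase() = default;
  virtual std::string emitLabel(const std::string &Name) const = 0;
};

// Default behaviour, analogous to the existing GNU-style output.
struct GNUStyleStreamer : AsmStreamerBase {
  std::string emitLabel(const std::string &Name) const override {
    return Name + ":";
  }
};

// Target-specific behaviour, analogous to an HLASM streamer.
struct HLASMStyleStreamer : AsmStreamerBase {
  std::string emitLabel(const std::string &Name) const override {
    return Name + " DS 0H";
  }
};

// Hypothetical registry: targets may register a factory for their own
// streamer; targets that register nothing get the default one.
class StreamerRegistry {
  using Factory = std::function<std::unique_ptr<AsmStreamerBase>()>;
  std::map<std::string, Factory> Factories;

public:
  void registerAsmStreamer(const std::string &Target, Factory F) {
    assert(F && "factory must be callable");
    Factories[Target] = std::move(F);
  }

  std::unique_ptr<AsmStreamerBase> create(const std::string &Target) const {
    auto It = Factories.find(Target);
    if (It != Factories.end())
      return It->second();
    return std::make_unique<GNUStyleStreamer>();
  }
};
```

The point of the design is that existing targets are untouched: only a target that opts in by registering a factory gets a different streamer.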

For reference, below is an example of HLASM output for a simple “hello world” program, illustrating how it differs from GNU assembly.

a2.c:

extern void printf(const char *, ...);

void foo() {
  printf("hello world\n");
}

HLASM:

A2       CSECT                                                           000000
A2       AMODE 64                                                        000000
A2       RMODE ANY                                                       000000
         SYSSTATE ARCHLVL=2,AMODE64=YES                                  000000
* extern void printf(const char *, ...);                                 000001
*                                                                        000002
* void foo() {                                                           000003
         J     FOO                                                       000003
@@PFD@@  DC    XL8'00C300C300D50000'   Prefix Data Marker                000003
         DC    CL8'20250307'           Compiled Date YYYYMMDD            000003
         DC    CL6'142456'             Compiled Time HHMMSS              000003
         DC    XL4'42040000'           Compiler Version                  000003
         DC    XL2'0000'               Reserved                          000003
         DC    BL1'01000000'           Flag Set 1                        000003
         DC    BL1'00000000'           Flag Set 2                        000003
         DC    BL1'00000000'           Flag Set 3                        000003
         DC    BL1'00000000'           Flag Set 4                        000003
         DC    XL4'00000000'           Reserved                          000003
         ENTRY FOO                                                       000003
FOO      AMODE 64                                                        000003
         DC    0FD                                                       000003
         DC    XL8'00C300C300D50100'   Function Entry Point Marker       000003
         DC    A(@@FPB@1-*+8)          Signed offset to FPB              000003
         DC    XL4'00000000'           Reserved                          000003
FOO      DS    0FD                                                       000003
         STMG  14,11,8(13)                                               000003
         LGR   10,0                                                      000003
         LGR   15,13                                                     000003
         LG    13,136(,13)                                               000003
         STG   15,128(,13)                                               000003
@@BGN@1  DS    0H                                                        000003
         LLILH 9,X'C6F4'                                                 000003
         OILL  9,X'E2C1'                                                 000003
         ST    9,4(,13)                                                  000003
         LGHI  9,168                                                     000003
         ALGR  9,13                                                      000003
         STG   9,#NAB_1-@@AUTO@1(,13)                                    000003
         LGR   0,10                                                      000003
         USING @@AUTO@1,13                                               000003
         LARL  3,@@LIT@1                                                 000003
         USING @@LIT@1,3                                                 000003
         STG   0,#WSA_1                                                  000000
         LARL  11,@@CONST@AREA@@                                         000000
*   printf("hello world\n");                                             000004
         LG    0,#WSA_1                                                  000004
         LLGT  15,=V(PRINTF)                                             000004
         LGR   14,11                                                     000004
         LA    1,152(,13)              #MX_TEMP1                         000004
         STG   14,152(0,13)            #MX_TEMP1                         000004
         MVC   136(8,13),#NAB_1                                          000004
         BASR  14,15                                                     000004
* }                                                                      000005
@1L1     DS    0H                                                        000005
@1L1     DS    0H                                                        000005
         DROP                                                            000005
         LG    13,128(,13)                                               000005
         LG    14,8(,13)                                                 000005
         LMG   1,11,32(13)                                               000005
         BR    14                                                        000005
         DS    0FD                                                       000005
@@LIT@1  LTORG                                                           000000
@@FPB@   LOCTR                                                           000000
@@FPB@1  DS    0FD                     Function Property Block           000000
         DC    XL2'CCD5'               Eyecatcher                        000000
         DC    BL2'1111111111110011'   Saved GPR Mask                    000000
         DC    A(@@PFD@@-@@FPB@1)      Signed Offset to Prefix Data      000000
         DC    BL1'10000000'           Flag Set 1                        000000
         DC    BL1'10000001'           Flag Set 2                        000000
         DC    BL1'00000000'           Flag Set 3                        000000
         DC    BL1'00000001'           Flag Set 4                        000000
         DC    XL4'00000000'           Reserved                          000000
         DC    XL4'00000000'           Reserved                          000000
         DC    AL2(3)                  Function Name                     000000
         DC    C'foo'                                                    000000
A2       LOCTR                                                           000000
         EJECT                                                           000000
@@AUTO@1 DSECT                                                           000000
         DS    21FD                                                      000000
         ORG   @@AUTO@1                                                  000000
#GPR_SA_1 DS   18FD                                                      000000
#NAB_1   DS    FD                                                        000000
         ORG   @@AUTO@1+160                                              000000
#WSA_1   DS    XL8                                                       000000
         EJECT                                                           000000
A2       CSECT ,                                                         000000
@@CONST@AREA@@ DS 0D                                                     000000
         DC    XL13'888593939640A6969993841500'                          000000
         END   ,(5650ZOS   ,2400,25066)                                  000000

In conclusion, we feel the current MCAsmStreamer implementation is insufficient to fully support HLASM output on z/OS. Giving targets the ability to register their own custom AsmStreamer is therefore needed to achieve the functionality we want.