Apple's GCC and .s/.S files in llvm-test (fwd)

Hello,

Apple's GCC does not make the distinction between .s and .S files and
always runs the preprocessor. From the man page:

> file.s
> Assembler code. Apple's version of GCC runs the preprocessor on these
> files as well as those ending in .S.
>
> file.S
> Assembler code which must be preprocessed.

Yes. The reason for this is that MacOS supports some non-case-sensitive filesystems, so .s and .S are not in general distinguishable.

The problem is that sometimes llc generates comments in the assembly
that look like this for x86:

  ...
       pushl %esi
  ...
  # implicit-def: EDI
  ...
       ret

The comment line is perfectly valid for the assembler, but the
preprocessor does not like it because it tries to interpret it as a
preprocessing directive. I can see it happening, for example, if
-std=c99 is set in the CFLAGS (that's the case in
SingleSource/Regression/C++):

$ gcc --std=c99 -o t t.s
t.s:5:4: error: invalid preprocessing directive #implicit

One solution is to force the language to be assembler (and not
assembler-with-cpp) on Darwin; that's what the attached patch does, but
maybe there is a nicer solution?
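
For reference, GCC's -x option overrides the language it would otherwise infer from the file extension: "assembler" skips the preprocessor and "assembler-with-cpp" runs it. Below is a minimal sketch of building such a command line. The helper name assemble_command is hypothetical, and this is not the attached patch (which is not reproduced here), only an illustration of the idea.

```python
def assemble_command(source, output, preprocess=False, cc="gcc"):
    """Build a compiler command line that names the assembly language explicitly.

    Hypothetical helper for illustration only. "assembler" and
    "assembler-with-cpp" are the language names GCC accepts after -x.
    """
    language = "assembler-with-cpp" if preprocess else "assembler"
    # -x applies to the input files that follow it, so it goes before `source`.
    return [cc, "-x", language, "-o", output, source]


if __name__ == "__main__":
    # Print the command instead of running it, so no compiler is required.
    print(" ".join(assemble_command("t.s", "t")))
```

Stating the language explicitly takes the file extension (and therefore the filesystem's case handling) out of the decision.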

In general I think we want llvm-gcc's output .s files to be acceptable as input to gcc, so the right thing is to change the x86 asm printer so it doesn't generate these comments. I don't see a good way to do full-line comments that works both if you run the preprocessor and if you don't. (Using #pragma works since compilers are required to ignore unknown pragmas, but nobody would call that good.) If we're willing to build in the assumption that the preprocessor will be run, // or /**/ comments work. Or attaching them to the end of the previous line works.

[...]

> I don't see a good way to do
> full-line comments that works both if you run the preprocessor and if
> you don't.

Could you use "##" instead of "#"?

  Daveed

I'm skeptical. The case of the extension comes from the user, and the user _knows_ what language the file is in. The user can be obligated to get the case right.

Also, on Darwin, filenames are case-preserving, meaning the filesystem can even track the language directly as encoded by the filename: .s means no cpp, .S means run cpp.

I'd be tempted to eject this and see what breaks, then get them to fix it.

Pragmatically, that works (as I'm sure you know). Digging into the legalities of C99 I'm not sure that it's guaranteed to work, though. Unknown directives actually match the "non-directive" case in the grammar in 6.10; while nothing is said anywhere about semantics that I can find, I'm not sure why gcc feels this should be a hard error at all....

"##" is a punctuator (6.4.6) and therefore a preprocessing token of its own (6.4/1).

A line that starts with "##" is therefore a text-line in 6.10/1 parlance (i.e., it doesn't match the "# non-directive" rule), and so yes, I think it's guaranteed to work on the preprocessor side of things. (I know next to nothing about the assembler side of things.)
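
The distinction above can be sketched as a toy line classifier. This is a deliberately simplified model, not a real preprocessor: it ignores comments, digraphs, and line splicing, and the function name classify_line is made up for illustration. It only shows that checking the longer punctuator first makes a leading "##" a single token, so the line is not treated as starting with "#".

```python
def classify_line(line):
    """Roughly classify a line the way the C99 6.10/1 grammar groups lines.

    Returns "directive-like" when the first preprocessing token is '#'
    (a directive or a "# non-directive" line), otherwise "text-line".
    Simplified model: comments, digraphs and line splicing are ignored.
    """
    stripped = line.lstrip(" \t")
    # Maximal munch: try the longer punctuator first, so "##" is one token.
    if stripped.startswith("##"):
        return "text-line"
    if stripped.startswith("#"):
        return "directive-like"
    return "text-line"


if __name__ == "__main__":
    for sample in ("  # implicit-def: EDI", "  ## implicit-def: EDI", "       ret"):
        print(repr(sample), "->", classify_line(sample))
```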

I don't know either why GCC faults non-directive "# ..." cases. Maybe it's a C89 leftover, or maybe it has to do with the older "# <line-" forms.

  Daveed

Ah right, ## is still a token although it can appear only in restricted places...ok, now I like this. This is documented to work in the assembler, so it looks good. Thanks.

I think that this is an extremely clever hack, and is nice and simple. It works because ## is a token that just gets passed through when not in a macro replacement list. Because of maximal munch, it prevents the preprocessor from seeing it as a #.
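
As an illustration of the effect of switching the comment leader, here is a small sketch that rewrites full-line "#" comments as "##" in already-generated assembly text. It assumes "#" starts a comment for the target assembler, as in the x86 output quoted earlier in this thread, and the function name double_hash_comments is hypothetical. The fix under discussion would be made in the asm printer itself; this post-processing pass only demonstrates the transformation.

```python
def double_hash_comments(asm_text):
    """Rewrite full-line '#' comments as '##' comments.

    Only lines whose first non-blank character is a single '#' are changed;
    end-of-line comments and lines already starting with '##' are left alone.
    Intended for compiler-generated output, not for hand-written .S files
    that contain real preprocessor directives.
    """
    out = []
    for line in asm_text.splitlines(keepends=True):
        body = line.lstrip(" \t")
        if body.startswith("#") and not body.startswith("##"):
            indent = line[: len(line) - len(body)]
            line = indent + "#" + body
        out.append(line)
    return "".join(out)


if __name__ == "__main__":
    sample = "\tpushl %esi\n  # implicit-def: EDI\n\tret\n"
    print(double_hash_comments(sample), end="")
```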

Dale, do you see any problem with changing the "comment character" to "##"?

-Chris