Language not recognized: 'c++-module-cpp-output'

Hi,

$ clang++-5.0 --version
clang version 5.0.0-svn304373-1~exp1 (trunk)

$ clang++-5.0 -std=c++1z -fmodules-ts -D__cpp_modules=201704 --precompile \
  -o hello.pcm.o -c -x c++-module-cpp-output hello.mxx
  
clang: error: language not recognized: 'c++-module-cpp-output'

This language name was there at some point[1]. Maybe it was (accidentally)
removed by this[2] commit?

[1] clang/Types.def at master · llvm-mirror/clang · GitHub
[2] r300611 - [modules-ts] Fold together -x c++ and -x c++-module at -cc1 level.

Thanks,
Boris

Doesn’t look like this was ever tested, so I’m not sure what it was for. I’m guessing Richard didn’t consider it necessary when he removed it…

So could you outline (show a small/complete example with source, commands, etc) what you’re trying to do? You might be able to take a look at Clang’s test cases to see what’s supported/implemented right now.

Hi David,

David Blaikie <dblaikie@gmail.com> writes:

So could you outline (show a small/complete example with source, commands,
etc) what you're trying to do?

In build2 (build system I am working on) we have the ability to specify
that a source file is preprocessed to a certain degree. This is used to
speed things up. While currently having translation units that no longer
require preprocessing is not very common, with modules (and modularized
standard library) this situation will become a lot more plausible. So we
want to be ready.

If the user says that certain TUs are fully preprocessed, then build2
passes this information on to the compiler via the -x option. So for C
we say the source is '-x cpp-output' and for C++ -- '-x c++-cpp-output'.

Now in Clang module interface units have to be compiled as '-x c++-module',
not '-x c++'. So we also need '-x c++-module-cpp-output'.
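To make the mapping described above concrete, here is a minimal sketch of how a build system might pick the -x value from the source language and a "fully preprocessed" flag. The names `Lang` and `x_value` are hypothetical and this is not build2's actual code. The last value, 'c++-module-cpp-output', is the one the Clang build at the top of this thread rejects.

```cpp
#include <cassert>  // only for the usage checks mentioned below
#include <string>

// Hypothetical classification of a translation unit.
enum class Lang { c, cxx, cxx_module };

// Returns the value to pass after -x for the given language, depending on
// whether the user declared the source to be fully preprocessed.
std::string x_value(Lang lang, bool preprocessed) {
  switch (lang) {
    case Lang::c:
      return preprocessed ? "cpp-output" : "c";
    case Lang::cxx:
      return preprocessed ? "c++-cpp-output" : "c++";
    case Lang::cxx_module:
      // The preprocessed variant is the name this thread asks about; the
      // Clang version shown above does not recognize it.
      return preprocessed ? "c++-module-cpp-output" : "c++-module";
  }
  return "c";  // unreachable fallback to keep compilers quiet
}
```

For example, `x_value(Lang::cxx, true)` yields "c++-cpp-output" and `x_value(Lang::cxx_module, false)` yields "c++-module".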

BTW, it is not clear why c++-module is needed (and it is needed -- I tried
to compile as just c++ and things didn't go well). Perhaps now that Clang
uses the 'export module M;' syntax c++-module is no longer necessary?

Thanks,
Boris

Hi David,

David Blaikie <dblaikie@gmail.com> writes:

So could you outline (show a small/complete example with source, commands,
etc) what you’re trying to do?

In build2 (build system I am working on) we have the ability to specify
that a source file is preprocessed to a certain degree. This is used to
speed things up.

Does this actually speed up clang in an observable way?

While currently having translation units that no longer
require preprocessing is not very common, with modules (and modularized
standard library) this situation will become a lot more plausible. So we
want to be ready.

What makes it more likely for modularized code? In the sense that many more source files won’t /need/ to use the preprocessor, so they’re sort of quasi-preprocessed (trivially, in the sense that there’s no work to do) already?

Hi David,

David Blaikie <dblaikie@gmail.com> writes:

> So could you outline (show a small/complete example with source, commands,
> etc) what you're trying to do?

In build2 (build system I am working on) we have the ability to specify
that a source file is preprocessed to a certain degree. This is used to
speed things up. While currently having translation units that no longer
require preprocessing is not very common, with modules (and modularized
standard library) this situation will become a lot more plausible. So we
want to be ready.

If the user says that certain TUs are fully preprocessed, then build2
passes this information on to the compiler via the -x option. So for C
we say the source is '-x cpp-output' and for C++ -- '-x c++-cpp-output'.

Now in Clang module interface units have to be compiled as '-x c++-module',
not '-x c++'. So we also need '-x c++-module-cpp-output'.

What difference are you expecting the -cpp-output to make to the
compilation? For Clang, it does almost exactly nothing, except that we'll
consider the name from the GNU linemarker directive on the first line of
the file (if any) to be its "real" name.
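For readers unfamiliar with GNU linemarkers: in preprocessed output they look like `# 1 "hello.mxx"`, optionally followed by numeric flags. The sketch below only illustrates that textual format by extracting the file name from such a first line. The function name `linemarker_file` is hypothetical, escaped quotes inside the name are not handled, and this is not how Clang implements it.

```cpp
#include <cassert>  // only for the usage checks mentioned below
#include <cctype>
#include <cstddef>
#include <string>

// Returns the file name from a GNU linemarker of the form
//   # <line-number> "<file-name>" [flags...]
// or an empty string if the line is not a linemarker.
// Assumption: the file name contains no escaped double quotes.
std::string linemarker_file(const std::string& line) {
  std::size_t i = 0, n = line.size();
  if (i >= n || line[i] != '#') return "";
  ++i;
  while (i < n && line[i] == ' ') ++i;

  // A linemarker has a decimal line number right after the '#'.
  std::size_t digits = i;
  while (i < n && std::isdigit(static_cast<unsigned char>(line[i]))) ++i;
  if (i == digits) return "";  // e.g. "#include ..." ends up here

  while (i < n && line[i] == ' ') ++i;
  if (i >= n || line[i] != '"') return "";

  std::size_t close = line.find('"', i + 1);
  if (close == std::string::npos) return "";
  return line.substr(i + 1, close - i - 1);
}
```

With this, `linemarker_file("# 1 \"hello.mxx\"")` gives "hello.mxx", while an ordinary first line such as `int x;` gives an empty string.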

We currently do not support a user-specified -cpp-output file type for any
of the formats which we support precompiling (*-header, *-module). We
certainly could if there's a good reason.

BTW, it is not clear why c++-module is needed (and it is needed -- I tried
to compile as just c++ and things didn't go well). Perhaps now that Clang
uses the 'export module M;' syntax c++-module is no longer necessary?

It's necessary because (a) it instructs clang to build a different tool
pipeline, with a precompile stage rather than a stage that generates an
object file, and (b) under the Modules TS, the "module X;" directive has
different semantics in a module interface unit versus in a module
implementation unit -- in the latter case, it acts as a souped up form of
module import in addition to entering the semantic context of the module.

It's possible that (b) will change prior to the publication of the TS, but
we'd still need to know whether we're entering a module interface unit for
(a).

David Blaikie <dblaikie@gmail.com> writes:

Does this actually speed up clang in an observable way?

I haven't measured, but I very much doubt there will be any significant
speedup if you are going all the way to the object code, so to speak.
The benefit will come if you want to send the file for compilation on
a remote host. In this case you won't need to start the compiler locally
at all.

What makes it more likely for modularized code? In the sense that many more
source files won't /need/ to use the preprocessor, so they're sort of
quasi-preprocessed (trivially, in the sense that there's no work to do)
already?

Yes, exactly. They won't need #include's and quite a bit of code could
probably be written without relying on macros (especially so for module
interface units).

One holdout is assert(). Maybe it can be done as a compiler intrinsic?

Boris

Richard Smith <richard@metafoo.co.uk> writes:

What difference are you expecting the -cpp-output to make to the
compilation?

Make it a lot faster? :wink:

Seriously, though, in my case I know that certain TUs do not use the
preprocessor. I do some optimizations at the build system level but
also want to pass this information along to the compiler in case it
wants to do some as well. It may not do any currently (e.g., because
such preprocessed TUs are virtually non-existent and it's not worth the
effort). But maybe it makes sense to "reserve" the name even if it's
just an alias for 'c++-module'. But I am also happy to always pass
'c++-module' (and perhaps 'c++' for consistency).

Also, on the topic of expectations: I assume that such a preprocessed
TU can still contain comments and line continuations. Clang's -cpp-output
handles this but not GCC's (where I use -fdirectives-only to indicate
that the source is only partially preprocessed).

Boris

David Blaikie <dblaikie@gmail.com> writes:

Does this actually speed up clang in an observable way?

I haven’t measured, but I very much doubt there will be any significant
speedup if you are going all the way to the object code, so to speak.
The benefit will come if you want to send the file for compilation on
a remote host. In this case you won’t need to start the compiler locally
at all.

Not sure I follow this - that’s a benefit to the build system in knowing whether it needs to preprocess the file before sending it to the remote system. But not a benefit to the compiler/clang.

What makes it more likely for modularized code? In the sense that many more
source files won’t /need/ to use the preprocessor, so they’re sort of
quasi-preprocessed (trivially, in the sense that there’s no work to do)
already?

Yes, exactly. They won’t need #include’s and quite a bit of code could
probably be written without relying on macros (especially so for module
interface units).

One holdout is assert(). Maybe it can be done as a compiler intrinsic?

I don’t think it can, but maybe - the issue being that the contents of the assert expression mustn’t be evaluated or even parsed, etc, if assertions are not enabled.
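A small sketch of the behaviour being described, relying only on the standard rule that <cassert> redefines assert according to NDEBUG each time it is included. The helper names are hypothetical. With NDEBUG defined, the argument is dropped by the preprocessor, so it is neither evaluated nor parsed as an expression, which is why an undeclared name is accepted there.

```cpp
#undef NDEBUG
#include <cassert>  // assertions enabled from here on

int calls = 0;
bool touch() { ++calls; return true; }

// With assertions enabled the argument is evaluated once.
int count_with_assert_enabled() {
  calls = 0;
  assert(touch());
  return calls;
}

#define NDEBUG
#include <cassert>  // re-inclusion redefines assert as a no-op

// With NDEBUG the argument disappears during preprocessing.
int count_with_assert_disabled() {
  calls = 0;
  assert(touch());              // never evaluated
  assert(undeclared_name > 0);  // never parsed as an expression either
  return calls;
}

#undef NDEBUG
#include <cassert>  // restore enabled assertions for any code that follows
```

Here `count_with_assert_enabled()` returns 1 and `count_with_assert_disabled()` returns 0. A compiler intrinsic that had to parse its operand would reject the second assert in the disabled case.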

Richard Smith <richard@metafoo.co.uk> writes:

What difference are you expecting the -cpp-output to make to the
compilation?

Make it a lot faster? :wink:

Seriously, though, in my case I know that certain TUs do not use the
preprocessor. I do some optimizations at the build system level but
also want to pass this information along to the compiler in case it
wants to do some as well. It may not do any currently (e.g., because
such preprocessed TUs are virtually non-existent and it’s not worth the
effort). But maybe it makes sense to “reserve” the name even if it’s
just an alias for ‘c++-module’. But I am also happy to always pass
‘c++-module’ (and perhaps ‘c++’ for consistency).

Also, on the topic of expectations: I assume that such a preprocessed
TU can still contain comments and line continuations. Clang’s -cpp-output
handles this but not GCC’s (where I use -fdirectives-only to indicate
that the source is only partially preprocessed).

As an aside - Clang has -frewrite-includes that might be sufficiently similar for your needs. If you don’t use this (& use full preprocessing) you will find that Clang’s diagnostics differ between a preprocessed file being compiled and an unpreprocessed file being compiled (Clang uses macro information to decide whether to warn in some cases).

I think you misunderstood my question. What difference do you expect the -x c++-cpp-output / -x c++-module-cpp-output flag to make, compared to passing -x c++ / -x c++-module for the same input file? What do you think that flag does?

David Blaikie <dblaikie@gmail.com> writes:

Not sure I follow this - that's a benefit to the build system in knowing
whether it needs to preprocess the file before sending it to the remote
system. But not a benefit to the compiler/clang.

A compiler could optimize things if it knows that it doesn't need to do
macro expansion, handle #include's, etc. I agree it may not be worth it
right now. But I am wondering if it makes sense to keep the door open for
this kind of optimization in the future.

I don't think it can, but maybe - the issue being that the contents of the
assert expression mustn't be evaluated or even parsed, etc, if assertions
are not enabled.

I don't think it will affect a lot of code if it is parsed and discarded
when disabled. One will have to do things differently based on NDEBUG to
craft something like this. Which will then disqualify it from this "already
preprocessed" mode.

In fact, I would very much like to have a version of assert() that always
evaluates the condition.
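As an illustration of what an always-evaluating check could look like without any macro, here is a minimal sketch. The name `verify` and the choice to throw std::logic_error are assumptions made for the example, not an existing facility. Because the condition is an ordinary function argument, it is evaluated regardless of NDEBUG.

```cpp
#include <cassert>    // only for the usage checks mentioned below
#include <stdexcept>

// Hypothetical always-on check: the condition is a normal function
// argument, so it is parsed and evaluated in every build configuration.
inline void verify(bool condition, const char* message) {
  if (!condition) throw std::logic_error(message);
}
```

For example, `verify(++n == 1, "n incremented")` increments `n` in both debug and release builds, and `verify(false, "...")` always throws. The trade-off compared to assert() is that the source text of the condition and the file/line are not captured automatically.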

Boris

David Blaikie <dblaikie@gmail.com> writes:

As an aside - Clang has -frewrite-includes that might be sufficiently
similar for your needs. If you don't use this (& use full preprocessing)
you will find that Clang's diagnostics differ between a preprocessed file
being compiled and an unpreprocessed file being compiled (Clang uses macro
information to decide whether to warn in some cases).

Yes, I use -frewrite-includes to partially preprocess before compiling.

The difference between -frewrite-includes and GCC's -fdirectives-only
is that for the latter you pass this option both when preprocessing (to
indicate that only partial preprocessing is required) and when compiling
(to indicate that the source is already partially preprocessed).

Boris

Richard Smith <richard@metafoo.co.uk> writes:

I think you misunderstood my question. What difference do you expect the -x
c++-cpp-output / -x c++-module-cpp-output flag to make, compared to passing
-x c++ / -x c++-module for the same input file? What do you think that flag
does?

I believe that -x *cpp-output tells the compiler that what's being
compiled has previously gone through -E. At least that's GCC's
semantics.

It is also now clear that this semantics is pretty useless since it
loses information and that's why we have -frewrite-includes and
-fdirectives-only.

Now if you ask me what I would like -x *cpp-output to mean, it would
be this: The input can still contain comments and line continuations
but no macro expansions/conditions or #include directives.

My understanding is that the preprocessor is essentially a tokenizer
for the compiler frontend. So in this model a compiler could
substitute a "full preprocessor" with a simpler and maybe faster
tokenizer.
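To give a rough idea of what the "simpler tokenizer" would still have to do before producing tokens, here is a minimal sketch of the two remaining text-level steps: joining lines that end in a backslash and replacing comments with a space. It is neither build2's lexer nor Clang's. The function names are hypothetical, and raw string literals and digit separators (1'000) are deliberately not handled.

```cpp
#include <cassert>  // only for the usage checks mentioned below
#include <cstddef>
#include <string>

// Delete every backslash that is immediately followed by a newline.
std::string splice_lines(const std::string& in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] == '\\' && i + 1 < in.size() && in[i + 1] == '\n') {
      ++i;  // skip the newline as well
      continue;
    }
    out += in[i];
  }
  return out;
}

// Replace each // or /* */ comment with a single space. String and
// character literals are copied verbatim so "//" inside them is kept.
// Assumption: no raw string literals and no digit separators.
std::string strip_comments(const std::string& in) {
  std::string out;
  std::size_t i = 0, n = in.size();
  while (i < n) {
    char c = in[i];
    if (c == '"' || c == '\'') {
      char quote = c;
      out += in[i++];  // opening quote
      while (i < n && in[i] != quote) {
        if (in[i] == '\\' && i + 1 < n) out += in[i++];  // keep escape
        out += in[i++];
      }
      if (i < n) out += in[i++];  // closing quote
    } else if (c == '/' && i + 1 < n && in[i + 1] == '/') {
      while (i < n && in[i] != '\n') ++i;  // the newline itself is kept
      out += ' ';
    } else if (c == '/' && i + 1 < n && in[i + 1] == '*') {
      i += 2;
      while (i + 1 < n && !(in[i] == '*' && in[i + 1] == '/')) ++i;
      i = (i + 1 < n) ? i + 2 : n;  // unterminated comment runs to the end
      out += ' ';
    } else {
      out += in[i++];
    }
  }
  return out;
}
```

Calling `strip_comments(splice_lines(text))` applies the steps in the order the language requires, so a // comment continued with a backslash swallows the following line as well.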

For what it's worth, I've implemented such a tokenizer in build2[1]
and it turned out not too hairy. We use[2] it to extract module
information from translation units.

Its performance is about the same as Clang's full preprocessor (-E)
which I think is not bad considering I haven't done any optimization
work and it uses std::istream to read the data.

[1] https://git.build2.org/cgit/build2/tree/build2/cc/lexer.hxx
    https://git.build2.org/cgit/build2/tree/build2/cc/lexer.cxx

[2] https://git.build2.org/cgit/build2/tree/build2/cc/parser.hxx
    https://git.build2.org/cgit/build2/tree/build2/cc/parser.cxx

Boris

Boris Kolpackov <boris@codesynthesis.com> writes:

David Blaikie <dblaikie@gmail.com> writes:

> I don't think it can, but maybe - the issue being that the contents of the
> assert expression mustn't be evaluated or even parsed, etc, if assertions
> are not enabled.

I don't think it will affect a lot of code if it is parsed and discarded
when disabled. One will have to do things differently based on NDEBUG to
craft something like this. Which will then disqualify it from this "already
preprocessed" mode.

In fact, I would very much like to have a version of assert() that always
evaluates the condition.

It was pointed out to me off-list that there is a proposal[1] that would
provide the assert attribute as an alternative to the assert() macro:

[[assert: foo != 0 && bar == 0]];

[1] Support for contract based programming in C++

Boris

Richard Smith <richard@metafoo.co.uk> writes:

> I think you misunderstood my question. What difference do you expect the -x
> c++-cpp-output / -x c++-module-cpp-output flag to make, compared to passing
> -x c++ / -x c++-module for the same input file? What do you think that flag
> does?

I believe that -x *cpp-output tells the compiler that what's being
compiled has previously gone through -E. At least that's GCC's
semantics.

It is also now clear that this semantics is pretty useless since it
loses information and that's why we have -frewrite-includes and
-fdirectives-only.

Yes. It's also pretty much pointless since the point of -E is to produce a
source file that is still a valid input in the original language.

Now if you ask me what I would like -x *cpp-output to mean, it would
be this: The input can still contain comments and line continuations
but no macro expansions/conditions or #include directives.

My understanding is that the preprocessor is essentially a tokenizer
for the compiler frontend. So in this model a compiler could
substitute a "full preprocessor" with a simpler and maybe faster
tokenizer.

For us at least, this would add complexity (by adding a "no preprocessing"
mode) and likely not actually bring about any performance improvement --
the additional checks for "does this identifier have a defined macro" and
"is this a # at the start of a line" are extremely cheap. Plus, as you
mentioned above, this actually isn't what you want -- for compilers like
Clang (and recent versions of GCC) that take into account the provenance of
tokens (via macro expansion etc) when issuing diagnostics, preprocessing
prior to compilation proper harms the quality of experience of your users.

For what it's worth, I've implemented such a tokenizer in build2[1]
and it turned out not too hairy. We use[2] it to extract module
information from translation units.

Its performance is about the same as Clang's full preprocessor (-E)
which I think is not bad considering I haven't done any optimization
work and it uses std::istream to read the data.

That's impressive, considering that we have done a lot of tuning on our
preprocessor. Perhaps it's time for us to stare at some profiles again and
see where we're wasting time.

Hi Richard,

Richard Smith <richard@metafoo.co.uk> writes:

For us at least, this would add complexity (by adding a "no preprocessing"
mode) and likely not actually bring about any performance improvement --
the additional checks for "does this identifier have a defined macro" and
"is this a # at the start of a line" are extremely cheap.

Yes, you are probably right.

Plus, as you mentioned above, this actually isn't what you want -- for
compilers like Clang (and recent versions of GCC) that take into account
the provenance of tokens (via macro expansion etc) when issuing diagnostics,
preprocessing prior to compilation proper harms the quality of experience
of your users.

The scenario I had in mind for -cpp-output is translation units that no
longer use the preprocessor, not previously-preprocessed (to a certain
degree) units. Think of a C++ source file that uses module imports
instead of #include's, [[assert:]] instead of assert(), etc.

Boris

I think such a mode could be reasonable, especially in a post-Modules-TS
world, for those people who (rationally or irrationally) are concerned
about accidental macro expansion. That said, it makes more sense to me to
control that with a flag like -fno-cpp rather than a -x value.