Custom C++ extension

Hey everyone on cfe-dev

New list member but an old admirer of your great work here.

I am planning the implementation of a set of C++ language extensions which we hope will be used in a [non-compiler-related] research project at the University of Cambridge next year.

Naturally, I chose Clang to do this. However, I have an "engineering" question.

The extensions are, as you would imagine, not part of the standard, and I'd like to add the word "yet", hehe.

Given the potential invasiveness of some of the features and the pace of change in the code base, I don't know how I can keep up with that pace. Have such requirements been taken into account in the internal design of the front end, by any chance?

Basically, I do not want to be in a position where we have a working version of the extensions next year, but based on a version of the front end that is a year old.

It may help to give you a flavour of what some of these changes may look like.

As an example, the extension would allow us to "tag" some identifiers in the code as follows:

class CLXYZ {
public:
  int <tag1> x;
};

The identifier tagged in this way is later picked up in one of the phases and operated on.

So what is the best practice here, other than rebasing every so often?

Appreciate your input and advice.
- Ramin

(PS: not so related to my question, but we call this the "tagged-programming" paradigm. Actually, we hope that the combination of these extensions can eventually help programmers in scenarios such as the very question I am asking above!)

I guess it all depends on how localized your changes are. But you pretty much answered your own question: develop against trunk and merge often. I remember a talk from someone at Embarcadero complaining about the pain of not doing this.

There will always be unexpected API changes, and there's no way around that; you will have to use the API, of course.

If you want to follow top of tree, you need to set up some kind of automerging environment that merges with master every night. In the morning, if you are lucky, everything works; if you are unlucky, somebody changed the number of parameters of the constructor of some class you use, and you will need to fix that to continue the merge.
Some changes are more subtle: the API remains the same, but the semantics of some parameter change, and you will probably get test failures whose cause you will have to figure out.

In my experience, doing it daily is the best way to keep it going. There are bad periods and good periods with respect to changes. But if you leave it unmerged for two or more months and then try to merge, expect to find yourself in a lot of pain... (experience again :P)

Of course, if you are working alone and are diligent enough, you can do it manually every day, but setting up something automatic always helps you keep up with your initial plan :)

Cheers,
Marcello

I have my own Pascal compiler front end, so I need to keep up with changes in the LLVM API, but not the Clang changes. I don't do what Marcello suggests, not because I think it's a bad idea, just because I haven't made the effort to set it up.

I find that the API breaks once in a while: the other day, when I moved to the latest version, I found that IRBuilder::CreateCall{2,3,4,5} had been removed, so I needed to change that bit of code.

My experience, however, is that things don't change that often.

> As an example, the extension would allow us to "tag" some identifiers
> in the code as follows:
>
> class CLXYZ { public: int <tag1> x; };
>
> The identifier tagged as such is later picked at one of the phases
> and operated on.

For this particular example, have you considered using __attribute__ syntax [1], [2]?

i.e. something like:

   class CLXYZ { public: int x __attribute__((tag("1"))); };

If you're able to leverage existing infrastructure, that will
significantly reduce your patch burden and make following trunk _much_ easier.

> (ps, not so related to my question but we call this
> "tagged-programming" paradigm. Actually we hope that the combination
> of these extensions can eventually help programmers in scenarios as
> the very question I am asking above!).

How so?

Cheers,

Jon

[1] Variable Attributes - Using the GNU Compiler Collection (GCC)

[2] http://clang.llvm.org/docs/InternalsManual.html#how-to-add-an-attribute

Everyone, thank you for the very helpful comments and replies.

To answer a few of the questions:

@Nikola and @Marcello:

> develop against trunk and merge often
> setting up something automatic always helps

I do have automatic update scripts that help me rebase git repositories.
I am not sure how welcome git patches are in the community, but that is a longer-term concern than anything else.

@Jonathan

> have you considered using __attribute__

I am aware of the syntax; however, the idea behind this work goes far beyond simple tagging.
So that wouldn't help, and even if it did, it would make the code hard to read.

> that will significantly reduce your patch burden

I do agree :)

> How so?

It is a bit tricky to explain in a few statements, really; I made it sound too simple.
You would need to rewrite code to be able to do this, so I didn't mean “magic”.

The basic idea is to create template-like constructs that let the compiler emit code at
various “tagged” locations in the source code. This is something C++ templates
lack: the ability to locate “things” in the code in order to do their job.
I imagine that, if we told the compiler what a “C++ [sub]statement”
looked like in the Clang source code, then hypothetically we could tell the compiler
to patch the Clang source to introduce new parts to [some] “C++ statements”...

Anyway, I actually meant to post another question on a new thread regarding
source-to-source or source-to-binary compilation for this,
but I may as well ask it here:

My original prototype which didn’t use Clang produced standard C++ (source to source).
I would still like to have that option. However, I would also like to be able to go from
the source directly to the backend.
I think if I implement them the way C++’s Templates are implemented
in Clang then I would get the source to backend part.
Now, I don’t know much about how templates work in Clang yet, but I had
a quick look, and it seems that template instantiation happens internally
and no code is ever produced, only ASTs.
So if I implemented these the way templates work now, is there
no way to switch to a source-to-source mode?
Can Clang, for example, emit C++ code after template instantiation, somehow?

-Ramin

> > As an example, the extension would allow us to "tag" some identifiers
> > in the code as follows:
> >
> > class CLXYZ { public: int <tag1> x; };
> >
> > The identifier tagged as such is later picked at one of the phases
> > and operated on.
>
> For this particular example, have you considered using __attribute__ syntax [1], [2]?
>
> i.e. something like:
>
>   class CLXYZ { public: int x __attribute__((tag("1"))); };
>
> If you're able to leverage existing infrastructure, that will
> significantly reduce your patch burden and make following trunk _much_ easier.

You've already responded that you don't like the syntax, but I'd strongly suggest considering attributes as an implementation mechanism. If you can reduce your patches to a custom bit of parsing which adds an attribute to the appropriate AST nodes and then phrase everything in terms of attributes, your merging/support is going to be much easier. It also gives you the possibility of pushing bug fixes and small extensions to the attribute mechanism upstream with standalone test cases.
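As a rough illustration of the "custom bit of parsing which adds an attribute" idea, here is a toy C++ sketch that desugars the proposed int <tag1> x; form into an attribute form. It is only a minimal sketch under loud assumptions: desugarTags is a hypothetical name, it is a plain std::regex text rewrite (not how Clang's parser works), it only handles simple "type <tag> name;" declarations, and it uses Clang's existing annotate attribute instead of a custom one.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical toy desugaring pass (not Clang code): rewrites declarations of
// the form   <type> <tagN> <name>;
// into       <type> <name> __attribute__((annotate("tagN")));
// Only a single-word type and a single identifier are recognized; pointers,
// qualifiers, initializers, etc. are deliberately out of scope.
std::string desugarTags(const std::string& src) {
    // Capture groups: 1 = type, 2 = tag name, 3 = declared identifier.
    static const std::regex tagged(R"re((\w+)\s*<(\w+)>\s*(\w+)\s*;)re");
    // $1/$2/$3 refer to the capture groups above (ECMAScript format syntax).
    return std::regex_replace(
        src, tagged, R"fmt($1 $3 __attribute__((annotate("$2")));)fmt");
}
```

Because the output is ordinary C++ with attributes, the rewritten source can be compiled by a stock compiler or kept as a source-to-source artifact; in a real extension the same mapping would live in the parser and attach the attribute to the AST node directly.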

I played with an extension a while back for supporting pre- and postconditions on methods plus object invariants. I went through a couple of designs before settling on a "syntactic sugar" + "attributes" design, and that really was the simplest to maintain. (Mind you, this was strictly a hobby project, so I wasn't that worried about perfection in the syntax or semantic analysis.)

> > > As an example, the extension would allow us to "tag" some identifiers
> > > in the code as follows:
> > >
> > > class CLXYZ { public: int <tag1> x; };
> > >
> > > The identifier tagged as such is later picked at one of the phases
> > > and operated on.
> >
> > For this particular example, have you considered using __attribute__
> > syntax [1], [2]?
> >
> > i.e. something like:
> >
> >   class CLXYZ { public: int x __attribute__((tag("1"))); };
> >
> > If you're able to leverage existing infrastructure, that will
> > significantly reduce your patch burden and make following trunk _much_
> > easier.
>
> You've already responded that you don't like the syntax, but I'd

Also, the syntax doesn't have to be so ugly while you're prototyping it, provided you hide the attribute itself in a macro:

   #define TAG(x) __attribute__((tag(#x)))

   class CLXYZ { public: int x TAG(1); };
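To see what such a macro produces, here is a small self-contained sketch. It assumes Clang's existing annotate attribute in place of the hypothetical tag attribute so that it builds with a stock compiler (Clang keeps the annotation; GCC warns that it ignores annotate), and the STRINGIFY helpers are added purely to show the expansion.

```cpp
#include <cassert>
#include <string>

// Prototyping macro spelled with Clang's existing `annotate` attribute instead
// of the hypothetical `tag` attribute. #x turns the macro argument into a
// string literal, so TAG(1) becomes __attribute__((annotate("1"))).
#define TAG(x) __attribute__((annotate(#x)))

// Two-level stringizing so that TAG(...) is expanded before # is applied.
#define STRINGIFY_IMPL(...) #__VA_ARGS__
#define STRINGIFY(...) STRINGIFY_IMPL(__VA_ARGS__)

class CLXYZ {
public:
    int x TAG(1);  // i.e. int x __attribute__((annotate("1")));
};

// The text TAG(1) expands to, for inspecting what the macro produces.
const std::string kTagExpansion = STRINGIFY(TAG(1));
```

The annotation is metadata for tools reading the AST; the member itself behaves like a plain int.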

> strongly suggest considering attributes as an implementation mechanism.
> If you can reduce your patches to a custom bit of parsing which adds an
> attribute to the appropriate AST nodes and then phrase everything in
> terms of attributes, your merging/support is going to be much easier. It
> also gives you the possibility of pushing bug fixes and small extensions
> to the attribute mechanism upstream with standalone test cases.

+1

I wholeheartedly agree.

Cheers,

Jon

Thanks, guys. Your suggestions are interesting; I will consider how much of the work I can get done with them.

Can I just copy and paste another question I asked earlier, which I think was not noticed much?
"
What I am working on has many similarities to the way C++ Templates work.
There needs to be a declaration+definition and an instantiation phase.
My original prototype which didn’t use Clang produced standard C++ (source to source).
I would still like to have that option. However, I would also like to be able to go from
the source directly to the backend.
I think if I implement them the way C++’s Templates are implemented
in Clang then I would get the source to backend part.
But I had a quick look and it seems like the template instantiation
happens internally (i.e. only ASTs are produced).
So if I implemented these the way Templates work now, then is there
any way to switch it to a source-to-source mode?
Can Clang, for example, emit C++ code after Template instantiation, somehow?
"

Regards
- Ramin

If you don't actually need the exact <tag1> syntax you proposed, you can use the existing __attribute__((annotate("tag1"))) syntax, which can be extracted by a RecursiveASTVisitor pass. That is perhaps a somewhat more stable API, since so many in-tree tools use it.

Alex