technical debt

Something to think about as llvm and clang grow.

http://en.wikipedia.org/wiki/Technical_debt

I'm pretty sure neither llvm nor clang has any technical debt at all.

I hope you are joking.

It's not meant as a criticism of llvm or clang but there is already an enormous amount
of technical debt.

It's something to try and get a handle on before it gets out of hand.

Documentation is one area where it is accumulating fast but there are others.
Testing is another area.
Tablegen alone has huge technical debt.

To me, there should be a cap placed on the number of lines of code in llvm.
Like a budget. We should try and rewrite and refactor to keep the number of lines from growing
without bound.
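A line budget like this could be checked mechanically. The following is only a minimal sketch of the idea, not an existing LLVM tool: the function names, the file extensions, and the choice to count every non-blank line as a "line of code" are all assumptions made for illustration.

```python
import os


def count_code_lines(root, extensions=(".cpp", ".h", ".td")):
    """Count non-blank lines in files under `root` with one of `extensions`.

    Assumption: "lines of code" means non-blank lines; comments are counted.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    total += sum(1 for line in f if line.strip())
    return total


def check_budget(root, budget, extensions=(".cpp", ".h", ".td")):
    """Return (within_budget, lines_used, lines_remaining) for a hypothetical cap."""
    used = count_code_lines(root, extensions)
    return used <= budget, used, budget - used
```

A negative third value would mean the tree is over budget by that many lines, which is the point at which rewriting and refactoring would have to make room for new code.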

At this point lots of patterns should be developing where other tools (like tablegen) could be
written to reduce the amount of code and make things more understandable.

Reed

_______________________________________________
LLVM Developers mailing list
LLVMdev@cs.uiuc.edu http://llvm.cs.uiuc.edu
http://lists.cs.uiuc.edu/mailman/listinfo/llvmdev

I hope you are joking.

Why would I be joking?

It's not meant as a criticism of llvm or clang but there is already an enormous amount
of technical debt.

I don't see that.

It's something to try and get a handle on before it gets out of hand.

The consequences will never be the same

Documentation is one area where it is accumulating fast but there are others.

I think LLVM is incredibly well documented

Testing is another area.

It also has at least 10-15 tests.

Tablegen alone has huge technical debt.

I'm sorry you feel that way.

To me, there should be a cap placed on the number of lines of code in llvm.

Will there be a credit offset system?

Like a budget. We should try and rewrite and refactor to keep the number of lines from growing
without bound.

At this point lots of patterns should be developing where other tools (like tablegen) could be
written to reduce the amount of code and make things more understandable.

I agree. We should macroize most of the passes so they aren't so wordy.

Well, difference of opinion is what makes horse races.

Reed

Can we get back to the substantive discussion about your ideas for
lessening the technical debt?

obvious troll is obvious.

The lessening requires enlisting people who are willing to do this work as opposed to doing fun science like cool optimizations. I, for example, find the documentation, cleanup and refactoring interesting, so I don't feel cheated working on it as opposed to implementing some newfangled register allocator.

For example, there is almost no documentation on the application-specific plugins for tablegen.
There are some tablegen files and some small comments here and there, and you can guess
some of it from just knowing about compilers, but it's nothing close to what could be called
documentation.

I've started on my own to try and further document tablegen. I gave a talk/tutorial at LLVM
Europe on the general tablegen language and it was well received. Even people that had worked
with it for a while said they took away things they never understood about it.

It was clear when I studied tablegen that there are many serious problems with it from a language
point of view and from a tool point of view. Those things would all need to be cleared up before
some bigger form of it that could go beyond just laying out data structures could be
developed.

If there is sufficient interest, I think that maybe a separate discussion list to deal with technical
debt would make sense. I think for a lot of people it would be uninteresting to get all those
extra posts.

It's a question of enlisting people who want to work on it, and of convincing people who are not interested in working on it that it's important, and that they should welcome the help and
not obstruct the effort.

So far I have created some google code projects for various things I'm interested in working on.
I've created separate google code projects because I don't have the bandwidth to work on this
if there is resistance to it. So in my google code areas, I can do what I want without a big
discussion on every step. So maybe only my team will use it and then it can just sit in google code
forever.

So there is a cutting edge of the llvm/clang project which will never want to wait for all the technical debt to get paid. This is a natural thing. You can't move forward trying to make everything A+ quality; you can only do the A+ work after some reflection and experience with a given problem, and after rewriting and refactoring it many times. But at the same time, the technical debt needs to be settled or it will get out of hand and become unpayable in the future.

Reed

FWIW, I’m putting together (hopefully to be done by the end of this weekend) a substantial refactoring of the TableGen backend API along with shiny new documentation (reStructuredText with sphinx) of all of TableGen, including documentation about how to write backends and—depending on how adventurous I get—a more detailed coverage of the syntax.

Also, Reed, in your TableGen talk, IIRC, you guessed that there are maybe ~10KLOC of TableGen, and said something to the tune that it wasn’t too late to move over to something new and better. There are over 100KLOC in TableGen files, so unfortunately a “flag day” transition to another language is out of the question.

Feel free to get in contact with me if you would like to discuss TableGen or related topics.

–Sean Silva

Hi Sean,

Glad to hear there is clean up of tablegen going on.

Just for the record, I don’t know what you are referring to regarding some comment of mine
at my talk about 10K LOC.

I don’t know how big tablegen is itself nor how much code has been written in it so I would not have ventured such a guess.

The idea of totally replacing the tablegen language came up at the talk during the question and answer period and I was not optimistic about that possibility for various reasons.

I will definitely get in touch with you about tablegen.

Reed

Things that are needed regarding documentation for tablegen:

  1. syntax and reference manual for the non-application-specific portion of tablegen
  2. reference manual for each application-specific plugin. There are getting started manuals that give you the idea of what is going on but nothing even close to a real user's manual and certainly no reference manual.
  3. manual for writing plugins and the API for that.

There are TBD references for these things in the current manuals. Paying the technical debt involves fulfilling the TBDs.

The current version stack dumps on any kind of bad input, usually with just an assertion message.

Let's see what your new version is and then we can talk.

I definitely trust what you say now with time to think at your keyboard over what you said on the spot in a live presentation. The comment that I was referring to was:

36:44 of http://llvm.org/devmtg/2012-04-12/videos/Reed_Kotler-mobile.mov
“there’s not really more than a couple thousand lines of .td … I mean there’s not tons of this code so if we had to use a different one I don’t think it would be a huge problem”

hmm. that would have been a stupid thing to say if i said that. well, i’d be rich if i had a dime
for every stupid thing i’ve said in my life.

anyway, i agree that it’s a non starter to think of just making a wholesale change at this point.

we would have to have a plan for such a change.

the first step is to clean up and define the syntax and semantics of what is already there in tablegen.

at that point an evolutionary, as opposed to revolutionary, path of changes could be possible.

We already have a list of technical debt; just look at bugzilla (code-cleanup). There have been many refactoring efforts, e.g. a major one to MC by Evan Cheng last year. Before that refactoring was done, the code was clearly technical debt, because the refactoring only fixed violations of the design. Dan Gohman has been pointing out some design debt in the LLVM IR.

In general, it is very easy to become used to bad living conditions because they are seen as “natural” and people generally try to be happy with what they have. Code tends to slowly get worse without anyone taking much notice; people are busy, and the code looks good enough. Lack of understanding of the design, and not knowing the better way of doing something, is often a cause of added technical debt.

I don’t know if there is an easy way to lessen technical debt other than to think more and type less, which might not be on a developer’s mind when a deadline is close. One thing that might help would be to write a couple more lines for code reviews explaining why things are the way they are; if they can’t be easily explained, something else should be done, e.g. documentation or refactoring to make the code simpler (add to bugzilla if it can’t be done at the time).

  • Jan

Hello,

I am new to this alias but still wanted to pitch in. When I was learning tablegen syntax I found tablegen tests to be a lot more educative than examples in the “tablegen fundamentals”.

I’d hope that newer documentation would adopt some of those tests as examples. I found writing backends quite easy but modifying the front end rather hard. I wanted to modify the parser to allow #NAME# notation in multiclasses to be used on the right side of definitions but could not find an easy way to inject such a change.

Thanks,

Ivan