PTX target for LLVM?

To the list:

Tons of LLVM research is being done that is damn near worthless to anyone but the person who did it, because the team doesn’t publish supporting code or even give a high-level description of the algorithms they’re using. And the excuse is always, ALWAYS the same: “we need to clean up the code before we release it.”

No! Just put a repository up (or make a tarball)! This is open source. Code is never perfect, so just put it out there with the same BSD-style license as LLVM. Every programmer can read code, even bad code, especially when there’s a research paper or thesis to go along with it. Every delay in releasing code just slows down the progress of the world. That’s the only “benefit”: slowing the progress of other researchers. If the code is needed to replicate your research, for the love of Turing publish it alongside the research.

Look, the LLVM project has already set a great example with its permissive BSD license, and all LLVM development is done out in the open; I see (and read) every commit. The rest of us need to get with the program and follow suit. If you publish research built on top of LLVM, please, please, PLEASE at least make a tarball of the source code available alongside the research so the rest of us can view it, build on it, and improve it.

Thanks!

Best, Erich Ocean

P.S. I too would like access to a PTX backend for LLVM. :-)

Hi,

(Disabling lurk mode)

I must admit, I believe this would be an extremely valuable addition to LLVM, to the point where I was also seriously considering writing this backend. The main thing holding me back is the thought that other people are almost certainly working on the same thing (probably including NVIDIA)!

I assume it’s not yet ready to publish, but it’s worth remembering that feedback from real-world use of your research is a major plus point for any thesis :-)

+1 for even a tarball please!

Ta,

Sam

+1 for the PTX backend, could lead to all sorts of new things

Fabrizio

To the list:

Tons of LLVM research is being done that is damn near worthless to anyone
but the person who did it, because the team doesn't publish supporting code
or even give a high-level description of the algorithms they're
using. And the excuse is always, ALWAYS the same: "we need to clean up the
code before we release it."

I certainly agree that publishing code is A Good Thing and will help scientific
progress, but we don't live in an ideal world. There are perfectly valid
reasons to publish code late or even never.

No! Just put a repository up (or make a tarball)! This is open source.
Code is never perfect, so just put it out there with the same BSD-style
license as LLVM. Every programmer can read code, even bad code, especially
when there's a research paper or thesis to go along with it. Every delay
in releasing code just slows down the progress of the world. That's
the only "benefit": slowing the progress of other researchers. If the code
is needed to replicate your research, for the love of Turing *publish it
alongside the research*.

It's not that easy, unfortunately. To get funding for research, publications
still count for more than whether your code is used by others. Code (re)use
is also often not properly cited, nor tracked and valued as a "contribution" as
much as it should be. Some communities are better than others in this respect,
but in general you cannot expect academic groups to risk their future funding.
These decisions always depend on the community you are in, how the other
researchers in that community behave, and so on.

Also, there are other practical reasons for publishing later (or for
publishing only code that is polished and idiot-proof). For example, some people
don't understand the difference between a code drop and a release that is
supposed to work in every case. Some treat any piece of code they can get
as a final release, never update it even when you publish new versions, and
report results based on it (i.e., there is a risk it will be (accidentally)
misused). You might also get more (support) questions about unstable code than
you can handle, and you don't want people to lose interest because you seem to
be unresponsive.

Look, the LLVM project has already set a great example with its permissive
BSD license and all LLVM development is done out in the open; I see (and
read) every commit. The rest of us need to get with the program and follow
suit. If you publish research built on top of LLVM, please, *please*,
PLEASE at least make a tarball of the source code available alongside the
research so the rest of us can view it, build on it, and improve it.

Thanks!

Best, Erich Ocean

P.S. I too would like access to a PTX backend for LLVM. :-)

Perhaps you should just have asked for access, if that was your intention.

Torvald

[NOTE: This is MY opinion and not reflective of what people at Cray may
or may not think nor does it in any way imply a company-wide position.]

This is actually part of a larger problem within the computing research
community. Unlike the hard sciences, we don't require reproducibility
of results. This leads to endless hours of researchers fruitlessly trying
to improve upon results that they can't even verify in the first place.
The result is haphazard guesses and lower quality publications which
are of little use in the real world.

I've talked to high-up engineers at IBM who refuse to believe any research
publication until they've verified it themselves with their own simulators,
etc. Something like 99% of the time they find results to be meaningless
or not reproducible.

Now, the ideas are the most valuable part of a publication, but it is
important to be able to validate the idea. I disagree with the current
practice of not accepting papers that don't show a 10% improvement in
whatever area is under consideration. I believe we should also publish
papers that show negative results. This would save researchers an enormous
amount of time. Moreover, ideas that were wrong 20 years ago may very
well be right today.

The combination of the current "10% or better" practice with no requirement
for reproducibility means there's very little incentive to release tools and
code used for the experiments. In fact, there is a disincentive, as we wouldn't
want some precocious student to demonstrate that the experiment was flawed. This
is another problem: researchers view challenges as personal threats
rather than a chance to advance the state of the art, and students are
encouraged to combatively challenge published research rather than work with
the original publishers to improve it.

We need an overhaul of the system.

                             -Dave

[NOTE: This is MY opinion and not reflective of what people at Cray may
or may not think nor does it in any way imply a company-wide position.]

This is actually part of a larger problem within the computing research
community. Unlike the hard sciences, we don't require reproducibility
of results.

I would guess that this problem happens in other areas of research as well
(e.g., just look at the sample sizes of some studies). And even if you can
reproduce something, it might still not be a meaningful model of the real system.

This leads to endless hours of researchers fruitlessly trying
to improve upon results that they can't even verify in the first place.
The result is haphazard guesses and lower quality publications which
are of little use in the real world.

You can also reproduce the implementation, based on the description in the
paper. This kind of N-version programming can be useful, but I agree that the
subsequent comparison would require less effort if the original implementation
were available.

Now, the ideas are the most valuable part of a publication, but it is
important to be able to validate the idea.

I agree. However, it's hard to enforce. If an open-source implementation plus
experimental setup were required for every paper, there would be fewer or no
papers from industry research. But their input is valuable. And I guess you
can't require it just for academia (e.g., some universities want to sell their
research results).

I disagree with the current
practice of not accepting papers that don't show a 10% improvement in
whatever area is under consideration. I believe we should also publish
papers that show negative results. This would save researchers an enormous
amount of time. Moreover, ideas that were wrong 20 years ago may very
well be right today.

I agree. Some communities/conferences also accept negative results, but I
think the quality requirements for these papers are often much stricter than
for the 10% papers. One issue with negative results, though, is how to determine
which negative results are useful to report.

The combination of the current "10% or better" practice with no requirement
for reproducibility means there's very little incentive to release tools
and code used for the experiments. In fact, there is a disincentive, as we
wouldn't want some precocious student to demonstrate that the experiment was
flawed. This is another problem: researchers view challenges as
personal threats rather than a chance to advance the state of the art, and
students are encouraged to combatively challenge published research rather
than work with the original publishers to improve it.

I agree. What I wanted to point out in my previous mail is that even though
this change would help, it is difficult to achieve. I mean, you have
this problem _everywhere_. Even if the end result would be better, you need a
solution that can compete in the current system.

Torvald