Parallel IR [PIR] --- BoF preparation discussion

Dear community,

In preparation for the BoF on Parallel IR at the US developers meeting
we would like to collect feedback from the whole community. The
concerns, ideas, etc. will be summarized in the BoF and should provide a
good starting point for a discussion.

We know that over the years the topic of a parallel extension for LLVM
has been discussed on the mailing list [0, 1, 2], at workshops [3, 4]
and in scientific publications [5, 6, 7]. ***

We believe the solutions implemented in LLVM, namely parallel loop
metadata [8] and early proceduralization (a.k.a. early outlining) for
OpenMP [9] or Cilk Plus [10], are not well suited for the optimization
of "general parallel codes". The reasons are manifold and have often
been discussed alongside the various proposals mentioned above.
Regarding only the current implementation, some problems have already
manifested, including
  - less optimization potential (partly) due to weak inter-procedural
    analysis [7, 11].
  - easy breakage of "parallelism" due to removal of metadata by
    intermediate passes [12, 13].
and others are likely to do so if we want to support more parallel
front-ends, optimizations and backends/runtimes.

At the beginning of this year, a working group on
  "LLVM-HPC IR extensions for Parallelization, Vectorization and
   Offloading of LLVM compilers"
was initiated by Xinmin Tian. People from various companies, research
institutions and some universities discussed different approaches
regarding "parallelism" in the compiler IR/pipeline. Based on the
generally positive attitude toward a "more intrusive" parallel
extension, we decided to resurrect the discussion once more, including a
BoF at the US developers meeting in 3 weeks.

To structure the mailing list discussion we propose to:
  - Inform a broader audience on the (currently) proposed approaches
    targeted specifically at LLVM (including but not necessarily limited
    to the work by Intel, Dounia Khaldi et al, Tao Schardl et al and our
    own work)
  - Collect/summarize arguments for and against a "more intrusive"
    parallel representation in LLVM.
  - Collect/summarize requirements including abstract design goals but
    also concrete examples that should (not) be supported.

=> We will use the summaries to prepare a short presentation for the BoF
   (~10min) which allows us to use the majority of time for a qualified
   discussion on the topic.

__Before__ we dive into the technical discussion, I would like people
to provide feedback on the proposed structure first. This will
(hopefully) allow a more organized and constructive discussion.

Thanks,
  Johannes, Simon and Kevin

*** All lists of references are incomplete. They provide a starting
    point for interested readers but are not a summary of what happened.

[0] http://lists.llvm.org/pipermail/llvm-dev/2015-February/082220.html
[1] http://lists.llvm.org/pipermail/llvm-dev/2015-March/083134.html
[2] Redirecting to Google Groups
[3] The Second Workshop on the LLVM Compiler Infrastructure in HPC
[4] CSW Spring 2013 - HiPEAC
[5] http://dl.acm.org/citation.cfm?id=2523721.2523727
[6] http://dl.acm.org/citation.cfm?id=2095103
[7] http://cpc2016.infor.uva.es/wp-content/uploads/2016/06/CPC2016_paper_12.pdf
[8] http://llvm.org/docs/LangRef.html#llvm-mem-parallel-loop-access-metadata
[9] http://llvm.org/devmtg/2014-10/Slides/Bataev-OpenMP.pdf
[10] http://cilkplus.github.io/
[11] http://openmp.org/sc13/OpenMPBoF_LLVM.pdf
[12] https://reviews.llvm.org/D5344
[13] https://reviews.llvm.org/D12710

Hi Johannes,

I think this is a great idea! With a deadline to meet (~2 weeks), and
a particular focus (feed the BoF discussion), I think we can keep
ourselves on track.

I particularly welcome a more intrusive change to IR (away from
metadata) to hold parallelism ideas, since we're past the point where
SIMD / multi-core was considered only an optimisation, and the
compiler IR has to evolve to match.

But I'm also worried that we'll end up lost in the multitude of ways
we can extend the IR. A step by step pragmatic approach is fundamental
to keep it sane and robust, and I think your starting point conveys
that well.

We need to make sure we cover simpler cases first, without losing
sight of the more complicated cases as an evolution of whatever plan
we come up with. But also, we need to be backward compatible, so that
the optimisation passes don't start depending solely on the new IR
constructs to work.

cheers,
--renato

For anyone interested in alternative extensions in the direction of HDLs (Verilog, VHDL), I did a prototype extended C++ a while back -

  http://parallel.cc/

There are fairly simple examples here (sync/async life) -

  Test Cases

It's currently a meta-compiler for g++, but I am intending to migrate it to LLVM if I get the time ;-)

The motivation was partly that the committees that handle HDL standards (IEEE SystemVerilog & VHDL) refused to support an asynchronous-FSM methodology that would work for both hardware and software development, and most of the open-source HDL efforts are some subset of the commercial stuff (and not proper compilers).

The HDL paradigm matches what you need for programming things like neural networks, NUMA/heterogeneous architectures in general, and distributed computing, since linkage is done on data-pipes/signals rather than over API/ABI interfaces.

Kev.

Hi Renato,

> To structure the mailing list discussion we propose to:
> - Inform a broader audience on the (currently) proposed approaches
> targeted specifically at LLVM (including but not necessarily limited
> to the work by Intel, Dounia Khaldi et al, Tao Schardl et al and our
> own work)
> - Collect/summarize arguments for and against a "more intrusive"
> parallel representation in LLVM.
> - Collect/summarize requirements including abstract design goals but
> also concrete examples that should (not) be supported.
>
> => We will use the summaries to prepare a short presentation for the BoF
> (~10min) which allows us to use the majority of time for a qualified
> discussion on the topic.

> Hi Johannes,
>
> I think this is a great idea! With a deadline to meet (~2 weeks), and
> a particular focus (feed the BoF discussion), I think we can keep
> ourselves on track.

I hope so, yes.

> I particularly welcome a more intrusive change to IR (away from
> metadata) to hold parallelism ideas, since we're past the point where
> SIMD / multi-core was considered only an optimisation, and the
> compiler IR has to evolve to match.

I fully agree, and more and more people seem to feel the same.

> But I'm also worried that we'll end up lost in the multitude of ways
> we can extend the IR. A step by step pragmatic approach is fundamental
> to keep it sane and robust, and I think your starting point conveys
> that well.

I agree. Defining and understanding the scope of the proposed or just
desired extensions is hard but necessary. More on this below.

> We need to make sure we cover simpler cases first, without losing
> sight of the more complicated cases as an evolution of whatever plan
> we come up with. But also, we need to be backward compatible, so that
> the optimisation passes don't start depending solely on the new IR
> constructs to work.

I guess backwards compatibility with the existing OpenMP/Cilk frontends
is not too hard. Early proceduralization should always work.

Regarding the simple cases and the more complicated ones, I will put
examples in a Google doc to show how easily parallel constructs can
become hard to represent (e.g., due to control flow or nowait
annotations). So far we mostly looked at how to model the
parallelization schemes of OpenMP, Cilk and OpenCL but also defined
some we do not want to represent at all (e.g., general task/thread
parallelism). I think examples will help to argue about cases in a
more constructive (or better, target-oriented) way.

Cheers,
  Johannes

Hi Johannes,

Great idea! I just want to add a pointer to some of the work (that I am
aware of) that has been done in defining an IR that can represent
parallelism, called "INSPIRE".

Quoting the relevant part of the abstract of the paper that describes
this IR:

   “[…] INSPIRE, a unified, parallel, high-level intermediate
    representation. Instead of mapping parallel constructs and APIs to
    external routines, their behaviour is modeled explicitly using a
    unified and fixed set of parallel language constructs. Making the
    parallel control flow accessible to the compiler lays the
    foundation for the development of reusable, static and dynamic
    analyses and transformations bridging the gap between a variety of
    parallel paradigms.”

The paper can be found here:

I don’t know how much of this can/should be reused in defining the new
parallel IR, but it might be a good starting point.

I am looking forward to seeing the outcome of this new development in LLVM!

Regards,

Francesco