RFC: Import of Integer Set Library into LLVM source tree

Michael_Kruse2 · January 15, 2018, 4:52pm

Dear community,

for our goal to make polyhedral optimization available in the main
LLVM source, we will need the Integer Set Library (isl)[1]. It is the
main dependency of Polly, but would be required even if we do not
directly import Polly.

I already prepared a patch [2], unfortunately without feedback on the
general approach. As Philip suggested in the review, I am reaching out
with an RFC with a wider audience on llvm-dev.

As for the main motivation on why to import the entire source of isl at all:
Polly interacts relative tightly with isl which provide the main
optimization algorithms. For instance, Polly's regression tests
compare the string output of set representation, which unfortunately
can change between versions. For instance, if a newer version
implements a simpler string description of the same set. That means
that tests would fail if a different version is installed on the host
system. This is significant for buildbots that would need regular
manual action to install a newer version of isl on the system.

An alternative would be to only store the git sha1 and a CMake script
would download that version when building. However, it would also be
more difficult to both sources in-sync in the checkout (think of git
bisect).

I suppose these were also the reasons why Polly imported the isl
source in commit r228193 instead of requiring users to call
utils/checkout_isl.sh manually.

The other design decisions and rationales are mentioned in the review
summary [2]. I repeat them here so they can be referred to in replies.

* The library is named LLVMISL and contained in the lib/ISL folder to
work best with LLVM's component system. The component's name "ISL" was
chosen over "isl" as it matches the capitalization of other
two/three-letter-acronym components.

* Isl's headers are installed into $PREFIX/include/llvm/ISL/isl. This
is to not conflict with potentially user/system-installed isl headers.
Please note that this means that MIT-licensed headers land on the
target system.

* The source of isl itself is added to the contrib/isl directory. This
is to keep the source in one directory rather than spreading it into
the lib and include directories, facilitating integration of upstream
changes. There is a script "isl-merge.py" that automates this. It
currently requires LLVM being checked out using git. The new subfolder
"contrib" should make it clear that this is downstream code and maybe
one shouldn't run clang-format over it which would make the next merge
difficult.

* There is an additional "GIT_HEAD_ID" file containing the upstream's
revision. It is used by isl-merge.py to determine the common ancestor
and CMakeLists.txt to determine the version. Normally isl's make dist
would generate this file, it is done manually here to not requiring a
configure/make/make dist cycle when merging the latest isl. Due to
different Autotools versions being installed on the different systems,
this might lead to spurious conflicts with the generated configure
etc. files. However, due to the need of an isl-noexeceptions.h, I may
need to include such a cycle in isl-merge.py anyway.

* Isl has its own unittest "isl_test.c". It is put into bin/ (but not
installed) and run by llvm-lit on llvm-check. This was just the
simplest to do since it doesn't require any additional lit.site.cfg in
order to find the isl_test in a different directory.

Looking forward for your feedback.

Michael

[1] http://isl.gforge.inria.fr/
[2] ⚙ D40122 Add isl to LLVM repository.

Finkel_Hal_J · January 15, 2018, 5:21pm

Dear community,

for our goal to make polyhedral optimization available in the main
LLVM source, we will need the Integer Set Library (isl)[1]. It is the
main dependency of Polly, but would be required even if we do not
directly import Polly.

I already prepared a patch [2], unfortunately without feedback on the
general approach. As Philip suggested in the review, I am reaching out
with an RFC with a wider audience on llvm-dev.

If this is ready to be reviewed, I recommend removing the "[WIP]" from
the title. Normally I interpret "[WIP]" (which I presume stands for
"work in progress") as meaning "I'm looking for early feedback on this
approach, perhaps primarily from a limited audience, but the patch
itself is not really ready for review (e.g., still known not to build,
to crash, miscompile, and so on).

What you have below sounds generally good to me (a couple of questions
below).

As for the main motivation on why to import the entire source of isl at all:
Polly interacts relative tightly with isl which provide the main
optimization algorithms. For instance, Polly's regression tests
compare the string output of set representation, which unfortunately
can change between versions. For instance, if a newer version
implements a simpler string description of the same set. That means
that tests would fail if a different version is installed on the host
system. This is significant for buildbots that would need regular
manual action to install a newer version of isl on the system.

An alternative would be to only store the git sha1 and a CMake script
would download that version when building. However, it would also be
more difficult to both sources in-sync in the checkout (think of git
bisect).

I suppose these were also the reasons why Polly imported the isl
source in commit r228193 instead of requiring users to call
utils/checkout_isl.sh manually.

The other design decisions and rationales are mentioned in the review
summary [2]. I repeat them here so they can be referred to in replies.

* The library is named LLVMISL and contained in the lib/ISL folder to
work best with LLVM's component system. The component's name "ISL" was
chosen over "isl" as it matches the capitalization of other
two/three-letter-acronym components.

* Isl's headers are installed into $PREFIX/include/llvm/ISL/isl. This
is to not conflict with potentially user/system-installed isl headers.
Please note that this means that MIT-licensed headers land on the
target system.

* The source of isl itself is added to the contrib/isl directory. This
is to keep the source in one directory rather than spreading it into
the lib and include directories, facilitating integration of upstream
changes.
There is a script "isl-merge.py" that automates this. It
currently requires LLVM being checked out using git. The new subfolder
"contrib" should make it clear that this is downstream code and maybe
one shouldn't run clang-format over it which would make the next merge
difficult.

* There is an additional "GIT_HEAD_ID" file containing the upstream's
revision. It is used by isl-merge.py to determine the common ancestor
and CMakeLists.txt to determine the version. Normally isl's make dist
would generate this file, it is done manually here to not requiring a
configure/make/make dist cycle when merging the latest isl. Due to
different Autotools versions being installed on the different systems,
this might lead to spurious conflicts with the generated configure
etc. files. However, due to the need of an isl-noexeceptions.h, I may
need to include such a cycle in isl-merge.py anyway.

Can you clarify when/if autotools are required and under what circumstances.

* Isl has its own unittest "isl_test.c". It is put into bin/ (but not
installed) and run by llvm-lit on llvm-check. This was just the
simplest to do since it doesn't require any additional lit.site.cfg in
order to find the isl_test in a different directory.

Can it end up in the unittests subdirectory with the other unit tests?

Thanks again,
Hal

Michael_Kruse2 · January 15, 2018, 6:08pm

Thank you for the feedback.

If this is ready to be reviewed, I recommend removing the "[WIP]" from
the title. Normally I interpret "[WIP]" (which I presume stands for
"work in progress") as meaning "I'm looking for early feedback on this
approach, perhaps primarily from a limited audience, but the patch
itself is not really ready for review (e.g., still known not to build,
to crash, miscompile, and so on).

Thanks for the hint, I did not think of this. The intention to keep
the flag was that I do not intend to commit it right away, but rather
to discuss the approach. Committing is only becomes necessary when
implementing something requiring it.

* There is an additional "GIT_HEAD_ID" file containing the upstream's
revision. It is used by isl-merge.py to determine the common ancestor
and CMakeLists.txt to determine the version. Normally isl's make dist
would generate this file, it is done manually here to not requiring a
configure/make/make dist cycle when merging the latest isl. Due to
different Autotools versions being installed on the different systems,
this might lead to spurious conflicts with the generated configure
etc. files. However, due to the need of an isl-noexeceptions.h, I may
need to include such a cycle in isl-merge.py anyway.

Can you clarify when/if autotools are required and under what circumstances.

Polly's update_isl.py runs it to generate the files GIT_HEAD_ID,
isl_config.h and isl-noexeceptions.h where only the GIT_HEAD_ID is
actually used.

GIT_HEAD_ID:
Polly) Use the one generated by autotools, run by update_isl.py
D40122) Generated by isl-merge.py

isl_config.h:
Polly) Generated by CMakeLists.txt
D40122) Generated by CMakeLists.txt

isl-noexeceptions.h:
Polly) Added manually to the repository. When an update is necessary,
the user has to configure/build isl and copy the generated
isl-noexeceptions.h into the repository
D40122) There is no isl-noexeceptions.h

Since sometime we will need isl-noexeceptions.h, here are the approaches:

using autotools) Like update_isl.py, run autotools by isl-merge.py and
copy the file generated into the repository.
without autotools) Run the commands that the autotools' build would
run by isl-merge.py

The latter might require maintenance if the way isl's build system
creates isl-noexeceptions.h changes.

autotools are required only in the former solution and only when
someone wants to merge a newer version of isl by running isl-merge.py.

* Isl has its own unittest "isl_test.c". It is put into bin/ (but not
installed) and run by llvm-lit on llvm-check. This was just the
simplest to do since it doesn't require any additional lit.site.cfg in
order to find the isl_test in a different directory.

Can it end up in the unittests subdirectory with the other unit tests?

The lit.cfg in test\Unit directory assumes that all tests are using
gtest. This is not the case for isl_test.

The unittests subdirectory only contains the sources of the unittests,
isl_test's sources are in contrib/isl/isl_test.c for the
external-source separating reason.

We could move the CMakeLists.txt to build isl_test into the unittest
dir, but for compiling, it requires some definitions from
lib/ISL/CMakeLists.txt (e.g. include paths) which are not available in
sibling build directories. One could move these definitions further up
into the root CMakeLists.txt or duplicate in unittests/ISL. IMHO, both
have more downsides than the advantage of having a unittest/ISL
directory.

Michael

Philip_Pfaffe · January 15, 2018, 9:10pm

Hi Micheal,

thanks for moving this forward!

Tobias_Grosser4 · January 16, 2018, 7:53am

Hi Micheal,

thanks for moving this forward!

2018-01-15 17:52 GMT+01:00 Michael Kruse via llvm-dev <
llvm-dev@lists.llvm.org>:

> Dear community,
>
> for our goal to make polyhedral optimization available in the main
> LLVM source, we will need the Integer Set Library (isl)[1]. It is the
> main dependency of Polly, but would be required even if we do not
> directly import Polly.
>
> I already prepared a patch [2], unfortunately without feedback on the
> general approach. As Philip suggested in the review, I am reaching out
> with an RFC with a wider audience on llvm-dev.
>
> As for the main motivation on why to import the entire source of isl at
> all:
> Polly interacts relative tightly with isl which provide the main
> optimization algorithms. For instance, Polly's regression tests
> compare the string output of set representation, which unfortunately
> can change between versions. For instance, if a newer version
> implements a simpler string description of the same set. That means
> that tests would fail if a different version is installed on the host
> system. This is significant for buildbots that would need regular
> manual action to install a newer version of isl on the system.
>
> An alternative would be to only store the git sha1 and a CMake script
> would download that version when building. However, it would also be
> more difficult to both sources in-sync in the checkout (think of git
> bisect).
>
> I suppose these were also the reasons why Polly imported the isl
> source in commit r228193 instead of requiring users to call
> utils/checkout_isl.sh manually.
>
>
> The other design decisions and rationales are mentioned in the review
> summary [2]. I repeat them here so they can be referred to in replies.
>
> * The library is named LLVMISL and contained in the lib/ISL folder to
> work best with LLVM's component system. The component's name "ISL" was
> chosen over "isl" as it matches the capitalization of other
> two/three-letter-acronym components.
>
> * Isl's headers are installed into $PREFIX/include/llvm/ISL/isl. This
> is to not conflict with potentially user/system-installed isl headers.
> Please note that this means that MIT-licensed headers land on the
> target system.
>

I think this needs to be emphasized. This means we effectively ship another
version of isl to our users, independent of whether they have one already.
Is this really what we want?

Yes, I feel this is important for predicability and reproducability of test cases. Even minor differences in isl version might result in differences in how code is compiled. Without using a fixed version of isl for a given SVN id, it will be more difficult to reproduce bug reports.

Do you see any drawback with this approach?

Best,
Tobias

Tobias_Grosser4 · January 16, 2018, 7:59am

> Hi Micheal,
>
> thanks for moving this forward!
>
> 2018-01-15 17:52 GMT+01:00 Michael Kruse via llvm-dev <
> llvm-dev@lists.llvm.org>:
>
> > Dear community,
> >
> > for our goal to make polyhedral optimization available in the main
> > LLVM source, we will need the Integer Set Library (isl)[1]. It is the
> > main dependency of Polly, but would be required even if we do not
> > directly import Polly.
> >
> > I already prepared a patch [2], unfortunately without feedback on the
> > general approach. As Philip suggested in the review, I am reaching out
> > with an RFC with a wider audience on llvm-dev.
> >
> > As for the main motivation on why to import the entire source of isl at
> > all:
> > Polly interacts relative tightly with isl which provide the main
> > optimization algorithms. For instance, Polly's regression tests
> > compare the string output of set representation, which unfortunately
> > can change between versions. For instance, if a newer version
> > implements a simpler string description of the same set. That means
> > that tests would fail if a different version is installed on the host
> > system. This is significant for buildbots that would need regular
> > manual action to install a newer version of isl on the system.
> >
> > An alternative would be to only store the git sha1 and a CMake script
> > would download that version when building. However, it would also be
> > more difficult to both sources in-sync in the checkout (think of git
> > bisect).
> >
> > I suppose these were also the reasons why Polly imported the isl
> > source in commit r228193 instead of requiring users to call
> > utils/checkout_isl.sh manually.
> >
> >
> > The other design decisions and rationales are mentioned in the review
> > summary [2]. I repeat them here so they can be referred to in replies.
> >
> > * The library is named LLVMISL and contained in the lib/ISL folder to
> > work best with LLVM's component system. The component's name "ISL" was
> > chosen over "isl" as it matches the capitalization of other
> > two/three-letter-acronym components.
> >
> > * Isl's headers are installed into $PREFIX/include/llvm/ISL/isl. This
> > is to not conflict with potentially user/system-installed isl headers.
> > Please note that this means that MIT-licensed headers land on the
> > target system.
> >
>
> I think this needs to be emphasized. This means we effectively ship another
> version of isl to our users, independent of whether they have one already.
> Is this really what we want?

Yes, I feel this is important for predicability and reproducability of
test cases. Even minor differences in isl version might result in
differences in how code is compiled. Without using a fixed version of
isl for a given SVN id, it will be more difficult to reproduce bug
reports.

Do you see any drawback with this approach?

Also, to add some more context. gcc e.g. does not include isl as a dependency, but a limited set of isl versions can be chosen. So this is indeed possible. However, it the set of isl versions that are allowed is indeed rather limited, and besides the previously mentioned drawbacks, we also have less test coverage on combinations that have not been validated for a release and it is harder to get fixes fast into isl (if older releases should still somehow be supported).

I feel the LLVM community as a whole has a tradition of 1) staying on trunk and 2) including all (most) dependences in the default checkout.

Best,
Tobias

Finkel_Hal_J · January 16, 2018, 5:04pm

Hi Micheal,

thanks for moving this forward!

2018-01-15 17:52 GMT+01:00 Michael Kruse via llvm-dev <
llvm-dev@lists.llvm.org>:

Dear community,

for our goal to make polyhedral optimization available in the main
LLVM source, we will need the Integer Set Library (isl)[1]. It is the
main dependency of Polly, but would be required even if we do not
directly import Polly.

I already prepared a patch [2], unfortunately without feedback on the
general approach. As Philip suggested in the review, I am reaching out
with an RFC with a wider audience on llvm-dev.

As for the main motivation on why to import the entire source of isl at
all:
Polly interacts relative tightly with isl which provide the main
optimization algorithms. For instance, Polly's regression tests
compare the string output of set representation, which unfortunately
can change between versions. For instance, if a newer version
implements a simpler string description of the same set. That means
that tests would fail if a different version is installed on the host
system. This is significant for buildbots that would need regular
manual action to install a newer version of isl on the system.

An alternative would be to only store the git sha1 and a CMake script
would download that version when building. However, it would also be
more difficult to both sources in-sync in the checkout (think of git
bisect).

I suppose these were also the reasons why Polly imported the isl
source in commit r228193 instead of requiring users to call
utils/checkout_isl.sh manually.

The other design decisions and rationales are mentioned in the review
summary [2]. I repeat them here so they can be referred to in replies.

* The library is named LLVMISL and contained in the lib/ISL folder to
work best with LLVM's component system. The component's name "ISL" was
chosen over "isl" as it matches the capitalization of other
two/three-letter-acronym components.

* Isl's headers are installed into $PREFIX/include/llvm/ISL/isl. This
is to not conflict with potentially user/system-installed isl headers.
Please note that this means that MIT-licensed headers land on the
target system.

I think this needs to be emphasized. This means we effectively ship another
version of isl to our users, independent of whether they have one already.
Is this really what we want?

Yes, I feel this is important for predicability and reproducability of
test cases. Even minor differences in isl version might result in
differences in how code is compiled. Without using a fixed version of
isl for a given SVN id, it will be more difficult to reproduce bug
reports.

Do you see any drawback with this approach?

Also, to add some more context. gcc e.g. does not include isl as a dependency, but a limited set of isl versions can be chosen. So this is indeed possible. However, it the set of isl versions that are allowed is indeed rather limited, and besides the previously mentioned drawbacks, we also have less test coverage on combinations that have not been validated for a release and it is harder to get fixes fast into isl (if older releases should still somehow be supported).

I feel the LLVM community as a whole has a tradition of 1) staying on trunk and 2) including all (most) dependences in the default checkout.

I completely agree for exactly these reasons. It also gives us greater
flexibility in how we move forward with the integration, for example how
things are refactored, later on in the process.

-Hal

Philip_Pfaffe · January 16, 2018, 11:26pm

Hi folks,

I actually just wanted to point out that this means isl will also be part of any binary distribution. Especially if, as Tobias says, minor differences can lead to issues. In that, I’m not worried about the LLVM community, since they have the in-tree version available. But do we want to throw that onto the average, say, Ubuntu user?

Alternatively, we can pull isl from the public Polly interface, and possibly replace it with a wrapper that we can ship. I’m worried that that might incur a performance impact, but of course we’d have to see numbers on that.

Cheers,
Philip

preames · January 17, 2018, 12:41am

Dear community,

for our goal to make polyhedral optimization available in the main
LLVM source, we will need the Integer Set Library (isl)[1]. It is the
main dependency of Polly, but would be required even if we do not
directly import Polly.

I already prepared a patch [2], unfortunately without feedback on the
general approach. As Philip suggested in the review, I am reaching out
with an RFC with a wider audience on llvm-dev.

As for the main motivation on why to import the entire source of isl at all:
Polly interacts relative tightly with isl which provide the main
optimization algorithms. For instance, Polly's regression tests
compare the string output of set representation, which unfortunately
can change between versions. For instance, if a newer version
implements a simpler string description of the same set. That means
that tests would fail if a different version is installed on the host
system. This is significant for buildbots that would need regular
manual action to install a newer version of isl on the system.

An alternative would be to only store the git sha1 and a CMake script
would download that version when building. However, it would also be
more difficult to both sources in-sync in the checkout (think of git
bisect).

I suppose these were also the reasons why Polly imported the isl
source in commit r228193 instead of requiring users to call
utils/checkout_isl.sh manually.

The other design decisions and rationales are mentioned in the review
summary [2]. I repeat them here so they can be referred to in replies.

* The library is named LLVMISL and contained in the lib/ISL folder to
work best with LLVM's component system. The component's name "ISL" was
chosen over "isl" as it matches the capitalization of other
two/three-letter-acronym components.

* Isl's headers are installed into $PREFIX/include/llvm/ISL/isl. This
is to not conflict with potentially user/system-installed isl headers.
Please note that this means that MIT-licensed headers land on the
target system.

This comment caught my eye. Looking at ISL, it appears to offer a MIT license only. How does this interact with LLVM's licensing? There does appear to be some precedent for third party libraries under different licenses, but this needs careful discussion and review by a knowledgeable party. (I am not one.)

clattner · January 17, 2018, 6:22am

Hi Michael,

I have significant concerns about including ISL into LLVM. LLVM intentionally takes a very conservative approach to bundling third party code with itself: we prefer instead of have them be external dependencies that are conditionally configured in on demand.

I’m not an expert in ISL in particular, but it looks like it is something of an odd dependency to include: it isn’t structured at all like LLVM, and it (apparently, I could definitely be wrong) depends on the GMP library which is LGPL. Beyond licensing, it has an old-school C-style API which appears to be extremely inefficient compared to things like APInt that avoid allocations for 64-bit integers and less.

How much of ISL are you using? What other options have you considered? Why should this be included with every LLVM distribution instead of being conditionally compiled? If the major issue is the polly regression tests, why not fix the tests?

-Chris

Tobias_Grosser4 · January 17, 2018, 8:14am

> As for the main motivation on why to import the entire source of isl at all:
> Polly interacts relative tightly with isl which provide the main
> optimization algorithms. For instance, Polly's regression tests
> compare the string output of set representation, which unfortunately
> can change between versions. For instance, if a newer version
> implements a simpler string description of the same set. That means
> that tests would fail if a different version is installed on the host
> system. This is significant for buildbots that would need regular
> manual action to install a newer version of isl on the system.

Hi Michael,

Hi Chris,

thank you for your feedback and especially for raising these concerns. We talked with different members of the community and share parts of your concerns.

I have significant concerns about including ISL into LLVM. LLVM
intentionally takes a very conservative approach to bundling third party
code with itself: we prefer instead of have them be external
dependencies that are conditionally configured in on demand.

I understand. If needed we can make the compilation of isl optional.

However, as pointed out earlier in this thread it seems preferable to lock LLVM with a very specific version of isl to make sure version differences don't cause spurious bugs. This at least has shown to be very useful in Polly.

I’m not an expert in ISL in particular, but it looks like it is
something of an odd dependency to include: it isn’t structured at all
like LLVM, and it (apparently, I could definitely be wrong) depends on
the GMP library which is LGPL.

It can optionally be compiled with GMP, however especially for the use in LLVM we added together with Qualcomm an MIT licensed imath backend (and tuned its performance).

Beyond licensing, it has an old-school
C-style API which appears to be extremely inefficient compared to things
like APInt that avoid allocations for 64-bit integers and less.

Over the last year we developed C++ bindings for isl which make its use a lot easier:

github.com

llvm-mirror/polly/blob/master/lib/External/isl/include/isl/isl-noexceptions.h

/// These are automatically generated checked C++ bindings for isl.
///
/// isl is a library for computing with integer sets and maps described by
/// Presburger formulas. On top of this, isl provides various tools for
/// polyhedral compilation, ranging from dependence analysis over scheduling
/// to AST generation.

#ifndef ISL_CPP_CHECKED
#define ISL_CPP_CHECKED

#include <isl/val.h>
#include <isl/aff.h>
#include <isl/set.h>
#include <isl/space.h>
#include <isl/id.h>
#include <isl/map.h>
#include <isl/vec.h>
#include <isl/ilp.h>
#include <isl/union_set.h>
#include <isl/union_map.h>

This file has been truncated. show original

While this does not reduce heap allocations, it will make it easier to replace performance critical parts of isl over time. While we don't have concrete plans to rewrite core parts of isl, I expect over time more and more of isl-style functionality to be incorporated natively in LLVM.

I expect that we could likely reduce heap allocations further, but some of the low-hanging fruits have already been picked. We avoid allocations of GMP/imath data structures all together for cases where integers are smaller than 32 bit. Most critical performance improvements today come from algorithmic improvements.

Having an implementation that serves as baseline when considering to replacing parts of isl seems to be a good idea.

How much of ISL are you using?

We currently use the (parameteric) integer programming solver, the integer constraint library, the constraint to AST code generator, the ILP based scheduler, ...

What other options have you considered?

There are no other license compatible alternatives for the full set of functionalities we are using.

Alternatives for the constraint library part are PPL (which does not provide integer constraints), as well as omega (which served as inspiration for isl, but is today mostly unmaintained). Omega is to my knowledge used in compilers such as icc.

Alternatives for the code generation part are ClooG and codegen+. Both are older, provide less functionality, have license issues, and are C based.

Alternatives for the scheduler are Pluto(+) (which is license incompatible).

There are many alternatives for the LP solver. I did not do an extensive search here.

Why should this be included with every LLVM distribution instead of
being conditionally compiled? If the major issue is the polly
regression tests, why not fix the tests?

We currently don't have any issue. isl is today shipped as part of the Polly repository.

As Polly matured over time, all license issues have been resolved, we want to better integrate Polly also with the new pass manager, and alternative uses of parts of isl arose, we propose (see other thread) to move Polly into the LLVM core repository. This would allow also for a closer integration of polyhedral operations with parts of LLVM. This patch focuses on where/how to integrate isl.

Best,
Tobias

Tobias_Grosser4 · January 17, 2018, 8:22am

> > As for the main motivation on why to import the entire source of isl at all:
> > Polly interacts relative tightly with isl which provide the main
> > optimization algorithms. For instance, Polly's regression tests
> > compare the string output of set representation, which unfortunately
> > can change between versions. For instance, if a newer version
> > implements a simpler string description of the same set. That means
> > that tests would fail if a different version is installed on the host
> > system. This is significant for buildbots that would need regular
> > manual action to install a newer version of isl on the system.
>
> Hi Michael,

Hi Chris,

thank you for your feedback and especially for raising these concerns.
We talked with different members of the community and share parts of
your concerns.

> I have significant concerns about including ISL into LLVM. LLVM
> intentionally takes a very conservative approach to bundling third party
> code with itself: we prefer instead of have them be external
> dependencies that are conditionally configured in on demand.

I understand. If needed we can make the compilation of isl optional.

However, as pointed out earlier in this thread it seems preferable to
lock LLVM with a very specific version of isl to make sure version
differences don't cause spurious bugs. This at least has shown to be
very useful in Polly.

> I’m not an expert in ISL in particular, but it looks like it is
> something of an odd dependency to include: it isn’t structured at all
> like LLVM, and it (apparently, I could definitely be wrong) depends on
> the GMP library which is LGPL.

It can optionally be compiled with GMP, however especially for the use
in LLVM we added together with Qualcomm an MIT licensed imath backend
(and tuned its performance).

> Beyond licensing, it has an old-school
> C-style API which appears to be extremely inefficient compared to things
> like APInt that avoid allocations for 64-bit integers and less.

Over the last year we developed C++ bindings for isl which make its use
a lot easier:

https://github.com/llvm-mirror/polly/blob/master/lib/External/isl/include/isl/isl-noexceptions.h

While this does not reduce heap allocations, it will make it easier to
replace performance critical parts of isl over time. While we don't have
concrete plans to rewrite core parts of isl, I expect over time more and
more of isl-style functionality to be incorporated natively in LLVM.

I expect that we could likely reduce heap allocations further, but some
of the low-hanging fruits have already been picked. We avoid allocations
of GMP/imath data structures all together for cases where integers are
smaller than 32 bit. Most critical performance improvements today come
from algorithmic improvements.

Having an implementation that serves as baseline when considering to
replacing parts of isl seems to be a good idea.

> How much of ISL are you using?

We currently use the (parameteric) integer programming solver, the
integer constraint library, the constraint to AST code generator, the
ILP based scheduler, ...

> What other options have you considered?

There are no other license compatible alternatives for the full set of
functionalities we are using.

Alternatives for the constraint library part are PPL (which does not
provide integer constraints), as well as omega (which served as
inspiration for isl, but is today mostly unmaintained). Omega is to my
knowledge used in compilers such as icc.

Alternatives for the code generation part are ClooG and codegen+. Both
are older, provide less functionality, have license issues, and are C
based.

Alternatives for the scheduler are Pluto(+) (which is license incompatible).

There are many alternatives for the LP solver. I did not do an extensive
search here.

> Why should this be included with every LLVM distribution instead of
> being conditionally compiled? If the major issue is the polly
> regression tests, why not fix the tests?

We currently don't have any issue. isl is today shipped as part of the
Polly repository.

As Polly matured over time, all license issues have been resolved, we
want to better integrate Polly also with the new pass manager, and
alternative uses of parts of isl arose, we propose (see other thread) to
move Polly into the LLVM core repository. This would allow also for a
closer integration of polyhedral operations with parts of LLVM. This
patch focuses on where/how to integrate isl.

Also, one more data-point. We currently ship isl + Polly with the standard LLVM releases, as well as the debian + ubuntu package builds. Hence, this path is well tested.

Best,
Tobias

Tobias_Grosser4 · January 17, 2018, 9:46am

We changed the license of isl to MIT following the recommendation of Chris Lattner. a while ago. Hal Finkel recently re-checked the license situation. It seems to be fine for now. If at some point this is becoming a blocker, we could theoretically consider re-licensing once more (or alternatively rewrite parts of isl with the knowledge we gained from using it).

Best,
Tobias

Michael_Kruse2 · January 17, 2018, 12:35pm

Hi Chris,

thanks for your feedback. I have some additions to what Tobias already
mentioned.

I have significant concerns about including ISL into LLVM. LLVM intentionally takes a very conservative approach to bundling third party code with itself: we prefer instead of have them be external dependencies that are conditionally configured in on demand.

I’m not an expert in ISL in particular, but it looks like it is something of an odd dependency to include: it isn’t structured at all like LLVM, and it (apparently, I could definitely be wrong) depends on the GMP library which is LGPL.

isl has three arbitrary-length integer backends: gmp, imath
(GitHub - creachadair/imath: Arbitrary precision integer and rational arithmetic library, MIT licensed), imath32 (32 bit
integer, falls back to imath on overflow).

The imath support has been added because of the licensing issue when
included into Polly.

Beyond licensing, it has an old-school C-style API which appears to be extremely inefficient compared to things like APInt that avoid allocations for 64-bit integers and less.

The imath32 backend was written because of this. We are constantly
improving the performance.

Why should this be included with every LLVM distribution instead of being conditionally compiled?

The patch ⚙ D40122 Add isl to LLVM repository. compiles isl conditionally
(LLVM_INCLUDE_ISL, off by default).

If the major issue is the polly regression tests, why not fix the tests?

The regression tests verify that the LLVM IR is correctly represented
in isl (affine expressions, iteration domains, dependencies, etc),
which in turn depends on isl's string output. The only way I see to
avoid this issue is to not check those at all. Only (input IR ->
generated IR) tests remain. It is like testing ScalarEvolution without
being able to dump an llvm::SCEV.

Michael

clattner · January 18, 2018, 5:40am

I have significant concerns about including ISL into LLVM. LLVM
intentionally takes a very conservative approach to bundling third party
code with itself: we prefer instead of have them be external
dependencies that are conditionally configured in on demand.

I understand. If needed we can make the compilation of isl optional.

However, as pointed out earlier in this thread it seems preferable to lock LLVM with a very specific version of isl to make sure version differences don't cause spurious bugs. This at least has shown to be very useful in Polly.

Great, I think that that would be a fine approach: you can have the cmake logic detect which version of isl is installed and fail if it is the wrong version. This would address my concern.

What other options have you considered?

There are no other license compatible alternatives for the full set of functionalities we are using.

I’m not familiar with this technology or how much of it you’re using, but… how much code is this really? Have you considered designing a from scratch implementation in the LLVM style? This would presumably be much more efficient, it would be something you could control, and you’d be able to focus it on a specific requirement.

Why should this be included with every LLVM distribution instead of
being conditionally compiled? If the major issue is the polly
regression tests, why not fix the tests?

We currently don't have any issue. isl is today shipped as part of the Polly repository.

As Polly matured over time, all license issues have been resolved, we want to better integrate Polly also with the new pass manager, and alternative uses of parts of isl arose, we propose (see other thread) to move Polly into the LLVM core repository. This would allow also for a closer integration of polyhedral operations with parts of LLVM. This patch focuses on where/how to integrate isl.

Right, but I understand that you’re interested in getting polly integrated with LLVM (something I don’t yet fully understand the implications of). That is a very different situation than including ISL in Polly when Polly is out of tree.

The motivation explained in the original email seemed to hinge around the fact that Polly’s unit tests depend on accidental details of ISL that change across releases. If you can fix that, then it seems more plausible to unbundle it.

If you can’t fix that, and can’t replace ISL, then the best approach seems to have CMake detect and reject incompatible versions.

-Chris

Michael_Kruse2 · January 18, 2018, 2:02pm

This would require buildbot admins to regularly install new versions of isl.

Isl implements the main loop optimization algorithms, so any change in
isl may potentially change Polly's output. This is not accidental but
intended, to profit from improvements in isl's loop optimization
algorithms. With the side-effect of input/output pairs (unit tests) to
be version-locked to a specific version of isl.

Isl currently has 177052 lines in source files. We thought about
writing a C++ implementation of it, but then again it would take
years, introduce new bugs and split the isl community.

Michael

Tobias_Grosser4 · January 18, 2018, 2:38pm

> Great, I think that that would be a fine approach: you can have the cmake logic detect which version of isl is installed and fail if it is the wrong version. This would address my concern.

> The motivation explained in the original email seemed to hinge around the fact that Polly’s unit tests depend on accidental details of ISL that change across releases. If you can fix that, then it seems more plausible to unbundle it.
>
> If you can’t fix that, and can’t replace ISL, then the best approach seems to have CMake detect and reject incompatible versions.

This would require buildbot admins to regularly install new versions of isl.

Isl implements the main loop optimization algorithms, so any change in
isl may potentially change Polly's output. This is not accidental but
intended, to profit from improvements in isl's loop optimization
algorithms.

Specifically, we want to give isl the ability to exploit freedom in representing integer sets (set of linear affine constraints) as long as this preserves semantics. Such kind of simplifications are very useful when collecting assumptions (as SCEV meanwhile does). While SCEV does not yet perform many simplifications over the constraint sets it collects, isl does (and uses ILP solvers to derive these): One of the simplifications we are doing is e.g. set coalescing: http://impact.gforge.inria.fr/impact2015/papers/impact2015-verdoolaege.pdf . Simpler representations in the end will correspond to simpler generated code e.g. for run-time checks, but means that improvements in isl will change test cases.

With the side-effect of input/output pairs (unit tests) to
be version-locked to a specific version of isl.

Isl currently has 177052 lines in source files. We thought about
writing a C++ implementation of it, but then again it would take
years, introduce new bugs and split the isl community.

Michael, Me, Sebastian, Zino and others spent some time in performance tuning isl. Compared to the original gmp version we got about 2x speedup across all our experiments and compared to the original imath version (which was really slow) I feel it was more around 7-8x. We also played with fixed-size-integer versions of isl which could give another 10-30% speedup and I would guess that further reducing malloc traffic certainly will give us another constant speedup. Some other optimizations that certainly make sense are:

- Ensure hot loops are vectorized
- Potentially version parts of isl on the data-types they work with

I feel there is certainly some non-trivial speedup to be gained. However, the largest speedups we got came over the last years from algorithmic changes. Some interesting ones that still require more work are:

- Look into sparse constraint sets
- Avoid redundant computations by caching semantic properties
- Avoid duplicate computations

In fact, we are in parts on very early experiments to replace parts of the isl stack with C++ alternatives and this seems something we should regularly reconsider. I would like to encourage discussions if / what parts of isl would benefit from being rewritten, but I feel this is something we should do as a separate more forward looking discussion. Having isl in-tree will give us a solid baseline as foundation for such discussions.

Best,
Tobias

clattner · January 19, 2018, 4:47am

It seems like you can choose how frequently to update. Updating every 6 to 12 months doesn’t seem overly onerous, and still gives the benefit of updates from the upstream project.

-Chris

Tobias_Grosser4 · January 19, 2018, 8:13am

We tend to update isl at certain times more often and prefer to have a fast turnaround in updates. The reason is that if a new LLVM change exposes a bug we would like to fix the bug in isl immediately and want to get this fix out as fast as possible. Should we in this case ask all buildbot admins to update? What about the debian/ubuntu builds?

Best,
Tobias

asb · January 19, 2018, 1:40pm

Maybe it's worth separating the issues here somewhat:

First issue: For a user building LLVM with Polly support, is it
expected they would build against a specifically chosen version of ISL
rather than whatever their distro happens to have packaged. Is it
expected that a distributed LLVM build would be packaged with a
compiled version of the ISL library? It sounds like the answer is yes
to both. Or am I misinterpreting your comments about rapidly updating
ISL alongside Polly?

Second issue: Should the ISL source be bundled with LLVM? This
discussion is clearly _strongly_ influenced by the previous issue.
Including ISL in LLVM might be the first time an external dependency
of this size has been imported. On the other hand, having on ISL as an
external dependency could also be the first time LLVM has such a
tightly coupled dependency (due to versioning requirements).

Best,

Alex

Topic		Replies	Views
RFC: Import of Integer Set Library into LLVM source tree LLVM Dev List Archives	0	177	January 23, 2018
[RFC] Polly Status and Integration LLVM Dev List Archives	0	90	September 29, 2017
Inclusion of Polly and isl into core LLVM LLVM Dev List Archives	4	243	January 24, 2018
(no subject) LLVM Dev List Archives	2	106	January 16, 2018
Interest in LLVM LLVM Dev List Archives	2	114	December 8, 2014

RFC: Import of Integer Set Library into LLVM source tree

Related topics