Moderators for the 2017 LLVM Developers' Mtg Needed!

The 2017 LLVM Developers’ Meeting relies on volunteers to keep things running smoothly. Moderators are critical to this: they keep speakers on track and facilitate Q&A after each talk. I’m looking for community members who would be attending specific talks anyway to volunteer to moderate those sessions.

If you are interested in volunteering, please respond to this email with your first- and second-choice session times. You will moderate all talks during that time slot; they occur back to back in the same room. Moderators introduce the speaker, give the speaker time warnings, and facilitate Q&A by running microphones.

Full schedule here: https://2017llvmdevmtg.sched.com

Session 1 (10:30AM-12:45PM, Technical Track)
Dominator Trees and incremental updates that transcend time
GlobalISel: Past, Present, and Future
XRay in LLVM: Function Call Tracing and Analysis

Session 2 (2:15-4:00PM, General Session)
LIGHTNING TALKS
LLVM Compile-Time: Challenges. Improvements. Outlook.

Session 3 (2:15-4:00PM, Technical Track)
Tutorial: Welcome to the back-end: The LLVM machine representation.
Scalable, Robust and Regression-Free Loop Optimizations for Scientific Fortran and Modern C++

Session 4 (4:20-5:50PM, General Session)
The Type Sanitizer: Free Yourself from -fno-strict-aliasing
Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator

Session 5 (4:20-6:05PM, General Session)
Vectorizing Loops with VPlan – Current State and Next Steps
Tutorial: Writing Great Machine Schedulers

Session 6 (9:00-10:45AM, General Session)
Falcon: An optimizing Java JIT
Apple LLVM GPU Compiler: Embedded Dragons

Session 7 (10:00-10:45AM, Technical Track)
eval() in C++

Session 8 (11:10AM-12:40PM, General Session)
Implementing Swift Generics
The Further Benefits of Explicit Modularization: Modular Codegen

Session 9 (11:10AM-12:40PM, Technical Track)
Bringing link-time optimization to the embedded world: (Thin)LTO with Linker Scripts
lld: A Fast, Simple, and Portable Linker

Session 10 (2:10-3:40PM, General Session)
Adding Index‐While‐Building and Refactoring to Clang
Advancing Clangd: Bringing persisted indexing to Clang tooling

Session 11 (2:10-3:40PM, Technical Track)
Enabling Parallel Computing in Chapel with Clang and LLVM
Challenges when building an LLVM bitcode Obfuscator

Session 12 (4:40-6:25PM, General Session)
Building Your Product Around LLVM Releases
Tutorial: Head First into GlobalISel

Thanks,
Tanya

Hi Tanya,

I am interested in moderating the following sessions:

Session 4 (4:20-5:50PM, General Session)
The Type Sanitizer: Free Yourself from -fno-strict-aliasing
Structure-aware fuzzing for Clang and LLVM with libprotobuf-mutator

Session 9 (11:10AM-12:40PM, Technical Track)
Bringing link-time optimization to the embedded world: (Thin)LTO with Linker Scripts
lld: A Fast, Simple, and Portable Linker

Thanks,

Evgeny Astigeevich

Thank you to those who have volunteered. I am still in need of people for the following sessions:

Session 7 (10:00-10:45AM, Technical Track)
eval() in C++

Session 8 (11:10AM-12:40PM, General Session)
Implementing Swift Generics
The Further Benefits of Explicit Modularization: Modular Codegen

Session 10 (2:10-3:40PM, General Session)
Adding Index‐While‐Building and Refactoring to Clang
Advancing Clangd: Bringing persisted indexing to Clang tooling

Session 11 (2:10-3:40PM, Technical Track)
Enabling Parallel Computing in Chapel with Clang and LLVM
Challenges when building an LLVM bitcode Obfuscator

-Tanya

Hello everyone,

As the title states, I am looking for benchmarks that are particularly suitable for GPU accelerators (or at least make use of the #teams pragma).
I already tried the Rodinia benchmark suite, but its codes seem to be written for CPU acceleration only.
I would be very pleased if someone could provide me with one or more kernels that can be used with the NVPTX backend.
The purpose is a master's thesis about using OpenCL and SPIR-V as an OpenMP backend.
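To make it concrete, here is a toy sketch (my own, not from any suite) of the kind of offloaded kernel I am after:

#include <stdio.h>

/* Toy example, not taken from any benchmark suite: a saxpy-style loop
   offloaded with the teams construct I am looking for. */
void saxpy(int n, float a, const float *x, float *y) {
  #pragma omp target teams distribute parallel for \
          map(to: x[0:n]) map(tofrom: y[0:n])
  for (int i = 0; i < n; ++i)
    y[i] = a * x[i] + y[i];
}

int main(void) {
  enum { N = 1024 };
  float x[N], y[N];
  for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }
  saxpy(N, 3.0f, x, y);
  printf("y[0] = %f\n", y[0]); /* expect 5.0 */
  return 0;
}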

Thank you in advance and
kind regards,
Daniel

Maybe SPEC ACCEL? You’d need a SPEC license, which costs money, and the reporting rules are, justifiably, strict.

-- Jim

Jim Cownie <james.h.cownie@intel.com>
SSG/DPD/TCAR (Technical Computing, Analyzers, and Runtimes)
Tel: +44 117 9071438

Hi,

some (public) benchmarks that come to my mind:
  * LULESH: https://codesign.llnl.gov/lulesh.php
  * CloverLeaf
  * my own, a Conjugate Gradient solver: https://github.com/hahnjo/CGxx

As James wrote, SPEC ACCEL also has an OpenMP suite, but you need a license.

Regards
Jonas

https://github.com/ParRes/Kernels/tree/master/Cxx11 implements at least two kernels in OpenCL, GPU-oriented OpenMP4 (#pragma omp teams distribute parallel for simd collapse(2) schedule(static,1) as recommended by NVIDIA), RAJA, Kokkos, and numerous CPU implementations.
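In case that combination is unfamiliar, here is a rough sketch of what such a loop nest looks like (illustrative only, not the actual PRK source; the PRK handles data mapping separately):

/* Illustrative sketch, not the actual PRK kernel: a doubly nested loop
   (here a matrix transpose) annotated with the GPU-oriented OpenMP 4
   combination quoted above, written as a combined target construct. */
void transpose(int order, const double *A, double *B) {
  #pragma omp target teams distribute parallel for simd collapse(2) schedule(static,1) \
          map(to: A[0:order*order]) map(tofrom: B[0:order*order])
  for (int i = 0; i < order; ++i)
    for (int j = 0; j < order; ++j)
      B[i*order + j] += A[j*order + i];
}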

You’ll need to (1) tune the OpenCL code for your GPU, as it is currently not optimized for any particular architecture, and (2) tweak the RAJA and Kokkos implementations to target GPUs in addition to CPUs. Neither should be very difficult, and the PRK team is happy to provide assistance.

That project contains one C++ CUDA and two Fortran OpenACC implementations, but these are unfinished and need work. However, I’d be very happy to see others contribute better versions. It may not be the easiest route to fame and fortune, but it’s not the worst either :joy:

Write me privately or create GitHub issues if you have questions. I am currently on leave from my day job but will be responsive via GMail and GitHub.

Sorry if this is a repost. I intended to reply earlier; I can’t find any sign that I did, but mistakes have been known to occur.

Jeff

Just 2 more slots to fill! Any volunteers?

Session 7 (10:00-10:45AM, Technical Track)
eval() in C++

Session 10 (2:10-3:40PM, General Session)
Adding Index‐While‐Building and Refactoring to Clang
Advancing Clangd: Bringing persisted indexing to Clang tooling

-Tanya

I just found http://aces.snu.ac.kr/software/snu-npb/ today, but have not tried it.

Jeff

Yes, these SNU OpenMP 3.1 C codes are quite useful. (Some of the authors of this software also attend SC and have a booth on the show floor; I met one or two of them at SC last year.)

On a related note, some of us in my group (together with NASA) have just begun to create OpenMP 4.5 NPB codes (ones that did not make it into SPEC HPG). We will keep you informed.

Thanks

Sunita

**********************************
Sunita Chandrasekaran
Asst. Prof. Computer and Information Sciences
Affiliated, Center for Bioinformatics and Computational Biology
430 Smith Hall, University of Delaware
p: 302-831-2714 e: schandra@udel.edu

----------------------------------------
Adjunct Prof. Dept. of Computer Science

University of Houston, TX

Can anyone help with this session?

Session 10 (2:10-3:40PM, General Session)
Adding Index‐While‐Building and Refactoring to Clang

Advancing Clangd: Bringing persisted indexing to Clang tooling

Thanks,
Tanya

I can do it!

-Raphael

Thanks for all the help!

These benchmarks, especially LULESH, showed me that my approach of setting the address spaces manually according to their scope doesn’t work (*) and that I have to use the generic address space,
like the NVPTX backend does.

Now with this much more robust version, I decided to make my project public:

My clang fork is available at

and the OpenMP runtime fork at

and the necessary LLVM fork (for generating SPIR-V) is from

All three of them might need a pull from upstream, as they are not always in sync.

libomptarget-spir needs an OpenCL runtime that supports SPIR-V kernels (with OpenCL 2.1 headers).
Unfortunately, the Intel OpenCL runtime started to segfault after the change to the generic address space.
Therefore, the only working OpenCL runtime I know of is AMDGPU-Pro.

The following pragmas should work for now (a small example combining several of them is sketched after the clause list below):
#target (enter/exit data)
#teams
#distribute / parallel for
#master
#barrier

as well as the clauses:
map, shared, private, firstprivate, lastprivate, schedule
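
As a rough illustration (my own toy code, not one of the test cases), a kernel that exercises several of the supported pragmas and clauses:

/* Toy illustration only. Combines target with map and firstprivate,
   teams, and distribute parallel for with a schedule clause. */
void scale(int n, double factor, double *v) {
  #pragma omp target map(tofrom: v[0:n]) firstprivate(factor)
  #pragma omp teams
  #pragma omp distribute parallel for schedule(static)
  for (int i = 0; i < n; ++i)
    v[i] *= factor;
}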

A device runtime is not planned for the moment (I also don’t know how to compile and link OpenCL sources into libomptarget), but the generated code works independently of runtime functions.

For the more adventurous, you can (try to) build your sources with -fopenmp -fopenmp-targets=spir64-unknown-unknown.
For C++, I recommend adding -fno-exceptions and -O0, as there seem to be optimizer passes enabled
which don’t work for SPIR.

I would be very thankful for some feedback (I hope it won’t get too depressing).
Although I am not able to accept pull requests at the moment for legal reasons,
I would welcome any hints on making the implementation more robust and complete,
as well as reports about working (and non-working) programs.
(No comments on code style please; that can be fixed later. °°)

Kind regards,
Daniel

(*) While this is a valid program snippet,

#pragma omp target map(to:a[0:n])
{ int * b = a; }

it doesn’t work if 'a' is a pointer to addrspace(1) (cl_global) while 'b' gets allocated as a pointer to cl_private.
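
Roughly, in OpenCL terms (my own sketch of the situation, not code emitted by the fork):

/* Sketch in OpenCL C terms, not actual generated code. With manually
   assigned address spaces the mapped buffer arrives as a __global
   pointer while the local pointer defaults to __private: */
__kernel void body_manual(__global int *a) {
  __private int *b = a;  /* error: pointers to different address spaces */
}

/* With the OpenCL 2.x generic address space (which is what the NVPTX
   backend effectively relies on), the unqualified pointer is generic
   and the assignment is legal: */
__kernel void body_generic(__global int *a) {
  int *b = a;            /* ok: __global converts implicitly to generic */
}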

Hi Daniel,

interesting work!

Two questions:
  - With the latest commit, you perform a strncmp with "OpenCL 2.0 AMD". Does this mean that all other OpenCL implementations are effectively blocked out?
  - Is this the proprietary AMD OpenCL SDK or the "new" ROCm stack? https://rocm.github.io/

One remark: For __tgt_rtl_is_valid_binary: Does SPIR-V have its own machine id? That's how the CUDA plugin detects compatible binaries...
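For instance, a SPIR-V module starts with the magic word 0x07230203, so the check could be as simple as this sketch (just an idea, assuming the usual __tgt_device_image fields from the plugin interface header):

/* Sketch only: detect a SPIR-V image by its magic word, similar to how
   the CUDA plugin checks the ELF machine id.
   __tgt_device_image comes from the libomptarget plugin interface. */
#include <stdint.h>
#include <string.h>

int32_t __tgt_rtl_is_valid_binary(__tgt_device_image *Image) {
  const size_t Size = (char *)Image->ImageEnd - (char *)Image->ImageStart;
  if (Size < sizeof(uint32_t))
    return 0;
  uint32_t Magic;
  memcpy(&Magic, Image->ImageStart, sizeof(Magic));
  return Magic == 0x07230203u      /* SPIR-V magic number */
      || Magic == 0x03022307u;     /* byte-swapped module */
}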

Cheers,
Jonas

Hi Jonas,

Yes, other OpenCL runtimes are blocked out at the moment.
I would like to just test for "OpenCL 2.1", but that would block the AMD runtime. (I hope this will change in the future.)
It is the proprietary AMD OpenCL SDK, part of the AMDGPU-Pro driver. ROCm claims to support only an OpenCL 1.2 runtime, but it might be worth a test.
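
(For reference, the test is just a prefix comparison against the platform version string, roughly like this sketch; it is not the exact code in the repository:)

/* Sketch of the runtime check, not the exact code in the repository:
   query CL_PLATFORM_VERSION and compare its prefix, e.g. against
   "OpenCL 2.0 AMD" today, or "OpenCL 2.1" once more runtimes support it. */
#include <CL/cl.h>
#include <string.h>

static int platform_matches(cl_platform_id platform, const char *prefix) {
  char version[256] = {0};
  if (clGetPlatformInfo(platform, CL_PLATFORM_VERSION,
                        sizeof(version) - 1, version, NULL) != CL_SUCCESS)
    return 0;
  return strncmp(version, prefix, strlen(prefix)) == 0;
}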

SPIRV-Tools includes a validator, but that may be too much overhead. You mean checking the magic number? Good idea!

Thanks,
Daniel

“Unfortunately, the Intel OpenCL runtime started to segfault with the change to generic address space.”

Does this mean that it works (or at least worked) prior to this change? Lots of folks would love to be able to use OpenMP 4.5 with Intel OpenCL for the integrated GPUs in Intel processors.

Have you tried Apple OpenCL? That’s another very popular implementation where OpenMP 4.5 would be highly desirable.

Jeff

Sorry, "Intel OpenCL runtime" was a little imprecise.
I meant the Intel® SDK for OpenCL™ Applications 2017, which targets CPUs only. (I just saw that there is a new version.)
For integrated GPUs, there is Beignet with, I think, OpenCL 2.0 support. I don’t know when OpenCL 2.1 will be supported,
but from then on this should probably work. I don’t have an Apple computer, but they seem to be stuck at OpenCL 1.2.

Kind regards,

Daniel

If so, I have this approach to making legacy code run on PiM -

Wandering Threads - the easy way to go parallel

  • works best with some processor/hardware support.

The original target application was circuit simulation, but the code pattern for that is similar to neural networks and database search.

Kev.