2022 LLVM Developers' Meeting Videos Posted

All of the videos for the 2022 LLVM Developers’ Meeting have been posted to the LLVM YouTube channel. Below is a list of the videos, which you can also find on the 2022 LLVM Dev Mtg website. Posters are coming soon.

Keynotes:

Paths towards unifying LLVM and MLIR
Speaker: Nicolai Hähnle
[ Video ] [ Slides ]

Where do you see the LLVM project 10 years from now? Intermediate representation (IR) plays a central role in this question. LLVM IR can be represented on top of MLIR’s data structures, but in practice it uses its own data structures. That creates a barrier in compilation pipelines and has other downsides. Is there hope for unification on a single set of data structures? How can we move towards such a goal? Let me show you a framework for thinking about these questions and some concrete ideas for how we can move in the right direction.

Implementing Language Support for ABI-Stable Software Evolution in Swift and LLVM
Speaker: Doug Gregor
[ Video ] [ Slides ]

Unlike its peer languages, Swift has made the deliberate decision to embrace a stable Application Binary Interface (ABI) along with native code compilation, such that separately-compiled software modules can evolve independently without breaking binary compatibility. Come learn about the impact that a stable ABI has on the design of a programming language and its implementation in LLVM.

Technical Talks:

Implementing the Unimplementable: Bringing HLSL’s Standard Library into Clang
Speaker: Chris Bieneman
[ Video ] [ Slides ]

The HLSL programming language has a rich library of built-in types that model semantics which can’t be written in HLSL. Clang’s implementation of HLSL leverages existing extensions and abstractions, with a few tweaks here and there, to implement these unimplementable data types as valid Clang ASTs.

Heterogeneous Debug Metadata in LLVM
Speaker: Scott Linder
[Video] [ Slides ]

An alternative debug information representation for LLVM is proposed, which removes classes of redundant representations of semantically equivalent expressions and makes expression evaluation context-free. These changes open the possibility of general support for heterogeneous architectures, as well as more aggressive optimizations.

Clang, Clang: Who’s there? WebAssembly!
Speaker: Paulo Matos
[Video] [ Slides ]

An introduction to the WebAssembly Reference Types and Garbage Collection proposals, along with what has already been upstreamed and how we propose to integrate the trickier bits into Clang/LLVM.

MC/DC: Enabling easy-to-use safety-critical code coverage analysis with LLVM
Speaker: Alan Phipps
[Video] [ Slides ] [ PPT ]

Modified Condition/Decision Coverage (MC/DC) is a comprehensive code coverage criterion that is extremely useful for weeding out hidden bugs and helping to ensure robustness. MC/DC is useful for everyday developers as well as for those in the safety-critical embedded industrial, automotive, and aviation markets, where it is required. In this talk, I will show how we extended LLVM’s Source-based Code Coverage infrastructure to support MC/DC by tracking test vectors, which represent the sequential true/false evaluation of conditions through a boolean expression.
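
To make the idea of an "independence pair" concrete, here is a minimal Python sketch of unique-cause MC/DC over a full truth table. It is an illustration under stated assumptions, not the talk's implementation: the helper name `independence_pairs` is hypothetical, and it enumerates every input combination, whereas LLVM's support records the test vectors actually executed (where short-circuited conditions are never evaluated).

```python
from itertools import product
from typing import Callable, Dict, Tuple

Vector = Tuple[bool, ...]


def independence_pairs(
    decision: Callable[..., bool], n: int
) -> Dict[int, Tuple[Vector, Vector]]:
    """For each condition index, find two test vectors that differ only in that
    condition and produce different decision outcomes (unique-cause MC/DC).

    Hypothetical helper for illustration; it enumerates all 2**n vectors.
    """
    pairs: Dict[int, Tuple[Vector, Vector]] = {}
    for i in range(n):
        for v in product([False, True], repeat=n):
            if v[i]:
                continue  # start only from vectors where condition i is False
            w = v[:i] + (True,) + v[i + 1:]  # flip condition i and nothing else
            if decision(*v) != decision(*w):
                pairs[i] = (v, w)
                break  # one pair is enough to show independence
    return pairs


# Decision under test: a && (b || c)
pairs = independence_pairs(lambda a, b, c: a and (b or c), 3)
for cond, (off, on) in sorted(pairs.items()):
    print(f"condition {cond}: {off} -> {on}")
```

A condition that never shows up in the result (for example, `b` in a decision written as `lambda a, b: a`) cannot be shown to affect the outcome on its own, which is the kind of gap MC/DC is meant to expose.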

What does it take to run LLVM Buildbots?
Speaker: David Spickett
[ Video ] [ Slides ]

Many of us have broken a Buildbot at least once, but do you know what goes into running them? Why are there so many configurations and who are the people behind it all? Attend this talk to see behind the scenes of one of the largest providers of LLVM Buildbots.

llvm-gitbom: Building Software Artifact Dependency Graphs for Vulnerability Detection
Speaker: Bharathi Seshadri, Ed Warnicke
[Video] [ Slides ] [ PPT ]

What if we could know the complete and reproducible artifact tree for every binary executable, shared object, container, etc., including all its dependencies, and could efficiently cross-reference that tree against a database of known vulnerabilities before deployment? If we had had that information, could we have remediated vulnerabilities such as Log4Shell faster? Might it even help open-source maintainers identify at-risk dependencies sooner? GitBOM is an open-source initiative to construct a verifiable Artifact Dependency Graph (ADG) and enable automatic, verifiable artifact resolution. In this talk, we will explain GitBOM and demonstrate a CVE-detection use case using llvm-gitbom. Given a version of OpenSSL, we will show how we detect whether that version has any unfixed vulnerabilities and which ones, if any, have been fixed in it.
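
The core idea of an artifact dependency graph can be sketched in a few lines of Python. This is a simplified model, not the GitBOM specification: it assumes artifacts are identified by git blob object IDs ("gitoids"), and the `adg_id` document format and the `vulnerable_inputs` helper are hypothetical. Because an artifact's ADG identifier is derived from its inputs' identifiers, changing any input changes the ID, and a known-vulnerable input can be found by walking the graph.

```python
import hashlib
from typing import Dict, List, Set


def gitoid(data: bytes) -> str:
    """Git blob object ID: SHA-1 over the header 'blob <size>\\0' followed by the content."""
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()


def adg_id(input_ids: List[str]) -> str:
    """ID of a simplified, hypothetical ADG document listing an artifact's inputs.

    Sorting makes the ID independent of the order in which inputs were listed.
    """
    doc = "".join(f"blob {i}\n" for i in sorted(input_ids)).encode()
    return gitoid(doc)


def vulnerable_inputs(root: str, graph: Dict[str, List[str]], known_bad: Set[str]) -> Set[str]:
    """Depth-first walk from `root` over `graph` (artifact ID -> input IDs),
    returning every reachable ID that appears in `known_bad`."""
    found: Set[str] = set()
    seen: Set[str] = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in known_bad:
            found.add(node)
        stack.extend(graph.get(node, []))
    return found


# Hypothetical example: a binary built from two sources, one of them known to be vulnerable.
main_c = gitoid(b"int helper(void);\nint main(void) { return helper(); }\n")
helper_c = gitoid(b"int helper(void) { return 1; } /* known-bad revision */\n")
binary = gitoid(b"stand-in bytes for the linked executable")
graph = {binary: [main_c, helper_c]}
print(vulnerable_inputs(binary, graph, known_bad={helper_c}))
```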

CuPBoP: CUDA for Parallelized and Broad-range Processors
Speaker: Ruobing Han
[Video] [ Slides ]
We propose and build a framework that executes CUDA programs on non-NVIDIA devices without relying on any other programming languages. In particular, compared with existing CUDA-on-CPU frameworks, our framework achieves the highest coverage and performance on x86, AArch64, and RISC-V.

Uniformity Analysis for Irreducible CFGs
Speaker: Sameer Sahasrabuddhe
[ Video ] [ Slides ]
We present a definition of thread convergence that is reasonable for targets that execute threads in groups (e.g., GPUs). This is accompanied by a definition of uniformity (i.e., when do different threads compute the same value), and a uniformity analysis that extends the existing divergence analysis to cover irreducible control-flow.

Using Content-Addressable Storage in Clang for Caching Computations and Eliminating Redundancy
Speaker: Steven Wu, Ben Langmuir
[Video] [ Slides ]

In this presentation, we introduce a Content-Addressable Storage (CAS) library for LLVM and use it to create a compilation caching system for Clang. We isolate the functional computations from filesystem and execution environment and model input discovery explicitly, caching computations based on explicit inputs from the CAS. We increase cache hits between related compiler invocations by caching fine-grained actions/requests that prune and canonicalize their inputs. We also explore modeling object file contents, such as debug information, as a CAS graph, in order to deduplicate and reduce the redundancy in the output format, thus reducing the storage cost for the cached compilation artifacts.
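
A minimal Python sketch of the two ideas in this abstract: a store that keys each blob by the hash of its content (so identical outputs are stored once), and an action cache that maps a hash of an action's explicit inputs to the key of its result. The class names, the SHA-256 choice, and the key scheme are hypothetical and do not reflect the API of the LLVM CAS library.

```python
import hashlib
from typing import Callable, Dict, List


class ContentStore:
    """Content-addressable store: the key of a blob is the SHA-256 of its bytes."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self.objects.setdefault(key, data)  # identical content is stored only once
        return key

    def get(self, key: str) -> bytes:
        return self.objects[key]


class ActionCache:
    """Maps (action name, explicit input keys) to the CAS key of the action's output."""

    def __init__(self, cas: ContentStore) -> None:
        self.cas = cas
        self.results: Dict[str, str] = {}
        self.misses = 0

    def run(self, action: str, input_keys: List[str],
            compute: Callable[[List[bytes]], bytes]) -> bytes:
        # The cache key depends only on the action and its explicit inputs,
        # not on file paths or other execution-environment state.
        h = hashlib.sha256(action.encode())
        for k in input_keys:
            h.update(b"\x00" + k.encode())
        action_key = h.hexdigest()
        if action_key not in self.results:
            self.misses += 1
            output = compute([self.cas.get(k) for k in input_keys])
            self.results[action_key] = self.cas.put(output)
        return self.cas.get(self.results[action_key])


def fake_compile(inputs: List[bytes]) -> bytes:
    """Hypothetical stand-in for a real compilation step."""
    return b"OBJ:" + inputs[0]


cas = ContentStore()
cache = ActionCache(cas)
src = cas.put(b"int f(void) { return 42; }\n")
obj1 = cache.run("compile", [src], fake_compile)
# Same source content stored again yields the same key, so this is a cache hit.
obj2 = cache.run("compile", [cas.put(b"int f(void) { return 42; }\n")], fake_compile)
```

The second `run` call never invokes `fake_compile`, because identical content produces an identical input key and therefore an identical action key.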

Direct GPU Compilation and Execution for Host Applications with OpenMP Parallelism
Speaker: Shilei Tian, Joseph Huber
[Video] [ Slides ]

In this talk, we will present a direct GPU compilation scheme that leverages the portable target offloading interface provided by LLVM/OpenMP. Utilizing this infrastructure allows us to compile an existing host application for the GPU and execute it there with only a minimal wrapper layer for the user code, command line arguments, and a compiler-provided GPU implementation of C/C++ standard library functions. The C/C++ library functions are partially implemented for direct device execution and otherwise fall back to remote procedure calls (RPC) to call host functions transparently. Our proposed prototype will allow users to quickly compile for, and test on, the GPU without explicitly handling kernel launches, data mapping, or host-device synchronization. We will demonstrate our implementation using three proxy applications with host OpenMP parallelism and three microbenchmarks to test the correctness of our prototype GPU compilation.

Linker Code Size Optimization for Native Mobile Applications
Speaker: Gai Liu
[Video] [ Slides ]

Modern mobile applications have grown rapidly in binary size, which restricts user growth and updates for existing users. Thus, reducing the binary size is important for application developers.

In this paper, we propose several novel optimization techniques that do not require significant customization to the build pipeline and reduce binary size with low build time overhead. As opposed to re-invoking the compiler during link time, we perform true linker optimization directly as optimization passes within the linker. The proposed optimizations are generic and could be incorporated into popular linkers as optimization passes.

Minotaur: A SIMD Oriented Superoptimizer
Speaker: Zhengyang Liu
[Video] [ Slides ]

Minotaur is a synthesis-based superoptimizer for the LLVM intermediate representation that focuses on optimizing LLVM’s portable vector operations as well as intrinsics specific to the Intel AVX extensions. The goal is to automatically discover transformation rules that are missing from LLVM, which are challenging to derive due to the large number of intrinsics, their semantic complexity, and their counterintuitive costs. Minotaur has found many new transformations for vector instructions. We have evaluated Minotaur on various microbenchmarks and real-world applications such as the GNU MP library. The microbenchmarks optimized by Minotaur show speedups of up to 1.4x, and the real-world applications show speedups of up to 1.06x.

ML-based Hardware Cost Model for High-Level MLIR
Speaker: Dibyendu Das
[ Video ] [ Slides ]

Compilers often need to estimate, during early optimization passes, hardware characteristics that only become available later, such as execution unit utilization, number of register spills, latency, and throughput. Often a hand-written static/analytical hardware cost model is built into the compiler, for example, LLVM’s TTI. However, the need for more sophisticated and varied predictions has become more pronounced with the development of deep learning compilers, which need to optimize dataflow graphs. Such compilers usually employ a much higher-level MLIR form as their IR. A static/analytical cost model is cumbersome and error-prone for such systems. We develop a machine-learning-based cost model for high-level MLIR that can predict different target variables of interest, such as CPU/GPU/xPU utilization, instructions executed, and register usage, by treating the incoming MLIR as text input, à la NLP models. The learned model can be deployed to estimate the target variable of interest for a new computation graph. We report early, work-in-progress results from developing such a model and show that these models can provide reasonably good estimates with low error bounds.

VAST: MLIR for program analysis of C/C++
Speaker: Henrich Lauko
[Video] [ Slides ]

Program analysis has specific requirements for compiler toolchains that are usually unsatisfied. Ideally, an analysis tool would pick the best-fit representation that preserves interesting semantic features. Such a representation would know the precise relationships between low-level constructs in IR and the analyzed source code. LLVM IR is rarely the best-fit representation for program analysis. In this talk, we will look at how we can improve the situation using an MLIR infrastructure called VAST. VAST is an MLIR library for multi-level C/C++ representation. With VAST, an analysis does not need to commit to a single best fit. Instead, an analysis can have simultaneous visibility into multiple progressions of the code, from very high-level down to very low-level.

MLIR for Functional Programming
Speaker: Siddharth Bhat
[ Video ] [ Slides ]

In this talk, we discuss the implementation, upstreaming, and community concerns of adopting LLVM and MLIR within the Lean4 proof assistant and, more broadly, takeaways for MLIR to have strong support for functional programming languages. We walk through the process of creating a new MLIR-based backend for Lean4, a dependently typed programming language. We demonstrate our MLIR dialect ([2201.07272] Lambda the Ultimate SSA: Optimizing Functional Programs in SSA), which encodes core functional programming concepts in SSA style. However, having a fully functional backend is not enough; we discuss the worries around MLIR adoption in the Lean4 community and the discussions that led to Lean4 choosing to adopt LLVM for the time being. We discuss our current LLVM backend effort for Lean4 (https://github.com/leanprover/lean4/pull/1497) and end with a discussion of how the MLIR community could help with the adoption of MLIR for functional programming languages.

SPIR-V Backend in LLVM: Upstream and Beyond
Speaker: Michal Paszkowski, Alex Bezzubikov
[Video] [ Slides ]

SPIR-V is a binary intermediate language commonly used for GPU computations and targeted by many projects (including OpenCL, OpenMP and SYCL). In this talk, we will discuss what it took to upstream the GlobalISel-based SPIR-V backend, present some of the issues stemming from the high-level design of the language, and explain the steps required to maintain the target in-tree. We will also talk briefly about extensibility, support for other APIs/SPIR-V flavors (e.g. Vulkan), and the ongoing effort to unify methods of lowering builtins across GPU targets.

IRDL: A Dialect for dialects
Speaker: Mathieu Fehr, Théo Degioanni
[Video] [ Slides ]

We present IRDL, a dialect for representing IR definitions. IRDL lets users define dialects in a declarative style, allowing dynamic registration of dialects using dynamic dialects, which were recently introduced in MLIR. Additionally, we will present two lower-level dialects, IRDL-SSA and IRDL-Eval, and their respective lowerings, which enable interesting optimizations of operation verifiers that ODS does not currently handle. We hope that IRDL will simplify the generation of dialects through metaprogramming or from external languages such as Python.

Automated translation validation for an LLVM backend
Speaker: Nader Boushehrinejad Moradi
[Video] [ Slides ]

We developed an automated bug-finding tool for LLVM’s AArch64 backend. Our prototype, ARM-TV, builds on Alive2, a bounded translation validator for LLVM’s optimization passes. Using ARM-TV, we have discovered and reported 17 new miscompilation bugs in the SelectionDAG and GlobalISel backends, most of which have been fixed. In this talk, we will describe the current state of our prototype and our plans for enhancing the tool.

llvm-dialects: bringing dialects to the LLVM IR substrate
Speaker: Nicolai Hähnle
[Video] [ Slides ]

Is your compiler stack built on LLVM and you’re eyeing some of the goodness provided by MLIR, but can’t justify rewriting your stack? Then we may have just the project for you! llvm-dialects is an add-on to LLVM that allows you to define dialects and gradually transition to their use within a compiler stack built on LLVM IR.

YARPGen: A Compiler Fuzzer for Loop Optimizations and Data-Parallel Languages
Speaker: Vsevolod Livinskii
[Video] [ Slides ] [ PPT ]

YARPGen is a generative (as opposed to mutation-based) compiler fuzzer that we developed. It previously focused on testing scalar optimizations, but after a recent substantial upgrade, it now supports a collection of strategies for stress-testing loop optimizations. To achieve this, we ensure that its tests contain optimization prerequisites and interesting data access patterns (e.g., stencils), which are necessary to trigger loop optimizations (e.g., GVN). YARPGen’s internal intermediate representation allows us to lower generated tests to C, C++, DPC++, and ISPC. Along with an automated testing system, this new version of YARPGen has discovered more than 120 bugs in compilers such as Clang, GCC, the ISPC compiler, and the DPC++ compiler, in addition to finding a comparable number of bugs in proprietary compilers.

RISC-V Sign Extension Optimizations
Speaker: Craig Topper
[Video] [ Slides ]

The 64-bit RISC-V target is the only one in-tree that has neither 32-bit sub-registers nor i32 as a legal type. Many instructions have forms that sign extend their result by copying bit 31 into bits 63:32. Only loads are able to implicitly zero bits 63:32. Some instructions, such as comparisons, always operate on all 64 bits and require smaller values to be extended. The ABI also requires 32-bit arguments and return values to be sign extended to 64 bits. Making good use of the implicitly sign-extending instructions is important for generating optimal assembly from C code, where 32-bit integers are prevalent.

This talk will discuss how this differs from other 64-bit targets, how single basic block SelectionDAG makes this difficult, and how optimizations that are good for other 64-bit targets may be harmful for RISC-V. It will cover the optimizations and custom passes that have been added to improve the generated code and ideas for future enhancements.
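
The sign- versus zero-extension behavior described above can be modeled with plain integers. This Python sketch is an illustration under stated assumptions, not LLVM code: `addw` models the RV64 `ADDW` instruction (a 32-bit add whose result has bit 31 copied into bits 63:32), `zext32` models what a zero-extending load such as `LWU` produces, and `sext_w` models the `sext.w` pseudo-instruction, which assembles to `addiw rd, rs, 0`. The final check in the tests shows why an explicit `sext.w` after `addw` is redundant and can be removed.

```python
MASK32 = 0xFFFF_FFFF


def sext32(x: int) -> int:
    """Sign-extend the low 32 bits of x to a 64-bit register value (shown unsigned)."""
    low = x & MASK32
    return low | 0xFFFF_FFFF_0000_0000 if low & 0x8000_0000 else low


def zext32(x: int) -> int:
    """Zero-extend the low 32 bits of x, as a zero-extending load (LWU) would."""
    return x & MASK32


def addw(rs1: int, rs2: int) -> int:
    """RV64 ADDW: add the low 32 bits, then copy bit 31 into bits 63:32."""
    return sext32(rs1 + rs2)


def sext_w(rs: int) -> int:
    """sext.w rd, rs (an alias of addiw rd, rs, 0)."""
    return addw(rs, 0)


x = addw(0x7FFF_FFFF, 1)  # 32-bit signed overflow to INT32_MIN
print(hex(x))             # 0xffffffff80000000: bits 63:32 are all ones
print(hex(zext32(x)))     # 0x80000000: the same 32-bit value, zero-extended
```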

Execution Domain Transition: Binary and LLVM IR can run in conjunction
Speaker: Jaeyong Ko, Sangrok Lee
[Video] [ Slides ]

In this talk, we will start by showing multi-CPU-architecture IoT malware and why such malware is challenging to analyze from the perspective of both static and dynamic analysis. Then we will talk about how we made cross-architecture malware analysis possible by lifting the malware to LLVM IR and running it in the LLVM interpreter. Next, we will explain a problem that could be a significant hurdle to making this a practical analysis tool: slow execution. We will show how we resolved this problem by inventing execution domain transition. Finally, we will end our talk with a demo of our work.

Tutorials:

Using LLVM’s libc
Speaker: Sivachandra Reddy, Michael Jones, Tue Ly
[Video] [ Slides ]

LLVM’s libc is a sanitizer-friendly, greenfield libc that will eventually serve as a full drop-in replacement for the system libc. While it is not yet ready to be a drop-in replacement, it has enough functionality that one can start using it in projects and benefit from it in production contexts. In this tutorial, we will talk about how we have used modern C++ to implement a sanitizer-instrumentable libc that can be easily decomposed and custom-tuned. We will also talk about how it is being used in production contexts at Google. There has been a lot of interest in the LLVM community in putting together an LLVM-only toolchain. We will demonstrate how one can build and package the libc in order to put together such a toolchain and use it in projects.

JITLink: Native Windows JITing in LLVM
Speaker: Sunho Kim
[Video] [ Slides ]

JITLink is a new JIT linker in LLVM, developed to eliminate limitations in LLVM’s JIT implementation. With JITLink, special compilation flags or workarounds are not required to load code into the JIT, since most object file features, including the small code model and thread-local storage, are fully implemented. This tutorial will explain how to use JITLink by working on a Windows JIT application that just-in-time links to third-party static libraries. The tutorial will also dig into the internals of JITLink by working on a JITLink plugin that manages SEH exception tables.

Panels:

Machine Learning Guided Optimizations (MLGO) in LLVM
Speakers: Johannes Doerfert (moderator), Petr Hosek, Chris Cummins, Aiden Grossman, Mircea Trofin, Zoom: Yundi Qian, Ondrej Sykora, Dibyendu Das, Amir Ashouri, Mostafa Elhoushi, S. VenkataKeerthy
[Video] [ Slides ]
The panel brings together: compiler engineers working on ML-guided optimizations in LLVM, product engineers applying such optimizations in production, and researchers exploring the area.

Panel discussion on “Best practices with toolchain release and maintenance”
Speakers: Aditya Kumar, Petr Hosek, Jeremy Stenglein, Han Zhu
[ Video ]
With the proliferation of vendors shipping custom LLVM toolchains, it would be great to bring toolchain distributors together to share their experiences. We’ll focus the discussion on:

  • Integration testing
  • Keeping compatibility with GNU toolchain
  • Challenges of keeping up with upstream
  • Changes in upstream llvm-project that will help

Static Analysis in Clang
Speakers: Gabor Horvath, Bruno Cardoso Lopes, Artem Dergachev, Yitzhak Mandelbaum, Dmitri Gribenko
[ Video ]

The Clang ecosystem has multiple static analysis tools. The compiler can produce easy-to-understand error and warning messages. The Clang Static Analyzer (CSA) is capable of finding bugs that span multiple function calls using symbolic execution. Clang Tidy can help modernize large code bases using automatic code rewrites. While there are some out-of-tree Clang-based static analysis tools, CSA and Clang Tidy have been the go-to solutions for the static analysis needs of the community. However, during the last year, a couple of RFCs surfaced on the mailing list to add a dataflow analysis framework to Clang and to introduce a new MLIR-based IR. Come and join this panel discussion to learn how to get involved in the ongoing static analysis projects, what the new proposals mean for our beloved and proven tools, and what the future holds for static analysis in Clang. You will have the opportunity to ask questions of some of the code owners of these tools and the authors of the new proposals.

High-level IRs for a C/C++ Optimizing Compiler
Speakers: Bruno Lopes, Alex Zinenko, Ivan Baev, Johannes Doerfert, Chris Lattner, Mehdi Amini
[ Video ]

Most C/C++ optimizing compilers employ multiple intermediate representations (IRs). LLVM IR has been the cornerstone of C/C++ LLVM-based compilers for many years. However, optimizations involving loop nests, data layout, or multidimensional arrays, for example, challenge the existing LLVM infrastructure.

The panelists will discuss higher-level (HL) IRs for optimizing compilers, primarily from C/C++ and optimization/analysis perspective. We will ask our expert panel to share their experience and insights on:

  • What optimizations are easier to implement and maintain with HL IR?
  • Must-have and good-to-have features in HL IR for optimizing compilers
  • Agreement on MLIR as HL IR for C/C++ optimizing compilers?
  • Other motivations for HL IR (in addition to run-time performance) - e.g. security, debuggability?
  • Promising HL IR initiatives for C/C++ compilers

Both experts and newcomers are welcome to attend. Send questions to the organizers prior to the conference to allow consideration.

Quick Talks:

LLVM Education Initiative
Speakers: Chris Bieneman, Mike Edwards, Kit Barton
[ Video ] [ Slides ]
Interested in expanding the LLVM community through education? Interested in better documentation, tutorials, and examples? Interested in sharing your knowledge to help other engineers grow? Come learn about the proposal for a new LLVM Education working group!

Enabling AArch64 Instrumentation Support In BOLT
Speakers: Elvina Yakubova
[ Video ] [ Slides ]
BOLT is a post-link optimizer built on top of LLVM. It achieves performance improvements by optimizing an application’s code layout based on an execution profile gathered by a sampling profiler, such as the Linux perf tool. When the advanced hardware counters needed for precise profiling are not available on a target platform, one can instead collect a profile by instrumenting the binary. In this talk, we will cover the changes essential for enabling instrumentation support in BOLT for a new target platform, using AArch64 as an example.

Approximating at Scale: How strto in LLVM’s libc is faster
Speaker: Michael Jones
[ Video] [ Slides ]
The string-to-float conversion functions are deceptively simple. You pass them a string of digits, and they return the floating point value closest to that string. The process of finding that value as quickly as possible is very complex, and in this talk I will describe how the implementation in LLVM’s libc works. The focus will be mainly on the three conversion algorithms used, specifically W. D. Clinger’s Fast Path, the Eisel-Lemire fast_float algorithm, and Nigel Tao’s Simple Decimal Conversion. I will give an overview of how they work and how they fit together to create a complete strto implementation. Finally, I’ll demonstrate how this makes it faster than existing libc implementations: about 15% faster than glibc.
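
Of the three algorithms, Clinger's fast path is small enough to sketch. If the decimal significand fits exactly in a double (at most 2^53) and the power of ten is also exact (10^0 through 10^22 are), then a single IEEE division or multiplication of two exact values is already correctly rounded. The Python below is a hypothetical illustration, not LLVM libc's code: it only accepts unsigned `digits.digits` strings with no exponent, and it returns `None` when the fast path does not apply, which is where a complete implementation would fall back to a slower algorithm.

```python
from typing import Optional

# 10**0 .. 10**22 are all exactly representable as doubles.
POW10 = [float(10**k) for k in range(23)]


def clinger_fast_path(s: str) -> Optional[float]:
    """Convert an unsigned decimal string such as '123.45' via Clinger's fast path.

    Returns None when the fast path's preconditions do not hold.
    """
    int_part, _, frac_part = s.partition(".")
    digits = int_part + frac_part
    if not (digits.isascii() and digits.isdigit()):
        return None  # empty input, sign, exponent, or other syntax: not handled here
    significand = int(digits)
    n_frac = len(frac_part)
    if significand > 2**53 or n_frac > 22:
        return None  # significand or power of ten would not be exact
    # Both operands are exact, so one correctly rounded IEEE division
    # gives the correctly rounded result.
    return float(significand) / POW10[n_frac]
```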

MIR support in llvm-reduce
Speaker: Matthew Arsenault
[ Video ] [ Slides ] [ PPT ]
Bugpoint has long existed to assist in reducing LLVM IR testcases, but lacked an equivalent tool for reducing code generation passes. Recently llvm-reduce gained support for reducing MIR. This talk will discuss the current status and future improvements, difficulties MIR presents compared to the higher level IR, and my experience using it to reduce register allocation failures in large test cases.

Interactive Crashlogs in LLDB
Speaker: Med Ismail Bennani
[ Video ] [ Slides ]
While we’d all prefer if programs never crashed, the logs captured from those crashes can help troubleshoot bugs and get your program up and running again. At Apple, diagnostic data gets captured into a crash report: a detailed textual representation of the program’s state when it crashed. Thanks to the addition of interactive crashlogs, developers can now load crash reports into LLDB and interact with them like a regular LLDB session, using all the techniques they’re already familiar with to debug the issue.

clang-extract-api: Clang support for API information generation in JSON
Speaker: Zixu Wang
[ Video ] [ Slides ]
This talk introduces clang-extract-api, a new tool to collect and serialize API information from header files, that enables downstream tooling, like documentation generation, to inspect API symbols without having to understand the clang AST.

Using modern CPU instructions to improve LLVM’s libc math library.
Speaker: Tue Ly
[ Video ] [ Slides ]
LLVM libc’s math routines aim to be both performant and correctly rounded according to the IEEE 754 standard. Modern CPU instruction sets include many useful instructions for mathematical computations, and using them effectively can significantly boost the performance of math function implementations. In this talk, we will discuss how two families of such instructions, fused multiply-add (FMA) and floating-point rounding, are used in LLVM’s libc for the x86-64 and ARMv8 architectures, allowing us to achieve performance comparable to glibc while remaining accurate in all rounding modes.
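
Part of FMA's accuracy benefit comes from rounding once instead of twice. The Python sketch below emulates a fused multiply-add with exact rational arithmetic (`fractions.Fraction`) followed by a single conversion back to a double; it only illustrates the rounding behavior and is not how LLVM's libc uses the hardware instruction. The inputs are chosen so that the separately rounded `a * b + c` loses the entire result.

```python
from fractions import Fraction
from math import ldexp


def fma_emulated(a: float, b: float, c: float) -> float:
    """Compute a*b + c exactly, then round once to the nearest double."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + ldexp(1.0, -30)  # 1 + 2**-30
b = 1.0 - ldexp(1.0, -30)  # 1 - 2**-30
c = -1.0

separate = a * b + c           # a*b = 1 - 2**-60 is rounded to 1.0 before adding c
fused = fma_emulated(a, b, c)  # keeps the exact result, -2**-60
print(separate, fused)
```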

Challenges Of Enabling Golang Binaries Optimization By BOLT
Speaker: Vasily Leonenko, Vladislav Khmelevskyi
[ Video ] [ Slides ]
Golang is a rather unusual language: it compiles to an architecture-specific binary, but it also uses its own runtime library, which in turn uses version-specific data structures to support internal mechanisms such as garbage collection, scheduling, and reflection. BOLT is a post-link optimizer: it rearranges code and data locations in the output binary, so Golang-specific tables must also be updated to reflect the modifications it performs. In this talk, we will cover the status of the current implementation of Golang support in BOLT, the optimization effect achieved, and the challenges of enabling BOLT optimization of Golang binaries.

Inlining for Size
Speaker: Kyungwoo Lee
[ Video ] [ Slides ]

Inlining for size is critical in mobile apps as app size continues to grow.
While link-time optimization (LTO) largely minimizes app size at the minimum-size optimization level (-Oz), scalable link-time optimization (ThinLTO) misses many inlining opportunities because each module's inliner works independently, without modeling the size cost globally.
We first show how to use the ModuleInliner with LTO.
Then, we describe how to improve inlining with ThinLTO by extending the bitcode summary, followed by a global inline analysis.
We also explain how to overcome import restrictions, often appearing in Objective-C or Swift, by pre-merging bitcode modules.
We reduced the code size by 2.8% for SocialApp, 4.0% for ChatApp, and 3.0% for Clang, compared to -Oz with ThinLTO.

Automatic indirect memory access instructions generation for pointer chasing patterns
Speaker: Przemysław Ossowski
[ Video ] [ Slides ] [ PPT ]
This short talk provides an example of how a feature newly introduced in real hardware can be adopted in Clang and LLVM and thereby made easily available to users. Indirect Memory Access Instructions (IMAI) can provide significant performance improvements, but their usability is limited by particular hardware restrictions. This talk will present how we tried to reconcile hardware limitations, the complexity of IMAI, and ease of use by handling a dedicated pragma in Clang and applying Complex Patterns in the DAG in the LLVM backend.

Link-Time Attributes for LTO: Incorporating linker knowledge into the LTO recompile
Speaker: Todd Snider
[ Video ] [ Slides ]

Embedded-application systems have limited memory, so user control over placement of functions and variables is important. The programmer uses a linker script to define a memory configuration and specify placement constraints on input sections that contain function and variable definitions. With LTO enabled, it is critical that the compiler incorporate link-time placement information into the LTO recompile (Edler von Koch - LLVM 2017). This talk discusses a compiler and linker implementation that roughly follows the ideas presented in Edler von Koch, highlighting differences in our implementation that offer significant advantages.

Expecting the expected: Honoring user branch hints for code placement optimizations
Speaker: Stan Kvasov, Vince Del Vecchio
[ Video] [ Slides ]

LLVM’s __builtin_expect, and a variant we recently added, __builtin_expect_with_probability, allow source code control over branch weights and can boost performance with or without PGO via hot/cold splitting. But in LLVM optimization, it’s not always intuitive how to update branch weight metadata with control flow changes. We talk about recent issues with losing branch weights in SimplifyCFG and possible improvements to the infrastructure for maintaining branch weights.

CUDA-OMP — Or, Breaking the Vendor Lock
Speaker: Joseph Huber, Johannes Doerfert
[ Video] [ Slides ]
In this talk we show that performance portability and interoperability are achievable goals even for existing (HPC) software. Through compiler and runtime augmentation, we can run off-the-shelf CUDA programs efficiently on AMD GPUs and further debug them on the host, all without modifying the original source code.
As a side-effect, a modern LLVM/Clang will provide a compilation environment in which CUDA and OpenMP offload are fully interoperable, allowing the use of both in the same project, even the same kernel, without intrinsic overheads.

Thoughts on GPUs as First-Class Citizens
Speaker: Johannes Doerfert
[ Video] [ Slides ]

In this short talk we will ramble about some of the discrepancies between GPU and CPU targets as well as the accompanying infrastructure. While we briefly mention ongoing efforts to rectify some of the problems, we’ll mainly focus on the areas where solutions are sparse and efforts are required.

Building an End-to-End Toolchain for Fully Homomorphic Encryption with MLIR
Speaker: Alexander Viand
[Video] [ Slides ]

Fully Homomorphic Encryption (FHE) allows a third party to perform arbitrary computations on encrypted data, learning neither the inputs nor the computation results. However, the complexity of developing an efficient FHE application currently limits deploying FHE in practice. In this talk, we will first present the underlying challenges of FHE development that motivate the development of tools and compilers. We then discuss how MLIR has been used by three different efforts, including one led by us, to significantly advance the state of the art in FHE tooling. While MLIR has brought great benefits to the FHE community, we also want to highlight some of the challenges experienced when introducing the framework to a new domain. Finally, we conclude by discussing how the ongoing efforts could be combined and unified before potentially being upstreamed.

Lightning Talks:

LLVM Office Hours: addressing LLVM engagement and contribution barriers
Speaker: Kristof Beyls
[ Video ] [ Slides ]

As part of registering for the 2021 LLVM dev meeting, participants were asked to answer a few questions about how the LLVM community could increase engagement and contributions. Of the 450 people who replied, the top three issues mentioned were “sometimes people aren’t receiving detailed enough feedback on their proposals”; “people are worried to come across as an idiot when asking a question on the mailing list/on record”; and “People cannot find where to start; where to find documentation; etc.” These were discussed in the community.o workshop at the 2021 LLVM dev meeting, and a summary of that discussion was presented by Adelina Chalmers as a keynote session; see 2021 LLVM Dev Mtg “Deconstructing the Myth: Only real coders contribute to LLVM!? - Takeaways”. One of the solutions suggested by the majority of participants to address those top barriers is introducing the concept of “office hours”. We have taken steps since then to make “office hours” a reality. In this lightning talk, I will talk about what issues “office hours” aim to address; how both newbies and experienced contributors can get a lot of value out of them; and where we are in implementing this concept and how you can help make them as effective as possible.

Improved Fuzzing of Backend Code Generation in LLVM
Speaker: Peter Rong
[ Video ] [ Slides ]

Fuzzing has proven to be an effective method for testing software. However, even with libFuzzer, the LLVM backend is still not sufficiently fuzzed. The difficulties are twofold. First, we lack a good way to monitor program behavior: edge coverage is not effective because the backend relies heavily on target descriptions, where data flow matters more than control flow. Second, existing mutation methods are naive and ineffective. We designed a new tool to fuzz the LLVM backend more effectively and have found numerous missing features inside AMD. We also found many bugs in upstream LLVM, eight of which have been confirmed and two of which have been fixed.

Interactive Programming for LLVM TableGen
Speaker: David Spickett
[ Video ] [ Slides ]

Interactive programming with Jupyter is a game changer for learning: you can keep your code and documentation in one place, always up to date and extendable. See how this is being applied to a core part of LLVM, TableGen, and why we should embrace the concept.

Efficient JIT-based remote execution
Speaker: Anubhab Ghosh
[ Video ] [ Slides ]

In this talk we demonstrate a shared-memory implementation and its performance improvements for most JITLink use cases. We demonstrate the benefits of running a separate executor process on top of the same underlying physical memory. We elaborate on how this work will be useful to larger projects such as clang-repl and Cling.

FFTc: An MLIR Dialect for Developing HPC Fast Fourier Transform Libraries
Speaker: Yifei He
[ Video ] [ Slides ]

Discrete Fourier Transform (DFT) libraries are among the most critical software components for scientific computing. Inspired by FFTW, a widely used library for DFT HPC calculations, we apply compiler technologies to the development of HPC Fourier transform libraries. In this work, we introduce FFTc, a domain-specific language based on Multi-Level Intermediate Representation (MLIR) for expressing Fourier Transform algorithms. FFTc is composed of a domain-specific abstraction level (the FFT MLIR dialect), a domain-specific compilation pipeline, and a domain-specific runtime (work in progress). We present the initial design, implementation, and preliminary results of FFTc.
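FFTc's dialect and pipeline are not reproduced here. As a point of reference for the kind of algorithm such a DSL expresses, the following is a minimal textbook radix-2 Cooley-Tukey FFT in Python (an illustration chosen for this post, not FFTc code), checked against the naive O(n²) DFT definition. It assumes the input length is a power of two.

```python
import cmath


def dft(x):
    """Naive O(n^2) DFT straight from the definition, used as a reference."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return list(x)
    even = fft(x[0::2])  # transform of even-indexed samples
    odd = fft(x[1::2])   # transform of odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle           # butterfly: top half
        out[k + n // 2] = even[k] - twiddle  # butterfly: bottom half
    return out


x = [1.0, 2.0, 3.0, 4.0, 0.0, 0.0, 0.0, 0.0]
print(max(abs(a - b) for a, b in zip(fft(x), dft(x))))  # tiny rounding difference only
```

The recursive split into even and odd halves is what turns O(n²) work into O(n log n); production libraries such as FFTW instead select among many specialized kernels, which is the space a compiler-based approach like FFTc targets.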

Recovering from Errors in Clang-Repl and Code Undo
Speaker: Purva Chaudhari, Jun Zhang
[ Video ] [ Slides ]

In this talk we outline the PTU-based error recovery capability implemented in Clang and available in Clang-Repl. We explain the challenges of error recovery in templated code. We demonstrate how to extend the error recovery facility to restore the Clang infrastructure to a previous state. We demonstrate the undo command available in Clang-Repl and the changes required for its reliability.

10 commits towards GlobalISel for PowerPC
Speaker: Kai Nacke, Amy Kwan, Nemanja Ivanovic
[ Video ] [ Slides ]

We share our experiences with the first steps to implement GlobalISel for the PowerPC target.

Nonstandard reductions with SPRAY
Speaker: Jan Hueckelheim
[ Video ] [ Slides ]

We present a framework that allows non-standard floating point reductions in OpenMP, for example to ensure reproducibility, compute roundoff estimates, or exploit sparsity in array reductions.
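Why reduction order matters for reproducibility can be shown without OpenMP: floating-point addition is not associative, so partial sums combined in a different order (as different thread schedules may do) can give different results. The Python sketch below is not SPRAY's interface; it contrasts a plain left-to-right sum with `math.fsum`, which returns the correctly rounded exact sum and is therefore independent of order.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point sum, like a single thread's partial reduction."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values):
    """Correctly rounded sum via math.fsum: the result does not depend on order."""
    return math.fsum(values)


values = [1.0, 1e100, 1.0, -1e100]
print(naive_sum(values), naive_sum(values[::-1]))                # 0.0 1.0
print(reproducible_sum(values), reproducible_sum(values[::-1]))  # 2.0 2.0
```

Each `1.0` is absorbed or kept depending on whether it meets `1e100` first, so the naive result depends on order; a custom reduction operator is one way to trade some speed for order-independent or more accurate results.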

Type Resugaring in Clang for Better Diagnostics and Beyond
Speaker: Matheus Izvekov
[ Video ] [ Slides ]

In this presentation, we talk about the effort to implement type resugaring in Clang.
Resugaring is an economical way to solve, in the majority of cases, diagnostic issues related to the canonicalization of template arguments during instantiation. The infamous ‘std::basic_string’ appearing in diagnostics when the user wrote ‘std::string’ is the classic example.

Swift Bindings for LLVM
Speaker: Egor Zhdan
[ Video ] [ Slides ]
Using LLVM APIs from languages other than C++ has often been necessary to develop compilers and program analysis tools. However, LLVM headers rely on many C++ features, and most languages do not provide interoperability with C++. As part of the ongoing Swift/C++ interoperability effort, we have been creating Swift bindings for LLVM APIs that feel convenient and natural in Swift, with the purpose of using the bindings to implement parts of the Swift compiler in Swift. In this talk, I will present our current status and what we have accomplished so far.

Min-sized Function Coverage with IRPGO
Speaker: Ellis Hoag, Kyungwoo Lee
[ Video ] [ Slides ]

IRPGO has a mode to collect function entry coverage, which can be used for dead code detection. When combined with Lightweight Instrumentation, the binary size and performance overhead should be small enough to be used in a production setting. Unfortunately, when building an instrumented binary with -Oz, the “.text” size overhead is much larger than what we’d expect from the injected instrumentation instructions alone. In fact, even if we block instrumentation for all functions we still get a 15% “.text” size overhead from extra passes added by IRPGO. This talk explores the flags we can use to create a function entry coverage instrumented binary with a “.text” size overhead of 4% or smaller.

High-Performance GPU-to-CPU Transpilation and Optimization via High-Level Parallel Constructs in Polygeist/MLIR
Speaker: Ivan R. Ivanov
[ Video ] [ Slides ]

We extend Polygeist/MLIR to succinctly represent, optimize, and transpile CPU and GPU parallel programs. Through the use of our new operations (e.g. a memory-effects-based barrier) and transformations, we can successfully transpile the GPU Rodinia and PyTorch benchmarks so that they run on the CPU faster than their existing CPU-parallel versions.

Tools for checking and writing non-trivial DWARF programs
Speaker: Chris Jackson
[ Video ] [ Slides ]

DWARF expressions describe how to recover the location or value of a variable that has been optimized away. They are expressed in terms of postfix operations that operate on a stack machine. A DWARF program is encoded as a stream of operations, each consisting of an opcode followed by a variable number of literal operands. Some DWARF programs are difficult to interpret and check for correctness in their assembly-language format. Currently, checking a DWARF expression requires building an executable with debug info and running it in a debugger such as LLDB. We propose, and have begun, a fun project to build a small suite of tools to aid in constructing and checking non-trivial DWARF programs.
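To make the stack-machine model concrete, here is a minimal Python evaluator for a small subset of DWARF operations (literal pushes, `DW_OP_constu`, `DW_OP_dup`, `DW_OP_plus`, `DW_OP_minus`, `DW_OP_mul`, `DW_OP_plus_uconst`, `DW_OP_stack_value`). It is a sketch, not one of the tools from the talk: it assumes a 64-bit generic type, ignores registers, memory, and pieces, and treats `DW_OP_stack_value` only as a marker.

```python
# Opcode values from the DWARF specification (small subset only).
DW_OP_constu = 0x10
DW_OP_dup = 0x12
DW_OP_minus = 0x1C
DW_OP_mul = 0x1E
DW_OP_plus = 0x22
DW_OP_plus_uconst = 0x23
DW_OP_lit0 = 0x30
DW_OP_lit31 = 0x4F
DW_OP_stack_value = 0x9F

MASK64 = (1 << 64) - 1  # assumption: 64-bit generic type, arithmetic wraps mod 2**64

# Binary ops pop the top (b) and the second entry (a), then push f(a, b).
BINARY_OPS = {
    DW_OP_plus: lambda a, b: a + b,
    DW_OP_minus: lambda a, b: a - b,  # second entry minus former top of stack
    DW_OP_mul: lambda a, b: a * b,
}


def read_uleb128(data, pos):
    """Decode an unsigned LEB128 operand; return (value, next position)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if byte & 0x80 == 0:
            return result, pos
        shift += 7


def evaluate(expr):
    """Evaluate a DWARF expression given as bytes; return the top of the stack."""
    stack, pos = [], 0
    while pos < len(expr):
        op = expr[pos]
        pos += 1
        if DW_OP_lit0 <= op <= DW_OP_lit31:
            stack.append(op - DW_OP_lit0)  # DW_OP_litN pushes N
        elif op == DW_OP_constu:
            value, pos = read_uleb128(expr, pos)
            stack.append(value & MASK64)
        elif op == DW_OP_dup:
            stack.append(stack[-1])
        elif op in BINARY_OPS:
            b, a = stack.pop(), stack.pop()
            stack.append(BINARY_OPS[op](a, b) & MASK64)
        elif op == DW_OP_plus_uconst:
            value, pos = read_uleb128(expr, pos)
            stack.append((stack.pop() + value) & MASK64)
        elif op == DW_OP_stack_value:
            pass  # top of stack is the variable's value, not its address
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack[-1]


# DW_OP_lit5, DW_OP_constu 3, DW_OP_mul, DW_OP_plus_uconst 2, DW_OP_stack_value
print(evaluate(bytes([0x35, 0x10, 0x03, 0x1E, 0x23, 0x02, 0x9F])))  # 17
```

Even this toy shows why reading the raw byte stream is error-prone: operand widths vary (LEB128), and operand order matters for non-commutative operations such as `DW_OP_minus`.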

Analysis of RISC-V Vector Performance Using MCA Tools
Speaker: Michael Maitland
[ Video ] [ Slides ]

The llvm-mca tool performs static performance analysis on basic blocks and llvm-mcad tool performs dynamic performance analysis on program traces. These tools allow us to gain insights on how sequences of instructions run on different subtargets.

In this talk, I will discuss the shortcomings of these tools when they are tasked to report on RISC-V programs containing vector instructions, how we have extended these tools to generate more accurate reports for RISC-V vector programs, and how these improved reports can be used to make meaningful improvements to scheduler models and assist performance analysis.

Optimizing Clang with BOLT using CMake
Speaker: Amir Ayupov
[ Video ] [ Slides ]

Advanced build configuration with BOLT for faster Clang

Exploring OpenMP target offloading for the GraphCore architecture
Speaker: Jose M Monsalve Diaz
[ Video ] [ Slides ]

GraphCore is a mature and well-documented architecture that features a MIMD execution model. Unlike other players in the market, GraphCore systems are currently available, their compiler infrastructure is based on LLVM, and they allow direct compilation to the device. Furthermore, the Poplar SDK is a C++ library that can be used directly with the current OpenMP Offloading Runtime (i.e. libomptarget). In this short presentation, we describe the strategy we are currently using to explore compilation of OpenMP Offloading support for the GraphCore architecture.

Student Technical Talks:

Merging Similar Control-Flow Regions in LLVM for Performance and Code Size Benefits
Speaker: Charitha Saumya
[ Video ] [ Slides ] [ PPT ]

In this talk, we will discuss Control-flow Melding (CFM) and its implementation in LLVM. CFM is a new compiler transformation that exploits both instruction and control-flow similarity to improve performance and reduce code size. CFM uses a hierarchical region and instruction alignment approach to merge common code fragments. CFM is implemented as an LLVM-IR transformation pass, and our evaluation suggests its utility in multiple applications.

Alive-mutate: a fuzzer that cooperates with Alive2 to find LLVM bugs
Speaker: Yuyou Fan
[ Video ] [ Slides ]
We developed a new fuzzer, Alive-mutate, that randomly alters an LLVM module and then invokes the Alive2 translation validation tool to see if the mutated module is optimized correctly. Alive-mutate achieves high throughput by avoiding the creation of invalid IR and also by running in the same address space as Alive2, keeping OS-related overhead out of our fuzzing loop. We support 9 different kinds of mutation and have used Alive-mutate to find 23 LLVM bugs including 10 miscompilation bugs in the AArch64 backend and 5 crashes in the instruction combiner.

Enabling Transformers to Understand Low-Level Programs
Speaker: William S. Moses, Zifan Guo
[ Video ] [ Slides ]
This talk explores the application of Transformers to learning LLVM IR, which can open up new possibilities in optimization. Low-level programs such as LLVM IR tend to be more verbose than high-level languages, since they specify program behavior precisely and expose more details about the microarchitecture, all of which makes them difficult for machine learning. We apply Transformer models to translate from C to both unoptimized (-O0) and optimized (-O1) LLVM IR and discuss various techniques that can boost model effectiveness. On the AnghaBench dataset, our model achieves a 49.57% verbatim match and a BLEU score of 87.68 against Clang -O0, and a 38.73% verbatim match and a BLEU score of 77.03 against Clang -O1.

LAGrad: Leveraging the MLIR Ecosystem for Efficient Differentiable Programming
Speaker: Mai Jacob Peng
[ Video ] [ Slides ]
Automatic differentiation (AD) is a central algorithm in machine learning and optimization. This talk introduces LAGrad, a reverse-mode source-to-source AD system that differentiates tensor operations in the linalg, scf, and tensor dialects of MLIR. LAGrad leverages the value semantics of linalg-on-tensors in MLIR to simplify the analyses required to generate adjoint code that is efficient in terms of both run time and memory consumption. LAGrad also combines AD with MLIR’s type system to exploit structured sparsity patterns such as lower triangular tensors. We compare performance results to Enzyme, a state-of-the-art AD system, on Microsoft’s ADBench suite. Our results show speedups of up to 2x relative to Enzyme and, in some cases, 30x lower memory usage.
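LAGrad is source-to-source and operates on MLIR tensor dialects; the following is only a generic illustration of what reverse mode computes, written as a tiny operator-overloading (tape-style) Python sketch rather than a source transformation. `Var` and `backward` are hypothetical names. Adjoints flow from the output back to the inputs, so a single backward pass yields the gradient with respect to every input.

```python
class Var:
    """A scalar node in the computation graph."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, local partial derivative)
        self.grad = 0.0         # adjoint, filled in by backward()

    def __add__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        other = other if isinstance(other, Var) else Var(other)
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Propagate adjoints from output to all inputs in reverse topological order."""
    order, seen = [], set()

    def visit(v):
        if id(v) in seen:
            return
        seen.add(id(v))
        for parent, _ in v.parents:
            visit(parent)
        order.append(v)  # post-order: parents before children

    visit(output)
    output.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += v.grad * local  # chain rule, accumulated per use


x, y = Var(3.0), Var(4.0)
f = x * y + x  # f = xy + x, so df/dx = y + 1 and df/dy = x
backward(f)
print(f.value, x.grad, y.grad)  # 15.0 5.0 3.0
```

A source-to-source system such as LAGrad generates this adjoint computation as code ahead of time instead of recording a graph at run time, which is what lets it analyze and optimize the adjoint's memory use.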

Posters:

Removal of Undef: Move Uninitialized Memory to Poison
[ Poster ]

LLVM is working toward the eventual replacement of undef with poison. One of the remaining issues is the semantics of loads from uninitialized memory. During GSoC 2022 we have been investigating and implementing potential solutions to these issues. This poster outlines the changes proposed to Clang to handle bit-fields, the insertion of freeze on poison for optimization, and the overall performance impact of these changes.

Optimizing Julia’s ORC JIT
[ Poster ]

Julia uses LLVM’s ORC JIT compilation framework to compile code just before it is executed. Although ORC provides its own LLJIT and LLLazyJIT engines, Julia does not use either of those, opting for a custom JIT stack for more control over code execution and memory management. Julia’s ORC JIT compiler cannot compile code in parallel, and this talk will cover some of the reasons that make parallelization challenging as well as work that is being done to solve those challenges. This talk will cover a general overview of how the Julia compilation pipeline works as well as the improvements to the compiler that will enable multithreaded compilation.

Clacc 2022: An Update on OpenACC Support for Clang and LLVM
[ Poster ]

Clacc is developing production-quality OpenACC compiler, runtime, and profiling support by extending Clang and LLVM for the Exascale Computing Project. A key Clacc design decision is to translate OpenACC to OpenMP in order to leverage the OpenMP offloading support that is actively being developed for Clang and LLVM. As our Clacc implementation matures, we continue to contribute mutually beneficial changes upstream, including improvements to LLVM’s testing infrastructure, Clang, standard OpenMP support, and OpenMP extensions that Clacc requires for OpenACC support. This activity not only paves the way for fully contributing Clacc’s OpenACC support to upstream Clang, but it also benefits Flang’s OpenACC support for Fortran via a shared runtime implementation. The purpose of this talk is to provide an update on recent Clacc progress, to present the plan for the road ahead, and to invite research and development participation from others.

An LLVM-Based Compiler for Quantum-Classical Applications
[ Poster ]

In this work, we design a quantum language extension to C++ and create a quantum compiler based on the LLVM framework. This compiler is integrated into the Intel® Quantum SDK, a software package that allows users to interface with Intel’s quantum computing stack. We utilize the LLVM Pass infrastructure to define custom transformation passes that perform IR-to-IR translation for the quantum program, and we also leverage LLVM’s built-in passes. These passes and tools are used to extract, manipulate, and decompose the quantum logic. The quantum runtime library is capable of handling dynamic parameters for quantum computing. A quantum-classical variational algorithm can be specified in a single source and only needs to be compiled once for all iterations.

Specializing Code to New Architectures via Dynamic Adaptive Recompilation
[ Poster ]

Modern microprocessors introduce performance-improving features that can only be accessed through novel instructions. Furthermore, new compiler releases are extended not only to make use of these new instructions but also to generate better schedules for the new hardware. Therefore, applications distributed as binaries need to be recompiled to tap into such features and improvements. However, applications are generally distributed as binaries compiled with a subset of instructions common to a family of processors. This practice is widely adopted by Independent Software Vendors (ISVs) as a strategy to reduce the cost of deploying a large number of binaries. This talk presents DAR, a dynamic adaptive recompilation approach that recompiles segments of a program for target-specific specialization. DAR aims to let end users benefit from sub-target specialization and from improvements in compiler technology (which may have emerged after the binary was created), while still letting ISVs release just a couple of binaries for each architecture. DAR generates fat binaries containing the IR of program segments that can benefit from recompilation. The talk discusses methods to identify program segments that might benefit from recompilation and shares our experience implementing a prototype of DAR in LLVM.

LLFPTrax: Tracking ill-conditioned floating-point inputs using relative error amplification in LLVM
[ Poster ]

Floating-point errors arise from the rounding inherent in the limited precision of the floating-point format. Many state-of-the-art tools use a high-precision oracle to estimate the absolute error and find high-error-triggering (ill-conditioned) inputs through optimization techniques that maximize the error. By switching to relative errors, we obtain a more meaningful metric when searching for ill-conditioned inputs, since relative errors relate directly to the loss of precision. This talk presents the details of LLFPTrax’s implementation and preliminary results.
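The point about relative error can be illustrated with a deliberately bad expression, `(x + 1.0) - 1.0`, using Python's exact `fractions.Fraction` arithmetic as the high-precision oracle. This is a hypothetical example, not LLFPTrax's implementation (which works inside LLVM and tracks error amplification): for a tiny `x` the absolute error looks negligible while the relative error is 100%.

```python
from fractions import Fraction


def f_float(x: float) -> float:
    """Evaluate (x + 1) - 1 in IEEE double precision."""
    return (x + 1.0) - 1.0


def f_exact(x: float) -> Fraction:
    """Evaluate the same expression exactly; serves as the high-precision oracle."""
    return (Fraction(x) + 1) - 1


def errors(x: float) -> tuple[float, float]:
    """Return (absolute error, relative error) of f_float at x; assumes x != 0."""
    exact = f_exact(x)
    computed = Fraction(f_float(x))  # exact rational value of the double result
    abs_err = abs(computed - exact)
    rel_err = abs_err / abs(exact)
    return float(abs_err), float(rel_err)


print(errors(1e-16))  # (1e-16, 1.0): tiny absolute error, 100% relative error
print(errors(0.5))    # (0.0, 0.0): computed exactly
```

Here `1e-16` is below half the spacing of doubles near 1.0, so `x + 1.0` rounds to exactly `1.0` and all information about `x` is lost; an absolute-error search would rank this input as harmless.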

LLVM continuous upstream integration and testing
[ Poster ]

This poster will describe the automation tools used at Qualcomm to maintain and test a downstream version of the llvm-project.

Automatic indirect memory access instructions generation for pointer chasing patterns
[ Poster ]

This poster describes how a feature newly introduced in real hardware can be adopted into Clang and LLVM and thereby made easily available to users. Indirect Memory Access Instructions (IMAI) can provide significant performance improvements, but their usability is limited by particular hardware restrictions. We tried to reconcile the hardware limitations, the complexity of IMAI, and ease of use by handling a dedicated pragma in Clang and applying Complex Patterns in the DAG in the LLVM backend.


Hi, thank you for providing the summary.
Is there an ETA for the posters? I’m mainly interested in the one for a compiler for Quantum-Classical Applications.