Talk: Learning a Static Analyzer from Data -- Zurich Compiler Social - Thur, February 8, 2017

Dear all,

join us on Thursday at 19:00 at the Zurich Compiler Social!

# Tech Talk: Learning a Static Analyzer from Data

To be practically useful, modern static analyzers must precisely
model the effect of both the statements of the programming language
and the frameworks used by the program under analysis. While important,
manually addressing these challenges is difficult for at least two
reasons: (i) the effects on the overall analysis can be non-trivial, and (ii)
as the size and complexity of modern libraries increase, so does the number
of cases the analysis must handle.

In this paper we present a new, automated approach for creating static
analyzers: instead of manually providing the various inference rules of
the analyzer, the key idea is to learn these rules from a dataset of programs.
Our method consists of two ingredients: (i) a synthesis algorithm
capable of learning a candidate analyzer from a given dataset, and (ii)
a counter-example guided learning procedure which generates new programs
beyond those in the initial dataset, critical for discovering corner
cases and ensuring the learned analysis generalizes to unseen programs.
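
To give a rough feel for the two ingredients described above, here is a
minimal toy sketch of a counter-example guided learning loop. It is not
the system presented in the talk: the candidate rules, the `oracle`
function and all other names are hypothetical stand-ins chosen for
illustration, and the "programs" are plain integers.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Example = Tuple[int, int]      # (input, expected output)
Rule = Callable[[int], int]

# Hypothetical hypothesis space: a handful of named candidate rules.
CANDIDATES: Dict[str, Rule] = {
    "identity": lambda x: x,
    "abs": lambda x: abs(x),
    "clamp_at_zero": lambda x: max(x, 0),
}


def oracle(x: int) -> int:
    """Assumed ground truth, e.g. what a concrete execution would show."""
    return max(x, 0)


def synthesize(dataset: List[Example]) -> Optional[str]:
    """Ingredient (i): return the first candidate consistent with the dataset."""
    for name, rule in CANDIDATES.items():
        if all(rule(x) == y for x, y in dataset):
            return name
    return None


def find_counterexample(name: str, search_space: Iterable[int]) -> Optional[Example]:
    """Ingredient (ii): look for an input on which the candidate is wrong."""
    rule = CANDIDATES[name]
    for x in search_space:
        if rule(x) != oracle(x):
            return (x, oracle(x))
    return None


def learn(initial: List[Example], search_space: Iterable[int],
          max_rounds: int = 10) -> Tuple[Optional[str], List[Example]]:
    """Alternate synthesis and counter-example search until they agree."""
    dataset = list(initial)
    for _ in range(max_rounds):
        name = synthesize(dataset)
        if name is None:
            return None, dataset      # no candidate fits the data
        cex = find_counterexample(name, search_space)
        if cex is None:
            return name, dataset      # no disagreement found in the search space
        dataset.append(cex)           # grow the dataset with the corner case
    return None, dataset


# The initial dataset has only positive inputs, so "identity" fits at first.
learned, final_dataset = learn([(1, 1), (2, 2)], range(-3, 4))
```

In this toy run the initial examples cannot distinguish the candidates;
the search adds a negative input as a counter-example, which rules out
the first two candidates on the next round.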

We implemented our approach and instantiated it for the task of learning
JavaScript static analysis rules for a subset of points-to analysis and for
allocation sites analysis. These are challenging yet important problems
that have received significant research attention. We show that our approach
is effective: our system automatically discovered practical and
useful inference rules for many cases that are tricky to manually identify
and are missed by state-of-the-art, manually tuned analyzers.


Pavol is a third-year PhD student at ETH Zurich, advised by Martin Vechev. His research spans the areas of programming languages, program analysis and machine learning. In particular, he focuses on creating new kinds of techniques and tools based on probabilistic learning from large codebases consisting of millions of lines of code. Such tools can help us solve important software tasks that are beyond the reach of existing techniques and yet are required daily in writing, understanding, porting or debugging programs.

# Registration

# What

A social meetup to discuss compilation and code generation questions
with a focus on LLVM, clang, Polly and related projects.

Our primary focus is to provide a venue (and drinks & snacks) that
enables free discussions between interested people without imposing
an agenda/program. This is a great opportunity to informally discuss
your own projects, get project ideas or just learn about what people at
ETH and around Zurich are doing with LLVM.

Related technical presentations held by participants are welcome (please
contact us).

# Who: - Anybody interested -

  - ETH students and staff
  - LLVM developers and enthusiasts external to ETH

# Where: CAB E 72

# What is LLVM?

LLVM is an open source project that provides
a collection of modular compiler and toolchain technologies. It is
centered around a modern SSA-based compiler around which an entire
ecosystem of compiler technology was developed. Best known is
the clang C++ compiler, which is used, e.g., to deploy iOS. Beyond this,
a diverse set of projects is developed under the umbrella of LLVM.
These include code generators and assemblers for various interesting
architectures, a JIT compiler, a debugger, run-time libraries (C++
Standard Library, OpenMP, OpenCL library), program sanity checkers,
and many more.

LLVM has itself grown out of a research project more than 10 years ago
and is the base of many exciting research projects today.