Posting this here with the hope to bring it to the round table discussion at the MLIR workshop next week.
RFC: A “poly” dialect for polynomial arithmetic
We propose a new MLIR dialect called poly representing computations involving single-variable polynomials, as well as lowerings to standard MLIR dialects like linalg, affine, tensor, and arith. In the taxonomy presented in the 2023 EuroLLVM talk, “MLIR Dialect Design and Composition for Front-End Compilers”, poly is a computation dialect. The central goal of upstreaming poly (and an associated set of lowering paths to standard MLIR) is to make programs with polynomial-heavy computations fast. We expect it to be useful for applications to cryptographic compilers and potentially scientific computing applications.
At a high level, poly will contain a type representing a polynomial, attributes specifying the domain of the polynomials (e.g., integer or real coefficients), and ops representing addition, multiplication, division/remainder, polynomial evaluation, etc.
An in-progress prototype implementation of poly exists in the HEIR project.
Motivation
Our main motivation for upstreaming poly is for cryptographic applications. The HEIR project began in mid 2023 and aims to be a standardized platform for research in homomorphic encryption (HE) as well as an industry-strength compiler for production applications of HE. Most popular HE implementations rely in some form on a cryptographic problem called Ring Learning With Errors (RLWE), which is where the polynomial arithmetic comes from, and explains the specific need for polynomial quotient rings: the security of the cryptography relies heavily on the properties of the ring.
Most analyses of HE workflows indicate that polynomial arithmetic accounts for > 95% of the workload. HE applications differ in specifics, but most fall on the end of one of two extremes of being memory bandwidth limited—doing lots of high-degree polynomial ops in parallel—versus compute latency limited—lots of sequential, but relatively small matrix-matrix multiplications where the matrix elements are polynomials.
Moreover, the decision of what sort of polynomial math to use is a backend implementation choice that depends on the target hardware. HE schemes tend to specify semantic operations at a high level (add two encrypted i32s, lookup an encrypted index in a cleartext table), and an HE implementation or compiler decides both what the security parameters should be—which affects the sizes and coefficients of the polynomials—as well as how to represent the high level operations as polynomial math, which involves many tradeoffs. While most of that work belongs in downstream projects, we point it out mainly to argue that it is sensible to treat polynomials as first-class citizens in the compiler.
Polynomial math is also the basis for cryptographic computing applications outside of HE, and they would benefit from a centralized treatment of fast polynomial math.
- Next-generation “post-quantum” cryptosystems currently being standardized in NIST are based on RLWE and related problems like NTRU, which heavily utilize single-variable polynomial quotient arithmetic.
- Private information retrieval applications often utilize RLWE for their privacy guarantees, though most implementations today manually implement and fine-tune them on a per-application basis.
- Multi-party computation also often involves polynomial arithmetic primitives.
Design
Poly will contain:
- A single new type, poly.poly, representing the elements of a specific polynomial quotient ring.
- Attributes that specify the polynomial ring in which the operations occur, which includes:
- What coefficients the polynomials have; in its simplest form this is just specifying an MLIR type (i32, complex, etc.), but one additional immediate requirement is to support the ability to represent coefficients in a non-power-of-two coefficient ring (e.g. Z/pZ for a prime p). Note this will require additional modular arithmetic operations inserted into various lowerings, but they can be omitted for users who don’t want special moduli and prefer to inherit the traditional overflow semantics.
- An attribute to specify a polynomial like
1 + x**2
, used to define constants and for the next item - What ideal is used for the polynomial quotient; for math reasons, this boils down to the specification of a list of polynomials, and is usually chosen to be a single polynomial in cryptographic applications.
- The following ops:
poly.constant
for constant materialization/propagationpoly.from_tensor
andpoly.to_tensor
to convert in and out of a poly type to a dense tensor representation.poly.add
,poly.sub
,poly.mul
the natural addition/subtraction/multiplication operations on polynomials.poly.div
returning quotient and remainderpoly.scale
: A rescaling of the coefficients by a ring scalarpoly.eval
evaluating a polynomial at a given pointpoly.leading_term
computing the degree and leading coefficient of a polynomialpoly.monomial
creating a monomial from a coefficient and degreepoly.dft
andpoly.idft
, discrete fourier transforms converting polynomials to and from “evaluation” formspoly.ntt
andpoly.intt
, integer-only analogues of discrete fourier transforms
We expect most of the optimizations to occur in the lowerings. For example, poly.mul
can be lowered to a linalg.conv_1d
over relevant tensors. poly.dft
and its friends can be lowered as matrix multiplications or fast Fourier transform implementations, depending on the target hardware and the ring.
One notable aspect of this design is that nothing in MLIR upstream today is expected to lower to poly. It can be thought of as an input dialect in this sense. Most polynomial arithmetic in scientific computing applications is currently implemented in software libraries, and they frequently offer an API that is similar to poly. In a cryptographic compiler like HEIR, cryptographic operations lower to poly, at which point the remaining semantic information about the underlying cryptography is dropped and a “secure” or “private” program reduces to polynomial number crunching. That said, having fast polynomial arithmetic in the compiler pipeline could incentivize more scientific computing to outsource polynomial math to the compiler.
The first implementation milestone would be the ability to benchmark polynomial arithmetic as lowered from poly through LLVM.