[RFC]: A "poly" dialect for polynomial arithmetic

Posting this here with the hope of bringing it to the roundtable discussion at the MLIR workshop next week.

We propose a new MLIR dialect called poly representing computations involving single-variable polynomials, as well as lowerings to standard MLIR dialects like linalg, affine, tensor, and arith. In the taxonomy presented in the 2023 EuroLLVM talk, “MLIR Dialect Design and Composition for Front-End Compilers”, poly is a computation dialect. The central goal of upstreaming poly (and an associated set of lowering paths to standard MLIR) is to make polynomial-heavy programs fast. We expect it to be useful for cryptographic compilers and potentially for scientific computing applications.

At a high level, poly will contain a type representing a polynomial, attributes specifying the domain of the polynomials (e.g., integer or real coefficients), and ops representing addition, multiplication, division/remainder, polynomial evaluation, etc.
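To make the intended semantics concrete, here is a small Python reference model of arithmetic in the kind of quotient ring that arises in RLWE-based cryptography, Z_q[x] / (x^N + 1). This is only an illustration of the math the dialect encodes, not proposed MLIR syntax, and the values of N and q below are arbitrary.

```
# Illustrative reference semantics only -- not MLIR. Arithmetic in
# Z_q[x] / (x^N + 1), with polynomials stored as coefficient lists
# (lowest degree first). N and q are arbitrary example values.
N, q = 4, 7681

def poly_add(f, g):
    return [(a + b) % q for a, b in zip(f, g)]

def poly_mul(f, g):
    # Schoolbook multiply, then reduce modulo x^N + 1 using x^N == -1.
    out = [0] * N
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            k = i + j
            if k < N:
                out[k] = (out[k] + a * b) % q
            else:
                out[k - N] = (out[k - N] - a * b) % q
    return out

f = [1, 0, 1, 0]            # 1 + x^2
g = [0, 1, 0, 0]            # x
print(poly_add(f, g))       # [1, 1, 1, 0]  ->  1 + x + x^2
print(poly_mul(f, g))       # [0, 1, 0, 1]  ->  x + x^3
```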

An in-progress prototype implementation of poly exists in the HEIR project.

Motivation

Our main motivation for upstreaming poly is cryptographic applications. The HEIR project began in mid-2023 and aims to be a standardized platform for research in homomorphic encryption (HE) as well as an industry-strength compiler for production applications of HE. Most popular HE implementations rely in some form on a cryptographic problem called Ring Learning With Errors (RLWE), which is where the polynomial arithmetic comes from and which explains the specific need for polynomial quotient rings: the security of the cryptography relies heavily on the properties of the ring.

Most analyses of HE workloads indicate that polynomial arithmetic accounts for more than 95% of the work. HE applications differ in their specifics, but most fall toward one of two extremes: memory-bandwidth-limited workloads (many high-degree polynomial ops in parallel) and compute-latency-limited workloads (many sequential, but relatively small, matrix-matrix multiplications whose elements are polynomials).

Moreover, the decision of what sort of polynomial math to use is a backend implementation choice that depends on the target hardware. HE schemes tend to specify semantic operations at a high level (add two encrypted i32s, look up an encrypted index in a cleartext table), and an HE implementation or compiler decides both what the security parameters should be (which affects the sizes and coefficients of the polynomials) and how to represent the high-level operations as polynomial math, which involves many tradeoffs. While most of that work belongs in downstream projects, we point it out mainly to argue that it is sensible to treat polynomials as first-class citizens in the compiler.

Polynomial math is also the basis for cryptographic computing applications outside of HE, and they would benefit from a centralized treatment of fast polynomial math.

  • Next-generation “post-quantum” cryptosystems currently being standardized by NIST are based on RLWE and related problems like NTRU, which heavily utilize single-variable polynomial quotient arithmetic.
  • Private information retrieval applications often rely on RLWE for their privacy guarantees, though most implementations today hand-implement and fine-tune the polynomial arithmetic on a per-application basis.
  • Multi-party computation also often involves polynomial arithmetic primitives.

Design

Poly will contain:

  • A single new type, poly.poly, representing the elements of a specific polynomial quotient ring.
  • Attributes that specify the polynomial ring in which the operations occur, including:
    • What coefficients the polynomials have; in its simplest form this is just an MLIR type (i32, complex, etc.), but one immediate additional requirement is the ability to represent coefficients in a non-power-of-two coefficient ring (e.g., Z/pZ for a prime p). Note that this requires inserting extra modular arithmetic operations into various lowerings, though these can be omitted for users who don’t need special moduli and prefer the traditional overflow semantics.
    • An attribute to specify a polynomial like 1 + x**2, used to define constants and to specify the quotient ideal in the next item.
    • What ideal is used for the polynomial quotient; mathematically, this boils down to specifying a list of polynomials, and in cryptographic applications it is usually a single polynomial.
  • The following ops (a rough Python reference model for a few of these appears after this list):
    • poly.constant for constant materialization/propagation
    • poly.from_tensor and poly.to_tensor to convert between a poly type and a dense tensor representation
    • poly.add, poly.sub, and poly.mul, the natural addition, subtraction, and multiplication operations on polynomials
    • poly.div returning the quotient and remainder
    • poly.scale, a rescaling of the coefficients by a ring scalar
    • poly.eval evaluating a polynomial at a given point
    • poly.leading_term computing the degree and leading coefficient of a polynomial
    • poly.monomial creating a monomial from a coefficient and degree
    • poly.dft and poly.idft, discrete Fourier transforms converting polynomials to and from “evaluation” form
    • poly.ntt and poly.intt, integer-only (number-theoretic) analogues of the discrete Fourier transforms
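
As a rough sketch of the intended semantics of a few of these ops (again plain Python rather than MLIR; the helper names are hypothetical and chosen only to mirror the op names):

```
# Hypothetical reference semantics for a handful of the ops above, on a
# plain coefficient list (lowest degree first). Not MLIR syntax.
def from_tensor(coeffs):            # poly.from_tensor
    return list(coeffs)

def to_tensor(p, n):                # poly.to_tensor, padded out to length n
    return p + [0] * (n - len(p))

def monomial(coeff, degree):        # poly.monomial: coeff * x**degree
    return [0] * degree + [coeff]

def leading_term(p):                # poly.leading_term: (degree, coefficient)
    degree = max(i for i, c in enumerate(p) if c != 0)
    return degree, p[degree]

def eval_poly(p, x):                # poly.eval via Horner's rule
    acc = 0
    for c in reversed(p):
        acc = acc * x + c
    return acc

print(leading_term([1, 0, 3]))      # (2, 3)   for 1 + 3x^2
print(eval_poly([1, 0, 3], 2))      # 13
```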

We expect most of the optimizations to occur in the lowerings. For example, poly.mul can be lowered to a linalg.conv_1d over relevant tensors. poly.dft and its friends can be lowered as matrix multiplications or fast Fourier transform implementations, depending on the target hardware and the ring.
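
To illustrate those two lowering routes, here is a sketch using numpy: multiplication as a 1-D convolution followed by reduction modulo the ideal, and multiplication in “evaluation” form as a pointwise product. It is illustrative only; a real lowering would use an NTT over Z_q rather than a floating-point FFT, and N and q are arbitrary example values.

```
# Two ways to compute poly.mul in Z_q[x] / (x^N + 1), mirroring the lowering
# choices above. Illustrative only; N and q are arbitrary.
import numpy as np

N, q = 4, 7681
f = np.array([1, 0, 1, 0])          # 1 + x^2
g = np.array([2, 3, 0, 0])          # 2 + 3x

# Route 1: a 1-D convolution (the computation a linalg.conv_1d lowering would
# perform), then reduce modulo x^N + 1 by folding the upper half back in with
# a sign flip (x^N == -1).
c = np.concatenate([np.convolve(f, g), [0]])    # pad to length 2N
route1 = (c[:N] - c[N:]) % q

# Route 2: "evaluation" form -- transform, multiply pointwise, transform back.
# Shown with a floating-point FFT; an NTT over Z_q keeps it exact.
ev = np.fft.fft(f, 2 * N) * np.fft.fft(g, 2 * N)
c2 = np.round(np.fft.ifft(ev).real).astype(int)
route2 = (c2[:N] - c2[N:]) % q

print(route1)                       # [2 3 2 3]  ->  2 + 3x + 2x^2 + 3x^3
print(route2)                       # same result
```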

One notable aspect of this design is that nothing in MLIR upstream today is expected to lower to poly; it can be thought of as an input dialect in this sense. Most polynomial arithmetic in scientific computing applications is currently implemented in software libraries, which frequently offer an API similar to poly. In a cryptographic compiler like HEIR, cryptographic operations lower to poly, at which point the remaining semantic information about the underlying cryptography is dropped and a “secure” or “private” program reduces to polynomial number crunching. That said, having fast polynomial arithmetic in the compiler pipeline could incentivize more scientific computing applications to outsource their polynomial math to the compiler.

The first implementation milestone would be the ability to benchmark polynomial arithmetic as lowered from poly through LLVM.

Thanks for the proposal! Should we schedule a presentation and discussion at an Open Meeting?
Next week is the LLVM conference, but after that the calendar is open!

In addition to that (dev meeting does not replace ODM), we can discuss it at a roundtable!

I think this is a good idea. I would suggest changing the name to “polynomial” instead of “poly” because “poly” is too overloaded in my opinion.

The roundtable took place yesterday evening at the LLVM Developers' Meeting, and it seems there is indeed support for getting this dialect in. Polynomial arithmetic is the common foundation of any FHE compiler, which makes this dialect a good starting point for building an FHE compiler stack in MLIR.

This is a very interesting dialect that will definitely be useful for us as well. Happy to collaborate here.

Thanks & sorry for the late reply! @Manuel: What use case(s) for polynomial do you have in mind (FHE/(R)LWE-based cryptography or something different)?

@j2kun and I will be talking about polynomial at the ODM this week and we’d love to hear people’s thoughts!

Beyond the ODM, we’ve also got a bi-weekly working group meeting for the overall HEIR project (of which polynomial is just one layer). We meet Tuesdays at 8am PDT/17:00 CET, and the next meeting is today. The group includes a variety of industry people and academics from the FHE side, but could always use a few more MLIR experts!

PS: gentle reminder for @mehdi_amini to create the ODM announcement post :wink:

We are just getting started with post-quantum cryptography (PQC) algorithms and are currently targeting Classic McEliece; CRYSTALS-Kyber is also on our list to investigate. Our focus will be on efficient hardware implementations by designing accelerators. A key question will be how to implement PQC on diverse hardware (i.e., processors with a hardware accelerator and processors without one).
