[RFC] DSL-style / dialect-specific builders

+1 data point: while writing approximations for math functions we had a style/design question: use the generic builder API, or a custom-built DSL/builders.

Writing math function approximations basically means expanding high-level ops like Log into sequences of primitive add/mul ops and bitcasts. The code is pretty dense and hard to follow, and in my opinion the default builder API adds too much boilerplate and makes it even harder.

Discussion is in: https://reviews.llvm.org/D97146

With a custom builder, std::frexp can be implemented as:

  Value cst126f = f32Cst(126.0f);
  Value cstHalf = f32Cst(0.5f);
  Value cstInvMantMask = f32FromBits(~0x7f800000u);

  // Cast to i32 for bitwise operations.
  Value i32Half = bitcast(i32, cstHalf);
  Value i32InvMantMask = bitcast(i32, cstInvMantMask);
  Value i32Arg = bitcast(i32, arg);

  // Compute normalized fraction.
  Value tmp0 = bitwiseOr(bitwiseAnd(i32Arg, i32InvMantMask), i32Half);
  Value normalized_fraction = bitcast(f32, tmp0);

  // Compute exponent.
  Value tmp1 = castSiToFp(f32, logicalShiftRight(bitcast(i32, abs(arg)), 23));
  Value exponent = sub(tmp1, cst126f);

  return {normalized_fraction, exponent};

After multiple iterations, my best ImplicitLocOpBuilder version is:

  int width = vectorWidth(arg.getType());

  auto bcast = [&](Value value) -> Value {
    return broadcast(builder, value, width);
  };

  auto i32 = builder.getIntegerType(32);
  auto i32Vec = broadcast(i32, width);
  auto f32Vec = broadcast(builder.getF32Type(), width);

  Value cst126f = f32Cst(builder, 126.0f);
  Value cstHalf = f32Cst(builder, 0.5f);
  Value cstInvMantMask = f32FromBits(builder, ~0x7f800000u);

  // Cast to i32 for bitwise operations.
  Value i32Half = builder.create<LLVM::BitcastOp>(i32, cstHalf);
  Value i32InvMantMask = builder.create<LLVM::BitcastOp>(i32, cstInvMantMask);
  Value i32Arg = builder.create<LLVM::BitcastOp>(i32Vec, arg);

  // Compute normalized fraction.
  Value tmp0 = builder.create<LLVM::AndOp>(i32Arg, bcast(i32InvMantMask));
  Value tmp1 = builder.create<LLVM::OrOp>(tmp0, bcast(i32Half));
  Value normalizedFraction = builder.create<LLVM::BitcastOp>(f32Vec, tmp1);

  // Compute exponent.
  Value biasedExponentBits = builder.create<UnsignedShiftRightOp>(
      builder.create<LLVM::BitcastOp>(i32Vec, builder.create<AbsFOp>(arg)),
      bcast(i32Cst(builder, 23)));
  Value biasedExponent = builder.create<SIToFPOp>(f32Vec, biasedExponentBits);
  Value exponent = builder.create<SubFOp>(biasedExponent, bcast(cst126f));

  return {normalizedFraction, exponent};

Although it is not as bad as my original attempt to use builders (without implicit loc), it is still a bit hard to get the gist of what’s happening here. And there are many similar functions that have even more bitcasting and bit shifting.

Yeah, these kinds of hand-coded “subroutine expansions” are where the DSL style makes a big difference.

What I don’t know is how many of these there are, and whether it is a persistent minority case or whether we just sweat through it.

Well, 2 parts here: 1) you can use b instead of builder, as that doesn’t need to be spelled out (internal to a function, that is a convention we often use, as the limited scope removes visual “noise” without impacting readability); 2) you need to ensure determinism, which means you have to avoid argument evaluation order issues.

Point 2 in particular means that writing these will look more verbose and uglier in C++. And that may have (for me, has) a much larger impact than the extra characters: tmp0 and tmp1 both being things you have to name, and having to be careful about when you can and can’t call multiple creates/helpers in one line. The autocompletion angle is one where Alex & Rahul (I think) made good points (though that could also perhaps be addressed uniformly instead). But I worry that the uncanny valley effect would make the DSL style more fragile: it is so close to plain math that one forgets it is building IR and then uses the wrong constructs (a C++ for instead of an SCF for, say); I could be wrong and it could be equally easy either way.

I think the line between helper function and dialect-specific builder is a bit thin in some of our discussions here (and helper functions are a thing). And being able to have a consistent style is desirable - though I don’t know if that pushes for helpers to have the same form as builders :slightly_smiling_face:

I agree with Jacques’ two points. I’d add for #1 that you can also write using namespace LLVM to reduce another set of boilerplate. You can put using directives inside functions and other scopes, so they don’t have to pollute your whole file.

The C++ order of evaluation issue is pernicious and subtle; we recently got bitten by it on the CIRCT side of things. It is convenient that GCC and Clang differ here, so we at least get early warning about it, but anything that leads to a lot of nested expressions has to be aware of this. To give you an explicit example with Eugene’s helper syntax, even this is not safe:

  Value tmp = bitwiseOr(bitcast(i32, x), abs(arg));

because the abs and bitcast can be emitted in either order.


Yeah, I was very supportive of the DSL approach (which became EDSC) until I realized how hostile C++ is as a host language for a DSL: the order of evaluation issue can lead to tricky behavior! And since this is all platform-specific, it only hits us in CI (or worse: later, when someone tries the code on a new platform).

(Number of days since we rediscover that C++ is a challenging language to build a DSL in: 0)


Yes, but it looks like you are missing a conclusion here! How do you avoid the presence of these counter-patterns (order of evaluation issues from such builders) in code, to the extent possible? @ftynse already mentions order of evaluation in his OP, but the key is that I haven’t seen a single instance of someone writing b.create(....., b.create(...), ..., b.create(...)). You get neither line compaction nor readability from it. To the contrary, such forward substitution would be the norm with the DSL-style builders being proposed, e.g.:

output(indices.outputs) = x + y + max(input(indices.inputs), output(indices.outputs));

Are you sure you know what IR it generates?! This is what I was referring to in post #30.

IMO, there are two styles of DSL we can build: a numpy/math-like one with nested calls and overloaded operators, or a WYSIWYG one that resembles the generated IR. In the latter case, you actually want one C++ statement per IR instruction, but it doesn’t look easy to enforce that in C++ without additional boilerplate.

I suppose we could somehow teach clangd to “see through” std::forward-like functions with variadic templates and suggest completions based on the signature of the final function. The same could be interesting for std::enable_if and other SFINAE tricks. This sounds like a good summer intern project, but I am not familiar enough with clangd to support it. We may also run into the need to make clangd aware of library semantics.

There’s also the issue of ordering between template argument deduction vs. implicit conversion.

That was one of my arguments: people will add helpers anyway. So let’s have consistent helpers instead of having to discover a new style in every file :slight_smile: This could also be solved with a style guide or a “best practices” document.

One could argue that this is mostly a problem of using ordered textual checks in tests (and I have heard a similar complaint from Clang and Swift folks some time ago) when the actual order does not matter. The downstream passes should work equally well regardless of the order.

I have, and I complained about this in review :slight_smile: It was for subview-style ops that often use multiple constants as arguments, if I am not mistaken, before these ops could use attributes instead. There was also something like builder.create<SubViewOp>(loc, getZeros(builder, rank), sizes, getOnes(builder, rank)) where the nested calls are hidden in the helpers. I’m also quite certain we have b.create(...., b.create(...), ...), that is, the “safe” single-nested case, in the code base for something like dim.

It feels like there is a surprisingly common subjective complexity metric that prevents folks from writing such code in most cases.

No, I don’t think this is an issue of tests and CHECK-NEXT vs CHECK-DAG. The issue is that it is really important for compilers to be deterministic and reproducible. If IRGen for some compiler produces different output depending on whether it was built with GCC or Clang, then that will eventually (after many passes) manifest as different miscompilations being exposed, different performance of the generated code, etc.



+1 on what Chris says: while it is semantically correct to emit the IR in any order here, we really don’t want the host compiler (or other host environment) used to build MLIR to change the code we emit.


The coding style doc can be updated to add a guideline to avoid such nesting. It barely saves any LOC anyway (the builder, loc, etc. still have to be passed to any helper methods), and it’s a big negative readability-wise since you no longer have a named variable for the intermediate value.

I agree that deterministic behavior is very important for unit tests, since they are closely associated with the code that they are testing. The flip side is that randomization can be very valuable for certain aspects of testing where seeing lots of different versions of the input is necessary. Hence, in some cases, forced randomization is a good thing for finding corner cases, but this is a different goal than unit tests.

A little side note regarding order of evaluation - not sure if this has been discussed / is obvious. It seems initializer lists guarantee left-to-right evaluation order:


Maybe this could be used to enforce an evaluation order? The disadvantage is some additional boilerplate, and I at least doubt auto-completion works properly.