The new proposal seems more flexible for future changes.
This is unavoidable. The point is that we should provide an alternative to the current state of affairs, where the compiler implicitly picks “vendorname-fast” by default.
We can have two modes: one where the user picks the exact algorithm, and one where we let the vendor pick an algorithm. In the latter mode the vendor can automatically pick vendorname-fast.
There is a VFDatabase in LLVM, a Vector Function Database: D67572 [VectorUtils] Introduce the Vector Function Database (VFDatabase).
Maybe there is space for a similar design for floating point. Vendors could provide several variants of the same operation, e.g., cosine. During compilation the database is queried for the most suitable variant. Fast or not-fast is probably not precise enough.
@tschuett Thanks for pointing that out. There is a fair amount of overlap between my proposal and the vector function handling that I need to figure out. I definitely want them to be aligned.
It is about separation of concerns and being independent of TLI.
Imagine there is a Floating point function database (FPFDatabase). You are a vendor of an FPGA and you probably offer more than one cosine with different accuracies. It is your job to populate the database with your functions, accuracy, rounding mode, and how to invoke your functions.
The customers can query the database: I need a cosine with this accuracy. The selection algorithm is up to you.
If you are a vendor of an Nvidia H100, then you will probably only offer one cosine with completely different properties.
The customers do the same queries to the database.
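To make the idea concrete, here is a rough sketch of what such a database could look like. Nothing here exists in LLVM: `FPFDatabase`, `FPVariant`, the field names, and the "cheapest variant that meets the accuracy bound" selection rule are all assumptions made up for illustration. Rounding mode is left out to keep it short.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for one vendor-provided implementation.
struct FPVariant {
  std::string Operation; // logical operation, e.g. "cos"
  std::string Symbol;    // function to call (made-up names in this sketch)
  double MaxUlpError;    // worst-case error in ULPs claimed by the vendor
  unsigned RelativeCost; // lower is cheaper; the scale is vendor-defined
};

// Hypothetical database: the vendor populates it, the compiler queries it.
class FPFDatabase {
  std::vector<FPVariant> Variants;

public:
  // Vendor side: register one variant of an operation.
  void add(FPVariant V) {
    assert(V.MaxUlpError >= 0.0 && "accuracy bound must be non-negative");
    Variants.push_back(std::move(V));
  }

  // Customer side: "I need Op with at most RequiredUlp error."
  // Assumed policy: return the cheapest variant that satisfies the bound,
  // or std::nullopt if no registered variant is accurate enough.
  std::optional<FPVariant> query(const std::string &Op,
                                 double RequiredUlp) const {
    std::optional<FPVariant> Best;
    for (const FPVariant &V : Variants) {
      if (V.Operation != Op || V.MaxUlpError > RequiredUlp)
        continue;
      if (!Best || V.RelativeCost < Best->RelativeCost)
        Best = V;
    }
    return Best;
  }
};
```

With a fast 4-ULP cosine and a slower 0.5-ULP cosine registered, a query for "cos within 4 ULP" would get the fast one and a query for "cos within 1 ULP" would get the accurate one. A vendor with a single cosine registers one entry and the same query interface still works.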
Is there a design document somewhere for the Vector Function Database or is the code and review discussion the only documentation?
I’ve been modeling my implementation on the veclib handling, and it isn’t clear to me how that is supposed to fit in with the Vector Function Database. The general idea sounds like what I want, but I haven’t quite processed how it all is supposed to fit together.
There were several RFCs on the mailing lists:
I found it at the top of the review.
BTW, that reminds me of the bug in Chrome where fdlibm's pow has worse precision.
Still not fixed.
I’ve posted a preliminary implementation of what I’m proposing here: D138867 [RFC] Add new intrinsics and attribute to control accuracy of FP calls
This doesn’t yet incorporate the suggestion to follow the design of the vector function database. My proposed design is based on the implementations of constrained FP intrinsics and veclib handling.