Description: Bfloat16 is a recently developed floating-point format tailored to machine learning and AI, and in the C++23 standard it is officially standardized as std::bfloat16_t. It is supported by much modern hardware, from CPUs by all the major vendors (Intel, AMD, Apple, and Amazon) to NVIDIA and AMD GPUs and Google TPUs. On the software side, it is supported by all major accelerator libraries, such as CUDA, ROCm, oneAPI, PyTorch, and TensorFlow. The goal of this project is to implement bfloat16 math functions in the LLVM libc library.
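For readers new to the format: bfloat16 keeps float32's sign bit and 8-bit exponent but shortens the significand to 7 explicit bits, so a float can be converted by keeping the top 16 bits of its encoding with rounding. Below is a minimal, self-contained sketch of that conversion (my own illustration, not code from LLVM libc; NaN and overflow handling are omitted for brevity):

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// Round-to-nearest-even conversion from float (1 sign + 8 exponent + 23
// mantissa bits) to bfloat16 (1 + 8 + 7) by keeping the top 16 bits of
// the float's encoding. NaN and exponent-overflow edge cases are ignored.
static uint16_t float_to_bf16(float x) {
  uint32_t bits;
  std::memcpy(&bits, &x, sizeof(bits));
  // Rounding bias: 0x7FFF plus the bit that becomes the new LSB,
  // so that ties round to even.
  uint32_t rounding_bias = 0x7FFF + ((bits >> 16) & 1);
  bits += rounding_bias;
  return static_cast<uint16_t>(bits >> 16);
}

// The reverse direction is exact: pad the low 16 mantissa bits with zeros.
static float bf16_to_float(uint16_t x) {
  uint32_t bits = static_cast<uint32_t>(x) << 16;
  float result;
  std::memcpy(&result, &bits, sizeof(result));
  return result;
}

int main() {
  float pi = 3.14159265f;
  uint16_t bf = float_to_bf16(pi);
  std::printf("pi as bfloat16 bits: 0x%04X, value: %g\n",
              static_cast<unsigned>(bf), bf16_to_float(bf));
}
```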
Expected Results:
Set up the generated headers properly so that the type and the functions can be used with various compilers (and compiler versions) and architectures.
Implement generic basic math operations supporting the bfloat16 data type that work on the supported architectures: x86_64, arm (32 + 64), risc-v (32 + 64), and GPUs (see the sketch after this list).
Implement specializations using compiler builtins or special hardware instructions to improve their performance whenever possible.
If time permits, we can start investigating higher math functions for bfloat16.
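To make the "generic basic math operations" item above more concrete, here is a hedged sketch of the promote-to-float-and-round-back approach, assuming a toolchain that ships C++23 <stdfloat>; the function name bf16_fmax is my own illustration, not LLVM libc's API, and the specializations mentioned above would replace this generic path where hardware support exists:

```cpp
#include <cmath>
#include <iostream>
#include <stdfloat> // C++23; provides std::bfloat16_t where the toolchain supports it

// Generic path: promote to float, compute there, and let the conversion
// back to bfloat16 do the final rounding. For fmax the result is exact,
// since every bfloat16 value is exactly representable as a float.
// A target-specific specialization (hardware instruction or compiler
// builtin) could replace this generic path where one exists.
std::bfloat16_t bf16_fmax(std::bfloat16_t a, std::bfloat16_t b) {
  return static_cast<std::bfloat16_t>(
      std::fmax(static_cast<float>(a), static_cast<float>(b)));
}

int main() {
  auto x = static_cast<std::bfloat16_t>(1.5f);
  auto y = static_cast<std::bfloat16_t>(2.25f);
  std::cout << static_cast<float>(bf16_fmax(x, y)) << '\n'; // prints 2.25
}
```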
Project Size: Large
Requirement: Basic C & C++ skills + interest in knowing / learning more about the subtleties of floating-point formats.
This seems like something I'd love to work on. I've looked at the mathematical representation of Bfloat16, and have some experience contributing to LLVM (albeit in a different area). Is there any other prerequisite reading or such you'd suggest?
Hi, I am very interested in this project! I have a solid foundation in computer architecture, especially RISC-V and GPU-related architectures. I also have a strong interest in machine learning and have taken relevant courses on my own. I think this is the perfect project for me.
Hi! I am pursuing Electrical Engineering and Computer Science majors at Boise State and I think I would work well on this project. I am very familiar with floating-point representation and have written an extremely fast double-precision floating-point parser that always returns the closest representation to the string passed in. I've also worked on making my own programming languages and have had PRs merged into the Zig project. I'll try to compile LLVM tonight and get up to speed, but I'd also like to know what I can do in the meantime to get more prepared (like issues I could work on now, etc.).
Hi! I'm a Master's student in Computer and Information Technology at UPenn with a strong foundation in C/C++ and a background in AI and machine learning. I've been exploring floating-point representations and SIMD optimizations, and I'm really interested in contributing to this project.
I've been looking into LLVM libc's floating-point functions and optimizations, and I'd love to know how I can get more prepared. I also noticed that the project mentions investigating higher math functions for Bfloat16 if time permits; I have plenty of time and would be eager to explore that as well.
Handbook of Floating-Point Arithmetic by Jean-Michel Muller et al.
Elementary Functions: Algorithms and Implementation by Jean-Michel Muller.
If you would like to start working on the LLVM libc codebase, you can search through open issues with the "libc" label on GitHub, and in particular those that also have the "good first issue" label, but the latter currently (at the time of writing) all have someone assigned to them already.
If you have further questions, you can either email us or ask in the #libc channel on the LLVM Discord server.
Hi! I'm a Master's student in Computer Engineering at TU Delft with a strong background in low-level systems programming, floating-point arithmetic, and AI acceleration. I've worked with both CPUs and GPUs, including optimizing deep learning workloads and understanding precision trade-offs in numerical computing.
I'm particularly excited about this project because it aligns perfectly with my interests in floating-point formats, hardware-aware optimizations, and compiler-level improvements. I have experience with LLVM and have explored floating-point representations in AI systems. I'm eager to contribute to implementing efficient bfloat16 math functions and leveraging hardware-specific optimizations for performance.
Hello! I'm a second-year student at the University of Southern California studying Computer Science and Mathematics. I'm relatively new to LLVM but I'd love to explore working with compilers, GPUs, and in this case, floating-point representation. My previous experience has been in full-stack web dev and ML research, but I'm very open to learning. Please keep us posted if there are additional ways to get involved (apart from the good first issues, which are taken)!
Hello, this is one of the 2 projects I'd be interested in contributing to. Here's a bit about me…
I'm a first-year M.Sc. student in Engineering Mathematics. I've had experience mainly with C, C++, and Python as programming languages, and with tools such as CUDA, OpenMP, MPI, and a bit of OpenCL.
Recently, I've been busy with more maths-focused courses, but I've taken a Parallel Computing course and I'll start a Compiler Construction course next week. I should mention it'd be my first time contributing to an open-source project.
Currently, I'm also doing an internship in the machine learning field, but I'll be quite free by June, ideally with fresh knowledge about compilers. In the meantime, I'll try to get up to speed on this project.
For those who would like to try to implement a higher math function already, I've just opened the following issues for some of the remaining _Float16 higher math functions:
I am very excited about this project and would love to contribute to it. I am currently a second-year graduate student in Computer Science at Sun Yat-sen University. While I am new to open-source communities, I have strong programming skills in C++ and Python, as well as experience with CUDA.
My academic background includes a course on compiler principles, and I implemented a compiler frontend using ANTLR and designed an IR (intermediate representation) inspired by LLVM's structure. On top of this IR, I developed a series of optimization passes. You can find my project here: YAT-CC Project Link. Additionally, I contributed to the development of a compiler teaching platform (YATCC-AI), where I was primarily responsible for Lab 3 (IR Generation) and Lab 4 (Optimization). I also worked on integrating LLMs (large language models) with the compiler, further deepening my familiarity with the LLVM framework.
Regarding this project, my understanding is that the input is restricted to bfloat16, but higher precision may be used during computation. Could you confirm whether this is correct? Since the target architectures include GPUs, I'd also like to know which GPUs are planned for support and whether there are any memory constraints. Given that bfloat16 is widely used in AI computing, will quantization techniques be a key focus?
I am highly interested in this project and would love to make it my first major open-source contribution. I have ample time to dedicate to the work and am eager to learn and collaborate under your guidance. Looking forward to your insights and the opportunity to contribute!
LLVM libc supports AMD and NVIDIA GPUs (AMDGPU and NVPTX targets in Clang/LLVM). We don't have an exact list of supported GPUs, but anything past gfx803 for AMD and sm_52 for NVIDIA is probably supported by LLVM libc. I'm not aware of specific memory constraints.
Hi, I am Aditya. I am a bit late, but I have spent the last couple of days going over the libc codebase and am very interested in this project. Here's a bit about me:
I am a 4th-year CS undergrad at IIIT-H. I am an undergraduate researcher with the Computer Systems Group at my university and have taken courses on compilers and computer architecture in the past. I have also worked on implementing the AST translation passes of a toy compiler of my own, for a statically typed Lisp.
Over the last couple of days I merged my first PR contributing to libc (#134167). I am excited to continue contributing and hope to do so through GSoC over the summer.
Is it okay if I share a draft of my proposal for review before submitting it?
Hi, I'm Maaz. I'm very interested in this project and wanted to ask:
Will optimizations using parallelization strategies (e.g., OpenMP) or x86 vector instructions (like AVX intrinsics) be encouraged or considered for this project? I have experience with both and would love to explore them further in the context of bfloat16.