Compile GPU kernels using ClangIR

August 29, 2024

During the last year of my undergraduate studies, I found myself with some extra time and decided to participate in the LLVM community's Google Summer of Code (GSoC) program. My summer project was titled "Compile GPU kernels using ClangIR". Its primary focus was to design a high-level IR (HIR) for heterogeneous programming models. Beyond functional completeness, the work aimed to lay the groundwork for supporting more complex and modern languages like CUDA and SYCL in the future.

This experience marked my first deep dive into the workings of an open-source community. The ClangIR community, a smaller sub-community within the larger LLVM family, provided a welcoming environment. Its monthly meetings are relatively informal, which makes it easy for everyone to communicate and collaborate.

Background

The ClangIR project aims to establish a new IR for Clang, built on top of MLIR. As part of the ongoing effort to support heterogeneous programming models, this project focuses on integrating OpenCL C language support into ClangIR. The ultimate goal is to enable the compilation of GPU kernels written in OpenCL C into LLVM IR targeting the SPIR-V architecture, laying the groundwork for future enhancements in SYCL and CUDA support.
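For readers who have not seen OpenCL C before, the sketch below shows the general shape of a kernel of the kind this project targets. So that it can be tried with an ordinary C compiler, the `__kernel` and `__global` keywords and the `get_global_id` built-in are replaced by host-side shims; those shims, the `vec_add` kernel, and the `launch_vec_add` helper are assumptions made for illustration and are not taken from the project itself.

```c
#include <stddef.h>

/* Host-side shims (assumptions for illustration only): in real OpenCL C these
   are language keywords and a built-in function supplied by the compiler. */
#define __kernel
#define __global
static size_t current_work_item = 0;
static size_t get_global_id(unsigned dim) { (void)dim; return current_work_item; }

/* The kernel body is written as it would appear in an OpenCL C source file:
   each work item adds one pair of elements. */
__kernel void vec_add(__global const float *a, __global const float *b,
                      __global float *c, int n) {
    size_t i = get_global_id(0);
    if (i < (size_t)n)
        c[i] = a[i] + b[i];
}

/* Hypothetical sequential "launch": runs the kernel once per work item.
   A real OpenCL runtime would run the work items in parallel on the device. */
static void launch_vec_add(const float *a, const float *b, float *c, int n) {
    for (current_work_item = 0; current_work_item < (size_t)n; ++current_work_item)
        vec_add(a, b, c, n);
}
```

On a real device, the compiler has to know that `a`, `b`, and `c` point into global memory and that `vec_add` is a kernel entry point rather than an ordinary function. Representing those two facts in ClangIR is a large part of the work described in this post.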

What We Did

Our work involved several key areas:

Address Space Support: One of the fundamental tasks was teaching ClangIR to handle address spaces, a vital feature for languages like OpenCL. Initially, we considered mimicking LLVM's approach, but this proved inadequate for ClangIR's goals. After thorough discussion and an RFC, we implemented a unified address space design that aligns with ClangIR's objectives, ensuring a clean and maintainable code structure.
OpenCL Language and SPIR-V Target Integration: We extended ClangIR to support the OpenCL language and the SPIR-V target. This involved enhancing the pipeline to accommodate the latest OpenCL 3.0 specification and implementing hooks for language-specific and target-specific customizations.
Vector Type Support: OpenCL vector types, a critical feature for GPU programming, were integrated into ClangIR. We leveraged ClangIR's existing cir.vector type to generate the necessary code, ensuring consistent compilation results.
Kernel and Module Metadata Emission: We added support for emitting OpenCL kernel and module metadata in ClangIR, a necessary step for proper integration with the SPIR-V target. This included the creation of structured attributes to represent metadata, following MLIR's preferences for well-defined structures.
Global and Static Variables with Qualifiers: We implemented support for global and static variables with qualifiers like global, constant, and local, ensuring that these constructs are correctly represented and lowered in the ClangIR pipeline.
Calling Conventions: We adjusted the calling conventions in ClangIR to align with SPIR-V requirements, migrating from the default cdecl to SPIR-V-specific conventions like SpirKernel and SpirFunction. With this change, calls to most OpenCL built-in functions, such as barrier and get_global_id, also work.
User Experience Enhancements: Finally, we ensured that the end-to-end kernel compilation experience using ClangIR was smooth and intuitive, with minimal manual intervention required.
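To make the address space item above more concrete, here is a minimal sketch of the idea: the IR carries a small set of language-agnostic address space kinds, and only the target-specific lowering step turns them into numbers. The enum and function names are hypothetical and are not ClangIR's actual API. The numeric values follow the mapping that, to the best of my knowledge, Clang's SPIR and SPIR-V targets use for OpenCL.

```c
/* Hypothetical language-agnostic address space kinds, loosely modeled on the
   unified design described above. The names are illustrative only. */
typedef enum {
    AS_PRIVATE,
    AS_LOCAL,
    AS_GLOBAL,
    AS_CONSTANT,
    AS_GENERIC
} OffloadAddrSpace;

/* Lowering step: map an abstract kind to the numeric address space that
   appears in LLVM IR for a SPIR-V target (for example `ptr addrspace(1)`
   for a pointer to global memory). */
static unsigned to_spirv_target_as(OffloadAddrSpace as) {
    switch (as) {
    case AS_PRIVATE:  return 0;
    case AS_GLOBAL:   return 1;
    case AS_CONSTANT: return 2;
    case AS_LOCAL:    return 3;
    case AS_GENERIC:  return 4;
    }
    return 0; /* Fall back to the private/default address space. */
}
```

Keeping the abstract kind in the IR and deferring the numbers to lowering means the same IR could, in principle, be lowered for a different target with a different numbering, which is one reason a unified design is attractive for future CUDA and SYCL work.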

Results

The project met its primary goals. OpenCL kernels from the Polybench-GPU benchmark suite can now be compiled using ClangIR into LLVM IR for SPIR-V. All patches have been merged into the main ClangIR repository, and the project's progress is documented in the overview issue. I believe the work not only advanced OpenCL support but also laid a solid foundation for future enhancements, such as SYCL and CUDA support in ClangIR.

All 20 OpenCL C test cases from the polybenchGpu repository now compile with ClangIR and pass. Please refer to our artifact evaluation repository for detailed instructions on how to experiment with our work.

Future Work

As we look forward, there are two key areas that require further development:

Function Attribute Consistency: Some function attributes are not yet handled. For example, the convergent function attribute is crucial for preventing misoptimizations in SIMT languages like OpenCL. ClangIR currently lacks this attribute, which could lead to incorrect code in parallel computing contexts. Addressing this is a priority to ensure correct optimization behavior.
Support for OpenCL Built-in Types: Another critical area for future work is the support for OpenCL built-in types, such as pipe and image. These types are essential for handling data streams and image processing tasks in various specialized OpenCL applications. Supporting these types will significantly enhance ClangIR's adherence to the OpenCL standard, broadening its applicability and ensuring better compatibility with a wide range of OpenCL programs.

Acknowledgements

This project would not have been possible without the guidance and support of the LLVM community. I extend my deepest gratitude to my mentors, Julian Oppermann, Victor Lomüller, and Bruno Cardoso Lopes, whose expertise and encouragement were instrumental throughout this journey. Additionally, I would like to thank Vinicius Couto Espindola for his collaboration on ABI-related work. This experience has been immensely rewarding, both technically and in terms of community engagement.
