Tenth LLVM Performance Workshop at CGO
  • What: Tenth LLVM Performance Workshop at CGO
  • When: January 31st - February 4th, 2026 (Saturday - Wednesday)
  • Where: International Convention Centre Sydney, Sydney, Australia [In person]
  • Proposals should be submitted to: EasyChair
  • The deadline for receiving submissions is: December 23rd, 2025 (extended from December 16th, 2025)
  • Speakers will be notified of acceptance or rejection by: December 24th, 2025 (previously December 23rd, 2025)
  • Note: Travel grants are available to eligible candidates upon request. Please reach out to the program committee if you need a travel grant.
  • Note: Invitation letters for visa applications are available upon request. Please reach out to the program committee if you need an invitation letter for your visa application.

The Tenth LLVM Performance Workshop will be held at CGO 2026. The workshop is co-located with CC, HPCA, and PPoPP. If you are interested in attending the workshop, please register at the CGO website. The LLVM workshop at CGO will be held in person.

Program Committee:

  • Aditya (hiraditya at msn.com)
  • Jose M Monsalve Diaz (josem.monsalvediaz at amd.com)
  • Shilei Tian (i at tianshilei.me)
  • Rafael Andres Herrera Guaitero (rafaelhg at udel.edu)
  • Kevin Sala (salapenades1 at llnl.gov)

Schedule

Time (AEDT) | Speaker | Title | Topic
8:30 - 9:30 (60 min) | Jose M Monsalve Diaz, Shilei Tian, Aditya, Rafael A Herrera Guaitero, Kevin Sala | Opening Remarks. Keynote TBA | Welcome, Introduction and Keynote
9:30 - 10:00 (30 min) | Mircea Trofin | ML Optimizations in Production LLVM: The Next Research Challenges (an engineer's opinion) | ML, Optimization, Industry
10:00 - 10:30 (30 min) | Miguel Andrés Cárdenas Sierra, Rafael A Herrera Guaitero, Isaac David Bermudez Lara, Jose M Monsalve Diaz | Compiling Agentic AI Programs for Dataflow Execution: An MLIR Approach | MLIR, Dataflow Compilation, Agentic AI, Compiler Optimization, Concurrency
10:30 - 11:00 (30 min) | - | Coffee Break | -
11:00 - 11:30 (30 min) | Yizhe Zhang, Bocheng Han, Zhengyi Yang, Boge Liu, Yi Jin | Polymer: An explainable database execution engine based on MLIR | MLIR, Database Execution Engine, JIT Compilation, Query Optimization, Pipeline Execution
11:30 - 12:00 (30 min) | Lei Qiu, Yikang Fan, Yanxia Wu, Fang Lyu | Profile Once, Optimize Anywhere: Architecture-agnostic Profile-Guided Optimization | Profile-Guided Optimization, Cross-Architecture Optimization, Compiler Optimization, HPC
12:00 - 13:00 (60 min) | - | Lunch Break | -
13:00 - 13:30 (30 min) | Yingchi Long, Jun Jiang, Yanhe Zhai, Yaohui Han, Ying Liu, Zheng Lin, Yuyang Zhang, Zhongcheng Zhang, Jiahao Shan, Zhenchuan Chen, Xiaobing Feng, Huimin Cui | Practice on Optimizing SPEC CPU 2017 for Sunway Architecture | Sunway Architecture, LLVM Compiler, Vectorization, Partial Redundancy Elimination, Constant Propagation
13:30 - 14:00 (30 min) | Zhantong Qiu, Mahyar Samani, Jason Lowe-Power | Nugget: Portable Program Snippets | Computer Architecture, Simulation, Workload Reduction, Sampling Methodology, Portable Program Snippets
14:00 - 14:30 (30 min) | - | Coffee Break | -
14:30 - 15:00 (30 min) | Kevin Sala, Krishna Chaitanya Sankisa, Krzysztof Parzyszek, Michael Klemm | Equipping LLVM/OpenMP with Advanced OpenMP GPU Offloading Features | OpenMP, GPU, Accelerator, Target, LLVM, Clang
15:00 - 15:10 (10 min) | Jose M Monsalve Diaz, Shilei Tian, Aditya, Rafael A Herrera Guaitero, Kevin Sala | Closing Remarks | Getting Feedback

Abstracts

ML Optimizations in Production LLVM: The Next Research Challenges (an engineer's opinion)

Mircea Trofin

LLVM has had support for ML-guided optimizations for the last 5 years, applied first to a size problem (via the inliner) and then to a performance problem (via the register allocator). At Google, we have been using both in production workloads: the former for Chrome on Android, Fuchsia OS, and our cloud infrastructure; the latter for our instrumented profiling binaries (including Search), for the Android compiler toolchain, and within AOSP. This talk is about open problems arising from the challenges we encountered. Specifically, it is a call for collaboration between academia and industry on addressing what we learned to be the key challenging compiler problems which, once solved, could unlock the widespread replacement of optimization decisions with policies trained via automatic techniques (ML or AI) at large scale in industry.

Compiling Agentic AI Programs for Dataflow Execution: An MLIR Approach

Miguel Andrés Cárdenas Sierra, Rafael A Herrera Guaitero, Isaac David Bermudez Lara, Jose M Monsalve Diaz

Agentic AI programs orchestrate inference, memory, and external tools to accomplish complex tasks. Compiling these programs presents distinct challenges: individual operations may take seconds to complete, execution depends on remote services, and the primary opportunity for optimization lies in exploiting concurrency among independent operations rather than traditional instruction-level techniques. We introduce an MLIR dialect for agentic AI that represents data dependencies explicitly through dataflow semantics. The dialect defines 18 operations covering inference, three-tier memory, tool invocation, and synchronization. Three optimization passes exploit the structure of agent programs: reasoning fusion reduces inference round-trips by merging sequential operations, context deduplication eliminates redundant inputs across operations, and capability scheduling enables cost-aware execution ordering through operation classification. The compiler performs dependency analysis to identify concurrent execution opportunities and lowers programs to dataflow graphs where execution is driven by data availability rather than program order. This work demonstrates that domain-specific MLIR dialects enable effective compiler optimization for AI workloads where latency dominates execution time.

Polymer: An explainable database execution engine based on MLIR

Yizhe Zhang, Bocheng Han, Zhengyi Yang, Boge Liu, Yi Jin

Despite significant advances in database execution engine performance through Just-in-Time (JIT) compilation and optimized execution strategies, database systems continue to suffer from limited explainability and extensibility. Evaluating individual operator implementations typically requires modifying source code, and database operation reuse remains constrained by language boundaries. We present Polymer, a framework that leverages MLIR's hierarchical intermediate representation to model database execution engines. Polymer treats database operations as composable MLIR operators, enabling fine-grained debugging and systematic optimization across operator boundaries. By representing query execution plans as MLIR modules and lowering them to LLVM IR for execution via LLVM's ORC JIT runtime, Polymer provides a unified platform for evaluating query optimizers, comparing data format I/O performance, and identifying performance bottlenecks at the operator level. Our approach transforms database execution into a compiler-centric problem, enabling the application of mature compiler optimization techniques to database systems while preserving explainability through MLIR's multi-level representation.
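
As context for the execution path the abstract describes, below is a minimal, generic sketch of LLVM's ORC LLJIT flow: parse an LLVM IR module, hand it to the JIT, then look up and call an entry point. This is not Polymer's code; the input file query_plan.ll and the entry symbol query_main are hypothetical, and exact API details (e.g., the return type of lookup) vary across LLVM versions.

```cpp
// Generic ORC LLJIT sketch (not Polymer's implementation). Assumes a recent
// LLVM; "query_plan.ll" and "query_main" are made-up names for illustration.
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"
#include "llvm/IRReader/IRReader.h"
#include "llvm/Support/Error.h"
#include "llvm/Support/SourceMgr.h"
#include "llvm/Support/TargetSelect.h"

using namespace llvm;
using namespace llvm::orc;

int main() {
  // The JIT needs a native code generator.
  InitializeNativeTarget();
  InitializeNativeTargetAsmPrinter();

  // Parse a lowered query plan that was emitted as LLVM IR.
  auto Ctx = std::make_unique<LLVMContext>();
  SMDiagnostic Err;
  std::unique_ptr<Module> M = parseIRFile("query_plan.ll", Err, *Ctx);
  if (!M)
    return 1;

  // Build the JIT and add the module.
  auto JIT = cantFail(LLJITBuilder().create());
  cantFail(JIT->addIRModule(ThreadSafeModule(std::move(M), std::move(Ctx))));

  // Look up the entry point and execute it.
  auto Entry = cantFail(JIT->lookup("query_main"));
  int (*Run)() = Entry.toPtr<int()>();
  return Run();
}
```

A program like this is typically built against an LLVM installation, e.g. using the flags reported by llvm-config.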

Profile Once, Optimize Anywhere: Architecture-agnostic Profile-Guided Optimization

Lei Qiu, Yikang Fan, Yanxia Wu, Fang Lyu

Profile-Guided Optimization (PGO) is a widely used technique for guiding compiler optimizations with runtime behavior. However, its adoption is limited by the high cost of collecting architecture-specific profiles, especially on resource-constrained devices or slow simulation platforms. We propose APGO, an architecture-agnostic PGO framework that enables "profile once, optimize anywhere". APGO treats profile transfer as a region-level alignment problem: it directly reuses matched profile regions and employs an AI-Guided Synthesizer to reconstruct missing profile data through architecture-aware layout adaptation and profile mapping. This approach removes the requirement for native profiling on divergent targets, enabling efficient profile reuse. Experimental results show that APGO reduces profiling time by up to 32.78x compared to native-architecture profiling, while achieving comparable performance on the majority of workloads (34/54 single-core and 11/20 multi-threaded) across RISC-V and ARM. Notably, APGO even surpasses native PGO on 14 single-core and 6 multi-threaded workloads, with peak improvements of 13%.

Practice on Optimizing SPEC CPU 2017 for Sunway Architecture

Yingchi Long, Jun Jiang, Yanhe Zhai, Yaohui Han, Ying Liu, Zheng Lin, Yuyang Zhang, Zhongcheng Zhang, Jiahao Shan, Zhenchuan Chen, Xiaobing Feng, Huimin Cui

The Sunway architecture requires tailored compiler optimizations to achieve peak performance on Sunway CPUs. While optimizing the SPEC CPU 2017 benchmark suite, we implemented several compiler optimizations targeting the Sunway architecture, covering both approaches specific to the Sunway instruction set and microarchitecture and more general methods that may also benefit other architectures (e.g., x86, ARM, RISC-V) but are not yet included in mainstream LLVM. This work introduces four such optimizations integrated into LLVM: 1) customized instruction selection for vectorized zero-extending loads and truncating stores, 2) vectorization factor calculation based on the bit width of vector registers, 3) loop-carried partial redundancy elimination, and 4) constant propagation of Fortran arguments. Evaluated on the SPEC CPU 2017 benchmark suite across two 64-core Sunway CPUs (SW3231 and WX-H8000), the enhanced compiler achieves ratio increases of 20.62% for integer and 28.28% for floating-point workloads, compared to a non-vectorized baseline.
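
For readers unfamiliar with the third optimization listed above, the toy C++ sketch below illustrates the general idea behind loop-carried partial redundancy elimination. It is a generic illustration of the technique, not the authors' pass or code; the function and variable names are made up.

```cpp
// Toy illustration of loop-carried partial redundancy elimination (PRE).
// The value a[i + 1] loaded in iteration i is reloaded as a[i] in
// iteration i + 1; carrying it in a scalar removes the redundant load.

// Before: two loads from 'a' per iteration.
void smooth_naive(const float *a, float *b, int n) {
  for (int i = 0; i + 1 < n; ++i)
    b[i] = a[i] + a[i + 1];
}

// After (conceptually what such a pass produces): one load per iteration.
void smooth_pre(const float *a, float *b, int n) {
  if (n < 2)
    return;
  float prev = a[0];
  for (int i = 0; i + 1 < n; ++i) {
    float next = a[i + 1];
    b[i] = prev + next;
    prev = next; // carry this iteration's load into the next iteration
  }
}
```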

Nugget: Portable Program Snippets

Zhantong Qiu, Mahyar Samani, Jason Lowe-Power

Evaluating architectural ideas on realistic workloads is increasingly challenging due to the prohibitive cost of detailed simulation and the lack of portable sampling tools. Existing targeted sampling techniques are often tied to specific binaries, incur significant overhead, and make rapid validation across systems infeasible. To address these limitations, we introduce Nugget, a flexible framework that enables portable sampling across simulators, hardware, architectural differences, and libraries. Nugget leverages LLVM IR to perform binary-independent interval analysis, then generates lightweight, cross-platform executable snippets (nuggets) that can be validated natively on real hardware before use in simulation. This approach decouples samples from specific binaries, dramatically reduces analysis overhead, and allows researchers to iterate on sampling methodologies while efficiently validating samples across diverse systems.

Equipping LLVM/OpenMP with Advanced OpenMP GPU Offloading Features

Kevin Sala, Krishna Chaitanya Sankisa, Krzysztof Parzyszek, Michael Klemm

OpenMP is the de facto standard parallel programming model for shared-memory systems. With the introduction of OpenMP 4.0 over a decade ago, the specification was extended to support accelerators, including GPUs, through the target offloading model. This model provides a portable approach to accelerating code regions in C, C++, and Fortran across multiple GPU vendors and other state-of-the-art accelerators. Despite these advantages, some performance overheads and the lack of essential GPU-specific features have limited widespread adoption. Consequently, many HPC developers continue to rely on vendor-specific programming models such as CUDA and HIP to achieve peak application performance. To address these challenges, the OpenMP language committee is actively working on extending the specification to better expose GPU-oriented capabilities. In this talk, we cover several of these upcoming OpenMP features, discuss their work-in-progress implementation in Clang and LLVM/OpenMP, and provide a preliminary performance comparison against native GPU APIs on a few C/C++ benchmarks.
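
As background for the target offloading model the talk extends, here is a minimal, generic OpenMP target offloading example in C++ using standard OpenMP 4.5+ directives. It is not tied to the talk's work-in-progress features; the file name, array sizes, and variable names are illustrative.

```cpp
// Minimal OpenMP target offloading example (generic background, not the
// talk's new features). Builds with an offloading-capable Clang, e.g.
//   clang++ -fopenmp -fopenmp-targets=<target-triple> saxpy.cpp
#include <cstdio>
#include <vector>

int main() {
  const int N = 1 << 20;
  std::vector<float> x(N, 1.0f), y(N, 2.0f);
  float *px = x.data();
  float *py = y.data();
  const float a = 2.0f;

  // Map the arrays to the device, run the loop there, and copy y back.
  // If no device is available, the OpenMP runtime falls back to the host.
  #pragma omp target teams distribute parallel for \
      map(to: px[0:N]) map(tofrom: py[0:N])
  for (int i = 0; i < N; ++i)
    py[i] = a * px[i] + py[i];

  std::printf("y[0] = %f\n", py[0]);
  return 0;
}
```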

Call for Speakers

We invite speakers from academia and industry to present their work on the following topics (including but not limited to):

  • Improving performance and code-size of applications built by LLVM toolchains
  • Improving performance of LLVM's runtime libraries
  • Improving the security of generated code
  • Any tools or products developed by using one of the libraries in LLVM infrastructure
  • Performance tracking over time
  • Compiler flags, annotations and remarks to understand and improve performance
  • Any other topic related to improving and maintaining the performance and quality of LLVM generated code

While the primary focus of the workshop is on these topics, we welcome any submission related to the LLVM project, its sub-projects (clang, mlir, lldb, Polly, lld, openmp, pstl, compiler-rt, etc.), as well as their use in industry and academia.

We are looking for:

  • keynote speakers (30-60 minutes),
  • technical presentations (25 minutes plus questions and discussion),
  • tutorials (30-60 minutes),
  • panels (30-60 minutes),
  • BoFs (30-60 minutes)

Proposals should provide sufficient information for the review committee to judge the quality of the submission. Proposals can be submitted in the form of an extended abstract, full paper, or slides. Accepted presentations can be given in person or online. The presentations will be publicly available on https://llvm.org/devmtg/, and recordings will be available on LLVM's YouTube channel (https://www.youtube.com/channel/UCv2_41bSAa5Y_8BacJUZfjQ).

In case of any queries, please reach out to the workshop organizers: Aditya (hiraditya at msn.com), Jose M Monsalve Diaz (josem.monsalvediaz at amd.com), Shilei Tian (i at tianshilei.me), Rafael Andres Herrera Guaitero (rafaelhg at udel.edu), or Kevin Sala (salapenades1 at llnl.gov).

What types of people attend?

  • Active developers of projects under the LLVM umbrella (LLVM core, Clang, LLDB, libc++, compiler-rt, KLEE, lld, OpenMP, MLIR, etc.).
  • Anyone interested in using these as part of another project.
  • Students and Researchers.
  • Compiler, programming language, and runtime enthusiasts.
  • Those interested in using compiler and toolchain technology in novel and interesting ways.

Panels

Panel sessions are guided discussions about a specific topic. The panel consists of ~3 developers who discuss a topic through prepared questions from a moderator. The audience is also given the opportunity to ask questions of the panel.

Birds of a Feather (BoF)

A BoF session is an informal meeting where attendees group together based on a shared interest and carry out discussions without any pre-planned agenda.

Technical Talks

These 25-minute talks cover topics ranging from core infrastructure to projects using LLVM's infrastructure. Attendees will take away technical information pertinent to their own projects or of general interest.

Tutorials

Tutorials are 30-60 minute sessions that dive deep into a technical topic. Expect in-depth examples and explanations.

Code of Conduct

The LLVM Foundation is dedicated to providing an inclusive and safe experience for everyone. We do not tolerate harassment of participants in any form. By registering for this event, you confirm that you have read and agree to abide by the LLVM Code of Conduct. We also adhere to the CGO Code of Conduct.