llvm-offload-binary - LLVM Offload Binary Packager¶
SYNOPSIS¶
llvm-offload-binary [options] [input files…]
DESCRIPTION¶
llvm-offload-binary is a utility for bundling multiple device object files into a single binary container. The resulting binary can then be embedded into the host section table to form a fat binary containing offloading code for different targets. Conversely, it can also extract previously bundled device images from offload binaries.
When extracting images, if no --image filters are specified, all
offload images are automatically extracted with descriptive filenames. When
--image filters are provided, only matching images are extracted.
The tool supports nested OffloadBinary format, where device images can be wrapped in an inner OffloadBinary container. When extracting, the tool automatically detects and unwraps nested OffloadBinary images, making the format transparent to users.
The binary format begins with the magic bytes 0x10FF10AD, followed by a
version and size. Each binary contains its own header, allowing tools to locate
offloading sections even when merged by a linker. Each offload entry includes
metadata such as the device image kind, producer kind, and key-value string
metadata. Multiple offloading images are concatenated to form a fat binary.
EXAMPLE¶
# Package multiple device images into a fat binary:
$ llvm-offload-binary -o out.bin \
--image=file=input.o,triple=nvptx64,arch=sm_70
# Extract all offload images from an executable (no filters):
$ llvm-offload-binary in.bin
# Output:
# Extracted: in-nvptx64-nvidia-cuda-sm_70.0.bc
# Extracted: in-spirv64-intel-unknown.0.spv
# Extract only SPIR-V images using filters:
$ llvm-offload-binary in.bin --image=triple=spirv64-intel
# Output:
# Extracted: in-spirv64-intel-unknown.0.spv
# Extract filtered images to a specific file:
$ llvm-offload-binary in.bin --image=file=output.bc,arch=sm_70
# Extract filtered images to an archive:
$ llvm-offload-binary in.bin --image=file=output.a,triple=nvptx64 --archive
OPTIONS¶
- --archive¶
When extracting from an input binary, write all extracted images into a static archive instead of separate files.
- --image=<<key>=<value>,...>¶
Specify a set of arbitrary key-value arguments describing an image. Commonly used optional keys include
arch(e.g.sm_70for CUDA) andtriple(e.g. nvptx64-nvidia-cuda).When bundling, this option specifies images to include in the output binary. When extracting, this option acts as a filter: only images matching the specified keys are extracted. If no
--imageoptions are provided during extraction, all images are automatically extracted with descriptive filenames.
- -o <file>¶
Write output to <file>. When bundling, this specifies the fat binary filename. When extracting, this specifies the archive or output file destination.
- --help, -h¶
Display available options. Use
--help-hiddento show hidden options.
- --help-list¶
Display a list of all options. Use
--help-list-hiddento show hidden ones.
- --version¶
Display the version of the llvm-offload-binary executable.
- @<FILE>¶
Read command-line options from response file <FILE>.
BINARY FORMAT¶
The binary format is marked by the magic bytes 0x10FF10AD, followed by a
version number. Each created binary contains its own header. This allows tools
to locate offloading sections even after linker operations such as relocatable
linking. Conceptually, this binary format is a serialization of a string map and
an image buffer.
Type |
Identifier |
Description |
|---|---|---|
uint8_t |
magic |
The magic bytes for the binary format (0x10FF10AD) |
uint32_t |
version |
Version of this format (currently version 1) |
uint64_t |
size |
Size of this binary in bytes |
uint64_t |
entry offset |
Absolute offset of the offload entries in bytes |
uint64_t |
entry size |
Size of the offload entries in bytes |
Each offload entry describes a bundled image along with its associated metadata.
Type |
Identifier |
Description |
|---|---|---|
uint16_t |
image kind |
The kind of the device image (e.g. bc, cubin) |
uint16_t |
offload kind |
The producer of the image (e.g. openmp, cuda) |
uint32_t |
flags |
Generic flags for the image |
uint64_t |
string offset |
Absolute offset of the string metadata table |
uint64_t |
num strings |
Number of string entries in the table |
uint64_t |
image offset |
Absolute offset of the device image in bytes |
uint64_t |
image size |
Size of the device image in bytes |
The entry table refers to both a string table and the raw device image itself. The string table provides arbitrary key-value metadata.
Type |
Identifier |
Description |
|---|---|---|
uint64_t |
key offset |
Absolute byte offset of the key in the string table |
uint64_t |
value offset |
Absolute byte offset of the value in the string table |
The string table is a collection of null-terminated strings stored in the image. Offsets allow string entries to be interpreted as key-value pairs, enabling flexible metadata such as architecture or target triple.
The enumerated values for image kind and offload kind are:
Name |
Value |
Description |
|---|---|---|
IMG_None |
0x00 |
No image information provided |
IMG_Object |
0x01 |
The image is a generic object file |
IMG_Bitcode |
0x02 |
The image is an LLVM-IR bitcode file |
IMG_Cubin |
0x03 |
The image is a CUDA object file |
IMG_Fatbinary |
0x04 |
The image is a CUDA fatbinary file |
IMG_PTX |
0x05 |
The image is a CUDA PTX file |
Name |
Value |
Description |
|---|---|---|
OFK_None |
0x00 |
No offloading information provided |
OFK_OpenMP |
0x01 |
The producer was OpenMP offloading |
OFK_CUDA |
0x02 |
The producer was CUDA |
OFK_HIP |
0x03 |
The producer was HIP |
OFK_SYCL |
0x04 |
The producer was SYCL |
COMMON WORKFLOWS¶
Workflow 1: Explore Executable Contents
Extract all embedded offload images to see what’s inside:
$ clang++ -fopenmp -fopenmp-targets=nvptx64,spirv64-intel app.cpp -o myapp
$ llvm-offload-binary myapp
# Output:
# Extracted: myapp-nvptx64-nvidia-cuda-sm_70.0.bc
# Extracted: myapp-spirv64-intel-unknown.1.spv
Workflow 2: Extract Specific Target
Extract only images for a specific target:
$ llvm-offload-binary myapp --image=triple=spirv64-intel
# Output:
# Extracted: myapp-spirv64-intel-unknown.0.spv
Workflow 3: Create Device Image Archive
Extract filtered images into a static archive:
$ llvm-offload-binary myapp --image=file=nvptx.a,triple=nvptx64 --archive
$ ar t nvptx.a
# Shows extracted CUDA images
Workflow 4: Validate SPIR-V
Extract and validate SPIR-V binaries:
$ llvm-offload-binary myapp --image=triple=spirv64-intel
$ spirv-val myapp-spirv64-intel-unknown.0.spv
$ spirv-dis myapp-spirv64-intel-unknown.0.spv -o kernel.spvasm
Workflow 5: Bundle Multiple Targets
Create a fat binary from multiple device images:
$ clang++ -fopenmp -fopenmp-targets=nvptx64 --offload-device-only kernel.cpp -o kernel_nvptx.bc
$ clang++ -fopenmp -fopenmp-targets=spirv64-intel --offload-device-only kernel.cpp -o kernel_spirv.bc
$ llvm-offload-binary -o bundle.bin \
--image=file=kernel_nvptx.bc,triple=nvptx64,arch=sm_70 \
--image=file=kernel_spirv.bc,triple=spirv64-intel
Workflow 6: Extract and Rebundle
Extract images from one binary and rebundle with modifications:
$ llvm-offload-binary old_app
$ llvm-offload-binary -o new_bundle.bin \
--image=file=old_app-nvptx64-nvidia-cuda-sm_70.0.bc,triple=nvptx64,arch=sm_70 \
--image=file=new_kernel.bc,triple=nvptx64,arch=sm_80
