llvm-offload-binary - LLVM Offload Binary Packager

SYNOPSIS

llvm-offload-binary [options] [input files…]

DESCRIPTION

llvm-offload-binary is a utility for bundling multiple device object files into a single binary container. The resulting binary can then be embedded into the host section table to form a fat binary containing offloading code for different targets. Conversely, it can also extract previously bundled device images from offload binaries.

When extracting images, if no --image filters are specified, all offload images are automatically extracted with descriptive filenames. When --image filters are provided, only matching images are extracted.

The tool supports nested OffloadBinary format, where device images can be wrapped in an inner OffloadBinary container. When extracting, the tool automatically detects and unwraps nested OffloadBinary images, making the format transparent to users.

The binary format begins with the magic bytes 0x10FF10AD, followed by a version and size. Each binary contains its own header, allowing tools to locate offloading sections even when merged by a linker. Each offload entry includes metadata such as the device image kind, producer kind, and key-value string metadata. Multiple offloading images are concatenated to form a fat binary.
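Because every binary carries its own header with a size field, a reader can walk a buffer that holds several concatenated binaries. The following is a minimal sketch of that idea, not the tool's implementation. It assumes the magic is stored as the byte sequence 10 FF 10 AD, followed by a little-endian uint32 version and uint64 size; the function name `find_offload_binaries` is hypothetical.

```python
import struct

MAGIC = bytes([0x10, 0xFF, 0x10, 0xAD])
# Assumed prefix layout: 4 magic bytes, uint32 version, uint64 size (little-endian).
PREFIX = struct.Struct("<4sIQ")

def find_offload_binaries(data: bytes):
    """Return (offset, version, size) for each plausible binary found in data."""
    found = []
    pos = data.find(MAGIC)
    while pos != -1 and pos + PREFIX.size <= len(data):
        _magic, version, size = PREFIX.unpack_from(data, pos)
        if size < PREFIX.size or pos + size > len(data):
            # Not a plausible header; resume the search one byte later.
            pos = data.find(MAGIC, pos + 1)
            continue
        found.append((pos, version, size))
        # Skip the whole binary and look for the next header after it.
        pos = data.find(MAGIC, pos + size)
    return found

# Two tiny synthetic binaries: the 16-byte prefix plus zero padding.
fake = b"".join(
    PREFIX.pack(MAGIC, 1, n) + bytes(n - PREFIX.size) for n in (24, 40)
)
print(find_offload_binaries(b"junk" + fake))  # [(4, 1, 24), (28, 1, 40)]
```

Jumping ahead by the recorded size, rather than rescanning byte by byte, avoids mistaking image payload bytes for a new header.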

EXAMPLE

# Package a device image into a fat binary (repeat --image to bundle more):
$ llvm-offload-binary -o out.bin \
      --image=file=input.o,triple=nvptx64,arch=sm_70

# Extract all offload images from an executable (no filters):
$ llvm-offload-binary in.bin
# Output:
# Extracted: in-nvptx64-nvidia-cuda-sm_70.0.bc
# Extracted: in-spirv64-intel-unknown.0.spv

# Extract only SPIR-V images using filters:
$ llvm-offload-binary in.bin --image=triple=spirv64-intel
# Output:
# Extracted: in-spirv64-intel-unknown.0.spv

# Extract filtered images to a specific file:
$ llvm-offload-binary in.bin --image=file=output.bc,arch=sm_70

# Extract filtered images to an archive:
$ llvm-offload-binary in.bin --image=file=output.a,triple=nvptx64 --archive

OPTIONS

--archive: When extracting from an input binary, write all extracted images into a static archive instead of separate files.

--image=<<key>=<value>,...>

Specify a set of arbitrary key-value arguments describing an image. Commonly used optional keys include arch (e.g. sm_70 for CUDA) and triple (e.g. nvptx64-nvidia-cuda).

When bundling, this option specifies images to include in the output binary. When extracting, this option acts as a filter: only images matching the specified keys are extracted. If no --image options are provided during extraction, all images are automatically extracted with descriptive filenames.
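The filtering behavior described above can be sketched as follows. This is an illustration only: the helper names (`parse_image_arg`, `matches`, `select`) are hypothetical, and it assumes exact string comparison of values and that the `file` key names an output destination rather than a property to match. The real tool may compare values differently, for example by normalizing triples.

```python
def parse_image_arg(arg: str) -> dict:
    """Split a 'key=value,key=value' argument into a dict."""
    pairs = {}
    for item in arg.split(","):
        key, sep, value = item.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {item!r}")
        pairs[key] = value
    return pairs

def matches(image_meta: dict, image_filter: dict) -> bool:
    """True if every filter key (except 'file') equals the image's metadata."""
    # Assumption: 'file' is the output path, so it is not compared.
    return all(
        image_meta.get(key) == value
        for key, value in image_filter.items()
        if key != "file"
    )

def select(images: list, filters: list) -> list:
    """With no filters, keep everything; otherwise keep images matching any filter."""
    if not filters:
        return list(images)
    return [img for img in images if any(matches(img, f) for f in filters)]

images = [
    {"triple": "nvptx64-nvidia-cuda", "arch": "sm_70"},
    {"triple": "spirv64-intel", "arch": ""},
]
flt = parse_image_arg("file=output.bc,arch=sm_70")
print(select(images, [flt]))  # only the sm_70 image
print(len(select(images, [])))  # no filters: both images
```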

-o <file>: Write output to <file>. When bundling, this specifies the fat binary filename. When extracting, this specifies the archive or output file destination.

--help, -h: Display available options. Use --help-hidden to show hidden options.

--help-list: Display a list of all options. Use --help-list-hidden to show hidden ones.

--version: Display the version of the llvm-offload-binary executable.

@<FILE>: Read command-line options from response file <FILE>.

BINARY FORMAT

The binary format is marked by the magic bytes 0x10FF10AD, followed by a version number. Each created binary contains its own header. This allows tools to locate offloading sections even after linker operations such as relocatable linking. Conceptually, this binary format is a serialization of a string map and an image buffer.

Table 1 Offloading Binary Header
Type	Identifier	Description
uint8_t[4]	magic	The magic bytes for the binary format (0x10FF10AD)
uint32_t	version	Version of this format (currently version 1)
uint64_t	size	Size of this binary in bytes
uint64_t	entry offset	Absolute offset of the offload entries in bytes
uint64_t	entry size	Size of the offload entries in bytes
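The header fields in Table 1 can be decoded with a fixed-size struct layout. The sketch below assumes little-endian encoding, a four-byte magic, and fields packed in table order without padding (32 bytes in total). The names `Header` and `parse_header` are hypothetical and are not part of any LLVM API.

```python
import struct
from typing import NamedTuple

# Assumption: little-endian fields in table order with no padding
# (4 + 4 + 8 + 8 + 8 = 32 bytes).
HEADER = struct.Struct("<4sIQQQ")
MAGIC = bytes([0x10, 0xFF, 0x10, 0xAD])

class Header(NamedTuple):
    magic: bytes
    version: int
    size: int
    entry_offset: int
    entry_size: int

def parse_header(buf: bytes) -> Header:
    """Decode and sanity-check the header at the start of buf."""
    if len(buf) < HEADER.size:
        raise ValueError("buffer too small for an offload binary header")
    header = Header(*HEADER.unpack_from(buf, 0))
    if header.magic != MAGIC:
        raise ValueError("bad magic bytes")
    if header.entry_offset + header.entry_size > header.size:
        raise ValueError("entry table lies outside the binary")
    return header

# Synthetic header: version 1, 72-byte binary, 40 bytes of entries at offset 32.
sample = HEADER.pack(MAGIC, 1, 72, 32, 40)
print(parse_header(sample))
```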

Each offload entry describes a bundled image along with its associated metadata.

Table 2 Offloading Entry Table
Type	Identifier	Description
uint16_t	image kind	The kind of the device image (e.g. bc, cubin)
uint16_t	offload kind	The producer of the image (e.g. openmp, cuda)
uint32_t	flags	Generic flags for the image
uint64_t	string offset	Absolute offset of the string metadata table
uint64_t	num strings	Number of string entries in the table
uint64_t	image offset	Absolute offset of the device image in bytes
uint64_t	image size	Size of the device image in bytes
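An entry from Table 2 can be decoded the same way, after which the image offset and size locate the raw device image. This is a sketch under assumptions: little-endian fields in table order with no padding (40 bytes), and offsets measured from the start of the binary. `Entry`, `parse_entry`, and `image_bytes` are hypothetical names.

```python
import struct
from typing import NamedTuple

# Assumption: little-endian fields in table order, no padding
# (2 + 2 + 4 + 4 * 8 = 40 bytes per entry).
ENTRY = struct.Struct("<HHIQQQQ")

class Entry(NamedTuple):
    image_kind: int
    offload_kind: int
    flags: int
    string_offset: int
    num_strings: int
    image_offset: int
    image_size: int

def parse_entry(buf: bytes, entry_offset: int) -> Entry:
    """Decode the entry that starts at entry_offset within the binary."""
    return Entry(*ENTRY.unpack_from(buf, entry_offset))

def image_bytes(buf: bytes, entry: Entry) -> bytes:
    """Slice the raw device image that the entry points at."""
    end = entry.image_offset + entry.image_size
    if end > len(buf):
        raise ValueError("image extends past the end of the binary")
    return buf[entry.image_offset:end]

# Synthetic binary: 32 placeholder header bytes, one entry, then a 4-byte image.
payload = b"BC\xc0\xde"
image_at = 32 + ENTRY.size
demo = bytes(32) + ENTRY.pack(0x02, 0x01, 0, image_at, 0, image_at, len(payload)) + payload

entry = parse_entry(demo, 32)
print(entry.image_kind, entry.offload_kind, image_bytes(demo, entry))
```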

The entry table refers to both a string table and the raw device image itself. The string table provides arbitrary key-value metadata.

Table 3 Offloading String Entry
Type	Identifier	Description
uint64_t	key offset	Absolute byte offset of the key in the string table
uint64_t	value offset	Absolute byte offset of the value in the string table

The string table is a collection of null-terminated strings stored in the image. Offsets allow string entries to be interpreted as key-value pairs, enabling flexible metadata such as architecture or target triple.
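Reading the metadata then amounts to following each pair of offsets to a null-terminated string. The sketch below assumes little-endian uint64 offsets, measured from the start of the buffer, and UTF-8 compatible strings; `parse_string_map` and `read_cstring` are hypothetical helpers.

```python
import struct

# One string entry: key offset and value offset, both uint64 (assumed little-endian).
STRING_ENTRY = struct.Struct("<QQ")

def read_cstring(buf: bytes, offset: int) -> str:
    """Read a null-terminated string starting at offset."""
    end = buf.index(b"\0", offset)  # raises ValueError if no terminator is found
    return buf[offset:end].decode("utf-8")

def parse_string_map(buf: bytes, string_offset: int, num_strings: int) -> dict:
    """Build a key-value dict from num_strings entries at string_offset."""
    result = {}
    for i in range(num_strings):
        key_off, value_off = STRING_ENTRY.unpack_from(
            buf, string_offset + i * STRING_ENTRY.size
        )
        result[read_cstring(buf, key_off)] = read_cstring(buf, value_off)
    return result

# Synthetic buffer: two string entries followed by the strings they reference.
strings = [b"triple", b"nvptx64", b"arch", b"sm_70"]
base = 2 * STRING_ENTRY.size
offsets, blob = [], b""
for s in strings:
    offsets.append(base + len(blob))
    blob += s + b"\0"
table = STRING_ENTRY.pack(offsets[0], offsets[1]) + STRING_ENTRY.pack(offsets[2], offsets[3])
demo_strings = table + blob

print(parse_string_map(demo_strings, 0, 2))  # {'triple': 'nvptx64', 'arch': 'sm_70'}
```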

The enumerated values for image kind and offload kind are:

Table 4 Image Kind
Name	Value	Description
IMG_None	0x00	No image information provided
IMG_Object	0x01	The image is a generic object file
IMG_Bitcode	0x02	The image is an LLVM-IR bitcode file
IMG_Cubin	0x03	The image is a CUDA object file
IMG_Fatbinary	0x04	The image is a CUDA fatbinary file
IMG_PTX	0x05	The image is a CUDA PTX file

Table 5 Offload Kind
Name	Value	Description
OFK_None	0x00	No offloading information provided
OFK_OpenMP	0x01	The producer was OpenMP offloading
OFK_CUDA	0x02	The producer was CUDA
OFK_HIP	0x03	The producer was HIP
OFK_SYCL	0x04	The producer was SYCL
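For scripts that inspect these fields, the two tables map naturally onto enumerations. The sketch below mirrors the names and values listed in Tables 4 and 5; the `describe` helper is hypothetical, and it tolerates unlisted values in case a binary carries a kind that these tables do not cover.

```python
from enum import IntEnum

class ImageKind(IntEnum):
    IMG_None = 0x00
    IMG_Object = 0x01
    IMG_Bitcode = 0x02
    IMG_Cubin = 0x03
    IMG_Fatbinary = 0x04
    IMG_PTX = 0x05

class OffloadKind(IntEnum):
    OFK_None = 0x00
    OFK_OpenMP = 0x01
    OFK_CUDA = 0x02
    OFK_HIP = 0x03
    OFK_SYCL = 0x04

def describe(image_kind: int, offload_kind: int) -> str:
    """Return a readable label, falling back to the raw value if unlisted."""
    try:
        img = ImageKind(image_kind).name
    except ValueError:
        img = f"unknown image kind {image_kind:#04x}"
    try:
        ofk = OffloadKind(offload_kind).name
    except ValueError:
        ofk = f"unknown offload kind {offload_kind:#04x}"
    return f"{img} produced by {ofk}"

print(describe(0x02, 0x01))  # IMG_Bitcode produced by OFK_OpenMP
```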

COMMON WORKFLOWS

Workflow 1: Explore Executable Contents

Extract all embedded offload images to see what’s inside:

$ clang++ -fopenmp -fopenmp-targets=nvptx64,spirv64-intel app.cpp -o myapp
$ llvm-offload-binary myapp
# Output:
# Extracted: myapp-nvptx64-nvidia-cuda-sm_70.0.bc
# Extracted: myapp-spirv64-intel-unknown.1.spv

Workflow 2: Extract Specific Target

Extract only images for a specific target:

$ llvm-offload-binary myapp --image=triple=spirv64-intel
# Output:
# Extracted: myapp-spirv64-intel-unknown.0.spv

Workflow 3: Create Device Image Archive

Extract filtered images into a static archive:

$ llvm-offload-binary myapp --image=file=nvptx.a,triple=nvptx64 --archive
$ ar t nvptx.a
# Lists the extracted nvptx64 images stored in the archive

Workflow 4: Validate SPIR-V

Extract and validate SPIR-V binaries:

$ llvm-offload-binary myapp --image=triple=spirv64-intel
$ spirv-val myapp-spirv64-intel-unknown.0.spv
$ spirv-dis myapp-spirv64-intel-unknown.0.spv -o kernel.spvasm

Workflow 5: Bundle Multiple Targets

Create a fat binary from multiple device images:

$ clang++ -fopenmp -fopenmp-targets=nvptx64 --offload-device-only kernel.cpp -o kernel_nvptx.bc
$ clang++ -fopenmp -fopenmp-targets=spirv64-intel --offload-device-only kernel.cpp -o kernel_spirv.bc
$ llvm-offload-binary -o bundle.bin \
    --image=file=kernel_nvptx.bc,triple=nvptx64,arch=sm_70 \
    --image=file=kernel_spirv.bc,triple=spirv64-intel

Workflow 6: Extract and Rebundle

Extract images from one binary and rebundle with modifications:

$ llvm-offload-binary old_app
$ llvm-offload-binary -o new_bundle.bin \
    --image=file=old_app-nvptx64-nvidia-cuda-sm_70.0.bc,triple=nvptx64,arch=sm_70 \
    --image=file=new_kernel.bc,triple=nvptx64,arch=sm_80
