LLVM 23.0.0git
MemorySanitizer.cpp
//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
/// \file
/// This file is a part of MemorySanitizer, a detector of uninitialized
/// reads.
///
/// The algorithm of the tool is similar to Memcheck
/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
/// We associate a few shadow bits with every byte of the application memory,
/// poison the shadow of malloc-ed or alloca-ed memory, load the shadow
/// bits on every memory read, propagate the shadow bits through some of the
/// arithmetic instructions (including MOV), store the shadow bits on every
/// memory write, and report a bug on some other instructions (e.g. JMP) if
/// the associated shadow is poisoned.
///
/// But there are differences too. The first and major one:
/// compiler instrumentation instead of binary instrumentation. This
/// gives us much better register allocation, possible compiler
/// optimizations and a fast start-up. But this brings the major issue
/// as well: msan needs to see all program events, including system
/// calls and reads/writes in system libraries, so we either need to
/// compile *everything* with msan or use a binary translation
/// component (e.g. DynamoRIO) to instrument pre-built libraries.
/// Another difference from Memcheck is that we use 8 shadow bits per
/// byte of application memory and use a direct shadow mapping. This
/// greatly simplifies the instrumentation code and avoids races on
/// shadow updates (Memcheck is single-threaded so races are not a
/// concern there. Memcheck uses 2 shadow bits per byte with a slow
/// path storage that uses 8 bits per byte).
///
/// The default value of shadow is 0, which means "clean" (not poisoned).
///
/// Every module initializer should call __msan_init to ensure that the
/// shadow memory is ready. On error, __msan_warning is called. Since
/// parameters and return values may be passed via registers, we have a
/// specialized thread-local shadow for return values
/// (__msan_retval_tls) and parameters (__msan_param_tls).
///
/// Origin tracking.
///
/// MemorySanitizer can track origins (allocation points) of all uninitialized
/// values. This behavior is controlled with a flag (msan-track-origins) and is
/// disabled by default.
///
/// Origins are 4-byte values created and interpreted by the runtime library.
/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
/// of application memory. Propagation of origins is basically a bunch of
/// "select" instructions that pick the origin of a dirty argument, if an
/// instruction has one.
///
/// Every 4 aligned, consecutive bytes of application memory have one origin
/// value associated with them. If these bytes contain uninitialized data
/// coming from 2 different allocations, the last store wins. Because of this,
/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
/// practice.
///
/// Origins are meaningless for fully initialized values, so MemorySanitizer
/// avoids storing origins to memory when a fully initialized value is stored.
/// This way it avoids needlessly overwriting the origin of the 4-byte region
/// on a short (i.e. 1 byte) clean store, and it is also good for performance.
///
/// Atomic handling.
///
/// Ideally, every atomic store of an application value should update the
/// corresponding shadow location in an atomic way. Unfortunately, an atomic
/// store to two disjoint locations cannot be done without severe slowdown.
///
/// Therefore, we implement an approximation that may err on the safe side.
/// In this implementation, every atomically accessed location in the program
/// may only change from (partially) uninitialized to fully initialized, but
/// not the other way around. We load the shadow _after_ the application load,
/// and we store the shadow _before_ the app store. Also, we always store clean
/// shadow (if the application store is atomic). This way, if the store-load
/// pair constitutes a happens-before arc, the shadow store and load are
/// correctly ordered such that the load will get either the value that was
/// stored, or some later value (which is always clean).
///
/// This does not work very well with Compare-And-Swap (CAS) and
/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
/// must store the new shadow before the app operation, and load the shadow
/// after the app operation. Computers don't work this way. The current
/// implementation ignores the load aspect of CAS/RMW, always returning a clean
/// value. It implements the store part as a simple atomic store of clean
/// shadow.
///
/// Instrumenting inline assembly.
///
/// For inline assembly code LLVM has little idea about which memory locations
/// become initialized depending on the arguments. It may be possible to figure
/// out which arguments are meant to point to inputs and outputs, but the
/// actual semantics can only be visible at runtime. In the Linux kernel it's
/// also possible that the arguments only indicate the offset for a base taken
/// from a segment register, so it's dangerous to treat any asm() arguments as
/// pointers. We take a conservative approach, generating calls to
///   __msan_instrument_asm_store(ptr, size)
/// which defer the memory unpoisoning to the runtime library.
/// The latter can perform more complex address checks to figure out whether
/// it's safe to touch the shadow memory.
/// Like with atomic operations, we call __msan_instrument_asm_store() before
/// the assembly call, so that changes to the shadow memory will be seen by
/// other threads together with main memory initialization.
///
/// KernelMemorySanitizer (KMSAN) implementation.
///
/// The major differences between KMSAN and MSan instrumentation are:
///  - KMSAN always tracks the origins and implies msan-keep-going=true;
///  - KMSAN allocates shadow and origin memory for each page separately, so
///    there are no explicit accesses to shadow and origin in the
///    instrumentation.
///    Shadow and origin values for a particular X-byte memory location
///    (X=1,2,4,8) are accessed through pointers obtained via the
///      __msan_metadata_ptr_for_load_X(ptr)
///      __msan_metadata_ptr_for_store_X(ptr)
///    functions. The corresponding functions check that the X-byte accesses
///    are possible and return the pointers to shadow and origin memory.
///    Arbitrary sized accesses are handled with:
///      __msan_metadata_ptr_for_load_n(ptr, size)
///      __msan_metadata_ptr_for_store_n(ptr, size)
///    Note that the sanitizer code has to deal with how shadow/origin pairs
///    returned by these functions are represented in different ABIs. In
///    the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
///    returned in r3 and r4, and in the SystemZ ABI they are written to memory
///    pointed to by a hidden parameter.
///  - TLS variables are stored in a single per-task struct. A call to a
///    function __msan_get_context_state() returning a pointer to that struct
///    is inserted into every instrumented function before the entry block;
///  - __msan_warning() takes a 32-bit origin parameter;
///  - local variables are poisoned with __msan_poison_alloca() upon function
///    entry and unpoisoned with __msan_unpoison_alloca() before leaving the
///    function;
///  - the pass doesn't declare any global variables or add global constructors
///    to the translation unit.
///
/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
/// calls, making sure we're on the safe side wrt. possible false positives.
///
/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
/// moment.
///
//
// FIXME: This sanitizer does not yet handle scalable vectors
//
//===----------------------------------------------------------------------===//

#include "llvm/Transforms/Instrumentation/MemorySanitizer.h"
#include "llvm/ADT/APInt.h"
#include "llvm/ADT/ArrayRef.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/SetVector.h"
#include "llvm/ADT/SmallPtrSet.h"
#include "llvm/ADT/SmallVector.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Attributes.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CallingConv.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/GlobalValue.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/InstVisitor.h"
#include "llvm/IR/InstrTypes.h"
#include "llvm/IR/Instruction.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/Intrinsics.h"
#include "llvm/IR/IntrinsicsAArch64.h"
#include "llvm/IR/IntrinsicsX86.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/Type.h"
#include "llvm/IR/Value.h"
#include "llvm/IR/ValueMap.h"
#include "llvm/Support/Casting.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/DebugCounter.h"
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <numeric>
#include <string>
#include <tuple>

using namespace llvm;

#define DEBUG_TYPE "msan"

DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
              "Controls which checks to insert");

DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
              "Controls which instruction to instrument");

static const unsigned kOriginSize = 4;

// These constants must be kept in sync with the ones in msan.h.
// TODO: increase size to match SVE/SVE2/SME/SME2 limits
static const unsigned kParamTLSSize = 800;
static const unsigned kRetvalTLSSize = 800;

// Access sizes are powers of two: 1, 2, 4, 8.
static const size_t kNumberOfAccessSizes = 4;

/// Track origins of uninitialized values.
///
/// Adds a section to MemorySanitizer report that points to the allocation
/// (stack or heap) the uninitialized bits came from originally.
static cl::opt<int> ClTrackOrigins(
    "msan-track-origins",
    cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
    cl::init(0));

static cl::opt<bool> ClKeepGoing("msan-keep-going",
                                 cl::desc("keep going after reporting a UMR"),
                                 cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClPoisonStack("msan-poison-stack",
                  cl::desc("poison uninitialized stack variables"), cl::Hidden,
                  cl::init(true));

static cl::opt<bool> ClPoisonStackWithCall(
    "msan-poison-stack-with-call",
    cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
    cl::init(false));

static cl::opt<int> ClPoisonStackPattern(
    "msan-poison-stack-pattern",
    cl::desc("poison uninitialized stack variables with the given pattern"),
    cl::Hidden, cl::init(0xff));

static cl::opt<bool>
    ClPrintStackNames("msan-print-stack-names",
                      cl::desc("Print name of local stack variable"),
                      cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClPoisonUndef("msan-poison-undef",
                  cl::desc("Poison fully undef temporary values. "
                           "Partially undefined constant vectors "
                           "are unaffected by this flag (see "
                           "-msan-poison-undef-vectors)."),
                  cl::Hidden, cl::init(true));

static cl::opt<bool> ClPoisonUndefVectors(
    "msan-poison-undef-vectors",
    cl::desc("Precisely poison partially undefined constant vectors. "
             "If false (legacy behavior), the entire vector is "
             "considered fully initialized, which may lead to false "
             "negatives. Fully undefined constant vectors are "
             "unaffected by this flag (see -msan-poison-undef)."),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClPreciseDisjointOr(
    "msan-precise-disjoint-or",
    cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
             "disjointedness is ignored (i.e., 1|1 is initialized)."),
    cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClHandleICmp("msan-handle-icmp",
                 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
                 cl::Hidden, cl::init(true));

static cl::opt<bool>
    ClHandleICmpExact("msan-handle-icmp-exact",
                      cl::desc("exact handling of relational integer ICmp"),
                      cl::Hidden, cl::init(true));

static cl::opt<unsigned> ClSwitchPrecision(
    "msan-switch-precision",
    cl::desc("Controls the number of cases considered by MSan for LLVM switch "
             "instructions. 0 means no UUMs detected. Higher values lead to "
             "fewer false negatives but may impact compiler and/or "
             "application performance. N.B. LLVM switch instructions do not "
             "correspond exactly to C++ switch statements."),
    cl::Hidden, cl::init(99));

static cl::opt<bool> ClHandleLifetimeIntrinsics(
    "msan-handle-lifetime-intrinsics",
    cl::desc(
        "when possible, poison scoped variables at the beginning of the scope "
        "(slower, but more precise)"),
    cl::Hidden, cl::init(true));

// When compiling the Linux kernel, we sometimes see false positives related to
// MSan being unable to understand that inline assembly calls may initialize
// local variables.
// This flag makes the compiler conservatively unpoison every memory location
// passed into an assembly call. Note that this may cause false positives.
// Because it's impossible to figure out the array sizes, we can only unpoison
// the first sizeof(type) bytes for each type* pointer.
static cl::opt<bool> ClHandleAsmConservative(
    "msan-handle-asm-conservative",
    cl::desc("conservative handling of inline assembly"), cl::Hidden,
    cl::init(true));

// This flag controls whether we check the shadow of the address
// operand of load or store. Such bugs are very rare, since a load from
// a garbage address typically results in SEGV, but they still happen
// (e.g. only the lower bits of the address are garbage, or the access
// happens early at program startup where malloc-ed memory is more likely
// to be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
static cl::opt<bool> ClCheckAccessAddress(
    "msan-check-access-address",
    cl::desc("report accesses through a pointer which has poisoned shadow"),
    cl::Hidden, cl::init(true));

static cl::opt<bool> ClEagerChecks(
    "msan-eager-checks",
    cl::desc("check arguments and return values at function call boundaries"),
    cl::Hidden, cl::init(false));

static cl::opt<bool> ClDumpStrictInstructions(
    "msan-dump-strict-instructions",
    cl::desc("print out instructions with default strict semantics, i.e., "
             "check that all the inputs are fully initialized, and mark "
             "the output as fully initialized. These semantics are applied "
             "to instructions that could not be handled explicitly nor "
             "heuristically."),
    cl::Hidden, cl::init(false));

// Currently, all the heuristically handled instructions are specifically
// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
// to parallel 'msan-dump-strict-instructions', and to keep the door open to
// handling non-intrinsic instructions heuristically.
static cl::opt<bool> ClDumpHeuristicInstructions(
    "msan-dump-heuristic-instructions",
    cl::desc("Prints 'unknown' instructions that were handled heuristically. "
             "Use -msan-dump-strict-instructions to print instructions that "
             "could not be handled explicitly nor heuristically."),
    cl::Hidden, cl::init(false));

static cl::opt<int> ClInstrumentationWithCallThreshold(
    "msan-instrumentation-with-call-threshold",
    cl::desc(
        "If the function being instrumented requires more than "
        "this number of checks and origin stores, use callbacks instead of "
        "inline checks (-1 means never use callbacks)."),
    cl::Hidden, cl::init(3500));

static cl::opt<bool>
    ClEnableKmsan("msan-kernel",
                  cl::desc("Enable KernelMemorySanitizer instrumentation"),
                  cl::Hidden, cl::init(false));

static cl::opt<bool>
    ClDisableChecks("msan-disable-checks",
                    cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
                    cl::init(false));

static cl::opt<bool>
    ClCheckConstantShadow("msan-check-constant-shadow",
                          cl::desc("Insert checks for constant shadow values"),
                          cl::Hidden, cl::init(true));

// This is off by default because of a bug in gold:
// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
static cl::opt<bool>
    ClWithComdat("msan-with-comdat",
                 cl::desc("Place MSan constructors in comdat sections"),
                 cl::Hidden, cl::init(false));

// These options allow specifying custom memory map parameters.
// See MemoryMapParams for details.
static cl::opt<uint64_t> ClAndMask("msan-and-mask",
                                   cl::desc("Define custom MSan AndMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
                                   cl::desc("Define custom MSan XorMask"),
                                   cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
                                      cl::desc("Define custom MSan ShadowBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
                                      cl::desc("Define custom MSan OriginBase"),
                                      cl::Hidden, cl::init(0));

static cl::opt<int>
    ClDisambiguateWarning("msan-disambiguate-warning-threshold",
                          cl::desc("Define threshold for number of checks per "
                                   "debug location to force origin update."),
                          cl::Hidden, cl::init(3));

const char kMsanModuleCtorName[] = "msan.module_ctor";
const char kMsanInitName[] = "__msan_init";

namespace {

// Memory map parameters used in application-to-shadow address calculation.
// Offset = (Addr & ~AndMask) ^ XorMask
// Shadow = ShadowBase + Offset
// Origin = OriginBase + Offset
struct MemoryMapParams {
  uint64_t AndMask;
  uint64_t XorMask;
  uint64_t ShadowBase;
  uint64_t OriginBase;
};

struct PlatformMemoryMapParams {
  const MemoryMapParams *bits32;
  const MemoryMapParams *bits64;
};

} // end anonymous namespace

// i386 Linux
static const MemoryMapParams Linux_I386_MemoryMapParams = {
    0x000080000000, // AndMask
    0,              // XorMask (not used)
    0,              // ShadowBase (not used)
    0x000040000000, // OriginBase
};

// x86_64 Linux
static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x500000000000, // XorMask
    0,              // ShadowBase (not used)
    0x100000000000, // OriginBase
};

// mips32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// mips64 Linux
static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x008000000000, // XorMask
    0,              // ShadowBase (not used)
    0x002000000000, // OriginBase
};

// ppc32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// ppc64 Linux
static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
    0xE00000000000, // AndMask
    0x100000000000, // XorMask
    0x080000000000, // ShadowBase
    0x1C0000000000, // OriginBase
};

// s390x Linux
static const MemoryMapParams Linux_S390X_MemoryMapParams = {
    0xC00000000000, // AndMask
    0,              // XorMask (not used)
    0x080000000000, // ShadowBase
    0x1C0000000000, // OriginBase
};

// arm32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// aarch64 Linux
static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
    0,               // AndMask (not used)
    0x0B00000000000, // XorMask
    0,               // ShadowBase (not used)
    0x0200000000000, // OriginBase
};

// loongarch64 Linux
static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
    0,              // AndMask (not used)
    0x500000000000, // XorMask
    0,              // ShadowBase (not used)
    0x100000000000, // OriginBase
};

// hexagon Linux
static const MemoryMapParams Linux_Hexagon_MemoryMapParams = {
    0,          // AndMask (not used)
    0x20000000, // XorMask
    0,          // ShadowBase (not used)
    0x50000000, // OriginBase
};

// riscv32 Linux
// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
// after picking good constants

// aarch64 FreeBSD
static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
    0x1800000000000, // AndMask
    0x0400000000000, // XorMask
    0x0200000000000, // ShadowBase
    0x0700000000000, // OriginBase
};

// i386 FreeBSD
static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
    0x000180000000, // AndMask
    0x000040000000, // XorMask
    0x000020000000, // ShadowBase
    0x000700000000, // OriginBase
};

// x86_64 FreeBSD
static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
    0xc00000000000, // AndMask
    0x200000000000, // XorMask
    0x100000000000, // ShadowBase
    0x380000000000, // OriginBase
};

// x86_64 NetBSD
static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
    0,              // AndMask
    0x500000000000, // XorMask
    0,              // ShadowBase
    0x100000000000, // OriginBase
};

static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
    &Linux_I386_MemoryMapParams,
    &Linux_X86_64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
    nullptr,
    &Linux_MIPS64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
    nullptr,
    &Linux_PowerPC64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
    nullptr,
    &Linux_S390X_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
    nullptr,
    &Linux_AArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
    nullptr,
    &Linux_LoongArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams Linux_Hexagon_MemoryMapParams_P = {
    &Linux_Hexagon_MemoryMapParams,
    nullptr,
};

static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
    nullptr,
    &FreeBSD_AArch64_MemoryMapParams,
};

static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
    &FreeBSD_I386_MemoryMapParams,
    &FreeBSD_X86_64_MemoryMapParams,
};

static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
    nullptr,
    &NetBSD_X86_64_MemoryMapParams,
};

namespace {

/// Instrument functions of a module to detect uninitialized reads.
///
/// Instantiating MemorySanitizer inserts the msan runtime library API function
/// declarations into the module if they don't exist already. Instantiating
/// ensures the __msan_init function is in the list of global constructors for
/// the module.
class MemorySanitizer {
public:
  MemorySanitizer(Module &M, MemorySanitizerOptions Options)
      : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
        Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
    initializeModule(M);
  }

  // MSan cannot be moved or copied because of MapParams.
  MemorySanitizer(MemorySanitizer &&) = delete;
  MemorySanitizer &operator=(MemorySanitizer &&) = delete;
  MemorySanitizer(const MemorySanitizer &) = delete;
  MemorySanitizer &operator=(const MemorySanitizer &) = delete;

  bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);

private:
  friend struct MemorySanitizerVisitor;
  friend struct VarArgHelperBase;
  friend struct VarArgAMD64Helper;
  friend struct VarArgAArch64Helper;
  friend struct VarArgPowerPC64Helper;
  friend struct VarArgPowerPC32Helper;
  friend struct VarArgSystemZHelper;
  friend struct VarArgI386Helper;
  friend struct VarArgGenericHelper;

  void initializeModule(Module &M);
  void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
  void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
  void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);

  template <typename... ArgsTy>
  FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args);

  /// True if we're compiling the Linux kernel.
  bool CompileKernel;
  /// Track origins (allocation points) of uninitialized values.
  int TrackOrigins;
  bool Recover;
  bool EagerChecks;

  Triple TargetTriple;
  LLVMContext *C;
  Type *IntptrTy; ///< Integer type with the size of a ptr in the default AS.
  Type *OriginTy;
  PointerType *PtrTy; ///< Pointer type in the default address space.

  // XxxTLS variables represent the per-thread state in MSan and per-task state
  // in KMSAN.
  // For the userspace these point to thread-local globals. In the kernel land
  // they point to the members of a per-task struct obtained via a call to
  // __msan_get_context_state().

  /// Thread-local shadow storage for function parameters.
  Value *ParamTLS;

  /// Thread-local origin storage for function parameters.
  Value *ParamOriginTLS;

  /// Thread-local shadow storage for function return value.
  Value *RetvalTLS;

  /// Thread-local origin storage for function return value.
  Value *RetvalOriginTLS;

  /// Thread-local shadow storage for in-register va_arg.
  Value *VAArgTLS;

  /// Thread-local origin storage for in-register va_arg.
  Value *VAArgOriginTLS;

  /// Thread-local storage for the size of the va_arg overflow area.
  Value *VAArgOverflowSizeTLS;

  /// Are the instrumentation callbacks set up?
  bool CallbacksInitialized = false;

  /// The run-time callback to print a warning.
  FunctionCallee WarningFn;

  // These arrays are indexed by log2(AccessSize).
  FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
  FunctionCallee MaybeWarningVarSizeFn;
  FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];

  /// Run-time helper that generates a new origin value for a stack
  /// allocation.
  FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
  // No-description version.
  FunctionCallee MsanSetAllocaOriginNoDescriptionFn;

  /// Run-time helper that poisons stack on function entry.
  FunctionCallee MsanPoisonStackFn;

  /// Run-time helper that records a store (or any event) of an
  /// uninitialized value and returns an updated origin id encoding this info.
  FunctionCallee MsanChainOriginFn;

  /// Run-time helper that paints an origin over a region.
  FunctionCallee MsanSetOriginFn;

  /// MSan runtime replacements for memmove, memcpy and memset.
  FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;

  /// KMSAN callback for task-local function argument shadow.
  StructType *MsanContextStateTy;
  FunctionCallee MsanGetContextStateFn;

  /// Functions for poisoning/unpoisoning local variables.
  FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;

  /// Pair of shadow/origin pointers.
  Type *MsanMetadata;

  /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
  FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
  FunctionCallee MsanMetadataPtrForLoad_1_8[4];
  FunctionCallee MsanMetadataPtrForStore_1_8[4];
  FunctionCallee MsanInstrumentAsmStoreFn;

  /// Storage for return values of the MsanMetadataPtrXxx functions.
  Value *MsanMetadataAlloca;

  /// Helper to choose between different MsanMetadataPtrXxx().
  FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);

  /// Memory map parameters used in application-to-shadow calculation.
  const MemoryMapParams *MapParams;

  /// Custom memory map parameters used when -msan-shadow-base or
  /// -msan-origin-base is provided.
  MemoryMapParams CustomMapParams;

  MDNode *ColdCallWeights;

  /// Branch weights for origin store.
  MDNode *OriginStoreWeights;
};

void insertModuleCtor(Module &M) {
  getOrCreateSanitizerCtorAndInitFunctions(
      M, kMsanModuleCtorName, kMsanInitName,
      /*InitArgTypes=*/{},
      /*InitArgs=*/{},
      // This callback is invoked when the functions are created the first
      // time. Hook them into the global ctors list in that case:
      [&](Function *Ctor, FunctionCallee) {
        if (!ClWithComdat) {
          appendToGlobalCtors(M, Ctor, 0);
          return;
        }
        Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
        Ctor->setComdat(MsanCtorComdat);
        appendToGlobalCtors(M, Ctor, 0, Ctor);
      });
}

template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
  return (Opt.getNumOccurrences() > 0) ? Opt : Default;
}

} // end anonymous namespace

MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
                                               bool EagerChecks)
    : Kernel(getOptOrDefault(ClEnableKmsan, K)),
      TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
      Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
      EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}

PreservedAnalyses MemorySanitizerPass::run(Module &M,
                                           ModuleAnalysisManager &AM) {
  // Return early if nosanitize_memory module flag is present for the module.
  if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
    return PreservedAnalyses::all();
  bool Modified = false;
  if (!Options.Kernel) {
    insertModuleCtor(M);
    Modified = true;
  }

  auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
  for (Function &F : M) {
    if (F.empty())
      continue;
    MemorySanitizer Msan(*F.getParent(), Options);
    Modified |=
        Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
  }

  if (!Modified)
    return PreservedAnalyses::all();

  PreservedAnalyses PA = PreservedAnalyses::none();
  // GlobalsAA is considered stateless and does not get invalidated unless
  // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
  // make changes that require GlobalsAA to be invalidated.
  PA.abandon<GlobalsAA>();
  return PA;
}

void MemorySanitizerPass::printPipeline(
    raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
  static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
      OS, MapClassName2PassName);
  OS << '<';
  if (Options.Recover)
    OS << "recover;";
  if (Options.Kernel)
    OS << "kernel;";
  if (Options.EagerChecks)
    OS << "eager-checks;";
  OS << "track-origins=" << Options.TrackOrigins;
  OS << '>';
}

/// Create a global initialized with the given string.
///
/// Creates a global for Str so that we can pass it to the
/// run-time lib. Runtime uses first 4 bytes of the string to store the
/// frame ID, so the string needs to be mutable.
static GlobalVariable *createPrivateConstGlobalForString(Module &M,
                                                         StringRef Str) {
  Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
  return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
                            GlobalValue::PrivateLinkage, StrConst, "");
}

template <typename... ArgsTy>
FunctionCallee
MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
                                                 ArgsTy... Args) {
  if (TargetTriple.getArch() == Triple::systemz) {
    // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
    return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
                                 std::forward<ArgsTy>(Args)...);
  }

  return M.getOrInsertFunction(Name, MsanMetadata,
                               std::forward<ArgsTy>(Args)...);
}

/// Create KMSAN API callbacks.
void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
  IRBuilder<> IRB(*C);

  // These will be initialized in insertKmsanPrologue().
  RetvalTLS = nullptr;
  RetvalOriginTLS = nullptr;
  ParamTLS = nullptr;
  ParamOriginTLS = nullptr;
  VAArgTLS = nullptr;
  VAArgOriginTLS = nullptr;
  VAArgOverflowSizeTLS = nullptr;

  WarningFn = M.getOrInsertFunction("__msan_warning",
                                    TLI.getAttrList(C, {0}, /*Signed=*/false),
                                    IRB.getVoidTy(), IRB.getInt32Ty());

  // Requests the per-task context state (kmsan_context_state*) from the
  // runtime library.
  MsanContextStateTy = StructType::get(
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
      ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
      IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
      OriginTy);
  MsanGetContextStateFn =
      M.getOrInsertFunction("__msan_get_context_state", PtrTy);

  MsanMetadata = StructType::get(PtrTy, PtrTy);

  for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
    std::string name_load =
        "__msan_metadata_ptr_for_load_" + std::to_string(size);
    std::string name_store =
        "__msan_metadata_ptr_for_store_" + std::to_string(size);
    MsanMetadataPtrForLoad_1_8[ind] =
        getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
    MsanMetadataPtrForStore_1_8[ind] =
        getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
  }

  MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
      M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
  MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
      M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);

  // Functions for poisoning and unpoisoning memory.
  MsanPoisonAllocaFn = M.getOrInsertFunction(
      "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
  MsanUnpoisonAllocaFn = M.getOrInsertFunction(
      "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
}
912
913 static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
914 return M.getOrInsertGlobal(Name, Ty, [&] {
915 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
916 nullptr, Name, nullptr,
917                             GlobalVariable::InitialExecTLSModel);
918 });
919}
920
921/// Insert declarations for userspace-specific functions and globals.
922void MemorySanitizer::createUserspaceApi(Module &M,
923 const TargetLibraryInfo &TLI) {
924 IRBuilder<> IRB(*C);
925
926 // Create the callback.
927 // FIXME: this function should have "Cold" calling conv,
928 // which is not yet implemented.
929 if (TrackOrigins) {
930 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
931 : "__msan_warning_with_origin_noreturn";
932 WarningFn = M.getOrInsertFunction(WarningFnName,
933 TLI.getAttrList(C, {0}, /*Signed=*/false),
934 IRB.getVoidTy(), IRB.getInt32Ty());
935 } else {
936 StringRef WarningFnName =
937 Recover ? "__msan_warning" : "__msan_warning_noreturn";
938 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
939 }
940
941 // Create the global TLS variables.
942 RetvalTLS =
943 getOrInsertGlobal(M, "__msan_retval_tls",
944 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
945
946 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
947
948 ParamTLS =
949 getOrInsertGlobal(M, "__msan_param_tls",
950 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
951
952 ParamOriginTLS =
953 getOrInsertGlobal(M, "__msan_param_origin_tls",
954 ArrayType::get(OriginTy, kParamTLSSize / 4));
955
956 VAArgTLS =
957 getOrInsertGlobal(M, "__msan_va_arg_tls",
958 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
959
960 VAArgOriginTLS =
961 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
962 ArrayType::get(OriginTy, kParamTLSSize / 4));
963
964 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
965 IRB.getIntPtrTy(M.getDataLayout()));
966
967 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
968 AccessSizeIndex++) {
969 unsigned AccessSize = 1 << AccessSizeIndex;
970 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
971 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
972 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
973 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
974 MaybeWarningVarSizeFn = M.getOrInsertFunction(
975 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
976 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
977 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
978 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
979 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
980 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
981 IRB.getInt32Ty());
982 }
983
984 MsanSetAllocaOriginWithDescriptionFn =
985 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
986 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
987 MsanSetAllocaOriginNoDescriptionFn =
988 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
989 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
990 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
991 IRB.getVoidTy(), PtrTy, IntptrTy);
992}
993
994/// Insert extern declaration of runtime-provided functions and globals.
995void MemorySanitizer::initializeCallbacks(Module &M,
996 const TargetLibraryInfo &TLI) {
997 // Only do this once.
998 if (CallbacksInitialized)
999 return;
1000
1001 IRBuilder<> IRB(*C);
1002 // Initialize callbacks that are common for kernel and userspace
1003 // instrumentation.
1004 MsanChainOriginFn = M.getOrInsertFunction(
1005 "__msan_chain_origin",
1006 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
1007 IRB.getInt32Ty());
1008 MsanSetOriginFn = M.getOrInsertFunction(
1009 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
1010 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
1011 MemmoveFn =
1012 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
1013 MemcpyFn =
1014 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
1015 MemsetFn = M.getOrInsertFunction("__msan_memset",
1016 TLI.getAttrList(C, {1}, /*Signed=*/true),
1017 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
1018
1019 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
1020 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
1021
1022 if (CompileKernel) {
1023 createKernelApi(M, TLI);
1024 } else {
1025 createUserspaceApi(M, TLI);
1026 }
1027 CallbacksInitialized = true;
1028}
1029
1030FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1031 int size) {
1032 FunctionCallee *Fns =
1033 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1034 switch (size) {
1035 case 1:
1036 return Fns[0];
1037 case 2:
1038 return Fns[1];
1039 case 4:
1040 return Fns[2];
1041 case 8:
1042 return Fns[3];
1043 default:
1044 return nullptr;
1045 }
1046}
1047
1048/// Module-level initialization.
1049///
1050 /// Inserts a call to __msan_init into the module's constructor list.
1051void MemorySanitizer::initializeModule(Module &M) {
1052 auto &DL = M.getDataLayout();
1053
1054 TargetTriple = M.getTargetTriple();
1055
1056 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1057 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1058 // Check the overrides first
1059 if (ShadowPassed || OriginPassed) {
1060 CustomMapParams.AndMask = ClAndMask;
1061 CustomMapParams.XorMask = ClXorMask;
1062 CustomMapParams.ShadowBase = ClShadowBase;
1063 CustomMapParams.OriginBase = ClOriginBase;
1064 MapParams = &CustomMapParams;
1065 } else {
1066 switch (TargetTriple.getOS()) {
1067 case Triple::FreeBSD:
1068 switch (TargetTriple.getArch()) {
1069 case Triple::aarch64:
1070 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1071 break;
1072 case Triple::x86_64:
1073 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1074 break;
1075 case Triple::x86:
1076 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1077 break;
1078 default:
1079 report_fatal_error("unsupported architecture");
1080 }
1081 break;
1082 case Triple::NetBSD:
1083 switch (TargetTriple.getArch()) {
1084 case Triple::x86_64:
1085 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1086 break;
1087 default:
1088 report_fatal_error("unsupported architecture");
1089 }
1090 break;
1091 case Triple::Linux:
1092 switch (TargetTriple.getArch()) {
1093 case Triple::x86_64:
1094 MapParams = Linux_X86_MemoryMapParams.bits64;
1095 break;
1096 case Triple::x86:
1097 MapParams = Linux_X86_MemoryMapParams.bits32;
1098 break;
1099 case Triple::mips64:
1100 case Triple::mips64el:
1101 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1102 break;
1103 case Triple::ppc64:
1104 case Triple::ppc64le:
1105 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1106 break;
1107 case Triple::systemz:
1108 MapParams = Linux_S390_MemoryMapParams.bits64;
1109 break;
1110 case Triple::aarch64:
1111 case Triple::aarch64_be:
1112 MapParams = Linux_ARM_MemoryMapParams.bits64;
1113 break;
1114    case Triple::loongarch64:
1115 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1116 break;
1117 case Triple::hexagon:
1118 MapParams = Linux_Hexagon_MemoryMapParams_P.bits32;
1119 break;
1120 default:
1121 report_fatal_error("unsupported architecture");
1122 }
1123 break;
1124 default:
1125 report_fatal_error("unsupported operating system");
1126 }
1127 }
1128
1129 C = &(M.getContext());
1130 IRBuilder<> IRB(*C);
1131 IntptrTy = IRB.getIntPtrTy(DL);
1132 OriginTy = IRB.getInt32Ty();
1133 PtrTy = IRB.getPtrTy();
1134
1135 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1136 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1137
1138 if (!CompileKernel) {
1139 if (TrackOrigins)
1140 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1141 return new GlobalVariable(
1142 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1143 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1144 });
1145
1146 if (Recover)
1147 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1148 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1149 GlobalValue::WeakODRLinkage,
1150 IRB.getInt32(Recover), "__msan_keep_going");
1151 });
1152 }
1153}
1154
1155namespace {
1156
1157/// A helper class that handles instrumentation of VarArg
1158/// functions on a particular platform.
1159///
1160/// Implementations are expected to insert the instrumentation
1161/// necessary to propagate argument shadow through VarArg function
1162/// calls. Visit* methods are called during an InstVisitor pass over
1163/// the function, and should avoid creating new basic blocks. A new
1164/// instance of this class is created for each instrumented function.
1165struct VarArgHelper {
1166 virtual ~VarArgHelper() = default;
1167
1168 /// Visit a CallBase.
1169 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1170
1171 /// Visit a va_start call.
1172 virtual void visitVAStartInst(VAStartInst &I) = 0;
1173
1174 /// Visit a va_copy call.
1175 virtual void visitVACopyInst(VACopyInst &I) = 0;
1176
1177 /// Finalize function instrumentation.
1178 ///
1179 /// This method is called after visiting all interesting (see above)
1180 /// instructions in a function.
1181 virtual void finalizeInstrumentation() = 0;
1182};
1183
1184struct MemorySanitizerVisitor;
1185
1186} // end anonymous namespace
1187
1188static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1189 MemorySanitizerVisitor &Visitor);
1190
1191static unsigned TypeSizeToSizeIndex(TypeSize TS) {
1192 if (TS.isScalable())
1193 // Scalable types unconditionally take slowpaths.
1194 return kNumberOfAccessSizes;
1195 unsigned TypeSizeFixed = TS.getFixedValue();
1196 if (TypeSizeFixed <= 8)
1197 return 0;
1198 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1199}
1200
1201namespace {
1202
1203/// Helper class to attach debug information of the given instruction onto new
1204/// instructions inserted after.
1205class NextNodeIRBuilder : public IRBuilder<> {
1206public:
1207 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1208 SetCurrentDebugLocation(IP->getDebugLoc());
1209 }
1210};
1211
1212/// This class does all the work for a given function. Store and Load
1213/// instructions store and load corresponding shadow and origin
1214/// values. Most instructions propagate shadow from arguments to their
1215/// return values. Certain instructions (most importantly, BranchInst)
1216/// test their argument shadow and print reports (with a runtime call) if it's
1217/// non-zero.
1218struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1219 Function &F;
1220 MemorySanitizer &MS;
1221 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1222 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1223 std::unique_ptr<VarArgHelper> VAHelper;
1224 const TargetLibraryInfo *TLI;
1225 Instruction *FnPrologueEnd;
1226 SmallVector<Instruction *, 16> Instructions;
1227
1228 // The following flags disable parts of MSan instrumentation based on
1229 // exclusion list contents and command-line options.
1230 bool InsertChecks;
1231 bool PropagateShadow;
1232 bool PoisonStack;
1233 bool PoisonUndef;
1234 bool PoisonUndefVectors;
1235
1236 struct ShadowOriginAndInsertPoint {
1237 Value *Shadow;
1238 Value *Origin;
1239 Instruction *OrigIns;
1240
1241 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1242 : Shadow(S), Origin(O), OrigIns(I) {}
1243 };
1244 SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1245 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1246 SmallSetVector<AllocaInst *, 16> AllocaSet;
1247 SmallVector<std::pair<IntrinsicInst *, AllocaInst *>, 16> LifetimeStartList;
1248 SmallVector<StoreInst *, 16> StoreList;
1249 int64_t SplittableBlocksCount = 0;
1250
1251 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1252 const TargetLibraryInfo &TLI)
1253 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1254 bool SanitizeFunction =
1255 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1256 InsertChecks = SanitizeFunction;
1257 PropagateShadow = SanitizeFunction;
1258 PoisonStack = SanitizeFunction && ClPoisonStack;
1259 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1260 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1261
1262 // In the presence of unreachable blocks, we may see Phi nodes with
1263 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1264 // blocks, such nodes will not have any shadow value associated with them.
1265 // It's easier to remove unreachable blocks than deal with missing shadow.
1266 removeUnreachableBlocks(F);
1267
1268 MS.initializeCallbacks(*F.getParent(), TLI);
1269 FnPrologueEnd =
1270 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1271 .CreateIntrinsic(Intrinsic::donothing, {});
1272
1273 if (MS.CompileKernel) {
1274 IRBuilder<> IRB(FnPrologueEnd);
1275 insertKmsanPrologue(IRB);
1276 }
1277
1278 LLVM_DEBUG(if (!InsertChecks) dbgs()
1279 << "MemorySanitizer is not inserting checks into '"
1280 << F.getName() << "'\n");
1281 }
1282
1283 bool instrumentWithCalls(Value *V) {
1284 // Constants likely will be eliminated by follow-up passes.
1285 if (isa<Constant>(V))
1286 return false;
1287 ++SplittableBlocksCount;
1288 return ClInstrumentationWithCallThreshold >= 0 &&
1289 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1290 }
1291
1292 bool isInPrologue(Instruction &I) {
1293 return I.getParent() == FnPrologueEnd->getParent() &&
1294 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1295 }
1296
1297 // Creates a new origin and records the stack trace. In general we can call
1298 // this function for any origin manipulation we like; however, it costs
1299 // runtime resources, so use it only where it can provide additional
1300 // information helpful to the user.
1301 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1302 if (MS.TrackOrigins <= 1)
1303 return V;
1304 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1305 }
1306
1307 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
1308 const DataLayout &DL = F.getDataLayout();
1309 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1310 if (IntptrSize == kOriginSize)
1311 return Origin;
1312 assert(IntptrSize == kOriginSize * 2);
1313 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1314 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1315 }
1316
1317 /// Fill memory range with the given origin value.
1318 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1319 TypeSize TS, Align Alignment) {
1320 const DataLayout &DL = F.getDataLayout();
1321 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1322 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1323 assert(IntptrAlignment >= kMinOriginAlignment);
1324 assert(IntptrSize >= kOriginSize);
1325
1326 // Note: The loop-based form works for fixed-length vectors too;
1327 // however, we prefer to unroll and specialize alignment below.
1328 if (TS.isScalable()) {
1329 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1330 Value *RoundUp =
1331 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1332 Value *End =
1333 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1334 auto [InsertPt, Index] =
1335 SplitBlockAndInsertSimpleForLoop(End, &*IRB.GetInsertPoint());
1336 IRB.SetInsertPoint(InsertPt);
1337
1338 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1339 IRB.CreateAlignedStore(Origin, GEP, kMinOriginAlignment);
1340 return;
1341 }
1342
1343 unsigned Size = TS.getFixedValue();
1344
1345 unsigned Ofs = 0;
1346 Align CurrentAlignment = Alignment;
1347 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1348 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1349 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1350 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1351 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1352 : IntptrOriginPtr;
1353 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1354 Ofs += IntptrSize / kOriginSize;
1355 CurrentAlignment = IntptrAlignment;
1356 }
1357 }
1358
1359 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1360 Value *GEP =
1361 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1362 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1363 CurrentAlignment = kMinOriginAlignment;
1364 }
1365 }
1366
1367 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1368 Value *OriginPtr, Align Alignment) {
1369 const DataLayout &DL = F.getDataLayout();
1370 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1371 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1372 // ZExt cannot convert between vector and scalar
1373 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1374 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1375 if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
1376 // Origin is not needed: value is initialized or const shadow is
1377 // ignored.
1378 return;
1379 }
1380 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1381 // Copy origin as the value is definitely uninitialized.
1382 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1383 OriginAlignment);
1384 return;
1385 }
1386 // Fall back to a runtime check, which can still be optimized out later.
1387 }
1388
1389 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1390 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1391 if (instrumentWithCalls(ConvertedShadow) &&
1392 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1393 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1394 Value *ConvertedShadow2 =
1395 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1396 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1397 CB->addParamAttr(0, Attribute::ZExt);
1398 CB->addParamAttr(2, Attribute::ZExt);
1399 } else {
1400 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1401 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1402 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1403 IRBuilder<> IRBNew(CheckTerm);
1404 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1405 OriginAlignment);
1406 }
1407 }
1408
1409 void materializeStores() {
1410 for (StoreInst *SI : StoreList) {
1411 IRBuilder<> IRB(SI);
1412 Value *Val = SI->getValueOperand();
1413 Value *Addr = SI->getPointerOperand();
1414 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1415 Value *ShadowPtr, *OriginPtr;
1416 Type *ShadowTy = Shadow->getType();
1417 const Align Alignment = SI->getAlign();
1418 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1419 std::tie(ShadowPtr, OriginPtr) =
1420 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1421
1422 [[maybe_unused]] StoreInst *NewSI =
1423 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1424 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1425
1426 if (SI->isAtomic())
1427 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1428
1429 if (MS.TrackOrigins && !SI->isAtomic())
1430 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1431 OriginAlignment);
1432 }
1433 }
1434
1435 // Returns true if Debug Location corresponds to multiple warnings.
1436 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1437 if (MS.TrackOrigins < 2)
1438 return false;
1439
1440 if (LazyWarningDebugLocationCount.empty())
1441 for (const auto &I : InstrumentationList)
1442 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1443
1444 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1445 }
1446
1447 /// Helper function to insert a warning at IRB's current insert point.
1448 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1449 if (!Origin)
1450 Origin = (Value *)IRB.getInt32(0);
1451 assert(Origin->getType()->isIntegerTy());
1452
1453 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1454 // Try to create additional origin with debug info of the last origin
1455 // instruction. It may provide additional information to the user.
1456 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1457 assert(MS.TrackOrigins);
1458 auto NewDebugLoc = OI->getDebugLoc();
1459 // Origin update with missing or the same debug location provides no
1460 // additional value.
1461 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1462 // Insert update just before the check, so we call runtime only just
1463 // before the report.
1464 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1465 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1466 Origin = updateOrigin(Origin, IRBOrigin);
1467 }
1468 }
1469 }
1470
1471 if (MS.CompileKernel || MS.TrackOrigins)
1472 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1473 else
1474 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1475 // FIXME: Insert UnreachableInst if !MS.Recover?
1476 // This may invalidate some of the following checks and needs to be done
1477 // at the very end.
1478 }
1479
1480 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1481 Value *Origin) {
1482 const DataLayout &DL = F.getDataLayout();
1483 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1484 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1485 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1486 // ZExt cannot convert between vector and scalar
1487 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1488 Value *ConvertedShadow2 =
1489 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1490
1491 if (SizeIndex < kNumberOfAccessSizes) {
1492 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1493 CallBase *CB = IRB.CreateCall(
1494 Fn,
1495 {ConvertedShadow2,
1496 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1497 CB->addParamAttr(0, Attribute::ZExt);
1498 CB->addParamAttr(1, Attribute::ZExt);
1499 } else {
1500 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1501 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1502 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1503 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1504 CallBase *CB = IRB.CreateCall(
1505 Fn,
1506 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1507 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1508 CB->addParamAttr(1, Attribute::ZExt);
1509 CB->addParamAttr(2, Attribute::ZExt);
1510 }
1511 } else {
1512 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1513 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1514 Cmp, &*IRB.GetInsertPoint(),
1515 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1516
1517 IRB.SetInsertPoint(CheckTerm);
1518 insertWarningFn(IRB, Origin);
1519 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1520 }
1521 }
1522
1523 void materializeInstructionChecks(
1524 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1525 const DataLayout &DL = F.getDataLayout();
1526 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1527 // correct origin.
1528 bool Combine = !MS.TrackOrigins;
1529 Instruction *Instruction = InstructionChecks.front().OrigIns;
1530 Value *Shadow = nullptr;
1531 for (const auto &ShadowData : InstructionChecks) {
1532 assert(ShadowData.OrigIns == Instruction);
1533 IRBuilder<> IRB(Instruction);
1534
1535 Value *ConvertedShadow = ShadowData.Shadow;
1536
1537 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1538 if (!ClCheckConstantShadow || ConstantShadow->isNullValue()) {
1539 // Skip, value is initialized or const shadow is ignored.
1540 continue;
1541 }
1542 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1543 // Report as the value is definitely uninitialized.
1544 insertWarningFn(IRB, ShadowData.Origin);
1545 if (!MS.Recover)
1546 return; // Always fail and stop here, no need to check the rest.
1547 // Skip the entire instruction.
1548 continue;
1549 }
1550 // Fall back to a runtime check, which can still be optimized out later.
1551 }
1552
1553 if (!Combine) {
1554 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1555 continue;
1556 }
1557
1558 if (!Shadow) {
1559 Shadow = ConvertedShadow;
1560 continue;
1561 }
1562
1563 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1564 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1565 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1566 }
1567
1568 if (Shadow) {
1569 assert(Combine);
1570 IRBuilder<> IRB(Instruction);
1571 materializeOneCheck(IRB, Shadow, nullptr);
1572 }
1573 }
1574
1575 static bool isAArch64SVCount(Type *Ty) {
1576 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1577 return TTy->getName() == "aarch64.svcount";
1578 return false;
1579 }
1580
1581 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1582 // 'target("aarch64.svcount")'), but not e.g., <vscale x 4 x i32>.
1583 static bool isScalableNonVectorType(Type *Ty) {
1584 if (!isAArch64SVCount(Ty))
1585 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1586 << "\n");
1587
1588 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1589 }
1590
1591 void materializeChecks() {
1592#ifndef NDEBUG
1593 // For assert below.
1594 SmallPtrSet<Instruction *, 16> Done;
1595#endif
1596
1597 for (auto I = InstrumentationList.begin();
1598 I != InstrumentationList.end();) {
1599 auto OrigIns = I->OrigIns;
1600 // Checks are grouped by the original instruction: all checks queued via
1601 // `insertShadowCheck` for an instruction are handled at once.
1602 assert(Done.insert(OrigIns).second);
1603 auto J = std::find_if(I + 1, InstrumentationList.end(),
1604 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1605 return OrigIns != R.OrigIns;
1606 });
1607 // Process all checks of instruction at once.
1608 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1609 I = J;
1610 }
1611
1612 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1613 }
1614
1615 // Populates the shadow/origin TLS pointers from the KMSAN per-task
1616 // context state.
1616 void insertKmsanPrologue(IRBuilder<> &IRB) {
1617 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1618 Constant *Zero = IRB.getInt32(0);
1619 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1620 {Zero, IRB.getInt32(0)}, "param_shadow");
1621 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1622 {Zero, IRB.getInt32(1)}, "retval_shadow");
1623 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1624 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1625 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1626 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1627 MS.VAArgOverflowSizeTLS =
1628 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1629 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1630 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1631 {Zero, IRB.getInt32(5)}, "param_origin");
1632 MS.RetvalOriginTLS =
1633 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1634 {Zero, IRB.getInt32(6)}, "retval_origin");
1635 if (MS.TargetTriple.getArch() == Triple::systemz)
1636 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1637 }
1638
1639 /// Add MemorySanitizer instrumentation to a function.
1640 bool runOnFunction() {
1641 // Iterate all BBs in depth-first order and create shadow instructions
1642 // for all instructions (where applicable).
1643 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1644 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1645 visit(*BB);
1646
1647 // `visit` above only collects instructions. Process them after iterating
1648 // the CFG so that instrumentation does not depend on CFG transformations.
1649 for (Instruction *I : Instructions)
1650 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1651
1652 // Finalize PHI nodes.
1653 for (PHINode *PN : ShadowPHINodes) {
1654 PHINode *PNS = cast<PHINode>(getShadow(PN));
1655 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1656 size_t NumValues = PN->getNumIncomingValues();
1657 for (size_t v = 0; v < NumValues; v++) {
1658 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1659 if (PNO)
1660 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1661 }
1662 }
1663
1664 VAHelper->finalizeInstrumentation();
1665
1666 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1667 // instrumenting only allocas.
1668 if (InstrumentLifetimeStart) {
1669 for (auto Item : LifetimeStartList) {
1670 instrumentAlloca(*Item.second, Item.first);
1671 AllocaSet.remove(Item.second);
1672 }
1673 }
1674 // Poison the allocas for which we didn't instrument the corresponding
1675 // lifetime intrinsics.
1676 for (AllocaInst *AI : AllocaSet)
1677 instrumentAlloca(*AI);
1678
1679 // Insert shadow value checks.
1680 materializeChecks();
1681
1682 // Delayed instrumentation of StoreInst.
1683 // This must not add new address checks.
1684 materializeStores();
1685
1686 return true;
1687 }
1688
1689 /// Compute the shadow type that corresponds to a given Value.
1690 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1691
1692 /// Compute the shadow type that corresponds to a given Type.
1693 Type *getShadowTy(Type *OrigTy) {
1694 if (!OrigTy->isSized()) {
1695 return nullptr;
1696 }
1697 // For integer type, shadow is the same as the original type.
1698 // This may return weird-sized types like i1.
1699 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1700 return IT;
1701 const DataLayout &DL = F.getDataLayout();
1702 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1703 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1704 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1705 VT->getElementCount());
1706 }
1707 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1708 return ArrayType::get(getShadowTy(AT->getElementType()),
1709 AT->getNumElements());
1710 }
1711 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1712 SmallVector<Type *, 4> Elements;
1713 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1714 Elements.push_back(getShadowTy(ST->getElementType(i)));
1715 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1716 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1717 return Res;
1718 }
1719 if (isScalableNonVectorType(OrigTy)) {
1720 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1721 << "\n");
1722 return OrigTy;
1723 }
1724
1725 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1726 return IntegerType::get(*MS.C, TypeSize);
1727 }
1728
1729 /// Extract combined shadow of struct elements as a bool
1730 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1731 IRBuilder<> &IRB) {
1732 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1733 Value *Aggregator = FalseVal;
1734
1735 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1736 // Combine by ORing together each element's bool shadow
1737 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1738 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1739
1740 if (Aggregator != FalseVal)
1741 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1742 else
1743 Aggregator = ShadowBool;
1744 }
1745
1746 return Aggregator;
1747 }
1748
1749 // Extract combined shadow of array elements
1750 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1751 IRBuilder<> &IRB) {
1752 if (!Array->getNumElements())
1753 return IRB.getIntN(/* width */ 1, /* value */ 0);
1754
1755 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1756 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1757
1758 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1759 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1760 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1761 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1762 }
1763 return Aggregator;
1764 }
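The two collapse helpers above reduce an aggregate shadow by ORing its parts together. As a rough scalar model (the names and integer shadows below are illustrative stand-ins, not code from this file), the reduction behaves like:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Scalar sketch of the collapse performed above: OR together the
// per-element shadows; a nonzero result means some bit is uninitialized.
uint64_t collapseShadow(const std::vector<uint64_t> &ElemShadows) {
  uint64_t Agg = 0; // empty aggregates collapse to a clean (zero) shadow
  for (uint64_t S : ElemShadows)
    Agg |= S;
  return Agg;
}
```

This mirrors why `collapseArrayShadow` returns a zero `i1` for zero-length arrays: the OR of nothing is clean.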
1765
1766 /// Convert a shadow value to its flattened variant. The resulting
1767 /// shadow may not necessarily have the same bit width as the input
1768 /// value, but it will always be comparable to zero.
1769 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1770 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1771 return collapseStructShadow(Struct, V, IRB);
1772 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1773 return collapseArrayShadow(Array, V, IRB);
1774 if (isa<VectorType>(V->getType())) {
1775 if (isa<ScalableVectorType>(V->getType()))
1776 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1777 unsigned BitWidth =
1778 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1779 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1780 }
1781 return V;
1782 }
1783
1784 // Convert a scalar value to an i1 by comparing with 0
1785 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1786 Type *VTy = V->getType();
1787 if (!VTy->isIntegerTy())
1788 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1789 if (VTy->getIntegerBitWidth() == 1)
1790 // Just converting a bool to a bool, so do nothing.
1791 return V;
1792 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1793 }
1794
1795 Type *ptrToIntPtrType(Type *PtrTy) const {
1796 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1797 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1798 VectTy->getElementCount());
1799 }
1800 assert(PtrTy->isIntOrPtrTy());
1801 return MS.IntptrTy;
1802 }
1803
1804 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1805 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1806 return VectorType::get(
1807 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1808 VectTy->getElementCount());
1809 }
1810 assert(IntPtrTy == MS.IntptrTy);
1811 return MS.PtrTy;
1812 }
1813
1814 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1815 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1816 return ConstantVector::getSplat(
1817 VectTy->getElementCount(),
1818 constToIntPtr(VectTy->getElementType(), C));
1819 }
1820 assert(IntPtrTy == MS.IntptrTy);
1821 // TODO: Avoid implicit trunc?
1822 // See https://github.com/llvm/llvm-project/issues/112510.
1823 return ConstantInt::get(MS.IntptrTy, C, /*IsSigned=*/false,
1824 /*ImplicitTrunc=*/true);
1825 }
1826
1827 /// Returns the integer shadow offset that corresponds to a given
1828 /// application address, whereby:
1829 ///
1830 /// Offset = (Addr & ~AndMask) ^ XorMask
1831 /// Shadow = ShadowBase + Offset
1832 /// Origin = (OriginBase + Offset) & ~Alignment
1833 ///
1834 /// Note: for efficiency, many shadow mappings only require the XorMask
1835 /// and OriginBase; the AndMask and ShadowBase are often zero.
1836 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1837 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1838 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1839
1840 if (uint64_t AndMask = MS.MapParams->AndMask)
1841 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1842
1843 if (uint64_t XorMask = MS.MapParams->XorMask)
1844 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1845 return OffsetLong;
1846 }
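As a worked instance of the formula above: with a mapping that has no AndMask and only a XorMask, the offset computation degenerates to a single XOR. The constants in the test are illustrative (they resemble the x86-64 Linux layout) rather than anything this function hard-codes:

```cpp
#include <cassert>
#include <cstdint>

// Offset = (Addr & ~AndMask) ^ XorMask, computed on plain integers,
// exactly as getShadowPtrOffset() emits it in IR.
uint64_t shadowOffset(uint64_t Addr, uint64_t AndMask, uint64_t XorMask) {
  return (Addr & ~AndMask) ^ XorMask;
}
```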
1847
1848 /// Compute the shadow and origin addresses corresponding to a given
1849 /// application address.
1850 ///
1851 /// Shadow = ShadowBase + Offset
1852 /// Origin = (OriginBase + Offset) & ~3ULL
1853 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow
1854 /// type of a single pointee.
1855 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1856 std::pair<Value *, Value *>
1857 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1858 MaybeAlign Alignment) {
1859 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1860 if (!VectTy) {
1861 assert(Addr->getType()->isPointerTy());
1862 } else {
1863 assert(VectTy->getElementType()->isPointerTy());
1864 }
1865 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1866 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1867 Value *ShadowLong = ShadowOffset;
1868 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1869 ShadowLong =
1870 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1871 }
1872 Value *ShadowPtr = IRB.CreateIntToPtr(
1873 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1874
1875 Value *OriginPtr = nullptr;
1876 if (MS.TrackOrigins) {
1877 Value *OriginLong = ShadowOffset;
1878 uint64_t OriginBase = MS.MapParams->OriginBase;
1879 if (OriginBase != 0)
1880 OriginLong =
1881 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1882 if (!Alignment || *Alignment < kMinOriginAlignment) {
1883 uint64_t Mask = kMinOriginAlignment.value() - 1;
1884 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1885 }
1886 OriginPtr = IRB.CreateIntToPtr(
1887 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1888 }
1889 return std::make_pair(ShadowPtr, OriginPtr);
1890 }
1891
1892 template <typename... ArgsTy>
1893 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1894 ArgsTy... Args) {
1895 if (MS.TargetTriple.getArch() == Triple::systemz) {
1896 IRB.CreateCall(Callee,
1897 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1898 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1899 }
1900
1901 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1902 }
1903
1904 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1905 IRBuilder<> &IRB,
1906 Type *ShadowTy,
1907 bool isStore) {
1908 Value *ShadowOriginPtrs;
1909 const DataLayout &DL = F.getDataLayout();
1910 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1911
1912 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1913 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1914 if (Getter) {
1915 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1916 } else {
1917 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1918 ShadowOriginPtrs = createMetadataCall(
1919 IRB,
1920 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1921 AddrCast, SizeVal);
1922 }
1923 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1924 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1925 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1926
1927 return std::make_pair(ShadowPtr, OriginPtr);
1928 }
1929
1930 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow
1931 /// type of a single pointee.
1932 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1933 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1934 IRBuilder<> &IRB,
1935 Type *ShadowTy,
1936 bool isStore) {
1937 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1938 if (!VectTy) {
1939 assert(Addr->getType()->isPointerTy());
1940 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1941 }
1942
1943 // TODO: Support callbacks with vectors of addresses.
1944 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1945 Value *ShadowPtrs = ConstantInt::getNullValue(
1946 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1947 Value *OriginPtrs = nullptr;
1948 if (MS.TrackOrigins)
1949 OriginPtrs = ConstantInt::getNullValue(
1950 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1951 for (unsigned i = 0; i < NumElements; ++i) {
1952 Value *OneAddr =
1953 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1954 auto [ShadowPtr, OriginPtr] =
1955 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1956
1957 ShadowPtrs = IRB.CreateInsertElement(
1958 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1959 if (MS.TrackOrigins)
1960 OriginPtrs = IRB.CreateInsertElement(
1961 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1962 }
1963 return {ShadowPtrs, OriginPtrs};
1964 }
1965
1966 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1967 Type *ShadowTy,
1968 MaybeAlign Alignment,
1969 bool isStore) {
1970 if (MS.CompileKernel)
1971 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1972 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1973 }
1974
1975 /// Compute the shadow address for a given function argument.
1976 ///
1977 /// Shadow = ParamTLS+ArgOffset.
1978 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1979 return IRB.CreatePtrAdd(MS.ParamTLS,
1980 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1981 }
1982
1983 /// Compute the origin address for a given function argument.
1984 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1985 if (!MS.TrackOrigins)
1986 return nullptr;
1987 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1988 ConstantInt::get(MS.IntptrTy, ArgOffset),
1989 "_msarg_o");
1990 }
1991
1992 /// Compute the shadow address for a retval.
1993 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1994 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1995 }
1996
1997 /// Compute the origin address for a retval.
1998 Value *getOriginPtrForRetval() {
1999 // We keep a single origin for the entire retval. Might be too optimistic.
2000 return MS.RetvalOriginTLS;
2001 }
2002
2003 /// Set SV to be the shadow value for V.
2004 void setShadow(Value *V, Value *SV) {
2005 assert(!ShadowMap.count(V) && "Values may only have one shadow");
2006 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
2007 }
2008
2009 /// Set Origin to be the origin value for V.
2010 void setOrigin(Value *V, Value *Origin) {
2011 if (!MS.TrackOrigins)
2012 return;
2013 assert(!OriginMap.count(V) && "Values may only have one origin");
2014 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
2015 OriginMap[V] = Origin;
2016 }
2017
2018 Constant *getCleanShadow(Type *OrigTy) {
2019 Type *ShadowTy = getShadowTy(OrigTy);
2020 if (!ShadowTy)
2021 return nullptr;
2022 return Constant::getNullValue(ShadowTy);
2023 }
2024
2025 /// Create a clean shadow value for a given value.
2026 ///
2027 /// Clean shadow (all zeroes) means all bits of the value are defined
2028 /// (initialized).
2029 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2030
2031 /// Create a dirty shadow of a given shadow type.
2032 Constant *getPoisonedShadow(Type *ShadowTy) {
2033 assert(ShadowTy);
2034 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2035 return Constant::getAllOnesValue(ShadowTy);
2036 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2037 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2038 getPoisonedShadow(AT->getElementType()));
2039 return ConstantArray::get(AT, Vals);
2040 }
2041 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2042 SmallVector<Constant *, 4> Vals;
2043 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2044 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2045 return ConstantStruct::get(ST, Vals);
2046 }
2047 llvm_unreachable("Unexpected shadow type");
2048 }
2049
2050 /// Create a dirty shadow for a given value.
2051 Constant *getPoisonedShadow(Value *V) {
2052 Type *ShadowTy = getShadowTy(V);
2053 if (!ShadowTy)
2054 return nullptr;
2055 return getPoisonedShadow(ShadowTy);
2056 }
2057
2058 /// Create a clean (zero) origin.
2059 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2060
2061 /// Get the shadow value for a given Value.
2062 ///
2063 /// This function either returns the value set earlier with setShadow,
2064 /// or extracts it from ParamTLS (for function arguments).
2065 Value *getShadow(Value *V) {
2066 if (Instruction *I = dyn_cast<Instruction>(V)) {
2067 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2068 return getCleanShadow(V);
2069 // For instructions the shadow is already stored in the map.
2070 Value *Shadow = ShadowMap[V];
2071 if (!Shadow) {
2072 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2073 assert(Shadow && "No shadow for a value");
2074 }
2075 return Shadow;
2076 }
2077 // Handle fully undefined values
2078 // (partially undefined constant vectors are handled later)
2079 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2080 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2081 : getCleanShadow(V);
2082 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2083 return AllOnes;
2084 }
2085 if (Argument *A = dyn_cast<Argument>(V)) {
2086 // For arguments we compute the shadow on demand and store it in the map.
2087 Value *&ShadowPtr = ShadowMap[V];
2088 if (ShadowPtr)
2089 return ShadowPtr;
2090 Function *F = A->getParent();
2091 IRBuilder<> EntryIRB(FnPrologueEnd);
2092 unsigned ArgOffset = 0;
2093 const DataLayout &DL = F->getDataLayout();
2094 for (auto &FArg : F->args()) {
2095 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2096 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2097 ? "vscale not fully supported\n"
2098 : "Arg is not sized\n"));
2099 if (A == &FArg) {
2100 ShadowPtr = getCleanShadow(V);
2101 setOrigin(A, getCleanOrigin());
2102 break;
2103 }
2104 continue;
2105 }
2106
2107 unsigned Size = FArg.hasByValAttr()
2108 ? DL.getTypeAllocSize(FArg.getParamByValType())
2109 : DL.getTypeAllocSize(FArg.getType());
2110
2111 if (A == &FArg) {
2112 bool Overflow = ArgOffset + Size > kParamTLSSize;
2113 if (FArg.hasByValAttr()) {
2114 // ByVal pointer itself has clean shadow. We copy the actual
2115 // argument shadow to the underlying memory.
2116 // Figure out maximal valid memcpy alignment.
2117 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2118 FArg.getParamAlign(), FArg.getParamByValType());
2119 Value *CpShadowPtr, *CpOriginPtr;
2120 std::tie(CpShadowPtr, CpOriginPtr) =
2121 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2122 /*isStore*/ true);
2123 if (!PropagateShadow || Overflow) {
2124 // ParamTLS overflow.
2125 EntryIRB.CreateMemSet(
2126 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2127 Size, ArgAlign);
2128 } else {
2129 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2130 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2131 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2132 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2133 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2134
2135 if (MS.TrackOrigins) {
2136 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2137 // FIXME: OriginSize should be:
2138 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2139 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2140 EntryIRB.CreateMemCpy(
2141 CpOriginPtr,
2142 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2143 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2144 OriginSize);
2145 }
2146 }
2147 }
2148
2149 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2150 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2151 ShadowPtr = getCleanShadow(V);
2152 setOrigin(A, getCleanOrigin());
2153 } else {
2154 // Shadow over TLS
2155 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2156 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2157 kShadowTLSAlignment);
2158 if (MS.TrackOrigins) {
2159 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2160 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2161 }
2162 }
2163 LLVM_DEBUG(dbgs()
2164 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2165 break;
2166 }
2167
2168 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2169 }
2170 assert(ShadowPtr && "Could not find shadow for an argument");
2171 return ShadowPtr;
2172 }
2173
2174 // Check for partially-undefined constant vectors
2175 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2176 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2177 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2178 PoisonUndefVectors) {
2179 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2180 SmallVector<Constant *, 32> ShadowVector(NumElems);
2181 for (unsigned i = 0; i != NumElems; ++i) {
2182 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2183 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2184 : getCleanShadow(Elem);
2185 }
2186
2187 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2188 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2189 << *ShadowConstant << "\n");
2190
2191 return ShadowConstant;
2192 }
2193
2194 // TODO: partially-undefined constant arrays, structures, and nested types
2195
2196 // For everything else the shadow is zero.
2197 return getCleanShadow(V);
2198 }
2199
2200 /// Get the shadow for i-th argument of the instruction I.
2201 Value *getShadow(Instruction *I, int i) {
2202 return getShadow(I->getOperand(i));
2203 }
2204
2205 /// Get the origin for a value.
2206 Value *getOrigin(Value *V) {
2207 if (!MS.TrackOrigins)
2208 return nullptr;
2209 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2210 return getCleanOrigin();
2211 assert((isa<Instruction>(V) || isa<Argument>(V)) &&
2212 "Unexpected value type in getOrigin()");
2213 if (Instruction *I = dyn_cast<Instruction>(V)) {
2214 if (I->getMetadata(LLVMContext::MD_nosanitize))
2215 return getCleanOrigin();
2216 }
2217 Value *Origin = OriginMap[V];
2218 assert(Origin && "Missing origin");
2219 return Origin;
2220 }
2221
2222 /// Get the origin for i-th argument of the instruction I.
2223 Value *getOrigin(Instruction *I, int i) {
2224 return getOrigin(I->getOperand(i));
2225 }
2226
2227 /// Remember the place where a shadow check should be inserted.
2228 ///
2229 /// This location will be later instrumented with a check that will print a
2230 /// UMR warning at runtime if the shadow value is not 0.
2231 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2232 assert(Shadow);
2233 if (!InsertChecks)
2234 return;
2235
2236 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2237 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2238 << *OrigIns << "\n");
2239 return;
2240 }
2241
2242 Type *ShadowTy = Shadow->getType();
2243 if (isScalableNonVectorType(ShadowTy)) {
2244 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2245 << " before " << *OrigIns << "\n");
2246 return;
2247 }
2248#ifndef NDEBUG
2249 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2250 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2251 "Can only insert checks for integer, vector, and aggregate shadow "
2252 "types");
2253#endif
2254 InstrumentationList.push_back(
2255 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2256 }
2257
2258 /// Get shadow for value, and remember the place where a shadow check should
2259 /// be inserted.
2260 ///
2261 /// This location will be later instrumented with a check that will print a
2262 /// UMR warning at runtime if the value is not fully defined.
2263 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2264 assert(Val);
2265 Value *Shadow, *Origin;
2267 Shadow = getShadow(Val);
2268 if (!Shadow)
2269 return;
2270 Origin = getOrigin(Val);
2271 } else {
2272 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2273 if (!Shadow)
2274 return;
2275 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2276 }
2277 insertCheckShadow(Shadow, Origin, OrigIns);
2278 }
2279
2280 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2281 switch (a) {
2282 case AtomicOrdering::NotAtomic:
2283 return AtomicOrdering::NotAtomic;
2284 case AtomicOrdering::Unordered:
2285 case AtomicOrdering::Monotonic:
2286 case AtomicOrdering::Release:
2287 return AtomicOrdering::Release;
2288 case AtomicOrdering::Acquire:
2289 case AtomicOrdering::AcquireRelease:
2290 return AtomicOrdering::AcquireRelease;
2291 case AtomicOrdering::SequentiallyConsistent:
2292 return AtomicOrdering::SequentiallyConsistent;
2293 }
2294 llvm_unreachable("Unknown ordering");
2295 }
2296
2297 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2298 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2299 uint32_t OrderingTable[NumOrderings] = {};
2300
2301 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2302 OrderingTable[(int)AtomicOrderingCABI::release] =
2303 (int)AtomicOrderingCABI::release;
2304 OrderingTable[(int)AtomicOrderingCABI::consume] =
2305 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2306 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2307 (int)AtomicOrderingCABI::acq_rel;
2308 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2309 (int)AtomicOrderingCABI::seq_cst;
2310
2311 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2312 }
2313
2314 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2315 switch (a) {
2316 case AtomicOrdering::NotAtomic:
2317 return AtomicOrdering::NotAtomic;
2318 case AtomicOrdering::Unordered:
2319 case AtomicOrdering::Monotonic:
2320 case AtomicOrdering::Acquire:
2321 return AtomicOrdering::Acquire;
2322 case AtomicOrdering::Release:
2323 case AtomicOrdering::AcquireRelease:
2324 return AtomicOrdering::AcquireRelease;
2325 case AtomicOrdering::SequentiallyConsistent:
2326 return AtomicOrdering::SequentiallyConsistent;
2327 }
2328 llvm_unreachable("Unknown ordering");
2329 }
2330
2331 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2332 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2333 uint32_t OrderingTable[NumOrderings] = {};
2334
2335 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2336 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2337 OrderingTable[(int)AtomicOrderingCABI::consume] =
2338 (int)AtomicOrderingCABI::acquire;
2339 OrderingTable[(int)AtomicOrderingCABI::release] =
2340 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2341 (int)AtomicOrderingCABI::acq_rel;
2342 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2343 (int)AtomicOrderingCABI::seq_cst;
2344
2345 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2346 }
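Because the instrumentation adds a shadow load next to every atomic load (and a shadow store next to every atomic store), the original orderings must be strengthened to at least acquire (respectively release). A standalone sketch of the acquire-side strengthening, using a stand-in enum rather than `llvm::AtomicOrdering`:

```cpp
#include <cassert>

// Stand-in for llvm::AtomicOrdering; only the strengthening logic is modeled.
enum Ord { NotAtomic, Unordered, Monotonic, Acquire, Release, AcqRel, SeqCst };

Ord addAcquire(Ord A) {
  switch (A) {
  case NotAtomic:
    return NotAtomic;
  case Unordered:
  case Monotonic:
  case Acquire:
    return Acquire; // weak orderings are bumped up to acquire
  case Release:
  case AcqRel:
    return AcqRel;  // release-only gains the acquire side too
  case SeqCst:
    return SeqCst;  // already as strong as possible
  }
  return A;
}
```

The release-side helper and the two `OrderingTable` builders encode the mirror-image mapping for stores and for the libatomic-style calls that take a C ABI ordering argument.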
2347
2348 // ------------------- Visitors.
2349 using InstVisitor<MemorySanitizerVisitor>::visit;
2350 void visit(Instruction &I) {
2351 if (I.getMetadata(LLVMContext::MD_nosanitize))
2352 return;
2353 // Don't want to visit if we're in the prologue
2354 if (isInPrologue(I))
2355 return;
2356 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2357 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2358 // We still need to set the shadow and origin to clean values.
2359 setShadow(&I, getCleanShadow(&I));
2360 setOrigin(&I, getCleanOrigin());
2361 return;
2362 }
2363
2364 Instructions.push_back(&I);
2365 }
2366
2367 /// Instrument LoadInst
2368 ///
2369 /// Loads the corresponding shadow and (optionally) origin.
2370 /// Optionally, checks that the load address is fully defined.
2371 void visitLoadInst(LoadInst &I) {
2372 assert(I.getType()->isSized() && "Load type must have size");
2373 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2374 NextNodeIRBuilder IRB(&I);
2375 Type *ShadowTy = getShadowTy(&I);
2376 Value *Addr = I.getPointerOperand();
2377 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2378 const Align Alignment = I.getAlign();
2379 if (PropagateShadow) {
2380 std::tie(ShadowPtr, OriginPtr) =
2381 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2382 setShadow(&I,
2383 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2384 } else {
2385 setShadow(&I, getCleanShadow(&I));
2386 }
2387
2388 if (ClCheckAccessAddress)
2389 insertCheckShadowOf(I.getPointerOperand(), &I);
2390
2391 if (I.isAtomic())
2392 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2393
2394 if (MS.TrackOrigins) {
2395 if (PropagateShadow) {
2396 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2397 setOrigin(
2398 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2399 } else {
2400 setOrigin(&I, getCleanOrigin());
2401 }
2402 }
2403 }
2404
2405 /// Instrument StoreInst
2406 ///
2407 /// Stores the corresponding shadow and (optionally) origin.
2408 /// Optionally, checks that the store address is fully defined.
2409 void visitStoreInst(StoreInst &I) {
2410 StoreList.push_back(&I);
2411 if (ClCheckAccessAddress)
2412 insertCheckShadowOf(I.getPointerOperand(), &I);
2413 }
2414
2415 void handleCASOrRMW(Instruction &I) {
2416 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2417
2418 IRBuilder<> IRB(&I);
2419 Value *Addr = I.getOperand(0);
2420 Value *Val = I.getOperand(1);
2421 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2422 /*isStore*/ true)
2423 .first;
2424
2425 if (ClCheckAccessAddress)
2426 insertCheckShadowOf(Addr, &I);
2427
2428 // Only test the conditional argument of cmpxchg instruction.
2429 // The other argument can potentially be uninitialized, but we can not
2430 // detect this situation reliably without possible false positives.
2431 if (isa<AtomicCmpXchgInst>(I))
2432 insertCheckShadowOf(Val, &I);
2433
2434 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2435
2436 setShadow(&I, getCleanShadow(&I));
2437 setOrigin(&I, getCleanOrigin());
2438 }
2439
2440 void visitAtomicRMWInst(AtomicRMWInst &I) {
2441 handleCASOrRMW(I);
2442 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2443 }
2444
2445 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2446 handleCASOrRMW(I);
2447 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2448 }
2449
2450 /// Generic handler to compute shadow for == and != comparisons.
2451 ///
2452 /// This function is used by handleEqualityComparison and visitSwitchInst.
2453 ///
2454 /// Sometimes the comparison result is known even if some of the bits of the
2455 /// arguments are not.
2456 Value *propagateEqualityComparison(IRBuilder<> &IRB, Value *A, Value *B,
2457 Value *Sa, Value *Sb) {
2458 assert(getShadowTy(A) == Sa->getType());
2459 assert(getShadowTy(B) == Sb->getType());
2460
2461 // Get rid of pointers and vectors of pointers.
2462 // For ints (and vectors of ints), types of A and Sa match,
2463 // and this is a no-op.
2464 A = IRB.CreatePointerCast(A, Sa->getType());
2465 B = IRB.CreatePointerCast(B, Sb->getType());
2466
2467 // A == B <==> (C = A^B) == 0
2468 // A != B <==> (C = A^B) != 0
2469 // Sc = Sa | Sb
2470 Value *C = IRB.CreateXor(A, B);
2471 Value *Sc = IRB.CreateOr(Sa, Sb);
2472 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2473 // Result is defined if one of the following is true
2474 // * there is a defined 1 bit in C
2475 // * C is fully defined
2476 // Si = !(C & ~Sc) && Sc
2477 Value *Zero = Constant::getNullValue(Sc->getType());
2478 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2479 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2480 Value *RHS =
2481 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2482 Value *Si = IRB.CreateAnd(LHS, RHS);
2483 Si->setName("_msprop_icmp");
2484
2485 return Si;
2486 }
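On concrete bit patterns the rule above reads: the comparison result is uninitialized only when at least one input bit is unknown and no fully-defined bit of C is set. A small model, with `uint8_t` shadows standing in for the IR values (names here are illustrative):

```cpp
#include <cassert>
#include <cstdint>

// Si = (Sc != 0) && ((C & ~Sc) == 0), with C = A ^ B and Sc = Sa | Sb.
// A shadow bit of 1 means "this bit is uninitialized".
bool eqCmpShadow(uint8_t A, uint8_t Sa, uint8_t B, uint8_t Sb) {
  uint8_t C = A ^ B;                          // where the known bits differ
  uint8_t Sc = Sa | Sb;                       // where either input is unknown
  bool SomeUnknown = Sc != 0;
  bool NoKnownDiff = (uint8_t)(C & ~Sc) == 0; // no defined 1 bit in C
  return SomeUnknown && NoKnownDiff;
}
```

A single fully-defined differing bit decides equality no matter what the unknown bits hold, which is why the first case below yields a defined result.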
2487
2488 // Instrument:
2489 // switch i32 %Val, label %else [ i32 0, label %A
2490 // i32 1, label %B
2491 // i32 2, label %C ]
2492 //
2493 // Typically, the switch input value (%Val) is fully initialized.
2494 //
2495 // Sometimes the compiler may convert (icmp + br) into a switch statement.
2496 // MSan allows icmp eq/ne with partly initialized inputs to still result in a
2497 // fully initialized output, if there exists a bit that is initialized in
2498 // both inputs with a differing value. For compatibility, we support this in
2499 // the switch instrumentation as well. Note that this edge case only applies
2500 // if the switch input value does not match *any* of the cases (matching any
2501 // of the cases requires an exact, fully initialized match).
2502 //
2503 // ShadowCases = 0
2504 // | propagateEqualityComparison(Val, 0)
2505 // | propagateEqualityComparison(Val, 1)
2506 // | propagateEqualityComparison(Val, 2))
2507 void visitSwitchInst(SwitchInst &SI) {
2508 IRBuilder<> IRB(&SI);
2509
2510 Value *Val = SI.getCondition();
2511 Value *ShadowVal = getShadow(Val);
2512 // TODO: add fast path - if the condition is fully initialized, we know
2513 // there is no UUM, without needing to consider the case values below.
2514
2515 // Some code (e.g., AMDGPUGenMCCodeEmitter.inc) has tens of thousands of
2516 // cases. This results in an extremely long chained expression for MSan's
2517 // switch instrumentation, which can cause the JumpThreadingPass to have a
2518 // stack overflow or excessive runtime. We limit the number of cases
2519 // considered, with the tradeoff of niche false negatives.
2520 // TODO: figure out a better solution.
2521 int casesToConsider = ClSwitchPrecision;
2522
2523 Value *ShadowCases = nullptr;
2524 for (auto Case : SI.cases()) {
2525 if (casesToConsider <= 0)
2526 break;
2527
2528 Value *Comparator = Case.getCaseValue();
2529 // TODO: some simplification is possible when comparing multiple cases
2530 // simultaneously.
2531 Value *ComparisonShadow = propagateEqualityComparison(
2532 IRB, Val, Comparator, ShadowVal, getShadow(Comparator));
2533
2534 if (ShadowCases)
2535 ShadowCases = IRB.CreateOr(ShadowCases, ComparisonShadow);
2536 else
2537 ShadowCases = ComparisonShadow;
2538
2539 casesToConsider--;
2540 }
2541
2542 if (ShadowCases)
2543 insertCheckShadow(ShadowCases, getOrigin(Val), &SI);
2544 }
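The per-case loop above simply ORs together the equality-comparison shadow of the condition against each case constant (case constants are always fully defined, so their shadow is zero). A scalar sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// OR together the eq-comparison shadow of Val against each case value.
// A shadow bit of 1 means "uninitialized"; case constants have shadow 0.
bool switchShadow(uint8_t Val, uint8_t ValShadow,
                  const std::vector<uint8_t> &Cases) {
  bool Shadow = false;
  for (uint8_t C : Cases) {
    uint8_t Diff = Val ^ C;                  // C = Val ^ CaseValue
    uint8_t Sc = ValShadow;                  // Sb of a constant is 0
    bool Uninit = (Sc != 0) && ((uint8_t)(Diff & ~Sc) == 0);
    Shadow = Shadow || Uninit;               // accumulate across cases
  }
  return Shadow;
}
```

Only when the known bits fail to rule out *every* case does the unknown bit make the dispatch uninitialized, matching the "no exact match" edge case described in the comment.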
2545
2546 // Vector manipulation.
2547 void visitExtractElementInst(ExtractElementInst &I) {
2548 insertCheckShadowOf(I.getOperand(1), &I);
2549 IRBuilder<> IRB(&I);
2550 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2551 "_msprop"));
2552 setOrigin(&I, getOrigin(&I, 0));
2553 }
2554
2555 void visitInsertElementInst(InsertElementInst &I) {
2556 insertCheckShadowOf(I.getOperand(2), &I);
2557 IRBuilder<> IRB(&I);
2558 auto *Shadow0 = getShadow(&I, 0);
2559 auto *Shadow1 = getShadow(&I, 1);
2560 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2561 "_msprop"));
2562 setOriginForNaryOp(I);
2563 }
2564
2565 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2566 IRBuilder<> IRB(&I);
2567 auto *Shadow0 = getShadow(&I, 0);
2568 auto *Shadow1 = getShadow(&I, 1);
2569 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2570 "_msprop"));
2571 setOriginForNaryOp(I);
2572 }
2573
2574 // Casts.
2575 void visitSExtInst(SExtInst &I) {
2576 IRBuilder<> IRB(&I);
2577 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2578 setOrigin(&I, getOrigin(&I, 0));
2579 }
2580
2581 void visitZExtInst(ZExtInst &I) {
2582 IRBuilder<> IRB(&I);
2583 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2584 setOrigin(&I, getOrigin(&I, 0));
2585 }
2586
2587 void visitTruncInst(TruncInst &I) {
2588 IRBuilder<> IRB(&I);
2589 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2590 setOrigin(&I, getOrigin(&I, 0));
2591 }
2592
2593 void visitBitCastInst(BitCastInst &I) {
2594 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2595 // a musttail call and a ret, don't instrument. New instructions are not
2596 // allowed after a musttail call.
2597 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2598 if (CI->isMustTailCall())
2599 return;
2600 IRBuilder<> IRB(&I);
2601 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2602 setOrigin(&I, getOrigin(&I, 0));
2603 }
2604
2605 void visitPtrToIntInst(PtrToIntInst &I) {
2606 IRBuilder<> IRB(&I);
2607 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2608 "_msprop_ptrtoint"));
2609 setOrigin(&I, getOrigin(&I, 0));
2610 }
2611
2612 void visitIntToPtrInst(IntToPtrInst &I) {
2613 IRBuilder<> IRB(&I);
2614 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2615 "_msprop_inttoptr"));
2616 setOrigin(&I, getOrigin(&I, 0));
2617 }
2618
2619 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2620 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2621 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2622 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2623 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2624 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2625
2626 /// Generic handler to compute shadow for bitwise AND.
2627 ///
2628 /// This is used by 'visitAnd' but also as a primitive for other handlers.
2629 ///
2630 /// This code is precise: it implements the rule that "And" of an initialized
2631 /// zero bit always results in an initialized value:
2632 // 1&1 => 1; 0&1 => 0; p&1 => p;
2633 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2634 // 1&p => p; 0&p => 0; p&p => p;
2635 //
2636 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2637 Value *handleBitwiseAnd(IRBuilder<> &IRB, Value *V1, Value *V2, Value *S1,
2638 Value *S2) {
2639 // "The two arguments to the ‘and’ instruction must be integer or vector
2640 // of integer values. Both arguments must have identical types."
2641 //
2642 // We enforce this condition for all callers to handleBitwiseAnd(); callers
2643 // with non-integer types should call CreateAppToShadowCast() themselves.
2645 assert(V1->getType() == V2->getType());
2646
2647 // Conveniently, getShadowTy() of Int/IntVector returns the original type.
2648 assert(V1->getType() == S1->getType());
2649 assert(V2->getType() == S2->getType());
2650
2651 Value *S1S2 = IRB.CreateAnd(S1, S2);
2652 Value *V1S2 = IRB.CreateAnd(V1, S2);
2653 Value *S1V2 = IRB.CreateAnd(S1, V2);
2654
2655 return IRB.CreateOr({S1S2, V1S2, S1V2});
2656 }
2657
2658 /// Handler for bitwise AND operator.
2659 void visitAnd(BinaryOperator &I) {
2660 IRBuilder<> IRB(&I);
2661 Value *V1 = I.getOperand(0);
2662 Value *V2 = I.getOperand(1);
2663 Value *S1 = getShadow(&I, 0);
2664 Value *S2 = getShadow(&I, 1);
2665
2666 Value *OutShadow = handleBitwiseAnd(IRB, V1, V2, S1, S2);
2667
2668 setShadow(&I, OutShadow);
2669 setOriginForNaryOp(I);
2670 }
2671
2672 void visitOr(BinaryOperator &I) {
2673 IRBuilder<> IRB(&I);
2674 // "Or" of 1 and a poisoned value results in unpoisoned value:
2675 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2676 // 1|0 => 1; 0|0 => 0; p|0 => p;
2677 // 1|p => 1; 0|p => p; p|p => p;
2678 //
2679 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2680 //
2681 // If the "disjoint OR" property is violated, the result is poison, and
2682 // hence the entire shadow is uninitialized:
2683 // S = S | SignExt(V1 & V2 != 0)
2684 Value *S1 = getShadow(&I, 0);
2685 Value *S2 = getShadow(&I, 1);
2686 Value *V1 = I.getOperand(0);
2687 Value *V2 = I.getOperand(1);
2688
2689 // "The two arguments to the ‘or’ instruction must be integer or vector
2690 // of integer values. Both arguments must have identical types."
2692 assert(V1->getType() == V2->getType());
2693
2694 // Conveniently, getShadowTy() of Int/IntVector returns the original type.
2695 assert(V1->getType() == S1->getType());
2696 assert(V2->getType() == S2->getType());
2697
2698 Value *NotV1 = IRB.CreateNot(V1);
2699 Value *NotV2 = IRB.CreateNot(V2);
2700
2701 Value *S1S2 = IRB.CreateAnd(S1, S2);
2702 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2703 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2704
2705 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2706
2707 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2708 Value *V1V2 = IRB.CreateAnd(V1, V2);
2709 Value *DisjointOrShadow = IRB.CreateSExt(
2710 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2711 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2712 }
2713
2714 setShadow(&I, S);
2715 setOriginForNaryOp(I);
2716 }
2717
2718 /// Default propagation of shadow and/or origin.
2719 ///
2720 /// This class implements the general case of shadow propagation, used in all
2721 /// cases where we don't know and/or don't care about what the operation
2722 /// actually does. It converts all input shadow values to a common type
2723 /// (extending or truncating as necessary), and bitwise OR's them.
2724 ///
2725 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2726 /// fully initialized), and less prone to false positives.
2727 ///
2728 /// This class also implements the general case of origin propagation. For a
2729 /// Nary operation, result origin is set to the origin of an argument that is
2730 /// not entirely initialized. If there is more than one such argument, the
2731 /// rightmost of them is picked. It does not matter which one is picked if all
2732 /// arguments are initialized.
2733 template <bool CombineShadow> class Combiner {
2734 Value *Shadow = nullptr;
2735 Value *Origin = nullptr;
2736 IRBuilder<> &IRB;
2737 MemorySanitizerVisitor *MSV;
2738
2739 public:
2740 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2741 : IRB(IRB), MSV(MSV) {}
2742
2743 /// Add a pair of shadow and origin values to the mix.
2744 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2745 if (CombineShadow) {
2746 assert(OpShadow);
2747 if (!Shadow)
2748 Shadow = OpShadow;
2749 else {
2750 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2751 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2752 }
2753 }
2754
2755 if (MSV->MS.TrackOrigins) {
2756 assert(OpOrigin);
2757 if (!Origin) {
2758 Origin = OpOrigin;
2759 } else {
2760 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2761 // No point in adding something that might result in 0 origin value.
2762 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2763 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2764 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2765 }
2766 }
2767 }
2768 return *this;
2769 }
2770
2771 /// Add an application value to the mix.
2772 Combiner &Add(Value *V) {
2773 Value *OpShadow = MSV->getShadow(V);
2774 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2775 return Add(OpShadow, OpOrigin);
2776 }
2777
2778 /// Set the current combined values as the given instruction's shadow
2779 /// and origin.
2780 void Done(Instruction *I) {
2781 if (CombineShadow) {
2782 assert(Shadow);
2783 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2784 MSV->setShadow(I, Shadow);
2785 }
2786 if (MSV->MS.TrackOrigins) {
2787 assert(Origin);
2788 MSV->setOrigin(I, Origin);
2789 }
2790 }
2791
2792 /// Store the current combined value at the specified origin
2793 /// location.
2794 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2795 if (MSV->MS.TrackOrigins) {
2796 assert(Origin);
2797 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2798 }
2799 }
2800 };
2801
2802 using ShadowAndOriginCombiner = Combiner<true>;
2803 using OriginCombiner = Combiner<false>;
2804
2805 /// Propagate origin for arbitrary operation.
2806 void setOriginForNaryOp(Instruction &I) {
2807 if (!MS.TrackOrigins)
2808 return;
2809 IRBuilder<> IRB(&I);
2810 OriginCombiner OC(this, IRB);
2811 for (Use &Op : I.operands())
2812 OC.Add(Op.get());
2813 OC.Done(&I);
2814 }
2815
2816 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2817 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2818 "Vector of pointers is not a valid shadow type");
2819 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2820 Ty->getScalarSizeInBits()
2821 : Ty->getPrimitiveSizeInBits();
2822 }
2823
2824 /// Cast between two shadow types, extending or truncating as
2825 /// necessary.
2826 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2827 bool Signed = false) {
2828 Type *srcTy = V->getType();
2829 if (srcTy == dstTy)
2830 return V;
2831 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2832 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2833 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2834 return IRB.CreateICmpNE(V, getCleanShadow(V));
2835
2836 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2837 return IRB.CreateIntCast(V, dstTy, Signed);
2838 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2839 cast<VectorType>(dstTy)->getElementCount() ==
2840 cast<VectorType>(srcTy)->getElementCount())
2841 return IRB.CreateIntCast(V, dstTy, Signed);
2842 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2843 Value *V2 =
2844 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2845 return IRB.CreateBitCast(V2, dstTy);
2846 // TODO: handle struct types.
2847 }
2848
2849 /// Cast an application value to the type of its own shadow.
2850 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2851 Type *ShadowTy = getShadowTy(V);
2852 if (V->getType() == ShadowTy)
2853 return V;
2854 if (V->getType()->isPtrOrPtrVectorTy())
2855 return IRB.CreatePtrToInt(V, ShadowTy);
2856 else
2857 return IRB.CreateBitCast(V, ShadowTy);
2858 }
2859
2860 /// Propagate shadow for arbitrary operation.
2861 void handleShadowOr(Instruction &I) {
2862 IRBuilder<> IRB(&I);
2863 ShadowAndOriginCombiner SC(this, IRB);
2864 for (Use &Op : I.operands())
2865 SC.Add(Op.get());
2866 SC.Done(&I);
2867 }
2868
2869 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2870 // of elements.
2871 //
2872 // For example, suppose we have:
2873 // VectorA: <a0, a1, a2, a3, a4, a5>
2874 // VectorB: <b0, b1, b2, b3, b4, b5>
2875 // ReductionFactor: 3
2876 // Shards: 1
2877 // The output would be:
2878 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2879 //
2880 // If we have:
2881 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2882 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2883 // ReductionFactor: 2
2884 // Shards: 2
2885 // then a and b each have 2 "shards", resulting in the output being
2886 // interleaved:
2887 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2888 //
2889 // This is convenient for instrumenting horizontal add/sub.
2890 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2891 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2892 unsigned Shards, Value *VectorA, Value *VectorB) {
2893 assert(isa<FixedVectorType>(VectorA->getType()));
2894 unsigned NumElems =
2895 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2896
2897 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2898 if (VectorB) {
2899 assert(VectorA->getType() == VectorB->getType());
2900 TotalNumElems *= 2;
2901 }
2902
2903 assert(NumElems % (ReductionFactor * Shards) == 0);
2904
2905 Value *Or = nullptr;
2906
2907 IRBuilder<> IRB(&I);
2908 for (unsigned i = 0; i < ReductionFactor; i++) {
2909 SmallVector<int, 16> Mask;
2910
2911 for (unsigned j = 0; j < Shards; j++) {
2912 unsigned Offset = NumElems / Shards * j;
2913
2914 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2915 Mask.push_back(Offset + X + i);
2916
2917 if (VectorB) {
2918 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2919 Mask.push_back(NumElems + Offset + X + i);
2920 }
2921 }
2922
2923 Value *Masked;
2924 if (VectorB)
2925 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2926 else
2927 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2928
2929 if (Or)
2930 Or = IRB.CreateOr(Or, Masked);
2931 else
2932 Or = Masked;
2933 }
2934
2935 return Or;
2936 }
2937
2938 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2939 /// fields.
2940 ///
2941 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2942 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2943 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2944 assert(I.arg_size() == 1 || I.arg_size() == 2);
2945
2946 assert(I.getType()->isVectorTy());
2947 assert(I.getArgOperand(0)->getType()->isVectorTy());
2948
2949 [[maybe_unused]] FixedVectorType *ParamType =
2950 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2951 assert((I.arg_size() != 2) ||
2952 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2953 [[maybe_unused]] FixedVectorType *ReturnType =
2954 cast<FixedVectorType>(I.getType());
2955 assert(ParamType->getNumElements() * I.arg_size() ==
2956 2 * ReturnType->getNumElements());
2957
2958 IRBuilder<> IRB(&I);
2959
2960 // Horizontal OR of shadow
2961 Value *FirstArgShadow = getShadow(&I, 0);
2962 Value *SecondArgShadow = nullptr;
2963 if (I.arg_size() == 2)
2964 SecondArgShadow = getShadow(&I, 1);
2965
2966 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2967 FirstArgShadow, SecondArgShadow);
2968
2969 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2970
2971 setShadow(&I, OrShadow);
2972 setOriginForNaryOp(I);
2973 }
2974
2975 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2976 /// fields, with the parameters reinterpreted to have elements of a specified
2977 /// width. For example:
2978 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2979 /// conceptually operates on
2980 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2981 /// and can be handled with ReinterpretElemWidth == 16.
2982 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
2983 int ReinterpretElemWidth) {
2984 assert(I.arg_size() == 1 || I.arg_size() == 2);
2985
2986 assert(I.getType()->isVectorTy());
2987 assert(I.getArgOperand(0)->getType()->isVectorTy());
2988
2989 FixedVectorType *ParamType =
2990 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2991 assert((I.arg_size() != 2) ||
2992 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2993
2994 [[maybe_unused]] FixedVectorType *ReturnType =
2995 cast<FixedVectorType>(I.getType());
2996 assert(ParamType->getNumElements() * I.arg_size() ==
2997 2 * ReturnType->getNumElements());
2998
2999 IRBuilder<> IRB(&I);
3000
3001 FixedVectorType *ReinterpretShadowTy = nullptr;
3002 assert(isAligned(Align(ReinterpretElemWidth),
3003 ParamType->getPrimitiveSizeInBits()));
3004 ReinterpretShadowTy = FixedVectorType::get(
3005 IRB.getIntNTy(ReinterpretElemWidth),
3006 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
3007
3008 // Horizontal OR of shadow
3009 Value *FirstArgShadow = getShadow(&I, 0);
3010 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
3011
3012 // If we had two parameters each with an odd number of elements, the total
3013 // number of elements is even, but we have never seen this in extant
3014 // instruction sets, so we enforce that each parameter must have an even
3015 // number of elements.
3016 assert(isAligned(
3017 Align(2),
3018 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
3019
3020 Value *SecondArgShadow = nullptr;
3021 if (I.arg_size() == 2) {
3022 SecondArgShadow = getShadow(&I, 1);
3023 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
3024 }
3025
3026 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
3027 FirstArgShadow, SecondArgShadow);
3028
3029 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
3030
3031 setShadow(&I, OrShadow);
3032 setOriginForNaryOp(I);
3033 }
3034
3035 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
3036
3037 // Handle multiplication by constant.
3038 //
3039 // Handle a special case of multiplication by constant that may have one or
3040 // more zeros in the lower bits. This makes corresponding number of lower bits
3041 // of the result zero as well. We model it by shifting the other operand
3042 // shadow left by the required number of bits. Effectively, we transform
3043 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
3044 // We use multiplication by 2**N instead of shift to cover the case of
3045 // multiplication by 0, which may occur in some elements of a vector operand.
3046 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
3047 Value *OtherArg) {
3048 Constant *ShadowMul;
3049 Type *Ty = ConstArg->getType();
3050 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
3051 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
3052 Type *EltTy = VTy->getElementType();
3053 SmallVector<Constant *, 16> Elements;
3054 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
3055 if (ConstantInt *Elt =
3056 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
3057 const APInt &V = Elt->getValue();
3058 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
3059 Elements.push_back(ConstantInt::get(EltTy, V2));
3060 } else {
3061 Elements.push_back(ConstantInt::get(EltTy, 1));
3062 }
3063 }
3064 ShadowMul = ConstantVector::get(Elements);
3065 } else {
3066 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
3067 const APInt &V = Elt->getValue();
3068 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
3069 ShadowMul = ConstantInt::get(Ty, V2);
3070 } else {
3071 ShadowMul = ConstantInt::get(Ty, 1);
3072 }
3073 }
3074
3075 IRBuilder<> IRB(&I);
3076 setShadow(&I,
3077 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
3078 setOrigin(&I, getOrigin(OtherArg));
3079 }
3080
3081 void visitMul(BinaryOperator &I) {
3082 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
3083 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
3084 if (constOp0 && !constOp1)
3085 handleMulByConstant(I, constOp0, I.getOperand(1));
3086 else if (constOp1 && !constOp0)
3087 handleMulByConstant(I, constOp1, I.getOperand(0));
3088 else
3089 handleShadowOr(I);
3090 }
3091
3092 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
3093 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
3094 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
3095 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
3096 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
3097 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
3098
3099 void handleIntegerDiv(Instruction &I) {
3100 IRBuilder<> IRB(&I);
3101 // Strict on the second argument.
3102 insertCheckShadowOf(I.getOperand(1), &I);
3103 setShadow(&I, getShadow(&I, 0));
3104 setOrigin(&I, getOrigin(&I, 0));
3105 }
3106
3107 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
3108 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
3109 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
3110 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
3111
3112 // Floating point division is side-effect free. We cannot require that the
3113 // divisor is fully initialized and must propagate shadow. See PR37523.
3114 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
3115 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
3116
3117 /// Instrument == and != comparisons.
3118 ///
3119 /// Sometimes the comparison result is known even if some of the bits of the
3120 /// arguments are not.
3121 void handleEqualityComparison(ICmpInst &I) {
3122 IRBuilder<> IRB(&I);
3123 Value *A = I.getOperand(0);
3124 Value *B = I.getOperand(1);
3125 Value *Sa = getShadow(A);
3126 Value *Sb = getShadow(B);
3127
3128 Value *Si = propagateEqualityComparison(IRB, A, B, Sa, Sb);
3129
3130 setShadow(&I, Si);
3131 setOriginForNaryOp(I);
3132 }
3133
3134 /// Instrument relational comparisons.
3135 ///
3136 /// This function does exact shadow propagation for all relational
3137 /// comparisons of integers, pointers and vectors of those.
3138 /// FIXME: output seems suboptimal when one of the operands is a constant
3139 void handleRelationalComparisonExact(ICmpInst &I) {
3140 IRBuilder<> IRB(&I);
3141 Value *A = I.getOperand(0);
3142 Value *B = I.getOperand(1);
3143 Value *Sa = getShadow(A);
3144 Value *Sb = getShadow(B);
3145
3146 // Get rid of pointers and vectors of pointers.
3147 // For ints (and vectors of ints), types of A and Sa match,
3148 // and this is a no-op.
3149 A = IRB.CreatePointerCast(A, Sa->getType());
3150 B = IRB.CreatePointerCast(B, Sb->getType());
3151
3152 // Let [a0, a1] be the interval of possible values of A, taking into account
3153 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3154 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
3155 bool IsSigned = I.isSigned();
3156
3157 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3158 if (IsSigned) {
3159 // Sign-flip to map from signed range to unsigned range. Relation A vs B
3160 // should be preserved, if checked with `getUnsignedPredicate()`.
3161 // Relationship between Amin, Amax, Bmin, Bmax also will not be
3162 // affected, as they are created by effectively adding/subtracting from
3163 // A (or B) a value, derived from shadow, with no overflow, either
3164 // before or after sign flip.
3165 APInt MinVal =
3166 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3167 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3168 }
3169 // Minimize undefined bits.
3170 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3171 Value *Max = IRB.CreateOr(V, S);
3172 return std::make_pair(Min, Max);
3173 };
3174
3175 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3176 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3177 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3178 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3179
3180 Value *Si = IRB.CreateXor(S1, S2);
3181 setShadow(&I, Si);
3182 setOriginForNaryOp(I);
3183 }
3184
3185 /// Instrument signed relational comparisons.
3186 ///
3187 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3188 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3189 void handleSignedRelationalComparison(ICmpInst &I) {
3190 Constant *constOp;
3191 Value *op = nullptr;
3192 CmpInst::Predicate pre;
3193 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3194 op = I.getOperand(0);
3195 pre = I.getPredicate();
3196 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3197 op = I.getOperand(1);
3198 pre = I.getSwappedPredicate();
3199 } else {
3200 handleShadowOr(I);
3201 return;
3202 }
3203
3204 if ((constOp->isNullValue() &&
3205 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3206 (constOp->isAllOnesValue() &&
3207 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3208 IRBuilder<> IRB(&I);
3209 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3210 "_msprop_icmp_s");
3211 setShadow(&I, Shadow);
3212 setOrigin(&I, getOrigin(op));
3213 } else {
3214 handleShadowOr(I);
3215 }
3216 }
3217
3218 void visitICmpInst(ICmpInst &I) {
3219 if (!ClHandleICmp) {
3220 handleShadowOr(I);
3221 return;
3222 }
3223 if (I.isEquality()) {
3224 handleEqualityComparison(I);
3225 return;
3226 }
3227
3228 assert(I.isRelational());
3229 if (ClHandleICmpExact) {
3230 handleRelationalComparisonExact(I);
3231 return;
3232 }
3233 if (I.isSigned()) {
3234 handleSignedRelationalComparison(I);
3235 return;
3236 }
3237
3238 assert(I.isUnsigned());
3239 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3240 handleRelationalComparisonExact(I);
3241 return;
3242 }
3243
3244 handleShadowOr(I);
3245 }
3246
3247 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3248
3249 void handleShift(BinaryOperator &I) {
3250 IRBuilder<> IRB(&I);
3251 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3252 // Otherwise perform the same shift on S1.
3253 Value *S1 = getShadow(&I, 0);
3254 Value *S2 = getShadow(&I, 1);
3255 Value *S2Conv =
3256 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3257 Value *V2 = I.getOperand(1);
3258 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3259 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3260 setOriginForNaryOp(I);
3261 }
3262
3263 void visitShl(BinaryOperator &I) { handleShift(I); }
3264 void visitAShr(BinaryOperator &I) { handleShift(I); }
3265 void visitLShr(BinaryOperator &I) { handleShift(I); }
3266
3267 void handleFunnelShift(IntrinsicInst &I) {
3268 IRBuilder<> IRB(&I);
3269 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3270 // Otherwise perform the same shift on S0 and S1.
3271 Value *S0 = getShadow(&I, 0);
3272 Value *S1 = getShadow(&I, 1);
3273 Value *S2 = getShadow(&I, 2);
3274 Value *S2Conv =
3275 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3276 Value *V2 = I.getOperand(2);
3277 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3278 {S0, S1, V2});
3279 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3280 setOriginForNaryOp(I);
3281 }
3282
3283 /// Instrument llvm.memmove
3284 ///
3285 /// At this point we don't know if llvm.memmove will be inlined or not.
3286 /// If we don't instrument it and it gets inlined,
3287 /// our interceptor will not kick in and we will lose the memmove.
3288 /// If we instrument the call here, but it does not get inlined,
3289 /// we will memmove the shadow twice: which is bad in case
3290 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3291 ///
3292 /// Similar situation exists for memcpy and memset.
3293 void visitMemMoveInst(MemMoveInst &I) {
3294 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3295 IRBuilder<> IRB(&I);
3296 IRB.CreateCall(MS.MemmoveFn,
3297 {I.getArgOperand(0), I.getArgOperand(1),
3298 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3299 I.eraseFromParent();
3300 }
3301
3302 /// Instrument memcpy
3303 ///
3304 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3305 /// unfortunate as it may slow down small constant memcpys.
3306 /// FIXME: consider doing manual inline for small constant sizes and proper
3307 /// alignment.
3308 ///
3309 /// Note: This also handles memcpy.inline, which promises no calls to external
3310 /// functions as an optimization. However, with instrumentation enabled this
3311 /// is difficult to promise; additionally, we know that the MSan runtime
3312 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3313 /// instrumentation it's safe to turn memcpy.inline into a call to
3314 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3315 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3316 void visitMemCpyInst(MemCpyInst &I) {
3317 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3318 IRBuilder<> IRB(&I);
3319 IRB.CreateCall(MS.MemcpyFn,
3320 {I.getArgOperand(0), I.getArgOperand(1),
3321 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3322 I.eraseFromParent();
3323 }
3324
3325 // Same as memcpy.
3326 void visitMemSetInst(MemSetInst &I) {
3327 IRBuilder<> IRB(&I);
3328 IRB.CreateCall(
3329 MS.MemsetFn,
3330 {I.getArgOperand(0),
3331 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3332 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3333 I.eraseFromParent();
3334 }
3335
3336 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3337
3338 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3339
3340 /// Handle vector store-like intrinsics.
3341 ///
3342 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3343 /// has 1 pointer argument and 1 vector argument, returns void.
3344 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3345 assert(I.arg_size() == 2);
3346
3347 IRBuilder<> IRB(&I);
3348 Value *Addr = I.getArgOperand(0);
3349 Value *Shadow = getShadow(&I, 1);
3350 Value *ShadowPtr, *OriginPtr;
3351
3352 // We don't know the pointer alignment (could be unaligned SSE store!).
3353 // Have to assume the worst case.
3354 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3355 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3356 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3357
3358 if (ClCheckAccessAddress)
3359 insertCheckShadowOf(Addr, &I);
3360
3361 // FIXME: factor out common code from materializeStores
3362 if (MS.TrackOrigins)
3363 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3364 return true;
3365 }
3366
3367 /// Handle vector load-like intrinsics.
3368 ///
3369 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3370 /// has 1 pointer argument, returns a vector.
3371 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3372 assert(I.arg_size() == 1);
3373
3374 IRBuilder<> IRB(&I);
3375 Value *Addr = I.getArgOperand(0);
3376
3377 Type *ShadowTy = getShadowTy(&I);
3378 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3379 if (PropagateShadow) {
3380 // We don't know the pointer alignment (could be unaligned SSE load!).
3381 // Have to assume the worst case.
3382 const Align Alignment = Align(1);
3383 std::tie(ShadowPtr, OriginPtr) =
3384 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3385 setShadow(&I,
3386 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3387 } else {
3388 setShadow(&I, getCleanShadow(&I));
3389 }
3390
3391 if (ClCheckAccessAddress)
3392 insertCheckShadowOf(Addr, &I);
3393
3394 if (MS.TrackOrigins) {
3395 if (PropagateShadow)
3396 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3397 else
3398 setOrigin(&I, getCleanOrigin());
3399 }
3400 return true;
3401 }
3402
3403 /// Handle (SIMD arithmetic)-like intrinsics.
3404 ///
3405 /// Instrument intrinsics with any number of arguments of the same type [*],
3406 /// equal to the return type, plus a specified number of trailing flags of
3407 /// any type.
3408 ///
3409 /// [*] The type should be simple (no aggregates or pointers; vectors are
3410 /// fine).
3411 ///
3412 /// Caller guarantees that this intrinsic does not access memory.
3413 ///
3414 /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3415 /// by this handler. See horizontalReduce().
3416 ///
3417 /// TODO: permutation intrinsics are also often incorrectly matched.
3418 [[maybe_unused]] bool
3419 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3420 unsigned int trailingFlags) {
3421 Type *RetTy = I.getType();
3422 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3423 return false;
3424
3425 unsigned NumArgOperands = I.arg_size();
3426 assert(NumArgOperands >= trailingFlags);
3427 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3428 Type *Ty = I.getArgOperand(i)->getType();
3429 if (Ty != RetTy)
3430 return false;
3431 }
3432
3433 IRBuilder<> IRB(&I);
3434 ShadowAndOriginCombiner SC(this, IRB);
3435 for (unsigned i = 0; i < NumArgOperands; ++i)
3436 SC.Add(I.getArgOperand(i));
3437 SC.Done(&I);
3438
3439 return true;
3440 }
3441
3442 /// Returns whether it was able to heuristically instrument unknown
3443 /// intrinsics.
3444 ///
3445 /// The main purpose of this code is to do something reasonable with all
3446 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3447 /// We recognize several classes of intrinsics by their argument types and
3448 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3449 /// sure that we know what the intrinsic does.
3450 ///
3451 /// We special-case intrinsics where this approach fails. See llvm.bswap
3452 /// handling as an example of that.
3453 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3454 unsigned NumArgOperands = I.arg_size();
3455 if (NumArgOperands == 0)
3456 return false;
3457
3458 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3459 I.getArgOperand(1)->getType()->isVectorTy() &&
3460 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3461 // This looks like a vector store.
3462 return handleVectorStoreIntrinsic(I);
3463 }
3464
3465 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3466 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3467 // This looks like a vector load.
3468 return handleVectorLoadIntrinsic(I);
3469 }
3470
3471 if (I.doesNotAccessMemory())
3472 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3473 return true;
3474
3475 // FIXME: detect and handle SSE maskstore/maskload?
3476 // Some cases are now handled in handleAVXMasked{Load,Store}.
3477 return false;
3478 }
3479
3480 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3481 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3483 dumpInst(I, "Heuristic");
3484
3485 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3486 << "\n");
3487 return true;
3488 } else
3489 return false;
3490 }
3491
3492 void handleInvariantGroup(IntrinsicInst &I) {
3493 setShadow(&I, getShadow(&I, 0));
3494 setOrigin(&I, getOrigin(&I, 0));
3495 }
3496
3497 void handleLifetimeStart(IntrinsicInst &I) {
3498 if (!PoisonStack)
3499 return;
3500 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3501 if (AI)
3502 LifetimeStartList.push_back(std::make_pair(&I, AI));
3503 }
3504
3505 void handleBswap(IntrinsicInst &I) {
3506 IRBuilder<> IRB(&I);
3507 Value *Op = I.getArgOperand(0);
3508 Type *OpType = Op->getType();
3509 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3510 getShadow(Op)));
3511 setOrigin(&I, getOrigin(Op));
3512 }
3513
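The bswap shadow rule above can be sketched as a standalone model (plain C++, not the LLVM API; the helper names are hypothetical): since each shadow byte travels with its data byte, the shadow of bswap(x) is simply bswap(shadow(x)).

```cpp
#include <cassert>
#include <cstdint>

// Portable 32-bit byte swap, standing in for the llvm.bswap intrinsic.
uint32_t bswap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000ff00u) | ((v << 8) & 0x00ff0000u) |
         (v << 24);
}

// Shadow propagation for bswap: apply the same byte swap to the shadow.
uint32_t bswapShadow(uint32_t srcShadow) { return bswap32(srcShadow); }
```

For example, if only the lowest byte of the input is uninitialized (shadow 0x000000ff), the highest byte of the result is uninitialized.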
3514 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3515 // and a 1. If the input is all zero, it is fully initialized iff
3516 // !is_zero_poison.
3517 //
3518 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3519 // concrete value 0/1, and ? is an uninitialized bit:
3520 // - 0001 0??? is fully initialized
3521 // - 000? ???? is fully uninitialized (*)
3522 // - ???? ???? is fully uninitialized
3523 // - 0000 0000 is fully uninitialized if is_zero_poison,
3524 // fully initialized otherwise
3525 //
3526 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3527 // only need to poison 4 bits.
3528 //
3529 // OutputShadow =
3530 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3531 // || (is_zero_poison && AllZeroSrc)
3532 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3533 IRBuilder<> IRB(&I);
3534 Value *Src = I.getArgOperand(0);
3535 Value *SrcShadow = getShadow(Src);
3536
3537 Value *False = IRB.getInt1(false);
3538 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3539 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3540 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3541 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3542
3543 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3544 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3545
3546 Value *NotAllZeroShadow =
3547 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3548 Value *OutputShadow =
3549 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3550
3551 // If zero poison is requested, mix in with the shadow
3552 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3553 if (!IsZeroPoison->isNullValue()) {
3554 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3555 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3556 }
3557
3558 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3559
3560 setShadow(&I, OutputShadow);
3561 setOriginForNaryOp(I);
3562 }
3563
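The OutputShadow formula above can be modeled for an 8-bit ctlz as follows (a plain C++ sketch, not the instrumentation itself; the names are hypothetical): the result is poisoned iff the run of concrete leading zeros reaches the first poisoned bit, or the input is all zero while is_zero_poison is set.

```cpp
#include <cassert>
#include <cstdint>

// Count leading zeros of an 8-bit value (clz8(0) == 8).
int clz8(uint8_t v) {
  int n = 0;
  for (int bit = 7; bit >= 0 && !((v >> bit) & 1); --bit)
    ++n;
  return n;
}

// Whether the result of ctlz(src) is poisoned, given src's shadow.
bool ctlzResultPoisoned(uint8_t src, uint8_t shadow, bool is_zero_poison) {
  // ConcreteZerosCount >= ShadowZerosCount: the concrete leading zeros
  // extend at least up to the first poisoned bit.
  bool cmp = clz8(src) >= clz8(shadow);
  bool out = cmp && shadow != 0; // !AllZeroShadow
  if (is_zero_poison)
    out = out || (src == 0); // ctlz(0) is itself poison
  return out;
}
```

This reproduces the worked examples in the comment: `0001 0???` is fully initialized, while `000? ????` is poisoned.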
3564 /// Handle Arm NEON vector convert intrinsics.
3565 ///
3566 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3567 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64 (double)
3568 ///
3569 /// For conversions to or from fixed-point, there is a trailing argument to
3570 /// indicate the fixed-point precision:
3571 /// - <4 x float> llvm.aarch64.neon.vcvtfxs2fp.v4f32.v4i32(<4 x i32>, i32)
3572 /// - <4 x i32> llvm.aarch64.neon.vcvtfp2fxu.v4i32.v4f32(<4 x float>, i32)
3573 ///
3574 /// For x86 SSE vector convert intrinsics, see
3575 /// handleSSEVectorConvertIntrinsic().
3576 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I, bool FixedPoint) {
3577 if (FixedPoint)
3578 assert(I.arg_size() == 2);
3579 else
3580 assert(I.arg_size() == 1);
3581
3582 IRBuilder<> IRB(&I);
3583 Value *S0 = getShadow(&I, 0);
3584
3585 if (FixedPoint) {
3586 Value *Precision = I.getOperand(1);
3587 insertCheckShadowOf(Precision, &I);
3588 }
3589
3590 /// For scalars:
3591 /// Since they are converting from floating-point to integer, the output is
3592 /// - fully uninitialized if *any* bit of the input is uninitialized
3593 /// - fully initialized if all bits of the input are initialized
3594 /// We apply the same principle on a per-field basis for vectors.
3595 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3596 getShadowTy(&I));
3597 setShadow(&I, OutShadow);
3598 setOriginForNaryOp(I);
3599 }
3600
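The per-element rule in the function body (sext(icmp ne S0, 0)) amounts to this small model (a plain C++ sketch with hypothetical names): each output element is all-ones if its input element has any poisoned bit, all-zeros otherwise.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Strict per-element conversion shadow: any poisoned bit in an input
// element fully poisons the corresponding output element.
std::vector<uint32_t> convertShadow(const std::vector<uint32_t> &srcShadow) {
  std::vector<uint32_t> out;
  for (uint32_t s : srcShadow)
    out.push_back(s ? 0xffffffffu : 0u); // sext(s != 0)
  return out;
}
```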
3601 /// Some instructions have additional zero-elements in the return type
3602 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3603 ///
3604 /// This function will return a vector type with the same number of elements
3605 /// as the input, but same per-element width as the return value e.g.,
3606 /// <8 x i8>.
3607 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3608 assert(isa<FixedVectorType>(getShadowTy(&I)));
3609 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3610
3611 // TODO: generalize beyond 2x?
3612 if (ShadowType->getElementCount() ==
3613 cast<VectorType>(Src->getType())->getElementCount() * 2)
3614 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3615
3616 assert(ShadowType->getElementCount() ==
3617 cast<VectorType>(Src->getType())->getElementCount());
3618
3619 return ShadowType;
3620 }
3621
3622 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3623 /// to match the length of the shadow for the instruction.
3624 /// If scalar types of the vectors are different, it will use the type of the
3625 /// input vector.
3626 /// This is more type-safe than CreateShadowCast().
3627 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3628 IRBuilder<> IRB(&I);
3630 assert(isa<FixedVectorType>(I.getType()));
3631
3632 Value *FullShadow = getCleanShadow(&I);
3633 unsigned ShadowNumElems =
3634 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3635 unsigned FullShadowNumElems =
3636 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3637
3638 assert((ShadowNumElems == FullShadowNumElems) ||
3639 (ShadowNumElems * 2 == FullShadowNumElems));
3640
3641 if (ShadowNumElems == FullShadowNumElems) {
3642 FullShadow = Shadow;
3643 } else {
3644 // TODO: generalize beyond 2x?
3645 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3646 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3647
3648 // Append zeros
3649 FullShadow =
3650 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3651 }
3652
3653 return FullShadow;
3654 }
3655
3656 /// Handle x86 SSE vector conversion.
3657 ///
3658 /// e.g., single-precision to half-precision conversion:
3659 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3660 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3661 ///
3662 /// floating-point to integer:
3663 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3664 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3665 ///
3666 /// Note: if the output has more elements, they are zero-initialized (and
3667 /// therefore the shadow will also be initialized).
3668 ///
3669 /// This differs from handleSSEVectorConvertIntrinsic() because it
3670 /// propagates uninitialized shadow (instead of checking the shadow).
3671 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3672 bool HasRoundingMode) {
3673 if (HasRoundingMode) {
3674 assert(I.arg_size() == 2);
3675 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3676 assert(RoundingMode->getType()->isIntegerTy());
3677 } else {
3678 assert(I.arg_size() == 1);
3679 }
3680
3681 Value *Src = I.getArgOperand(0);
3682 assert(Src->getType()->isVectorTy());
3683
3684 // The return type might have more elements than the input.
3685 // Temporarily shrink the return type's number of elements.
3686 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3687
3688 IRBuilder<> IRB(&I);
3689 Value *S0 = getShadow(&I, 0);
3690
3691 /// For scalars:
3692 /// Since they are converting to and/or from floating-point, the output is:
3693 /// - fully uninitialized if *any* bit of the input is uninitialized
3694 /// - fully initialized if all bits of the input are initialized
3695 /// We apply the same principle on a per-field basis for vectors.
3696 Value *Shadow =
3697 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3698
3699 // The return type might have more elements than the input.
3700 // Extend the return type back to its original width if necessary.
3701 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3702
3703 setShadow(&I, FullShadow);
3704 setOriginForNaryOp(I);
3705 }
3706
3707 // Instrument x86 SSE vector convert intrinsic.
3708 //
3709 // This function instruments intrinsics like cvtsi2ss:
3710 // %Out = int_xxx_cvtyyy(%ConvertOp)
3711 // or
3712 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3713 // The intrinsic converts \p NumUsedElements elements of \p ConvertOp to the
3714 // same number of \p Out elements, and (if it has 2 arguments) copies the rest
3715 // of the elements from \p CopyOp.
3716 // In most cases the conversion involves a floating-point value, which may
3717 // trigger a hardware exception when not fully initialized. For this reason
3718 // we require \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3719 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3720 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3721 // return a fully initialized value.
3722 //
3723 // For Arm NEON vector convert intrinsics, see
3724 // handleNEONVectorConvertIntrinsic().
3725 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3726 bool HasRoundingMode = false) {
3727 IRBuilder<> IRB(&I);
3728 Value *CopyOp, *ConvertOp;
3729
3730 assert((!HasRoundingMode ||
3731 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3732 "Invalid rounding mode");
3733
3734 switch (I.arg_size() - HasRoundingMode) {
3735 case 2:
3736 CopyOp = I.getArgOperand(0);
3737 ConvertOp = I.getArgOperand(1);
3738 break;
3739 case 1:
3740 ConvertOp = I.getArgOperand(0);
3741 CopyOp = nullptr;
3742 break;
3743 default:
3744 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3745 }
3746
3747 // The first *NumUsedElements* elements of ConvertOp are converted to the
3748 // same number of output elements. The rest of the output is copied from
3749 // CopyOp, or (if not available) filled with zeroes.
3750 // Combine shadow for elements of ConvertOp that are used in this operation,
3751 // and insert a check.
3752 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3753 // int->any conversion.
3754 Value *ConvertShadow = getShadow(ConvertOp);
3755 Value *AggShadow = nullptr;
3756 if (ConvertOp->getType()->isVectorTy()) {
3757 AggShadow = IRB.CreateExtractElement(
3758 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3759 for (int i = 1; i < NumUsedElements; ++i) {
3760 Value *MoreShadow = IRB.CreateExtractElement(
3761 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3762 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3763 }
3764 } else {
3765 AggShadow = ConvertShadow;
3766 }
3767 assert(AggShadow->getType()->isIntegerTy());
3768 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3769
3770 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3771 // ConvertOp.
3772 if (CopyOp) {
3773 assert(CopyOp->getType() == I.getType());
3774 assert(CopyOp->getType()->isVectorTy());
3775 Value *ResultShadow = getShadow(CopyOp);
3776 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3777 for (int i = 0; i < NumUsedElements; ++i) {
3778 ResultShadow = IRB.CreateInsertElement(
3779 ResultShadow, ConstantInt::getNullValue(EltTy),
3780 ConstantInt::get(IRB.getInt32Ty(), i));
3781 }
3782 setShadow(&I, ResultShadow);
3783 setOrigin(&I, getOrigin(CopyOp));
3784 } else {
3785 setShadow(&I, getCleanShadow(&I));
3786 setOrigin(&I, getCleanOrigin());
3787 }
3788 }
3789
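The resulting shadow can be summarized in a small model (plain C++ sketch, hypothetical names): the converted elements were checked, so they become clean in the result, while the remaining elements keep CopyOp's shadow.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Result shadow of a cvt-style intrinsic with a CopyOp: the first
// numUsedElements (the converted lanes) are clean because their inputs were
// checked; the rest inherit CopyOp's shadow.
std::vector<uint32_t> cvtResultShadow(const std::vector<uint32_t> &copyShadow,
                                      int numUsedElements) {
  std::vector<uint32_t> out = copyShadow;
  for (int i = 0; i < numUsedElements; ++i)
    out[i] = 0;
  return out;
}
```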
3790 // Given a scalar or vector, extract the lower 64 bits (or fewer), and return
3791 // all zeroes if it is zero, and all ones otherwise.
3792 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3793 if (S->getType()->isVectorTy())
3794 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3795 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3796 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3797 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3798 }
3799
3800 // Given a vector, extract its first element, and return all
3801 // zeroes if it is zero, and all ones otherwise.
3802 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3803 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3804 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3805 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3806 }
3807
3808 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3809 Type *T = S->getType();
3810 assert(T->isVectorTy());
3811 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3812 return IRB.CreateSExt(S2, T);
3813 }
3814
3815 // Instrument vector shift intrinsic.
3816 //
3817 // This function instruments intrinsics like int_x86_avx2_psll_w.
3818 // Intrinsic shifts %In by %ShiftSize bits.
3819 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3820 // size, and the rest is ignored. Behavior is defined even if shift size is
3821 // greater than register (or field) width.
3822 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3823 assert(I.arg_size() == 2);
3824 IRBuilder<> IRB(&I);
3825 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3826 // Otherwise perform the same shift on S1.
3827 Value *S1 = getShadow(&I, 0);
3828 Value *S2 = getShadow(&I, 1);
3829 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3830 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3831 Value *V1 = I.getOperand(0);
3832 Value *V2 = I.getOperand(1);
3833 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3834 {IRB.CreateBitCast(S1, V1->getType()), V2});
3835 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3836 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3837 setOriginForNaryOp(I);
3838 }
3839
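A scalar sketch of this rule for a 16-bit left shift (plain C++, hypothetical names): the shadow is shifted exactly like the data, and any poison in the shift amount poisons every bit of the result.

```cpp
#include <cassert>
#include <cstdint>

// Shadow of (v1 << amount), where s1 is v1's shadow and s2 is the shift
// amount's shadow: shift s1 the same way, then OR in sext(s2 != 0).
uint16_t shiftShadow(uint16_t s1, unsigned amount, uint64_t s2) {
  uint16_t shifted = (uint16_t)(s1 << amount); // same shift applied to shadow
  uint16_t s2conv = s2 ? 0xffffu : 0;          // sext(icmp ne s2, 0)
  return shifted | s2conv;
}
```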
3840 // Get an MMX-sized (64-bit) vector type, or optionally, other sized
3841 // vectors.
3842 Type *getMMXVectorTy(unsigned EltSizeInBits,
3843 unsigned X86_MMXSizeInBits = 64) {
3844 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3845 "Illegal MMX vector element size");
3846 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3847 X86_MMXSizeInBits / EltSizeInBits);
3848 }
3849
3850 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3851 // intrinsic.
3852 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3853 switch (id) {
3854 case Intrinsic::x86_sse2_packsswb_128:
3855 case Intrinsic::x86_sse2_packuswb_128:
3856 return Intrinsic::x86_sse2_packsswb_128;
3857
3858 case Intrinsic::x86_sse2_packssdw_128:
3859 case Intrinsic::x86_sse41_packusdw:
3860 return Intrinsic::x86_sse2_packssdw_128;
3861
3862 case Intrinsic::x86_avx2_packsswb:
3863 case Intrinsic::x86_avx2_packuswb:
3864 return Intrinsic::x86_avx2_packsswb;
3865
3866 case Intrinsic::x86_avx2_packssdw:
3867 case Intrinsic::x86_avx2_packusdw:
3868 return Intrinsic::x86_avx2_packssdw;
3869
3870 case Intrinsic::x86_mmx_packsswb:
3871 case Intrinsic::x86_mmx_packuswb:
3872 return Intrinsic::x86_mmx_packsswb;
3873
3874 case Intrinsic::x86_mmx_packssdw:
3875 return Intrinsic::x86_mmx_packssdw;
3876
3877 case Intrinsic::x86_avx512_packssdw_512:
3878 case Intrinsic::x86_avx512_packusdw_512:
3879 return Intrinsic::x86_avx512_packssdw_512;
3880
3881 case Intrinsic::x86_avx512_packsswb_512:
3882 case Intrinsic::x86_avx512_packuswb_512:
3883 return Intrinsic::x86_avx512_packsswb_512;
3884
3885 default:
3886 llvm_unreachable("unexpected intrinsic id");
3887 }
3888 }
3889
3890 // Instrument vector pack intrinsic.
3891 //
3892 // This function instruments intrinsics like x86_mmx_packsswb, that
3893 // packs elements of 2 input vectors into half as many bits with saturation.
3894 // Shadow is propagated with the signed variant of the same intrinsic applied
3895 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3896 // MMXEltSizeInBits is used only for x86mmx arguments.
3897 //
3898 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
3899 void handleVectorPackIntrinsic(IntrinsicInst &I,
3900 unsigned MMXEltSizeInBits = 0) {
3901 assert(I.arg_size() == 2);
3902 IRBuilder<> IRB(&I);
3903 Value *S1 = getShadow(&I, 0);
3904 Value *S2 = getShadow(&I, 1);
3905 assert(S1->getType()->isVectorTy());
3906
3907 // SExt and ICmpNE below must apply to individual elements of input vectors.
3908 // In case of x86mmx arguments, cast them to appropriate vector types and
3909 // back.
3910 Type *T =
3911 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3912 if (MMXEltSizeInBits) {
3913 S1 = IRB.CreateBitCast(S1, T);
3914 S2 = IRB.CreateBitCast(S2, T);
3915 }
3917 Value *S1_ext = IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3919 Value *S2_ext = IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3920 if (MMXEltSizeInBits) {
3921 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3922 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3923 }
3924
3925 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3926 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3927 "_msprop_vector_pack");
3928 if (MMXEltSizeInBits)
3929 S = IRB.CreateBitCast(S, getShadowTy(&I));
3930 setShadow(&I, S);
3931 setOriginForNaryOp(I);
3932 }
3933
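Why the signed pack of sext(S != 0) is the right shadow, as a plain C++ sketch (hypothetical names): a per-word shadow of 0 stays 0x00 under signed saturation, and a shadow of -1 saturates to 0xff, so each packed byte is fully poisoned iff its source word had any poisoned bit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Shadow of packsswb-style packing: concatenate the two inputs, mapping each
// 16-bit shadow word to 0x00 (clean) or 0xff (poisoned).
std::vector<uint8_t> packShadow(const std::vector<uint16_t> &s1,
                                const std::vector<uint16_t> &s2) {
  std::vector<uint8_t> out;
  for (uint16_t s : s1)
    out.push_back(s ? 0xff : 0x00); // packsswb(sext(s != 0)) per element
  for (uint16_t s : s2)
    out.push_back(s ? 0xff : 0x00);
  return out;
}
```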
3934 // Convert `Mask` into `<n x i1>`.
3935 Constant *createDppMask(unsigned Width, unsigned Mask) {
3936 SmallVector<Constant *, 4> R(Width);
3937 for (auto &M : R) {
3938 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3939 Mask >>= 1;
3940 }
3941 return ConstantVector::get(R);
3942 }
3943
3944 // Calculate output shadow as an array of booleans `<n x i1>`, assuming that
3945 // if any arg is poisoned, the entire dot product is poisoned.
3946 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3947 unsigned DstMask) {
3948 const unsigned Width =
3949 cast<FixedVectorType>(S->getType())->getNumElements();
3950
3951 S = IRB.CreateSelect(createDppMask(Width, SrcMask), S, Constant::getNullValue(S->getType()));
3953 Value *SElem = IRB.CreateOrReduce(S);
3954 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3955 Value *DstMaskV = createDppMask(Width, DstMask);
3956
3957 return IRB.CreateSelect(
3958 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3959 }
3960
3961 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3962 //
3963 // The 2 and 4 element versions produce a single scalar dot product, then put
3964 // it into the elements of the output vector selected by the 4 lowest bits of
3965 // the mask. The top 4 bits of the mask control which elements of the input to
3966 // use for the dot product.
3967 //
3968 // The 8 element version's mask still has only 4 bits for input and 4 bits for
3969 // the output mask. According to the spec, it operates as the 4 element version
3970 // on the first 4 elements of the inputs and output, and then on the last 4
3971 // elements of the inputs and output.
3972 void handleDppIntrinsic(IntrinsicInst &I) {
3973 IRBuilder<> IRB(&I);
3974
3975 Value *S0 = getShadow(&I, 0);
3976 Value *S1 = getShadow(&I, 1);
3977 Value *S = IRB.CreateOr(S0, S1);
3978
3979 const unsigned Width =
3980 cast<FixedVectorType>(S->getType())->getNumElements();
3981 assert(Width == 2 || Width == 4 || Width == 8);
3982
3983 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3984 const unsigned SrcMask = Mask >> 4;
3985 const unsigned DstMask = Mask & 0xf;
3986
3987 // Calculate shadow as `<n x i1>`.
3988 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3989 if (Width == 8) {
3990 // First 4 elements of shadow are already calculated. `findDppPoisonedOutput`
3991 // operates on 32 bit masks, so we can just shift masks, and repeat.
3992 SI1 = IRB.CreateOr(
3993 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3994 }
3995 // Extend to real size of shadow, poisoning either all or none bits of an
3996 // element.
3997 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3998
3999 setShadow(&I, S);
4000 setOriginForNaryOp(I);
4001 }
4002
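A 4-element model of the rule (plain C++ sketch, hypothetical names): OR the two input shadows, reduce over the source elements selected by the top mask nibble, and if anything selected was poisoned, poison exactly the output elements selected by the bottom nibble.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Shadow for a 4-element dpp with an 8-bit mask (high nibble: sources,
// low nibble: destinations).
std::array<uint32_t, 4> dppShadow(std::array<uint32_t, 4> s0,
                                  std::array<uint32_t, 4> s1, unsigned mask) {
  unsigned srcMask = mask >> 4, dstMask = mask & 0xf;
  uint32_t anyPoison = 0;
  for (int i = 0; i < 4; ++i)
    if (srcMask & (1u << i))
      anyPoison |= s0[i] | s1[i]; // only selected sources matter
  std::array<uint32_t, 4> out{};
  for (int i = 0; i < 4; ++i)
    out[i] = (anyPoison && (dstMask & (1u << i))) ? 0xffffffffu : 0;
  return out;
}
```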
4003 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
4004 C = CreateAppToShadowCast(IRB, C);
4005 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
4006 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
4007 C = IRB.CreateAShr(C, ElSize - 1);
4008 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
4009 return IRB.CreateTrunc(C, FVT);
4010 }
4011
4012 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
4013 void handleBlendvIntrinsic(IntrinsicInst &I) {
4014 Value *C = I.getOperand(2);
4015 Value *T = I.getOperand(1);
4016 Value *F = I.getOperand(0);
4017
4018 Value *Sc = getShadow(&I, 2);
4019 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
4020
4021 {
4022 IRBuilder<> IRB(&I);
4023 // Extract top bit from condition and its shadow.
4024 C = convertBlendvToSelectMask(IRB, C);
4025 Sc = convertBlendvToSelectMask(IRB, Sc);
4026
4027 setShadow(C, Sc);
4028 setOrigin(C, Oc);
4029 }
4030
4031 handleSelectLikeInst(I, C, T, F);
4032 }
4033
4034 // Instrument sum-of-absolute-differences intrinsic.
4035 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
4036 const unsigned SignificantBitsPerResultElement = 16;
4037 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
4038 unsigned ZeroBitsPerResultElement =
4039 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
4040
4041 IRBuilder<> IRB(&I);
4042 auto *Shadow0 = getShadow(&I, 0);
4043 auto *Shadow1 = getShadow(&I, 1);
4044 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4045 S = IRB.CreateBitCast(S, ResTy);
4046 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
4047 ResTy);
4048 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
4049 S = IRB.CreateBitCast(S, getShadowTy(&I));
4050 setShadow(&I, S);
4051 setOriginForNaryOp(I);
4052 }
4053
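Per 64-bit output lane, the steps above reduce to this sketch (plain C++, hypothetical name): OR the lane shadows, widen any poison to all 64 bits, then shift right so only the 16 significant result bits stay poisoned.

```cpp
#include <cassert>
#include <cstdint>

// Shadow of one 64-bit psadbw lane: the upper 48 output bits are always
// zero, so only the low 16 bits can be poisoned, and they are poisoned iff
// any input bit in the lane is.
uint64_t sadLaneShadow(uint64_t sa, uint64_t sb) {
  uint64_t any = (sa | sb) ? ~0ull : 0; // sext(icmp ne (sa|sb), 0)
  return any >> 48;                     // lshr by ZeroBitsPerResultElement
}
```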
4054 // Instrument dot-product / multiply-add(-accumulate)? intrinsics.
4055 //
4056 // e.g., Two operands:
4057 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
4058 //
4059 // Two operands which require an EltSizeInBits override:
4060 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
4061 //
4062 // Three operands:
4063 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
4064 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
4065 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
4066 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
4067 // (these are equivalent to multiply-add on %a and %b, followed by
4068 // adding/"accumulating" %s. "Accumulation" stores the result in one
4069 // of the source registers, but this accumulate vs. add distinction
4070 // is lost when dealing with LLVM intrinsics.)
4071 //
4072 // ZeroPurifies means that multiplying a known-zero with an uninitialized
4073 // value results in an initialized value. This is applicable for integer
4074 // multiplication, but not floating-point (counter-example: NaN).
4075 void handleVectorDotProductIntrinsic(IntrinsicInst &I,
4076 unsigned ReductionFactor,
4077 bool ZeroPurifies,
4078 unsigned EltSizeInBits,
4079 enum OddOrEvenLanes Lanes) {
4080 IRBuilder<> IRB(&I);
4081
4082 [[maybe_unused]] FixedVectorType *ReturnType =
4083 cast<FixedVectorType>(I.getType());
4084 assert(isa<FixedVectorType>(ReturnType));
4085
4086 // Vectors A and B, and shadows
4087 Value *Va = nullptr;
4088 Value *Vb = nullptr;
4089 Value *Sa = nullptr;
4090 Value *Sb = nullptr;
4091
4092 assert(I.arg_size() == 2 || I.arg_size() == 3);
4093 if (I.arg_size() == 2) {
4094 assert(Lanes == kBothLanes);
4095
4096 Va = I.getOperand(0);
4097 Vb = I.getOperand(1);
4098
4099 Sa = getShadow(&I, 0);
4100 Sb = getShadow(&I, 1);
4101 } else if (I.arg_size() == 3) {
4102 // Operand 0 is the accumulator. We will deal with that below.
4103 Va = I.getOperand(1);
4104 Vb = I.getOperand(2);
4105
4106 Sa = getShadow(&I, 1);
4107 Sb = getShadow(&I, 2);
4108
4109 if (Lanes == kEvenLanes || Lanes == kOddLanes) {
4110 // Convert < S0, S1, S2, S3, S4, S5, S6, S7 >
4111 // to < S0, S0, S2, S2, S4, S4, S6, S6 > (if even)
4112 // to < S1, S1, S3, S3, S5, S5, S7, S7 > (if odd)
4113 //
4114 // Note: for aarch64.neon.bfmlalb/t, the odd/even-indexed values are
4115 // zeroed, not duplicated. However, for shadow propagation, this
4116 // distinction is unimportant because Step 1 below will squeeze
4117 // each pair of elements (e.g., [S0, S0]) into a single bit, and
4118 // we only care if it is fully initialized.
4119
4120 FixedVectorType *InputShadowType = cast<FixedVectorType>(Sa->getType());
4121 unsigned Width = InputShadowType->getNumElements();
4122
4123 Sa = IRB.CreateShuffleVector(
4124 Sa, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4125 Sb = IRB.CreateShuffleVector(
4126 Sb, getPclmulMask(Width, /*OddElements=*/Lanes == kOddLanes));
4127 }
4128 }
4129
4130 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
4131 assert(ParamType == Vb->getType());
4132
4133 assert(ParamType->getPrimitiveSizeInBits() ==
4134 ReturnType->getPrimitiveSizeInBits());
4135
4136 if (I.arg_size() == 3) {
4137 [[maybe_unused]] auto *AccumulatorType =
4138 cast<FixedVectorType>(I.getOperand(0)->getType());
4139 assert(AccumulatorType == ReturnType);
4140 }
4141
4142 FixedVectorType *ImplicitReturnType =
4143 cast<FixedVectorType>(getShadowTy(ReturnType));
4144 // Step 1: instrument multiplication of corresponding vector elements
4145 if (EltSizeInBits) {
4146 ImplicitReturnType = cast<FixedVectorType>(
4147 getMMXVectorTy(EltSizeInBits * ReductionFactor,
4148 ParamType->getPrimitiveSizeInBits()));
4149 ParamType = cast<FixedVectorType>(
4150 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
4151
4152 Va = IRB.CreateBitCast(Va, ParamType);
4153 Vb = IRB.CreateBitCast(Vb, ParamType);
4154
4155 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
4156 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
4157 } else {
4158 assert(ParamType->getNumElements() ==
4159 ReturnType->getNumElements() * ReductionFactor);
4160 }
4161
4162 // Each element of the vector is represented by a single bit (poisoned or
4163 // not) e.g., <8 x i1>.
4164 Value *SaNonZero = IRB.CreateIsNotNull(Sa);
4165 Value *SbNonZero = IRB.CreateIsNotNull(Sb);
4166 Value *And;
4167 if (ZeroPurifies) {
4168 // Multiplying an *initialized* zero by an uninitialized element results
4169 // in an initialized zero element.
4170 //
4171 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4172 // results in an unpoisoned value.
4173 Value *VaInt = Va;
4174 Value *VbInt = Vb;
4175 if (!Va->getType()->isIntegerTy()) {
4176 VaInt = CreateAppToShadowCast(IRB, Va);
4177 VbInt = CreateAppToShadowCast(IRB, Vb);
4178 }
4179
4180 // We check for non-zero on a per-element basis, not per-bit.
4181 Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
4182 Value *VbNonZero = IRB.CreateIsNotNull(VbInt);
4183
4184 And = handleBitwiseAnd(IRB, VaNonZero, VbNonZero, SaNonZero, SbNonZero);
4185 } else {
4186 And = IRB.CreateOr({SaNonZero, SbNonZero});
4187 }
4188
4189 // Extend <8 x i1> to <8 x i16>.
4190 // (The real pmadd intrinsic would have computed intermediate values of
4191 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4192 // consider each element to be either fully initialized or fully
4193 // uninitialized.)
4194 And = IRB.CreateSExt(And, Sa->getType());
4195
4196 // Step 2: instrument horizontal add
4197 // We don't need bit-precise horizontalReduce because we only want to check
4198 // if each pair/quad of elements is fully zero.
4199 // Cast to <4 x i32>.
4200 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4201
4202 // Compute <4 x i1>, then extend back to <4 x i32>.
4203 Value *OutShadow = IRB.CreateSExt(
4204 IRB.CreateICmpNE(Horizontal,
4205 Constant::getNullValue(Horizontal->getType())),
4206 ImplicitReturnType);
4207
4208 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4209 // AVX, it is already correct).
4210 if (EltSizeInBits)
4211 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4212
4213 // Step 3 (if applicable): instrument accumulator
4214 if (I.arg_size() == 3)
4215 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4216
4217 setShadow(&I, OutShadow);
4218 setOriginForNaryOp(I);
4219 }
4220
4221 // Instrument compare-packed intrinsic.
4222 //
4223 // x86 has the predicate as the third operand, which is ImmArg e.g.,
4224 // - <4 x double> @llvm.x86.avx.cmp.pd.256(<4 x double>, <4 x double>, i8)
4225 // - <2 x double> @llvm.x86.sse2.cmp.pd(<2 x double>, <2 x double>, i8)
4226 //
4227 // while Arm has separate intrinsics for >= and > e.g.,
4228 // - <2 x i32> @llvm.aarch64.neon.facge.v2i32.v2f32
4229 // (<2 x float> %A, <2 x float>)
4230 // - <2 x i32> @llvm.aarch64.neon.facgt.v2i32.v2f32
4231 // (<2 x float> %A, <2 x float>)
4232 //
4233 // Bonus: this also handles scalar cases e.g.,
4234 // - i32 @llvm.aarch64.neon.facgt.i32.f32(float %A, float %B)
4235 void handleVectorComparePackedIntrinsic(IntrinsicInst &I,
4236 bool PredicateAsOperand) {
4237 if (PredicateAsOperand) {
4238 assert(I.arg_size() == 3);
4239 assert(I.paramHasAttr(2, Attribute::ImmArg));
4240 } else
4241 assert(I.arg_size() == 2);
4242
4243 IRBuilder<> IRB(&I);
4244
4245 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4246 // all-ones shadow.
4247 Type *ResTy = getShadowTy(&I);
4248 auto *Shadow0 = getShadow(&I, 0);
4249 auto *Shadow1 = getShadow(&I, 1);
4250 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4251 Value *S = IRB.CreateSExt(
4252 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4253 setShadow(&I, S);
4254 setOriginForNaryOp(I);
4255 }
4256
4257 // Instrument compare-scalar intrinsic.
4258 // This handles both cmp* intrinsics which return the result in the first
4259 // element of a vector, and comi* which return the result as i32.
4260 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4261 IRBuilder<> IRB(&I);
4262 auto *Shadow0 = getShadow(&I, 0);
4263 auto *Shadow1 = getShadow(&I, 1);
4264 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4265 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4266 setShadow(&I, S);
4267 setOriginForNaryOp(I);
4268 }
4269
4270 // Instrument generic vector reduction intrinsics
4271 // by ORing together all their fields.
4272 //
4273 // If AllowShadowCast is true, the return type does not need to be the same
4274 // type as the fields
4275 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4276 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4277 assert(I.arg_size() == 1);
4278
4279 IRBuilder<> IRB(&I);
4280 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4281 if (AllowShadowCast)
4282 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4283 else
4284 assert(S->getType() == getShadowTy(&I));
4285 setShadow(&I, S);
4286 setOriginForNaryOp(I);
4287 }
4288
4289 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4290 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4291 // %a1)
4292 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4293 //
4294 // The type of the return value, initial starting value, and elements of the
4295 // vector must be identical.
4296 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4297 assert(I.arg_size() == 2);
4298
4299 IRBuilder<> IRB(&I);
4300 Value *Shadow0 = getShadow(&I, 0);
4301 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4302 assert(Shadow0->getType() == Shadow1->getType());
4303 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4304 assert(S->getType() == getShadowTy(&I));
4305 setShadow(&I, S);
4306 setOriginForNaryOp(I);
4307 }
4308
4309 // Instrument vector.reduce.or intrinsic.
4310 // Valid (non-poisoned) set bits in the operand pull low the
4311 // corresponding shadow bits.
4312 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4313 assert(I.arg_size() == 1);
4314
4315 IRBuilder<> IRB(&I);
4316 Value *OperandShadow = getShadow(&I, 0);
4317 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4318 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4319 // Bit N is clean if any field's bit N is 1 and unpoisoned
4320 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4321 // Otherwise, it is clean if every field's bit N is unpoisoned
4322 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4323 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4324
4325 setShadow(&I, S);
4326 setOrigin(&I, getOrigin(&I, 0));
4327 }
4328
4329 // Instrument vector.reduce.and intrinsic.
4330 // Valid (non-poisoned) unset bits in the operand pull down the
4331 // corresponding shadow bits.
4332 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4333 assert(I.arg_size() == 1);
4334
4335 IRBuilder<> IRB(&I);
4336 Value *OperandShadow = getShadow(&I, 0);
4337 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4338 // Bit N is clean if any field's bit N is 0 and unpoisoned
4339 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4340 // Otherwise, it is clean if every field's bit N is unpoisoned
4341 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4342 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4343
4344 setShadow(&I, S);
4345 setOrigin(&I, getOrigin(&I, 0));
4346 }
4347
4348 void handleStmxcsr(IntrinsicInst &I) {
4349 IRBuilder<> IRB(&I);
4350 Value *Addr = I.getArgOperand(0);
4351 Type *Ty = IRB.getInt32Ty();
4352 Value *ShadowPtr =
4353 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4354
4355 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4356
4357 if (ClCheckAccessAddress)
4358 insertCheckShadowOf(Addr, &I);
4359 }
4360
4361 void handleLdmxcsr(IntrinsicInst &I) {
4362 if (!InsertChecks)
4363 return;
4364
4365 IRBuilder<> IRB(&I);
4366 Value *Addr = I.getArgOperand(0);
4367 Type *Ty = IRB.getInt32Ty();
4368 const Align Alignment = Align(1);
4369 Value *ShadowPtr, *OriginPtr;
4370 std::tie(ShadowPtr, OriginPtr) =
4371 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4372
4373 if (ClCheckAccessAddress)
4374 insertCheckShadowOf(Addr, &I);
4375
4376 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4377 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4378 : getCleanOrigin();
4379 insertCheckShadow(Shadow, Origin, &I);
4380 }
4381
4382 void handleMaskedExpandLoad(IntrinsicInst &I) {
4383 IRBuilder<> IRB(&I);
4384 Value *Ptr = I.getArgOperand(0);
4385 MaybeAlign Align = I.getParamAlign(0);
4386 Value *Mask = I.getArgOperand(1);
4387 Value *PassThru = I.getArgOperand(2);
4388
4389 if (ClCheckAccessAddress) {
4390 insertCheckShadowOf(Ptr, &I);
4391 insertCheckShadowOf(Mask, &I);
4392 }
4393
4394 if (!PropagateShadow) {
4395 setShadow(&I, getCleanShadow(&I));
4396 setOrigin(&I, getCleanOrigin());
4397 return;
4398 }
4399
4400 Type *ShadowTy = getShadowTy(&I);
4401 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4402 auto [ShadowPtr, OriginPtr] =
4403 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4404
4405 Value *Shadow =
4406 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4407 getShadow(PassThru), "_msmaskedexpload");
4408
4409 setShadow(&I, Shadow);
4410
4411 // TODO: Store origins.
4412 setOrigin(&I, getCleanOrigin());
4413 }
4414
4415 void handleMaskedCompressStore(IntrinsicInst &I) {
4416 IRBuilder<> IRB(&I);
4417 Value *Values = I.getArgOperand(0);
4418 Value *Ptr = I.getArgOperand(1);
4419 MaybeAlign Align = I.getParamAlign(1);
4420 Value *Mask = I.getArgOperand(2);
4421
4422 if (ClCheckAccessAddress) {
4423 insertCheckShadowOf(Ptr, &I);
4424 insertCheckShadowOf(Mask, &I);
4425 }
4426
4427 Value *Shadow = getShadow(Values);
4428 Type *ElementShadowTy =
4429 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4430 auto [ShadowPtr, OriginPtrs] =
4431 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4432
4433 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4434
4435 // TODO: Store origins.
4436 }
4437
4438 void handleMaskedGather(IntrinsicInst &I) {
4439 IRBuilder<> IRB(&I);
4440 Value *Ptrs = I.getArgOperand(0);
4441 const Align Alignment = I.getParamAlign(0).valueOrOne();
4442 Value *Mask = I.getArgOperand(1);
4443 Value *PassThru = I.getArgOperand(2);
4444
4445 Type *PtrsShadowTy = getShadowTy(Ptrs);
4446 if (ClCheckAccessAddress) {
4447 insertCheckShadowOf(Mask, &I);
4448 Value *MaskedPtrShadow = IRB.CreateSelect(
4449 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4450 "_msmaskedptrs");
4451 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4452 }
4453
4454 if (!PropagateShadow) {
4455 setShadow(&I, getCleanShadow(&I));
4456 setOrigin(&I, getCleanOrigin());
4457 return;
4458 }
4459
4460 Type *ShadowTy = getShadowTy(&I);
4461 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4462 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4463 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4464
4465 Value *Shadow =
4466 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4467 getShadow(PassThru), "_msmaskedgather");
4468
4469 setShadow(&I, Shadow);
4470
4471 // TODO: Store origins.
4472 setOrigin(&I, getCleanOrigin());
4473 }
4474
4475 void handleMaskedScatter(IntrinsicInst &I) {
4476 IRBuilder<> IRB(&I);
4477 Value *Values = I.getArgOperand(0);
4478 Value *Ptrs = I.getArgOperand(1);
4479 const Align Alignment = I.getParamAlign(1).valueOrOne();
4480 Value *Mask = I.getArgOperand(2);
4481
4482 Type *PtrsShadowTy = getShadowTy(Ptrs);
4483 if (ClCheckAccessAddress) {
4484 insertCheckShadowOf(Mask, &I);
4485 Value *MaskedPtrShadow = IRB.CreateSelect(
4486 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4487 "_msmaskedptrs");
4488 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4489 }
4490
4491 Value *Shadow = getShadow(Values);
4492 Type *ElementShadowTy =
4493 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4494 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4495 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4496
4497 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4498
4499 // TODO: Store origin.
4500 }
4501
4502 // Intrinsic::masked_store
4503 //
4504 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4505 // stores are lowered to Intrinsic::masked_store.
4506 void handleMaskedStore(IntrinsicInst &I) {
4507 IRBuilder<> IRB(&I);
4508 Value *V = I.getArgOperand(0);
4509 Value *Ptr = I.getArgOperand(1);
4510 const Align Alignment = I.getParamAlign(1).valueOrOne();
4511 Value *Mask = I.getArgOperand(2);
4512 Value *Shadow = getShadow(V);
4513
4514 if (ClCheckAccessAddress) {
4515 insertCheckShadowOf(Ptr, &I);
4516 insertCheckShadowOf(Mask, &I);
4517 }
4518
4519 Value *ShadowPtr;
4520 Value *OriginPtr;
4521 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4522 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4523
4524 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4525
4526 if (!MS.TrackOrigins)
4527 return;
4528
4529 auto &DL = F.getDataLayout();
4530 paintOrigin(IRB, getOrigin(V), OriginPtr,
4531 DL.getTypeStoreSize(Shadow->getType()),
4532 std::max(Alignment, kMinOriginAlignment));
4533 }
4534
4535 // Intrinsic::masked_load
4536 //
4537 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4538 // loads are lowered to Intrinsic::masked_load.
4539 void handleMaskedLoad(IntrinsicInst &I) {
4540 IRBuilder<> IRB(&I);
4541 Value *Ptr = I.getArgOperand(0);
4542 const Align Alignment = I.getParamAlign(0).valueOrOne();
4543 Value *Mask = I.getArgOperand(1);
4544 Value *PassThru = I.getArgOperand(2);
4545
4546 if (ClCheckAccessAddress) {
4547 insertCheckShadowOf(Ptr, &I);
4548 insertCheckShadowOf(Mask, &I);
4549 }
4550
4551 if (!PropagateShadow) {
4552 setShadow(&I, getCleanShadow(&I));
4553 setOrigin(&I, getCleanOrigin());
4554 return;
4555 }
4556
4557 Type *ShadowTy = getShadowTy(&I);
4558 Value *ShadowPtr, *OriginPtr;
4559 std::tie(ShadowPtr, OriginPtr) =
4560 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4561 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4562 getShadow(PassThru), "_msmaskedld"));
4563
4564 if (!MS.TrackOrigins)
4565 return;
4566
4567 // Choose between PassThru's and the loaded value's origins.
4568 Value *MaskedPassThruShadow = IRB.CreateAnd(
4569 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4570
4571 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4572
4573 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4574 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4575
4576 setOrigin(&I, Origin);
4577 }
4578
4579 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4580 // dst mask src
4581 //
4582 // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4583 // by handleMaskedStore.
4584 //
4585 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4586 // vector of integers, unlike the LLVM masked intrinsics, which require a
4587 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4588 // mentions that the x86 backend does not know how to efficiently convert
4589 // from a vector of booleans back into the AVX mask format; therefore, they
4590 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4591 // intrinsics.
4592 void handleAVXMaskedStore(IntrinsicInst &I) {
4593 assert(I.arg_size() == 3);
4594
4595 IRBuilder<> IRB(&I);
4596
4597 Value *Dst = I.getArgOperand(0);
4598 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4599
4600 Value *Mask = I.getArgOperand(1);
4601 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4602
4603 Value *Src = I.getArgOperand(2);
4604 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4605
4606 const Align Alignment = Align(1);
4607
4608 Value *SrcShadow = getShadow(Src);
4609
4610 if (ClCheckAccessAddress) {
4611 insertCheckShadowOf(Dst, &I);
4612 insertCheckShadowOf(Mask, &I);
4613 }
4614
4615 Value *DstShadowPtr;
4616 Value *DstOriginPtr;
4617 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4618 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4619
4620 SmallVector<Value *, 2> ShadowArgs;
4621 ShadowArgs.append(1, DstShadowPtr);
4622 ShadowArgs.append(1, Mask);
4623 // The intrinsic may require floating-point but shadows can be arbitrary
4624 // bit patterns, of which some would be interpreted as "invalid"
4625 // floating-point values (NaN etc.); we assume the intrinsic will happily
4626 // copy them.
4627 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4628
4629 CallInst *CI =
4630 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4631 setShadow(&I, CI);
4632
4633 if (!MS.TrackOrigins)
4634 return;
4635
4636 // Approximation only
4637 auto &DL = F.getDataLayout();
4638 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4639 DL.getTypeStoreSize(SrcShadow->getType()),
4640 std::max(Alignment, kMinOriginAlignment));
4641 }
4642
4643 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4644 // return src mask
4645 //
4646 // Masked-off values are replaced with 0, which conveniently also represents
4647 // initialized memory.
4648 //
4649 // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4650 // by handleMaskedLoad.
4651 //
4652 // We do not combine this with handleMaskedLoad; see comment in
4653 // handleAVXMaskedStore for the rationale.
4654 //
4655 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4656 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4657 // parameter.
4658 void handleAVXMaskedLoad(IntrinsicInst &I) {
4659 assert(I.arg_size() == 2);
4660
4661 IRBuilder<> IRB(&I);
4662
4663 Value *Src = I.getArgOperand(0);
4664 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4665
4666 Value *Mask = I.getArgOperand(1);
4667 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4668
4669 const Align Alignment = Align(1);
4670
4671 if (ClCheckAccessAddress) {
4672 insertCheckShadowOf(Mask, &I);
4673 }
4674
4675 Type *SrcShadowTy = getShadowTy(Src);
4676 Value *SrcShadowPtr, *SrcOriginPtr;
4677 std::tie(SrcShadowPtr, SrcOriginPtr) =
4678 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4679
4680 SmallVector<Value *, 2> ShadowArgs;
4681 ShadowArgs.append(1, SrcShadowPtr);
4682 ShadowArgs.append(1, Mask);
4683
4684 CallInst *CI =
4685 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4686 // The AVX masked load intrinsics do not have integer variants. We use the
4687 // floating-point variants, which will happily copy the shadows even if
4688 // they are interpreted as "invalid" floating-point values (NaN etc.).
4689 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4690
4691 if (!MS.TrackOrigins)
4692 return;
4693
4694 // The "pass-through" value is always zero (initialized). To the extent
4695 // that that results in initialized aligned 4-byte chunks, the origin value
4696 // is ignored. It is therefore correct to simply copy the origin from src.
4697 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4698 setOrigin(&I, PtrSrcOrigin);
4699 }
4700
4701 // Test whether the mask indices are initialized, only checking the bits that
4702 // are actually used.
4703 //
4704 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4705 // used/checked.
4706 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4707 assert(isFixedIntVector(Idx));
4708 auto IdxVectorSize =
4709 cast<FixedVectorType>(Idx->getType())->getNumElements();
4710 assert(isPowerOf2_64(IdxVectorSize));
4711
4712 // Compiler isn't smart enough, let's help it
4713 if (isa<Constant>(Idx))
4714 return;
4715
4716 auto *IdxShadow = getShadow(Idx);
4717 Value *Truncated = IRB.CreateTrunc(
4718 IdxShadow,
4719 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4720 IdxVectorSize));
4721 insertCheckShadow(Truncated, getOrigin(Idx), I);
4722 }
4723
4724 // Instrument AVX permutation intrinsic.
4725 // We apply the same permutation (argument index 1) to the shadow.
4726 void handleAVXVpermilvar(IntrinsicInst &I) {
4727 IRBuilder<> IRB(&I);
4728 Value *Shadow = getShadow(&I, 0);
4729 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4730
4731 // Shadows are integer-ish types but some intrinsics require a
4732 // different (e.g., floating-point) type.
4733 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4734 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4735 {Shadow, I.getArgOperand(1)});
4736
4737 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4738 setOriginForNaryOp(I);
4739 }
4740
4741 // Instrument AVX permutation intrinsic.
4742 // We apply the same permutation (argument index 1) to the shadows.
4743 void handleAVXVpermi2var(IntrinsicInst &I) {
4744 assert(I.arg_size() == 3);
4745 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4746 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4747 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4748 [[maybe_unused]] auto ArgVectorSize =
4749 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4750 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4751 ->getNumElements() == ArgVectorSize);
4752 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4753 ->getNumElements() == ArgVectorSize);
4754 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4755 assert(I.getType() == I.getArgOperand(0)->getType());
4756 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4757 IRBuilder<> IRB(&I);
4758 Value *AShadow = getShadow(&I, 0);
4759 Value *Idx = I.getArgOperand(1);
4760 Value *BShadow = getShadow(&I, 2);
4761
4762 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4763
4764 // Shadows are integer-ish types but some intrinsics require a
4765 // different (e.g., floating-point) type.
4766 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4767 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4768 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4769 {AShadow, Idx, BShadow});
4770 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4771 setOriginForNaryOp(I);
4772 }
4773
4774 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4775 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4776 }
4777
4778 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4779 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4780 }
4781
4782 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4783 return isFixedIntVectorTy(V->getType());
4784 }
4785
4786 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4787 return isFixedFPVectorTy(V->getType());
4788 }
4789
4790 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4791 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4792 // i32 rounding)
4793 //
4794 // Inconveniently, some similar intrinsics have a different operand order:
4795 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4796 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4797 // i16 mask)
4798 //
4799 // If the return type has more elements than A, the excess elements are
4800 // zeroed (and the corresponding shadow is initialized).
4801 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4802 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4803 // i8 mask)
4804 //
4805 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4806 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4807 // where all_or_nothing(x) is fully uninitialized if x has any
4808 // uninitialized bits
4809 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4810 IRBuilder<> IRB(&I);
4811
4812 assert(I.arg_size() == 4);
4813 Value *A = I.getOperand(0);
4814 Value *WriteThrough;
4815 Value *Mask;
4816 Value *RoundingMode;
4817 if (LastMask) {
4818 WriteThrough = I.getOperand(2);
4819 Mask = I.getOperand(3);
4820 RoundingMode = I.getOperand(1);
4821 } else {
4822 WriteThrough = I.getOperand(1);
4823 Mask = I.getOperand(2);
4824 RoundingMode = I.getOperand(3);
4825 }
4826
4827 assert(isFixedFPVector(A));
4828 assert(isFixedIntVector(WriteThrough));
4829
4830 unsigned ANumElements =
4831 cast<FixedVectorType>(A->getType())->getNumElements();
4832 [[maybe_unused]] unsigned WriteThruNumElements =
4833 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4834 assert(ANumElements == WriteThruNumElements ||
4835 ANumElements * 2 == WriteThruNumElements);
4836
4837 assert(Mask->getType()->isIntegerTy());
4838 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4839 assert(ANumElements == MaskNumElements ||
4840 ANumElements * 2 == MaskNumElements);
4841
4842 assert(WriteThruNumElements == MaskNumElements);
4843
4844 // Some bits of the mask may be unused, though it's unusual to have partly
4845 // uninitialized bits.
4846 insertCheckShadowOf(Mask, &I);
4847
4848 assert(RoundingMode->getType()->isIntegerTy());
4849 // Only some bits of the rounding mode are used, though it's very
4850 // unusual to have uninitialized bits there (more commonly, it's a
4851 // constant).
4852 insertCheckShadowOf(RoundingMode, &I);
4853
4854 assert(I.getType() == WriteThrough->getType());
4855
4856 Value *AShadow = getShadow(A);
4857 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4858
4859 if (ANumElements * 2 == MaskNumElements) {
4860 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4861 // from the zeroed shadow instead of the writethrough's shadow.
4862 Mask =
4863 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4864 Mask =
4865 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4866 }
4867
4868 // Convert i16 mask to <16 x i1>
4869 Mask = IRB.CreateBitCast(
4870 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4871 "_ms_mask_bitcast");
4872
4873 /// For floating-point to integer conversion, the output is:
4874 /// - fully uninitialized if *any* bit of the input is uninitialized
4875 /// - fully initialized if all bits of the input are initialized
4876 /// We apply the same principle on a per-element basis for vectors.
4877 ///
4878 /// We use the scalar width of the return type instead of A's.
4879 AShadow = IRB.CreateSExt(
4880 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4881 getShadowTy(&I), "_ms_a_shadow");
4882
4883 Value *WriteThroughShadow = getShadow(WriteThrough);
4884 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4885 "_ms_writethru_select");
4886
4887 setShadow(&I, Shadow);
4888 setOriginForNaryOp(I);
4889 }
4890
4891 // Instrument BMI / BMI2 intrinsics.
4892 // All of these intrinsics are Z = I(X, Y)
4893 // where the types of all operands and the result match, and are either i32 or
4894 // i64. The following instrumentation happens to work for all of them:
4895 // Sz = I(Sx, Y) | (sext (Sy != 0))
4896 void handleBmiIntrinsic(IntrinsicInst &I) {
4897 IRBuilder<> IRB(&I);
4898 Type *ShadowTy = getShadowTy(&I);
4899
4900 // If any bit of the mask operand is poisoned, then the whole thing is.
4901 Value *SMask = getShadow(&I, 1);
4902 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4903 ShadowTy);
4904 // Apply the same intrinsic to the shadow of the first operand.
4905 Value *S = IRB.CreateCall(I.getCalledFunction(),
4906 {getShadow(&I, 0), I.getOperand(1)});
4907 S = IRB.CreateOr(SMask, S);
4908 setShadow(&I, S);
4909 setOriginForNaryOp(I);
4910 }
4911
4912 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4913 SmallVector<int, 8> Mask;
4914 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4915 Mask.append(2, X);
4916 }
4917 return Mask;
4918 }
4919
4920 // Instrument pclmul intrinsics.
4921 // These intrinsics operate either on odd or on even elements of the input
4922 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4923 // Replace the unused elements with copies of the used ones, ex:
4924 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4925 // or
4926 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4927 // and then apply the usual shadow combining logic.
4928 void handlePclmulIntrinsic(IntrinsicInst &I) {
4929 IRBuilder<> IRB(&I);
4930 unsigned Width =
4931 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4932 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4933 "pclmul 3rd operand must be a constant");
4934 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4935 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4936 getPclmulMask(Width, Imm & 0x01));
4937 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4938 getPclmulMask(Width, Imm & 0x10));
4939 ShadowAndOriginCombiner SOC(this, IRB);
4940 SOC.Add(Shuf0, getOrigin(&I, 0));
4941 SOC.Add(Shuf1, getOrigin(&I, 1));
4942 SOC.Done(&I);
4943 }
4944
4945 // Instrument _mm_*_sd|ss intrinsics
4946 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4947 IRBuilder<> IRB(&I);
4948 unsigned Width =
4949 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4950 Value *First = getShadow(&I, 0);
4951 Value *Second = getShadow(&I, 1);
4952 // First element of second operand, remaining elements of first operand
4953 SmallVector<int, 16> Mask;
4954 Mask.push_back(Width);
4955 for (unsigned i = 1; i < Width; i++)
4956 Mask.push_back(i);
4957 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4958
4959 setShadow(&I, Shadow);
4960 setOriginForNaryOp(I);
4961 }
4962
4963 void handleVtestIntrinsic(IntrinsicInst &I) {
4964 IRBuilder<> IRB(&I);
4965 Value *Shadow0 = getShadow(&I, 0);
4966 Value *Shadow1 = getShadow(&I, 1);
4967 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4968 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4969 Value *Scalar = convertShadowToScalar(NZ, IRB);
4970 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4971
4972 setShadow(&I, Shadow);
4973 setOriginForNaryOp(I);
4974 }
4975
4976 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4977 IRBuilder<> IRB(&I);
4978 unsigned Width =
4979 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4980 Value *First = getShadow(&I, 0);
4981 Value *Second = getShadow(&I, 1);
4982 Value *OrShadow = IRB.CreateOr(First, Second);
4983 // First element of both OR'd together, remaining elements of first operand
4984 SmallVector<int, 16> Mask;
4985 Mask.push_back(Width);
4986 for (unsigned i = 1; i < Width; i++)
4987 Mask.push_back(i);
4988 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4989
4990 setShadow(&I, Shadow);
4991 setOriginForNaryOp(I);
4992 }
4993
4994 // _mm_round_pd / _mm_round_ps.
4995 // Similar to maybeHandleSimpleNomemIntrinsic except
4996 // the second argument is guaranteed to be a constant integer.
4997 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4998 assert(I.getArgOperand(0)->getType() == I.getType());
4999 assert(I.arg_size() == 2);
5000 assert(isa<ConstantInt>(I.getArgOperand(1)));
5001
5002 IRBuilder<> IRB(&I);
5003 ShadowAndOriginCombiner SC(this, IRB);
5004 SC.Add(I.getArgOperand(0));
5005 SC.Done(&I);
5006 }
5007
5008 // Instrument @llvm.abs intrinsic.
5009 //
5010 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
5011 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
5012 void handleAbsIntrinsic(IntrinsicInst &I) {
5013 assert(I.arg_size() == 2);
5014 Value *Src = I.getArgOperand(0);
5015 Value *IsIntMinPoison = I.getArgOperand(1);
5016
5017 assert(I.getType()->isIntOrIntVectorTy());
5018
5019 assert(Src->getType() == I.getType());
5020
5021 assert(IsIntMinPoison->getType()->isIntegerTy());
5022 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
5023
5024 IRBuilder<> IRB(&I);
5025 Value *SrcShadow = getShadow(Src);
5026
5027 APInt MinVal =
5028 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
5029 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
5030 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
5031
5032 Value *PoisonedShadow = getPoisonedShadow(Src);
5033 Value *PoisonedIfIntMinShadow =
5034 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
5035 Value *Shadow =
5036 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
5037
5038 setShadow(&I, Shadow);
5039 setOrigin(&I, getOrigin(&I, 0));
5040 }
5041
5042 void handleIsFpClass(IntrinsicInst &I) {
5043 IRBuilder<> IRB(&I);
5044 Value *Shadow = getShadow(&I, 0);
5045 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
5046 setOrigin(&I, getOrigin(&I, 0));
5047 }
5048
5049 void handleArithmeticWithOverflow(IntrinsicInst &I) {
5050 IRBuilder<> IRB(&I);
5051 Value *Shadow0 = getShadow(&I, 0);
5052 Value *Shadow1 = getShadow(&I, 1);
5053 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
5054 Value *ShadowElt1 =
5055 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
5056
5057 Value *Shadow = PoisonValue::get(getShadowTy(&I));
5058 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
5059 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
5060
5061 setShadow(&I, Shadow);
5062 setOriginForNaryOp(I);
5063 }
5064
5065 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
5066 assert(isa<FixedVectorType>(V->getType()));
5067 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
5068 Value *Shadow = getShadow(V);
5069 return IRB.CreateExtractElement(Shadow,
5070 ConstantInt::get(IRB.getInt32Ty(), 0));
5071 }
5072
5073 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
5074 //
5075 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
5076 // (<8 x i64>, <16 x i8>, i8)
5077 // A WriteThru Mask
5078 //
5079 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
5080 // (<16 x i32>, <16 x i8>, i16)
5081 //
5082 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
5083 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
5084 //
5085 // If Dst has more elements than A, the excess elements are zeroed (and the
5086 // corresponding shadow is initialized).
5087 //
5088 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
5089 // and is much faster than this handler.
5090 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
5091 IRBuilder<> IRB(&I);
5092
5093 assert(I.arg_size() == 3);
5094 Value *A = I.getOperand(0);
5095 Value *WriteThrough = I.getOperand(1);
5096 Value *Mask = I.getOperand(2);
5097
5098 assert(isFixedIntVector(A));
5099 assert(isFixedIntVector(WriteThrough));
5100
5101 unsigned ANumElements =
5102 cast<FixedVectorType>(A->getType())->getNumElements();
5103 unsigned OutputNumElements =
5104 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
5105 assert(ANumElements == OutputNumElements ||
5106 ANumElements * 2 == OutputNumElements);
5107
5108 assert(Mask->getType()->isIntegerTy());
5109 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5110 insertCheckShadowOf(Mask, &I);
5111
5112 assert(I.getType() == WriteThrough->getType());
5113
5114 // Widen the mask, if necessary, to have one bit per element of the output
5115 // vector.
5116 // We want the extra bits to have '1's, so that the CreateSelect will
5117 // select the values from AShadow instead of WriteThroughShadow ("maskless"
5118 // versions of the intrinsics are sometimes implemented using an all-1's
5119 // mask and an undefined value for WriteThroughShadow). We accomplish this
5120 // by using bitwise NOT before and after the ZExt.
5121 if (ANumElements != OutputNumElements) {
5122 Mask = IRB.CreateNot(Mask);
5123 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
5124 "_ms_widen_mask");
5125 Mask = IRB.CreateNot(Mask);
5126 }
5127 Mask = IRB.CreateBitCast(
5128 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5129
5130 Value *AShadow = getShadow(A);
5131
5132 // The return type might have more elements than the input.
5133 // Temporarily shrink the return type's number of elements.
5134 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
5135
5136 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
5137 // This handler treats them all as truncation, which leads to some rare
5138 // false positives in the cases where the truncated bytes could
5139 // unambiguously saturate the value e.g., if A = ??????10 ????????
5140 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
5141 // fully defined, but the truncated byte is ????????.
5142 //
5143 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
5144 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
5145 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
5146
5147 Value *WriteThroughShadow = getShadow(WriteThrough);
5148
5149 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
5150 setShadow(&I, Shadow);
5151 setOriginForNaryOp(I);
5152 }
5153
5154 // Handle llvm.x86.avx512.* instructions that take vector(s) of floating-point
5155 // values and perform an operation whose shadow propagation should be handled
5156 // as all-or-nothing [*], with masking provided by a vector and a mask
5157 // supplied as an integer.
5158 //
5159 // [*] if all bits of a vector element are initialized, the output is fully
5160 // initialized; otherwise, the output is fully uninitialized
5161 //
5162 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
5163 // (<16 x float>, <16 x float>, i16)
5164 // A WriteThru Mask
5165 //
5166 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
5167 // (<2 x double>, <2 x double>, i8)
5168 // A WriteThru Mask
5169 //
5170 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
5171 // (<8 x double>, i32, <8 x double>, i8, i32)
5172 // A Imm WriteThru Mask Rounding
5173 //
5174 // <16 x float> @llvm.x86.avx512.mask.scalef.ps.512
5175 // (<16 x float>, <16 x float>, <16 x float>, i16, i32)
5176 // WriteThru A B Mask Rnd
5177 //
5178 // All operands other than A, B, ..., and WriteThru (e.g., Mask, Imm,
5179 // Rounding) must be fully initialized.
5180 //
5181 // Dst[i] = Mask[i] ? some_op(A[i], B[i], ...)
5182 // : WriteThru[i]
5183 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i] | B_shadow[i] | ...)
5184 // : WriteThru_shadow[i]
5185 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I,
5186 SmallVector<unsigned, 4> DataIndices,
5187 unsigned WriteThruIndex,
5188 unsigned MaskIndex) {
5189 IRBuilder<> IRB(&I);
5190
5191 unsigned NumArgs = I.arg_size();
5192
5193 assert(WriteThruIndex < NumArgs);
5194 assert(MaskIndex < NumArgs);
5195 assert(WriteThruIndex != MaskIndex);
5196 Value *WriteThru = I.getOperand(WriteThruIndex);
5197
5198 unsigned OutputNumElements =
5199 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
5200
5201 assert(DataIndices.size() > 0);
5202
5203 bool isData[16] = {false};
5204 assert(NumArgs <= 16);
5205 for (unsigned i : DataIndices) {
5206 assert(i < NumArgs);
5207 assert(i != WriteThruIndex);
5208 assert(i != MaskIndex);
5209
5210 isData[i] = true;
5211
5212 Value *A = I.getOperand(i);
5213 assert(isFixedFPVector(A));
5214 [[maybe_unused]] unsigned ANumElements =
5215 cast<FixedVectorType>(A->getType())->getNumElements();
5216 assert(ANumElements == OutputNumElements);
5217 }
5218
5219 Value *Mask = I.getOperand(MaskIndex);
5220
5221 assert(isFixedFPVector(WriteThru));
5222
5223 for (unsigned i = 0; i < NumArgs; ++i) {
5224 if (!isData[i] && i != WriteThruIndex) {
5225 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5226 // they be fully initialized.
5227 assert(I.getOperand(i)->getType()->isIntegerTy());
5228 insertCheckShadowOf(I.getOperand(i), &I);
5229 }
5230 }
5231
5232 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5233 if (Mask->getType()->getScalarSizeInBits() == 8 && OutputNumElements < 8)
5234 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, OutputNumElements));
5235 assert(Mask->getType()->getScalarSizeInBits() == OutputNumElements);
5236
5237 assert(I.getType() == WriteThru->getType());
5238
5239 Mask = IRB.CreateBitCast(
5240 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5241
5242 Value *DataShadow = nullptr;
5243 for (unsigned i : DataIndices) {
5244 Value *A = I.getOperand(i);
5245 if (DataShadow)
5246 DataShadow = IRB.CreateOr(DataShadow, getShadow(A));
5247 else
5248 DataShadow = getShadow(A);
5249 }
5250
5251 // All-or-nothing shadow
5252 DataShadow =
5253 IRB.CreateSExt(IRB.CreateICmpNE(DataShadow, getCleanShadow(DataShadow)),
5254 DataShadow->getType());
5255
5256 Value *WriteThruShadow = getShadow(WriteThru);
5257
5258 Value *Shadow = IRB.CreateSelect(Mask, DataShadow, WriteThruShadow);
5259 setShadow(&I, Shadow);
5260
5261 setOriginForNaryOp(I);
5262 }
5263
5264 // For sh.* compiler intrinsics:
5265 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5266 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5267 // A B WriteThru Mask RoundingMode
5268 //
5269 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5270 // DstShadow[1..7] = AShadow[1..7]
5271 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5272 IRBuilder<> IRB(&I);
5273
5274 assert(I.arg_size() == 5);
5275 Value *A = I.getOperand(0);
5276 Value *B = I.getOperand(1);
5277 Value *WriteThrough = I.getOperand(2);
5278 Value *Mask = I.getOperand(3);
5279 Value *RoundingMode = I.getOperand(4);
5280
5281 // Technically, we could probably just check whether the LSB is
5282 // initialized, but intuitively it feels like a partly uninitialized mask
5283 // is unintended, and we should warn the user immediately.
5284 insertCheckShadowOf(Mask, &I);
5285 insertCheckShadowOf(RoundingMode, &I);
5286
5287 assert(isa<FixedVectorType>(A->getType()));
5288 unsigned NumElements =
5289 cast<FixedVectorType>(A->getType())->getNumElements();
5290 assert(NumElements == 8);
5291 assert(A->getType() == B->getType());
5292 assert(B->getType() == WriteThrough->getType());
5293 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5294 assert(RoundingMode->getType()->isIntegerTy());
5295
5296 Value *ALowerShadow = extractLowerShadow(IRB, A);
5297 Value *BLowerShadow = extractLowerShadow(IRB, B);
5298
5299 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5300
5301 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5302
5303 Mask = IRB.CreateBitCast(
5304 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5305 Value *MaskLower =
5306 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5307
5308 Value *AShadow = getShadow(A);
5309 Value *DstLowerShadow =
5310 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5311 Value *DstShadow = IRB.CreateInsertElement(
5312 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5313 "_msprop");
5314
5315 setShadow(&I, DstShadow);
5316 setOriginForNaryOp(I);
5317 }
5318
5319 // Approximately handle AVX Galois Field Affine Transformation
5320 //
5321 // e.g.,
5322 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5323 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5324 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5325 // Out A x b
5326 // where A and x are packed matrices, b is a vector,
5327 // Out = A * x + b in GF(2)
5328 //
5329 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5330 // computation also includes a parity calculation.
5331 //
5332 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5333 // Out_Shadow = (V1_Shadow & V2_Shadow)
5334 // | (V1 & V2_Shadow)
5335 // | (V1_Shadow & V2 )
5336 //
5337 // We approximate the shadow of gf2p8affineqb using:
5338 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5339 // | gf2p8affineqb(x, A_shadow, 0)
5340 // | gf2p8affineqb(x_Shadow, A, 0)
5341 // | set1_epi8(b_Shadow)
5342 //
5343 // This approximation has false negatives: if an intermediate dot-product
5344 // contains an even number of 1's, the parity is 0.
5345 // It has no false positives.
5346 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5347 IRBuilder<> IRB(&I);
5348
5349 assert(I.arg_size() == 3);
5350 Value *A = I.getOperand(0);
5351 Value *X = I.getOperand(1);
5352 Value *B = I.getOperand(2);
5353
5354 assert(isFixedIntVector(A));
5355 assert(cast<VectorType>(A->getType())
5356 ->getElementType()
5357 ->getScalarSizeInBits() == 8);
5358
5359 assert(A->getType() == X->getType());
5360
5361 assert(B->getType()->isIntegerTy());
5362 assert(B->getType()->getScalarSizeInBits() == 8);
5363
5364 assert(I.getType() == A->getType());
5365
5366 Value *AShadow = getShadow(A);
5367 Value *XShadow = getShadow(X);
5368 Value *BZeroShadow = getCleanShadow(B);
5369
5370 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5371 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5372 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5373 {X, AShadow, BZeroShadow});
5374 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5375 {XShadow, A, BZeroShadow});
5376
5377 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5378 Value *BShadow = getShadow(B);
5379 Value *BBroadcastShadow = getCleanShadow(AShadow);
5380 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5381 // This loop generates a lot of LLVM IR, which we expect CodeGen to
5382 // lower appropriately (e.g., VPBROADCASTB).
5383 // Besides, b is often a constant, in which case it is fully initialized.
5384 for (unsigned i = 0; i < NumElements; i++)
5385 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5386
5387 setShadow(&I, IRB.CreateOr(
5388 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5389 setOriginForNaryOp(I);
5390 }
5391
5392 // Handle Arm NEON vector load intrinsics (vld*).
5393 //
5394 // The WithLane instructions (ld[234]lane) are similar to:
5395 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5396 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5397 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5398 // %A)
5399 //
5400 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5401 // to:
5402 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
5403 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5404 unsigned int numArgs = I.arg_size();
5405
5406 // Return type is a struct of vectors of integers or floating-point
5407 assert(I.getType()->isStructTy());
5408 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5409 assert(RetTy->getNumElements() > 0);
5410 assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5411 RetTy->getElementType(0)->isFPOrFPVectorTy());
5412 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5413 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5414
5415 if (WithLane) {
5416 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5417 assert(4 <= numArgs && numArgs <= 6);
5418
5419 // Return type is a struct of the input vectors
5420 assert(RetTy->getNumElements() + 2 == numArgs);
5421 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5422 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5423 } else {
5424 assert(numArgs == 1);
5425 }
5426
5427 IRBuilder<> IRB(&I);
5428
5429 SmallVector<Value *, 6> ShadowArgs;
5430 if (WithLane) {
5431 for (unsigned int i = 0; i < numArgs - 2; i++)
5432 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5433
5434 // Lane number, passed verbatim
5435 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5436 ShadowArgs.push_back(LaneNumber);
5437
5438 // TODO: blend shadow of lane number into output shadow?
5439 insertCheckShadowOf(LaneNumber, &I);
5440 }
5441
5442 Value *Src = I.getArgOperand(numArgs - 1);
5443 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5444
5445 Type *SrcShadowTy = getShadowTy(Src);
5446 auto [SrcShadowPtr, SrcOriginPtr] =
5447 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5448 ShadowArgs.push_back(SrcShadowPtr);
5449
5450 // The NEON vector load instructions handled by this function all have
5451 // integer variants. It is easier to use those rather than trying to cast
5452 // a struct of vectors of floats into a struct of vectors of integers.
5453 CallInst *CI =
5454 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5455 setShadow(&I, CI);
5456
5457 if (!MS.TrackOrigins)
5458 return;
5459
5460 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5461 setOrigin(&I, PtrSrcOrigin);
5462 }
5463
5464 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5465 /// and vst{2,3,4}lane).
5466 ///
5467 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5468 /// last argument, with the initial arguments being the inputs (and lane
5469 /// number for vst{2,3,4}lane). They return void.
5470 ///
5471 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5472 /// abcdabcdabcdabcd... into *outP
5473 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5474 /// writes aaaa...bbbb...cccc...dddd... into *outP
5475 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5476 /// These instructions can all be instrumented with essentially the same
5477 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5478 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5479 IRBuilder<> IRB(&I);
5480
5481 // Don't use getNumOperands() because it includes the callee
5482 int numArgOperands = I.arg_size();
5483
5484 // The last arg operand is the output (pointer)
5485 assert(numArgOperands >= 1);
5486 Value *Addr = I.getArgOperand(numArgOperands - 1);
5487 assert(Addr->getType()->isPointerTy());
5488 int skipTrailingOperands = 1;
5489
5491 insertCheckShadowOf(Addr, &I);
5492
5493 // Second-last operand is the lane number (for vst{2,3,4}lane)
5494 if (useLane) {
5495 skipTrailingOperands++;
5496 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5497 assert(isa<IntegerType>(
5498 I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5499 }
5500
5501 SmallVector<Value *, 8> ShadowArgs;
5502 // All the initial operands are the inputs
5503 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5504 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5505 Value *Shadow = getShadow(&I, i);
5506 ShadowArgs.append(1, Shadow);
5507 }
5508
5509 // MSan's GetShadowTy assumes the LHS is the type we want the shadow for
5510 // e.g., for:
5511 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5512 // we know the type of the output (and its shadow) is <16 x i8>.
5513 //
5514 // Arm NEON VST is unusual because the last argument is the output address:
5515 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5516 // call void @llvm.aarch64.neon.st2.v16i8.p0
5517 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5518 // and we have no type information about P's operand. We must manually
5519 // compute the type (<16 x i8> x 2).
5520 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5521 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5522 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5523 (numArgOperands - skipTrailingOperands));
5524 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5525
5526 if (useLane)
5527 ShadowArgs.append(1,
5528 I.getArgOperand(numArgOperands - skipTrailingOperands));
5529
5530 Value *OutputShadowPtr, *OutputOriginPtr;
5531 // AArch64 NEON does not need alignment (unless OS requires it)
5532 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5533 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5534 ShadowArgs.append(1, OutputShadowPtr);
5535
5536 CallInst *CI =
5537 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5538 setShadow(&I, CI);
5539
5540 if (MS.TrackOrigins) {
5541 // TODO: if we modelled the vst* instruction more precisely, we could
5542 // more accurately track the origins (e.g., if both inputs are
5543 // uninitialized for vst2, we currently blame the second input, even
5544 // though part of the output depends only on the first input).
5545 //
5546 // This is particularly imprecise for vst{2,3,4}lane, since only one
5547 // lane of each input is actually copied to the output.
5548 OriginCombiner OC(this, IRB);
5549 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5550 OC.Add(I.getArgOperand(i));
5551
5552 const DataLayout &DL = F.getDataLayout();
5553 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5554 OutputOriginPtr);
5555 }
5556 }
5557
5558 // Integer matrix multiplication:
5559 // - <4 x i32> @llvm.aarch64.neon.{s,u,us}mmla.v4i32.v16i8
5560 // (<4 x i32> %R, <16 x i8> %A, <16 x i8> %B)
5561 // - <4 x i32> is a 2x2 matrix
5562 // - <16 x i8> %A and %B are 2x8 and 8x2 matrices respectively
5563 //
5564 // Floating-point matrix multiplication:
5565 // - <4 x float> @llvm.aarch64.neon.bfmmla
5566 // (<4 x float> %R, <8 x bfloat> %A, <8 x bfloat> %B)
5567 // - <4 x float> is a 2x2 matrix
5568 // - <8 x bfloat> %A and %B are 2x4 and 4x2 matrices respectively
5569 //
5570 // The general shadow propagation approach is:
5571 // 1) get the shadows of the input matrices %A and %B
5572 // 2) map each shadow value to 0x1 if the corresponding value is fully
5573 // initialized, and 0x0 otherwise
5574 // 3) perform a matrix multiplication on the shadows of %A and %B [*].
5575 // The output will be a 2x2 matrix. For each element, a value of 0x8
5576 // (for {s,u,us}mmla) or 0x4 (for bfmmla) means all the corresponding
5577 // inputs were clean; if so, set the shadow to zero, otherwise set to -1.
5578 // 4) blend in the shadow of %R
5579 //
5580 // [*] Since shadows are integral, the obvious approach is to always apply
5581 // ummla to the shadows. Unfortunately, Armv8.2+bf16 supports bfmmla,
5582 // but not ummla. Thus, for bfmmla, our instrumentation reuses bfmmla.
5583 //
5584 // TODO: consider allowing multiplication of zero with an uninitialized value
5585 // to result in an initialized value.
5586 void handleNEONMatrixMultiply(IntrinsicInst &I) {
5587 IRBuilder<> IRB(&I);
5588
5589 assert(I.arg_size() == 3);
5590 Value *R = I.getArgOperand(0);
5591 Value *A = I.getArgOperand(1);
5592 Value *B = I.getArgOperand(2);
5593
5594 assert(I.getType() == R->getType());
5595
5596 assert(isa<FixedVectorType>(R->getType()));
5597 assert(isa<FixedVectorType>(A->getType()));
5598 assert(isa<FixedVectorType>(B->getType()));
5599
5600 FixedVectorType *RTy = cast<FixedVectorType>(R->getType());
5601 FixedVectorType *ATy = cast<FixedVectorType>(A->getType());
5602 FixedVectorType *BTy = cast<FixedVectorType>(B->getType());
5603 assert(ATy->getElementType() == BTy->getElementType());
5604
5605 if (RTy->getElementType()->isIntegerTy()) {
5606 // <4 x i32> @llvm.aarch64.neon.ummla.v4i32.v16i8
5607 // (<4 x i32> %R, <16 x i8> %X, <16 x i8> %Y)
5608 assert(RTy == FixedVectorType::get(IntegerType::get(*MS.C, 32), 4));
5609 assert(ATy == FixedVectorType::get(IntegerType::get(*MS.C, 8), 16));
5610 assert(BTy == FixedVectorType::get(IntegerType::get(*MS.C, 8), 16));
5611 } else {
5612 // <4 x float> @llvm.aarch64.neon.bfmmla
5613 // (<4 x float> %R, <8 x bfloat> %X, <8 x bfloat> %Y)
5614 assert(RTy == FixedVectorType::get(Type::getFloatTy(*MS.C), 4));
5615 assert(ATy == FixedVectorType::get(Type::getBFloatTy(*MS.C), 8));
5616 assert(BTy == FixedVectorType::get(Type::getBFloatTy(*MS.C), 8));
5617 }
5618
5619 Value *ShadowR = getShadow(&I, 0);
5620 Value *ShadowA = getShadow(&I, 1);
5621 Value *ShadowB = getShadow(&I, 2);
5622
5623 Value *ShadowAB;
5624 Value *FullyInit;
5625
5626 if (RTy->getElementType()->isIntegerTy()) {
5627 // If the value is fully initialized, the shadow will be 000...001.
5628 // Otherwise, the shadow will be all zero.
5629 // (This is the opposite of how we typically handle shadows.)
5630 ShadowA = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowA, getCleanShadow(ATy)),
5631 getShadowTy(ATy));
5632 ShadowB = IRB.CreateZExt(IRB.CreateICmpEQ(ShadowB, getCleanShadow(BTy)),
5633 getShadowTy(BTy));
5634 // TODO: the CreateSelect approach used below for floating-point is more
5635 // generic than CreateZExt. Investigate whether it is worthwhile
5636 // unifying the two approaches.
5637
5638 ShadowAB = IRB.CreateIntrinsic(RTy, Intrinsic::aarch64_neon_ummla,
5639 {getCleanShadow(RTy), ShadowA, ShadowB});
5640
5641 // ummla multiplies a 2x8 matrix with an 8x2 matrix. If all entries of the
5642 // input matrices are equal to 0x1, all entries of the output matrix will
5643 // be 0x8.
5644 FullyInit = ConstantVector::getSplat(
5645 RTy->getElementCount(), ConstantInt::get(RTy->getElementType(), 0x8));
5646
5647 ShadowAB = IRB.CreateICmpNE(ShadowAB, FullyInit);
5648 } else {
5649 Value *ABZeros = ConstantVector::getSplat(
5650 ATy->getElementCount(), ConstantFP::get(ATy->getElementType(), 0));
5651 Value *ABOnes = ConstantVector::getSplat(
5652 ATy->getElementCount(), ConstantFP::get(ATy->getElementType(), 1));
5653
5654 // As per the integer case, if the shadow is clean, we store 0x1,
5655 // otherwise we store 0x0 (the opposite of usual shadow arithmetic).
5656 ShadowA = IRB.CreateSelect(IRB.CreateICmpEQ(ShadowA, getCleanShadow(ATy)),
5657 ABOnes, ABZeros);
5658 ShadowB = IRB.CreateSelect(IRB.CreateICmpEQ(ShadowB, getCleanShadow(BTy)),
5659 ABOnes, ABZeros);
5660
5661 Value *RZeros = ConstantVector::getSplat(
5662 RTy->getElementCount(), ConstantFP::get(RTy->getElementType(), 0));
5663
5664 ShadowAB = IRB.CreateIntrinsic(RTy, Intrinsic::aarch64_neon_bfmmla,
5665 {RZeros, ShadowA, ShadowB});
5666
5667 // bfmmla multiplies a 2x4 matrix with a 4x2 matrix. If all entries of
5668 // the input matrices are equal to 0x1, all entries of the output matrix
5669 // will be 4.0. (To avoid floating-point error, we check if each entry
5670 // < 3.5.)
5671 FullyInit = ConstantVector::getSplat(
5672 RTy->getElementCount(), ConstantFP::get(RTy->getElementType(), 3.5));
5673
5674 // FCmpULT: "yields true if either operand is a QNAN or op1 is less than
5675 // op2"
5676 ShadowAB = IRB.CreateFCmpULT(ShadowAB, FullyInit);
5677 }
5678
5679 ShadowR = IRB.CreateICmpNE(ShadowR, getCleanShadow(RTy));
5680 ShadowR = IRB.CreateOr(ShadowAB, ShadowR);
5681
5682 setShadow(&I, IRB.CreateSExt(ShadowR, getShadowTy(RTy)));
5683
5684 setOriginForNaryOp(I);
5685 }
5686
5687 /// Handle intrinsics by applying the intrinsic to the shadows.
5688 ///
5689 /// The trailing arguments are passed verbatim to the intrinsic, though any
5690 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5691 /// intrinsic with one trailing verbatim argument:
5692 /// out = intrinsic(var1, var2, opType)
5693 /// we compute:
5694 /// shadow[out] =
5695 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5696 ///
5697 /// Typically, shadowIntrinsicID will be specified by the caller to be
5698 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5699 /// intrinsic of the same type.
5700 ///
5701 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5702 /// bit-patterns (for example, if the intrinsic accepts floats for
5703 /// var1, we require that it doesn't care if inputs are NaNs).
5704 ///
5705 /// For example, this can be applied to the Arm NEON vector table intrinsics
5706 /// (tbl{1,2,3,4}).
5707 ///
5708 /// The origin is approximated using setOriginForNaryOp.
5709 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5710 Intrinsic::ID shadowIntrinsicID,
5711 unsigned int trailingVerbatimArgs) {
5712 IRBuilder<> IRB(&I);
5713
5714 assert(trailingVerbatimArgs < I.arg_size());
5715
5716 SmallVector<Value *, 8> ShadowArgs;
5717 // Don't use getNumOperands() because it includes the callee
5718 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5719 Value *Shadow = getShadow(&I, i);
5720
5721 // Shadows are integer-ish types but some intrinsics require a
5722 // different (e.g., floating-point) type.
5723 ShadowArgs.push_back(
5724 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5725 }
5726
5727 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5728 i++) {
5729 Value *Arg = I.getArgOperand(i);
5730 ShadowArgs.push_back(Arg);
5731 }
5732
5733 CallInst *CI =
5734 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5735 Value *CombinedShadow = CI;
5736
5737 // Combine the computed shadow with the shadow of trailing args
5738 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5739 i++) {
5740 Value *Shadow =
5741 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5742 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5743 }
5744
5745 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5746
5747 setOriginForNaryOp(I);
5748 }
5749
5750 // Approximation only
5751 //
5752 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5753 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5754 assert(I.arg_size() == 2);
5755
5756 handleShadowOr(I);
5757 }
5758
5759 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5760 switch (I.getIntrinsicID()) {
5761 case Intrinsic::uadd_with_overflow:
5762 case Intrinsic::sadd_with_overflow:
5763 case Intrinsic::usub_with_overflow:
5764 case Intrinsic::ssub_with_overflow:
5765 case Intrinsic::umul_with_overflow:
5766 case Intrinsic::smul_with_overflow:
5767 handleArithmeticWithOverflow(I);
5768 break;
5769 case Intrinsic::abs:
5770 handleAbsIntrinsic(I);
5771 break;
5772 case Intrinsic::bitreverse:
5773 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5774 /*trailingVerbatimArgs*/ 0);
5775 break;
5776 case Intrinsic::is_fpclass:
5777 handleIsFpClass(I);
5778 break;
5779 case Intrinsic::lifetime_start:
5780 handleLifetimeStart(I);
5781 break;
5782 case Intrinsic::launder_invariant_group:
5783 case Intrinsic::strip_invariant_group:
5784 handleInvariantGroup(I);
5785 break;
5786 case Intrinsic::bswap:
5787 handleBswap(I);
5788 break;
5789 case Intrinsic::ctlz:
5790 case Intrinsic::cttz:
5791 handleCountLeadingTrailingZeros(I);
5792 break;
5793 case Intrinsic::masked_compressstore:
5794 handleMaskedCompressStore(I);
5795 break;
5796 case Intrinsic::masked_expandload:
5797 handleMaskedExpandLoad(I);
5798 break;
5799 case Intrinsic::masked_gather:
5800 handleMaskedGather(I);
5801 break;
5802 case Intrinsic::masked_scatter:
5803 handleMaskedScatter(I);
5804 break;
5805 case Intrinsic::masked_store:
5806 handleMaskedStore(I);
5807 break;
5808 case Intrinsic::masked_load:
5809 handleMaskedLoad(I);
5810 break;
5811 case Intrinsic::vector_reduce_and:
5812 handleVectorReduceAndIntrinsic(I);
5813 break;
5814 case Intrinsic::vector_reduce_or:
5815 handleVectorReduceOrIntrinsic(I);
5816 break;
5817
5818 case Intrinsic::vector_reduce_add:
5819 case Intrinsic::vector_reduce_xor:
5820 case Intrinsic::vector_reduce_mul:
5821 // Signed/Unsigned Min/Max
5822 // TODO: handling similarly to AND/OR may be more precise.
5823 case Intrinsic::vector_reduce_smax:
5824 case Intrinsic::vector_reduce_smin:
5825 case Intrinsic::vector_reduce_umax:
5826 case Intrinsic::vector_reduce_umin:
5827 // TODO: this has no false positives, but arguably we should check that all
5828 // the bits are initialized.
5829 case Intrinsic::vector_reduce_fmax:
5830 case Intrinsic::vector_reduce_fmin:
5831 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5832 break;
5833
5834 case Intrinsic::vector_reduce_fadd:
5835 case Intrinsic::vector_reduce_fmul:
5836 handleVectorReduceWithStarterIntrinsic(I);
5837 break;
5838
5839 case Intrinsic::scmp:
5840 case Intrinsic::ucmp: {
5841 handleShadowOr(I);
5842 break;
5843 }
5844
5845 case Intrinsic::fshl:
5846 case Intrinsic::fshr:
5847 handleFunnelShift(I);
5848 break;
5849
5850 case Intrinsic::is_constant:
5851 // The result of llvm.is.constant() is always defined.
5852 setShadow(&I, getCleanShadow(&I));
5853 setOrigin(&I, getCleanOrigin());
5854 break;
5855
5856 default:
5857 return false;
5858 }
5859
5860 return true;
5861 }
5862
5863 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5864 switch (I.getIntrinsicID()) {
5865 case Intrinsic::x86_sse_stmxcsr:
5866 handleStmxcsr(I);
5867 break;
5868 case Intrinsic::x86_sse_ldmxcsr:
5869 handleLdmxcsr(I);
5870 break;
5871
5872 // Convert Scalar Double Precision Floating-Point Value
5873 // to Unsigned Doubleword Integer
5874 // etc.
5875 case Intrinsic::x86_avx512_vcvtsd2usi64:
5876 case Intrinsic::x86_avx512_vcvtsd2usi32:
5877 case Intrinsic::x86_avx512_vcvtss2usi64:
5878 case Intrinsic::x86_avx512_vcvtss2usi32:
5879 case Intrinsic::x86_avx512_cvttss2usi64:
5880 case Intrinsic::x86_avx512_cvttss2usi:
5881 case Intrinsic::x86_avx512_cvttsd2usi64:
5882 case Intrinsic::x86_avx512_cvttsd2usi:
5883 case Intrinsic::x86_avx512_cvtusi2ss:
5884 case Intrinsic::x86_avx512_cvtusi642sd:
5885 case Intrinsic::x86_avx512_cvtusi642ss:
5886 handleSSEVectorConvertIntrinsic(I, 1, true);
5887 break;
5888 case Intrinsic::x86_sse2_cvtsd2si64:
5889 case Intrinsic::x86_sse2_cvtsd2si:
5890 case Intrinsic::x86_sse2_cvtsd2ss:
5891 case Intrinsic::x86_sse2_cvttsd2si64:
5892 case Intrinsic::x86_sse2_cvttsd2si:
5893 case Intrinsic::x86_sse_cvtss2si64:
5894 case Intrinsic::x86_sse_cvtss2si:
5895 case Intrinsic::x86_sse_cvttss2si64:
5896 case Intrinsic::x86_sse_cvttss2si:
5897 handleSSEVectorConvertIntrinsic(I, 1);
5898 break;
5899 case Intrinsic::x86_sse_cvtps2pi:
5900 case Intrinsic::x86_sse_cvttps2pi:
5901 handleSSEVectorConvertIntrinsic(I, 2);
5902 break;
5903
5904 // TODO:
5905 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5906 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5907 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5908
5909 case Intrinsic::x86_vcvtps2ph_128:
5910 case Intrinsic::x86_vcvtps2ph_256: {
5911 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5912 break;
5913 }
5914
5915 // Convert Packed Single Precision Floating-Point Values
5916 // to Packed Signed Doubleword Integer Values
5917 //
5918 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5919 // (<16 x float>, <16 x i32>, i16, i32)
5920 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5921 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5922 break;
5923
5924 // Convert Packed Double Precision Floating-Point Values
5925 // to Packed Single Precision Floating-Point Values
5926 case Intrinsic::x86_sse2_cvtpd2ps:
5927 case Intrinsic::x86_sse2_cvtps2dq:
5928 case Intrinsic::x86_sse2_cvtpd2dq:
5929 case Intrinsic::x86_sse2_cvttps2dq:
5930 case Intrinsic::x86_sse2_cvttpd2dq:
5931 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5932 case Intrinsic::x86_avx_cvt_ps2dq_256:
5933 case Intrinsic::x86_avx_cvt_pd2dq_256:
5934 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5935 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5936 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5937 break;
5938 }
5939
5940 // Convert Single-Precision FP Value to 16-bit FP Value
5941 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5942 // (<16 x float>, i32, <16 x i16>, i16)
5943 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5944 // (<4 x float>, i32, <8 x i16>, i8)
5945 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5946 // (<8 x float>, i32, <8 x i16>, i8)
5947 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5948 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5949 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5950 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5951 break;
5952
5953 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5954 case Intrinsic::x86_avx512_psll_w_512:
5955 case Intrinsic::x86_avx512_psll_d_512:
5956 case Intrinsic::x86_avx512_psll_q_512:
5957 case Intrinsic::x86_avx512_pslli_w_512:
5958 case Intrinsic::x86_avx512_pslli_d_512:
5959 case Intrinsic::x86_avx512_pslli_q_512:
5960 case Intrinsic::x86_avx512_psrl_w_512:
5961 case Intrinsic::x86_avx512_psrl_d_512:
5962 case Intrinsic::x86_avx512_psrl_q_512:
5963 case Intrinsic::x86_avx512_psra_w_512:
5964 case Intrinsic::x86_avx512_psra_d_512:
5965 case Intrinsic::x86_avx512_psra_q_512:
5966 case Intrinsic::x86_avx512_psrli_w_512:
5967 case Intrinsic::x86_avx512_psrli_d_512:
5968 case Intrinsic::x86_avx512_psrli_q_512:
5969 case Intrinsic::x86_avx512_psrai_w_512:
5970 case Intrinsic::x86_avx512_psrai_d_512:
5971 case Intrinsic::x86_avx512_psrai_q_512:
5972 case Intrinsic::x86_avx512_psra_q_256:
5973 case Intrinsic::x86_avx512_psra_q_128:
5974 case Intrinsic::x86_avx512_psrai_q_256:
5975 case Intrinsic::x86_avx512_psrai_q_128:
5976 case Intrinsic::x86_avx2_psll_w:
5977 case Intrinsic::x86_avx2_psll_d:
5978 case Intrinsic::x86_avx2_psll_q:
5979 case Intrinsic::x86_avx2_pslli_w:
5980 case Intrinsic::x86_avx2_pslli_d:
5981 case Intrinsic::x86_avx2_pslli_q:
5982 case Intrinsic::x86_avx2_psrl_w:
5983 case Intrinsic::x86_avx2_psrl_d:
5984 case Intrinsic::x86_avx2_psrl_q:
5985 case Intrinsic::x86_avx2_psra_w:
5986 case Intrinsic::x86_avx2_psra_d:
5987 case Intrinsic::x86_avx2_psrli_w:
5988 case Intrinsic::x86_avx2_psrli_d:
5989 case Intrinsic::x86_avx2_psrli_q:
5990 case Intrinsic::x86_avx2_psrai_w:
5991 case Intrinsic::x86_avx2_psrai_d:
5992 case Intrinsic::x86_sse2_psll_w:
5993 case Intrinsic::x86_sse2_psll_d:
5994 case Intrinsic::x86_sse2_psll_q:
5995 case Intrinsic::x86_sse2_pslli_w:
5996 case Intrinsic::x86_sse2_pslli_d:
5997 case Intrinsic::x86_sse2_pslli_q:
5998 case Intrinsic::x86_sse2_psrl_w:
5999 case Intrinsic::x86_sse2_psrl_d:
6000 case Intrinsic::x86_sse2_psrl_q:
6001 case Intrinsic::x86_sse2_psra_w:
6002 case Intrinsic::x86_sse2_psra_d:
6003 case Intrinsic::x86_sse2_psrli_w:
6004 case Intrinsic::x86_sse2_psrli_d:
6005 case Intrinsic::x86_sse2_psrli_q:
6006 case Intrinsic::x86_sse2_psrai_w:
6007 case Intrinsic::x86_sse2_psrai_d:
6008 case Intrinsic::x86_mmx_psll_w:
6009 case Intrinsic::x86_mmx_psll_d:
6010 case Intrinsic::x86_mmx_psll_q:
6011 case Intrinsic::x86_mmx_pslli_w:
6012 case Intrinsic::x86_mmx_pslli_d:
6013 case Intrinsic::x86_mmx_pslli_q:
6014 case Intrinsic::x86_mmx_psrl_w:
6015 case Intrinsic::x86_mmx_psrl_d:
6016 case Intrinsic::x86_mmx_psrl_q:
6017 case Intrinsic::x86_mmx_psra_w:
6018 case Intrinsic::x86_mmx_psra_d:
6019 case Intrinsic::x86_mmx_psrli_w:
6020 case Intrinsic::x86_mmx_psrli_d:
6021 case Intrinsic::x86_mmx_psrli_q:
6022 case Intrinsic::x86_mmx_psrai_w:
6023 case Intrinsic::x86_mmx_psrai_d:
6024 handleVectorShiftIntrinsic(I, /*Variable=*/false);
6025 break;
6026 case Intrinsic::x86_avx2_psllv_d:
6027 case Intrinsic::x86_avx2_psllv_d_256:
6028 case Intrinsic::x86_avx512_psllv_d_512:
6029 case Intrinsic::x86_avx2_psllv_q:
6030 case Intrinsic::x86_avx2_psllv_q_256:
6031 case Intrinsic::x86_avx512_psllv_q_512:
6032 case Intrinsic::x86_avx2_psrlv_d:
6033 case Intrinsic::x86_avx2_psrlv_d_256:
6034 case Intrinsic::x86_avx512_psrlv_d_512:
6035 case Intrinsic::x86_avx2_psrlv_q:
6036 case Intrinsic::x86_avx2_psrlv_q_256:
6037 case Intrinsic::x86_avx512_psrlv_q_512:
6038 case Intrinsic::x86_avx2_psrav_d:
6039 case Intrinsic::x86_avx2_psrav_d_256:
6040 case Intrinsic::x86_avx512_psrav_d_512:
6041 case Intrinsic::x86_avx512_psrav_q_128:
6042 case Intrinsic::x86_avx512_psrav_q_256:
6043 case Intrinsic::x86_avx512_psrav_q_512:
6044 handleVectorShiftIntrinsic(I, /*Variable=*/true);
6045 break;
6046
6047 // Pack with Signed/Unsigned Saturation
6048 case Intrinsic::x86_sse2_packsswb_128:
6049 case Intrinsic::x86_sse2_packssdw_128:
6050 case Intrinsic::x86_sse2_packuswb_128:
6051 case Intrinsic::x86_sse41_packusdw:
6052 case Intrinsic::x86_avx2_packsswb:
6053 case Intrinsic::x86_avx2_packssdw:
6054 case Intrinsic::x86_avx2_packuswb:
6055 case Intrinsic::x86_avx2_packusdw:
6056 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
6057 // (<32 x i16> %a, <32 x i16> %b)
6058 // <32 x i16> @llvm.x86.avx512.packssdw.512
6059 // (<16 x i32> %a, <16 x i32> %b)
6060 // Note: AVX512 masked variants are auto-upgraded by LLVM.
6061 case Intrinsic::x86_avx512_packsswb_512:
6062 case Intrinsic::x86_avx512_packssdw_512:
6063 case Intrinsic::x86_avx512_packuswb_512:
6064 case Intrinsic::x86_avx512_packusdw_512:
6065 handleVectorPackIntrinsic(I);
6066 break;
6067
6068 case Intrinsic::x86_sse41_pblendvb:
6069 case Intrinsic::x86_sse41_blendvpd:
6070 case Intrinsic::x86_sse41_blendvps:
6071 case Intrinsic::x86_avx_blendv_pd_256:
6072 case Intrinsic::x86_avx_blendv_ps_256:
6073 case Intrinsic::x86_avx2_pblendvb:
6074 handleBlendvIntrinsic(I);
6075 break;
6076
6077 case Intrinsic::x86_avx_dp_ps_256:
6078 case Intrinsic::x86_sse41_dppd:
6079 case Intrinsic::x86_sse41_dpps:
6080 handleDppIntrinsic(I);
6081 break;
6082
6083 case Intrinsic::x86_mmx_packsswb:
6084 case Intrinsic::x86_mmx_packuswb:
6085 handleVectorPackIntrinsic(I, 16);
6086 break;
6087
6088 case Intrinsic::x86_mmx_packssdw:
6089 handleVectorPackIntrinsic(I, 32);
6090 break;
6091
6092 case Intrinsic::x86_mmx_psad_bw:
6093 handleVectorSadIntrinsic(I, true);
6094 break;
6095 case Intrinsic::x86_sse2_psad_bw:
6096 case Intrinsic::x86_avx2_psad_bw:
6097 handleVectorSadIntrinsic(I);
6098 break;
6099
6100 // Multiply and Add Packed Words
6101 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
6102 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
6103 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
6104 //
6105 // Multiply and Add Packed Signed and Unsigned Bytes
6106 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
6107 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
6108 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
6109 //
6110 // These intrinsics are auto-upgraded into non-masked forms:
6111 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
6112 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
6113 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
6114 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
6115 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
6116 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
6117 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
6118 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
6119 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
6120 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
6121 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
6122 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
6123 case Intrinsic::x86_sse2_pmadd_wd:
6124 case Intrinsic::x86_avx2_pmadd_wd:
6125 case Intrinsic::x86_avx512_pmaddw_d_512:
6126 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
6127 case Intrinsic::x86_avx2_pmadd_ub_sw:
6128 case Intrinsic::x86_avx512_pmaddubs_w_512:
6129 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6130 /*ZeroPurifies=*/true,
6131 /*EltSizeInBits=*/0,
6132 /*Lanes=*/kBothLanes);
6133 break;
6134
6135 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
6136 case Intrinsic::x86_ssse3_pmadd_ub_sw:
6137 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6138 /*ZeroPurifies=*/true,
6139 /*EltSizeInBits=*/8,
6140 /*Lanes=*/kBothLanes);
6141 break;
6142
6143 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
6144 case Intrinsic::x86_mmx_pmadd_wd:
6145 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6146 /*ZeroPurifies=*/true,
6147 /*EltSizeInBits=*/16,
6148 /*Lanes=*/kBothLanes);
6149 break;
6150
6151 // BFloat16 multiply-add to single-precision
6152 // <4 x float> @llvm.aarch64.neon.bfmlalt
6153 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6154 case Intrinsic::aarch64_neon_bfmlalt:
6155 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6156 /*ZeroPurifies=*/false,
6157 /*EltSizeInBits=*/0,
6158 /*Lanes=*/kOddLanes);
6159 break;
6160
6161 // <4 x float> @llvm.aarch64.neon.bfmlalb
6162 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6163 case Intrinsic::aarch64_neon_bfmlalb:
6164 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6165 /*ZeroPurifies=*/false,
6166 /*EltSizeInBits=*/0,
6167 /*Lanes=*/kEvenLanes);
6168 break;
6169
6170 // AVX Vector Neural Network Instructions: bytes
6171 //
6172 // Multiply and Add Signed Bytes
6173 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
6174 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6175 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
6176 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6177 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
6178 // (<16 x i32>, <64 x i8>, <64 x i8>)
6179 //
6180 // Multiply and Add Signed Bytes With Saturation
6181 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
6182 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6183 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
6184 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6185 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
6186 // (<16 x i32>, <64 x i8>, <64 x i8>)
6187 //
6188 // Multiply and Add Signed and Unsigned Bytes
6189 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
6190 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6191 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
6192 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6193 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
6194 // (<16 x i32>, <64 x i8>, <64 x i8>)
6195 //
6196 // Multiply and Add Signed and Unsigned Bytes With Saturation
6197 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
6198 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6199 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
6200 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6201 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
6202 // (<16 x i32>, <64 x i8>, <64 x i8>)
6203 //
6204 // Multiply and Add Unsigned and Signed Bytes
6205 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
6206 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6207 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
6208 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6209 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
6210 // (<16 x i32>, <64 x i8>, <64 x i8>)
6211 //
6212 // Multiply and Add Unsigned and Signed Bytes With Saturation
6213 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
6214 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6215 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
6216 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6217 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
6218 // (<16 x i32>, <64 x i8>, <64 x i8>)
6219 //
6220 // Multiply and Add Unsigned Bytes
6221 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
6222 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6223 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
6224 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6225 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
6226 // (<16 x i32>, <64 x i8>, <64 x i8>)
6227 //
6228 // Multiply and Add Unsigned Bytes With Saturation
6229 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
6230 // (< 4 x i32>, <16 x i8>, <16 x i8>)
6231 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
6232 // (< 8 x i32>, <32 x i8>, <32 x i8>)
6233 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
6234 // (<16 x i32>, <64 x i8>, <64 x i8>)
6235 //
6236 // These intrinsics are auto-upgraded into non-masked forms:
6237 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
6238 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6239 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
6240 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6241 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
6242 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6243 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
6244 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6245 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
6246 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6247 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
6248 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6249 //
6250 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
6251 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6252 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
6253 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
6254 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
6255 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6256 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
6257 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
6258 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
6259 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6260 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
6261 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
6262 case Intrinsic::x86_avx512_vpdpbusd_128:
6263 case Intrinsic::x86_avx512_vpdpbusd_256:
6264 case Intrinsic::x86_avx512_vpdpbusd_512:
6265 case Intrinsic::x86_avx512_vpdpbusds_128:
6266 case Intrinsic::x86_avx512_vpdpbusds_256:
6267 case Intrinsic::x86_avx512_vpdpbusds_512:
6268 case Intrinsic::x86_avx2_vpdpbssd_128:
6269 case Intrinsic::x86_avx2_vpdpbssd_256:
6270 case Intrinsic::x86_avx10_vpdpbssd_512:
6271 case Intrinsic::x86_avx2_vpdpbssds_128:
6272 case Intrinsic::x86_avx2_vpdpbssds_256:
6273 case Intrinsic::x86_avx10_vpdpbssds_512:
6274 case Intrinsic::x86_avx2_vpdpbsud_128:
6275 case Intrinsic::x86_avx2_vpdpbsud_256:
6276 case Intrinsic::x86_avx10_vpdpbsud_512:
6277 case Intrinsic::x86_avx2_vpdpbsuds_128:
6278 case Intrinsic::x86_avx2_vpdpbsuds_256:
6279 case Intrinsic::x86_avx10_vpdpbsuds_512:
6280 case Intrinsic::x86_avx2_vpdpbuud_128:
6281 case Intrinsic::x86_avx2_vpdpbuud_256:
6282 case Intrinsic::x86_avx10_vpdpbuud_512:
6283 case Intrinsic::x86_avx2_vpdpbuuds_128:
6284 case Intrinsic::x86_avx2_vpdpbuuds_256:
6285 case Intrinsic::x86_avx10_vpdpbuuds_512:
6286 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
6287 /*ZeroPurifies=*/true,
6288 /*EltSizeInBits=*/0,
6289 /*Lanes=*/kBothLanes);
6290 break;
6291
6292 // AVX Vector Neural Network Instructions: words
6293 //
6294 // Multiply and Add Signed Word Integers
6295 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
6296 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6297 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
6298 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6299 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
6300 // (<16 x i32>, <32 x i16>, <32 x i16>)
6301 //
6302 // Multiply and Add Signed Word Integers With Saturation
6303 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
6304 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6305 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
6306 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6307 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
6308 // (<16 x i32>, <32 x i16>, <32 x i16>)
6309 //
6310 // Multiply and Add Signed and Unsigned Word Integers
6311 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
6312 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6313 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
6314 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6315 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
6316 // (<16 x i32>, <32 x i16>, <32 x i16>)
6317 //
6318 // Multiply and Add Signed and Unsigned Word Integers With Saturation
6319 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
6320 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6321 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
6322 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6323 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
6324 // (<16 x i32>, <32 x i16>, <32 x i16>)
6325 //
6326 // Multiply and Add Unsigned and Signed Word Integers
6327 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
6328 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6329 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
6330 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6331 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
6332 // (<16 x i32>, <32 x i16>, <32 x i16>)
6333 //
6334 // Multiply and Add Unsigned and Signed Word Integers With Saturation
6335 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
6336 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6337 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
6338 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6339 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
6340 // (<16 x i32>, <32 x i16>, <32 x i16>)
6341 //
6342 // Multiply and Add Unsigned and Unsigned Word Integers
6343 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
6344 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6345 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
6346 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6347 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
6348 // (<16 x i32>, <32 x i16>, <32 x i16>)
6349 //
6350 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
6351 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
6352 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
6353 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
6354 // (< 8 x i32>, <16 x i16>, <16 x i16>)
6355 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
6356 // (<16 x i32>, <32 x i16>, <32 x i16>)
6357 //
6358 // These intrinsics are auto-upgraded into non-masked forms:
6359 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
6360 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6361 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
6362 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6363 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6364 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6365 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6366 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6367 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6368 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6369 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6370 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6371 //
6372 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6373 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6374 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6375 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6376 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6377 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6378 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6379 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6380 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6381 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6382 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6383 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6384 case Intrinsic::x86_avx512_vpdpwssd_128:
6385 case Intrinsic::x86_avx512_vpdpwssd_256:
6386 case Intrinsic::x86_avx512_vpdpwssd_512:
6387 case Intrinsic::x86_avx512_vpdpwssds_128:
6388 case Intrinsic::x86_avx512_vpdpwssds_256:
6389 case Intrinsic::x86_avx512_vpdpwssds_512:
6390 case Intrinsic::x86_avx2_vpdpwsud_128:
6391 case Intrinsic::x86_avx2_vpdpwsud_256:
6392 case Intrinsic::x86_avx10_vpdpwsud_512:
6393 case Intrinsic::x86_avx2_vpdpwsuds_128:
6394 case Intrinsic::x86_avx2_vpdpwsuds_256:
6395 case Intrinsic::x86_avx10_vpdpwsuds_512:
6396 case Intrinsic::x86_avx2_vpdpwusd_128:
6397 case Intrinsic::x86_avx2_vpdpwusd_256:
6398 case Intrinsic::x86_avx10_vpdpwusd_512:
6399 case Intrinsic::x86_avx2_vpdpwusds_128:
6400 case Intrinsic::x86_avx2_vpdpwusds_256:
6401 case Intrinsic::x86_avx10_vpdpwusds_512:
6402 case Intrinsic::x86_avx2_vpdpwuud_128:
6403 case Intrinsic::x86_avx2_vpdpwuud_256:
6404 case Intrinsic::x86_avx10_vpdpwuud_512:
6405 case Intrinsic::x86_avx2_vpdpwuuds_128:
6406 case Intrinsic::x86_avx2_vpdpwuuds_256:
6407 case Intrinsic::x86_avx10_vpdpwuuds_512:
6408 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6409 /*ZeroPurifies=*/true,
6410 /*EltSizeInBits=*/0,
6411 /*Lanes=*/kBothLanes);
6412 break;
6413
6414 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6415 // Precision
6416 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6417 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6418 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6419 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6420 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6421 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
6422 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6423 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6424 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6425 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
6426 /*ZeroPurifies=*/false,
6427 /*EltSizeInBits=*/0,
6428 /*Lanes=*/kBothLanes);
6429 break;
6430
6431 case Intrinsic::x86_sse_cmp_ss:
6432 case Intrinsic::x86_sse2_cmp_sd:
6433 case Intrinsic::x86_sse_comieq_ss:
6434 case Intrinsic::x86_sse_comilt_ss:
6435 case Intrinsic::x86_sse_comile_ss:
6436 case Intrinsic::x86_sse_comigt_ss:
6437 case Intrinsic::x86_sse_comige_ss:
6438 case Intrinsic::x86_sse_comineq_ss:
6439 case Intrinsic::x86_sse_ucomieq_ss:
6440 case Intrinsic::x86_sse_ucomilt_ss:
6441 case Intrinsic::x86_sse_ucomile_ss:
6442 case Intrinsic::x86_sse_ucomigt_ss:
6443 case Intrinsic::x86_sse_ucomige_ss:
6444 case Intrinsic::x86_sse_ucomineq_ss:
6445 case Intrinsic::x86_sse2_comieq_sd:
6446 case Intrinsic::x86_sse2_comilt_sd:
6447 case Intrinsic::x86_sse2_comile_sd:
6448 case Intrinsic::x86_sse2_comigt_sd:
6449 case Intrinsic::x86_sse2_comige_sd:
6450 case Intrinsic::x86_sse2_comineq_sd:
6451 case Intrinsic::x86_sse2_ucomieq_sd:
6452 case Intrinsic::x86_sse2_ucomilt_sd:
6453 case Intrinsic::x86_sse2_ucomile_sd:
6454 case Intrinsic::x86_sse2_ucomigt_sd:
6455 case Intrinsic::x86_sse2_ucomige_sd:
6456 case Intrinsic::x86_sse2_ucomineq_sd:
6457 handleVectorCompareScalarIntrinsic(I);
6458 break;
6459
6460 case Intrinsic::x86_avx_cmp_pd_256:
6461 case Intrinsic::x86_avx_cmp_ps_256:
6462 case Intrinsic::x86_sse2_cmp_pd:
6463 case Intrinsic::x86_sse_cmp_ps:
6464 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/true);
6465 break;
6466
6467 case Intrinsic::x86_bmi_bextr_32:
6468 case Intrinsic::x86_bmi_bextr_64:
6469 case Intrinsic::x86_bmi_bzhi_32:
6470 case Intrinsic::x86_bmi_bzhi_64:
6471 case Intrinsic::x86_bmi_pdep_32:
6472 case Intrinsic::x86_bmi_pdep_64:
6473 case Intrinsic::x86_bmi_pext_32:
6474 case Intrinsic::x86_bmi_pext_64:
6475 handleBmiIntrinsic(I);
6476 break;
6477
6478 case Intrinsic::x86_pclmulqdq:
6479 case Intrinsic::x86_pclmulqdq_256:
6480 case Intrinsic::x86_pclmulqdq_512:
6481 handlePclmulIntrinsic(I);
6482 break;
6483
6484 case Intrinsic::x86_avx_round_pd_256:
6485 case Intrinsic::x86_avx_round_ps_256:
6486 case Intrinsic::x86_sse41_round_pd:
6487 case Intrinsic::x86_sse41_round_ps:
6488 handleRoundPdPsIntrinsic(I);
6489 break;
6490
6491 case Intrinsic::x86_sse41_round_sd:
6492 case Intrinsic::x86_sse41_round_ss:
6493 handleUnarySdSsIntrinsic(I);
6494 break;
6495
6496 case Intrinsic::x86_sse2_max_sd:
6497 case Intrinsic::x86_sse_max_ss:
6498 case Intrinsic::x86_sse2_min_sd:
6499 case Intrinsic::x86_sse_min_ss:
6500 handleBinarySdSsIntrinsic(I);
6501 break;
6502
6503 case Intrinsic::x86_avx_vtestc_pd:
6504 case Intrinsic::x86_avx_vtestc_pd_256:
6505 case Intrinsic::x86_avx_vtestc_ps:
6506 case Intrinsic::x86_avx_vtestc_ps_256:
6507 case Intrinsic::x86_avx_vtestnzc_pd:
6508 case Intrinsic::x86_avx_vtestnzc_pd_256:
6509 case Intrinsic::x86_avx_vtestnzc_ps:
6510 case Intrinsic::x86_avx_vtestnzc_ps_256:
6511 case Intrinsic::x86_avx_vtestz_pd:
6512 case Intrinsic::x86_avx_vtestz_pd_256:
6513 case Intrinsic::x86_avx_vtestz_ps:
6514 case Intrinsic::x86_avx_vtestz_ps_256:
6515 case Intrinsic::x86_avx_ptestc_256:
6516 case Intrinsic::x86_avx_ptestnzc_256:
6517 case Intrinsic::x86_avx_ptestz_256:
6518 case Intrinsic::x86_sse41_ptestc:
6519 case Intrinsic::x86_sse41_ptestnzc:
6520 case Intrinsic::x86_sse41_ptestz:
6521 handleVtestIntrinsic(I);
6522 break;
6523
6524 // Packed Horizontal Add/Subtract
6525 case Intrinsic::x86_ssse3_phadd_w:
6526 case Intrinsic::x86_ssse3_phadd_w_128:
6527 case Intrinsic::x86_ssse3_phsub_w:
6528 case Intrinsic::x86_ssse3_phsub_w_128:
6529 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6530 /*ReinterpretElemWidth=*/16);
6531 break;
6532
6533 case Intrinsic::x86_avx2_phadd_w:
6534 case Intrinsic::x86_avx2_phsub_w:
6535 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6536 /*ReinterpretElemWidth=*/16);
6537 break;
6538
6539 // Packed Horizontal Add/Subtract
6540 case Intrinsic::x86_ssse3_phadd_d:
6541 case Intrinsic::x86_ssse3_phadd_d_128:
6542 case Intrinsic::x86_ssse3_phsub_d:
6543 case Intrinsic::x86_ssse3_phsub_d_128:
6544 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6545 /*ReinterpretElemWidth=*/32);
6546 break;
6547
6548 case Intrinsic::x86_avx2_phadd_d:
6549 case Intrinsic::x86_avx2_phsub_d:
6550 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6551 /*ReinterpretElemWidth=*/32);
6552 break;
6553
6554 // Packed Horizontal Add/Subtract and Saturate
6555 case Intrinsic::x86_ssse3_phadd_sw:
6556 case Intrinsic::x86_ssse3_phadd_sw_128:
6557 case Intrinsic::x86_ssse3_phsub_sw:
6558 case Intrinsic::x86_ssse3_phsub_sw_128:
6559 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6560 /*ReinterpretElemWidth=*/16);
6561 break;
6562
6563 case Intrinsic::x86_avx2_phadd_sw:
6564 case Intrinsic::x86_avx2_phsub_sw:
6565 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6566 /*ReinterpretElemWidth=*/16);
6567 break;
6568
6569 // Packed Single/Double Precision Floating-Point Horizontal Add
6570 case Intrinsic::x86_sse3_hadd_ps:
6571 case Intrinsic::x86_sse3_hadd_pd:
6572 case Intrinsic::x86_sse3_hsub_ps:
6573 case Intrinsic::x86_sse3_hsub_pd:
6574 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6575 break;
6576
6577 case Intrinsic::x86_avx_hadd_pd_256:
6578 case Intrinsic::x86_avx_hadd_ps_256:
6579 case Intrinsic::x86_avx_hsub_pd_256:
6580 case Intrinsic::x86_avx_hsub_ps_256:
6581 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6582 break;
6583
6584 case Intrinsic::x86_avx_maskstore_ps:
6585 case Intrinsic::x86_avx_maskstore_pd:
6586 case Intrinsic::x86_avx_maskstore_ps_256:
6587 case Intrinsic::x86_avx_maskstore_pd_256:
6588 case Intrinsic::x86_avx2_maskstore_d:
6589 case Intrinsic::x86_avx2_maskstore_q:
6590 case Intrinsic::x86_avx2_maskstore_d_256:
6591 case Intrinsic::x86_avx2_maskstore_q_256: {
6592 handleAVXMaskedStore(I);
6593 break;
6594 }
6595
6596 case Intrinsic::x86_avx_maskload_ps:
6597 case Intrinsic::x86_avx_maskload_pd:
6598 case Intrinsic::x86_avx_maskload_ps_256:
6599 case Intrinsic::x86_avx_maskload_pd_256:
6600 case Intrinsic::x86_avx2_maskload_d:
6601 case Intrinsic::x86_avx2_maskload_q:
6602 case Intrinsic::x86_avx2_maskload_d_256:
6603 case Intrinsic::x86_avx2_maskload_q_256: {
6604 handleAVXMaskedLoad(I);
6605 break;
6606 }
6607
6608 // Packed
6609 case Intrinsic::x86_avx512fp16_add_ph_512:
6610 case Intrinsic::x86_avx512fp16_sub_ph_512:
6611 case Intrinsic::x86_avx512fp16_mul_ph_512:
6612 case Intrinsic::x86_avx512fp16_div_ph_512:
6613 case Intrinsic::x86_avx512fp16_max_ph_512:
6614 case Intrinsic::x86_avx512fp16_min_ph_512:
6615 case Intrinsic::x86_avx512_min_ps_512:
6616 case Intrinsic::x86_avx512_min_pd_512:
6617 case Intrinsic::x86_avx512_max_ps_512:
6618 case Intrinsic::x86_avx512_max_pd_512: {
6619 // These AVX512 variants contain the rounding mode as a trailing flag.
6620 // Earlier variants do not have a trailing flag and are already handled
6621 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6622 // maybeHandleUnknownIntrinsic.
6623 [[maybe_unused]] bool Success =
6624 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6625 assert(Success);
6626 break;
6627 }
6628
6629 case Intrinsic::x86_avx_vpermilvar_pd:
6630 case Intrinsic::x86_avx_vpermilvar_pd_256:
6631 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6632 case Intrinsic::x86_avx_vpermilvar_ps:
6633 case Intrinsic::x86_avx_vpermilvar_ps_256:
6634 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6635 handleAVXVpermilvar(I);
6636 break;
6637 }
6638
6639 case Intrinsic::x86_avx512_vpermi2var_d_128:
6640 case Intrinsic::x86_avx512_vpermi2var_d_256:
6641 case Intrinsic::x86_avx512_vpermi2var_d_512:
6642 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6643 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6644 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6645 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6646 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6647 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6648 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6649 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6650 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6651 case Intrinsic::x86_avx512_vpermi2var_q_128:
6652 case Intrinsic::x86_avx512_vpermi2var_q_256:
6653 case Intrinsic::x86_avx512_vpermi2var_q_512:
6654 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6655 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6656 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6657 handleAVXVpermi2var(I);
6658 break;
6659
6660 // Packed Shuffle
6661 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6662 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6663 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6664 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6665 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6666 //
6667 // The following intrinsics are auto-upgraded:
6668 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6669 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6670 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6671 case Intrinsic::x86_avx2_pshuf_b:
6672 case Intrinsic::x86_sse_pshuf_w:
6673 case Intrinsic::x86_ssse3_pshuf_b_128:
6674 case Intrinsic::x86_ssse3_pshuf_b:
6675 case Intrinsic::x86_avx512_pshuf_b_512:
6676 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6677 /*trailingVerbatimArgs=*/1);
6678 break;
6679
6680 // AVX512 PMOV: Packed MOV, with truncation
6681 // Precisely handled by applying the same intrinsic to the shadow
6682 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6683 case Intrinsic::x86_avx512_mask_pmov_db_512:
6684 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6685 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6686 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6687 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6688 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6689 /*trailingVerbatimArgs=*/1);
6690 break;
6691 }
6692
6693 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6694 // Approximately handled using the corresponding truncation intrinsic
6695 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6696 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6697 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6698 handleIntrinsicByApplyingToShadow(I,
6699 Intrinsic::x86_avx512_mask_pmov_dw_512,
6700 /*trailingVerbatimArgs=*/1);
6701 break;
6702 }
6703
6704 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6705 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6706 handleIntrinsicByApplyingToShadow(I,
6707 Intrinsic::x86_avx512_mask_pmov_db_512,
6708 /*trailingVerbatimArgs=*/1);
6709 break;
6710 }
6711
6712 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6713 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6714 handleIntrinsicByApplyingToShadow(I,
6715 Intrinsic::x86_avx512_mask_pmov_qb_512,
6716 /*trailingVerbatimArgs=*/1);
6717 break;
6718 }
6719
6720 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6721 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6722 handleIntrinsicByApplyingToShadow(I,
6723 Intrinsic::x86_avx512_mask_pmov_qw_512,
6724 /*trailingVerbatimArgs=*/1);
6725 break;
6726 }
6727
6728 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6729 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6730 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6731 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6732 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6733 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6734 // slow-path handler.
6735 handleAVX512VectorDownConvert(I);
6736 break;
6737 }
6738
6739 // AVX512/AVX10 Reciprocal Square Root
6740 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6741 // (<16 x float>, <16 x float>, i16)
6742 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6743 // (<8 x float>, <8 x float>, i8)
6744 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6745 // (<4 x float>, <4 x float>, i8)
6746 //
6747 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6748 // (<8 x double>, <8 x double>, i8)
6749 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6750 // (<4 x double>, <4 x double>, i8)
6751 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6752 // (<2 x double>, <2 x double>, i8)
6753 //
6754 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6755 // (<32 x bfloat>, <32 x bfloat>, i32)
6756 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6757 // (<16 x bfloat>, <16 x bfloat>, i16)
6758 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6759 // (<8 x bfloat>, <8 x bfloat>, i8)
6760 //
6761 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6762 // (<32 x half>, <32 x half>, i32)
6763 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6764 // (<16 x half>, <16 x half>, i16)
6765 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6766 // (<8 x half>, <8 x half>, i8)
6767 //
6768 // TODO: 3-operand variants are not handled:
6769 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6770 // (<2 x double>, <2 x double>, <2 x double>, i8)
6771 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6772 // (<4 x float>, <4 x float>, <4 x float>, i8)
6773 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6774 // (<8 x half>, <8 x half>, <8 x half>, i8)
6775 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6776 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6777 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6778 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6779 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6780 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6781 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6782 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6783 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6784 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6785 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6786 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6787 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6788 /*WriteThruIndex=*/1,
6789 /*MaskIndex=*/2);
6790 break;
6791
6792 // AVX512/AVX10 Reciprocal
6793 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6794 // (<16 x float>, <16 x float>, i16)
6795 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6796 // (<8 x float>, <8 x float>, i8)
6797 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6798 // (<4 x float>, <4 x float>, i8)
6799 //
6800 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6801 // (<8 x double>, <8 x double>, i8)
6802 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6803 // (<4 x double>, <4 x double>, i8)
6804 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6805 // (<2 x double>, <2 x double>, i8)
6806 //
6807 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6808 // (<32 x bfloat>, <32 x bfloat>, i32)
6809 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6810 // (<16 x bfloat>, <16 x bfloat>, i16)
6811 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6812 // (<8 x bfloat>, <8 x bfloat>, i8)
6813 //
6814 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6815 // (<32 x half>, <32 x half>, i32)
6816 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6817 // (<16 x half>, <16 x half>, i16)
6818 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6819 // (<8 x half>, <8 x half>, i8)
6820 //
6821 // TODO: 3-operand variants are not handled:
6822 // <2 x double> @llvm.x86.avx512.rcp14.sd
6823 // (<2 x double>, <2 x double>, <2 x double>, i8)
6824 // <4 x float> @llvm.x86.avx512.rcp14.ss
6825 // (<4 x float>, <4 x float>, <4 x float>, i8)
6826 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6827 // (<8 x half>, <8 x half>, <8 x half>, i8)
6828 case Intrinsic::x86_avx512_rcp14_ps_512:
6829 case Intrinsic::x86_avx512_rcp14_ps_256:
6830 case Intrinsic::x86_avx512_rcp14_ps_128:
6831 case Intrinsic::x86_avx512_rcp14_pd_512:
6832 case Intrinsic::x86_avx512_rcp14_pd_256:
6833 case Intrinsic::x86_avx512_rcp14_pd_128:
6834 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6835 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6836 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6837 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6838 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6839 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6840 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6841 /*WriteThruIndex=*/1,
6842 /*MaskIndex=*/2);
6843 break;
6844
6845 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6846 // (<32 x half>, i32, <32 x half>, i32, i32)
6847 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6848 // (<16 x half>, i32, <16 x half>, i32, i16)
6849 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6850 // (<8 x half>, i32, <8 x half>, i32, i8)
6851 //
6852 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6853 // (<16 x float>, i32, <16 x float>, i16, i32)
6854 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6855 // (<8 x float>, i32, <8 x float>, i8)
6856 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6857 // (<4 x float>, i32, <4 x float>, i8)
6858 //
6859 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6860 // (<8 x double>, i32, <8 x double>, i8, i32)
6861 // A Imm WriteThru Mask Rounding
6862 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6863 // (<4 x double>, i32, <4 x double>, i8)
6864 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6865 // (<2 x double>, i32, <2 x double>, i8)
6866 // A Imm WriteThru Mask
6867 //
6868 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6869 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6870 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6871 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6872 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6873 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6874 //
6875 // Not supported: three vectors
6876 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6877 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6878 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6879 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6880 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6881 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6882 // i32)
6883 // A B WriteThru Mask Imm
6884 // Rounding
6885 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6886 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6887 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6888 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6889 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6890 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6891 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6892 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6893 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6894 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6895 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6896 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6897 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0},
6898 /*WriteThruIndex=*/2,
6899 /*MaskIndex=*/3);
6900 break;
6901
6902 // AVX512 Vector Scale Float* Packed
6903 //
6904 // < 8 x double> @llvm.x86.avx512.mask.scalef.pd.512
6905 // (<8 x double>, <8 x double>, <8 x double>, i8, i32)
6906 // A B WriteThru Msk Round
6907 // < 4 x double> @llvm.x86.avx512.mask.scalef.pd.256
6908 // (<4 x double>, <4 x double>, <4 x double>, i8)
6909 // < 2 x double> @llvm.x86.avx512.mask.scalef.pd.128
6910 // (<2 x double>, <2 x double>, <2 x double>, i8)
6911 //
6912 // <16 x float> @llvm.x86.avx512.mask.scalef.ps.512
6913 // (<16 x float>, <16 x float>, <16 x float>, i16, i32)
6914 // < 8 x float> @llvm.x86.avx512.mask.scalef.ps.256
6915 // (<8 x float>, <8 x float>, <8 x float>, i8)
6916 // < 4 x float> @llvm.x86.avx512.mask.scalef.ps.128
6917 // (<4 x float>, <4 x float>, <4 x float>, i8)
6918 //
6919 // <32 x half> @llvm.x86.avx512fp16.mask.scalef.ph.512
6920 // (<32 x half>, <32 x half>, <32 x half>, i32, i32)
6921 // <16 x half> @llvm.x86.avx512fp16.mask.scalef.ph.256
6922 // (<16 x half>, <16 x half>, <16 x half>, i16)
6923 // < 8 x half> @llvm.x86.avx512fp16.mask.scalef.ph.128
6924 // (<8 x half>, <8 x half>, <8 x half>, i8)
6925 //
6926 // TODO: AVX10
6927 // <32 x bfloat> @llvm.x86.avx10.mask.scalef.bf16.512
6928 // (<32 x bfloat>, <32 x bfloat>, <32 x bfloat>, i32)
6929 // <16 x bfloat> @llvm.x86.avx10.mask.scalef.bf16.256
6930 // (<16 x bfloat>, <16 x bfloat>, <16 x bfloat>, i16)
6931 // < 8 x bfloat> @llvm.x86.avx10.mask.scalef.bf16.128
6932 // (<8 x bfloat>, <8 x bfloat>, <8 x bfloat>, i8)
6933 case Intrinsic::x86_avx512_mask_scalef_pd_512:
6934 case Intrinsic::x86_avx512_mask_scalef_pd_256:
6935 case Intrinsic::x86_avx512_mask_scalef_pd_128:
6936 case Intrinsic::x86_avx512_mask_scalef_ps_512:
6937 case Intrinsic::x86_avx512_mask_scalef_ps_256:
6938 case Intrinsic::x86_avx512_mask_scalef_ps_128:
6939 case Intrinsic::x86_avx512fp16_mask_scalef_ph_512:
6940 case Intrinsic::x86_avx512fp16_mask_scalef_ph_256:
6941 case Intrinsic::x86_avx512fp16_mask_scalef_ph_128:
6942 // The AVX512 512-bit operand variants have an extra operand (the
6943 // Rounding mode). The extra operand, if present, will be
6944 // automatically checked by the handler.
6945 handleAVX512VectorGenericMaskedFP(I, /*DataIndices=*/{0, 1},
6946 /*WriteThruIndex=*/2,
6947 /*MaskIndex=*/3);
6948 break;
6949
6950 // TODO: AVX512 Vector Scale Float* Scalar
6951 //
6952 // This is different from the Packed variant, because some bits are copied,
6953 // and some bits are zeroed.
6954 //
6955 // < 4 x float> @llvm.x86.avx512.mask.scalef.ss
6956 // (<4 x float>, <4 x float>, <4 x float>, i8, i32)
6957 //
6958 // < 2 x double> @llvm.x86.avx512.mask.scalef.sd
6959 // (<2 x double>, <2 x double>, <2 x double>, i8, i32)
6960 //
6961 // < 8 x half> @llvm.x86.avx512fp16.mask.scalef.sh
6962 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
6963
6964 // AVX512 FP16 Arithmetic
6965 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6966 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6967 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6968 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6969 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6970 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6971 visitGenericScalarHalfwordInst(I);
6972 break;
6973 }
6974
6975 // AVX Galois Field New Instructions
6976 case Intrinsic::x86_vgf2p8affineqb_128:
6977 case Intrinsic::x86_vgf2p8affineqb_256:
6978 case Intrinsic::x86_vgf2p8affineqb_512:
6979 handleAVXGF2P8Affine(I);
6980 break;
6981
6982 default:
6983 return false;
6984 }
6985
6986 return true;
6987 }
6988
6989 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6990 switch (I.getIntrinsicID()) {
6991 // Two operands e.g.,
6992 // - <8 x i8> @llvm.aarch64.neon.rshrn.v8i8 (<8 x i16>, i32)
6993 // - <4 x i16> @llvm.aarch64.neon.uqrshl.v4i16(<4 x i16>, <4 x i16>)
6994 case Intrinsic::aarch64_neon_rshrn:
6995 case Intrinsic::aarch64_neon_sqrshl:
6996 case Intrinsic::aarch64_neon_sqrshrn:
6997 case Intrinsic::aarch64_neon_sqrshrun:
6998 case Intrinsic::aarch64_neon_sqshl:
6999 case Intrinsic::aarch64_neon_sqshlu:
7000 case Intrinsic::aarch64_neon_sqshrn:
7001 case Intrinsic::aarch64_neon_sqshrun:
7002 case Intrinsic::aarch64_neon_srshl:
7003 case Intrinsic::aarch64_neon_sshl:
7004 case Intrinsic::aarch64_neon_uqrshl:
7005 case Intrinsic::aarch64_neon_uqrshrn:
7006 case Intrinsic::aarch64_neon_uqshl:
7007 case Intrinsic::aarch64_neon_uqshrn:
7008 case Intrinsic::aarch64_neon_urshl:
7009 case Intrinsic::aarch64_neon_ushl:
7010 handleVectorShiftIntrinsic(I, /* Variable */ false);
7011 break;
7012
7013 // Vector Shift Left/Right and Insert
7014 //
7015 // Three operands e.g.,
7016 // - <4 x i16> @llvm.aarch64.neon.vsli.v4i16
7017 // (<4 x i16> %a, <4 x i16> %b, i32 %n)
7018 // - <16 x i8> @llvm.aarch64.neon.vsri.v16i8
7019 // (<16 x i8> %a, <16 x i8> %b, i32 %n)
7020 //
7021 // %b is shifted by %n bits, and the "missing" bits are filled in with %a
7022 // (instead of zero-extending/sign-extending).
7023 case Intrinsic::aarch64_neon_vsli:
7024 case Intrinsic::aarch64_neon_vsri:
7025 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
7026 /*trailingVerbatimArgs=*/1);
7027 break;
7028
7029 // TODO: handling max/min similarly to AND/OR may be more precise
7030 // Floating-Point Maximum/Minimum Pairwise
7031 case Intrinsic::aarch64_neon_fmaxp:
7032 case Intrinsic::aarch64_neon_fminp:
7033 // Floating-Point Maximum/Minimum Number Pairwise
7034 case Intrinsic::aarch64_neon_fmaxnmp:
7035 case Intrinsic::aarch64_neon_fminnmp:
7036 // Signed/Unsigned Maximum/Minimum Pairwise
7037 case Intrinsic::aarch64_neon_smaxp:
7038 case Intrinsic::aarch64_neon_sminp:
7039 case Intrinsic::aarch64_neon_umaxp:
7040 case Intrinsic::aarch64_neon_uminp:
7041 // Add Pairwise
7042 case Intrinsic::aarch64_neon_addp:
7043 // Floating-point Add Pairwise
7044 case Intrinsic::aarch64_neon_faddp:
7045 // Add Long Pairwise
7046 case Intrinsic::aarch64_neon_saddlp:
7047 case Intrinsic::aarch64_neon_uaddlp: {
7048 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
7049 break;
7050 }
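The pairwise cases above combine adjacent element pairs of the concatenated operands, so the shadow of each output element is the OR of its pair's shadows. A minimal Python model of that propagation rule (illustrative only, not the actual `handlePairwiseShadowOrIntrinsic` C++ implementation; the function name is made up):

```python
def pairwise_shadow_or(shadow_a, shadow_b):
    """Model of shadow propagation for NEON pairwise ops (e.g. addp):
    the two source vectors are conceptually concatenated, adjacent
    pairs are combined, and an output element is poisoned iff either
    element of its pair is poisoned."""
    src = shadow_a + shadow_b  # concatenate operand shadows
    return [src[2 * i] | src[2 * i + 1] for i in range(len(src) // 2)]
```

For example, a single poisoned lane in the first operand poisons exactly one output lane.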
7051
7052 // Floating-point Convert to integer, rounding to nearest with ties to Away
7053 case Intrinsic::aarch64_neon_fcvtas:
7054 case Intrinsic::aarch64_neon_fcvtau:
7055 // Floating-point convert to integer, rounding toward minus infinity
7056 case Intrinsic::aarch64_neon_fcvtms:
7057 case Intrinsic::aarch64_neon_fcvtmu:
7058 // Floating-point convert to integer, rounding to nearest with ties to even
7059 case Intrinsic::aarch64_neon_fcvtns:
7060 case Intrinsic::aarch64_neon_fcvtnu:
7061 // Floating-point convert to integer, rounding toward plus infinity
7062 case Intrinsic::aarch64_neon_fcvtps:
7063 case Intrinsic::aarch64_neon_fcvtpu:
7064 // Floating-point Convert to integer, rounding toward Zero
7065 case Intrinsic::aarch64_neon_fcvtzs:
7066 case Intrinsic::aarch64_neon_fcvtzu:
7067 // Floating-point convert to lower precision narrow, rounding to odd
7068 case Intrinsic::aarch64_neon_fcvtxn:
7069 // Vector Conversions Between Half-Precision and Single-Precision
7070 case Intrinsic::aarch64_neon_vcvthf2fp:
7071 case Intrinsic::aarch64_neon_vcvtfp2hf:
7072 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/false);
7073 break;
7074
7075 // Vector Conversions Between Fixed-Point and Floating-Point
7076 case Intrinsic::aarch64_neon_vcvtfxs2fp:
7077 case Intrinsic::aarch64_neon_vcvtfp2fxs:
7078 case Intrinsic::aarch64_neon_vcvtfxu2fp:
7079 case Intrinsic::aarch64_neon_vcvtfp2fxu:
7080 handleNEONVectorConvertIntrinsic(I, /*FixedPoint=*/true);
7081 break;
7082
7083 // TODO: bfloat conversions
7084 // - bfloat @llvm.aarch64.neon.bfcvt(float)
7085 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn(<4 x float>)
7086 // - <8 x bfloat> @llvm.aarch64.neon.bfcvtn2(<8 x bfloat>, <4 x float>)
7087
7088 // Add reduction to scalar
7089 case Intrinsic::aarch64_neon_faddv:
7090 case Intrinsic::aarch64_neon_saddv:
7091 case Intrinsic::aarch64_neon_uaddv:
7092 // Signed/Unsigned min/max (Vector)
7093 // TODO: handling similarly to AND/OR may be more precise.
7094 case Intrinsic::aarch64_neon_smaxv:
7095 case Intrinsic::aarch64_neon_sminv:
7096 case Intrinsic::aarch64_neon_umaxv:
7097 case Intrinsic::aarch64_neon_uminv:
7098 // Floating-point min/max (vector)
7099 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
7100 // but our shadow propagation is the same.
7101 case Intrinsic::aarch64_neon_fmaxv:
7102 case Intrinsic::aarch64_neon_fminv:
7103 case Intrinsic::aarch64_neon_fmaxnmv:
7104 case Intrinsic::aarch64_neon_fminnmv:
7105 // Sum long across vector
7106 case Intrinsic::aarch64_neon_saddlv:
7107 case Intrinsic::aarch64_neon_uaddlv:
7108 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
7109 break;
7110
7111 case Intrinsic::aarch64_neon_ld1x2:
7112 case Intrinsic::aarch64_neon_ld1x3:
7113 case Intrinsic::aarch64_neon_ld1x4:
7114 case Intrinsic::aarch64_neon_ld2:
7115 case Intrinsic::aarch64_neon_ld3:
7116 case Intrinsic::aarch64_neon_ld4:
7117 case Intrinsic::aarch64_neon_ld2r:
7118 case Intrinsic::aarch64_neon_ld3r:
7119 case Intrinsic::aarch64_neon_ld4r: {
7120 handleNEONVectorLoad(I, /*WithLane=*/false);
7121 break;
7122 }
7123
7124 case Intrinsic::aarch64_neon_ld2lane:
7125 case Intrinsic::aarch64_neon_ld3lane:
7126 case Intrinsic::aarch64_neon_ld4lane: {
7127 handleNEONVectorLoad(I, /*WithLane=*/true);
7128 break;
7129 }
7130
7131 // Saturating extract narrow
7132 case Intrinsic::aarch64_neon_sqxtn:
7133 case Intrinsic::aarch64_neon_sqxtun:
7134 case Intrinsic::aarch64_neon_uqxtn:
7135 // These only have one argument, but we (ab)use handleShadowOr because it
7136 // does work on single argument intrinsics and will typecast the shadow
7137 // (and update the origin).
7138 handleShadowOr(I);
7139 break;
7140
7141 case Intrinsic::aarch64_neon_st1x2:
7142 case Intrinsic::aarch64_neon_st1x3:
7143 case Intrinsic::aarch64_neon_st1x4:
7144 case Intrinsic::aarch64_neon_st2:
7145 case Intrinsic::aarch64_neon_st3:
7146 case Intrinsic::aarch64_neon_st4: {
7147 handleNEONVectorStoreIntrinsic(I, false);
7148 break;
7149 }
7150
7151 case Intrinsic::aarch64_neon_st2lane:
7152 case Intrinsic::aarch64_neon_st3lane:
7153 case Intrinsic::aarch64_neon_st4lane: {
7154 handleNEONVectorStoreIntrinsic(I, true);
7155 break;
7156 }
7157
7158 // Arm NEON vector table intrinsics have the source/table register(s) as
7159 // arguments, followed by the index register. They return the output.
7160 //
7161 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
7162 // original value unchanged in the destination register.'
7163 // Conveniently, zero denotes a clean shadow, which means out-of-range
7164 // indices for TBL will initialize the user data with zero and also clean
7165 // the shadow. (For TBX, neither the user data nor the shadow will be
7166 // updated, which is also correct.)
7167 case Intrinsic::aarch64_neon_tbl1:
7168 case Intrinsic::aarch64_neon_tbl2:
7169 case Intrinsic::aarch64_neon_tbl3:
7170 case Intrinsic::aarch64_neon_tbl4:
7171 case Intrinsic::aarch64_neon_tbx1:
7172 case Intrinsic::aarch64_neon_tbx2:
7173 case Intrinsic::aarch64_neon_tbx3:
7174 case Intrinsic::aarch64_neon_tbx4: {
7175 // The last trailing argument (index register) should be handled verbatim
7176 handleIntrinsicByApplyingToShadow(
7177 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
7178 /*trailingVerbatimArgs*/ 1);
7179 break;
7180 }
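The comment above notes why applying the same table-lookup operation to the shadow is sound: TBL writes zero for out-of-range indices, and zero happens to denote a clean shadow. A hedged Python sketch of that argument (names are illustrative, and this models a single one-register TBL, not the real shadow intrinsic call):

```python
def tbl_shadow(table_shadow, idx):
    """Model of running TBL on the shadow itself: an in-range index
    fetches the shadow of the selected table element; an out-of-range
    index yields 0, i.e. a clean shadow, matching TBL writing zero
    into the data register for that lane."""
    return [table_shadow[i] if 0 <= i < len(table_shadow) else 0
            for i in idx]
```

Note the index register itself must be handled verbatim (a shadow is not a valid index), which is why it is passed as a trailing verbatim argument.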
7181
7182 case Intrinsic::aarch64_neon_fmulx:
7183 case Intrinsic::aarch64_neon_pmul:
7184 case Intrinsic::aarch64_neon_pmull:
7185 case Intrinsic::aarch64_neon_smull:
7186 case Intrinsic::aarch64_neon_pmull64:
7187 case Intrinsic::aarch64_neon_umull: {
7188 handleNEONVectorMultiplyIntrinsic(I);
7189 break;
7190 }
7191
7192 case Intrinsic::aarch64_neon_smmla:
7193 case Intrinsic::aarch64_neon_ummla:
7194 case Intrinsic::aarch64_neon_usmmla:
7195 case Intrinsic::aarch64_neon_bfmmla:
7196 handleNEONMatrixMultiply(I);
7197 break;
7198
7199 // <2 x i32> @llvm.aarch64.neon.{u,s,us}dot.v2i32.v8i8
7200 // (<2 x i32> %acc, <8 x i8> %a, <8 x i8> %b)
7201 // <4 x i32> @llvm.aarch64.neon.{u,s,us}dot.v4i32.v16i8
7202 // (<4 x i32> %acc, <16 x i8> %a, <16 x i8> %b)
7203 case Intrinsic::aarch64_neon_sdot:
7204 case Intrinsic::aarch64_neon_udot:
7205 case Intrinsic::aarch64_neon_usdot:
7206 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/4,
7207 /*ZeroPurifies=*/true,
7208 /*EltSizeInBits=*/0,
7209 /*Lanes=*/kBothLanes);
7210 break;
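The `ZeroPurifies` flag reflects that a multiplicand which is a fully-initialized zero makes the product defined regardless of the other operand. A minimal Python model of one output lane group, assuming a reduction factor of 4 (illustrative names; the real handler operates on IR shadows, not Python lists):

```python
def dot_shadow(acc_shadow, a, sa, b, sb, reduction_factor=4):
    """Model of dot-product shadow propagation with ZeroPurifies: a
    product term is clean if either multiplicand is a known
    (unpoisoned) zero; otherwise it is poisoned if either multiplicand
    is poisoned. Each output lane ORs reduction_factor term shadows
    together with the accumulator's shadow."""
    def term_shadow(x, sx, y, sy):
        if (sx == 0 and x == 0) or (sy == 0 and y == 0):
            return 0  # known zero purifies the product
        return sx | sy

    out = []
    for i in range(len(a) // reduction_factor):
        s = acc_shadow[i]
        for j in range(reduction_factor * i, reduction_factor * (i + 1)):
            s |= term_shadow(a[j], sa[j], b[j], sb[j])
        out.append(s)
    return out
```

So a poisoned element multiplied by a known zero does not poison the output lane, while the same element multiplied by a known nonzero value does.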
7211
7212 // <2 x float> @llvm.aarch64.neon.bfdot.v2f32.v4bf16
7213 // (<2 x float> %acc, <4 x bfloat> %a, <4 x bfloat> %b)
7214 // <4 x float> @llvm.aarch64.neon.bfdot.v4f32.v8bf16
7215 // (<4 x float> %acc, <8 x bfloat> %a, <8 x bfloat> %b)
7216 case Intrinsic::aarch64_neon_bfdot:
7217 handleVectorDotProductIntrinsic(I, /*ReductionFactor=*/2,
7218 /*ZeroPurifies=*/false,
7219 /*EltSizeInBits=*/0,
7220 /*Lanes=*/kBothLanes);
7221 break;
7222
7223 // Floating-Point Absolute Compare Greater Than/Equal
7224 case Intrinsic::aarch64_neon_facge:
7225 case Intrinsic::aarch64_neon_facgt:
7226 handleVectorComparePackedIntrinsic(I, /*PredicateAsOperand=*/false);
7227 break;
7228
7229 default:
7230 return false;
7231 }
7232
7233 return true;
7234 }
7235
7236 void visitIntrinsicInst(IntrinsicInst &I) {
7237 if (maybeHandleCrossPlatformIntrinsic(I))
7238 return;
7239
7240 if (maybeHandleX86SIMDIntrinsic(I))
7241 return;
7242
7243 if (maybeHandleArmSIMDIntrinsic(I))
7244 return;
7245
7246 if (maybeHandleUnknownIntrinsic(I))
7247 return;
7248
7249 visitInstruction(I);
7250 }
7251
7252 void visitLibAtomicLoad(CallBase &CB) {
7253 // Since we use getNextNode here, we can't have CB terminate the BB.
7254 assert(isa<CallInst>(CB));
7255
7256 IRBuilder<> IRB(&CB);
7257 Value *Size = CB.getArgOperand(0);
7258 Value *SrcPtr = CB.getArgOperand(1);
7259 Value *DstPtr = CB.getArgOperand(2);
7260 Value *Ordering = CB.getArgOperand(3);
7261 // Convert the call to have at least Acquire ordering to make sure
7262 // the shadow operations aren't reordered before it.
7263 Value *NewOrdering =
7264 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
7265 CB.setArgOperand(3, NewOrdering);
7266
7267 NextNodeIRBuilder NextIRB(&CB);
7268 Value *SrcShadowPtr, *SrcOriginPtr;
7269 std::tie(SrcShadowPtr, SrcOriginPtr) =
7270 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7271 /*isStore*/ false);
7272 Value *DstShadowPtr =
7273 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
7274 /*isStore*/ true)
7275 .first;
7276
7277 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
7278 if (MS.TrackOrigins) {
7279 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
7280 kMinOriginAlignment);
7281 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
7282 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
7283 }
7284 }
7285
7286 void visitLibAtomicStore(CallBase &CB) {
7287 IRBuilder<> IRB(&CB);
7288 Value *Size = CB.getArgOperand(0);
7289 Value *DstPtr = CB.getArgOperand(2);
7290 Value *Ordering = CB.getArgOperand(3);
7291 // Convert the call to have at least Release ordering to make sure
7292 // the shadow operations aren't reordered after it.
7293 Value *NewOrdering =
7294 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
7295 CB.setArgOperand(3, NewOrdering);
7296
7297 Value *DstShadowPtr =
7298 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
7299 /*isStore*/ true)
7300 .first;
7301
7302 // Atomic store always paints clean shadow/origin. See file header.
7303 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
7304 Align(1));
7305 }
7306
7307 void visitCallBase(CallBase &CB) {
7308 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
7309 if (CB.isInlineAsm()) {
7310 // For inline asm (either a call to asm function, or callbr instruction),
7311 // do the usual thing: check argument shadow and mark all outputs as
7312 // clean. Note that any side effects of the inline asm that are not
7313 // immediately visible in its constraints are not handled.
7314 if (ClHandleAsmConservative && MS.CompileKernel)
7315 visitAsmInstruction(CB);
7316 else
7317 visitInstruction(CB);
7318 return;
7319 }
7320 LibFunc LF;
7321 if (TLI->getLibFunc(CB, LF)) {
7322 // libatomic.a functions need to have special handling because there isn't
7323 // a good way to intercept them or compile the library with
7324 // instrumentation.
7325 switch (LF) {
7326 case LibFunc_atomic_load:
7327 if (!isa<CallInst>(CB)) {
7328 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load. "
7329 "Ignoring!\n";
7330 break;
7331 }
7332 visitLibAtomicLoad(CB);
7333 return;
7334 case LibFunc_atomic_store:
7335 visitLibAtomicStore(CB);
7336 return;
7337 default:
7338 break;
7339 }
7340 }
7341
7342 if (auto *Call = dyn_cast<CallInst>(&CB)) {
7343 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
7344
7345 // We are going to insert code that relies on the fact that the callee
7346 // will become a non-readonly function after it is instrumented by us. To
7347 // prevent this code from being optimized out, mark that function
7348 // non-readonly in advance.
7349 // TODO: We can likely do better than dropping memory() completely here.
7350 AttributeMask B;
7351 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
7352
7354 if (Function *Func = Call->getCalledFunction()) {
7355 Func->removeFnAttrs(B);
7356 }
7357
7357
7358 Call->removeFnAttrs(B);
7359 }
7360 IRBuilder<> IRB(&CB);
7361 bool MayCheckCall = MS.EagerChecks;
7362 if (Function *Func = CB.getCalledFunction()) {
7363 // __sanitizer_unaligned_{load,store} functions may be called by users
7364 // and always expect shadows in the TLS, so don't check them.
7365 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
7366 }
7367
7368 unsigned ArgOffset = 0;
7369 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
7370 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
7371 if (!A->getType()->isSized()) {
7372 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
7373 continue;
7374 }
7375
7376 if (A->getType()->isScalableTy()) {
7377 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
7378 // Handle as noundef, but don't reserve tls slots.
7379 insertCheckShadowOf(A, &CB);
7380 continue;
7381 }
7382
7383 unsigned Size = 0;
7384 const DataLayout &DL = F.getDataLayout();
7385
7386 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
7387 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
7388 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
7389
7390 if (EagerCheck) {
7391 insertCheckShadowOf(A, &CB);
7392 Size = DL.getTypeAllocSize(A->getType());
7393 } else {
7394 [[maybe_unused]] Value *Store = nullptr;
7395 // Compute the Shadow for arg even if it is ByVal, because
7396 // in that case getShadow() will copy the actual arg shadow to
7397 // __msan_param_tls.
7398 Value *ArgShadow = getShadow(A);
7399 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
7400 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
7401 << " Shadow: " << *ArgShadow << "\n");
7402 if (ByVal) {
7403 // ByVal requires some special handling as it's too big for a single
7404 // load
7405 assert(A->getType()->isPointerTy() &&
7406 "ByVal argument is not a pointer!");
7407 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
7408 if (ArgOffset + Size > kParamTLSSize)
7409 break;
7410 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
7411 MaybeAlign Alignment = std::nullopt;
7412 if (ParamAlignment)
7413 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
7414 Value *AShadowPtr, *AOriginPtr;
7415 std::tie(AShadowPtr, AOriginPtr) =
7416 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
7417 /*isStore*/ false);
7418 if (!PropagateShadow) {
7419 Store = IRB.CreateMemSet(ArgShadowBase,
7420 Constant::getNullValue(IRB.getInt8Ty()),
7421 Size, Alignment);
7422 } else {
7423 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
7424 Alignment, Size);
7425 if (MS.TrackOrigins) {
7426 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
7427 // FIXME: OriginSize should be:
7428 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
7429 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
7430 IRB.CreateMemCpy(
7431 ArgOriginBase,
7432 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
7433 AOriginPtr,
7434 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
7435 }
7436 }
7437 } else {
7438 // Any other parameters mean we need bit-grained tracking of uninit
7439 // data
7440 Size = DL.getTypeAllocSize(A->getType());
7441 if (ArgOffset + Size > kParamTLSSize)
7442 break;
7443 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
7444 kShadowTLSAlignment);
7445 Constant *Cst = dyn_cast<Constant>(ArgShadow);
7446 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
7447 IRB.CreateStore(getOrigin(A),
7448 getOriginPtrForArgument(IRB, ArgOffset));
7449 }
7450 }
7451 assert(Store != nullptr);
7452 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
7453 }
7454 assert(Size != 0);
7455 ArgOffset += alignTo(Size, kShadowTLSAlignment);
7456 }
7457 LLVM_DEBUG(dbgs() << " done with call args\n");
7458
7459 FunctionType *FT = CB.getFunctionType();
7460 if (FT->isVarArg()) {
7461 VAHelper->visitCallBase(CB, IRB);
7462 }
7463
7464 // Now, get the shadow for the RetVal.
7465 if (!CB.getType()->isSized())
7466 return;
7467 // Don't emit the epilogue for musttail call returns.
7468 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
7469 return;
7470
7471 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
7472 setShadow(&CB, getCleanShadow(&CB));
7473 setOrigin(&CB, getCleanOrigin());
7474 return;
7475 }
7476
7477 IRBuilder<> IRBBefore(&CB);
7478 // Until we have full dynamic coverage, make sure the retval shadow is 0.
7479 Value *Base = getShadowPtrForRetval(IRBBefore);
7480 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
7481 kShadowTLSAlignment);
7482 BasicBlock::iterator NextInsn;
7483 if (isa<CallInst>(CB)) {
7484 NextInsn = ++CB.getIterator();
7485 assert(NextInsn != CB.getParent()->end());
7486 } else {
7487 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
7488 if (!NormalDest->getSinglePredecessor()) {
7489 // FIXME: this case is tricky, so we are just conservative here.
7490 // Perhaps we need to split the edge between this BB and NormalDest,
7491 // but a naive attempt to use SplitEdge leads to a crash.
7492 setShadow(&CB, getCleanShadow(&CB));
7493 setOrigin(&CB, getCleanOrigin());
7494 return;
7495 }
7496 // FIXME: NextInsn is likely in a basic block that has not been visited
7497 // yet. Anything inserted there will be instrumented by MSan later!
7498 NextInsn = NormalDest->getFirstInsertionPt();
7499 assert(NextInsn != NormalDest->end() &&
7500 "Could not find insertion point for retval shadow load");
7501 }
7502 IRBuilder<> IRBAfter(&*NextInsn);
7503 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7504 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
7505 "_msret");
7506 setShadow(&CB, RetvalShadow);
7507 if (MS.TrackOrigins)
7508 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
7509 }
7510
7511 bool isAMustTailRetVal(Value *RetVal) {
7512 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
7513 RetVal = I->getOperand(0);
7514 }
7515 if (auto *I = dyn_cast<CallInst>(RetVal)) {
7516 return I->isMustTailCall();
7517 }
7518 return false;
7519 }
7520
7521 void visitReturnInst(ReturnInst &I) {
7522 IRBuilder<> IRB(&I);
7523 Value *RetVal = I.getReturnValue();
7524 if (!RetVal)
7525 return;
7526 // Don't emit the epilogue for musttail call returns.
7527 if (isAMustTailRetVal(RetVal))
7528 return;
7529 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7530 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
7531 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7532 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7533 // must always return fully initialized values. For now, we hardcode "main".
7534 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7535
7536 Value *Shadow = getShadow(RetVal);
7537 bool StoreOrigin = true;
7538 if (EagerCheck) {
7539 insertCheckShadowOf(RetVal, &I);
7540 Shadow = getCleanShadow(RetVal);
7541 StoreOrigin = false;
7542 }
7543
7544 // The caller may still expect information passed over TLS if we pass our
7545 // check.
7546 if (StoreShadow) {
7547 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
7548 if (MS.TrackOrigins && StoreOrigin)
7549 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
7550 }
7551 }
7552
7553 void visitPHINode(PHINode &I) {
7554 IRBuilder<> IRB(&I);
7555 if (!PropagateShadow) {
7556 setShadow(&I, getCleanShadow(&I));
7557 setOrigin(&I, getCleanOrigin());
7558 return;
7559 }
7560
7561 ShadowPHINodes.push_back(&I);
7562 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
7563 "_msphi_s"));
7564 if (MS.TrackOrigins)
7565 setOrigin(
7566 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
7567 }
7568
7569 Value *getLocalVarIdptr(AllocaInst &I) {
7570 ConstantInt *IntConst =
7571 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
7572 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7573 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7574 IntConst);
7575 }
7576
7577 Value *getLocalVarDescription(AllocaInst &I) {
7578 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
7579 }
7580
7581 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7582 if (PoisonStack && ClPoisonStackWithCall) {
7583 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
7584 } else {
7585 Value *ShadowBase, *OriginBase;
7586 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
7587 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
7588
7589 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
7590 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
7591 }
7592
7593 if (PoisonStack && MS.TrackOrigins) {
7594 Value *Idptr = getLocalVarIdptr(I);
7595 if (ClPrintStackNames) {
7596 Value *Descr = getLocalVarDescription(I);
7597 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
7598 {&I, Len, Idptr, Descr});
7599 } else {
7600 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
7601 }
7602 }
7603 }
7604
7605 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7606 Value *Descr = getLocalVarDescription(I);
7607 if (PoisonStack) {
7608 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
7609 } else {
7610 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
7611 }
7612 }
7613
7614 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7615 if (!InsPoint)
7616 InsPoint = &I;
7617 NextNodeIRBuilder IRB(InsPoint);
7618 Value *Len = IRB.CreateAllocationSize(MS.IntptrTy, &I);
7619
7620 if (MS.CompileKernel)
7621 poisonAllocaKmsan(I, IRB, Len);
7622 else
7623 poisonAllocaUserspace(I, IRB, Len);
7624 }
7625
7626 void visitAllocaInst(AllocaInst &I) {
7627 setShadow(&I, getCleanShadow(&I));
7628 setOrigin(&I, getCleanOrigin());
7629 // We'll get to this alloca later unless it's poisoned at the corresponding
7630 // llvm.lifetime.start.
7631 AllocaSet.insert(&I);
7632 }
7633
7634 void visitSelectInst(SelectInst &I) {
7635 // a = select b, c, d
7636 Value *B = I.getCondition();
7637 Value *C = I.getTrueValue();
7638 Value *D = I.getFalseValue();
7639
7640 handleSelectLikeInst(I, B, C, D);
7641 }
7642
7643 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7644 IRBuilder<> IRB(&I);
7645
7646 Value *Sb = getShadow(B);
7647 Value *Sc = getShadow(C);
7648 Value *Sd = getShadow(D);
7649
7650 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
7651 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
7652 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
7653
7654 // Result shadow if condition shadow is 0.
7655 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
7656 Value *Sa1;
7657 if (I.getType()->isAggregateType()) {
7658 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7659 // an extra "select". This results in much more compact IR.
7660 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7661 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7662 } else if (isScalableNonVectorType(I.getType())) {
7663 // This is intended to handle target("aarch64.svcount"), which can't be
7664 // handled in the else branch because of incompatibility with CreateXor
7665 // ("The supported LLVM operations on this type are limited to load,
7666 // store, phi, select and alloca instructions").
7667
7668 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7669 // branch as needed instead.
7670 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7671 } else {
7672 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7673 // If Sb (condition is poisoned), look for bits in c and d that are equal
7674 // and both unpoisoned.
7675 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
7676
7677 // Cast arguments to shadow-compatible type.
7678 C = CreateAppToShadowCast(IRB, C);
7679 D = CreateAppToShadowCast(IRB, D);
7680
7681 // Result shadow if condition shadow is 1.
7682 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7683 }
7684 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7685 setShadow(&I, Sa);
7686 if (MS.TrackOrigins) {
7687 // Origins are always i32, so any vector conditions must be flattened.
7688 // FIXME: consider tracking vector origins for app vectors?
7689 if (B->getType()->isVectorTy()) {
7690 B = convertToBool(B, IRB);
7691 Sb = convertToBool(Sb, IRB);
7692 }
7693 // a = select b, c, d
7694 // Oa = Sb ? Ob : (b ? Oc : Od)
7695 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7696 }
7697 }
7698
7699 void visitLandingPadInst(LandingPadInst &I) {
7700 // Do nothing.
7701 // See https://github.com/google/sanitizers/issues/504
7702 setShadow(&I, getCleanShadow(&I));
7703 setOrigin(&I, getCleanOrigin());
7704 }
7705
7706 void visitCatchSwitchInst(CatchSwitchInst &I) {
7707 setShadow(&I, getCleanShadow(&I));
7708 setOrigin(&I, getCleanOrigin());
7709 }
7710
7711 void visitFuncletPadInst(FuncletPadInst &I) {
7712 setShadow(&I, getCleanShadow(&I));
7713 setOrigin(&I, getCleanOrigin());
7714 }
7715
7716 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7717
7718 void visitExtractValueInst(ExtractValueInst &I) {
7719 IRBuilder<> IRB(&I);
7720 Value *Agg = I.getAggregateOperand();
7721 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7722 Value *AggShadow = getShadow(Agg);
7723 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7724 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7725 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7726 setShadow(&I, ResShadow);
7727 setOriginForNaryOp(I);
7728 }
7729
7730 void visitInsertValueInst(InsertValueInst &I) {
7731 IRBuilder<> IRB(&I);
7732 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7733 Value *AggShadow = getShadow(I.getAggregateOperand());
7734 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7735 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7736 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7737 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7738 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7739 setShadow(&I, Res);
7740 setOriginForNaryOp(I);
7741 }
7742
7743 void dumpInst(Instruction &I, const Twine &Prefix) {
7744 // Instruction name only
7745 // For intrinsics, the full/overloaded name is used
7746 //
7747 // e.g., "call llvm.aarch64.neon.uqsub.v16i8"
7748 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7749 errs() << "ZZZ:" << Prefix << " call "
7750 << CI->getCalledFunction()->getName() << "\n";
7751 } else {
7752 errs() << "ZZZ:" << Prefix << " " << I.getOpcodeName() << "\n";
7753 }
7754
7755 // Instruction prototype (including return type and parameter types)
7756 // For intrinsics, we use the base/non-overloaded name
7757 //
7758 // e.g., "call <16 x i8> @llvm.aarch64.neon.uqsub(<16 x i8>, <16 x i8>)"
7759 unsigned NumOperands = I.getNumOperands();
7760 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7761 errs() << "YYY:" << Prefix << " call " << *I.getType() << " @";
7762
7763 if (IntrinsicInst *II = dyn_cast<IntrinsicInst>(CI))
7764 errs() << Intrinsic::getBaseName(II->getIntrinsicID());
7765 else
7766 errs() << CI->getCalledFunction()->getName();
7767
7768 errs() << "(";
7769
7770 // The last operand of a CallInst is the function itself.
7771 NumOperands--;
7772 } else
7773 errs() << "YYY:" << Prefix << " " << *I.getType() << " "
7774 << I.getOpcodeName() << "(";
7775
7776 for (size_t i = 0; i < NumOperands; i++) {
7777 if (i > 0)
7778 errs() << ", ";
7779
7780 errs() << *(I.getOperand(i)->getType());
7781 }
7782
7783 errs() << ")\n";
7784
7785 // Full instruction, including types and operand values
7786 // For intrinsics, the full/overloaded name is used
7787 //
7788 // e.g., "%vqsubq_v.i15 = call noundef <16 x i8>
7789 // @llvm.aarch64.neon.uqsub.v16i8(<16 x i8> %vext21.i,
7790 // <16 x i8> splat (i8 1)), !dbg !66"
7791 errs() << "QQQ:" << Prefix << " " << I << "\n";
7792 }
7793
7794 void visitResumeInst(ResumeInst &I) {
7795 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7796 // Nothing to do here.
7797 }
7798
7799 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7800 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7801 // Nothing to do here.
7802 }
7803
7804 void visitCatchReturnInst(CatchReturnInst &CRI) {
7805 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7806 // Nothing to do here.
7807 }
7808
7809 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7810 IRBuilder<> &IRB, const DataLayout &DL,
7811 bool isOutput) {
7812 // For each assembly argument, we check its value for being initialized.
7813 // If the argument is a pointer, we assume it points to a single element
7814 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7815 // Each such pointer is instrumented with a call to the runtime library.
7816 Type *OpType = Operand->getType();
7817 // Check the operand value itself.
7818 insertCheckShadowOf(Operand, &I);
7819 if (!OpType->isPointerTy() || !isOutput) {
7820 assert(!isOutput);
7821 return;
7822 }
7823 if (!ElemTy->isSized())
7824 return;
7825 auto Size = DL.getTypeStoreSize(ElemTy);
7826 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7827 if (MS.CompileKernel) {
7828 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7829 } else {
7830 // ElemTy, derived from elementtype(), does not encode the alignment of
7831 // the pointer. Conservatively assume that the shadow memory is unaligned.
7832 // When Size is large, avoid StoreInst as it would expand to many
7833 // instructions.
7834 auto [ShadowPtr, _] =
7835 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7836 if (Size <= 32)
7837 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7838 else
7839 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7840 SizeVal, Align(1));
7841 }
7842 }
7843
7844 /// Get the number of output arguments returned by pointers.
7845 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7846 int NumRetOutputs = 0;
7847 int NumOutputs = 0;
7848 Type *RetTy = cast<Value>(CB)->getType();
7849 if (!RetTy->isVoidTy()) {
7850 // Register outputs are returned via the CallInst return value.
7851 auto *ST = dyn_cast<StructType>(RetTy);
7852 if (ST)
7853 NumRetOutputs = ST->getNumElements();
7854 else
7855 NumRetOutputs = 1;
7856 }
7857 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7858 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7859 switch (Info.Type) {
7860 case InlineAsm::isOutput:
7861 NumOutputs++;
7862 break;
7863 default:
7864 break;
7865 }
7866 }
7867 return NumOutputs - NumRetOutputs;
7868 }
7869
7870 void visitAsmInstruction(Instruction &I) {
7871 // Conservative inline assembly handling: check for poisoned shadow of
7872 // asm() arguments, then unpoison the result and all the memory locations
7873 // pointed to by those arguments.
7874 // An inline asm() statement in C++ contains lists of input and output
7875 // arguments used by the assembly code. These are mapped to operands of the
7876 // CallInst as follows:
7877 // - nR register outputs ("=r") are returned by value in a single structure
7878 // (SSA value of the CallInst);
7879 // - nO other outputs ("=m" and others) are returned by pointer as first
7880 // nO operands of the CallInst;
7881 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7882 // remaining nI operands.
7883 // The total number of asm() arguments in the source is nR+nO+nI, and the
7884 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7885 // function to be called).
7886 const DataLayout &DL = F.getDataLayout();
7887 CallBase *CB = cast<CallBase>(&I);
7888 IRBuilder<> IRB(&I);
7889 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7890 int OutputArgs = getNumOutputArgs(IA, CB);
7891 // The last operand of a CallInst is the function itself.
7892 int NumOperands = CB->getNumOperands() - 1;
7893
7894 // Check input arguments. Doing so before unpoisoning output arguments, so
7895 // that we won't overwrite uninit values before checking them.
7896 for (int i = OutputArgs; i < NumOperands; i++) {
7897 Value *Operand = CB->getOperand(i);
7898 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7899 /*isOutput*/ false);
7900 }
7901 // Unpoison output arguments. This must happen before the actual InlineAsm
7902 // call, so that the shadow for memory published in the asm() statement
7903 // remains valid.
7904 for (int i = 0; i < OutputArgs; i++) {
7905 Value *Operand = CB->getOperand(i);
7906 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7907 /*isOutput*/ true);
7908 }
7909
7910 setShadow(&I, getCleanShadow(&I));
7911 setOrigin(&I, getCleanOrigin());
7912 }
7913
7914 void visitFreezeInst(FreezeInst &I) {
7915 // Freeze always returns a fully defined value.
7916 setShadow(&I, getCleanShadow(&I));
7917 setOrigin(&I, getCleanOrigin());
7918 }
7919
7920 void visitInstruction(Instruction &I) {
7921 // Everything else: stop propagating and check for poisoned shadow.
7922 if (ClDumpStrictInstructions)
7923 dumpInst(I, "Strict");
7924 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7925 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7926 Value *Operand = I.getOperand(i);
7927 if (Operand->getType()->isSized())
7928 insertCheckShadowOf(Operand, &I);
7929 }
7930 setShadow(&I, getCleanShadow(&I));
7931 setOrigin(&I, getCleanOrigin());
7932 }
7933};
7934
7935struct VarArgHelperBase : public VarArgHelper {
7936 Function &F;
7937 MemorySanitizer &MS;
7938 MemorySanitizerVisitor &MSV;
7939 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7940 const unsigned VAListTagSize;
7941
7942 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7943 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7944 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7945
7946 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7947 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7948 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7949 }
7950
7951 /// Compute the shadow address for a given va_arg.
7952 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7953 return IRB.CreatePtrAdd(
7954 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7955 }
7956
7957 /// Compute the shadow address for a given va_arg.
7958 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7959 unsigned ArgSize) {
7960 // Make sure we don't overflow __msan_va_arg_tls.
7961 if (ArgOffset + ArgSize > kParamTLSSize)
7962 return nullptr;
7963 return getShadowPtrForVAArgument(IRB, ArgOffset);
7964 }
7965
7966 /// Compute the origin address for a given va_arg.
7967 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7968 // getOriginPtrForVAArgument() is always called after
7969 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7970 // overflow.
7971 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7972 ConstantInt::get(MS.IntptrTy, ArgOffset),
7973 "_msarg_va_o");
7974 }
7975
7976 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7977 unsigned BaseOffset) {
7978 // The tail of __msan_va_arg_tls is not large enough to fit the full
7979 // value shadow, but it will be copied to the backup anyway. Make it
7980 // clean.
7981 if (BaseOffset >= kParamTLSSize)
7982 return;
7983 Value *TailSize =
7984 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7985 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7986 TailSize, Align(8));
7987 }
7988
7989 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7990 IRBuilder<> IRB(&I);
7991 Value *VAListTag = I.getArgOperand(0);
7992 const Align Alignment = Align(8);
7993 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7994 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7995 // Unpoison the whole __va_list_tag.
7996 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7997 VAListTagSize, Alignment, false);
7998 }
7999
8000 void visitVAStartInst(VAStartInst &I) override {
8001 if (F.getCallingConv() == CallingConv::Win64)
8002 return;
8003 VAStartInstrumentationList.push_back(&I);
8004 unpoisonVAListTagForInst(I);
8005 }
8006
8007 void visitVACopyInst(VACopyInst &I) override {
8008 if (F.getCallingConv() == CallingConv::Win64)
8009 return;
8010 unpoisonVAListTagForInst(I);
8011 }
8012};
8013
8014/// AMD64-specific implementation of VarArgHelper.
8015struct VarArgAMD64Helper : public VarArgHelperBase {
8016 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
8017 // See a comment in visitCallBase for more details.
8018 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
8019 static const unsigned AMD64FpEndOffsetSSE = 176;
8020 // If SSE is disabled, fp_offset in va_list is zero.
8021 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
8022
8023 unsigned AMD64FpEndOffset;
8024 AllocaInst *VAArgTLSCopy = nullptr;
8025 AllocaInst *VAArgTLSOriginCopy = nullptr;
8026 Value *VAArgOverflowSize = nullptr;
8027
8028 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
8029
8030 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
8031 MemorySanitizerVisitor &MSV)
8032 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
8033 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
8034 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
8035 if (Attr.isStringAttribute() &&
8036 (Attr.getKindAsString() == "target-features")) {
8037 if (Attr.getValueAsString().contains("-sse"))
8038 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
8039 break;
8040 }
8041 }
8042 }
8043
8044 ArgKind classifyArgument(Value *arg) {
8045 // A very rough approximation of X86_64 argument classification rules.
8046 Type *T = arg->getType();
8047 if (T->isX86_FP80Ty())
8048 return AK_Memory;
8049 if (T->isFPOrFPVectorTy())
8050 return AK_FloatingPoint;
8051 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
8052 return AK_GeneralPurpose;
8053 if (T->isPointerTy())
8054 return AK_GeneralPurpose;
8055 return AK_Memory;
8056 }
8057
8058 // For VarArg functions, store the argument shadow in an ABI-specific format
8059 // that corresponds to va_list layout.
8060 // We do this because Clang lowers va_arg in the frontend, and this pass
8061 // only sees the low level code that deals with va_list internals.
8062 // A much easier alternative (provided that Clang emits va_arg instructions)
8063 // would have been to associate each live instance of va_list with a copy of
8064 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
8065 // order.
8066 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8067 unsigned GpOffset = 0;
8068 unsigned FpOffset = AMD64GpEndOffset;
8069 unsigned OverflowOffset = AMD64FpEndOffset;
8070 const DataLayout &DL = F.getDataLayout();
8071
8072 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8073 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8074 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8075 if (IsByVal) {
8076 // ByVal arguments always go to the overflow area.
8077 // Fixed arguments passed through the overflow area will be stepped
8078 // over by va_start, so don't count them towards the offset.
8079 if (IsFixed)
8080 continue;
8081 assert(A->getType()->isPointerTy());
8082 Type *RealTy = CB.getParamByValType(ArgNo);
8083 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8084 uint64_t AlignedSize = alignTo(ArgSize, 8);
8085 unsigned BaseOffset = OverflowOffset;
8086 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
8087 Value *OriginBase = nullptr;
8088 if (MS.TrackOrigins)
8089 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
8090 OverflowOffset += AlignedSize;
8091
8092 if (OverflowOffset > kParamTLSSize) {
8093 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
8094 continue; // We have no space to copy shadow there.
8095 }
8096
8097 Value *ShadowPtr, *OriginPtr;
8098 std::tie(ShadowPtr, OriginPtr) =
8099 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
8100 /*isStore*/ false);
8101 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
8102 kShadowTLSAlignment, ArgSize);
8103 if (MS.TrackOrigins)
8104 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
8105 kShadowTLSAlignment, ArgSize);
8106 } else {
8107 ArgKind AK = classifyArgument(A);
8108 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
8109 AK = AK_Memory;
8110 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
8111 AK = AK_Memory;
8112 Value *ShadowBase, *OriginBase = nullptr;
8113 switch (AK) {
8114 case AK_GeneralPurpose:
8115 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
8116 if (MS.TrackOrigins)
8117 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
8118 GpOffset += 8;
8119 assert(GpOffset <= kParamTLSSize);
8120 break;
8121 case AK_FloatingPoint:
8122 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
8123 if (MS.TrackOrigins)
8124 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8125 FpOffset += 16;
8126 assert(FpOffset <= kParamTLSSize);
8127 break;
8128 case AK_Memory:
8129 if (IsFixed)
8130 continue;
8131 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8132 uint64_t AlignedSize = alignTo(ArgSize, 8);
8133 unsigned BaseOffset = OverflowOffset;
8134 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
8135 if (MS.TrackOrigins) {
8136 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
8137 }
8138 OverflowOffset += AlignedSize;
8139 if (OverflowOffset > kParamTLSSize) {
8140 // We have no space to copy shadow there.
8141 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
8142 continue;
8143 }
8144 }
8145 // Take fixed arguments into account for GpOffset and FpOffset,
8146 // but don't actually store shadows for them.
8147 // TODO(glider): don't call get*PtrForVAArgument() for them.
8148 if (IsFixed)
8149 continue;
8150 Value *Shadow = MSV.getShadow(A);
8151 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
8152 if (MS.TrackOrigins) {
8153 Value *Origin = MSV.getOrigin(A);
8154 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8155 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8156 std::max(kShadowTLSAlignment, kMinOriginAlignment));
8157 }
8158 }
8159 }
8160 Constant *OverflowSize =
8161 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
8162 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8163 }
8164
8165 void finalizeInstrumentation() override {
8166 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8167 "finalizeInstrumentation called twice");
8168 if (!VAStartInstrumentationList.empty()) {
8169 // If there is a va_start in this function, make a backup copy of
8170 // va_arg_tls somewhere in the function entry block.
8171 IRBuilder<> IRB(MSV.FnPrologueEnd);
8172 VAArgOverflowSize =
8173 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8174 Value *CopySize = IRB.CreateAdd(
8175 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
8176 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8177 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8178 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8179 CopySize, kShadowTLSAlignment, false);
8180
8181 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8182 Intrinsic::umin, CopySize,
8183 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8184 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8185 kShadowTLSAlignment, SrcSize);
8186 if (MS.TrackOrigins) {
8187 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8188 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8189 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8190 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8191 }
8192 }
8193
8194 // Instrument va_start.
8195 // Copy va_list shadow from the backup copy of the TLS contents.
8196 for (CallInst *OrigInst : VAStartInstrumentationList) {
8197 NextNodeIRBuilder IRB(OrigInst);
8198 Value *VAListTag = OrigInst->getArgOperand(0);
8199
8200 Value *RegSaveAreaPtrPtr =
8201 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
8202 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8203 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8204 const Align Alignment = Align(16);
8205 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8206 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8207 Alignment, /*isStore*/ true);
8208 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8209 AMD64FpEndOffset);
8210 if (MS.TrackOrigins)
8211 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8212 Alignment, AMD64FpEndOffset);
8213 Value *OverflowArgAreaPtrPtr =
8214 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
8215 Value *OverflowArgAreaPtr =
8216 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8217 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8218 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8219 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8220 Alignment, /*isStore*/ true);
8221 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8222 AMD64FpEndOffset);
8223 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8224 VAArgOverflowSize);
8225 if (MS.TrackOrigins) {
8226 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8227 AMD64FpEndOffset);
8228 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8229 VAArgOverflowSize);
8230 }
8231 }
8232 }
8233};
8234
8235/// AArch64-specific implementation of VarArgHelper.
8236struct VarArgAArch64Helper : public VarArgHelperBase {
8237 static const unsigned kAArch64GrArgSize = 64;
8238 static const unsigned kAArch64VrArgSize = 128;
8239
8240 static const unsigned AArch64GrBegOffset = 0;
8241 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
8242 // Make VR space aligned to 16 bytes.
8243 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
8244 static const unsigned AArch64VrEndOffset =
8245 AArch64VrBegOffset + kAArch64VrArgSize;
8246 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
8247
8248 AllocaInst *VAArgTLSCopy = nullptr;
8249 Value *VAArgOverflowSize = nullptr;
8250
8251 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
8252
8253 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
8254 MemorySanitizerVisitor &MSV)
8255 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
8256
8257 // A very rough approximation of aarch64 argument classification rules.
8258 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
8259 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
8260 return {AK_GeneralPurpose, 1};
8261 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
8262 return {AK_FloatingPoint, 1};
8263
8264 if (T->isArrayTy()) {
8265 auto R = classifyArgument(T->getArrayElementType());
8266 R.second *= T->getScalarType()->getArrayNumElements();
8267 return R;
8268 }
8269
8270 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
8271 auto R = classifyArgument(FV->getScalarType());
8272 R.second *= FV->getNumElements();
8273 return R;
8274 }
8275
8276 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
8277 return {AK_Memory, 0};
8278 }
8279
8280 // The instrumentation stores the argument shadow in a non ABI-specific
8281 // format because it does not know which arguments are named (Clang, as
8282 // in the x86_64 case, lowers va_arg in the frontend, and this pass only
8283 // sees the low-level code that deals with va_list internals).
8284 // The shadow for the first eight GR registers is saved in the first 64
8285 // bytes of the va_arg TLS array, followed by the first eight FP/SIMD
8286 // registers, and then the remaining arguments.
8287 // Using constant offsets within the va_arg TLS array allows fast copying
8288 // in finalizeInstrumentation().
8289 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8290 unsigned GrOffset = AArch64GrBegOffset;
8291 unsigned VrOffset = AArch64VrBegOffset;
8292 unsigned OverflowOffset = AArch64VAEndOffset;
8293
8294 const DataLayout &DL = F.getDataLayout();
8295 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8296 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8297 auto [AK, RegNum] = classifyArgument(A->getType());
8298 if (AK == AK_GeneralPurpose &&
8299 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
8300 AK = AK_Memory;
8301 if (AK == AK_FloatingPoint &&
8302 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
8303 AK = AK_Memory;
8304 Value *Base;
8305 switch (AK) {
8306 case AK_GeneralPurpose:
8307 Base = getShadowPtrForVAArgument(IRB, GrOffset);
8308 GrOffset += 8 * RegNum;
8309 break;
8310 case AK_FloatingPoint:
8311 Base = getShadowPtrForVAArgument(IRB, VrOffset);
8312 VrOffset += 16 * RegNum;
8313 break;
8314 case AK_Memory:
8315 // Don't count fixed arguments in the overflow area - va_start will
8316 // skip right over them.
8317 if (IsFixed)
8318 continue;
8319 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8320 uint64_t AlignedSize = alignTo(ArgSize, 8);
8321 unsigned BaseOffset = OverflowOffset;
8322 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
8323 OverflowOffset += AlignedSize;
8324 if (OverflowOffset > kParamTLSSize) {
8325 // We have no space to copy shadow there.
8326 CleanUnusedTLS(IRB, Base, BaseOffset);
8327 continue;
8328 }
8329 break;
8330 }
8331 // Count Gp/Vr fixed arguments to their respective offsets, but don't
8332 // bother to actually store a shadow.
8333 if (IsFixed)
8334 continue;
8335 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8336 }
8337 Constant *OverflowSize =
8338 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
8339 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8340 }
8341
8342 // Retrieve a va_list field of 'void*' size.
8343 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8344 Value *SaveAreaPtrPtr =
8345 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8346 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
8347 }
8348
8349 // Retrieve a va_list field of 'int' size.
8350 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
8351 Value *SaveAreaPtr =
8352 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
8353 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
8354 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
8355 }
8356
8357 void finalizeInstrumentation() override {
8358 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8359 "finalizeInstrumentation called twice");
8360 if (!VAStartInstrumentationList.empty()) {
8361 // If there is a va_start in this function, make a backup copy of
8362 // va_arg_tls somewhere in the function entry block.
8363 IRBuilder<> IRB(MSV.FnPrologueEnd);
8364 VAArgOverflowSize =
8365 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8366 Value *CopySize = IRB.CreateAdd(
8367 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
8368 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8369 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8370 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8371 CopySize, kShadowTLSAlignment, false);
8372
8373 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8374 Intrinsic::umin, CopySize,
8375 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8376 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8377 kShadowTLSAlignment, SrcSize);
8378 }
8379
8380 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
8381 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
8382
8383 // Instrument va_start, copy va_list shadow from the backup copy of
8384 // the TLS contents.
8385 for (CallInst *OrigInst : VAStartInstrumentationList) {
8386 NextNodeIRBuilder IRB(OrigInst);
8387
8388 Value *VAListTag = OrigInst->getArgOperand(0);
8389
8390 // The variadic ABI for AArch64 creates two areas to save the incoming
8391 // argument registers (one for the 64-bit general registers x0-x7 and
8392 // another for the 128-bit FP/SIMD registers v0-v7).
8393 // We then need to propagate the shadow arguments to both regions,
8394 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
8395 // The remaining arguments are saved in the shadow for 'va::stack'.
8396 // One caveat: only the non-named arguments need to be propagated,
8397 // but the call-site instrumentation saves all of them. So to copy
8398 // the shadow values from the va_arg TLS array we need to adjust the
8399 // offset for both the GR and VR fields based on the __{gr,vr}_offs
8400 // values (since they are stored based on the incoming named
8401 // arguments).
8402 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
8403
8404 // Read the stack pointer from the va_list.
8405 Value *StackSaveAreaPtr =
8406 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
8407
8408 // Read both the __gr_top and __gr_off and add them up.
8409 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
8410 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
8411
8412 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
8413 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
8414
8415 // Read both the __vr_top and __vr_off and add them up.
8416 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
8417 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
8418
8419 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
8420 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
8421
8422 // We do not know how many named arguments are in use, and at the
8423 // call site all the arguments were saved. Since __gr_offs is defined
8424 // as '0 - ((8 - named_gr) * 8)', the idea is to propagate only the
8425 // variadic arguments by skipping the shadow bytes of named arguments.
8426 Value *GrRegSaveAreaShadowPtrOff =
8427 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
8428
8429 Value *GrRegSaveAreaShadowPtr =
8430 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8431 Align(8), /*isStore*/ true)
8432 .first;
8433
8434 Value *GrSrcPtr =
8435 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
8436 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
8437
8438 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
8439 GrCopySize);
8440
8441 // Again, but for FP/SIMD values.
8442 Value *VrRegSaveAreaShadowPtrOff =
8443 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
8444
8445 Value *VrRegSaveAreaShadowPtr =
8446 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8447 Align(8), /*isStore*/ true)
8448 .first;
8449
8450 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
8451 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
8452 IRB.getInt32(AArch64VrBegOffset)),
8453 VrRegSaveAreaShadowPtrOff);
8454 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
8455
8456 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
8457 VrCopySize);
8458
8459 // And finally for remaining arguments.
8460 Value *StackSaveAreaShadowPtr =
8461 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
8462 Align(16), /*isStore*/ true)
8463 .first;
8464
8465 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
8466 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
8467
8468 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
8469 Align(16), VAArgOverflowSize);
8470 }
8471 }
8472};
8473
8474/// PowerPC64-specific implementation of VarArgHelper.
8475struct VarArgPowerPC64Helper : public VarArgHelperBase {
8476 AllocaInst *VAArgTLSCopy = nullptr;
8477 Value *VAArgSize = nullptr;
8478
8479 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
8480 MemorySanitizerVisitor &MSV)
8481 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
8482
8483 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8484 // For PowerPC, we need to deal with the alignment of stack arguments -
8485 // they are mostly aligned to 8 bytes, but vectors and i128 arrays
8486 // are aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
8487 // For that reason, we compute the current offset from the stack pointer
8488 // (which is always properly aligned) and the offset of the first vararg,
8489 // then subtract them.
8490 unsigned VAArgBase;
8491 Triple TargetTriple(F.getParent()->getTargetTriple());
8492 // Parameter save area starts at 48 bytes from frame pointer for ABIv1,
8493 // and 32 bytes for ABIv2. This is usually determined by target
8494 // endianness, but in theory could be overridden by function attribute.
8495 if (TargetTriple.isPPC64ELFv2ABI())
8496 VAArgBase = 32;
8497 else
8498 VAArgBase = 48;
8499 unsigned VAArgOffset = VAArgBase;
8500 const DataLayout &DL = F.getDataLayout();
8501 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8502 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8503 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8504 if (IsByVal) {
8505 assert(A->getType()->isPointerTy());
8506 Type *RealTy = CB.getParamByValType(ArgNo);
8507 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8508 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
8509 if (ArgAlign < 8)
8510 ArgAlign = Align(8);
8511 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8512 if (!IsFixed) {
8513 Value *Base =
8514 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8515 if (Base) {
8516 Value *AShadowPtr, *AOriginPtr;
8517 std::tie(AShadowPtr, AOriginPtr) =
8518 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8519 kShadowTLSAlignment, /*isStore*/ false);
8520
8521 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8522 kShadowTLSAlignment, ArgSize);
8523 }
8524 }
8525 VAArgOffset += alignTo(ArgSize, Align(8));
8526 } else {
8527 Value *Base;
8528 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8529 Align ArgAlign = Align(8);
8530 if (A->getType()->isArrayTy()) {
8531 // Arrays are aligned to element size, except for long double
8532 // arrays, which are aligned to 8 bytes.
8533 Type *ElementTy = A->getType()->getArrayElementType();
8534 if (!ElementTy->isPPC_FP128Ty())
8535 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8536 } else if (A->getType()->isVectorTy()) {
8537 // Vectors are naturally aligned.
8538 ArgAlign = Align(ArgSize);
8539 }
8540 if (ArgAlign < 8)
8541 ArgAlign = Align(8);
8542 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8543 if (DL.isBigEndian()) {
8544 // Adjust the shadow for arguments with size < 8 to match the
8545 // placement of bits on big-endian systems.
8546 if (ArgSize < 8)
8547 VAArgOffset += (8 - ArgSize);
8548 }
8549 if (!IsFixed) {
8550 Base =
8551 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8552 if (Base)
8553 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8554 }
8555 VAArgOffset += ArgSize;
8556 VAArgOffset = alignTo(VAArgOffset, Align(8));
8557 }
8558 if (IsFixed)
8559 VAArgBase = VAArgOffset;
8560 }
8561
8562 Constant *TotalVAArgSize =
8563 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8564 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS here to avoid creating
8565 // a new class member; it holds the total size of all VarArgs.
8566 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8567 }
8568
8569 void finalizeInstrumentation() override {
8570 assert(!VAArgSize && !VAArgTLSCopy &&
8571 "finalizeInstrumentation called twice");
8572 IRBuilder<> IRB(MSV.FnPrologueEnd);
8573 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8574 Value *CopySize = VAArgSize;
8575
8576 if (!VAStartInstrumentationList.empty()) {
8577 // If there is a va_start in this function, make a backup copy of
8578 // va_arg_tls somewhere in the function entry block.
8579
8580 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8581 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8582 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8583 CopySize, kShadowTLSAlignment, false);
8584
8585 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8586 Intrinsic::umin, CopySize,
8587 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
8588 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8589 kShadowTLSAlignment, SrcSize);
8590 }
8591
8592 // Instrument va_start.
8593 // Copy va_list shadow from the backup copy of the TLS contents.
8594 for (CallInst *OrigInst : VAStartInstrumentationList) {
8595 NextNodeIRBuilder IRB(OrigInst);
8596 Value *VAListTag = OrigInst->getArgOperand(0);
8597 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8598
8599 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8600
8601 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8602 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8603 const DataLayout &DL = F.getDataLayout();
8604 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8605 const Align Alignment = Align(IntptrSize);
8606 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8607 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8608 Alignment, /*isStore*/ true);
8609 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8610 CopySize);
8611 }
8612 }
8613};
8614
8615/// PowerPC32-specific implementation of VarArgHelper.
8616struct VarArgPowerPC32Helper : public VarArgHelperBase {
8617 AllocaInst *VAArgTLSCopy = nullptr;
8618 Value *VAArgSize = nullptr;
8619
8620 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8621 MemorySanitizerVisitor &MSV)
8622 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8623
8624 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8625 unsigned VAArgBase;
8626 // Parameter save area is 8 bytes from frame pointer in PPC32
8627 VAArgBase = 8;
8628 unsigned VAArgOffset = VAArgBase;
8629 const DataLayout &DL = F.getDataLayout();
8630 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8631 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8632 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8633 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8634 if (IsByVal) {
8635 assert(A->getType()->isPointerTy());
8636 Type *RealTy = CB.getParamByValType(ArgNo);
8637 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8638 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8639 if (ArgAlign < IntptrSize)
8640 ArgAlign = Align(IntptrSize);
8641 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8642 if (!IsFixed) {
8643 Value *Base =
8644 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8645 if (Base) {
8646 Value *AShadowPtr, *AOriginPtr;
8647 std::tie(AShadowPtr, AOriginPtr) =
8648 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8649 kShadowTLSAlignment, /*isStore*/ false);
8650
8651 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8652 kShadowTLSAlignment, ArgSize);
8653 }
8654 }
8655 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8656 } else {
8657 Value *Base;
8658 Type *ArgTy = A->getType();
8659
8660 // On PPC32, floating-point variable arguments are stored in a separate
8661 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
8662 // them as they will be found when checking call arguments.
8663 if (!ArgTy->isFloatingPointTy()) {
8664 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
8665 Align ArgAlign = Align(IntptrSize);
8666 if (ArgTy->isArrayTy()) {
8667 // Arrays are aligned to element size, except for long double
8668 // arrays, which are aligned to 8 bytes.
8669 Type *ElementTy = ArgTy->getArrayElementType();
8670 if (!ElementTy->isPPC_FP128Ty())
8671 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8672 } else if (ArgTy->isVectorTy()) {
8673 // Vectors are naturally aligned.
8674 ArgAlign = Align(ArgSize);
8675 }
8676 if (ArgAlign < IntptrSize)
8677 ArgAlign = Align(IntptrSize);
8678 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8679 if (DL.isBigEndian()) {
8680 // Adjust the shadow for arguments with size < IntptrSize to match
8681 // the placement of bits on big-endian systems.
8682 if (ArgSize < IntptrSize)
8683 VAArgOffset += (IntptrSize - ArgSize);
8684 }
8685 if (!IsFixed) {
8686 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
8687 ArgSize);
8688 if (Base)
8689 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
8690 kShadowTLSAlignment);
8691 }
8692 VAArgOffset += ArgSize;
8693 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8694 }
8695 }
8696 }
8697
8698 Constant *TotalVAArgSize =
8699 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8700 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS here to avoid creating
8701 // a new class member; it holds the total size of all VarArgs.
8702 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8703 }
8704
8705 void finalizeInstrumentation() override {
8706 assert(!VAArgSize && !VAArgTLSCopy &&
8707 "finalizeInstrumentation called twice");
8708 IRBuilder<> IRB(MSV.FnPrologueEnd);
8709 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8710 Value *CopySize = VAArgSize;
8711
8712 if (!VAStartInstrumentationList.empty()) {
8713 // If there is a va_start in this function, make a backup copy of
8714 // va_arg_tls somewhere in the function entry block.
8715
8716 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8717 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8718 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8719 CopySize, kShadowTLSAlignment, false);
8720
8721 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8722 Intrinsic::umin, CopySize,
8723 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8724 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8725 kShadowTLSAlignment, SrcSize);
8726 }
8727
8728 // Instrument va_start.
8729 // Copy va_list shadow from the backup copy of the TLS contents.
8730 for (CallInst *OrigInst : VAStartInstrumentationList) {
8731 NextNodeIRBuilder IRB(OrigInst);
8732 Value *VAListTag = OrigInst->getArgOperand(0);
8733 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8734 Value *RegSaveAreaSize = CopySize;
8735
8736 // In PPC32 va_list_tag is a struct
8737 RegSaveAreaPtrPtr =
8738 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8739
8740 // On PPC32, the reg_save_area can only hold 32 bytes of data.
8741 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8742 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8743
8744 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8745 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8746
8747 const DataLayout &DL = F.getDataLayout();
8748 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8749 const Align Alignment = Align(IntptrSize);
8750
8751 { // Copy reg save area
8752 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8753 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8754 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8755 Alignment, /*isStore*/ true);
8756 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8757 Alignment, RegSaveAreaSize);
8758
8759 RegSaveAreaShadowPtr =
8760 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8761 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8762 ConstantInt::get(MS.IntptrTy, 32));
8763 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8764 // We fill the FP shadow with zeroes, as uninitialized FP args should
8765 // have been caught during the call-base check.
8766 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8767 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8768 }
8769
8770 { // Copy overflow area
8771 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8772 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8773
8774 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8775 OverflowAreaPtrPtr =
8776 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8777 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8778
8779 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8780
8781 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8782 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8783 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8784 Alignment, /*isStore*/ true);
8785
8786 Value *OverflowVAArgTLSCopyPtr =
8787 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8788 OverflowVAArgTLSCopyPtr =
8789 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8790
8791 OverflowVAArgTLSCopyPtr =
8792 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8793 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8794 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8795 }
8796 }
8797 }
8798};
8799
8800/// SystemZ-specific implementation of VarArgHelper.
8801struct VarArgSystemZHelper : public VarArgHelperBase {
8802 static const unsigned SystemZGpOffset = 16;
8803 static const unsigned SystemZGpEndOffset = 56;
8804 static const unsigned SystemZFpOffset = 128;
8805 static const unsigned SystemZFpEndOffset = 160;
8806 static const unsigned SystemZMaxVrArgs = 8;
8807 static const unsigned SystemZRegSaveAreaSize = 160;
8808 static const unsigned SystemZOverflowOffset = 160;
8809 static const unsigned SystemZVAListTagSize = 32;
8810 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8811 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
8812
8813 bool IsSoftFloatABI;
8814 AllocaInst *VAArgTLSCopy = nullptr;
8815 AllocaInst *VAArgTLSOriginCopy = nullptr;
8816 Value *VAArgOverflowSize = nullptr;
8817
8818 enum class ArgKind {
8819 GeneralPurpose,
8820 FloatingPoint,
8821 Vector,
8822 Memory,
8823 Indirect,
8824 };
8825
8826 enum class ShadowExtension { None, Zero, Sign };
8827
8828 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8829 MemorySanitizerVisitor &MSV)
8830 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8831 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8832
8833 ArgKind classifyArgument(Type *T) {
8834 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8835 // only a few possibilities of what it can be. In particular, enums, single
8836 // element structs and large types have already been taken care of.
8837
8838 // Some i128 and fp128 arguments are converted to pointers only in the
8839 // back end.
8840 if (T->isIntegerTy(128) || T->isFP128Ty())
8841 return ArgKind::Indirect;
8842 if (T->isFloatingPointTy())
8843 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8844 if (T->isIntegerTy() || T->isPointerTy())
8845 return ArgKind::GeneralPurpose;
8846 if (T->isVectorTy())
8847 return ArgKind::Vector;
8848 return ArgKind::Memory;
8849 }
8850
8851 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8852 // ABI says: "One of the simple integer types no more than 64 bits wide.
8853 // ... If such an argument is shorter than 64 bits, replace it by a full
8854 // 64-bit integer representing the same number, using sign or zero
8855 // extension". Shadow for an integer argument has the same type as the
8856 // argument itself, so it can be sign or zero extended as well.
8857 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8858 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8859 if (ZExt) {
8860 assert(!SExt);
8861 return ShadowExtension::Zero;
8862 }
8863 if (SExt) {
8864 assert(!ZExt);
8865 return ShadowExtension::Sign;
8866 }
8867 return ShadowExtension::None;
8868 }
8869
8870 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8871 unsigned GpOffset = SystemZGpOffset;
8872 unsigned FpOffset = SystemZFpOffset;
8873 unsigned VrIndex = 0;
8874 unsigned OverflowOffset = SystemZOverflowOffset;
8875 const DataLayout &DL = F.getDataLayout();
8876 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8877 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8878 // SystemZABIInfo does not produce ByVal parameters.
8879 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8880 Type *T = A->getType();
8881 ArgKind AK = classifyArgument(T);
8882 if (AK == ArgKind::Indirect) {
8883 T = MS.PtrTy;
8884 AK = ArgKind::GeneralPurpose;
8885 }
8886 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8887 AK = ArgKind::Memory;
8888 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8889 AK = ArgKind::Memory;
8890 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8891 AK = ArgKind::Memory;
8892 Value *ShadowBase = nullptr;
8893 Value *OriginBase = nullptr;
8894 ShadowExtension SE = ShadowExtension::None;
8895 switch (AK) {
8896 case ArgKind::GeneralPurpose: {
8897 // Always keep track of GpOffset, but store shadow only for varargs.
8898 uint64_t ArgSize = 8;
8899 if (GpOffset + ArgSize <= kParamTLSSize) {
8900 if (!IsFixed) {
8901 SE = getShadowExtension(CB, ArgNo);
8902 uint64_t GapSize = 0;
8903 if (SE == ShadowExtension::None) {
8904 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8905 assert(ArgAllocSize <= ArgSize);
8906 GapSize = ArgSize - ArgAllocSize;
8907 }
8908 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8909 if (MS.TrackOrigins)
8910 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8911 }
8912 GpOffset += ArgSize;
8913 } else {
8914 GpOffset = kParamTLSSize;
8915 }
8916 break;
8917 }
8918 case ArgKind::FloatingPoint: {
8919 // Always keep track of FpOffset, but store shadow only for varargs.
8920 uint64_t ArgSize = 8;
8921 if (FpOffset + ArgSize <= kParamTLSSize) {
8922 if (!IsFixed) {
8923 // PoP says: "A short floating-point datum requires only the
8924 // left-most 32 bit positions of a floating-point register".
8925 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8926 // don't extend shadow and don't mind the gap.
8927 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8928 if (MS.TrackOrigins)
8929 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8930 }
8931 FpOffset += ArgSize;
8932 } else {
8933 FpOffset = kParamTLSSize;
8934 }
8935 break;
8936 }
8937 case ArgKind::Vector: {
8938 // Keep track of VrIndex. No need to store shadow, since vector varargs
8939 // go through AK_Memory.
8940 assert(IsFixed);
8941 VrIndex++;
8942 break;
8943 }
8944 case ArgKind::Memory: {
8945 // Keep track of OverflowOffset and store shadow only for varargs.
8946 // Ignore fixed args, since we need to copy only the vararg portion of
8947 // the overflow area shadow.
8948 if (!IsFixed) {
8949 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8950 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8951 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8952 SE = getShadowExtension(CB, ArgNo);
8953 uint64_t GapSize =
8954 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8955 ShadowBase =
8956 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8957 if (MS.TrackOrigins)
8958 OriginBase =
8959 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8960 OverflowOffset += ArgSize;
8961 } else {
8962 OverflowOffset = kParamTLSSize;
8963 }
8964 }
8965 break;
8966 }
8967 case ArgKind::Indirect:
8968 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8969 }
8970 if (ShadowBase == nullptr)
8971 continue;
8972 Value *Shadow = MSV.getShadow(A);
8973 if (SE != ShadowExtension::None)
8974 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8975 /*Signed*/ SE == ShadowExtension::Sign);
8976 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8977 IRB.CreateStore(Shadow, ShadowBase);
8978 if (MS.TrackOrigins) {
8979 Value *Origin = MSV.getOrigin(A);
8980 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8981 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8982 kMinOriginAlignment);
8983 }
8984 }
8985 Constant *OverflowSize = ConstantInt::get(
8986 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8987 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8988 }
8989
8990 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8991 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8992 IRB.CreateAdd(
8993 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8994 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8995 MS.PtrTy);
8996 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8997 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8998 const Align Alignment = Align(8);
8999 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9000 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
9001 /*isStore*/ true);
9002 // TODO(iii): copy only fragments filled by visitCallBase()
9003 // TODO(iii): support packed-stack && !use-soft-float
9004 // For use-soft-float functions, it is enough to copy just the GPRs.
9005 unsigned RegSaveAreaSize =
9006 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
9007 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9008 RegSaveAreaSize);
9009 if (MS.TrackOrigins)
9010 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
9011 Alignment, RegSaveAreaSize);
9012 }
9013
9014 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
9015 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
9016 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
9017 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
9018 IRB.CreateAdd(
9019 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9020 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
9021 MS.PtrTy);
9022 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
9023 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
9024 const Align Alignment = Align(8);
9025 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
9026 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
9027 Alignment, /*isStore*/ true);
9028 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
9029 SystemZOverflowOffset);
9030 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
9031 VAArgOverflowSize);
9032 if (MS.TrackOrigins) {
9033 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
9034 SystemZOverflowOffset);
9035 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
9036 VAArgOverflowSize);
9037 }
9038 }
9039
9040 void finalizeInstrumentation() override {
9041 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
9042 "finalizeInstrumentation called twice");
9043 if (!VAStartInstrumentationList.empty()) {
9044 // If there is a va_start in this function, make a backup copy of
9045 // va_arg_tls somewhere in the function entry block.
9046 IRBuilder<> IRB(MSV.FnPrologueEnd);
9047 VAArgOverflowSize =
9048 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
9049 Value *CopySize =
9050 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
9051 VAArgOverflowSize);
9052 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9053 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9054 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9055 CopySize, kShadowTLSAlignment, false);
9056
9057 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9058 Intrinsic::umin, CopySize,
9059 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9060 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9061 kShadowTLSAlignment, SrcSize);
9062 if (MS.TrackOrigins) {
9063 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9064 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
9065 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
9066 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
9067 }
9068 }
9069
9070 // Instrument va_start.
9071 // Copy va_list shadow from the backup copy of the TLS contents.
9072 for (CallInst *OrigInst : VAStartInstrumentationList) {
9073 NextNodeIRBuilder IRB(OrigInst);
9074 Value *VAListTag = OrigInst->getArgOperand(0);
9075 copyRegSaveArea(IRB, VAListTag);
9076 copyOverflowArea(IRB, VAListTag);
9077 }
9078 }
9079};
9080
9081/// i386-specific implementation of VarArgHelper.
9082struct VarArgI386Helper : public VarArgHelperBase {
9083 AllocaInst *VAArgTLSCopy = nullptr;
9084 Value *VAArgSize = nullptr;
9085
9086 VarArgI386Helper(Function &F, MemorySanitizer &MS,
9087 MemorySanitizerVisitor &MSV)
9088 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
9089
9090 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
9091 const DataLayout &DL = F.getDataLayout();
9092 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9093 unsigned VAArgOffset = 0;
9094 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
9095 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
9096 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
9097 if (IsByVal) {
9098 assert(A->getType()->isPointerTy());
9099 Type *RealTy = CB.getParamByValType(ArgNo);
9100 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
9101 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
9102 if (ArgAlign < IntptrSize)
9103 ArgAlign = Align(IntptrSize);
9104 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
9105 if (!IsFixed) {
9106 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9107 if (Base) {
9108 Value *AShadowPtr, *AOriginPtr;
9109 std::tie(AShadowPtr, AOriginPtr) =
9110 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
9111 kShadowTLSAlignment, /*isStore*/ false);
9112
9113 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
9114 kShadowTLSAlignment, ArgSize);
9115 }
9116 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
9117 }
9118 } else {
9119 Value *Base;
9120 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
9121 Align ArgAlign = Align(IntptrSize);
9122 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
9123 if (DL.isBigEndian()) {
9124 // Adjust the shadow for arguments with size < IntptrSize to match
9125 // the placement of bits on big-endian systems.
9126 if (ArgSize < IntptrSize)
9127 VAArgOffset += (IntptrSize - ArgSize);
9128 }
9129 if (!IsFixed) {
9130 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9131 if (Base)
9132 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
9133 VAArgOffset += ArgSize;
9134 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
9135 }
9136 }
9137 }
9138
9139 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
9140 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS here to avoid creating
9141 // a new class member; it holds the total size of all VarArgs.
9142 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
9143 }
9144
9145 void finalizeInstrumentation() override {
9146 assert(!VAArgSize && !VAArgTLSCopy &&
9147 "finalizeInstrumentation called twice");
9148 IRBuilder<> IRB(MSV.FnPrologueEnd);
9149 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
9150 Value *CopySize = VAArgSize;
9151
9152 if (!VAStartInstrumentationList.empty()) {
9153 // If there is a va_start in this function, make a backup copy of
9154 // va_arg_tls somewhere in the function entry block.
9155 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9156 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9157 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9158 CopySize, kShadowTLSAlignment, false);
9159
9160 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9161 Intrinsic::umin, CopySize,
9162 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9163 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9164 kShadowTLSAlignment, SrcSize);
9165 }
9166
9167 // Instrument va_start.
9168 // Copy va_list shadow from the backup copy of the TLS contents.
9169 for (CallInst *OrigInst : VAStartInstrumentationList) {
9170 NextNodeIRBuilder IRB(OrigInst);
9171 Value *VAListTag = OrigInst->getArgOperand(0);
9172 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
9173 Value *RegSaveAreaPtrPtr =
9174 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9175 PointerType::get(*MS.C, 0));
9176 Value *RegSaveAreaPtr =
9177 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
9178 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
9179 const DataLayout &DL = F.getDataLayout();
9180 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9181 const Align Alignment = Align(IntptrSize);
9182 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9183 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
9184 Alignment, /*isStore*/ true);
9185 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9186 CopySize);
9187 }
9188 }
9189};
9190
9191/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
9192/// LoongArch64.
9193struct VarArgGenericHelper : public VarArgHelperBase {
9194 AllocaInst *VAArgTLSCopy = nullptr;
9195 Value *VAArgSize = nullptr;
9196
9197 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
9198 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
9199 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
9200
9201 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
9202 unsigned VAArgOffset = 0;
9203 const DataLayout &DL = F.getDataLayout();
9204 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9205 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
9206 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
9207 if (IsFixed)
9208 continue;
9209 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
9210 if (DL.isBigEndian()) {
9211 // Adjust the shadow for arguments with size < IntptrSize to match the
9212 // placement of bits on big-endian systems.
9213 if (ArgSize < IntptrSize)
9214 VAArgOffset += (IntptrSize - ArgSize);
9215 }
9216 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
9217 VAArgOffset += ArgSize;
9218 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
9219 if (!Base)
9220 continue;
9221 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
9222 }
9223
9224 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
9225 // We reuse VAArgOverflowSizeTLS as VAArgSizeTLS here to avoid creating
9226 // a new class member; it holds the total size of all VarArgs.
9227 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
9228 }
9229
9230 void finalizeInstrumentation() override {
9231 assert(!VAArgSize && !VAArgTLSCopy &&
9232 "finalizeInstrumentation called twice");
9233 IRBuilder<> IRB(MSV.FnPrologueEnd);
9234 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
9235 Value *CopySize = VAArgSize;
9236
9237 if (!VAStartInstrumentationList.empty()) {
9238 // If there is a va_start in this function, make a backup copy of
9239 // va_arg_tls somewhere in the function entry block.
9240 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
9241 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
9242 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
9243 CopySize, kShadowTLSAlignment, false);
9244
9245 Value *SrcSize = IRB.CreateBinaryIntrinsic(
9246 Intrinsic::umin, CopySize,
9247 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
9248 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
9249 kShadowTLSAlignment, SrcSize);
9250 }
9251
9252 // Instrument va_start.
9253 // Copy va_list shadow from the backup copy of the TLS contents.
9254 for (CallInst *OrigInst : VAStartInstrumentationList) {
9255 NextNodeIRBuilder IRB(OrigInst);
9256 Value *VAListTag = OrigInst->getArgOperand(0);
9257 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
9258 Value *RegSaveAreaPtrPtr =
9259 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
9260 PointerType::get(*MS.C, 0));
9261 Value *RegSaveAreaPtr =
9262 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
9263 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
9264 const DataLayout &DL = F.getDataLayout();
9265 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
9266 const Align Alignment = Align(IntptrSize);
9267 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
9268 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
9269 Alignment, /*isStore*/ true);
9270 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
9271 CopySize);
9272 }
9273 }
9274};
9275
9276 // ARM32, LoongArch64, MIPS, RISCV, and Hexagon share the same calling
9277 // conventions regarding VAArgs.
9278using VarArgARM32Helper = VarArgGenericHelper;
9279using VarArgRISCVHelper = VarArgGenericHelper;
9280using VarArgMIPSHelper = VarArgGenericHelper;
9281using VarArgLoongArch64Helper = VarArgGenericHelper;
9282using VarArgHexagonHelper = VarArgGenericHelper;
9283
9284/// A no-op implementation of VarArgHelper.
9285struct VarArgNoOpHelper : public VarArgHelper {
9286 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
9287 MemorySanitizerVisitor &MSV) {}
9288
9289 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
9290
9291 void visitVAStartInst(VAStartInst &I) override {}
9292
9293 void visitVACopyInst(VACopyInst &I) override {}
9294
9295 void finalizeInstrumentation() override {}
9296};
9297
9298} // end anonymous namespace
9299
9300static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
9301 MemorySanitizerVisitor &Visitor) {
9302 // VarArg handling is implemented only for the targets handled below; on
9303 // other platforms the no-op helper is used and false positives are possible.
9304 Triple TargetTriple(Func.getParent()->getTargetTriple());
9305
9306 if (TargetTriple.getArch() == Triple::x86)
9307 return new VarArgI386Helper(Func, Msan, Visitor);
9308
9309 if (TargetTriple.getArch() == Triple::x86_64)
9310 return new VarArgAMD64Helper(Func, Msan, Visitor);
9311
9312 if (TargetTriple.isARM())
9313 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9314
9315 if (TargetTriple.isAArch64())
9316 return new VarArgAArch64Helper(Func, Msan, Visitor);
9317
9318 if (TargetTriple.isSystemZ())
9319 return new VarArgSystemZHelper(Func, Msan, Visitor);
9320
9321 // On PowerPC32 VAListTag is a struct
9322 // {char, char, i16 padding, char *, char *}
9323 if (TargetTriple.isPPC32())
9324 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
9325
9326 if (TargetTriple.isPPC64())
9327 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
9328
9329 if (TargetTriple.isRISCV32())
9330 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9331
9332 if (TargetTriple.isRISCV64())
9333 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9334
9335 if (TargetTriple.isMIPS32())
9336 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
9337
9338 if (TargetTriple.isMIPS64())
9339 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
9340
9341 if (TargetTriple.isLoongArch64())
9342 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
9343 /*VAListTagSize=*/8);
9344
9345 if (TargetTriple.getArch() == Triple::hexagon)
9346 return new VarArgHexagonHelper(Func, Msan, Visitor, /*VAListTagSize=*/12);
9347
9348 return new VarArgNoOpHelper(Func, Msan, Visitor);
9349}
9350
9351bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
9352 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
9353 return false;
9354
9355 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
9356 return false;
9357
9358 MemorySanitizerVisitor Visitor(F, *this, TLI);
9359
9360 // Clear out memory attributes.
9361 AttributeMask B;
9362 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
9363 F.removeFnAttrs(B);
9364
9365 return Visitor.runOnFunction();
9366}
Definition Triple.h:964
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1110
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:1055
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1156
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:46
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:290
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:281
LLVM_ABI bool isScalableTy(SmallPtrSetImpl< const Type * > &Visited) const
Return true if this is a type whose size is a known multiple of vscale.
Definition Type.cpp:65
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:263
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:284
Type * getArrayElementType() const
Definition Type.h:427
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:167
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:286
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:370
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:201
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:328
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:236
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:186
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:272
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:257
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:227
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:141
Value * getOperand(unsigned i) const
Definition User.h:207
unsigned getNumOperands() const
Definition User.h:229
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:397
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:200
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:168
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
CallInst * Call
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ BasicBlock
Various leaf nodes.
Definition ISDOpcodes.h:81
LLVM_ABI StringRef getBaseName(ID id)
Return the LLVM name for an intrinsic, without encoded types for overloading, such as "llvm....
initializer< Ty > init(const Ty &Val)
Function * Kernel
Summary of a kernel (=entry point for target offloading).
Definition OpenMPOpt.h:21
NodeAddr< FuncNode * > Func
Definition RDFGraph.h:393
friend class Instruction
Iterator for Instructions in a `BasicBlock`.
Definition BasicBlock.h:73
This is an optimization pass for GlobalISel generic memory operations.
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:344
@ Offset
Definition DWP.cpp:532
FunctionAddr VTableAddr Value
Definition InstrProf.h:137
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1669
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B,...
Definition STLExtras.h:2554
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ Done
Definition Threading.h:60
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0,...
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:284
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:337
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:163
constexpr uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
DWARFExpression::Operation Op
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the ...
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if...
Definition Local.cpp:3889
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that cannot be reached from the function's entry.
Definition Local.cpp:2901
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if module has flag attached, if not add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:70