MemorySanitizer.cpp
1//===- MemorySanitizer.cpp - detector of uninitialized reads --------------===//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9/// \file
10/// This file is a part of MemorySanitizer, a detector of uninitialized
11/// reads.
12///
13/// The algorithm of the tool is similar to Memcheck
14/// (https://static.usenix.org/event/usenix05/tech/general/full_papers/seward/seward_html/usenix2005.html)
15/// We associate a few shadow bits with every byte of the application memory,
16/// poison the shadow of the malloc-ed or alloca-ed memory, load the shadow
17/// bits on every memory read, propagate the shadow bits through some of the
18/// arithmetic instructions (including MOV), store the shadow bits on every
19/// memory write, report a bug on some other instructions (e.g. JMP) if the
20/// associated shadow is poisoned.
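///
/// For example (an illustrative sketch, not the exact generated code), the
/// C statement "c = a + b" is conceptually instrumented as
///   c = a + b;
///   shadow(c) = shadow(a) | shadow(b);   // propagate through arithmetic
/// and a branch "if (c)" is preceded by
///   if (shadow(c) != 0) __msan_warning();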
21///
22/// But there are differences too. The first and major one is
23/// compiler instrumentation instead of binary instrumentation. This
24/// gives us much better register allocation, possible compiler
25/// optimizations and a fast start-up. But this brings the major issue
26/// as well: msan needs to see all program events, including system
27/// calls and reads/writes in system libraries, so we either need to
28/// compile *everything* with msan or use a binary translation
29/// component (e.g. DynamoRIO) to instrument pre-built libraries.
30/// Another difference from Memcheck is that we use 8 shadow bits per
31/// byte of application memory and use a direct shadow mapping. This
32/// greatly simplifies the instrumentation code and avoids races on
33/// shadow updates (Memcheck is single-threaded so races are not a
34/// concern there. Memcheck uses 2 shadow bits per byte with a slow
35/// path storage that uses 8 bits per byte).
36///
37/// The default value of shadow is 0, which means "clean" (not poisoned).
38///
39/// Every module initializer should call __msan_init to ensure that the
40/// shadow memory is ready. On error, __msan_warning is called. Since
41/// parameters and return values may be passed via registers, we have a
42/// specialized thread-local shadow for return values
43/// (__msan_retval_tls) and parameters (__msan_param_tls).
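///
/// For example (sketch only; the real instrumentation indexes these TLS
/// buffers by argument offset), a call "y = f(x)" becomes roughly
///   __msan_param_tls[0] = shadow(x);
///   y = f(x);
///   shadow(y) = __msan_retval_tls[0];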
44///
45/// Origin tracking.
46///
47/// MemorySanitizer can track origins (allocation points) of all uninitialized
48/// values. This behavior is controlled with a flag (msan-track-origins) and is
49/// disabled by default.
50///
51/// Origins are 4-byte values created and interpreted by the runtime library.
52/// They are stored in a second shadow mapping, one 4-byte value for 4 bytes
53/// of application memory. Propagation of origins is basically a bunch of
54/// "select" instructions that pick the origin of a dirty argument, if an
55/// instruction has one.
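///
/// For example (sketch only), for "c = a + b" with origin tracking enabled,
/// the propagated origin is roughly
///   origin(c) = (shadow(b) != 0) ? origin(b) : origin(a);
/// i.e. the origin of a poisoned operand is picked, preferring the later one.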
56///
57/// Every 4 aligned, consecutive bytes of application memory have one origin
58/// value associated with them. If these bytes contain uninitialized data
59/// coming from 2 different allocations, the last store wins. Because of this,
60/// MemorySanitizer reports can show unrelated origins, but this is unlikely in
61/// practice.
62///
63/// Origins are meaningless for fully initialized values, so MemorySanitizer
64/// avoids storing origin to memory when a fully initialized value is stored.
65/// This way it avoids needlessly overwriting the origin of the 4-byte region
66/// on a short (i.e. 1-byte) clean store, and it is also good for performance.
67///
68/// Atomic handling.
69///
70/// Ideally, every atomic store of an application value should update the
71/// corresponding shadow location in an atomic way. Unfortunately, an atomic
72/// store to two disjoint locations cannot be done without severe slowdown.
73///
74/// Therefore, we implement an approximation that may err on the safe side.
75/// In this implementation, every atomically accessed location in the program
76/// may only change from (partially) uninitialized to fully initialized, but
77/// not the other way around. We load the shadow _after_ the application load,
78/// and we store the shadow _before_ the app store. Also, we always store clean
79/// shadow (if the application store is atomic). This way, if the store-load
80/// pair constitutes a happens-before arc, shadow store and load are correctly
81/// ordered such that the load will get either the value that was stored, or
82/// some later value (which is always clean).
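///
/// For example (sketch only), an atomic store/load pair is instrumented as
///   store_shadow(&v, 0);              // clean shadow, *before* the app store
///   atomic_store(&v, x, release);
///   ...
///   y = atomic_load(&v, acquire);
///   shadow(y) = load_shadow(&v);      // *after* the app load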
83///
84/// This does not work very well with Compare-And-Swap (CAS) and
85/// Read-Modify-Write (RMW) operations. To follow the above logic, CAS and RMW
86/// must store the new shadow before the app operation, and load the shadow
87/// after the app operation. Computers don't work this way. Current
88/// implementation ignores the load aspect of CAS/RMW, always returning a clean
89/// value. It implements the store part as a simple atomic store by storing a
90/// clean shadow.
91///
92/// Instrumenting inline assembly.
93///
94/// For inline assembly code LLVM has little idea about which memory locations
95/// become initialized depending on the arguments. It may be possible to figure
96/// out which arguments are meant to point to inputs and outputs, but the
97/// actual semantics may only be visible at runtime. In the Linux kernel it's
98/// also possible that the arguments only indicate the offset for a base taken
99/// from a segment register, so it's dangerous to treat any asm() arguments as
100/// pointers. We take a conservative approach and generate calls to
101/// __msan_instrument_asm_store(ptr, size),
102/// which defers the memory unpoisoning to the runtime library.
103/// The latter can perform more complex address checks to figure out whether
104/// it's safe to touch the shadow memory.
105/// Like with atomic operations, we call __msan_instrument_asm_store() before
106/// the assembly call, so that changes to the shadow memory will be seen by
107/// other threads together with main memory initialization.
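///
/// For example (sketch only), for an asm statement with an output operand
/// "=m"(*p) where p is an int*, the pass emits, before the asm statement:
///   __msan_instrument_asm_store(p, sizeof(int));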
108///
109/// KernelMemorySanitizer (KMSAN) implementation.
110///
111/// The major differences between KMSAN and MSan instrumentation are:
112/// - KMSAN always tracks the origins and implies msan-keep-going=true;
113/// - KMSAN allocates shadow and origin memory for each page separately, so
114/// there are no explicit accesses to shadow and origin in the
115/// instrumentation.
116/// Shadow and origin values for a particular X-byte memory location
117/// (X=1,2,4,8) are accessed through pointers obtained via the
118/// __msan_metadata_ptr_for_load_X(ptr)
119/// __msan_metadata_ptr_for_store_X(ptr)
120/// functions. The corresponding functions check that the X-byte accesses are
121/// possible and return the pointers to shadow and origin memory; a usage sketch follows this list.
122/// Arbitrary sized accesses are handled with:
123/// __msan_metadata_ptr_for_load_n(ptr, size)
124/// __msan_metadata_ptr_for_store_n(ptr, size);
125/// Note that the sanitizer code has to deal with how shadow/origin pairs
126/// returned by these functions are represented in different ABIs. In
127/// the X86_64 ABI they are returned in RDX:RAX, in PowerPC64 they are
128/// returned in r3 and r4, and in the SystemZ ABI they are written to memory
129/// pointed to by a hidden parameter.
130/// - TLS variables are stored in a single per-task struct. A call to a
131/// function __msan_get_context_state() returning a pointer to that struct
132/// is inserted into every instrumented function before the entry block;
133/// - __msan_warning() takes a 32-bit origin parameter;
134/// - local variables are poisoned with __msan_poison_alloca() upon function
135/// entry and unpoisoned with __msan_unpoison_alloca() before leaving the
136/// function;
137/// - the pass doesn't declare any global variables or add global constructors
138/// to the translation unit.
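///
/// For example (sketch only), under KMSAN a 4-byte store "*p = x" becomes
/// roughly
///   { shadow_ptr, origin_ptr } = __msan_metadata_ptr_for_store_4(p);
///   *shadow_ptr = shadow(x);          // and *origin_ptr = origin(x) if dirty
///   *p = x;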
139///
140/// Also, KMSAN currently ignores uninitialized memory passed into inline asm
141/// calls, making sure we're on the safe side wrt. possible false positives.
142///
143/// KernelMemorySanitizer only supports X86_64, SystemZ and PowerPC64 at the
144/// moment.
145///
146//
147// FIXME: This sanitizer does not yet handle scalable vectors
148//
149//===----------------------------------------------------------------------===//
150
152#include "llvm/ADT/APInt.h"
153#include "llvm/ADT/ArrayRef.h"
154#include "llvm/ADT/DenseMap.h"
156#include "llvm/ADT/SetVector.h"
157#include "llvm/ADT/SmallPtrSet.h"
158#include "llvm/ADT/SmallVector.h"
160#include "llvm/ADT/StringRef.h"
164#include "llvm/IR/Argument.h"
166#include "llvm/IR/Attributes.h"
167#include "llvm/IR/BasicBlock.h"
168#include "llvm/IR/CallingConv.h"
169#include "llvm/IR/Constant.h"
170#include "llvm/IR/Constants.h"
171#include "llvm/IR/DataLayout.h"
172#include "llvm/IR/DerivedTypes.h"
173#include "llvm/IR/Function.h"
174#include "llvm/IR/GlobalValue.h"
176#include "llvm/IR/IRBuilder.h"
177#include "llvm/IR/InlineAsm.h"
178#include "llvm/IR/InstVisitor.h"
179#include "llvm/IR/InstrTypes.h"
180#include "llvm/IR/Instruction.h"
181#include "llvm/IR/Instructions.h"
183#include "llvm/IR/Intrinsics.h"
184#include "llvm/IR/IntrinsicsAArch64.h"
185#include "llvm/IR/IntrinsicsX86.h"
186#include "llvm/IR/MDBuilder.h"
187#include "llvm/IR/Module.h"
188#include "llvm/IR/Type.h"
189#include "llvm/IR/Value.h"
190#include "llvm/IR/ValueMap.h"
193#include "llvm/Support/Casting.h"
195#include "llvm/Support/Debug.h"
205#include <algorithm>
206#include <cassert>
207#include <cstddef>
208#include <cstdint>
209#include <memory>
210#include <numeric>
211#include <string>
212#include <tuple>
213
214using namespace llvm;
215
216#define DEBUG_TYPE "msan"
217
218DEBUG_COUNTER(DebugInsertCheck, "msan-insert-check",
219 "Controls which checks to insert");
220
221DEBUG_COUNTER(DebugInstrumentInstruction, "msan-instrument-instruction",
222 "Controls which instruction to instrument");
223
224static const unsigned kOriginSize = 4;
225static const Align kMinOriginAlignment = Align(4);
226static const Align kShadowTLSAlignment = Align(8);
227
228// These constants must be kept in sync with the ones in msan.h.
229// TODO: increase size to match SVE/SVE2/SME/SME2 limits
230static const unsigned kParamTLSSize = 800;
231static const unsigned kRetvalTLSSize = 800;
232
233// Accesses sizes are powers of two: 1, 2, 4, 8.
234static const size_t kNumberOfAccessSizes = 4;
235
236/// Track origins of uninitialized values.
237///
238/// Adds a section to MemorySanitizer report that points to the allocation
239/// (stack or heap) the uninitialized bits came from originally.
241 "msan-track-origins",
242 cl::desc("Track origins (allocation sites) of poisoned memory"), cl::Hidden,
243 cl::init(0));
244
245static cl::opt<bool> ClKeepGoing("msan-keep-going",
246 cl::desc("keep going after reporting a UMR"),
247 cl::Hidden, cl::init(false));
248
249static cl::opt<bool>
250 ClPoisonStack("msan-poison-stack",
251 cl::desc("poison uninitialized stack variables"), cl::Hidden,
252 cl::init(true));
253
255 "msan-poison-stack-with-call",
256 cl::desc("poison uninitialized stack variables with a call"), cl::Hidden,
257 cl::init(false));
258
260 "msan-poison-stack-pattern",
261 cl::desc("poison uninitialized stack variables with the given pattern"),
262 cl::Hidden, cl::init(0xff));
263
264static cl::opt<bool>
265 ClPrintStackNames("msan-print-stack-names",
266 cl::desc("Print name of local stack variable"),
267 cl::Hidden, cl::init(true));
268
269static cl::opt<bool>
270 ClPoisonUndef("msan-poison-undef",
271 cl::desc("Poison fully undef temporary values. "
272 "Partially undefined constant vectors "
273 "are unaffected by this flag (see "
274 "-msan-poison-undef-vectors)."),
275 cl::Hidden, cl::init(true));
276
278 "msan-poison-undef-vectors",
279 cl::desc("Precisely poison partially undefined constant vectors. "
280 "If false (legacy behavior), the entire vector is "
281 "considered fully initialized, which may lead to false "
282 "negatives. Fully undefined constant vectors are "
283 "unaffected by this flag (see -msan-poison-undef)."),
284 cl::Hidden, cl::init(false));
285
287 "msan-precise-disjoint-or",
288 cl::desc("Precisely poison disjoint OR. If false (legacy behavior), "
289 "disjointedness is ignored (i.e., 1|1 is initialized)."),
290 cl::Hidden, cl::init(false));
291
292static cl::opt<bool>
293 ClHandleICmp("msan-handle-icmp",
294 cl::desc("propagate shadow through ICmpEQ and ICmpNE"),
295 cl::Hidden, cl::init(true));
296
297static cl::opt<bool>
298 ClHandleICmpExact("msan-handle-icmp-exact",
299 cl::desc("exact handling of relational integer ICmp"),
300 cl::Hidden, cl::init(true));
301
303 "msan-handle-lifetime-intrinsics",
304 cl::desc(
305 "when possible, poison scoped variables at the beginning of the scope "
306 "(slower, but more precise)"),
307 cl::Hidden, cl::init(true));
308
309// When compiling the Linux kernel, we sometimes see false positives related to
310// MSan being unable to understand that inline assembly calls may initialize
311// local variables.
312// This flag makes the compiler conservatively unpoison every memory location
313// passed into an assembly call. Note that this may cause false positives.
314// Because it's impossible to figure out the array sizes, we can only unpoison
315// the first sizeof(type) bytes for each type* pointer.
317 "msan-handle-asm-conservative",
318 cl::desc("conservative handling of inline assembly"), cl::Hidden,
319 cl::init(true));
320
321// This flag controls whether we check the shadow of the address
322// operand of load or store. Such bugs are very rare, since load from
323// a garbage address typically results in SEGV, but still happen
324// (e.g. only lower bits of address are garbage, or the access happens
325// early at program startup where malloc-ed memory is more likely to
326// be zeroed). As of 2012-08-28 this flag adds 20% slowdown.
328 "msan-check-access-address",
329 cl::desc("report accesses through a pointer which has poisoned shadow"),
330 cl::Hidden, cl::init(true));
331
333 "msan-eager-checks",
334 cl::desc("check arguments and return values at function call boundaries"),
335 cl::Hidden, cl::init(false));
336
338 "msan-dump-strict-instructions",
339 cl::desc("print out instructions with default strict semantics i.e.,"
340 "check that all the inputs are fully initialized, and mark "
341 "the output as fully initialized. These semantics are applied "
342 "to instructions that could not be handled explicitly nor "
343 "heuristically."),
344 cl::Hidden, cl::init(false));
345
346// Currently, all the heuristically handled instructions are specifically
347// IntrinsicInst. However, we use the broader "HeuristicInstructions" name
348// to parallel 'msan-dump-strict-instructions', and to keep the door open to
349// handling non-intrinsic instructions heuristically.
351 "msan-dump-heuristic-instructions",
352 cl::desc("Prints 'unknown' instructions that were handled heuristically. "
353 "Use -msan-dump-strict-instructions to print instructions that "
354 "could not be handled explicitly nor heuristically."),
355 cl::Hidden, cl::init(false));
356
358 "msan-instrumentation-with-call-threshold",
359 cl::desc(
360 "If the function being instrumented requires more than "
361 "this number of checks and origin stores, use callbacks instead of "
362 "inline checks (-1 means never use callbacks)."),
363 cl::Hidden, cl::init(3500));
364
365static cl::opt<bool>
366 ClEnableKmsan("msan-kernel",
367 cl::desc("Enable KernelMemorySanitizer instrumentation"),
368 cl::Hidden, cl::init(false));
369
370static cl::opt<bool>
371 ClDisableChecks("msan-disable-checks",
372 cl::desc("Apply no_sanitize to the whole file"), cl::Hidden,
373 cl::init(false));
374
375static cl::opt<bool>
376 ClCheckConstantShadow("msan-check-constant-shadow",
377 cl::desc("Insert checks for constant shadow values"),
378 cl::Hidden, cl::init(true));
379
380// This is off by default because of a bug in gold:
381// https://sourceware.org/bugzilla/show_bug.cgi?id=19002
382static cl::opt<bool>
383 ClWithComdat("msan-with-comdat",
384 cl::desc("Place MSan constructors in comdat sections"),
385 cl::Hidden, cl::init(false));
386
387// These options allow specifying custom memory map parameters.
388// See MemoryMapParams for details.
389static cl::opt<uint64_t> ClAndMask("msan-and-mask",
390 cl::desc("Define custom MSan AndMask"),
391 cl::Hidden, cl::init(0));
392
393static cl::opt<uint64_t> ClXorMask("msan-xor-mask",
394 cl::desc("Define custom MSan XorMask"),
395 cl::Hidden, cl::init(0));
396
397static cl::opt<uint64_t> ClShadowBase("msan-shadow-base",
398 cl::desc("Define custom MSan ShadowBase"),
399 cl::Hidden, cl::init(0));
400
401static cl::opt<uint64_t> ClOriginBase("msan-origin-base",
402 cl::desc("Define custom MSan OriginBase"),
403 cl::Hidden, cl::init(0));
404
405static cl::opt<int>
406 ClDisambiguateWarning("msan-disambiguate-warning-threshold",
407 cl::desc("Define threshold for number of checks per "
408 "debug location to force origin update."),
409 cl::Hidden, cl::init(3));
410
411const char kMsanModuleCtorName[] = "msan.module_ctor";
412const char kMsanInitName[] = "__msan_init";
413
414namespace {
415
416// Memory map parameters used in application-to-shadow address calculation.
417// Offset = (Addr & ~AndMask) ^ XorMask
418// Shadow = ShadowBase + Offset
419// Origin = OriginBase + Offset
420struct MemoryMapParams {
421 uint64_t AndMask;
422 uint64_t XorMask;
423 uint64_t ShadowBase;
424 uint64_t OriginBase;
425};
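// For example, with the x86_64 Linux parameters below (AndMask = 0,
// XorMask = 0x500000000000, ShadowBase = 0, OriginBase = 0x100000000000), an
// application address such as 0x700000001000 maps to (illustrative
// arithmetic only):
//   Offset = (0x700000001000 & ~0x0) ^ 0x500000000000 = 0x200000001000
//   Shadow = 0x0            + 0x200000001000 = 0x200000001000
//   Origin = 0x100000000000 + 0x200000001000 = 0x300000001000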
426
427struct PlatformMemoryMapParams {
428 const MemoryMapParams *bits32;
429 const MemoryMapParams *bits64;
430};
431
432} // end anonymous namespace
433
434// i386 Linux
435static const MemoryMapParams Linux_I386_MemoryMapParams = {
436 0x000080000000, // AndMask
437 0, // XorMask (not used)
438 0, // ShadowBase (not used)
439 0x000040000000, // OriginBase
440};
441
442// x86_64 Linux
443static const MemoryMapParams Linux_X86_64_MemoryMapParams = {
444 0, // AndMask (not used)
445 0x500000000000, // XorMask
446 0, // ShadowBase (not used)
447 0x100000000000, // OriginBase
448};
449
450// mips32 Linux
451// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
452// after picking good constants
453
454// mips64 Linux
455static const MemoryMapParams Linux_MIPS64_MemoryMapParams = {
456 0, // AndMask (not used)
457 0x008000000000, // XorMask
458 0, // ShadowBase (not used)
459 0x002000000000, // OriginBase
460};
461
462// ppc32 Linux
463// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
464// after picking good constants
465
466// ppc64 Linux
467static const MemoryMapParams Linux_PowerPC64_MemoryMapParams = {
468 0xE00000000000, // AndMask
469 0x100000000000, // XorMask
470 0x080000000000, // ShadowBase
471 0x1C0000000000, // OriginBase
472};
473
474// s390x Linux
475static const MemoryMapParams Linux_S390X_MemoryMapParams = {
476 0xC00000000000, // AndMask
477 0, // XorMask (not used)
478 0x080000000000, // ShadowBase
479 0x1C0000000000, // OriginBase
480};
481
482// arm32 Linux
483// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
484// after picking good constants
485
486// aarch64 Linux
487static const MemoryMapParams Linux_AArch64_MemoryMapParams = {
488 0, // AndMask (not used)
489 0x0B00000000000, // XorMask
490 0, // ShadowBase (not used)
491 0x0200000000000, // OriginBase
492};
493
494// loongarch64 Linux
495static const MemoryMapParams Linux_LoongArch64_MemoryMapParams = {
496 0, // AndMask (not used)
497 0x500000000000, // XorMask
498 0, // ShadowBase (not used)
499 0x100000000000, // OriginBase
500};
501
502// riscv32 Linux
503// FIXME: Remove -msan-origin-base -msan-and-mask added by PR #109284 to tests
504// after picking good constants
505
506// aarch64 FreeBSD
507static const MemoryMapParams FreeBSD_AArch64_MemoryMapParams = {
508 0x1800000000000, // AndMask
509 0x0400000000000, // XorMask
510 0x0200000000000, // ShadowBase
511 0x0700000000000, // OriginBase
512};
513
514// i386 FreeBSD
515static const MemoryMapParams FreeBSD_I386_MemoryMapParams = {
516 0x000180000000, // AndMask
517 0x000040000000, // XorMask
518 0x000020000000, // ShadowBase
519 0x000700000000, // OriginBase
520};
521
522// x86_64 FreeBSD
523static const MemoryMapParams FreeBSD_X86_64_MemoryMapParams = {
524 0xc00000000000, // AndMask
525 0x200000000000, // XorMask
526 0x100000000000, // ShadowBase
527 0x380000000000, // OriginBase
528};
529
530// x86_64 NetBSD
531static const MemoryMapParams NetBSD_X86_64_MemoryMapParams = {
532 0, // AndMask
533 0x500000000000, // XorMask
534 0, // ShadowBase
535 0x100000000000, // OriginBase
536};
537
538static const PlatformMemoryMapParams Linux_X86_MemoryMapParams = {
539 &Linux_I386_MemoryMapParams,
540 &Linux_X86_64_MemoryMapParams,
541};
542
543static const PlatformMemoryMapParams Linux_MIPS_MemoryMapParams = {
544 nullptr,
545 &Linux_MIPS64_MemoryMapParams,
546};
547
548static const PlatformMemoryMapParams Linux_PowerPC_MemoryMapParams = {
549 nullptr,
550 &Linux_PowerPC64_MemoryMapParams,
551};
552
553static const PlatformMemoryMapParams Linux_S390_MemoryMapParams = {
554 nullptr,
555 &Linux_S390X_MemoryMapParams,
556};
557
558static const PlatformMemoryMapParams Linux_ARM_MemoryMapParams = {
559 nullptr,
560 &Linux_AArch64_MemoryMapParams,
561};
562
563static const PlatformMemoryMapParams Linux_LoongArch_MemoryMapParams = {
564 nullptr,
565 &Linux_LoongArch64_MemoryMapParams,
566};
567
568static const PlatformMemoryMapParams FreeBSD_ARM_MemoryMapParams = {
569 nullptr,
570 &FreeBSD_AArch64_MemoryMapParams,
571};
572
573static const PlatformMemoryMapParams FreeBSD_X86_MemoryMapParams = {
574 &FreeBSD_I386_MemoryMapParams,
575 &FreeBSD_X86_64_MemoryMapParams,
576};
577
578static const PlatformMemoryMapParams NetBSD_X86_MemoryMapParams = {
579 nullptr,
580 &NetBSD_X86_64_MemoryMapParams,
581};
582
583namespace {
584
585/// Instrument functions of a module to detect uninitialized reads.
586///
587/// Instantiating MemorySanitizer inserts the msan runtime library API function
588/// declarations into the module if they don't exist already. Instantiating
589/// ensures the __msan_init function is in the list of global constructors for
590/// the module.
591class MemorySanitizer {
592public:
593 MemorySanitizer(Module &M, MemorySanitizerOptions Options)
594 : CompileKernel(Options.Kernel), TrackOrigins(Options.TrackOrigins),
595 Recover(Options.Recover), EagerChecks(Options.EagerChecks) {
596 initializeModule(M);
597 }
598
599 // MSan cannot be moved or copied because of MapParams.
600 MemorySanitizer(MemorySanitizer &&) = delete;
601 MemorySanitizer &operator=(MemorySanitizer &&) = delete;
602 MemorySanitizer(const MemorySanitizer &) = delete;
603 MemorySanitizer &operator=(const MemorySanitizer &) = delete;
604
605 bool sanitizeFunction(Function &F, TargetLibraryInfo &TLI);
606
607private:
608 friend struct MemorySanitizerVisitor;
609 friend struct VarArgHelperBase;
610 friend struct VarArgAMD64Helper;
611 friend struct VarArgAArch64Helper;
612 friend struct VarArgPowerPC64Helper;
613 friend struct VarArgPowerPC32Helper;
614 friend struct VarArgSystemZHelper;
615 friend struct VarArgI386Helper;
616 friend struct VarArgGenericHelper;
617
618 void initializeModule(Module &M);
619 void initializeCallbacks(Module &M, const TargetLibraryInfo &TLI);
620 void createKernelApi(Module &M, const TargetLibraryInfo &TLI);
621 void createUserspaceApi(Module &M, const TargetLibraryInfo &TLI);
622
623 template <typename... ArgsTy>
624 FunctionCallee getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
625 ArgsTy... Args);
626
627 /// True if we're compiling the Linux kernel.
628 bool CompileKernel;
629 /// Track origins (allocation points) of uninitialized values.
630 int TrackOrigins;
631 bool Recover;
632 bool EagerChecks;
633
634 Triple TargetTriple;
635 LLVMContext *C;
636 Type *IntptrTy; ///< Integer type with the size of a ptr in default AS.
637 Type *OriginTy;
638 PointerType *PtrTy; ///< Pointer type in the default address space.
639
640 // XxxTLS variables represent the per-thread state in MSan and per-task state
641 // in KMSAN.
642 // For the userspace these point to thread-local globals. In the kernel land
643 // they point to the members of a per-task struct obtained via a call to
644 // __msan_get_context_state().
645
646 /// Thread-local shadow storage for function parameters.
647 Value *ParamTLS;
648
649 /// Thread-local origin storage for function parameters.
650 Value *ParamOriginTLS;
651
652 /// Thread-local shadow storage for function return value.
653 Value *RetvalTLS;
654
655 /// Thread-local origin storage for function return value.
656 Value *RetvalOriginTLS;
657
658 /// Thread-local shadow storage for in-register va_arg function.
659 Value *VAArgTLS;
660
661 /// Thread-local origin storage for in-register va_arg function.
662 Value *VAArgOriginTLS;
663
664 /// Thread-local storage for the size of the va_arg overflow area.
665 Value *VAArgOverflowSizeTLS;
666
667 /// Are the instrumentation callbacks set up?
668 bool CallbacksInitialized = false;
669
670 /// The run-time callback to print a warning.
671 FunctionCallee WarningFn;
672
673 // These arrays are indexed by log2(AccessSize).
674 FunctionCallee MaybeWarningFn[kNumberOfAccessSizes];
675 FunctionCallee MaybeWarningVarSizeFn;
676 FunctionCallee MaybeStoreOriginFn[kNumberOfAccessSizes];
677
678 /// Run-time helper that generates a new origin value for a stack
679 /// allocation.
680 FunctionCallee MsanSetAllocaOriginWithDescriptionFn;
681 // No description version
682 FunctionCallee MsanSetAllocaOriginNoDescriptionFn;
683
684 /// Run-time helper that poisons stack on function entry.
685 FunctionCallee MsanPoisonStackFn;
686
687 /// Run-time helper that records a store (or any event) of an
688 /// uninitialized value and returns an updated origin id encoding this info.
689 FunctionCallee MsanChainOriginFn;
690
691 /// Run-time helper that paints an origin over a region.
692 FunctionCallee MsanSetOriginFn;
693
694 /// MSan runtime replacements for memmove, memcpy and memset.
695 FunctionCallee MemmoveFn, MemcpyFn, MemsetFn;
696
697 /// KMSAN callback for task-local function argument shadow.
698 StructType *MsanContextStateTy;
699 FunctionCallee MsanGetContextStateFn;
700
701 /// Functions for poisoning/unpoisoning local variables
702 FunctionCallee MsanPoisonAllocaFn, MsanUnpoisonAllocaFn;
703
704 /// Pair of shadow/origin pointers.
705 Type *MsanMetadata;
706
707 /// Each of the MsanMetadataPtrXxx functions returns a MsanMetadata.
708 FunctionCallee MsanMetadataPtrForLoadN, MsanMetadataPtrForStoreN;
709 FunctionCallee MsanMetadataPtrForLoad_1_8[4];
710 FunctionCallee MsanMetadataPtrForStore_1_8[4];
711 FunctionCallee MsanInstrumentAsmStoreFn;
712
713 /// Storage for return values of the MsanMetadataPtrXxx functions.
714 Value *MsanMetadataAlloca;
715
716 /// Helper to choose between different MsanMetadataPtrXxx().
717 FunctionCallee getKmsanShadowOriginAccessFn(bool isStore, int size);
718
719 /// Memory map parameters used in application-to-shadow calculation.
720 const MemoryMapParams *MapParams;
721
722 /// Custom memory map parameters used when -msan-shadow-base or
723 /// -msan-origin-base is provided.
724 MemoryMapParams CustomMapParams;
725
726 MDNode *ColdCallWeights;
727
728 /// Branch weights for origin store.
729 MDNode *OriginStoreWeights;
730};
731
732void insertModuleCtor(Module &M) {
733 getOrCreateSanitizerCtorAndInitFunctions(
734 M, kMsanModuleCtorName, kMsanInitName,
735 /*InitArgTypes=*/{},
736 /*InitArgs=*/{},
737 // This callback is invoked when the functions are created the first
738 // time. Hook them into the global ctors list in that case:
739 [&](Function *Ctor, FunctionCallee) {
740 if (!ClWithComdat) {
741 appendToGlobalCtors(M, Ctor, 0);
742 return;
743 }
744 Comdat *MsanCtorComdat = M.getOrInsertComdat(kMsanModuleCtorName);
745 Ctor->setComdat(MsanCtorComdat);
746 appendToGlobalCtors(M, Ctor, 0, Ctor);
747 });
748}
749
750template <class T> T getOptOrDefault(const cl::opt<T> &Opt, T Default) {
751 return (Opt.getNumOccurrences() > 0) ? Opt : Default;
752}
753
754} // end anonymous namespace
755
756MemorySanitizerOptions::MemorySanitizerOptions(int TO, bool R, bool K,
757 bool EagerChecks)
758 : Kernel(getOptOrDefault(ClEnableKmsan, K)),
759 TrackOrigins(getOptOrDefault(ClTrackOrigins, Kernel ? 2 : TO)),
760 Recover(getOptOrDefault(ClKeepGoing, Kernel || R)),
761 EagerChecks(getOptOrDefault(ClEagerChecks, EagerChecks)) {}
762
763PreservedAnalyses MemorySanitizerPass::run(Module &M,
764 ModuleAnalysisManager &AM) {
765 // Return early if nosanitize_memory module flag is present for the module.
766 if (checkIfAlreadyInstrumented(M, "nosanitize_memory"))
767 return PreservedAnalyses::all();
768 bool Modified = false;
769 if (!Options.Kernel) {
770 insertModuleCtor(M);
771 Modified = true;
772 }
773
774 auto &FAM = AM.getResult<FunctionAnalysisManagerModuleProxy>(M).getManager();
775 for (Function &F : M) {
776 if (F.empty())
777 continue;
778 MemorySanitizer Msan(*F.getParent(), Options);
779 Modified |=
780 Msan.sanitizeFunction(F, FAM.getResult<TargetLibraryAnalysis>(F));
781 }
782
783 if (!Modified)
784 return PreservedAnalyses::all();
785
786 PreservedAnalyses PA = PreservedAnalyses::none();
787 // GlobalsAA is considered stateless and does not get invalidated unless
788 // explicitly invalidated; PreservedAnalyses::none() is not enough. Sanitizers
789 // make changes that require GlobalsAA to be invalidated.
790 PA.abandon<GlobalsAA>();
791 return PA;
792}
793
794void MemorySanitizerPass::printPipeline(
795 raw_ostream &OS, function_ref<StringRef(StringRef)> MapClassName2PassName) {
796 static_cast<PassInfoMixin<MemorySanitizerPass> *>(this)->printPipeline(
797 OS, MapClassName2PassName);
798 OS << '<';
799 if (Options.Recover)
800 OS << "recover;";
801 if (Options.Kernel)
802 OS << "kernel;";
803 if (Options.EagerChecks)
804 OS << "eager-checks;";
805 OS << "track-origins=" << Options.TrackOrigins;
806 OS << '>';
807}
808
809/// Create a non-const global initialized with the given string.
810///
811/// Creates a writable global for Str so that we can pass it to the
812/// run-time lib. Runtime uses first 4 bytes of the string to store the
813/// frame ID, so the string needs to be mutable.
815 StringRef Str) {
816 Constant *StrConst = ConstantDataArray::getString(M.getContext(), Str);
817 return new GlobalVariable(M, StrConst->getType(), /*isConstant=*/true,
818 GlobalValue::PrivateLinkage, StrConst, "");
819}
820
821template <typename... ArgsTy>
822FunctionCallee
823MemorySanitizer::getOrInsertMsanMetadataFunction(Module &M, StringRef Name,
824 ArgsTy... Args) {
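  // For example, for "__msan_metadata_ptr_for_load_4" this creates roughly
  // (a sketch; exact IR types depend on the target):
  //   default ABIs: declare { ptr, ptr } @__msan_metadata_ptr_for_load_4(ptr)
  //   SystemZ:      declare void @__msan_metadata_ptr_for_load_4(ptr, ptr)
  //                 (the first parameter receives the shadow/origin pair).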
825 if (TargetTriple.getArch() == Triple::systemz) {
826 // SystemZ ABI: shadow/origin pair is returned via a hidden parameter.
827 return M.getOrInsertFunction(Name, Type::getVoidTy(*C), PtrTy,
828 std::forward<ArgsTy>(Args)...);
829 }
830
831 return M.getOrInsertFunction(Name, MsanMetadata,
832 std::forward<ArgsTy>(Args)...);
833}
834
835/// Create KMSAN API callbacks.
836void MemorySanitizer::createKernelApi(Module &M, const TargetLibraryInfo &TLI) {
837 IRBuilder<> IRB(*C);
838
839 // These will be initialized in insertKmsanPrologue().
840 RetvalTLS = nullptr;
841 RetvalOriginTLS = nullptr;
842 ParamTLS = nullptr;
843 ParamOriginTLS = nullptr;
844 VAArgTLS = nullptr;
845 VAArgOriginTLS = nullptr;
846 VAArgOverflowSizeTLS = nullptr;
847
848 WarningFn = M.getOrInsertFunction("__msan_warning",
849 TLI.getAttrList(C, {0}, /*Signed=*/false),
850 IRB.getVoidTy(), IRB.getInt32Ty());
851
852 // Requests the per-task context state (kmsan_context_state*) from the
853 // runtime library.
854 MsanContextStateTy = StructType::get(
855 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
856 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8),
857 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8),
858 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8), /* va_arg_origin */
859 IRB.getInt64Ty(), ArrayType::get(OriginTy, kParamTLSSize / 4), OriginTy,
860 OriginTy);
861 MsanGetContextStateFn =
862 M.getOrInsertFunction("__msan_get_context_state", PtrTy);
863
864 MsanMetadata = StructType::get(PtrTy, PtrTy);
865
866 for (int ind = 0, size = 1; ind < 4; ind++, size <<= 1) {
867 std::string name_load =
868 "__msan_metadata_ptr_for_load_" + std::to_string(size);
869 std::string name_store =
870 "__msan_metadata_ptr_for_store_" + std::to_string(size);
871 MsanMetadataPtrForLoad_1_8[ind] =
872 getOrInsertMsanMetadataFunction(M, name_load, PtrTy);
873 MsanMetadataPtrForStore_1_8[ind] =
874 getOrInsertMsanMetadataFunction(M, name_store, PtrTy);
875 }
876
877 MsanMetadataPtrForLoadN = getOrInsertMsanMetadataFunction(
878 M, "__msan_metadata_ptr_for_load_n", PtrTy, IntptrTy);
879 MsanMetadataPtrForStoreN = getOrInsertMsanMetadataFunction(
880 M, "__msan_metadata_ptr_for_store_n", PtrTy, IntptrTy);
881
882 // Functions for poisoning and unpoisoning memory.
883 MsanPoisonAllocaFn = M.getOrInsertFunction(
884 "__msan_poison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
885 MsanUnpoisonAllocaFn = M.getOrInsertFunction(
886 "__msan_unpoison_alloca", IRB.getVoidTy(), PtrTy, IntptrTy);
887}
888
889static Constant *getOrInsertGlobal(Module &M, StringRef Name, Type *Ty) {
890 return M.getOrInsertGlobal(Name, Ty, [&] {
891 return new GlobalVariable(M, Ty, false, GlobalVariable::ExternalLinkage,
892 nullptr, Name, nullptr,
893 GlobalVariable::InitialExecTLSModel);
894 });
895}
896
897/// Insert declarations for userspace-specific functions and globals.
898void MemorySanitizer::createUserspaceApi(Module &M,
899 const TargetLibraryInfo &TLI) {
900 IRBuilder<> IRB(*C);
901
902 // Create the callback.
903 // FIXME: this function should have "Cold" calling conv,
904 // which is not yet implemented.
905 if (TrackOrigins) {
906 StringRef WarningFnName = Recover ? "__msan_warning_with_origin"
907 : "__msan_warning_with_origin_noreturn";
908 WarningFn = M.getOrInsertFunction(WarningFnName,
909 TLI.getAttrList(C, {0}, /*Signed=*/false),
910 IRB.getVoidTy(), IRB.getInt32Ty());
911 } else {
912 StringRef WarningFnName =
913 Recover ? "__msan_warning" : "__msan_warning_noreturn";
914 WarningFn = M.getOrInsertFunction(WarningFnName, IRB.getVoidTy());
915 }
916
917 // Create the global TLS variables.
918 RetvalTLS =
919 getOrInsertGlobal(M, "__msan_retval_tls",
920 ArrayType::get(IRB.getInt64Ty(), kRetvalTLSSize / 8));
921
922 RetvalOriginTLS = getOrInsertGlobal(M, "__msan_retval_origin_tls", OriginTy);
923
924 ParamTLS =
925 getOrInsertGlobal(M, "__msan_param_tls",
926 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
927
928 ParamOriginTLS =
929 getOrInsertGlobal(M, "__msan_param_origin_tls",
930 ArrayType::get(OriginTy, kParamTLSSize / 4));
931
932 VAArgTLS =
933 getOrInsertGlobal(M, "__msan_va_arg_tls",
934 ArrayType::get(IRB.getInt64Ty(), kParamTLSSize / 8));
935
936 VAArgOriginTLS =
937 getOrInsertGlobal(M, "__msan_va_arg_origin_tls",
938 ArrayType::get(OriginTy, kParamTLSSize / 4));
939
940 VAArgOverflowSizeTLS = getOrInsertGlobal(M, "__msan_va_arg_overflow_size_tls",
941 IRB.getIntPtrTy(M.getDataLayout()));
942
943 for (size_t AccessSizeIndex = 0; AccessSizeIndex < kNumberOfAccessSizes;
944 AccessSizeIndex++) {
945 unsigned AccessSize = 1 << AccessSizeIndex;
946 std::string FunctionName = "__msan_maybe_warning_" + itostr(AccessSize);
947 MaybeWarningFn[AccessSizeIndex] = M.getOrInsertFunction(
948 FunctionName, TLI.getAttrList(C, {0, 1}, /*Signed=*/false),
949 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), IRB.getInt32Ty());
950 MaybeWarningVarSizeFn = M.getOrInsertFunction(
951 "__msan_maybe_warning_N", TLI.getAttrList(C, {}, /*Signed=*/false),
952 IRB.getVoidTy(), PtrTy, IRB.getInt64Ty(), IRB.getInt32Ty());
953 FunctionName = "__msan_maybe_store_origin_" + itostr(AccessSize);
954 MaybeStoreOriginFn[AccessSizeIndex] = M.getOrInsertFunction(
955 FunctionName, TLI.getAttrList(C, {0, 2}, /*Signed=*/false),
956 IRB.getVoidTy(), IRB.getIntNTy(AccessSize * 8), PtrTy,
957 IRB.getInt32Ty());
958 }
959
960 MsanSetAllocaOriginWithDescriptionFn =
961 M.getOrInsertFunction("__msan_set_alloca_origin_with_descr",
962 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy, PtrTy);
963 MsanSetAllocaOriginNoDescriptionFn =
964 M.getOrInsertFunction("__msan_set_alloca_origin_no_descr",
965 IRB.getVoidTy(), PtrTy, IntptrTy, PtrTy);
966 MsanPoisonStackFn = M.getOrInsertFunction("__msan_poison_stack",
967 IRB.getVoidTy(), PtrTy, IntptrTy);
968}
969
970/// Insert extern declaration of runtime-provided functions and globals.
971void MemorySanitizer::initializeCallbacks(Module &M,
972 const TargetLibraryInfo &TLI) {
973 // Only do this once.
974 if (CallbacksInitialized)
975 return;
976
977 IRBuilder<> IRB(*C);
978 // Initialize callbacks that are common for kernel and userspace
979 // instrumentation.
980 MsanChainOriginFn = M.getOrInsertFunction(
981 "__msan_chain_origin",
982 TLI.getAttrList(C, {0}, /*Signed=*/false, /*Ret=*/true), IRB.getInt32Ty(),
983 IRB.getInt32Ty());
984 MsanSetOriginFn = M.getOrInsertFunction(
985 "__msan_set_origin", TLI.getAttrList(C, {2}, /*Signed=*/false),
986 IRB.getVoidTy(), PtrTy, IntptrTy, IRB.getInt32Ty());
987 MemmoveFn =
988 M.getOrInsertFunction("__msan_memmove", PtrTy, PtrTy, PtrTy, IntptrTy);
989 MemcpyFn =
990 M.getOrInsertFunction("__msan_memcpy", PtrTy, PtrTy, PtrTy, IntptrTy);
991 MemsetFn = M.getOrInsertFunction("__msan_memset",
992 TLI.getAttrList(C, {1}, /*Signed=*/true),
993 PtrTy, PtrTy, IRB.getInt32Ty(), IntptrTy);
994
995 MsanInstrumentAsmStoreFn = M.getOrInsertFunction(
996 "__msan_instrument_asm_store", IRB.getVoidTy(), PtrTy, IntptrTy);
997
998 if (CompileKernel) {
999 createKernelApi(M, TLI);
1000 } else {
1001 createUserspaceApi(M, TLI);
1002 }
1003 CallbacksInitialized = true;
1004}
1005
1006FunctionCallee MemorySanitizer::getKmsanShadowOriginAccessFn(bool isStore,
1007 int size) {
1008 FunctionCallee *Fns =
1009 isStore ? MsanMetadataPtrForStore_1_8 : MsanMetadataPtrForLoad_1_8;
1010 switch (size) {
1011 case 1:
1012 return Fns[0];
1013 case 2:
1014 return Fns[1];
1015 case 4:
1016 return Fns[2];
1017 case 8:
1018 return Fns[3];
1019 default:
1020 return nullptr;
1021 }
1022}
1023
1024/// Module-level initialization.
1025///
1026/// Inserts a call to __msan_init into the module's constructor list.
1027void MemorySanitizer::initializeModule(Module &M) {
1028 auto &DL = M.getDataLayout();
1029
1030 TargetTriple = M.getTargetTriple();
1031
1032 bool ShadowPassed = ClShadowBase.getNumOccurrences() > 0;
1033 bool OriginPassed = ClOriginBase.getNumOccurrences() > 0;
1034 // Check the overrides first
1035 if (ShadowPassed || OriginPassed) {
1036 CustomMapParams.AndMask = ClAndMask;
1037 CustomMapParams.XorMask = ClXorMask;
1038 CustomMapParams.ShadowBase = ClShadowBase;
1039 CustomMapParams.OriginBase = ClOriginBase;
1040 MapParams = &CustomMapParams;
1041 } else {
1042 switch (TargetTriple.getOS()) {
1043 case Triple::FreeBSD:
1044 switch (TargetTriple.getArch()) {
1045 case Triple::aarch64:
1046 MapParams = FreeBSD_ARM_MemoryMapParams.bits64;
1047 break;
1048 case Triple::x86_64:
1049 MapParams = FreeBSD_X86_MemoryMapParams.bits64;
1050 break;
1051 case Triple::x86:
1052 MapParams = FreeBSD_X86_MemoryMapParams.bits32;
1053 break;
1054 default:
1055 report_fatal_error("unsupported architecture");
1056 }
1057 break;
1058 case Triple::NetBSD:
1059 switch (TargetTriple.getArch()) {
1060 case Triple::x86_64:
1061 MapParams = NetBSD_X86_MemoryMapParams.bits64;
1062 break;
1063 default:
1064 report_fatal_error("unsupported architecture");
1065 }
1066 break;
1067 case Triple::Linux:
1068 switch (TargetTriple.getArch()) {
1069 case Triple::x86_64:
1070 MapParams = Linux_X86_MemoryMapParams.bits64;
1071 break;
1072 case Triple::x86:
1073 MapParams = Linux_X86_MemoryMapParams.bits32;
1074 break;
1075 case Triple::mips64:
1076 case Triple::mips64el:
1077 MapParams = Linux_MIPS_MemoryMapParams.bits64;
1078 break;
1079 case Triple::ppc64:
1080 case Triple::ppc64le:
1081 MapParams = Linux_PowerPC_MemoryMapParams.bits64;
1082 break;
1083 case Triple::systemz:
1084 MapParams = Linux_S390_MemoryMapParams.bits64;
1085 break;
1086 case Triple::aarch64:
1087 case Triple::aarch64_be:
1088 MapParams = Linux_ARM_MemoryMapParams.bits64;
1089 break;
1090 case Triple::loongarch64:
1091 MapParams = Linux_LoongArch_MemoryMapParams.bits64;
1092 break;
1093 default:
1094 report_fatal_error("unsupported architecture");
1095 }
1096 break;
1097 default:
1098 report_fatal_error("unsupported operating system");
1099 }
1100 }
1101
1102 C = &(M.getContext());
1103 IRBuilder<> IRB(*C);
1104 IntptrTy = IRB.getIntPtrTy(DL);
1105 OriginTy = IRB.getInt32Ty();
1106 PtrTy = IRB.getPtrTy();
1107
1108 ColdCallWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1109 OriginStoreWeights = MDBuilder(*C).createUnlikelyBranchWeights();
1110
1111 if (!CompileKernel) {
1112 if (TrackOrigins)
1113 M.getOrInsertGlobal("__msan_track_origins", IRB.getInt32Ty(), [&] {
1114 return new GlobalVariable(
1115 M, IRB.getInt32Ty(), true, GlobalValue::WeakODRLinkage,
1116 IRB.getInt32(TrackOrigins), "__msan_track_origins");
1117 });
1118
1119 if (Recover)
1120 M.getOrInsertGlobal("__msan_keep_going", IRB.getInt32Ty(), [&] {
1121 return new GlobalVariable(M, IRB.getInt32Ty(), true,
1122 GlobalValue::WeakODRLinkage,
1123 IRB.getInt32(Recover), "__msan_keep_going");
1124 });
1125 }
1126}
1127
1128namespace {
1129
1130/// A helper class that handles instrumentation of VarArg
1131/// functions on a particular platform.
1132///
1133/// Implementations are expected to insert the instrumentation
1134/// necessary to propagate argument shadow through VarArg function
1135/// calls. Visit* methods are called during an InstVisitor pass over
1136/// the function, and should avoid creating new basic blocks. A new
1137/// instance of this class is created for each instrumented function.
1138struct VarArgHelper {
1139 virtual ~VarArgHelper() = default;
1140
1141 /// Visit a CallBase.
1142 virtual void visitCallBase(CallBase &CB, IRBuilder<> &IRB) = 0;
1143
1144 /// Visit a va_start call.
1145 virtual void visitVAStartInst(VAStartInst &I) = 0;
1146
1147 /// Visit a va_copy call.
1148 virtual void visitVACopyInst(VACopyInst &I) = 0;
1149
1150 /// Finalize function instrumentation.
1151 ///
1152 /// This method is called after visiting all interesting (see above)
1153 /// instructions in a function.
1154 virtual void finalizeInstrumentation() = 0;
1155};
1156
1157struct MemorySanitizerVisitor;
1158
1159} // end anonymous namespace
1160
1161static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
1162 MemorySanitizerVisitor &Visitor);
1163
1164static unsigned TypeSizeToSizeIndex(TypeSize TS) {
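  // For example (illustrative): i8 -> 0, i16 -> 1, i32 -> 2, i64 -> 3,
  // and i128 -> 4, i.e. kNumberOfAccessSizes, for which no fast-path
  // callback exists.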
1165 if (TS.isScalable())
1166 // Scalable types unconditionally take slowpaths.
1167 return kNumberOfAccessSizes;
1168 unsigned TypeSizeFixed = TS.getFixedValue();
1169 if (TypeSizeFixed <= 8)
1170 return 0;
1171 return Log2_32_Ceil((TypeSizeFixed + 7) / 8);
1172}
1173
1174namespace {
1175
1176/// Helper class that attaches the debug information of the given instruction
1177/// to new instructions inserted after it.
1178class NextNodeIRBuilder : public IRBuilder<> {
1179public:
1180 explicit NextNodeIRBuilder(Instruction *IP) : IRBuilder<>(IP->getNextNode()) {
1181 SetCurrentDebugLocation(IP->getDebugLoc());
1182 }
1183};
1184
1185/// This class does all the work for a given function. Store and Load
1186/// instructions store and load corresponding shadow and origin
1187/// values. Most instructions propagate shadow from arguments to their
1188/// return values. Certain instructions (most importantly, BranchInst)
1189/// test their argument shadow and print reports (with a runtime call) if it's
1190/// non-zero.
1191struct MemorySanitizerVisitor : public InstVisitor<MemorySanitizerVisitor> {
1192 Function &F;
1193 MemorySanitizer &MS;
1194 SmallVector<PHINode *, 16> ShadowPHINodes, OriginPHINodes;
1195 ValueMap<Value *, Value *> ShadowMap, OriginMap;
1196 std::unique_ptr<VarArgHelper> VAHelper;
1197 const TargetLibraryInfo *TLI;
1198 Instruction *FnPrologueEnd;
1199 SmallVector<Instruction *, 16> Instructions;
1200
1201 // The following flags disable parts of MSan instrumentation based on
1202 // exclusion list contents and command-line options.
1203 bool InsertChecks;
1204 bool PropagateShadow;
1205 bool PoisonStack;
1206 bool PoisonUndef;
1207 bool PoisonUndefVectors;
1208
1209 struct ShadowOriginAndInsertPoint {
1210 Value *Shadow;
1211 Value *Origin;
1212 Instruction *OrigIns;
1213
1214 ShadowOriginAndInsertPoint(Value *S, Value *O, Instruction *I)
1215 : Shadow(S), Origin(O), OrigIns(I) {}
1216 };
1217 SmallVector<ShadowOriginAndInsertPoint, 16> InstrumentationList;
1218 DenseMap<const DILocation *, int> LazyWarningDebugLocationCount;
1219 SmallSetVector<AllocaInst *, 16> AllocaSet;
1220 SmallVector<std::pair<IntrinsicInst *, AllocaInst *>, 16> LifetimeStartList;
1221 SmallVector<StoreInst *, 16> StoreList;
1222 int64_t SplittableBlocksCount = 0;
1223
1224 MemorySanitizerVisitor(Function &F, MemorySanitizer &MS,
1225 const TargetLibraryInfo &TLI)
1226 : F(F), MS(MS), VAHelper(CreateVarArgHelper(F, MS, *this)), TLI(&TLI) {
1227 bool SanitizeFunction =
1228 F.hasFnAttribute(Attribute::SanitizeMemory) && !ClDisableChecks;
1229 InsertChecks = SanitizeFunction;
1230 PropagateShadow = SanitizeFunction;
1231 PoisonStack = SanitizeFunction && ClPoisonStack;
1232 PoisonUndef = SanitizeFunction && ClPoisonUndef;
1233 PoisonUndefVectors = SanitizeFunction && ClPoisonUndefVectors;
1234
1235 // In the presence of unreachable blocks, we may see Phi nodes with
1236 // incoming nodes from such blocks. Since InstVisitor skips unreachable
1237 // blocks, such nodes will not have any shadow value associated with them.
1238 // It's easier to remove unreachable blocks than deal with missing shadow.
1239 removeUnreachableBlocks(F);
1240
1241 MS.initializeCallbacks(*F.getParent(), TLI);
1242 FnPrologueEnd =
1243 IRBuilder<>(&F.getEntryBlock(), F.getEntryBlock().getFirstNonPHIIt())
1244 .CreateIntrinsic(Intrinsic::donothing, {});
1245
1246 if (MS.CompileKernel) {
1247 IRBuilder<> IRB(FnPrologueEnd);
1248 insertKmsanPrologue(IRB);
1249 }
1250
1251 LLVM_DEBUG(if (!InsertChecks) dbgs()
1252 << "MemorySanitizer is not inserting checks into '"
1253 << F.getName() << "'\n");
1254 }
1255
1256 bool instrumentWithCalls(Value *V) {
1257 // Constants likely will be eliminated by follow-up passes.
1258 if (isa<Constant>(V))
1259 return false;
1260 ++SplittableBlocksCount;
1261 return ClInstrumentationWithCallThreshold >= 0 &&
1262 SplittableBlocksCount > ClInstrumentationWithCallThreshold;
1263 }
1264
1265 bool isInPrologue(Instruction &I) {
1266 return I.getParent() == FnPrologueEnd->getParent() &&
1267 (&I == FnPrologueEnd || I.comesBefore(FnPrologueEnd));
1268 }
1269
1270 // Creates a new origin and records the stack trace. In general we can call
1271 // this function for any origin manipulation we like. However it will cost
1272 // runtime resources. So use this wisely only if it can provide additional
1273 // information helpful to a user.
1274 Value *updateOrigin(Value *V, IRBuilder<> &IRB) {
1275 if (MS.TrackOrigins <= 1)
1276 return V;
1277 return IRB.CreateCall(MS.MsanChainOriginFn, V);
1278 }
1279
1280 Value *originToIntptr(IRBuilder<> &IRB, Value *Origin) {
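    // For example (illustrative values), on a 64-bit target origin 0x12345678
    // becomes 0x1234567812345678, so one intptr-sized store paints two
    // adjacent 4-byte origin slots.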
1281 const DataLayout &DL = F.getDataLayout();
1282 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1283 if (IntptrSize == kOriginSize)
1284 return Origin;
1285 assert(IntptrSize == kOriginSize * 2);
1286 Origin = IRB.CreateIntCast(Origin, MS.IntptrTy, /* isSigned */ false);
1287 return IRB.CreateOr(Origin, IRB.CreateShl(Origin, kOriginSize * 8));
1288 }
1289
1290 /// Fill memory range with the given origin value.
1291 void paintOrigin(IRBuilder<> &IRB, Value *Origin, Value *OriginPtr,
1292 TypeSize TS, Align Alignment) {
1293 const DataLayout &DL = F.getDataLayout();
1294 const Align IntptrAlignment = DL.getABITypeAlign(MS.IntptrTy);
1295 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
1296 assert(IntptrAlignment >= kMinOriginAlignment);
1297 assert(IntptrSize >= kOriginSize);
1298
1299 // Note: The loop-based formulation works for fixed-length vectors too;
1300 // however, we prefer to unroll and specialize alignment below.
1301 if (TS.isScalable()) {
1302 Value *Size = IRB.CreateTypeSize(MS.IntptrTy, TS);
1303 Value *RoundUp =
1304 IRB.CreateAdd(Size, ConstantInt::get(MS.IntptrTy, kOriginSize - 1));
1305 Value *End =
1306 IRB.CreateUDiv(RoundUp, ConstantInt::get(MS.IntptrTy, kOriginSize));
1307 auto [InsertPt, Index] =
1308 SplitBlockAndInsertSimpleForLoop(End, &*IRB.GetInsertPoint());
1309 IRB.SetInsertPoint(InsertPt);
1310
1311 Value *GEP = IRB.CreateGEP(MS.OriginTy, OriginPtr, Index);
1312 IRB.CreateAlignedStore(Origin, GEP, kMinOriginAlignment);
1313 return;
1314 }
1315
1316 unsigned Size = TS.getFixedValue();
1317
1318 unsigned Ofs = 0;
1319 Align CurrentAlignment = Alignment;
1320 if (Alignment >= IntptrAlignment && IntptrSize > kOriginSize) {
1321 Value *IntptrOrigin = originToIntptr(IRB, Origin);
1322 Value *IntptrOriginPtr = IRB.CreatePointerCast(OriginPtr, MS.PtrTy);
1323 for (unsigned i = 0; i < Size / IntptrSize; ++i) {
1324 Value *Ptr = i ? IRB.CreateConstGEP1_32(MS.IntptrTy, IntptrOriginPtr, i)
1325 : IntptrOriginPtr;
1326 IRB.CreateAlignedStore(IntptrOrigin, Ptr, CurrentAlignment);
1327 Ofs += IntptrSize / kOriginSize;
1328 CurrentAlignment = IntptrAlignment;
1329 }
1330 }
1331
1332 for (unsigned i = Ofs; i < (Size + kOriginSize - 1) / kOriginSize; ++i) {
1333 Value *GEP =
1334 i ? IRB.CreateConstGEP1_32(MS.OriginTy, OriginPtr, i) : OriginPtr;
1335 IRB.CreateAlignedStore(Origin, GEP, CurrentAlignment);
1336 CurrentAlignment = kMinOriginAlignment;
1337 }
1338 }
1339
1340 void storeOrigin(IRBuilder<> &IRB, Value *Addr, Value *Shadow, Value *Origin,
1341 Value *OriginPtr, Align Alignment) {
1342 const DataLayout &DL = F.getDataLayout();
1343 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1344 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
1345 // ZExt cannot convert between vector and scalar
1346 Value *ConvertedShadow = convertShadowToScalar(Shadow, IRB);
1347 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1348 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1349 // Origin is not needed: value is initialized or const shadow is
1350 // ignored.
1351 return;
1352 }
1353 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1354 // Copy origin as the value is definitely uninitialized.
1355 paintOrigin(IRB, updateOrigin(Origin, IRB), OriginPtr, StoreSize,
1356 OriginAlignment);
1357 return;
1358 }
1359 // Fallback to runtime check, which still can be optimized out later.
1360 }
1361
1362 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1363 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1364 if (instrumentWithCalls(ConvertedShadow) &&
1365 SizeIndex < kNumberOfAccessSizes && !MS.CompileKernel) {
1366 FunctionCallee Fn = MS.MaybeStoreOriginFn[SizeIndex];
1367 Value *ConvertedShadow2 =
1368 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1369 CallBase *CB = IRB.CreateCall(Fn, {ConvertedShadow2, Addr, Origin});
1370 CB->addParamAttr(0, Attribute::ZExt);
1371 CB->addParamAttr(2, Attribute::ZExt);
1372 } else {
1373 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1374 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1375 Cmp, &*IRB.GetInsertPoint(), false, MS.OriginStoreWeights);
1376 IRBuilder<> IRBNew(CheckTerm);
1377 paintOrigin(IRBNew, updateOrigin(Origin, IRBNew), OriginPtr, StoreSize,
1378 OriginAlignment);
1379 }
1380 }
1381
1382 void materializeStores() {
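    // For example (sketch only; see getShadowOriginPtr for the address
    // computation), a plain "store i32 %v, ptr %p" is instrumented roughly as
    //   %sp = <shadow address of %p>
    //   store i32 <shadow of %v>, ptr %sp
    //   store i32 %v, ptr %p
    // plus an origin store via storeOrigin() when origin tracking is enabled.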
1383 for (StoreInst *SI : StoreList) {
1384 IRBuilder<> IRB(SI);
1385 Value *Val = SI->getValueOperand();
1386 Value *Addr = SI->getPointerOperand();
1387 Value *Shadow = SI->isAtomic() ? getCleanShadow(Val) : getShadow(Val);
1388 Value *ShadowPtr, *OriginPtr;
1389 Type *ShadowTy = Shadow->getType();
1390 const Align Alignment = SI->getAlign();
1391 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
1392 std::tie(ShadowPtr, OriginPtr) =
1393 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ true);
1394
1395 [[maybe_unused]] StoreInst *NewSI =
1396 IRB.CreateAlignedStore(Shadow, ShadowPtr, Alignment);
1397 LLVM_DEBUG(dbgs() << " STORE: " << *NewSI << "\n");
1398
1399 if (SI->isAtomic())
1400 SI->setOrdering(addReleaseOrdering(SI->getOrdering()));
1401
1402 if (MS.TrackOrigins && !SI->isAtomic())
1403 storeOrigin(IRB, Addr, Shadow, getOrigin(Val), OriginPtr,
1404 OriginAlignment);
1405 }
1406 }
1407
1408 // Returns true if Debug Location corresponds to multiple warnings.
1409 bool shouldDisambiguateWarningLocation(const DebugLoc &DebugLoc) {
1410 if (MS.TrackOrigins < 2)
1411 return false;
1412
1413 if (LazyWarningDebugLocationCount.empty())
1414 for (const auto &I : InstrumentationList)
1415 ++LazyWarningDebugLocationCount[I.OrigIns->getDebugLoc()];
1416
1417 return LazyWarningDebugLocationCount[DebugLoc] >= ClDisambiguateWarning;
1418 }
1419
1420 /// Helper function to insert a warning at IRB's current insert point.
1421 void insertWarningFn(IRBuilder<> &IRB, Value *Origin) {
1422 if (!Origin)
1423 Origin = (Value *)IRB.getInt32(0);
1424 assert(Origin->getType()->isIntegerTy());
1425
1426 if (shouldDisambiguateWarningLocation(IRB.getCurrentDebugLocation())) {
1427 // Try to create additional origin with debug info of the last origin
1428 // instruction. It may provide additional information to the user.
1429 if (Instruction *OI = dyn_cast_or_null<Instruction>(Origin)) {
1430 assert(MS.TrackOrigins);
1431 auto NewDebugLoc = OI->getDebugLoc();
1432 // Origin update with missing or the same debug location provides no
1433 // additional value.
1434 if (NewDebugLoc && NewDebugLoc != IRB.getCurrentDebugLocation()) {
1435 // Insert update just before the check, so we call runtime only just
1436 // before the report.
1437 IRBuilder<> IRBOrigin(&*IRB.GetInsertPoint());
1438 IRBOrigin.SetCurrentDebugLocation(NewDebugLoc);
1439 Origin = updateOrigin(Origin, IRBOrigin);
1440 }
1441 }
1442 }
1443
1444 if (MS.CompileKernel || MS.TrackOrigins)
1445 IRB.CreateCall(MS.WarningFn, Origin)->setCannotMerge();
1446 else
1447 IRB.CreateCall(MS.WarningFn)->setCannotMerge();
1448 // FIXME: Insert UnreachableInst if !MS.Recover?
1449 // This may invalidate some of the following checks and needs to be done
1450 // at the very end.
1451 }
1452
1453 void materializeOneCheck(IRBuilder<> &IRB, Value *ConvertedShadow,
1454 Value *Origin) {
1455 const DataLayout &DL = F.getDataLayout();
1456 TypeSize TypeSizeInBits = DL.getTypeSizeInBits(ConvertedShadow->getType());
1457 unsigned SizeIndex = TypeSizeToSizeIndex(TypeSizeInBits);
1458 if (instrumentWithCalls(ConvertedShadow) && !MS.CompileKernel) {
1459 // ZExt cannot convert between vector and scalar
1460 ConvertedShadow = convertShadowToScalar(ConvertedShadow, IRB);
1461 Value *ConvertedShadow2 =
1462 IRB.CreateZExt(ConvertedShadow, IRB.getIntNTy(8 * (1 << SizeIndex)));
1463
1464 if (SizeIndex < kNumberOfAccessSizes) {
1465 FunctionCallee Fn = MS.MaybeWarningFn[SizeIndex];
1466 CallBase *CB = IRB.CreateCall(
1467 Fn,
1468 {ConvertedShadow2,
1469 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1470 CB->addParamAttr(0, Attribute::ZExt);
1471 CB->addParamAttr(1, Attribute::ZExt);
1472 } else {
1473 FunctionCallee Fn = MS.MaybeWarningVarSizeFn;
1474 Value *ShadowAlloca = IRB.CreateAlloca(ConvertedShadow2->getType(), 0u);
1475 IRB.CreateStore(ConvertedShadow2, ShadowAlloca);
1476 unsigned ShadowSize = DL.getTypeAllocSize(ConvertedShadow2->getType());
1477 CallBase *CB = IRB.CreateCall(
1478 Fn,
1479 {ShadowAlloca, ConstantInt::get(IRB.getInt64Ty(), ShadowSize),
1480 MS.TrackOrigins && Origin ? Origin : (Value *)IRB.getInt32(0)});
1481 CB->addParamAttr(1, Attribute::ZExt);
1482 CB->addParamAttr(2, Attribute::ZExt);
1483 }
1484 } else {
1485 Value *Cmp = convertToBool(ConvertedShadow, IRB, "_mscmp");
1486 Instruction *CheckTerm = SplitBlockAndInsertIfThen(
1487 Cmp, &*IRB.GetInsertPoint(),
1488 /* Unreachable */ !MS.Recover, MS.ColdCallWeights);
1489
1490 IRB.SetInsertPoint(CheckTerm);
1491 insertWarningFn(IRB, Origin);
1492 LLVM_DEBUG(dbgs() << " CHECK: " << *Cmp << "\n");
1493 }
1494 }
1495
1496 void materializeInstructionChecks(
1497 ArrayRef<ShadowOriginAndInsertPoint> InstructionChecks) {
1498 const DataLayout &DL = F.getDataLayout();
1499 // Disable combining in some cases. TrackOrigins checks each shadow to pick
1500 // correct origin.
1501 bool Combine = !MS.TrackOrigins;
1502 Instruction *Instruction = InstructionChecks.front().OrigIns;
1503 Value *Shadow = nullptr;
1504 for (const auto &ShadowData : InstructionChecks) {
1505 assert(ShadowData.OrigIns == Instruction);
1506 IRBuilder<> IRB(Instruction);
1507
1508 Value *ConvertedShadow = ShadowData.Shadow;
1509
1510 if (auto *ConstantShadow = dyn_cast<Constant>(ConvertedShadow)) {
1511 if (!ClCheckConstantShadow || ConstantShadow->isZeroValue()) {
1512 // Skip, value is initialized or const shadow is ignored.
1513 continue;
1514 }
1515 if (llvm::isKnownNonZero(ConvertedShadow, DL)) {
1516 // Report as the value is definitely uninitialized.
1517 insertWarningFn(IRB, ShadowData.Origin);
1518 if (!MS.Recover)
1519 return; // Always fail and stop here, no need to check the rest.
1520 // Skip the entire instruction.
1521 continue;
1522 }
1523 // Fallback to runtime check, which still can be optimized out later.
1524 }
1525
1526 if (!Combine) {
1527 materializeOneCheck(IRB, ConvertedShadow, ShadowData.Origin);
1528 continue;
1529 }
1530
1531 if (!Shadow) {
1532 Shadow = ConvertedShadow;
1533 continue;
1534 }
1535
1536 Shadow = convertToBool(Shadow, IRB, "_mscmp");
1537 ConvertedShadow = convertToBool(ConvertedShadow, IRB, "_mscmp");
1538 Shadow = IRB.CreateOr(Shadow, ConvertedShadow, "_msor");
1539 }
1540
1541 if (Shadow) {
1542 assert(Combine);
1543 IRBuilder<> IRB(Instruction);
1544 materializeOneCheck(IRB, Shadow, nullptr);
1545 }
1546 }
1547
1548 static bool isAArch64SVCount(Type *Ty) {
1549 if (TargetExtType *TTy = dyn_cast<TargetExtType>(Ty))
1550 return TTy->getName() == "aarch64.svcount";
1551 return false;
1552 }
1553
1554 // This is intended to match the "AArch64 Predicate-as-Counter Type" (aka
1555 // 'target("aarch64.svcount")'), but not e.g., <vscale x 4 x i32>.
1556 static bool isScalableNonVectorType(Type *Ty) {
1557 if (!isAArch64SVCount(Ty))
1558 LLVM_DEBUG(dbgs() << "isScalableNonVectorType: Unexpected type " << *Ty
1559 << "\n");
1560
1561 return Ty->isScalableTy() && !isa<VectorType>(Ty);
1562 }
1563
1564 void materializeChecks() {
1565#ifndef NDEBUG
1566 // For assert below.
1567 SmallPtrSet<Instruction *, 16> Done;
1568#endif
1569
1570 for (auto I = InstrumentationList.begin();
1571 I != InstrumentationList.end();) {
1572 auto OrigIns = I->OrigIns;
1573 // Checks are grouped by the original instruction. All checks queued via
1574 // `insertCheckShadow` for an instruction are materialized at once.
1575 assert(Done.insert(OrigIns).second);
1576 auto J = std::find_if(I + 1, InstrumentationList.end(),
1577 [OrigIns](const ShadowOriginAndInsertPoint &R) {
1578 return OrigIns != R.OrigIns;
1579 });
1580 // Process all checks of instruction at once.
1581 materializeInstructionChecks(ArrayRef<ShadowOriginAndInsertPoint>(I, J));
1582 I = J;
1583 }
1584
1585 LLVM_DEBUG(dbgs() << "DONE:\n" << F);
1586 }
1587
1588 // Sets up the KMSAN prologue: loads the per-task context state and caches pointers to its fields.
1589 void insertKmsanPrologue(IRBuilder<> &IRB) {
1590 Value *ContextState = IRB.CreateCall(MS.MsanGetContextStateFn, {});
1591 Constant *Zero = IRB.getInt32(0);
1592 MS.ParamTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1593 {Zero, IRB.getInt32(0)}, "param_shadow");
1594 MS.RetvalTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1595 {Zero, IRB.getInt32(1)}, "retval_shadow");
1596 MS.VAArgTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1597 {Zero, IRB.getInt32(2)}, "va_arg_shadow");
1598 MS.VAArgOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1599 {Zero, IRB.getInt32(3)}, "va_arg_origin");
1600 MS.VAArgOverflowSizeTLS =
1601 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1602 {Zero, IRB.getInt32(4)}, "va_arg_overflow_size");
1603 MS.ParamOriginTLS = IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1604 {Zero, IRB.getInt32(5)}, "param_origin");
1605 MS.RetvalOriginTLS =
1606 IRB.CreateGEP(MS.MsanContextStateTy, ContextState,
1607 {Zero, IRB.getInt32(6)}, "retval_origin");
1608 if (MS.TargetTriple.getArch() == Triple::systemz)
1609 MS.MsanMetadataAlloca = IRB.CreateAlloca(MS.MsanMetadata, 0u);
1610 }
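  // After this prologue runs, every MS.*TLS member points into the context
  // state structure returned by the MsanGetContextStateFn call above, so the
  // rest of the instrumentation can address parameter/retval/va_arg shadow
  // and origins in KMSAN mode with the same TLS-slot logic as userspace MSan.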
1611
1612 /// Add MemorySanitizer instrumentation to a function.
1613 bool runOnFunction() {
1614 // Iterate all BBs in depth-first order and create shadow instructions
1615 // for all instructions (where applicable).
1616 // For PHI nodes we create dummy shadow PHIs which will be finalized later.
1617 for (BasicBlock *BB : depth_first(FnPrologueEnd->getParent()))
1618 visit(*BB);
1619
1620 // `visit` above only collects instructions. Process them after iterating
1621 // the CFG, to avoid placing requirements on CFG transformations.
1622 for (Instruction *I : Instructions)
1623 InstVisitor<MemorySanitizerVisitor>::visit(*I);
1624
1625 // Finalize PHI nodes.
1626 for (PHINode *PN : ShadowPHINodes) {
1627 PHINode *PNS = cast<PHINode>(getShadow(PN));
1628 PHINode *PNO = MS.TrackOrigins ? cast<PHINode>(getOrigin(PN)) : nullptr;
1629 size_t NumValues = PN->getNumIncomingValues();
1630 for (size_t v = 0; v < NumValues; v++) {
1631 PNS->addIncoming(getShadow(PN, v), PN->getIncomingBlock(v));
1632 if (PNO)
1633 PNO->addIncoming(getOrigin(PN, v), PN->getIncomingBlock(v));
1634 }
1635 }
1636
1637 VAHelper->finalizeInstrumentation();
1638
1639 // Poison llvm.lifetime.start intrinsics, if we haven't fallen back to
1640 // instrumenting only allocas.
1641 if (InstrumentLifetimeStart) {
1642 for (auto Item : LifetimeStartList) {
1643 instrumentAlloca(*Item.second, Item.first);
1644 AllocaSet.remove(Item.second);
1645 }
1646 }
1647 // Poison the allocas for which we didn't instrument the corresponding
1648 // lifetime intrinsics.
1649 for (AllocaInst *AI : AllocaSet)
1650 instrumentAlloca(*AI);
1651
1652 // Insert shadow value checks.
1653 materializeChecks();
1654
1655 // Delayed instrumentation of StoreInst.
1656 // This may not add new address checks.
1657 materializeStores();
1658
1659 return true;
1660 }
1661
1662 /// Compute the shadow type that corresponds to a given Value.
1663 Type *getShadowTy(Value *V) { return getShadowTy(V->getType()); }
1664
1665 /// Compute the shadow type that corresponds to a given Type.
1666 Type *getShadowTy(Type *OrigTy) {
1667 if (!OrigTy->isSized()) {
1668 return nullptr;
1669 }
1670 // For integer type, shadow is the same as the original type.
1671 // This may return weird-sized types like i1.
1672 if (IntegerType *IT = dyn_cast<IntegerType>(OrigTy))
1673 return IT;
1674 const DataLayout &DL = F.getDataLayout();
1675 if (VectorType *VT = dyn_cast<VectorType>(OrigTy)) {
1676 uint32_t EltSize = DL.getTypeSizeInBits(VT->getElementType());
1677 return VectorType::get(IntegerType::get(*MS.C, EltSize),
1678 VT->getElementCount());
1679 }
1680 if (ArrayType *AT = dyn_cast<ArrayType>(OrigTy)) {
1681 return ArrayType::get(getShadowTy(AT->getElementType()),
1682 AT->getNumElements());
1683 }
1684 if (StructType *ST = dyn_cast<StructType>(OrigTy)) {
1685 SmallVector<Type *, 4> Elements;
1686 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
1687 Elements.push_back(getShadowTy(ST->getElementType(i)));
1688 StructType *Res = StructType::get(*MS.C, Elements, ST->isPacked());
1689 LLVM_DEBUG(dbgs() << "getShadowTy: " << *ST << " ===> " << *Res << "\n");
1690 return Res;
1691 }
1692 if (isScalableNonVectorType(OrigTy)) {
1693 LLVM_DEBUG(dbgs() << "getShadowTy: Scalable non-vector type: " << *OrigTy
1694 << "\n");
1695 return OrigTy;
1696 }
1697
1698 uint32_t TypeSize = DL.getTypeSizeInBits(OrigTy);
1699 return IntegerType::get(*MS.C, TypeSize);
1700 }
1701
1702 /// Extract combined shadow of struct elements as a bool
1703 Value *collapseStructShadow(StructType *Struct, Value *Shadow,
1704 IRBuilder<> &IRB) {
1705 Value *FalseVal = IRB.getIntN(/* width */ 1, /* value */ 0);
1706 Value *Aggregator = FalseVal;
1707
1708 for (unsigned Idx = 0; Idx < Struct->getNumElements(); Idx++) {
1709 // Combine by ORing together each element's bool shadow
1710 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1711 Value *ShadowBool = convertToBool(ShadowItem, IRB);
1712
1713 if (Aggregator != FalseVal)
1714 Aggregator = IRB.CreateOr(Aggregator, ShadowBool);
1715 else
1716 Aggregator = ShadowBool;
1717 }
1718
1719 return Aggregator;
1720 }
1721
1722 // Extract combined shadow of array elements
1723 Value *collapseArrayShadow(ArrayType *Array, Value *Shadow,
1724 IRBuilder<> &IRB) {
1725 if (!Array->getNumElements())
1726 return IRB.getIntN(/* width */ 1, /* value */ 0);
1727
1728 Value *FirstItem = IRB.CreateExtractValue(Shadow, 0);
1729 Value *Aggregator = convertShadowToScalar(FirstItem, IRB);
1730
1731 for (unsigned Idx = 1; Idx < Array->getNumElements(); Idx++) {
1732 Value *ShadowItem = IRB.CreateExtractValue(Shadow, Idx);
1733 Value *ShadowInner = convertShadowToScalar(ShadowItem, IRB);
1734 Aggregator = IRB.CreateOr(Aggregator, ShadowInner);
1735 }
1736 return Aggregator;
1737 }
1738
1739 /// Convert a shadow value to its flattened variant. The resulting
1740 /// shadow may not necessarily have the same bit width as the input
1741 /// value, but it will always be comparable to zero.
1742 Value *convertShadowToScalar(Value *V, IRBuilder<> &IRB) {
1743 if (StructType *Struct = dyn_cast<StructType>(V->getType()))
1744 return collapseStructShadow(Struct, V, IRB);
1745 if (ArrayType *Array = dyn_cast<ArrayType>(V->getType()))
1746 return collapseArrayShadow(Array, V, IRB);
1747 if (isa<VectorType>(V->getType())) {
1748 if (isa<ScalableVectorType>(V->getType()))
1749 return convertShadowToScalar(IRB.CreateOrReduce(V), IRB);
1750 unsigned BitWidth =
1751 V->getType()->getPrimitiveSizeInBits().getFixedValue();
1752 return IRB.CreateBitCast(V, IntegerType::get(*MS.C, BitWidth));
1753 }
1754 return V;
1755 }
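  // For example, a <4 x i8> shadow is bitcast to a single i32, a scalable
  // vector is OR-reduced first, and a struct shadow is collapsed element by
  // element into an i1; in every case the result is zero iff the original
  // shadow was entirely clean.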
1756
1757 // Convert a scalar value to an i1 by comparing with 0
1758 Value *convertToBool(Value *V, IRBuilder<> &IRB, const Twine &name = "") {
1759 Type *VTy = V->getType();
1760 if (!VTy->isIntegerTy())
1761 return convertToBool(convertShadowToScalar(V, IRB), IRB, name);
1762 if (VTy->getIntegerBitWidth() == 1)
1763 // Just converting a bool to a bool, so do nothing.
1764 return V;
1765 return IRB.CreateICmpNE(V, ConstantInt::get(VTy, 0), name);
1766 }
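  // E.g. an i32 shadow of 0x0000ff00 becomes (icmp ne i32 0x0000ff00, 0),
  // i.e. true ("some bit is uninitialized"), while a zero shadow of any
  // width collapses to false.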
1767
1768 Type *ptrToIntPtrType(Type *PtrTy) const {
1769 if (VectorType *VectTy = dyn_cast<VectorType>(PtrTy)) {
1770 return VectorType::get(ptrToIntPtrType(VectTy->getElementType()),
1771 VectTy->getElementCount());
1772 }
1773 assert(PtrTy->isIntOrPtrTy());
1774 return MS.IntptrTy;
1775 }
1776
1777 Type *getPtrToShadowPtrType(Type *IntPtrTy, Type *ShadowTy) const {
1778 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1779 return VectorType::get(
1780 getPtrToShadowPtrType(VectTy->getElementType(), ShadowTy),
1781 VectTy->getElementCount());
1782 }
1783 assert(IntPtrTy == MS.IntptrTy);
1784 return MS.PtrTy;
1785 }
1786
1787 Constant *constToIntPtr(Type *IntPtrTy, uint64_t C) const {
1788 if (VectorType *VectTy = dyn_cast<VectorType>(IntPtrTy)) {
1789 return ConstantVector::getSplat(
1790 VectTy->getElementCount(),
1791 constToIntPtr(VectTy->getElementType(), C));
1792 }
1793 assert(IntPtrTy == MS.IntptrTy);
1794 return ConstantInt::get(MS.IntptrTy, C);
1795 }
1796
1797 /// Returns the integer shadow offset that corresponds to a given
1798 /// application address, whereby:
1799 ///
1800 /// Offset = (Addr & ~AndMask) ^ XorMask
1801 /// Shadow = ShadowBase + Offset
1802 /// Origin = (OriginBase + Offset) & ~Alignment
1803 ///
1804 /// Note: for efficiency, many shadow mappings only use the XorMask
1805 /// and OriginBase; the AndMask and ShadowBase are often zero.
1806 Value *getShadowPtrOffset(Value *Addr, IRBuilder<> &IRB) {
1807 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1808 Value *OffsetLong = IRB.CreatePointerCast(Addr, IntptrTy);
1809
1810 if (uint64_t AndMask = MS.MapParams->AndMask)
1811 OffsetLong = IRB.CreateAnd(OffsetLong, constToIntPtr(IntptrTy, ~AndMask));
1812
1813 if (uint64_t XorMask = MS.MapParams->XorMask)
1814 OffsetLong = IRB.CreateXor(OffsetLong, constToIntPtr(IntptrTy, XorMask));
1815 return OffsetLong;
1816 }
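  // Illustration, assuming a hypothetical mapping with AndMask == 0,
  // XorMask == 0x500000000000 and ShadowBase == 0 (the actual constants come
  // from MS.MapParams and differ per target):
  //   Offset = Addr ^ 0x500000000000
  //   Shadow = Offset
  //   Origin = (OriginBase + Offset) & ~3ULL
  // i.e. the shadow address is obtained by flipping a few high address bits,
  // and no AND/ADD instructions are emitted for the zero parameters.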
1817
1818 /// Compute the shadow and origin addresses corresponding to a given
1819 /// application address.
1820 ///
1821 /// Shadow = ShadowBase + Offset
1822 /// Origin = (OriginBase + Offset) & ~3ULL
1823 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1824 /// a single pointee.
1825 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1826 std::pair<Value *, Value *>
1827 getShadowOriginPtrUserspace(Value *Addr, IRBuilder<> &IRB, Type *ShadowTy,
1828 MaybeAlign Alignment) {
1829 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1830 if (!VectTy) {
1831 assert(Addr->getType()->isPointerTy());
1832 } else {
1833 assert(VectTy->getElementType()->isPointerTy());
1834 }
1835 Type *IntptrTy = ptrToIntPtrType(Addr->getType());
1836 Value *ShadowOffset = getShadowPtrOffset(Addr, IRB);
1837 Value *ShadowLong = ShadowOffset;
1838 if (uint64_t ShadowBase = MS.MapParams->ShadowBase) {
1839 ShadowLong =
1840 IRB.CreateAdd(ShadowLong, constToIntPtr(IntptrTy, ShadowBase));
1841 }
1842 Value *ShadowPtr = IRB.CreateIntToPtr(
1843 ShadowLong, getPtrToShadowPtrType(IntptrTy, ShadowTy));
1844
1845 Value *OriginPtr = nullptr;
1846 if (MS.TrackOrigins) {
1847 Value *OriginLong = ShadowOffset;
1848 uint64_t OriginBase = MS.MapParams->OriginBase;
1849 if (OriginBase != 0)
1850 OriginLong =
1851 IRB.CreateAdd(OriginLong, constToIntPtr(IntptrTy, OriginBase));
1852 if (!Alignment || *Alignment < kMinOriginAlignment) {
1853 uint64_t Mask = kMinOriginAlignment.value() - 1;
1854 OriginLong = IRB.CreateAnd(OriginLong, constToIntPtr(IntptrTy, ~Mask));
1855 }
1856 OriginPtr = IRB.CreateIntToPtr(
1857 OriginLong, getPtrToShadowPtrType(IntptrTy, MS.OriginTy));
1858 }
1859 return std::make_pair(ShadowPtr, OriginPtr);
1860 }
1861
1862 template <typename... ArgsTy>
1863 Value *createMetadataCall(IRBuilder<> &IRB, FunctionCallee Callee,
1864 ArgsTy... Args) {
1865 if (MS.TargetTriple.getArch() == Triple::systemz) {
1866 IRB.CreateCall(Callee,
1867 {MS.MsanMetadataAlloca, std::forward<ArgsTy>(Args)...});
1868 return IRB.CreateLoad(MS.MsanMetadata, MS.MsanMetadataAlloca);
1869 }
1870
1871 return IRB.CreateCall(Callee, {std::forward<ArgsTy>(Args)...});
1872 }
1873
1874 std::pair<Value *, Value *> getShadowOriginPtrKernelNoVec(Value *Addr,
1875 IRBuilder<> &IRB,
1876 Type *ShadowTy,
1877 bool isStore) {
1878 Value *ShadowOriginPtrs;
1879 const DataLayout &DL = F.getDataLayout();
1880 TypeSize Size = DL.getTypeStoreSize(ShadowTy);
1881
1882 FunctionCallee Getter = MS.getKmsanShadowOriginAccessFn(isStore, Size);
1883 Value *AddrCast = IRB.CreatePointerCast(Addr, MS.PtrTy);
1884 if (Getter) {
1885 ShadowOriginPtrs = createMetadataCall(IRB, Getter, AddrCast);
1886 } else {
1887 Value *SizeVal = ConstantInt::get(MS.IntptrTy, Size);
1888 ShadowOriginPtrs = createMetadataCall(
1889 IRB,
1890 isStore ? MS.MsanMetadataPtrForStoreN : MS.MsanMetadataPtrForLoadN,
1891 AddrCast, SizeVal);
1892 }
1893 Value *ShadowPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 0);
1894 ShadowPtr = IRB.CreatePointerCast(ShadowPtr, MS.PtrTy);
1895 Value *OriginPtr = IRB.CreateExtractValue(ShadowOriginPtrs, 1);
1896
1897 return std::make_pair(ShadowPtr, OriginPtr);
1898 }
1899
1900 /// Addr can be a ptr or <N x ptr>. In both cases ShadowTy is the shadow type of
1901 /// a single pointee.
1902 /// Returns <shadow_ptr, origin_ptr> or <<N x shadow_ptr>, <N x origin_ptr>>.
1903 std::pair<Value *, Value *> getShadowOriginPtrKernel(Value *Addr,
1904 IRBuilder<> &IRB,
1905 Type *ShadowTy,
1906 bool isStore) {
1907 VectorType *VectTy = dyn_cast<VectorType>(Addr->getType());
1908 if (!VectTy) {
1909 assert(Addr->getType()->isPointerTy());
1910 return getShadowOriginPtrKernelNoVec(Addr, IRB, ShadowTy, isStore);
1911 }
1912
1913 // TODO: Support callbacks with vectors of addresses.
1914 unsigned NumElements = cast<FixedVectorType>(VectTy)->getNumElements();
1915 Value *ShadowPtrs = ConstantInt::getNullValue(
1916 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1917 Value *OriginPtrs = nullptr;
1918 if (MS.TrackOrigins)
1919 OriginPtrs = ConstantInt::getNullValue(
1920 FixedVectorType::get(IRB.getPtrTy(), NumElements));
1921 for (unsigned i = 0; i < NumElements; ++i) {
1922 Value *OneAddr =
1923 IRB.CreateExtractElement(Addr, ConstantInt::get(IRB.getInt32Ty(), i));
1924 auto [ShadowPtr, OriginPtr] =
1925 getShadowOriginPtrKernelNoVec(OneAddr, IRB, ShadowTy, isStore);
1926
1927 ShadowPtrs = IRB.CreateInsertElement(
1928 ShadowPtrs, ShadowPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1929 if (MS.TrackOrigins)
1930 OriginPtrs = IRB.CreateInsertElement(
1931 OriginPtrs, OriginPtr, ConstantInt::get(IRB.getInt32Ty(), i));
1932 }
1933 return {ShadowPtrs, OriginPtrs};
1934 }
1935
1936 std::pair<Value *, Value *> getShadowOriginPtr(Value *Addr, IRBuilder<> &IRB,
1937 Type *ShadowTy,
1938 MaybeAlign Alignment,
1939 bool isStore) {
1940 if (MS.CompileKernel)
1941 return getShadowOriginPtrKernel(Addr, IRB, ShadowTy, isStore);
1942 return getShadowOriginPtrUserspace(Addr, IRB, ShadowTy, Alignment);
1943 }
1944
1945 /// Compute the shadow address for a given function argument.
1946 ///
1947 /// Shadow = ParamTLS+ArgOffset.
1948 Value *getShadowPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1949 return IRB.CreatePtrAdd(MS.ParamTLS,
1950 ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg");
1951 }
1952
1953 /// Compute the origin address for a given function argument.
1954 Value *getOriginPtrForArgument(IRBuilder<> &IRB, int ArgOffset) {
1955 if (!MS.TrackOrigins)
1956 return nullptr;
1957 return IRB.CreatePtrAdd(MS.ParamOriginTLS,
1958 ConstantInt::get(MS.IntptrTy, ArgOffset),
1959 "_msarg_o");
1960 }
1961
1962 /// Compute the shadow address for a retval.
1963 Value *getShadowPtrForRetval(IRBuilder<> &IRB) {
1964 return IRB.CreatePointerCast(MS.RetvalTLS, IRB.getPtrTy(0), "_msret");
1965 }
1966
1967 /// Compute the origin address for a retval.
1968 Value *getOriginPtrForRetval() {
1969 // We keep a single origin for the entire retval. Might be too optimistic.
1970 return MS.RetvalOriginTLS;
1971 }
1972
1973 /// Set SV to be the shadow value for V.
1974 void setShadow(Value *V, Value *SV) {
1975 assert(!ShadowMap.count(V) && "Values may only have one shadow");
1976 ShadowMap[V] = PropagateShadow ? SV : getCleanShadow(V);
1977 }
1978
1979 /// Set Origin to be the origin value for V.
1980 void setOrigin(Value *V, Value *Origin) {
1981 if (!MS.TrackOrigins)
1982 return;
1983 assert(!OriginMap.count(V) && "Values may only have one origin");
1984 LLVM_DEBUG(dbgs() << "ORIGIN: " << *V << " ==> " << *Origin << "\n");
1985 OriginMap[V] = Origin;
1986 }
1987
1988 Constant *getCleanShadow(Type *OrigTy) {
1989 Type *ShadowTy = getShadowTy(OrigTy);
1990 if (!ShadowTy)
1991 return nullptr;
1992 return Constant::getNullValue(ShadowTy);
1993 }
1994
1995 /// Create a clean shadow value for a given value.
1996 ///
1997 /// Clean shadow (all zeroes) means all bits of the value are defined
1998 /// (initialized).
1999 Constant *getCleanShadow(Value *V) { return getCleanShadow(V->getType()); }
2000
2001 /// Create a dirty shadow of a given shadow type.
2002 Constant *getPoisonedShadow(Type *ShadowTy) {
2003 assert(ShadowTy);
2004 if (isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy))
2005 return Constant::getAllOnesValue(ShadowTy);
2006 if (ArrayType *AT = dyn_cast<ArrayType>(ShadowTy)) {
2007 SmallVector<Constant *, 4> Vals(AT->getNumElements(),
2008 getPoisonedShadow(AT->getElementType()));
2009 return ConstantArray::get(AT, Vals);
2010 }
2011 if (StructType *ST = dyn_cast<StructType>(ShadowTy)) {
2012 SmallVector<Constant *, 4> Vals;
2013 for (unsigned i = 0, n = ST->getNumElements(); i < n; i++)
2014 Vals.push_back(getPoisonedShadow(ST->getElementType(i)));
2015 return ConstantStruct::get(ST, Vals);
2016 }
2017 llvm_unreachable("Unexpected shadow type");
2018 }
2019
2020 /// Create a dirty shadow for a given value.
2021 Constant *getPoisonedShadow(Value *V) {
2022 Type *ShadowTy = getShadowTy(V);
2023 if (!ShadowTy)
2024 return nullptr;
2025 return getPoisonedShadow(ShadowTy);
2026 }
2027
2028 /// Create a clean (zero) origin.
2029 Value *getCleanOrigin() { return Constant::getNullValue(MS.OriginTy); }
2030
2031 /// Get the shadow value for a given Value.
2032 ///
2033 /// This function either returns the value set earlier with setShadow,
2034 /// or extracts it from ParamTLS (for function arguments).
2035 Value *getShadow(Value *V) {
2036 if (Instruction *I = dyn_cast<Instruction>(V)) {
2037 if (!PropagateShadow || I->getMetadata(LLVMContext::MD_nosanitize))
2038 return getCleanShadow(V);
2039 // For instructions the shadow is already stored in the map.
2040 Value *Shadow = ShadowMap[V];
2041 if (!Shadow) {
2042 LLVM_DEBUG(dbgs() << "No shadow: " << *V << "\n" << *(I->getParent()));
2043 assert(Shadow && "No shadow for a value");
2044 }
2045 return Shadow;
2046 }
2047 // Handle fully undefined values
2048 // (partially undefined constant vectors are handled later)
2049 if ([[maybe_unused]] UndefValue *U = dyn_cast<UndefValue>(V)) {
2050 Value *AllOnes = (PropagateShadow && PoisonUndef) ? getPoisonedShadow(V)
2051 : getCleanShadow(V);
2052 LLVM_DEBUG(dbgs() << "Undef: " << *U << " ==> " << *AllOnes << "\n");
2053 return AllOnes;
2054 }
2055 if (Argument *A = dyn_cast<Argument>(V)) {
2056 // For arguments we compute the shadow on demand and store it in the map.
2057 Value *&ShadowPtr = ShadowMap[V];
2058 if (ShadowPtr)
2059 return ShadowPtr;
2060 Function *F = A->getParent();
2061 IRBuilder<> EntryIRB(FnPrologueEnd);
2062 unsigned ArgOffset = 0;
2063 const DataLayout &DL = F->getDataLayout();
2064 for (auto &FArg : F->args()) {
2065 if (!FArg.getType()->isSized() || FArg.getType()->isScalableTy()) {
2066 LLVM_DEBUG(dbgs() << (FArg.getType()->isScalableTy()
2067 ? "vscale not fully supported\n"
2068 : "Arg is not sized\n"));
2069 if (A == &FArg) {
2070 ShadowPtr = getCleanShadow(V);
2071 setOrigin(A, getCleanOrigin());
2072 break;
2073 }
2074 continue;
2075 }
2076
2077 unsigned Size = FArg.hasByValAttr()
2078 ? DL.getTypeAllocSize(FArg.getParamByValType())
2079 : DL.getTypeAllocSize(FArg.getType());
2080
2081 if (A == &FArg) {
2082 bool Overflow = ArgOffset + Size > kParamTLSSize;
2083 if (FArg.hasByValAttr()) {
2084 // ByVal pointer itself has clean shadow. We copy the actual
2085 // argument shadow to the underlying memory.
2086 // Figure out maximal valid memcpy alignment.
2087 const Align ArgAlign = DL.getValueOrABITypeAlignment(
2088 FArg.getParamAlign(), FArg.getParamByValType());
2089 Value *CpShadowPtr, *CpOriginPtr;
2090 std::tie(CpShadowPtr, CpOriginPtr) =
2091 getShadowOriginPtr(V, EntryIRB, EntryIRB.getInt8Ty(), ArgAlign,
2092 /*isStore*/ true);
2093 if (!PropagateShadow || Overflow) {
2094 // ParamTLS overflow.
2095 EntryIRB.CreateMemSet(
2096 CpShadowPtr, Constant::getNullValue(EntryIRB.getInt8Ty()),
2097 Size, ArgAlign);
2098 } else {
2099 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2100 const Align CopyAlign = std::min(ArgAlign, kShadowTLSAlignment);
2101 [[maybe_unused]] Value *Cpy = EntryIRB.CreateMemCpy(
2102 CpShadowPtr, CopyAlign, Base, CopyAlign, Size);
2103 LLVM_DEBUG(dbgs() << " ByValCpy: " << *Cpy << "\n");
2104
2105 if (MS.TrackOrigins) {
2106 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2107 // FIXME: OriginSize should be:
2108 // alignTo(V % kMinOriginAlignment + Size, kMinOriginAlignment)
2109 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
2110 EntryIRB.CreateMemCpy(
2111 CpOriginPtr,
2112 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginPtr,
2113 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
2114 OriginSize);
2115 }
2116 }
2117 }
2118
2119 if (!PropagateShadow || Overflow || FArg.hasByValAttr() ||
2120 (MS.EagerChecks && FArg.hasAttribute(Attribute::NoUndef))) {
2121 ShadowPtr = getCleanShadow(V);
2122 setOrigin(A, getCleanOrigin());
2123 } else {
2124 // Shadow over TLS
2125 Value *Base = getShadowPtrForArgument(EntryIRB, ArgOffset);
2126 ShadowPtr = EntryIRB.CreateAlignedLoad(getShadowTy(&FArg), Base,
2127 kShadowTLSAlignment);
2128 if (MS.TrackOrigins) {
2129 Value *OriginPtr = getOriginPtrForArgument(EntryIRB, ArgOffset);
2130 setOrigin(A, EntryIRB.CreateLoad(MS.OriginTy, OriginPtr));
2131 }
2132 }
2133 LLVM_DEBUG(dbgs()
2134 << " ARG: " << FArg << " ==> " << *ShadowPtr << "\n");
2135 break;
2136 }
2137
2138 ArgOffset += alignTo(Size, kShadowTLSAlignment);
2139 }
2140 assert(ShadowPtr && "Could not find shadow for an argument");
2141 return ShadowPtr;
2142 }
2143
2144 // Check for partially-undefined constant vectors
2145 // TODO: scalable vectors (this is hard because we do not have IRBuilder)
2146 if (isa<FixedVectorType>(V->getType()) && isa<Constant>(V) &&
2147 cast<Constant>(V)->containsUndefOrPoisonElement() && PropagateShadow &&
2148 PoisonUndefVectors) {
2149 unsigned NumElems = cast<FixedVectorType>(V->getType())->getNumElements();
2150 SmallVector<Constant *, 32> ShadowVector(NumElems);
2151 for (unsigned i = 0; i != NumElems; ++i) {
2152 Constant *Elem = cast<Constant>(V)->getAggregateElement(i);
2153 ShadowVector[i] = isa<UndefValue>(Elem) ? getPoisonedShadow(Elem)
2154 : getCleanShadow(Elem);
2155 }
2156
2157 Value *ShadowConstant = ConstantVector::get(ShadowVector);
2158 LLVM_DEBUG(dbgs() << "Partial undef constant vector: " << *V << " ==> "
2159 << *ShadowConstant << "\n");
2160
2161 return ShadowConstant;
2162 }
2163
2164 // TODO: partially-undefined constant arrays, structures, and nested types
2165
2166 // For everything else the shadow is zero.
2167 return getCleanShadow(V);
2168 }
2169
2170 /// Get the shadow for the i-th argument of the instruction I.
2171 Value *getShadow(Instruction *I, int i) {
2172 return getShadow(I->getOperand(i));
2173 }
2174
2175 /// Get the origin for a value.
2176 Value *getOrigin(Value *V) {
2177 if (!MS.TrackOrigins)
2178 return nullptr;
2179 if (!PropagateShadow || isa<Constant>(V) || isa<InlineAsm>(V))
2180 return getCleanOrigin();
2181 assert((isa<Instruction>(V) || isa<Argument>(V)) &&
2182 "Unexpected value type in getOrigin()");
2183 if (Instruction *I = dyn_cast<Instruction>(V)) {
2184 if (I->getMetadata(LLVMContext::MD_nosanitize))
2185 return getCleanOrigin();
2186 }
2187 Value *Origin = OriginMap[V];
2188 assert(Origin && "Missing origin");
2189 return Origin;
2190 }
2191
2192 /// Get the origin for the i-th argument of the instruction I.
2193 Value *getOrigin(Instruction *I, int i) {
2194 return getOrigin(I->getOperand(i));
2195 }
2196
2197 /// Remember the place where a shadow check should be inserted.
2198 ///
2199 /// This location will be later instrumented with a check that will print a
2200 /// UMR warning at runtime if the shadow value is not 0.
2201 void insertCheckShadow(Value *Shadow, Value *Origin, Instruction *OrigIns) {
2202 assert(Shadow);
2203 if (!InsertChecks)
2204 return;
2205
2206 if (!DebugCounter::shouldExecute(DebugInsertCheck)) {
2207 LLVM_DEBUG(dbgs() << "Skipping check of " << *Shadow << " before "
2208 << *OrigIns << "\n");
2209 return;
2210 }
2211
2212 Type *ShadowTy = Shadow->getType();
2213 if (isScalableNonVectorType(ShadowTy)) {
2214 LLVM_DEBUG(dbgs() << "Skipping check of scalable non-vector " << *Shadow
2215 << " before " << *OrigIns << "\n");
2216 return;
2217 }
2218#ifndef NDEBUG
2219 assert((isa<IntegerType>(ShadowTy) || isa<VectorType>(ShadowTy) ||
2220 isa<StructType>(ShadowTy) || isa<ArrayType>(ShadowTy)) &&
2221 "Can only insert checks for integer, vector, and aggregate shadow "
2222 "types");
2223#endif
2224 InstrumentationList.push_back(
2225 ShadowOriginAndInsertPoint(Shadow, Origin, OrigIns));
2226 }
2227
2228 /// Get shadow for value, and remember the place where a shadow check should
2229 /// be inserted.
2230 ///
2231 /// This location will be later instrumented with a check that will print a
2232 /// UMR warning at runtime if the value is not fully defined.
2233 void insertCheckShadowOf(Value *Val, Instruction *OrigIns) {
2234 assert(Val);
2235 Value *Shadow, *Origin;
2236 if (ClCheckConstantShadow) {
2237 Shadow = getShadow(Val);
2238 if (!Shadow)
2239 return;
2240 Origin = getOrigin(Val);
2241 } else {
2242 Shadow = dyn_cast_or_null<Instruction>(getShadow(Val));
2243 if (!Shadow)
2244 return;
2245 Origin = dyn_cast_or_null<Instruction>(getOrigin(Val));
2246 }
2247 insertCheckShadow(Shadow, Origin, OrigIns);
2248 }
2249
2250 AtomicOrdering addReleaseOrdering(AtomicOrdering a) {
2251 switch (a) {
2252 case AtomicOrdering::NotAtomic:
2253 return AtomicOrdering::NotAtomic;
2254 case AtomicOrdering::Unordered:
2255 case AtomicOrdering::Monotonic:
2256 case AtomicOrdering::Release:
2257 return AtomicOrdering::Release;
2258 case AtomicOrdering::Acquire:
2259 case AtomicOrdering::AcquireRelease:
2260 return AtomicOrdering::AcquireRelease;
2261 case AtomicOrdering::SequentiallyConsistent:
2262 return AtomicOrdering::SequentiallyConsistent;
2263 }
2264 llvm_unreachable("Unknown ordering");
2265 }
2266
2267 Value *makeAddReleaseOrderingTable(IRBuilder<> &IRB) {
2268 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2269 uint32_t OrderingTable[NumOrderings] = {};
2270
2271 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2272 OrderingTable[(int)AtomicOrderingCABI::release] =
2273 (int)AtomicOrderingCABI::release;
2274 OrderingTable[(int)AtomicOrderingCABI::consume] =
2275 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2276 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2277 (int)AtomicOrderingCABI::acq_rel;
2278 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2279 (int)AtomicOrderingCABI::seq_cst;
2280
2281 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2282 }
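  // The table above is meant to be indexed by a dynamic C-ABI memory-order
  // value (e.g. the ordering argument of an __atomic_* libcall): extracting
  // the element at AtomicOrderingCABI::relaxed yields release, mirroring what
  // addReleaseOrdering() does for the instruction forms.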
2283
2284 AtomicOrdering addAcquireOrdering(AtomicOrdering a) {
2285 switch (a) {
2286 case AtomicOrdering::NotAtomic:
2287 return AtomicOrdering::NotAtomic;
2288 case AtomicOrdering::Unordered:
2289 case AtomicOrdering::Monotonic:
2290 case AtomicOrdering::Acquire:
2291 return AtomicOrdering::Acquire;
2292 case AtomicOrdering::Release:
2293 case AtomicOrdering::AcquireRelease:
2294 return AtomicOrdering::AcquireRelease;
2295 case AtomicOrdering::SequentiallyConsistent:
2296 return AtomicOrdering::SequentiallyConsistent;
2297 }
2298 llvm_unreachable("Unknown ordering");
2299 }
2300
2301 Value *makeAddAcquireOrderingTable(IRBuilder<> &IRB) {
2302 constexpr int NumOrderings = (int)AtomicOrderingCABI::seq_cst + 1;
2303 uint32_t OrderingTable[NumOrderings] = {};
2304
2305 OrderingTable[(int)AtomicOrderingCABI::relaxed] =
2306 OrderingTable[(int)AtomicOrderingCABI::acquire] =
2307 OrderingTable[(int)AtomicOrderingCABI::consume] =
2308 (int)AtomicOrderingCABI::acquire;
2309 OrderingTable[(int)AtomicOrderingCABI::release] =
2310 OrderingTable[(int)AtomicOrderingCABI::acq_rel] =
2311 (int)AtomicOrderingCABI::acq_rel;
2312 OrderingTable[(int)AtomicOrderingCABI::seq_cst] =
2313 (int)AtomicOrderingCABI::seq_cst;
2314
2315 return ConstantDataVector::get(IRB.getContext(), OrderingTable);
2316 }
2317
2318 // ------------------- Visitors.
2319 using InstVisitor<MemorySanitizerVisitor>::visit;
2320 void visit(Instruction &I) {
2321 if (I.getMetadata(LLVMContext::MD_nosanitize))
2322 return;
2323 // Don't want to visit if we're in the prologue
2324 if (isInPrologue(I))
2325 return;
2326 if (!DebugCounter::shouldExecute(DebugInstrumentInstruction)) {
2327 LLVM_DEBUG(dbgs() << "Skipping instruction: " << I << "\n");
2328 // We still need to set the shadow and origin to clean values.
2329 setShadow(&I, getCleanShadow(&I));
2330 setOrigin(&I, getCleanOrigin());
2331 return;
2332 }
2333
2334 Instructions.push_back(&I);
2335 }
2336
2337 /// Instrument LoadInst
2338 ///
2339 /// Loads the corresponding shadow and (optionally) origin.
2340 /// Optionally, checks that the load address is fully defined.
2341 void visitLoadInst(LoadInst &I) {
2342 assert(I.getType()->isSized() && "Load type must have size");
2343 assert(!I.getMetadata(LLVMContext::MD_nosanitize));
2344 NextNodeIRBuilder IRB(&I);
2345 Type *ShadowTy = getShadowTy(&I);
2346 Value *Addr = I.getPointerOperand();
2347 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
2348 const Align Alignment = I.getAlign();
2349 if (PropagateShadow) {
2350 std::tie(ShadowPtr, OriginPtr) =
2351 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
2352 setShadow(&I,
2353 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
2354 } else {
2355 setShadow(&I, getCleanShadow(&I));
2356 }
2357
2358 if (ClCheckAccessAddress)
2359 insertCheckShadowOf(I.getPointerOperand(), &I);
2360
2361 if (I.isAtomic())
2362 I.setOrdering(addAcquireOrdering(I.getOrdering()));
2363
2364 if (MS.TrackOrigins) {
2365 if (PropagateShadow) {
2366 const Align OriginAlignment = std::max(kMinOriginAlignment, Alignment);
2367 setOrigin(
2368 &I, IRB.CreateAlignedLoad(MS.OriginTy, OriginPtr, OriginAlignment));
2369 } else {
2370 setOrigin(&I, getCleanOrigin());
2371 }
2372 }
2373 }
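  // Sketch of the instrumentation for a plain `%x = load i32, ptr %p`
  // (simplified; the exact IR depends on the mapping and flags):
  //   %sp = <shadow pointer for %p>        ; via getShadowOriginPtr()
  //   %_msld = load i32, ptr %sp           ; shadow of %x
  //   %o = load i32, ptr <origin ptr>      ; only with -msan-track-origins
  // plus, optionally, a check that %p itself is fully initialized.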
2374
2375 /// Instrument StoreInst
2376 ///
2377 /// Stores the corresponding shadow and (optionally) origin.
2378 /// Optionally, checks that the store address is fully defined.
2379 void visitStoreInst(StoreInst &I) {
2380 StoreList.push_back(&I);
2381 if (ClCheckAccessAddress)
2382 insertCheckShadowOf(I.getPointerOperand(), &I);
2383 }
2384
2385 void handleCASOrRMW(Instruction &I) {
2386 assert(isa<AtomicRMWInst>(I) || isa<AtomicCmpXchgInst>(I));
2387
2388 IRBuilder<> IRB(&I);
2389 Value *Addr = I.getOperand(0);
2390 Value *Val = I.getOperand(1);
2391 Value *ShadowPtr = getShadowOriginPtr(Addr, IRB, getShadowTy(Val), Align(1),
2392 /*isStore*/ true)
2393 .first;
2394
2395 if (ClCheckAccessAddress)
2396 insertCheckShadowOf(Addr, &I);
2397
2398 // Only test the conditional argument of cmpxchg instruction.
2399 // The other argument can potentially be uninitialized, but we cannot
2400 // detect this situation reliably without possible false positives.
2401 if (isa<AtomicCmpXchgInst>(I))
2402 insertCheckShadowOf(Val, &I);
2403
2404 IRB.CreateStore(getCleanShadow(Val), ShadowPtr);
2405
2406 setShadow(&I, getCleanShadow(&I));
2407 setOrigin(&I, getCleanOrigin());
2408 }
2409
2410 void visitAtomicRMWInst(AtomicRMWInst &I) {
2411 handleCASOrRMW(I);
2412 I.setOrdering(addReleaseOrdering(I.getOrdering()));
2413 }
2414
2415 void visitAtomicCmpXchgInst(AtomicCmpXchgInst &I) {
2416 handleCASOrRMW(I);
2417 I.setSuccessOrdering(addReleaseOrdering(I.getSuccessOrdering()));
2418 }
2419
2420 // Vector manipulation.
2421 void visitExtractElementInst(ExtractElementInst &I) {
2422 insertCheckShadowOf(I.getOperand(1), &I);
2423 IRBuilder<> IRB(&I);
2424 setShadow(&I, IRB.CreateExtractElement(getShadow(&I, 0), I.getOperand(1),
2425 "_msprop"));
2426 setOrigin(&I, getOrigin(&I, 0));
2427 }
2428
2429 void visitInsertElementInst(InsertElementInst &I) {
2430 insertCheckShadowOf(I.getOperand(2), &I);
2431 IRBuilder<> IRB(&I);
2432 auto *Shadow0 = getShadow(&I, 0);
2433 auto *Shadow1 = getShadow(&I, 1);
2434 setShadow(&I, IRB.CreateInsertElement(Shadow0, Shadow1, I.getOperand(2),
2435 "_msprop"));
2436 setOriginForNaryOp(I);
2437 }
2438
2439 void visitShuffleVectorInst(ShuffleVectorInst &I) {
2440 IRBuilder<> IRB(&I);
2441 auto *Shadow0 = getShadow(&I, 0);
2442 auto *Shadow1 = getShadow(&I, 1);
2443 setShadow(&I, IRB.CreateShuffleVector(Shadow0, Shadow1, I.getShuffleMask(),
2444 "_msprop"));
2445 setOriginForNaryOp(I);
2446 }
2447
2448 // Casts.
2449 void visitSExtInst(SExtInst &I) {
2450 IRBuilder<> IRB(&I);
2451 setShadow(&I, IRB.CreateSExt(getShadow(&I, 0), I.getType(), "_msprop"));
2452 setOrigin(&I, getOrigin(&I, 0));
2453 }
2454
2455 void visitZExtInst(ZExtInst &I) {
2456 IRBuilder<> IRB(&I);
2457 setShadow(&I, IRB.CreateZExt(getShadow(&I, 0), I.getType(), "_msprop"));
2458 setOrigin(&I, getOrigin(&I, 0));
2459 }
2460
2461 void visitTruncInst(TruncInst &I) {
2462 IRBuilder<> IRB(&I);
2463 setShadow(&I, IRB.CreateTrunc(getShadow(&I, 0), I.getType(), "_msprop"));
2464 setOrigin(&I, getOrigin(&I, 0));
2465 }
2466
2467 void visitBitCastInst(BitCastInst &I) {
2468 // Special case: if this is the bitcast (there is exactly 1 allowed) between
2469 // a musttail call and a ret, don't instrument. New instructions are not
2470 // allowed after a musttail call.
2471 if (auto *CI = dyn_cast<CallInst>(I.getOperand(0)))
2472 if (CI->isMustTailCall())
2473 return;
2474 IRBuilder<> IRB(&I);
2475 setShadow(&I, IRB.CreateBitCast(getShadow(&I, 0), getShadowTy(&I)));
2476 setOrigin(&I, getOrigin(&I, 0));
2477 }
2478
2479 void visitPtrToIntInst(PtrToIntInst &I) {
2480 IRBuilder<> IRB(&I);
2481 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2482 "_msprop_ptrtoint"));
2483 setOrigin(&I, getOrigin(&I, 0));
2484 }
2485
2486 void visitIntToPtrInst(IntToPtrInst &I) {
2487 IRBuilder<> IRB(&I);
2488 setShadow(&I, IRB.CreateIntCast(getShadow(&I, 0), getShadowTy(&I), false,
2489 "_msprop_inttoptr"));
2490 setOrigin(&I, getOrigin(&I, 0));
2491 }
2492
2493 void visitFPToSIInst(CastInst &I) { handleShadowOr(I); }
2494 void visitFPToUIInst(CastInst &I) { handleShadowOr(I); }
2495 void visitSIToFPInst(CastInst &I) { handleShadowOr(I); }
2496 void visitUIToFPInst(CastInst &I) { handleShadowOr(I); }
2497 void visitFPExtInst(CastInst &I) { handleShadowOr(I); }
2498 void visitFPTruncInst(CastInst &I) { handleShadowOr(I); }
2499
2500 /// Propagate shadow for bitwise AND.
2501 ///
2502 /// This code is exact, i.e. if, for example, a bit in the left argument
2503 /// is defined and 0, then neither the value nor the definedness of the
2504 /// corresponding bit in the other argument affects the resulting shadow.
2505 void visitAnd(BinaryOperator &I) {
2506 IRBuilder<> IRB(&I);
2507 // "And" of 0 and a poisoned value results in unpoisoned value.
2508 // 1&1 => 1; 0&1 => 0; p&1 => p;
2509 // 1&0 => 0; 0&0 => 0; p&0 => 0;
2510 // 1&p => p; 0&p => 0; p&p => p;
2511 // S = (S1 & S2) | (V1 & S2) | (S1 & V2)
2512 Value *S1 = getShadow(&I, 0);
2513 Value *S2 = getShadow(&I, 1);
2514 Value *V1 = I.getOperand(0);
2515 Value *V2 = I.getOperand(1);
2516 if (V1->getType() != S1->getType()) {
2517 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2518 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2519 }
2520 Value *S1S2 = IRB.CreateAnd(S1, S2);
2521 Value *V1S2 = IRB.CreateAnd(V1, S2);
2522 Value *S1V2 = IRB.CreateAnd(S1, V2);
2523 setShadow(&I, IRB.CreateOr({S1S2, V1S2, S1V2}));
2524 setOriginForNaryOp(I);
2525 }
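  // A worked single-byte example of the rule above ('?' marks an undefined
  // bit, shadow bit 1 == poisoned):
  //   V1 = 0b1010????  S1 = 0b00001111
  //   V2 = 0b00001100  S2 = 0b00000000
  //   S1&S2 = 0, V1&S2 = 0, S1&V2 = 0b00001100  =>  S = 0b00001100
  // Only the undefined bits of V1 that are ANDed with a 1 in V2 stay
  // poisoned; the bits masked out by V2's zeros become defined zeros.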
2526
2527 void visitOr(BinaryOperator &I) {
2528 IRBuilder<> IRB(&I);
2529 // "Or" of 1 and a poisoned value results in unpoisoned value:
2530 // 1|1 => 1; 0|1 => 1; p|1 => 1;
2531 // 1|0 => 1; 0|0 => 0; p|0 => p;
2532 // 1|p => 1; 0|p => p; p|p => p;
2533 //
2534 // S = (S1 & S2) | (~V1 & S2) | (S1 & ~V2)
2535 //
2536 // If the "disjoint OR" property is violated, the result is poison, and
2537 // hence the entire shadow is uninitialized:
2538 // S = S | SignExt(V1 & V2 != 0)
2539 Value *S1 = getShadow(&I, 0);
2540 Value *S2 = getShadow(&I, 1);
2541 Value *V1 = I.getOperand(0);
2542 Value *V2 = I.getOperand(1);
2543 if (V1->getType() != S1->getType()) {
2544 V1 = IRB.CreateIntCast(V1, S1->getType(), false);
2545 V2 = IRB.CreateIntCast(V2, S2->getType(), false);
2546 }
2547
2548 Value *NotV1 = IRB.CreateNot(V1);
2549 Value *NotV2 = IRB.CreateNot(V2);
2550
2551 Value *S1S2 = IRB.CreateAnd(S1, S2);
2552 Value *S2NotV1 = IRB.CreateAnd(NotV1, S2);
2553 Value *S1NotV2 = IRB.CreateAnd(S1, NotV2);
2554
2555 Value *S = IRB.CreateOr({S1S2, S2NotV1, S1NotV2});
2556
2557 if (ClPreciseDisjointOr && cast<PossiblyDisjointInst>(&I)->isDisjoint()) {
2558 Value *V1V2 = IRB.CreateAnd(V1, V2);
2559 Value *DisjointOrShadow = IRB.CreateSExt(
2560 IRB.CreateICmpNE(V1V2, getCleanShadow(V1V2)), V1V2->getType());
2561 S = IRB.CreateOr(S, DisjointOrShadow, "_ms_disjoint");
2562 }
2563
2564 setShadow(&I, S);
2565 setOriginForNaryOp(I);
2566 }
2567
2568 /// Default propagation of shadow and/or origin.
2569 ///
2570 /// This class implements the general case of shadow propagation, used in all
2571 /// cases where we don't know and/or don't care about what the operation
2572 /// actually does. It converts all input shadow values to a common type
2573 /// (extending or truncating as necessary), and bitwise OR's them.
2574 ///
2575 /// This is much cheaper than inserting checks (i.e. requiring inputs to be
2576 /// fully initialized), and less prone to false positives.
2577 ///
2578 /// This class also implements the general case of origin propagation. For a
2579 /// Nary operation, result origin is set to the origin of an argument that is
2580 /// not entirely initialized. If there is more than one such argument, the
2581 /// rightmost of them is picked. It does not matter which one is picked if all
2582 /// arguments are initialized.
2583 template <bool CombineShadow> class Combiner {
2584 Value *Shadow = nullptr;
2585 Value *Origin = nullptr;
2586 IRBuilder<> &IRB;
2587 MemorySanitizerVisitor *MSV;
2588
2589 public:
2590 Combiner(MemorySanitizerVisitor *MSV, IRBuilder<> &IRB)
2591 : IRB(IRB), MSV(MSV) {}
2592
2593 /// Add a pair of shadow and origin values to the mix.
2594 Combiner &Add(Value *OpShadow, Value *OpOrigin) {
2595 if (CombineShadow) {
2596 assert(OpShadow);
2597 if (!Shadow)
2598 Shadow = OpShadow;
2599 else {
2600 OpShadow = MSV->CreateShadowCast(IRB, OpShadow, Shadow->getType());
2601 Shadow = IRB.CreateOr(Shadow, OpShadow, "_msprop");
2602 }
2603 }
2604
2605 if (MSV->MS.TrackOrigins) {
2606 assert(OpOrigin);
2607 if (!Origin) {
2608 Origin = OpOrigin;
2609 } else {
2610 Constant *ConstOrigin = dyn_cast<Constant>(OpOrigin);
2611 // No point in adding something that might result in a 0 origin value.
2612 if (!ConstOrigin || !ConstOrigin->isNullValue()) {
2613 Value *Cond = MSV->convertToBool(OpShadow, IRB);
2614 Origin = IRB.CreateSelect(Cond, OpOrigin, Origin);
2615 }
2616 }
2617 }
2618 return *this;
2619 }
2620
2621 /// Add an application value to the mix.
2622 Combiner &Add(Value *V) {
2623 Value *OpShadow = MSV->getShadow(V);
2624 Value *OpOrigin = MSV->MS.TrackOrigins ? MSV->getOrigin(V) : nullptr;
2625 return Add(OpShadow, OpOrigin);
2626 }
2627
2628 /// Set the current combined values as the given instruction's shadow
2629 /// and origin.
2630 void Done(Instruction *I) {
2631 if (CombineShadow) {
2632 assert(Shadow);
2633 Shadow = MSV->CreateShadowCast(IRB, Shadow, MSV->getShadowTy(I));
2634 MSV->setShadow(I, Shadow);
2635 }
2636 if (MSV->MS.TrackOrigins) {
2637 assert(Origin);
2638 MSV->setOrigin(I, Origin);
2639 }
2640 }
2641
2642 /// Store the current combined value at the specified origin
2643 /// location.
2644 void DoneAndStoreOrigin(TypeSize TS, Value *OriginPtr) {
2645 if (MSV->MS.TrackOrigins) {
2646 assert(Origin);
2647 MSV->paintOrigin(IRB, Origin, OriginPtr, TS, kMinOriginAlignment);
2648 }
2649 }
2650 };
2651
2652 using ShadowAndOriginCombiner = Combiner<true>;
2653 using OriginCombiner = Combiner<false>;
2654
2655 /// Propagate origin for arbitrary operation.
2656 void setOriginForNaryOp(Instruction &I) {
2657 if (!MS.TrackOrigins)
2658 return;
2659 IRBuilder<> IRB(&I);
2660 OriginCombiner OC(this, IRB);
2661 for (Use &Op : I.operands())
2662 OC.Add(Op.get());
2663 OC.Done(&I);
2664 }
2665
2666 size_t VectorOrPrimitiveTypeSizeInBits(Type *Ty) {
2667 assert(!(Ty->isVectorTy() && Ty->getScalarType()->isPointerTy()) &&
2668 "Vector of pointers is not a valid shadow type");
2669 return Ty->isVectorTy() ? cast<FixedVectorType>(Ty)->getNumElements() *
2670 Ty->getScalarSizeInBits()
2671 : Ty->getPrimitiveSizeInBits();
2672 }
2673
2674 /// Cast between two shadow types, extending or truncating as
2675 /// necessary.
2676 Value *CreateShadowCast(IRBuilder<> &IRB, Value *V, Type *dstTy,
2677 bool Signed = false) {
2678 Type *srcTy = V->getType();
2679 if (srcTy == dstTy)
2680 return V;
2681 size_t srcSizeInBits = VectorOrPrimitiveTypeSizeInBits(srcTy);
2682 size_t dstSizeInBits = VectorOrPrimitiveTypeSizeInBits(dstTy);
2683 if (srcSizeInBits > 1 && dstSizeInBits == 1)
2684 return IRB.CreateICmpNE(V, getCleanShadow(V));
2685
2686 if (dstTy->isIntegerTy() && srcTy->isIntegerTy())
2687 return IRB.CreateIntCast(V, dstTy, Signed);
2688 if (dstTy->isVectorTy() && srcTy->isVectorTy() &&
2689 cast<VectorType>(dstTy)->getElementCount() ==
2690 cast<VectorType>(srcTy)->getElementCount())
2691 return IRB.CreateIntCast(V, dstTy, Signed);
2692 Value *V1 = IRB.CreateBitCast(V, Type::getIntNTy(*MS.C, srcSizeInBits));
2693 Value *V2 =
2694 IRB.CreateIntCast(V1, Type::getIntNTy(*MS.C, dstSizeInBits), Signed);
2695 return IRB.CreateBitCast(V2, dstTy);
2696 // TODO: handle struct types.
2697 }
2698
2699 /// Cast an application value to the type of its own shadow.
2700 Value *CreateAppToShadowCast(IRBuilder<> &IRB, Value *V) {
2701 Type *ShadowTy = getShadowTy(V);
2702 if (V->getType() == ShadowTy)
2703 return V;
2704 if (V->getType()->isPtrOrPtrVectorTy())
2705 return IRB.CreatePtrToInt(V, ShadowTy);
2706 else
2707 return IRB.CreateBitCast(V, ShadowTy);
2708 }
2709
2710 /// Propagate shadow for arbitrary operation.
2711 void handleShadowOr(Instruction &I) {
2712 IRBuilder<> IRB(&I);
2713 ShadowAndOriginCombiner SC(this, IRB);
2714 for (Use &Op : I.operands())
2715 SC.Add(Op.get());
2716 SC.Done(&I);
2717 }
2718
2719 // Perform a bitwise OR on the horizontal pairs (or other specified grouping)
2720 // of elements.
2721 //
2722 // For example, suppose we have:
2723 // VectorA: <a0, a1, a2, a3, a4, a5>
2724 // VectorB: <b0, b1, b2, b3, b4, b5>
2725 // ReductionFactor: 3
2726 // Shards: 1
2727 // The output would be:
2728 // <a0|a1|a2, a3|a4|a5, b0|b1|b2, b3|b4|b5>
2729 //
2730 // If we have:
2731 // VectorA: <a0, a1, a2, a3, a4, a5, a6, a7>
2732 // VectorB: <b0, b1, b2, b3, b4, b5, b6, b7>
2733 // ReductionFactor: 2
2734 // Shards: 2
2735 // then A and B each have 2 "shards", resulting in the output being
2736 // interleaved:
2737 // <a0|a1, a2|a3, b0|b1, b2|b3, a4|a5, a6|a7, b4|b5, b6|b7>
2738 //
2739 // This is convenient for instrumenting horizontal add/sub.
2740 // For bitwise OR on "vertical" pairs, see maybeHandleSimpleNomemIntrinsic().
2741 Value *horizontalReduce(IntrinsicInst &I, unsigned ReductionFactor,
2742 unsigned Shards, Value *VectorA, Value *VectorB) {
2743 assert(isa<FixedVectorType>(VectorA->getType()));
2744 unsigned NumElems =
2745 cast<FixedVectorType>(VectorA->getType())->getNumElements();
2746
2747 [[maybe_unused]] unsigned TotalNumElems = NumElems;
2748 if (VectorB) {
2749 assert(VectorA->getType() == VectorB->getType());
2750 TotalNumElems *= 2;
2751 }
2752
2753 assert(NumElems % (ReductionFactor * Shards) == 0);
2754
2755 Value *Or = nullptr;
2756
2757 IRBuilder<> IRB(&I);
2758 for (unsigned i = 0; i < ReductionFactor; i++) {
2759 SmallVector<int, 16> Mask;
2760
2761 for (unsigned j = 0; j < Shards; j++) {
2762 unsigned Offset = NumElems / Shards * j;
2763
2764 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2765 Mask.push_back(Offset + X + i);
2766
2767 if (VectorB) {
2768 for (unsigned X = 0; X < NumElems / Shards; X += ReductionFactor)
2769 Mask.push_back(NumElems + Offset + X + i);
2770 }
2771 }
2772
2773 Value *Masked;
2774 if (VectorB)
2775 Masked = IRB.CreateShuffleVector(VectorA, VectorB, Mask);
2776 else
2777 Masked = IRB.CreateShuffleVector(VectorA, Mask);
2778
2779 if (Or)
2780 Or = IRB.CreateOr(Or, Masked);
2781 else
2782 Or = Masked;
2783 }
2784
2785 return Or;
2786 }
2787
2788 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2789 /// fields.
2790 ///
2791 /// e.g., <2 x i32> @llvm.aarch64.neon.saddlp.v2i32.v4i16(<4 x i16>)
2792 /// <16 x i8> @llvm.aarch64.neon.addp.v16i8(<16 x i8>, <16 x i8>)
2793 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards) {
2794 assert(I.arg_size() == 1 || I.arg_size() == 2);
2795
2796 assert(I.getType()->isVectorTy());
2797 assert(I.getArgOperand(0)->getType()->isVectorTy());
2798
2799 [[maybe_unused]] FixedVectorType *ParamType =
2800 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2801 assert((I.arg_size() != 2) ||
2802 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2803 [[maybe_unused]] FixedVectorType *ReturnType =
2804 cast<FixedVectorType>(I.getType());
2805 assert(ParamType->getNumElements() * I.arg_size() ==
2806 2 * ReturnType->getNumElements());
2807
2808 IRBuilder<> IRB(&I);
2809
2810 // Horizontal OR of shadow
2811 Value *FirstArgShadow = getShadow(&I, 0);
2812 Value *SecondArgShadow = nullptr;
2813 if (I.arg_size() == 2)
2814 SecondArgShadow = getShadow(&I, 1);
2815
2816 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2817 FirstArgShadow, SecondArgShadow);
2818
2819 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2820
2821 setShadow(&I, OrShadow);
2822 setOriginForNaryOp(I);
2823 }
2824
2825 /// Propagate shadow for 1- or 2-vector intrinsics that combine adjacent
2826 /// fields, with the parameters reinterpreted to have elements of a specified
2827 /// width. For example:
2828 /// @llvm.x86.ssse3.phadd.w(<1 x i64> [[VAR1]], <1 x i64> [[VAR2]])
2829 /// conceptually operates on
2830 /// (<4 x i16> [[VAR1]], <4 x i16> [[VAR2]])
2831 /// and can be handled with ReinterpretElemWidth == 16.
2832 void handlePairwiseShadowOrIntrinsic(IntrinsicInst &I, unsigned Shards,
2833 int ReinterpretElemWidth) {
2834 assert(I.arg_size() == 1 || I.arg_size() == 2);
2835
2836 assert(I.getType()->isVectorTy());
2837 assert(I.getArgOperand(0)->getType()->isVectorTy());
2838
2839 FixedVectorType *ParamType =
2840 cast<FixedVectorType>(I.getArgOperand(0)->getType());
2841 assert((I.arg_size() != 2) ||
2842 (ParamType == cast<FixedVectorType>(I.getArgOperand(1)->getType())));
2843
2844 [[maybe_unused]] FixedVectorType *ReturnType =
2845 cast<FixedVectorType>(I.getType());
2846 assert(ParamType->getNumElements() * I.arg_size() ==
2847 2 * ReturnType->getNumElements());
2848
2849 IRBuilder<> IRB(&I);
2850
2851 FixedVectorType *ReinterpretShadowTy = nullptr;
2852 assert(isAligned(Align(ReinterpretElemWidth),
2853 ParamType->getPrimitiveSizeInBits()));
2854 ReinterpretShadowTy = FixedVectorType::get(
2855 IRB.getIntNTy(ReinterpretElemWidth),
2856 ParamType->getPrimitiveSizeInBits() / ReinterpretElemWidth);
2857
2858 // Horizontal OR of shadow
2859 Value *FirstArgShadow = getShadow(&I, 0);
2860 FirstArgShadow = IRB.CreateBitCast(FirstArgShadow, ReinterpretShadowTy);
2861
2862 // If we had two parameters each with an odd number of elements, the total
2863 // number of elements is even, but we have never seen this in extant
2864 // instruction sets, so we enforce that each parameter must have an even
2865 // number of elements.
2866 assert(isAligned(
2867 Align(2),
2868 cast<FixedVectorType>(FirstArgShadow->getType())->getNumElements()));
2869
2870 Value *SecondArgShadow = nullptr;
2871 if (I.arg_size() == 2) {
2872 SecondArgShadow = getShadow(&I, 1);
2873 SecondArgShadow = IRB.CreateBitCast(SecondArgShadow, ReinterpretShadowTy);
2874 }
2875
2876 Value *OrShadow = horizontalReduce(I, /*ReductionFactor=*/2, Shards,
2877 FirstArgShadow, SecondArgShadow);
2878
2879 OrShadow = CreateShadowCast(IRB, OrShadow, getShadowTy(&I));
2880
2881 setShadow(&I, OrShadow);
2882 setOriginForNaryOp(I);
2883 }
2884
2885 void visitFNeg(UnaryOperator &I) { handleShadowOr(I); }
2886
2887 // Handle multiplication by constant.
2888 //
2889 // Handle a special case of multiplication by a constant that may have one or
2890 // more zeros in the lower bits. This makes the corresponding number of lower bits
2891 // of the result zero as well. We model it by shifting the other operand
2892 // shadow left by the required number of bits. Effectively, we transform
2893 // (X * (A * 2**B)) to ((X << B) * A) and instrument (X << B) as (Sx << B).
2894 // We use multiplication by 2**N instead of shift to cover the case of
2895 // multiplication by 0, which may occur in some elements of a vector operand.
2896 void handleMulByConstant(BinaryOperator &I, Constant *ConstArg,
2897 Value *OtherArg) {
2898 Constant *ShadowMul;
2899 Type *Ty = ConstArg->getType();
2900 if (auto *VTy = dyn_cast<VectorType>(Ty)) {
2901 unsigned NumElements = cast<FixedVectorType>(VTy)->getNumElements();
2902 Type *EltTy = VTy->getElementType();
2903 SmallVector<Constant *, 16> Elements;
2904 for (unsigned Idx = 0; Idx < NumElements; ++Idx) {
2905 if (ConstantInt *Elt =
2906 dyn_cast<ConstantInt>(ConstArg->getAggregateElement(Idx))) {
2907 const APInt &V = Elt->getValue();
2908 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2909 Elements.push_back(ConstantInt::get(EltTy, V2));
2910 } else {
2911 Elements.push_back(ConstantInt::get(EltTy, 1));
2912 }
2913 }
2914 ShadowMul = ConstantVector::get(Elements);
2915 } else {
2916 if (ConstantInt *Elt = dyn_cast<ConstantInt>(ConstArg)) {
2917 const APInt &V = Elt->getValue();
2918 APInt V2 = APInt(V.getBitWidth(), 1) << V.countr_zero();
2919 ShadowMul = ConstantInt::get(Ty, V2);
2920 } else {
2921 ShadowMul = ConstantInt::get(Ty, 1);
2922 }
2923 }
2924
2925 IRBuilder<> IRB(&I);
2926 setShadow(&I,
2927 IRB.CreateMul(getShadow(OtherArg), ShadowMul, "msprop_mul_cst"));
2928 setOrigin(&I, getOrigin(OtherArg));
2929 }
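  // For example, multiplying by the constant 24 = 3 * 2**3 always clears the
  // three low bits of the product, so the result shadow is modeled as
  // Sx * 8 (i.e. Sx << 3). A constant element equal to 0 produces a shadow
  // multiplier of 0, matching the fact that x * 0 == 0 is fully defined.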
2930
2931 void visitMul(BinaryOperator &I) {
2932 Constant *constOp0 = dyn_cast<Constant>(I.getOperand(0));
2933 Constant *constOp1 = dyn_cast<Constant>(I.getOperand(1));
2934 if (constOp0 && !constOp1)
2935 handleMulByConstant(I, constOp0, I.getOperand(1));
2936 else if (constOp1 && !constOp0)
2937 handleMulByConstant(I, constOp1, I.getOperand(0));
2938 else
2939 handleShadowOr(I);
2940 }
2941
2942 void visitFAdd(BinaryOperator &I) { handleShadowOr(I); }
2943 void visitFSub(BinaryOperator &I) { handleShadowOr(I); }
2944 void visitFMul(BinaryOperator &I) { handleShadowOr(I); }
2945 void visitAdd(BinaryOperator &I) { handleShadowOr(I); }
2946 void visitSub(BinaryOperator &I) { handleShadowOr(I); }
2947 void visitXor(BinaryOperator &I) { handleShadowOr(I); }
2948
2949 void handleIntegerDiv(Instruction &I) {
2950 IRBuilder<> IRB(&I);
2951 // Strict on the second argument.
2952 insertCheckShadowOf(I.getOperand(1), &I);
2953 setShadow(&I, getShadow(&I, 0));
2954 setOrigin(&I, getOrigin(&I, 0));
2955 }
2956
2957 void visitUDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2958 void visitSDiv(BinaryOperator &I) { handleIntegerDiv(I); }
2959 void visitURem(BinaryOperator &I) { handleIntegerDiv(I); }
2960 void visitSRem(BinaryOperator &I) { handleIntegerDiv(I); }
2961
2962 // Floating point division is side-effect free, so we cannot require that the
2963 // divisor is fully initialized; we must propagate shadow instead. See PR37523.
2964 void visitFDiv(BinaryOperator &I) { handleShadowOr(I); }
2965 void visitFRem(BinaryOperator &I) { handleShadowOr(I); }
2966
2967 /// Instrument == and != comparisons.
2968 ///
2969 /// Sometimes the comparison result is known even if some of the bits of the
2970 /// arguments are not.
2971 void handleEqualityComparison(ICmpInst &I) {
2972 IRBuilder<> IRB(&I);
2973 Value *A = I.getOperand(0);
2974 Value *B = I.getOperand(1);
2975 Value *Sa = getShadow(A);
2976 Value *Sb = getShadow(B);
2977
2978 // Get rid of pointers and vectors of pointers.
2979 // For ints (and vectors of ints), types of A and Sa match,
2980 // and this is a no-op.
2981 A = IRB.CreatePointerCast(A, Sa->getType());
2982 B = IRB.CreatePointerCast(B, Sb->getType());
2983
2984 // A == B <==> (C = A^B) == 0
2985 // A != B <==> (C = A^B) != 0
2986 // Sc = Sa | Sb
2987 Value *C = IRB.CreateXor(A, B);
2988 Value *Sc = IRB.CreateOr(Sa, Sb);
2989 // Now dealing with i = (C == 0) comparison (or C != 0, does not matter now)
2990 // Result is defined if one of the following is true
2991 // * there is a defined 1 bit in C
2992 // * C is fully defined
2993 // Si = !(C & ~Sc) && Sc
2994 Value *Zero = Constant::getNullValue(Sc->getType());
2995 Value *MinusOne = Constant::getAllOnesValue(Sc->getType());
2996 Value *LHS = IRB.CreateICmpNE(Sc, Zero);
2997 Value *RHS =
2998 IRB.CreateICmpEQ(IRB.CreateAnd(IRB.CreateXor(Sc, MinusOne), C), Zero);
2999 Value *Si = IRB.CreateAnd(LHS, RHS);
3000 Si->setName("_msprop_icmp");
3001 setShadow(&I, Si);
3002 setOriginForNaryOp(I);
3003 }
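  // Example: A = 0b1??????? (top bit defined and set), B = 0 (fully
  // defined). C = A ^ B has a defined 1 in the top bit, so C & ~Sc != 0 and
  // the shadow computed above is 0: the result of A == B is known to be
  // false even though most bits of A are uninitialized.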
3004
3005 /// Instrument relational comparisons.
3006 ///
3007 /// This function does exact shadow propagation for all relational
3008 /// comparisons of integers, pointers and vectors of those.
3009 /// FIXME: output seems suboptimal when one of the operands is a constant
3010 void handleRelationalComparisonExact(ICmpInst &I) {
3011 IRBuilder<> IRB(&I);
3012 Value *A = I.getOperand(0);
3013 Value *B = I.getOperand(1);
3014 Value *Sa = getShadow(A);
3015 Value *Sb = getShadow(B);
3016
3017 // Get rid of pointers and vectors of pointers.
3018 // For ints (and vectors of ints), types of A and Sa match,
3019 // and this is a no-op.
3020 A = IRB.CreatePointerCast(A, Sa->getType());
3021 B = IRB.CreatePointerCast(B, Sb->getType());
3022
3023 // Let [a0, a1] be the interval of possible values of A, taking into account
3024 // its undefined bits. Let [b0, b1] be the interval of possible values of B.
3025 // Then (A cmp B) is defined iff (a0 cmp b1) == (a1 cmp b0).
3026 bool IsSigned = I.isSigned();
3027
3028 auto GetMinMaxUnsigned = [&](Value *V, Value *S) {
3029 if (IsSigned) {
3030 // Sign-flip to map from signed range to unsigned range. Relation A vs B
3031 // should be preserved, if checked with `getUnsignedPredicate()`.
3032 // The relationship between Amin, Amax, Bmin and Bmax will also not be
3033 // affected, as they are created by effectively adding to / subtracting from
3034 // A (or B) a value derived from the shadow, with no overflow, either
3035 // before or after the sign flip.
3036 APInt MinVal =
3037 APInt::getSignedMinValue(V->getType()->getScalarSizeInBits());
3038 V = IRB.CreateXor(V, ConstantInt::get(V->getType(), MinVal));
3039 }
3040 // Minimize undefined bits.
3041 Value *Min = IRB.CreateAnd(V, IRB.CreateNot(S));
3042 Value *Max = IRB.CreateOr(V, S);
3043 return std::make_pair(Min, Max);
3044 };
3045
3046 auto [Amin, Amax] = GetMinMaxUnsigned(A, Sa);
3047 auto [Bmin, Bmax] = GetMinMaxUnsigned(B, Sb);
3048 Value *S1 = IRB.CreateICmp(I.getUnsignedPredicate(), Amin, Bmax);
3049 Value *S2 = IRB.CreateICmp(I.getUnsignedPredicate(), Amax, Bmin);
3050
3051 Value *Si = IRB.CreateXor(S1, S2);
3052 setShadow(&I, Si);
3053 setOriginForNaryOp(I);
3054 }
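  // Worked example (8-bit unsigned): A = 0b0000??10 with Sa = 0b00001100,
  // B = 32 fully defined. Then Amin = 2, Amax = 14, Bmin = Bmax = 32, and
  // both (Amin < Bmax) and (Amax < Bmin) hold, so S1 == S2 and the result of
  // A < B is fully defined despite the two unknown bits of A.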
3055
3056 /// Instrument signed relational comparisons.
3057 ///
3058 /// Handle sign bit tests: x<0, x>=0, x<=-1, x>-1 by propagating the highest
3059 /// bit of the shadow. Everything else is delegated to handleShadowOr().
3060 void handleSignedRelationalComparison(ICmpInst &I) {
3061 Constant *constOp;
3062 Value *op = nullptr;
3063 CmpInst::Predicate pre;
3064 if ((constOp = dyn_cast<Constant>(I.getOperand(1)))) {
3065 op = I.getOperand(0);
3066 pre = I.getPredicate();
3067 } else if ((constOp = dyn_cast<Constant>(I.getOperand(0)))) {
3068 op = I.getOperand(1);
3069 pre = I.getSwappedPredicate();
3070 } else {
3071 handleShadowOr(I);
3072 return;
3073 }
3074
3075 if ((constOp->isNullValue() &&
3076 (pre == CmpInst::ICMP_SLT || pre == CmpInst::ICMP_SGE)) ||
3077 (constOp->isAllOnesValue() &&
3078 (pre == CmpInst::ICMP_SGT || pre == CmpInst::ICMP_SLE))) {
3079 IRBuilder<> IRB(&I);
3080 Value *Shadow = IRB.CreateICmpSLT(getShadow(op), getCleanShadow(op),
3081 "_msprop_icmp_s");
3082 setShadow(&I, Shadow);
3083 setOrigin(&I, getOrigin(op));
3084 } else {
3085 handleShadowOr(I);
3086 }
3087 }
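  // For example, for (x < 0) only the sign bit of x matters, so the result is
  // uninitialized exactly when the sign bit of the shadow is set; that is
  // precisely what the signed compare of the shadow against the clean shadow
  // computes above.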
3088
3089 void visitICmpInst(ICmpInst &I) {
3090 if (!ClHandleICmp) {
3091 handleShadowOr(I);
3092 return;
3093 }
3094 if (I.isEquality()) {
3095 handleEqualityComparison(I);
3096 return;
3097 }
3098
3099 assert(I.isRelational());
3100 if (ClHandleICmpExact) {
3101 handleRelationalComparisonExact(I);
3102 return;
3103 }
3104 if (I.isSigned()) {
3105 handleSignedRelationalComparison(I);
3106 return;
3107 }
3108
3109 assert(I.isUnsigned());
3110 if ((isa<Constant>(I.getOperand(0)) || isa<Constant>(I.getOperand(1)))) {
3111 handleRelationalComparisonExact(I);
3112 return;
3113 }
3114
3115 handleShadowOr(I);
3116 }
3117
3118 void visitFCmpInst(FCmpInst &I) { handleShadowOr(I); }
3119
3120 void handleShift(BinaryOperator &I) {
3121 IRBuilder<> IRB(&I);
3122 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3123 // Otherwise perform the same shift on S1.
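    // For example, for "shl %x, 3" with a fully initialized shift amount, S2
    // is clean, S2Conv is zero, and the result shadow is simply "shl S1, 3":
    // the undefined bits move together with the data bits, and the bits
    // shifted in are zero (i.e. initialized).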
3124 Value *S1 = getShadow(&I, 0);
3125 Value *S2 = getShadow(&I, 1);
3126 Value *S2Conv =
3127 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3128 Value *V2 = I.getOperand(1);
3129 Value *Shift = IRB.CreateBinOp(I.getOpcode(), S1, V2);
3130 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3131 setOriginForNaryOp(I);
3132 }
3133
3134 void visitShl(BinaryOperator &I) { handleShift(I); }
3135 void visitAShr(BinaryOperator &I) { handleShift(I); }
3136 void visitLShr(BinaryOperator &I) { handleShift(I); }
3137
3138 void handleFunnelShift(IntrinsicInst &I) {
3139 IRBuilder<> IRB(&I);
3140 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3141 // Otherwise perform the same shift on S0 and S1.
3142 Value *S0 = getShadow(&I, 0);
3143 Value *S1 = getShadow(&I, 1);
3144 Value *S2 = getShadow(&I, 2);
3145 Value *S2Conv =
3146 IRB.CreateSExt(IRB.CreateICmpNE(S2, getCleanShadow(S2)), S2->getType());
3147 Value *V2 = I.getOperand(2);
3148 Value *Shift = IRB.CreateIntrinsic(I.getIntrinsicID(), S2Conv->getType(),
3149 {S0, S1, V2});
3150 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3151 setOriginForNaryOp(I);
3152 }
3153
3154 /// Instrument llvm.memmove
3155 ///
3156 /// At this point we don't know if llvm.memmove will be inlined or not.
3157 /// If we don't instrument it and it gets inlined,
3158 /// our interceptor will not kick in and we will lose the memmove.
3159 /// If we instrument the call here, but it does not get inlined,
3160 /// we will memmove the shadow twice, which is bad in the case
3161 /// of overlapping regions. So, we simply lower the intrinsic to a call.
3162 ///
3163 /// Similar situation exists for memcpy and memset.
3164 void visitMemMoveInst(MemMoveInst &I) {
3165 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3166 IRBuilder<> IRB(&I);
3167 IRB.CreateCall(MS.MemmoveFn,
3168 {I.getArgOperand(0), I.getArgOperand(1),
3169 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3170     I.eraseFromParent();
3171   }
3172
3173 /// Instrument memcpy
3174 ///
3175 /// Similar to memmove: avoid copying shadow twice. This is somewhat
3176 /// unfortunate as it may slow down small constant memcpys.
3177 /// FIXME: consider doing manual inline for small constant sizes and proper
3178 /// alignment.
3179 ///
3180 /// Note: This also handles memcpy.inline, which promises no calls to external
3181 /// functions as an optimization. However, with instrumentation enabled this
3182 /// is difficult to promise; additionally, we know that the MSan runtime
3183 /// exists and provides __msan_memcpy(). Therefore, we assume that with
3184 /// instrumentation it's safe to turn memcpy.inline into a call to
3185 /// __msan_memcpy(). Should this be wrong, such as when implementing memcpy()
3186 /// itself, instrumentation should be disabled with the no_sanitize attribute.
3187 void visitMemCpyInst(MemCpyInst &I) {
3188 getShadow(I.getArgOperand(1)); // Ensure shadow initialized
3189 IRBuilder<> IRB(&I);
3190 IRB.CreateCall(MS.MemcpyFn,
3191 {I.getArgOperand(0), I.getArgOperand(1),
3192 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3193     I.eraseFromParent();
3194   }
3195
3196 // Same as memcpy.
3197 void visitMemSetInst(MemSetInst &I) {
3198 IRBuilder<> IRB(&I);
3199 IRB.CreateCall(
3200 MS.MemsetFn,
3201 {I.getArgOperand(0),
3202 IRB.CreateIntCast(I.getArgOperand(1), IRB.getInt32Ty(), false),
3203 IRB.CreateIntCast(I.getArgOperand(2), MS.IntptrTy, false)});
3204     I.eraseFromParent();
3205   }
3206
3207 void visitVAStartInst(VAStartInst &I) { VAHelper->visitVAStartInst(I); }
3208
3209 void visitVACopyInst(VACopyInst &I) { VAHelper->visitVACopyInst(I); }
3210
3211 /// Handle vector store-like intrinsics.
3212 ///
3213 /// Instrument intrinsics that look like a simple SIMD store: writes memory,
3214 /// has 1 pointer argument and 1 vector argument, returns void.
3215 bool handleVectorStoreIntrinsic(IntrinsicInst &I) {
3216 assert(I.arg_size() == 2);
3217
3218 IRBuilder<> IRB(&I);
3219 Value *Addr = I.getArgOperand(0);
3220 Value *Shadow = getShadow(&I, 1);
3221 Value *ShadowPtr, *OriginPtr;
3222
3223 // We don't know the pointer alignment (could be unaligned SSE store!).
3224     // Have to assume the worst case.
3225 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
3226 Addr, IRB, Shadow->getType(), Align(1), /*isStore*/ true);
3227 IRB.CreateAlignedStore(Shadow, ShadowPtr, Align(1));
3228
3229     if (ClCheckAccessAddress)
3230       insertCheckShadowOf(Addr, &I);
3231
3232 // FIXME: factor out common code from materializeStores
3233 if (MS.TrackOrigins)
3234 IRB.CreateStore(getOrigin(&I, 1), OriginPtr);
3235 return true;
3236 }
3237
3238 /// Handle vector load-like intrinsics.
3239 ///
3240 /// Instrument intrinsics that look like a simple SIMD load: reads memory,
3241 /// has 1 pointer argument, returns a vector.
3242 bool handleVectorLoadIntrinsic(IntrinsicInst &I) {
3243 assert(I.arg_size() == 1);
3244
3245 IRBuilder<> IRB(&I);
3246 Value *Addr = I.getArgOperand(0);
3247
3248 Type *ShadowTy = getShadowTy(&I);
3249 Value *ShadowPtr = nullptr, *OriginPtr = nullptr;
3250 if (PropagateShadow) {
3251 // We don't know the pointer alignment (could be unaligned SSE load!).
3252       // Have to assume the worst case.
3253 const Align Alignment = Align(1);
3254 std::tie(ShadowPtr, OriginPtr) =
3255 getShadowOriginPtr(Addr, IRB, ShadowTy, Alignment, /*isStore*/ false);
3256 setShadow(&I,
3257 IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Alignment, "_msld"));
3258 } else {
3259 setShadow(&I, getCleanShadow(&I));
3260 }
3261
3262     if (ClCheckAccessAddress)
3263       insertCheckShadowOf(Addr, &I);
3264
3265 if (MS.TrackOrigins) {
3266 if (PropagateShadow)
3267 setOrigin(&I, IRB.CreateLoad(MS.OriginTy, OriginPtr));
3268 else
3269 setOrigin(&I, getCleanOrigin());
3270 }
3271 return true;
3272 }
3273
3274 /// Handle (SIMD arithmetic)-like intrinsics.
3275 ///
3276 /// Instrument intrinsics with any number of arguments of the same type [*],
3277 /// equal to the return type, plus a specified number of trailing flags of
3278 /// any type.
3279 ///
3280 /// [*] The type should be simple (no aggregates or pointers; vectors are
3281 /// fine).
3282 ///
3283 /// Caller guarantees that this intrinsic does not access memory.
3284 ///
3285   /// TODO: "horizontal"/"pairwise" intrinsics are often incorrectly matched
3286   /// by this handler. See horizontalReduce().
3287 ///
3288 /// TODO: permutation intrinsics are also often incorrectly matched.
3289 [[maybe_unused]] bool
3290 maybeHandleSimpleNomemIntrinsic(IntrinsicInst &I,
3291 unsigned int trailingFlags) {
3292 Type *RetTy = I.getType();
3293 if (!(RetTy->isIntOrIntVectorTy() || RetTy->isFPOrFPVectorTy()))
3294 return false;
3295
3296 unsigned NumArgOperands = I.arg_size();
3297 assert(NumArgOperands >= trailingFlags);
3298 for (unsigned i = 0; i < NumArgOperands - trailingFlags; ++i) {
3299 Type *Ty = I.getArgOperand(i)->getType();
3300 if (Ty != RetTy)
3301 return false;
3302 }
3303
3304 IRBuilder<> IRB(&I);
3305 ShadowAndOriginCombiner SC(this, IRB);
3306 for (unsigned i = 0; i < NumArgOperands; ++i)
3307 SC.Add(I.getArgOperand(i));
3308 SC.Done(&I);
3309
3310 return true;
3311 }
3312
3313 /// Returns whether it was able to heuristically instrument unknown
3314 /// intrinsics.
3315 ///
3316 /// The main purpose of this code is to do something reasonable with all
3317 /// random intrinsics we might encounter, most importantly - SIMD intrinsics.
3318 /// We recognize several classes of intrinsics by their argument types and
3319 /// ModRefBehaviour and apply special instrumentation when we are reasonably
3320 /// sure that we know what the intrinsic does.
3321 ///
3322 /// We special-case intrinsics where this approach fails. See llvm.bswap
3323 /// handling as an example of that.
3324 bool maybeHandleUnknownIntrinsicUnlogged(IntrinsicInst &I) {
3325 unsigned NumArgOperands = I.arg_size();
3326 if (NumArgOperands == 0)
3327 return false;
3328
3329 if (NumArgOperands == 2 && I.getArgOperand(0)->getType()->isPointerTy() &&
3330 I.getArgOperand(1)->getType()->isVectorTy() &&
3331 I.getType()->isVoidTy() && !I.onlyReadsMemory()) {
3332 // This looks like a vector store.
3333 return handleVectorStoreIntrinsic(I);
3334 }
3335
3336 if (NumArgOperands == 1 && I.getArgOperand(0)->getType()->isPointerTy() &&
3337 I.getType()->isVectorTy() && I.onlyReadsMemory()) {
3338 // This looks like a vector load.
3339 return handleVectorLoadIntrinsic(I);
3340 }
3341
3342 if (I.doesNotAccessMemory())
3343 if (maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/0))
3344 return true;
3345
3346 // FIXME: detect and handle SSE maskstore/maskload?
3347 // Some cases are now handled in handleAVXMasked{Load,Store}.
3348 return false;
3349 }
3350
3351 bool maybeHandleUnknownIntrinsic(IntrinsicInst &I) {
3352 if (maybeHandleUnknownIntrinsicUnlogged(I)) {
3353       if (ClDumpStrictIntrinsics)
3354         dumpInst(I);
3355
3356 LLVM_DEBUG(dbgs() << "UNKNOWN INSTRUCTION HANDLED HEURISTICALLY: " << I
3357 << "\n");
3358 return true;
3359 } else
3360 return false;
3361 }
3362
3363 void handleInvariantGroup(IntrinsicInst &I) {
3364 setShadow(&I, getShadow(&I, 0));
3365 setOrigin(&I, getOrigin(&I, 0));
3366 }
3367
3368 void handleLifetimeStart(IntrinsicInst &I) {
3369 if (!PoisonStack)
3370 return;
3371 AllocaInst *AI = dyn_cast<AllocaInst>(I.getArgOperand(0));
3372 if (AI)
3373 LifetimeStartList.push_back(std::make_pair(&I, AI));
3374 }
3375
3376 void handleBswap(IntrinsicInst &I) {
3377 IRBuilder<> IRB(&I);
3378 Value *Op = I.getArgOperand(0);
3379 Type *OpType = Op->getType();
3380 setShadow(&I, IRB.CreateIntrinsic(Intrinsic::bswap, ArrayRef(&OpType, 1),
3381 getShadow(Op)));
3382 setOrigin(&I, getOrigin(Op));
3383 }
3384
3385 // Uninitialized bits are ok if they appear after the leading/trailing 0's
3386   // and a 1. If the input is all zero, the result is fully initialized iff
3387 // !is_zero_poison.
3388 //
3389 // e.g., for ctlz, with little-endian, if 0/1 are initialized bits with
3390 // concrete value 0/1, and ? is an uninitialized bit:
3391 // - 0001 0??? is fully initialized
3392 // - 000? ???? is fully uninitialized (*)
3393 // - ???? ???? is fully uninitialized
3394 // - 0000 0000 is fully uninitialized if is_zero_poison,
3395 // fully initialized otherwise
3396 //
3397 // (*) TODO: arguably, since the number of zeros is in the range [3, 8], we
3398 // only need to poison 4 bits.
3399 //
3400 // OutputShadow =
3401 // ((ConcreteZerosCount >= ShadowZerosCount) && !AllZeroShadow)
3402 // || (is_zero_poison && AllZeroSrc)
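  //
  // Working through the first example above for ctlz on an i8 value
  // 0001 0??? (SrcShadow = 0000 0111):
  //   ConcreteZerosCount = ctlz(Src)       = 3
  //   ShadowZerosCount   = ctlz(SrcShadow) = 5
  // (3 >= 5) is false, so the output is clean: the undefined bits cannot
  // change the number of leading zeros. For 000? ???? (SrcShadow = 0001 1111),
  // ShadowZerosCount = 3 and ConcreteZerosCount >= 3 always holds, so the
  // output is poisoned.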
3403 void handleCountLeadingTrailingZeros(IntrinsicInst &I) {
3404 IRBuilder<> IRB(&I);
3405 Value *Src = I.getArgOperand(0);
3406 Value *SrcShadow = getShadow(Src);
3407
3408 Value *False = IRB.getInt1(false);
3409 Value *ConcreteZerosCount = IRB.CreateIntrinsic(
3410 I.getType(), I.getIntrinsicID(), {Src, /*is_zero_poison=*/False});
3411 Value *ShadowZerosCount = IRB.CreateIntrinsic(
3412 I.getType(), I.getIntrinsicID(), {SrcShadow, /*is_zero_poison=*/False});
3413
3414 Value *CompareConcreteZeros = IRB.CreateICmpUGE(
3415 ConcreteZerosCount, ShadowZerosCount, "_mscz_cmp_zeros");
3416
3417 Value *NotAllZeroShadow =
3418 IRB.CreateIsNotNull(SrcShadow, "_mscz_shadow_not_null");
3419 Value *OutputShadow =
3420 IRB.CreateAnd(CompareConcreteZeros, NotAllZeroShadow, "_mscz_main");
3421
3422 // If zero poison is requested, mix in with the shadow
3423 Constant *IsZeroPoison = cast<Constant>(I.getOperand(1));
3424 if (!IsZeroPoison->isZeroValue()) {
3425 Value *BoolZeroPoison = IRB.CreateIsNull(Src, "_mscz_bzp");
3426 OutputShadow = IRB.CreateOr(OutputShadow, BoolZeroPoison, "_mscz_bs");
3427 }
3428
3429 OutputShadow = IRB.CreateSExt(OutputShadow, getShadowTy(Src), "_mscz_os");
3430
3431 setShadow(&I, OutputShadow);
3432 setOriginForNaryOp(I);
3433 }
3434
3435 /// Handle Arm NEON vector convert intrinsics.
3436 ///
3437 /// e.g., <4 x i32> @llvm.aarch64.neon.fcvtpu.v4i32.v4f32(<4 x float>)
3438 /// i32 @llvm.aarch64.neon.fcvtms.i32.f64(double)
3439 ///
3440 /// For x86 SSE vector convert intrinsics, see
3441 /// handleSSEVectorConvertIntrinsic().
3442 void handleNEONVectorConvertIntrinsic(IntrinsicInst &I) {
3443 assert(I.arg_size() == 1);
3444
3445 IRBuilder<> IRB(&I);
3446 Value *S0 = getShadow(&I, 0);
3447
3448 /// For scalars:
3449 /// Since they are converting from floating-point to integer, the output is
3450 /// - fully uninitialized if *any* bit of the input is uninitialized
3451     /// - fully initialized if all bits of the input are initialized
3452 /// We apply the same principle on a per-field basis for vectors.
3453 Value *OutShadow = IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)),
3454 getShadowTy(&I));
3455 setShadow(&I, OutShadow);
3456 setOriginForNaryOp(I);
3457 }
3458
3459 /// Some instructions have additional zero-elements in the return type
3460 /// e.g., <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512(<8 x i64>, ...)
3461 ///
3462   /// This function will return a vector type with the same number of elements
3463   /// as the input, but the same per-element width as the return value, e.g.,
3464   /// <8 x i8>.
3465 FixedVectorType *maybeShrinkVectorShadowType(Value *Src, IntrinsicInst &I) {
3466 assert(isa<FixedVectorType>(getShadowTy(&I)));
3467 FixedVectorType *ShadowType = cast<FixedVectorType>(getShadowTy(&I));
3468
3469 // TODO: generalize beyond 2x?
3470 if (ShadowType->getElementCount() ==
3471 cast<VectorType>(Src->getType())->getElementCount() * 2)
3472 ShadowType = FixedVectorType::getHalfElementsVectorType(ShadowType);
3473
3474 assert(ShadowType->getElementCount() ==
3475 cast<VectorType>(Src->getType())->getElementCount());
3476
3477 return ShadowType;
3478 }
3479
3480 /// Doubles the length of a vector shadow (extending with zeros) if necessary
3481 /// to match the length of the shadow for the instruction.
3482 /// If scalar types of the vectors are different, it will use the type of the
3483 /// input vector.
3484 /// This is more type-safe than CreateShadowCast().
3485 Value *maybeExtendVectorShadowWithZeros(Value *Shadow, IntrinsicInst &I) {
3486 IRBuilder<> IRB(&I);
3487     assert(isa<FixedVectorType>(Shadow->getType()));
3488     assert(isa<FixedVectorType>(I.getType()));
3489
3490 Value *FullShadow = getCleanShadow(&I);
3491 unsigned ShadowNumElems =
3492 cast<FixedVectorType>(Shadow->getType())->getNumElements();
3493 unsigned FullShadowNumElems =
3494 cast<FixedVectorType>(FullShadow->getType())->getNumElements();
3495
3496 assert((ShadowNumElems == FullShadowNumElems) ||
3497 (ShadowNumElems * 2 == FullShadowNumElems));
3498
3499 if (ShadowNumElems == FullShadowNumElems) {
3500 FullShadow = Shadow;
3501 } else {
3502 // TODO: generalize beyond 2x?
3503 SmallVector<int, 32> ShadowMask(FullShadowNumElems);
3504 std::iota(ShadowMask.begin(), ShadowMask.end(), 0);
3505
3506 // Append zeros
3507 FullShadow =
3508 IRB.CreateShuffleVector(Shadow, getCleanShadow(Shadow), ShadowMask);
3509 }
3510
3511 return FullShadow;
3512 }
3513
3514 /// Handle x86 SSE vector conversion.
3515 ///
3516 /// e.g., single-precision to half-precision conversion:
3517 /// <8 x i16> @llvm.x86.vcvtps2ph.256(<8 x float> %a0, i32 0)
3518 /// <8 x i16> @llvm.x86.vcvtps2ph.128(<4 x float> %a0, i32 0)
3519 ///
3520 /// floating-point to integer:
3521 /// <4 x i32> @llvm.x86.sse2.cvtps2dq(<4 x float>)
3522 /// <4 x i32> @llvm.x86.sse2.cvtpd2dq(<2 x double>)
3523 ///
3524 /// Note: if the output has more elements, they are zero-initialized (and
3525 /// therefore the shadow will also be initialized).
3526 ///
3527 /// This differs from handleSSEVectorConvertIntrinsic() because it
3528 /// propagates uninitialized shadow (instead of checking the shadow).
3529 void handleSSEVectorConvertIntrinsicByProp(IntrinsicInst &I,
3530 bool HasRoundingMode) {
3531 if (HasRoundingMode) {
3532 assert(I.arg_size() == 2);
3533 [[maybe_unused]] Value *RoundingMode = I.getArgOperand(1);
3534 assert(RoundingMode->getType()->isIntegerTy());
3535 } else {
3536 assert(I.arg_size() == 1);
3537 }
3538
3539 Value *Src = I.getArgOperand(0);
3540 assert(Src->getType()->isVectorTy());
3541
3542 // The return type might have more elements than the input.
3543 // Temporarily shrink the return type's number of elements.
3544 VectorType *ShadowType = maybeShrinkVectorShadowType(Src, I);
3545
3546 IRBuilder<> IRB(&I);
3547 Value *S0 = getShadow(&I, 0);
3548
3549 /// For scalars:
3550 /// Since they are converting to and/or from floating-point, the output is:
3551 /// - fully uninitialized if *any* bit of the input is uninitialized
3552     /// - fully initialized if all bits of the input are initialized
3553 /// We apply the same principle on a per-field basis for vectors.
3554 Value *Shadow =
3555 IRB.CreateSExt(IRB.CreateICmpNE(S0, getCleanShadow(S0)), ShadowType);
3556
3557 // The return type might have more elements than the input.
3558 // Extend the return type back to its original width if necessary.
3559 Value *FullShadow = maybeExtendVectorShadowWithZeros(Shadow, I);
3560
3561 setShadow(&I, FullShadow);
3562 setOriginForNaryOp(I);
3563 }
3564
3565 // Instrument x86 SSE vector convert intrinsic.
3566 //
3567 // This function instruments intrinsics like cvtsi2ss:
3568 // %Out = int_xxx_cvtyyy(%ConvertOp)
3569 // or
3570 // %Out = int_xxx_cvtyyy(%CopyOp, %ConvertOp)
3571 // Intrinsic converts \p NumUsedElements elements of \p ConvertOp to the same
3572   // number of \p Out elements, and (if it has 2 arguments) copies the rest of the
3573 // elements from \p CopyOp.
3574   // In most cases the conversion involves a floating-point value which may trigger a
3575 // hardware exception when not fully initialized. For this reason we require
3576 // \p ConvertOp[0:NumUsedElements] to be fully initialized and trap otherwise.
3577 // We copy the shadow of \p CopyOp[NumUsedElements:] to \p
3578 // Out[NumUsedElements:]. This means that intrinsics without \p CopyOp always
3579 // return a fully initialized value.
3580 //
3581 // For Arm NEON vector convert intrinsics, see
3582 // handleNEONVectorConvertIntrinsic().
3583 void handleSSEVectorConvertIntrinsic(IntrinsicInst &I, int NumUsedElements,
3584 bool HasRoundingMode = false) {
3585 IRBuilder<> IRB(&I);
3586 Value *CopyOp, *ConvertOp;
3587
3588 assert((!HasRoundingMode ||
3589 isa<ConstantInt>(I.getArgOperand(I.arg_size() - 1))) &&
3590 "Invalid rounding mode");
3591
3592 switch (I.arg_size() - HasRoundingMode) {
3593 case 2:
3594 CopyOp = I.getArgOperand(0);
3595 ConvertOp = I.getArgOperand(1);
3596 break;
3597 case 1:
3598 ConvertOp = I.getArgOperand(0);
3599 CopyOp = nullptr;
3600 break;
3601 default:
3602 llvm_unreachable("Cvt intrinsic with unsupported number of arguments.");
3603 }
3604
3605 // The first *NumUsedElements* elements of ConvertOp are converted to the
3606 // same number of output elements. The rest of the output is copied from
3607 // CopyOp, or (if not available) filled with zeroes.
3608 // Combine shadow for elements of ConvertOp that are used in this operation,
3609 // and insert a check.
3610 // FIXME: consider propagating shadow of ConvertOp, at least in the case of
3611 // int->any conversion.
3612 Value *ConvertShadow = getShadow(ConvertOp);
3613 Value *AggShadow = nullptr;
3614 if (ConvertOp->getType()->isVectorTy()) {
3615 AggShadow = IRB.CreateExtractElement(
3616 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), 0));
3617 for (int i = 1; i < NumUsedElements; ++i) {
3618 Value *MoreShadow = IRB.CreateExtractElement(
3619 ConvertShadow, ConstantInt::get(IRB.getInt32Ty(), i));
3620 AggShadow = IRB.CreateOr(AggShadow, MoreShadow);
3621 }
3622 } else {
3623 AggShadow = ConvertShadow;
3624 }
3625 assert(AggShadow->getType()->isIntegerTy());
3626 insertCheckShadow(AggShadow, getOrigin(ConvertOp), &I);
3627
3628 // Build result shadow by zero-filling parts of CopyOp shadow that come from
3629 // ConvertOp.
3630 if (CopyOp) {
3631 assert(CopyOp->getType() == I.getType());
3632 assert(CopyOp->getType()->isVectorTy());
3633 Value *ResultShadow = getShadow(CopyOp);
3634 Type *EltTy = cast<VectorType>(ResultShadow->getType())->getElementType();
3635 for (int i = 0; i < NumUsedElements; ++i) {
3636 ResultShadow = IRB.CreateInsertElement(
3637 ResultShadow, ConstantInt::getNullValue(EltTy),
3638 ConstantInt::get(IRB.getInt32Ty(), i));
3639 }
3640 setShadow(&I, ResultShadow);
3641 setOrigin(&I, getOrigin(CopyOp));
3642 } else {
3643 setShadow(&I, getCleanShadow(&I));
3644 setOrigin(&I, getCleanOrigin());
3645 }
3646 }
3647
3648 // Given a scalar or vector, extract lower 64 bits (or less), and return all
3649 // zeroes if it is zero, and all ones otherwise.
3650 Value *Lower64ShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3651 if (S->getType()->isVectorTy())
3652 S = CreateShadowCast(IRB, S, IRB.getInt64Ty(), /* Signed */ true);
3653 assert(S->getType()->getPrimitiveSizeInBits() <= 64);
3654 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3655 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3656 }
3657
3658 // Given a vector, extract its first element, and return all
3659 // zeroes if it is zero, and all ones otherwise.
3660 Value *LowerElementShadowExtend(IRBuilder<> &IRB, Value *S, Type *T) {
3661 Value *S1 = IRB.CreateExtractElement(S, (uint64_t)0);
3662 Value *S2 = IRB.CreateICmpNE(S1, getCleanShadow(S1));
3663 return CreateShadowCast(IRB, S2, T, /* Signed */ true);
3664 }
3665
3666 Value *VariableShadowExtend(IRBuilder<> &IRB, Value *S) {
3667 Type *T = S->getType();
3668 assert(T->isVectorTy());
3669 Value *S2 = IRB.CreateICmpNE(S, getCleanShadow(S));
3670 return IRB.CreateSExt(S2, T);
3671 }
3672
3673 // Instrument vector shift intrinsic.
3674 //
3675 // This function instruments intrinsics like int_x86_avx2_psll_w.
3676 // Intrinsic shifts %In by %ShiftSize bits.
3677 // %ShiftSize may be a vector. In that case the lower 64 bits determine shift
3678 // size, and the rest is ignored. Behavior is defined even if shift size is
3679 // greater than register (or field) width.
3680 void handleVectorShiftIntrinsic(IntrinsicInst &I, bool Variable) {
3681 assert(I.arg_size() == 2);
3682 IRBuilder<> IRB(&I);
3683 // If any of the S2 bits are poisoned, the whole thing is poisoned.
3684 // Otherwise perform the same shift on S1.
3685 Value *S1 = getShadow(&I, 0);
3686 Value *S2 = getShadow(&I, 1);
3687 Value *S2Conv = Variable ? VariableShadowExtend(IRB, S2)
3688 : Lower64ShadowExtend(IRB, S2, getShadowTy(&I));
3689 Value *V1 = I.getOperand(0);
3690 Value *V2 = I.getOperand(1);
3691 Value *Shift = IRB.CreateCall(I.getFunctionType(), I.getCalledOperand(),
3692 {IRB.CreateBitCast(S1, V1->getType()), V2});
3693 Shift = IRB.CreateBitCast(Shift, getShadowTy(&I));
3694 setShadow(&I, IRB.CreateOr(Shift, S2Conv));
3695 setOriginForNaryOp(I);
3696 }
3697
3698   // Get an MMX-sized (64-bit) vector type, or, optionally, a vector of another
3699   // total size.
3700 Type *getMMXVectorTy(unsigned EltSizeInBits,
3701 unsigned X86_MMXSizeInBits = 64) {
3702 assert(EltSizeInBits != 0 && (X86_MMXSizeInBits % EltSizeInBits) == 0 &&
3703 "Illegal MMX vector element size");
3704 return FixedVectorType::get(IntegerType::get(*MS.C, EltSizeInBits),
3705 X86_MMXSizeInBits / EltSizeInBits);
3706 }
3707
3708 // Returns a signed counterpart for an (un)signed-saturate-and-pack
3709 // intrinsic.
3710 Intrinsic::ID getSignedPackIntrinsic(Intrinsic::ID id) {
3711 switch (id) {
3712 case Intrinsic::x86_sse2_packsswb_128:
3713 case Intrinsic::x86_sse2_packuswb_128:
3714 return Intrinsic::x86_sse2_packsswb_128;
3715
3716 case Intrinsic::x86_sse2_packssdw_128:
3717 case Intrinsic::x86_sse41_packusdw:
3718 return Intrinsic::x86_sse2_packssdw_128;
3719
3720 case Intrinsic::x86_avx2_packsswb:
3721 case Intrinsic::x86_avx2_packuswb:
3722 return Intrinsic::x86_avx2_packsswb;
3723
3724 case Intrinsic::x86_avx2_packssdw:
3725 case Intrinsic::x86_avx2_packusdw:
3726 return Intrinsic::x86_avx2_packssdw;
3727
3728 case Intrinsic::x86_mmx_packsswb:
3729 case Intrinsic::x86_mmx_packuswb:
3730 return Intrinsic::x86_mmx_packsswb;
3731
3732 case Intrinsic::x86_mmx_packssdw:
3733 return Intrinsic::x86_mmx_packssdw;
3734
3735 case Intrinsic::x86_avx512_packssdw_512:
3736 case Intrinsic::x86_avx512_packusdw_512:
3737 return Intrinsic::x86_avx512_packssdw_512;
3738
3739 case Intrinsic::x86_avx512_packsswb_512:
3740 case Intrinsic::x86_avx512_packuswb_512:
3741 return Intrinsic::x86_avx512_packsswb_512;
3742
3743 default:
3744 llvm_unreachable("unexpected intrinsic id");
3745 }
3746 }
3747
3748 // Instrument vector pack intrinsic.
3749 //
3750 // This function instruments intrinsics like x86_mmx_packsswb, that
3751 // packs elements of 2 input vectors into half as many bits with saturation.
3752 // Shadow is propagated with the signed variant of the same intrinsic applied
3753 // to sext(Sa != zeroinitializer), sext(Sb != zeroinitializer).
3754 // MMXEltSizeInBits is used only for x86mmx arguments.
3755 //
3756 // TODO: consider using GetMinMaxUnsigned() to handle saturation precisely
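  //
  // For example, for <16 x i8> @llvm.x86.sse2.packsswb.128(<8 x i16>, <8 x i16>),
  // each input element's shadow is first widened to 0x0000 or 0xFFFF; the
  // signed pack then saturates these to 0x00 or 0xFF, so each output byte is
  // fully poisoned iff its source element had any poisoned bit.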
3757 void handleVectorPackIntrinsic(IntrinsicInst &I,
3758 unsigned MMXEltSizeInBits = 0) {
3759 assert(I.arg_size() == 2);
3760 IRBuilder<> IRB(&I);
3761 Value *S1 = getShadow(&I, 0);
3762 Value *S2 = getShadow(&I, 1);
3763 assert(S1->getType()->isVectorTy());
3764
3765 // SExt and ICmpNE below must apply to individual elements of input vectors.
3766 // In case of x86mmx arguments, cast them to appropriate vector types and
3767 // back.
3768 Type *T =
3769 MMXEltSizeInBits ? getMMXVectorTy(MMXEltSizeInBits) : S1->getType();
3770 if (MMXEltSizeInBits) {
3771 S1 = IRB.CreateBitCast(S1, T);
3772 S2 = IRB.CreateBitCast(S2, T);
3773 }
3774     Value *S1_ext =
3775         IRB.CreateSExt(IRB.CreateICmpNE(S1, Constant::getNullValue(T)), T);
3776     Value *S2_ext =
3777         IRB.CreateSExt(IRB.CreateICmpNE(S2, Constant::getNullValue(T)), T);
3778 if (MMXEltSizeInBits) {
3779 S1_ext = IRB.CreateBitCast(S1_ext, getMMXVectorTy(64));
3780 S2_ext = IRB.CreateBitCast(S2_ext, getMMXVectorTy(64));
3781 }
3782
3783 Value *S = IRB.CreateIntrinsic(getSignedPackIntrinsic(I.getIntrinsicID()),
3784 {S1_ext, S2_ext}, /*FMFSource=*/nullptr,
3785 "_msprop_vector_pack");
3786 if (MMXEltSizeInBits)
3787 S = IRB.CreateBitCast(S, getShadowTy(&I));
3788 setShadow(&I, S);
3789 setOriginForNaryOp(I);
3790 }
3791
3792 // Convert `Mask` into `<n x i1>`.
3793 Constant *createDppMask(unsigned Width, unsigned Mask) {
3794     SmallVector<Constant *, 4> R(Width);
3795     for (auto &M : R) {
3796 M = ConstantInt::getBool(F.getContext(), Mask & 1);
3797 Mask >>= 1;
3798 }
3799 return ConstantVector::get(R);
3800 }
3801
3802 // Calculate output shadow as array of booleans `<n x i1>`, assuming if any
3803 // arg is poisoned, entire dot product is poisoned.
3804 Value *findDppPoisonedOutput(IRBuilder<> &IRB, Value *S, unsigned SrcMask,
3805 unsigned DstMask) {
3806 const unsigned Width =
3807 cast<FixedVectorType>(S->getType())->getNumElements();
3808
3809     S = IRB.CreateSelect(createDppMask(Width, SrcMask), S,
3810                          Constant::getNullValue(S->getType()));
3811     Value *SElem = IRB.CreateOrReduce(S);
3812 Value *IsClean = IRB.CreateIsNull(SElem, "_msdpp");
3813 Value *DstMaskV = createDppMask(Width, DstMask);
3814
3815 return IRB.CreateSelect(
3816 IsClean, Constant::getNullValue(DstMaskV->getType()), DstMaskV);
3817 }
3818
3819 // See `Intel Intrinsics Guide` for `_dp_p*` instructions.
3820 //
3821   // The 2- and 4-element versions produce a single scalar dot product and then
3822   // put it into the elements of the output vector selected by the 4 lowest bits
3823   // of the mask. The top 4 bits of the mask control which input elements to use
3824   // for the dot product.
3825 //
3826   // The 8-element version's mask still has only 4 bits for the input and 4 bits
3827   // for the output mask. According to the spec, it simply operates as the
3828   // 4-element version on the first 4 elements of the inputs and output, and then
3829   // on the last 4 elements of the inputs and output.
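  //
  // For example, for a <4 x float> dot product with mask 0x71: SrcMask = 0x7
  // selects input elements 0..2 for the dot product, and DstMask = 0x1 writes
  // the result only to output element 0. If any shadow bit of elements 0..2
  // of either operand is poisoned, output element 0 becomes poisoned; the
  // remaining output elements are zeroed by the instruction and therefore
  // stay initialized.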
3830 void handleDppIntrinsic(IntrinsicInst &I) {
3831 IRBuilder<> IRB(&I);
3832
3833 Value *S0 = getShadow(&I, 0);
3834 Value *S1 = getShadow(&I, 1);
3835 Value *S = IRB.CreateOr(S0, S1);
3836
3837 const unsigned Width =
3838 cast<FixedVectorType>(S->getType())->getNumElements();
3839 assert(Width == 2 || Width == 4 || Width == 8);
3840
3841 const unsigned Mask = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
3842 const unsigned SrcMask = Mask >> 4;
3843 const unsigned DstMask = Mask & 0xf;
3844
3845 // Calculate shadow as `<n x i1>`.
3846 Value *SI1 = findDppPoisonedOutput(IRB, S, SrcMask, DstMask);
3847 if (Width == 8) {
3848       // The first 4 elements of the shadow are already calculated.
3849       // findDppPoisonedOutput() operates on 32-bit masks, so we can shift the masks and repeat.
3850 SI1 = IRB.CreateOr(
3851 SI1, findDppPoisonedOutput(IRB, S, SrcMask << 4, DstMask << 4));
3852 }
3853 // Extend to real size of shadow, poisoning either all or none bits of an
3854 // element.
3855 S = IRB.CreateSExt(SI1, S->getType(), "_msdpp");
3856
3857 setShadow(&I, S);
3858 setOriginForNaryOp(I);
3859 }
3860
3861 Value *convertBlendvToSelectMask(IRBuilder<> &IRB, Value *C) {
3862 C = CreateAppToShadowCast(IRB, C);
3863 FixedVectorType *FVT = cast<FixedVectorType>(C->getType());
3864 unsigned ElSize = FVT->getElementType()->getPrimitiveSizeInBits();
3865 C = IRB.CreateAShr(C, ElSize - 1);
3866 FVT = FixedVectorType::get(IRB.getInt1Ty(), FVT->getNumElements());
3867 return IRB.CreateTrunc(C, FVT);
3868 }
3869
3870 // `blendv(f, t, c)` is effectively `select(c[top_bit], t, f)`.
3871 void handleBlendvIntrinsic(IntrinsicInst &I) {
3872 Value *C = I.getOperand(2);
3873 Value *T = I.getOperand(1);
3874 Value *F = I.getOperand(0);
3875
3876 Value *Sc = getShadow(&I, 2);
3877 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
3878
3879 {
3880 IRBuilder<> IRB(&I);
3881 // Extract top bit from condition and its shadow.
3882 C = convertBlendvToSelectMask(IRB, C);
3883 Sc = convertBlendvToSelectMask(IRB, Sc);
3884
3885 setShadow(C, Sc);
3886 setOrigin(C, Oc);
3887 }
3888
3889 handleSelectLikeInst(I, C, T, F);
3890 }
3891
3892 // Instrument sum-of-absolute-differences intrinsic.
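  //
  // For example, for @llvm.x86.sse2.psad.bw each 64-bit result element is a
  // sum of eight absolute byte differences and therefore fits in 16 bits; the
  // upper 48 bits of the result are always zero, so the LShr below clears
  // their shadow.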
3893 void handleVectorSadIntrinsic(IntrinsicInst &I, bool IsMMX = false) {
3894 const unsigned SignificantBitsPerResultElement = 16;
3895 Type *ResTy = IsMMX ? IntegerType::get(*MS.C, 64) : I.getType();
3896 unsigned ZeroBitsPerResultElement =
3897 ResTy->getScalarSizeInBits() - SignificantBitsPerResultElement;
3898
3899 IRBuilder<> IRB(&I);
3900 auto *Shadow0 = getShadow(&I, 0);
3901 auto *Shadow1 = getShadow(&I, 1);
3902 Value *S = IRB.CreateOr(Shadow0, Shadow1);
3903 S = IRB.CreateBitCast(S, ResTy);
3904 S = IRB.CreateSExt(IRB.CreateICmpNE(S, Constant::getNullValue(ResTy)),
3905 ResTy);
3906 S = IRB.CreateLShr(S, ZeroBitsPerResultElement);
3907 S = IRB.CreateBitCast(S, getShadowTy(&I));
3908 setShadow(&I, S);
3909 setOriginForNaryOp(I);
3910 }
3911
3912 // Instrument multiply-add(-accumulate)? intrinsics.
3913 //
3914 // e.g., Two operands:
3915 // <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b)
3916 //
3917 // Two operands which require an EltSizeInBits override:
3918 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64> %a, <1 x i64> %b)
3919 //
3920 // Three operands:
3921 // <4 x i32> @llvm.x86.avx512.vpdpbusd.128
3922 // (<4 x i32> %s, <16 x i8> %a, <16 x i8> %b)
3923 // (this is equivalent to multiply-add on %a and %b, followed by
3924 // adding/"accumulating" %s. "Accumulation" stores the result in one
3925 // of the source registers, but this accumulate vs. add distinction
3926 // is lost when dealing with LLVM intrinsics.)
3927 //
3928 // ZeroPurifies means that multiplying a known-zero with an uninitialized
3929 // value results in an initialized value. This is applicable for integer
3930 // multiplication, but not floating-point (counter-example: NaN).
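  //
  // For example, for <4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16> %a, <8 x i16> %b),
  // out[0] = a[0]*b[0] + a[1]*b[1] (ReductionFactor == 2). With ZeroPurifies,
  // if a[1] is a fully initialized zero, the product a[1]*b[1] is treated as
  // initialized even if b[1] is poisoned, so the shadow of out[0] depends only
  // on the first pair.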
3931 void handleVectorPmaddIntrinsic(IntrinsicInst &I, unsigned ReductionFactor,
3932 bool ZeroPurifies,
3933 unsigned EltSizeInBits = 0) {
3934 IRBuilder<> IRB(&I);
3935
3936 [[maybe_unused]] FixedVectorType *ReturnType =
3937 cast<FixedVectorType>(I.getType());
3938 assert(isa<FixedVectorType>(ReturnType));
3939
3940 // Vectors A and B, and shadows
3941 Value *Va = nullptr;
3942 Value *Vb = nullptr;
3943 Value *Sa = nullptr;
3944 Value *Sb = nullptr;
3945
3946 assert(I.arg_size() == 2 || I.arg_size() == 3);
3947 if (I.arg_size() == 2) {
3948 Va = I.getOperand(0);
3949 Vb = I.getOperand(1);
3950
3951 Sa = getShadow(&I, 0);
3952 Sb = getShadow(&I, 1);
3953 } else if (I.arg_size() == 3) {
3954 // Operand 0 is the accumulator. We will deal with that below.
3955 Va = I.getOperand(1);
3956 Vb = I.getOperand(2);
3957
3958 Sa = getShadow(&I, 1);
3959 Sb = getShadow(&I, 2);
3960 }
3961
3962 FixedVectorType *ParamType = cast<FixedVectorType>(Va->getType());
3963 assert(ParamType == Vb->getType());
3964
3965 assert(ParamType->getPrimitiveSizeInBits() ==
3966 ReturnType->getPrimitiveSizeInBits());
3967
3968 if (I.arg_size() == 3) {
3969 [[maybe_unused]] auto *AccumulatorType =
3970 cast<FixedVectorType>(I.getOperand(0)->getType());
3971 assert(AccumulatorType == ReturnType);
3972 }
3973
3974 FixedVectorType *ImplicitReturnType =
3975 cast<FixedVectorType>(getShadowTy(ReturnType));
3976 // Step 1: instrument multiplication of corresponding vector elements
3977 if (EltSizeInBits) {
3978 ImplicitReturnType = cast<FixedVectorType>(
3979 getMMXVectorTy(EltSizeInBits * ReductionFactor,
3980 ParamType->getPrimitiveSizeInBits()));
3981 ParamType = cast<FixedVectorType>(
3982 getMMXVectorTy(EltSizeInBits, ParamType->getPrimitiveSizeInBits()));
3983
3984 Va = IRB.CreateBitCast(Va, ParamType);
3985 Vb = IRB.CreateBitCast(Vb, ParamType);
3986
3987 Sa = IRB.CreateBitCast(Sa, getShadowTy(ParamType));
3988 Sb = IRB.CreateBitCast(Sb, getShadowTy(ParamType));
3989 } else {
3990 assert(ParamType->getNumElements() ==
3991 ReturnType->getNumElements() * ReductionFactor);
3992 }
3993
3994 // Each element of the vector is represented by a single bit (poisoned or
3995 // not) e.g., <8 x i1>.
3996 Value *SaNonZero = IRB.CreateIsNotNull(Sa);
3997 Value *SbNonZero = IRB.CreateIsNotNull(Sb);
3998 Value *And;
3999 if (ZeroPurifies) {
4000 // Multiplying an *initialized* zero by an uninitialized element results
4001 // in an initialized zero element.
4002 //
4003 // This is analogous to bitwise AND, where "AND" of 0 and a poisoned value
4004 // results in an unpoisoned value. We can therefore adapt the visitAnd()
4005 // instrumentation:
4006 // OutShadow = (SaNonZero & SbNonZero)
4007 // | (VaNonZero & SbNonZero)
4008 // | (SaNonZero & VbNonZero)
4009 // where non-zero is checked on a per-element basis (not per bit).
4010 Value *VaInt = Va;
4011 Value *VbInt = Vb;
4012 if (!Va->getType()->isIntegerTy()) {
4013 VaInt = CreateAppToShadowCast(IRB, Va);
4014 VbInt = CreateAppToShadowCast(IRB, Vb);
4015 }
4016
4017 Value *VaNonZero = IRB.CreateIsNotNull(VaInt);
4018 Value *VbNonZero = IRB.CreateIsNotNull(VbInt);
4019
4020 Value *SaAndSbNonZero = IRB.CreateAnd(SaNonZero, SbNonZero);
4021 Value *VaAndSbNonZero = IRB.CreateAnd(VaNonZero, SbNonZero);
4022 Value *SaAndVbNonZero = IRB.CreateAnd(SaNonZero, VbNonZero);
4023
4024 And = IRB.CreateOr({SaAndSbNonZero, VaAndSbNonZero, SaAndVbNonZero});
4025 } else {
4026 And = IRB.CreateOr({SaNonZero, SbNonZero});
4027 }
4028
4029 // Extend <8 x i1> to <8 x i16>.
4030 // (The real pmadd intrinsic would have computed intermediate values of
4031 // <8 x i32>, but that is irrelevant for our shadow purposes because we
4032 // consider each element to be either fully initialized or fully
4033 // uninitialized.)
4034 And = IRB.CreateSExt(And, Sa->getType());
4035
4036 // Step 2: instrument horizontal add
4037 // We don't need bit-precise horizontalReduce because we only want to check
4038 // if each pair/quad of elements is fully zero.
4039 // Cast to <4 x i32>.
4040 Value *Horizontal = IRB.CreateBitCast(And, ImplicitReturnType);
4041
4042 // Compute <4 x i1>, then extend back to <4 x i32>.
4043 Value *OutShadow = IRB.CreateSExt(
4044 IRB.CreateICmpNE(Horizontal,
4045 Constant::getNullValue(Horizontal->getType())),
4046 ImplicitReturnType);
4047
4048 // Cast it back to the required fake return type (if MMX: <1 x i64>; for
4049 // AVX, it is already correct).
4050 if (EltSizeInBits)
4051 OutShadow = CreateShadowCast(IRB, OutShadow, getShadowTy(&I));
4052
4053 // Step 3 (if applicable): instrument accumulator
4054 if (I.arg_size() == 3)
4055 OutShadow = IRB.CreateOr(OutShadow, getShadow(&I, 0));
4056
4057 setShadow(&I, OutShadow);
4058 setOriginForNaryOp(I);
4059 }
4060
4061 // Instrument compare-packed intrinsic.
4062 // Basically, an or followed by sext(icmp ne 0) to end up with all-zeros or
4063 // all-ones shadow.
4064 void handleVectorComparePackedIntrinsic(IntrinsicInst &I) {
4065 IRBuilder<> IRB(&I);
4066 Type *ResTy = getShadowTy(&I);
4067 auto *Shadow0 = getShadow(&I, 0);
4068 auto *Shadow1 = getShadow(&I, 1);
4069 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4070 Value *S = IRB.CreateSExt(
4071 IRB.CreateICmpNE(S0, Constant::getNullValue(ResTy)), ResTy);
4072 setShadow(&I, S);
4073 setOriginForNaryOp(I);
4074 }
4075
4076 // Instrument compare-scalar intrinsic.
4077 // This handles both cmp* intrinsics which return the result in the first
4078 // element of a vector, and comi* which return the result as i32.
4079 void handleVectorCompareScalarIntrinsic(IntrinsicInst &I) {
4080 IRBuilder<> IRB(&I);
4081 auto *Shadow0 = getShadow(&I, 0);
4082 auto *Shadow1 = getShadow(&I, 1);
4083 Value *S0 = IRB.CreateOr(Shadow0, Shadow1);
4084 Value *S = LowerElementShadowExtend(IRB, S0, getShadowTy(&I));
4085 setShadow(&I, S);
4086 setOriginForNaryOp(I);
4087 }
4088
4089 // Instrument generic vector reduction intrinsics
4090 // by ORing together all their fields.
4091 //
4092 // If AllowShadowCast is true, the return type does not need to be the same
4093 // type as the fields
4094 // e.g., declare i32 @llvm.aarch64.neon.uaddv.i32.v16i8(<16 x i8>)
4095 void handleVectorReduceIntrinsic(IntrinsicInst &I, bool AllowShadowCast) {
4096 assert(I.arg_size() == 1);
4097
4098 IRBuilder<> IRB(&I);
4099 Value *S = IRB.CreateOrReduce(getShadow(&I, 0));
4100 if (AllowShadowCast)
4101 S = CreateShadowCast(IRB, S, getShadowTy(&I));
4102 else
4103 assert(S->getType() == getShadowTy(&I));
4104 setShadow(&I, S);
4105 setOriginForNaryOp(I);
4106 }
4107
4108 // Similar to handleVectorReduceIntrinsic but with an initial starting value.
4109 // e.g., call float @llvm.vector.reduce.fadd.f32.v2f32(float %a0, <2 x float>
4110 // %a1)
4111 // shadow = shadow[a0] | shadow[a1.0] | shadow[a1.1]
4112 //
4113 // The type of the return value, initial starting value, and elements of the
4114 // vector must be identical.
4115 void handleVectorReduceWithStarterIntrinsic(IntrinsicInst &I) {
4116 assert(I.arg_size() == 2);
4117
4118 IRBuilder<> IRB(&I);
4119 Value *Shadow0 = getShadow(&I, 0);
4120 Value *Shadow1 = IRB.CreateOrReduce(getShadow(&I, 1));
4121 assert(Shadow0->getType() == Shadow1->getType());
4122 Value *S = IRB.CreateOr(Shadow0, Shadow1);
4123 assert(S->getType() == getShadowTy(&I));
4124 setShadow(&I, S);
4125 setOriginForNaryOp(I);
4126 }
4127
4128 // Instrument vector.reduce.or intrinsic.
4129 // Valid (non-poisoned) set bits in the operand pull low the
4130 // corresponding shadow bits.
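  //
  // For example, for a <2 x i4> operand {0b1100 (fully initialized),
  // 0b???? (fully poisoned)}:
  //   OperandUnsetOrPoison = {0b0011, 0b1111}, so OutShadowMask = 0b0011
  //   OrShadow             = 0b1111
  //   output shadow        = 0b0011
  // Bits 2 and 3 of the result are known to be 1 thanks to the first lane, so
  // only bits 0 and 1 remain poisoned.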
4131 void handleVectorReduceOrIntrinsic(IntrinsicInst &I) {
4132 assert(I.arg_size() == 1);
4133
4134 IRBuilder<> IRB(&I);
4135 Value *OperandShadow = getShadow(&I, 0);
4136 Value *OperandUnsetBits = IRB.CreateNot(I.getOperand(0));
4137 Value *OperandUnsetOrPoison = IRB.CreateOr(OperandUnsetBits, OperandShadow);
4138     // Bit N is clean if any field's bit N is 1 and unpoisoned.
4139 Value *OutShadowMask = IRB.CreateAndReduce(OperandUnsetOrPoison);
4140     // Otherwise, it is clean if every field's bit N is unpoisoned.
4141 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4142 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4143
4144 setShadow(&I, S);
4145 setOrigin(&I, getOrigin(&I, 0));
4146 }
4147
4148 // Instrument vector.reduce.and intrinsic.
4149 // Valid (non-poisoned) unset bits in the operand pull down the
4150 // corresponding shadow bits.
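  //
  // For example, for a <2 x i4> operand {0b0011 (fully initialized),
  // 0b???? (fully poisoned)}:
  //   OperandSetOrPoison = {0b0011, 0b1111}, so OutShadowMask = 0b0011
  //   OrShadow           = 0b1111
  //   output shadow      = 0b0011
  // Bits 2 and 3 of the result are known to be 0 thanks to the first lane, so
  // only bits 0 and 1 remain poisoned.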
4151 void handleVectorReduceAndIntrinsic(IntrinsicInst &I) {
4152 assert(I.arg_size() == 1);
4153
4154 IRBuilder<> IRB(&I);
4155 Value *OperandShadow = getShadow(&I, 0);
4156 Value *OperandSetOrPoison = IRB.CreateOr(I.getOperand(0), OperandShadow);
4157     // Bit N is clean if any field's bit N is 0 and unpoisoned.
4158 Value *OutShadowMask = IRB.CreateAndReduce(OperandSetOrPoison);
4159     // Otherwise, it is clean if every field's bit N is unpoisoned.
4160 Value *OrShadow = IRB.CreateOrReduce(OperandShadow);
4161 Value *S = IRB.CreateAnd(OutShadowMask, OrShadow);
4162
4163 setShadow(&I, S);
4164 setOrigin(&I, getOrigin(&I, 0));
4165 }
4166
4167 void handleStmxcsr(IntrinsicInst &I) {
4168 IRBuilder<> IRB(&I);
4169 Value *Addr = I.getArgOperand(0);
4170 Type *Ty = IRB.getInt32Ty();
4171 Value *ShadowPtr =
4172 getShadowOriginPtr(Addr, IRB, Ty, Align(1), /*isStore*/ true).first;
4173
4174 IRB.CreateStore(getCleanShadow(Ty), ShadowPtr);
4175
4176     if (ClCheckAccessAddress)
4177       insertCheckShadowOf(Addr, &I);
4178 }
4179
4180 void handleLdmxcsr(IntrinsicInst &I) {
4181 if (!InsertChecks)
4182 return;
4183
4184 IRBuilder<> IRB(&I);
4185 Value *Addr = I.getArgOperand(0);
4186 Type *Ty = IRB.getInt32Ty();
4187 const Align Alignment = Align(1);
4188 Value *ShadowPtr, *OriginPtr;
4189 std::tie(ShadowPtr, OriginPtr) =
4190 getShadowOriginPtr(Addr, IRB, Ty, Alignment, /*isStore*/ false);
4191
4192     if (ClCheckAccessAddress)
4193       insertCheckShadowOf(Addr, &I);
4194
4195 Value *Shadow = IRB.CreateAlignedLoad(Ty, ShadowPtr, Alignment, "_ldmxcsr");
4196 Value *Origin = MS.TrackOrigins ? IRB.CreateLoad(MS.OriginTy, OriginPtr)
4197 : getCleanOrigin();
4198 insertCheckShadow(Shadow, Origin, &I);
4199 }
4200
4201 void handleMaskedExpandLoad(IntrinsicInst &I) {
4202 IRBuilder<> IRB(&I);
4203 Value *Ptr = I.getArgOperand(0);
4204 MaybeAlign Align = I.getParamAlign(0);
4205 Value *Mask = I.getArgOperand(1);
4206 Value *PassThru = I.getArgOperand(2);
4207
4208     if (ClCheckAccessAddress) {
4209       insertCheckShadowOf(Ptr, &I);
4210 insertCheckShadowOf(Mask, &I);
4211 }
4212
4213 if (!PropagateShadow) {
4214 setShadow(&I, getCleanShadow(&I));
4215 setOrigin(&I, getCleanOrigin());
4216 return;
4217 }
4218
4219 Type *ShadowTy = getShadowTy(&I);
4220 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4221 auto [ShadowPtr, OriginPtr] =
4222 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ false);
4223
4224 Value *Shadow =
4225 IRB.CreateMaskedExpandLoad(ShadowTy, ShadowPtr, Align, Mask,
4226 getShadow(PassThru), "_msmaskedexpload");
4227
4228 setShadow(&I, Shadow);
4229
4230 // TODO: Store origins.
4231 setOrigin(&I, getCleanOrigin());
4232 }
4233
4234 void handleMaskedCompressStore(IntrinsicInst &I) {
4235 IRBuilder<> IRB(&I);
4236 Value *Values = I.getArgOperand(0);
4237 Value *Ptr = I.getArgOperand(1);
4238 MaybeAlign Align = I.getParamAlign(1);
4239 Value *Mask = I.getArgOperand(2);
4240
4241     if (ClCheckAccessAddress) {
4242       insertCheckShadowOf(Ptr, &I);
4243 insertCheckShadowOf(Mask, &I);
4244 }
4245
4246 Value *Shadow = getShadow(Values);
4247 Type *ElementShadowTy =
4248 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4249 auto [ShadowPtr, OriginPtrs] =
4250 getShadowOriginPtr(Ptr, IRB, ElementShadowTy, Align, /*isStore*/ true);
4251
4252 IRB.CreateMaskedCompressStore(Shadow, ShadowPtr, Align, Mask);
4253
4254 // TODO: Store origins.
4255 }
4256
4257 void handleMaskedGather(IntrinsicInst &I) {
4258 IRBuilder<> IRB(&I);
4259 Value *Ptrs = I.getArgOperand(0);
4260 const Align Alignment = I.getParamAlign(0).valueOrOne();
4261 Value *Mask = I.getArgOperand(1);
4262 Value *PassThru = I.getArgOperand(2);
4263
4264 Type *PtrsShadowTy = getShadowTy(Ptrs);
4265     if (ClCheckAccessAddress) {
4266       insertCheckShadowOf(Mask, &I);
4267 Value *MaskedPtrShadow = IRB.CreateSelect(
4268 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4269 "_msmaskedptrs");
4270 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4271 }
4272
4273 if (!PropagateShadow) {
4274 setShadow(&I, getCleanShadow(&I));
4275 setOrigin(&I, getCleanOrigin());
4276 return;
4277 }
4278
4279 Type *ShadowTy = getShadowTy(&I);
4280 Type *ElementShadowTy = cast<VectorType>(ShadowTy)->getElementType();
4281 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4282 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ false);
4283
4284 Value *Shadow =
4285 IRB.CreateMaskedGather(ShadowTy, ShadowPtrs, Alignment, Mask,
4286 getShadow(PassThru), "_msmaskedgather");
4287
4288 setShadow(&I, Shadow);
4289
4290 // TODO: Store origins.
4291 setOrigin(&I, getCleanOrigin());
4292 }
4293
4294 void handleMaskedScatter(IntrinsicInst &I) {
4295 IRBuilder<> IRB(&I);
4296 Value *Values = I.getArgOperand(0);
4297 Value *Ptrs = I.getArgOperand(1);
4298 const Align Alignment = I.getParamAlign(1).valueOrOne();
4299 Value *Mask = I.getArgOperand(2);
4300
4301 Type *PtrsShadowTy = getShadowTy(Ptrs);
4302     if (ClCheckAccessAddress) {
4303       insertCheckShadowOf(Mask, &I);
4304 Value *MaskedPtrShadow = IRB.CreateSelect(
4305 Mask, getShadow(Ptrs), Constant::getNullValue((PtrsShadowTy)),
4306 "_msmaskedptrs");
4307 insertCheckShadow(MaskedPtrShadow, getOrigin(Ptrs), &I);
4308 }
4309
4310 Value *Shadow = getShadow(Values);
4311 Type *ElementShadowTy =
4312 getShadowTy(cast<VectorType>(Values->getType())->getElementType());
4313 auto [ShadowPtrs, OriginPtrs] = getShadowOriginPtr(
4314 Ptrs, IRB, ElementShadowTy, Alignment, /*isStore*/ true);
4315
4316 IRB.CreateMaskedScatter(Shadow, ShadowPtrs, Alignment, Mask);
4317
4318 // TODO: Store origin.
4319 }
4320
4321 // Intrinsic::masked_store
4322 //
4323 // Note: handleAVXMaskedStore handles AVX/AVX2 variants, though AVX512 masked
4324 // stores are lowered to Intrinsic::masked_store.
4325 void handleMaskedStore(IntrinsicInst &I) {
4326 IRBuilder<> IRB(&I);
4327 Value *V = I.getArgOperand(0);
4328 Value *Ptr = I.getArgOperand(1);
4329 const Align Alignment = I.getParamAlign(1).valueOrOne();
4330 Value *Mask = I.getArgOperand(2);
4331 Value *Shadow = getShadow(V);
4332
4333     if (ClCheckAccessAddress) {
4334       insertCheckShadowOf(Ptr, &I);
4335 insertCheckShadowOf(Mask, &I);
4336 }
4337
4338 Value *ShadowPtr;
4339 Value *OriginPtr;
4340 std::tie(ShadowPtr, OriginPtr) = getShadowOriginPtr(
4341 Ptr, IRB, Shadow->getType(), Alignment, /*isStore*/ true);
4342
4343 IRB.CreateMaskedStore(Shadow, ShadowPtr, Alignment, Mask);
4344
4345 if (!MS.TrackOrigins)
4346 return;
4347
4348 auto &DL = F.getDataLayout();
4349 paintOrigin(IRB, getOrigin(V), OriginPtr,
4350 DL.getTypeStoreSize(Shadow->getType()),
4351 std::max(Alignment, kMinOriginAlignment));
4352 }
4353
4354 // Intrinsic::masked_load
4355 //
4356 // Note: handleAVXMaskedLoad handles AVX/AVX2 variants, though AVX512 masked
4357 // loads are lowered to Intrinsic::masked_load.
4358 void handleMaskedLoad(IntrinsicInst &I) {
4359 IRBuilder<> IRB(&I);
4360 Value *Ptr = I.getArgOperand(0);
4361 const Align Alignment = I.getParamAlign(0).valueOrOne();
4362 Value *Mask = I.getArgOperand(1);
4363 Value *PassThru = I.getArgOperand(2);
4364
4365     if (ClCheckAccessAddress) {
4366       insertCheckShadowOf(Ptr, &I);
4367 insertCheckShadowOf(Mask, &I);
4368 }
4369
4370 if (!PropagateShadow) {
4371 setShadow(&I, getCleanShadow(&I));
4372 setOrigin(&I, getCleanOrigin());
4373 return;
4374 }
4375
4376 Type *ShadowTy = getShadowTy(&I);
4377 Value *ShadowPtr, *OriginPtr;
4378 std::tie(ShadowPtr, OriginPtr) =
4379 getShadowOriginPtr(Ptr, IRB, ShadowTy, Alignment, /*isStore*/ false);
4380 setShadow(&I, IRB.CreateMaskedLoad(ShadowTy, ShadowPtr, Alignment, Mask,
4381 getShadow(PassThru), "_msmaskedld"));
4382
4383 if (!MS.TrackOrigins)
4384 return;
4385
4386 // Choose between PassThru's and the loaded value's origins.
4387 Value *MaskedPassThruShadow = IRB.CreateAnd(
4388 getShadow(PassThru), IRB.CreateSExt(IRB.CreateNeg(Mask), ShadowTy));
4389
4390 Value *NotNull = convertToBool(MaskedPassThruShadow, IRB, "_mscmp");
4391
4392 Value *PtrOrigin = IRB.CreateLoad(MS.OriginTy, OriginPtr);
4393 Value *Origin = IRB.CreateSelect(NotNull, getOrigin(PassThru), PtrOrigin);
4394
4395 setOrigin(&I, Origin);
4396 }
4397
4398 // e.g., void @llvm.x86.avx.maskstore.ps.256(ptr, <8 x i32>, <8 x float>)
4399 // dst mask src
4400 //
4401   // AVX512 masked stores are lowered to Intrinsic::masked_store and are handled
4402 // by handleMaskedStore.
4403 //
4404 // This function handles AVX and AVX2 masked stores; these use the MSBs of a
4405 // vector of integers, unlike the LLVM masked intrinsics, which require a
4406 // vector of booleans. X86InstCombineIntrinsic.cpp::simplifyX86MaskedLoad
4407 // mentions that the x86 backend does not know how to efficiently convert
4408 // from a vector of booleans back into the AVX mask format; therefore, they
4409 // (and we) do not reduce AVX/AVX2 masked intrinsics into LLVM masked
4410 // intrinsics.
4411 void handleAVXMaskedStore(IntrinsicInst &I) {
4412 assert(I.arg_size() == 3);
4413
4414 IRBuilder<> IRB(&I);
4415
4416 Value *Dst = I.getArgOperand(0);
4417 assert(Dst->getType()->isPointerTy() && "Destination is not a pointer!");
4418
4419 Value *Mask = I.getArgOperand(1);
4420 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4421
4422 Value *Src = I.getArgOperand(2);
4423 assert(isa<VectorType>(Src->getType()) && "Source is not a vector!");
4424
4425 const Align Alignment = Align(1);
4426
4427 Value *SrcShadow = getShadow(Src);
4428
4429     if (ClCheckAccessAddress) {
4430       insertCheckShadowOf(Dst, &I);
4431 insertCheckShadowOf(Mask, &I);
4432 }
4433
4434 Value *DstShadowPtr;
4435 Value *DstOriginPtr;
4436 std::tie(DstShadowPtr, DstOriginPtr) = getShadowOriginPtr(
4437 Dst, IRB, SrcShadow->getType(), Alignment, /*isStore*/ true);
4438
4439 SmallVector<Value *, 2> ShadowArgs;
4440 ShadowArgs.append(1, DstShadowPtr);
4441 ShadowArgs.append(1, Mask);
4442 // The intrinsic may require floating-point but shadows can be arbitrary
4443 // bit patterns, of which some would be interpreted as "invalid"
4444 // floating-point values (NaN etc.); we assume the intrinsic will happily
4445 // copy them.
4446 ShadowArgs.append(1, IRB.CreateBitCast(SrcShadow, Src->getType()));
4447
4448 CallInst *CI =
4449 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
4450 setShadow(&I, CI);
4451
4452 if (!MS.TrackOrigins)
4453 return;
4454
4455 // Approximation only
4456 auto &DL = F.getDataLayout();
4457 paintOrigin(IRB, getOrigin(Src), DstOriginPtr,
4458 DL.getTypeStoreSize(SrcShadow->getType()),
4459 std::max(Alignment, kMinOriginAlignment));
4460 }
4461
4462 // e.g., <8 x float> @llvm.x86.avx.maskload.ps.256(ptr, <8 x i32>)
4463 // return src mask
4464 //
4465 // Masked-off values are replaced with 0, which conveniently also represents
4466 // initialized memory.
4467 //
4468   // AVX512 masked loads are lowered to Intrinsic::masked_load and are handled
4469   // by handleMaskedLoad.
4470 //
4471 // We do not combine this with handleMaskedLoad; see comment in
4472 // handleAVXMaskedStore for the rationale.
4473 //
4474 // This is subtly different than handleIntrinsicByApplyingToShadow(I, 1)
4475 // because we need to apply getShadowOriginPtr, not getShadow, to the first
4476 // parameter.
4477 void handleAVXMaskedLoad(IntrinsicInst &I) {
4478 assert(I.arg_size() == 2);
4479
4480 IRBuilder<> IRB(&I);
4481
4482 Value *Src = I.getArgOperand(0);
4483 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
4484
4485 Value *Mask = I.getArgOperand(1);
4486 assert(isa<VectorType>(Mask->getType()) && "Mask is not a vector!");
4487
4488 const Align Alignment = Align(1);
4489
4489
4490     if (ClCheckAccessAddress) {
4491       insertCheckShadowOf(Mask, &I);
4492 }
4493
4494 Type *SrcShadowTy = getShadowTy(Src);
4495 Value *SrcShadowPtr, *SrcOriginPtr;
4496 std::tie(SrcShadowPtr, SrcOriginPtr) =
4497 getShadowOriginPtr(Src, IRB, SrcShadowTy, Alignment, /*isStore*/ false);
4498
4499 SmallVector<Value *, 2> ShadowArgs;
4500 ShadowArgs.append(1, SrcShadowPtr);
4501 ShadowArgs.append(1, Mask);
4502
4503 CallInst *CI =
4504 IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(), ShadowArgs);
4505 // The AVX masked load intrinsics do not have integer variants. We use the
4506 // floating-point variants, which will happily copy the shadows even if
4507 // they are interpreted as "invalid" floating-point values (NaN etc.).
4508 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4509
4510 if (!MS.TrackOrigins)
4511 return;
4512
4513 // The "pass-through" value is always zero (initialized). To the extent
4514 // that that results in initialized aligned 4-byte chunks, the origin value
4515 // is ignored. It is therefore correct to simply copy the origin from src.
4516 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
4517 setOrigin(&I, PtrSrcOrigin);
4518 }
4519
4520 // Test whether the mask indices are initialized, only checking the bits that
4521 // are actually used.
4522 //
4523 // e.g., if Idx is <32 x i16>, only (log2(32) == 5) bits of each index are
4524 // used/checked.
4525 void maskedCheckAVXIndexShadow(IRBuilder<> &IRB, Value *Idx, Instruction *I) {
4526 assert(isFixedIntVector(Idx));
4527 auto IdxVectorSize =
4528 cast<FixedVectorType>(Idx->getType())->getNumElements();
4529 assert(isPowerOf2_64(IdxVectorSize));
4530
4531 // Compiler isn't smart enough, let's help it
4532 if (isa<Constant>(Idx))
4533 return;
4534
4535 auto *IdxShadow = getShadow(Idx);
4536 Value *Truncated = IRB.CreateTrunc(
4537 IdxShadow,
4538 FixedVectorType::get(Type::getIntNTy(*MS.C, Log2_64(IdxVectorSize)),
4539 IdxVectorSize));
4540 insertCheckShadow(Truncated, getOrigin(Idx), I);
4541 }
4542
4543 // Instrument AVX permutation intrinsic.
4544 // We apply the same permutation (argument index 1) to the shadow.
4545 void handleAVXVpermilvar(IntrinsicInst &I) {
4546 IRBuilder<> IRB(&I);
4547 Value *Shadow = getShadow(&I, 0);
4548 maskedCheckAVXIndexShadow(IRB, I.getArgOperand(1), &I);
4549
4550 // Shadows are integer-ish types but some intrinsics require a
4551 // different (e.g., floating-point) type.
4552 Shadow = IRB.CreateBitCast(Shadow, I.getArgOperand(0)->getType());
4553 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4554 {Shadow, I.getArgOperand(1)});
4555
4556 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4557 setOriginForNaryOp(I);
4558 }
4559
4560 // Instrument AVX permutation intrinsic.
4561 // We apply the same permutation (argument index 1) to the shadows.
4562 void handleAVXVpermi2var(IntrinsicInst &I) {
4563 assert(I.arg_size() == 3);
4564 assert(isa<FixedVectorType>(I.getArgOperand(0)->getType()));
4565 assert(isa<FixedVectorType>(I.getArgOperand(1)->getType()));
4566 assert(isa<FixedVectorType>(I.getArgOperand(2)->getType()));
4567 [[maybe_unused]] auto ArgVectorSize =
4568 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4569 assert(cast<FixedVectorType>(I.getArgOperand(1)->getType())
4570 ->getNumElements() == ArgVectorSize);
4571 assert(cast<FixedVectorType>(I.getArgOperand(2)->getType())
4572 ->getNumElements() == ArgVectorSize);
4573 assert(I.getArgOperand(0)->getType() == I.getArgOperand(2)->getType());
4574 assert(I.getType() == I.getArgOperand(0)->getType());
4575 assert(I.getArgOperand(1)->getType()->isIntOrIntVectorTy());
4576 IRBuilder<> IRB(&I);
4577 Value *AShadow = getShadow(&I, 0);
4578 Value *Idx = I.getArgOperand(1);
4579 Value *BShadow = getShadow(&I, 2);
4580
4581 maskedCheckAVXIndexShadow(IRB, Idx, &I);
4582
4583 // Shadows are integer-ish types but some intrinsics require a
4584 // different (e.g., floating-point) type.
4585 AShadow = IRB.CreateBitCast(AShadow, I.getArgOperand(0)->getType());
4586 BShadow = IRB.CreateBitCast(BShadow, I.getArgOperand(2)->getType());
4587 CallInst *CI = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
4588 {AShadow, Idx, BShadow});
4589 setShadow(&I, IRB.CreateBitCast(CI, getShadowTy(&I)));
4590 setOriginForNaryOp(I);
4591 }
4592
4593 [[maybe_unused]] static bool isFixedIntVectorTy(const Type *T) {
4594 return isa<FixedVectorType>(T) && T->isIntOrIntVectorTy();
4595 }
4596
4597 [[maybe_unused]] static bool isFixedFPVectorTy(const Type *T) {
4598 return isa<FixedVectorType>(T) && T->isFPOrFPVectorTy();
4599 }
4600
4601 [[maybe_unused]] static bool isFixedIntVector(const Value *V) {
4602 return isFixedIntVectorTy(V->getType());
4603 }
4604
4605 [[maybe_unused]] static bool isFixedFPVector(const Value *V) {
4606 return isFixedFPVectorTy(V->getType());
4607 }
4608
4609 // e.g., <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
4610 // (<16 x float> a, <16 x i32> writethru, i16 mask,
4611 // i32 rounding)
4612 //
4613 // Inconveniently, some similar intrinsics have a different operand order:
4614 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
4615 // (<16 x float> a, i32 rounding, <16 x i16> writethru,
4616 // i16 mask)
4617 //
4618 // If the return type has more elements than A, the excess elements are
4619 // zeroed (and the corresponding shadow is initialized).
4620 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
4621 // (<4 x float> a, i32 rounding, <8 x i16> writethru,
4622 // i8 mask)
4623 //
4624 // dst[i] = mask[i] ? convert(a[i]) : writethru[i]
4625 // dst_shadow[i] = mask[i] ? all_or_nothing(a_shadow[i]) : writethru_shadow[i]
4626 // where all_or_nothing(x) is fully uninitialized if x has any
4627 // uninitialized bits
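  //       e.g., all_or_nothing(0x00000000) == 0x00000000 (fully initialized)
  //             all_or_nothing(0x00000800) == 0xFFFFFFFF (a single poisoned
  //             bit poisons the whole element)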
4628 void handleAVX512VectorConvertFPToInt(IntrinsicInst &I, bool LastMask) {
4629 IRBuilder<> IRB(&I);
4630
4631 assert(I.arg_size() == 4);
4632 Value *A = I.getOperand(0);
4633 Value *WriteThrough;
4634     Value *Mask;
4635     Value *RoundingMode;
4636 if (LastMask) {
4637 WriteThrough = I.getOperand(2);
4638 Mask = I.getOperand(3);
4639 RoundingMode = I.getOperand(1);
4640 } else {
4641 WriteThrough = I.getOperand(1);
4642 Mask = I.getOperand(2);
4643 RoundingMode = I.getOperand(3);
4644 }
4645
4646 assert(isFixedFPVector(A));
4647 assert(isFixedIntVector(WriteThrough));
4648
4649 unsigned ANumElements =
4650 cast<FixedVectorType>(A->getType())->getNumElements();
4651 [[maybe_unused]] unsigned WriteThruNumElements =
4652 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4653 assert(ANumElements == WriteThruNumElements ||
4654 ANumElements * 2 == WriteThruNumElements);
4655
4656 assert(Mask->getType()->isIntegerTy());
4657 unsigned MaskNumElements = Mask->getType()->getScalarSizeInBits();
4658 assert(ANumElements == MaskNumElements ||
4659 ANumElements * 2 == MaskNumElements);
4660
4661 assert(WriteThruNumElements == MaskNumElements);
4662
4663 // Some bits of the mask may be unused, though it's unusual to have partly
4664 // uninitialized bits.
4665 insertCheckShadowOf(Mask, &I);
4666
4667 assert(RoundingMode->getType()->isIntegerTy());
4668 // Only some bits of the rounding mode are used, though it's very
4669 // unusual to have uninitialized bits there (more commonly, it's a
4670 // constant).
4671 insertCheckShadowOf(RoundingMode, &I);
4672
4673 assert(I.getType() == WriteThrough->getType());
4674
4675 Value *AShadow = getShadow(A);
4676 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4677
4678 if (ANumElements * 2 == MaskNumElements) {
4679 // Ensure that the irrelevant bits of the mask are zero, hence selecting
4680 // from the zeroed shadow instead of the writethrough's shadow.
4681 Mask =
4682 IRB.CreateTrunc(Mask, IRB.getIntNTy(ANumElements), "_ms_mask_trunc");
4683 Mask =
4684 IRB.CreateZExt(Mask, IRB.getIntNTy(MaskNumElements), "_ms_mask_zext");
4685 }
4686
4687     // Convert the iN mask to <N x i1> (e.g., i16 to <16 x i1>)
4688 Mask = IRB.CreateBitCast(
4689 Mask, FixedVectorType::get(IRB.getInt1Ty(), MaskNumElements),
4690 "_ms_mask_bitcast");
4691
4692 /// For floating-point to integer conversion, the output is:
4693 /// - fully uninitialized if *any* bit of the input is uninitialized
4694     /// - fully initialized if all bits of the input are initialized
4695 /// We apply the same principle on a per-element basis for vectors.
4696 ///
4697 /// We use the scalar width of the return type instead of A's.
4698 AShadow = IRB.CreateSExt(
4699 IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow->getType())),
4700 getShadowTy(&I), "_ms_a_shadow");
4701
4702 Value *WriteThroughShadow = getShadow(WriteThrough);
4703 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow,
4704 "_ms_writethru_select");
4705
4706 setShadow(&I, Shadow);
4707 setOriginForNaryOp(I);
4708 }
4709
4710 // Instrument BMI / BMI2 intrinsics.
4711 // All of these intrinsics are Z = I(X, Y)
4712 // where the types of all operands and the result match, and are either i32 or
4713 // i64. The following instrumentation happens to work for all of them:
4714 // Sz = I(Sx, Y) | (sext (Sy != 0))
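  // e.g., for i32 @llvm.x86.bmi.pdep.32(i32 %x, i32 %mask):
  //   Sz = pdep(Sx, %mask) | (Smask != 0 ? 0xFFFFFFFF : 0)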
4715 void handleBmiIntrinsic(IntrinsicInst &I) {
4716 IRBuilder<> IRB(&I);
4717 Type *ShadowTy = getShadowTy(&I);
4718
4719 // If any bit of the mask operand is poisoned, then the whole thing is.
4720 Value *SMask = getShadow(&I, 1);
4721 SMask = IRB.CreateSExt(IRB.CreateICmpNE(SMask, getCleanShadow(ShadowTy)),
4722 ShadowTy);
4723 // Apply the same intrinsic to the shadow of the first operand.
4724 Value *S = IRB.CreateCall(I.getCalledFunction(),
4725 {getShadow(&I, 0), I.getOperand(1)});
4726 S = IRB.CreateOr(SMask, S);
4727 setShadow(&I, S);
4728 setOriginForNaryOp(I);
4729 }
4730
4731 static SmallVector<int, 8> getPclmulMask(unsigned Width, bool OddElements) {
4732 SmallVector<int, 8> Mask;
4733 for (unsigned X = OddElements ? 1 : 0; X < Width; X += 2) {
4734 Mask.append(2, X);
4735 }
4736 return Mask;
4737 }
4738
4739 // Instrument pclmul intrinsics.
4740 // These intrinsics operate either on odd or on even elements of the input
4741 // vectors, depending on the constant in the 3rd argument, ignoring the rest.
4742 // Replace the unused elements with copies of the used ones, ex:
4743 // (0, 1, 2, 3) -> (0, 0, 2, 2) (even case)
4744 // or
4745 // (0, 1, 2, 3) -> (1, 1, 3, 3) (odd case)
4746 // and then apply the usual shadow combining logic.
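  // Bit 0 of the immediate selects the even/odd element of the first operand
  // and bit 4 selects it for the second operand, hence the Imm & 0x01 and
  // Imm & 0x10 tests below.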
4747 void handlePclmulIntrinsic(IntrinsicInst &I) {
4748 IRBuilder<> IRB(&I);
4749 unsigned Width =
4750 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4751 assert(isa<ConstantInt>(I.getArgOperand(2)) &&
4752 "pclmul 3rd operand must be a constant");
4753 unsigned Imm = cast<ConstantInt>(I.getArgOperand(2))->getZExtValue();
4754 Value *Shuf0 = IRB.CreateShuffleVector(getShadow(&I, 0),
4755 getPclmulMask(Width, Imm & 0x01));
4756 Value *Shuf1 = IRB.CreateShuffleVector(getShadow(&I, 1),
4757 getPclmulMask(Width, Imm & 0x10));
4758 ShadowAndOriginCombiner SOC(this, IRB);
4759 SOC.Add(Shuf0, getOrigin(&I, 0));
4760 SOC.Add(Shuf1, getOrigin(&I, 1));
4761 SOC.Done(&I);
4762 }
4763
4764 // Instrument _mm_*_sd|ss intrinsics
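  //
  // e.g., for <4 x float> @llvm.x86.sse41.round.ss(<4 x float> %a, <4 x float> %b, i32 %imm),
  //       Out[0] = op(%b[0]) and Out[1..3] = %a[1..3], so the shadow is
  //       <Shadow(%b)[0], Shadow(%a)[1], Shadow(%a)[2], Shadow(%a)[3]>.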
4765 void handleUnarySdSsIntrinsic(IntrinsicInst &I) {
4766 IRBuilder<> IRB(&I);
4767 unsigned Width =
4768 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4769 Value *First = getShadow(&I, 0);
4770 Value *Second = getShadow(&I, 1);
4771 // First element of second operand, remaining elements of first operand
4772 SmallVector<int, 16> Mask;
4773 Mask.push_back(Width);
4774 for (unsigned i = 1; i < Width; i++)
4775 Mask.push_back(i);
4776 Value *Shadow = IRB.CreateShuffleVector(First, Second, Mask);
4777
4778 setShadow(&I, Shadow);
4779 setOriginForNaryOp(I);
4780 }
4781
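  // Instrument the SSE4.1/AVX ptest/vtest intrinsics, which return a scalar
  // i32 flag: if any bit of either operand's shadow is set, the result is
  // fully uninitialized.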
4782 void handleVtestIntrinsic(IntrinsicInst &I) {
4783 IRBuilder<> IRB(&I);
4784 Value *Shadow0 = getShadow(&I, 0);
4785 Value *Shadow1 = getShadow(&I, 1);
4786 Value *Or = IRB.CreateOr(Shadow0, Shadow1);
4787 Value *NZ = IRB.CreateICmpNE(Or, Constant::getNullValue(Or->getType()));
4788 Value *Scalar = convertShadowToScalar(NZ, IRB);
4789 Value *Shadow = IRB.CreateZExt(Scalar, getShadowTy(&I));
4790
4791 setShadow(&I, Shadow);
4792 setOriginForNaryOp(I);
4793 }
4794
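  // Instrument _mm_{min,max}_sd|ss intrinsics.
  // Out[0] = op(a[0], b[0]) and Out[1..] = a[1..], so element 0 of the shadow
  // is the OR of both operands' element-0 shadows, and the remaining elements
  // take the first operand's shadow.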
4795 void handleBinarySdSsIntrinsic(IntrinsicInst &I) {
4796 IRBuilder<> IRB(&I);
4797 unsigned Width =
4798 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements();
4799 Value *First = getShadow(&I, 0);
4800 Value *Second = getShadow(&I, 1);
4801 Value *OrShadow = IRB.CreateOr(First, Second);
4802 // First element of both OR'd together, remaining elements of first operand
4803 SmallVector<int, 16> Mask;
4804 Mask.push_back(Width);
4805 for (unsigned i = 1; i < Width; i++)
4806 Mask.push_back(i);
4807 Value *Shadow = IRB.CreateShuffleVector(First, OrShadow, Mask);
4808
4809 setShadow(&I, Shadow);
4810 setOriginForNaryOp(I);
4811 }
4812
4813   // _mm_round_pd / _mm_round_ps.
4814 // Similar to maybeHandleSimpleNomemIntrinsic except
4815 // the second argument is guaranteed to be a constant integer.
4816 void handleRoundPdPsIntrinsic(IntrinsicInst &I) {
4817 assert(I.getArgOperand(0)->getType() == I.getType());
4818 assert(I.arg_size() == 2);
4819 assert(isa<ConstantInt>(I.getArgOperand(1)));
4820
4821 IRBuilder<> IRB(&I);
4822 ShadowAndOriginCombiner SC(this, IRB);
4823 SC.Add(I.getArgOperand(0));
4824 SC.Done(&I);
4825 }
4826
4827 // Instrument @llvm.abs intrinsic.
4828 //
4829 // e.g., i32 @llvm.abs.i32 (i32 <Src>, i1 <is_int_min_poison>)
4830 // <4 x i32> @llvm.abs.v4i32(<4 x i32> <Src>, i1 <is_int_min_poison>)
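  //
  // The shadow is (element-wise for vectors):
  //   Shadow(Out) = (is_int_min_poison && Src == INT_MIN) ? <all poisoned>
  //                                                       : Shadow(Src)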
4831 void handleAbsIntrinsic(IntrinsicInst &I) {
4832 assert(I.arg_size() == 2);
4833 Value *Src = I.getArgOperand(0);
4834 Value *IsIntMinPoison = I.getArgOperand(1);
4835
4836 assert(I.getType()->isIntOrIntVectorTy());
4837
4838 assert(Src->getType() == I.getType());
4839
4840 assert(IsIntMinPoison->getType()->isIntegerTy());
4841 assert(IsIntMinPoison->getType()->getIntegerBitWidth() == 1);
4842
4843 IRBuilder<> IRB(&I);
4844 Value *SrcShadow = getShadow(Src);
4845
4846 APInt MinVal =
4847 APInt::getSignedMinValue(Src->getType()->getScalarSizeInBits());
4848 Value *MinValVec = ConstantInt::get(Src->getType(), MinVal);
4849 Value *SrcIsMin = IRB.CreateICmp(CmpInst::ICMP_EQ, Src, MinValVec);
4850
4851 Value *PoisonedShadow = getPoisonedShadow(Src);
4852 Value *PoisonedIfIntMinShadow =
4853 IRB.CreateSelect(SrcIsMin, PoisonedShadow, SrcShadow);
4854 Value *Shadow =
4855 IRB.CreateSelect(IsIntMinPoison, PoisonedIfIntMinShadow, SrcShadow);
4856
4857 setShadow(&I, Shadow);
4858 setOrigin(&I, getOrigin(&I, 0));
4859 }
4860
4861 void handleIsFpClass(IntrinsicInst &I) {
4862 IRBuilder<> IRB(&I);
4863 Value *Shadow = getShadow(&I, 0);
4864 setShadow(&I, IRB.CreateICmpNE(Shadow, getCleanShadow(Shadow)));
4865 setOrigin(&I, getOrigin(&I, 0));
4866 }
4867
4868 void handleArithmeticWithOverflow(IntrinsicInst &I) {
4869 IRBuilder<> IRB(&I);
4870 Value *Shadow0 = getShadow(&I, 0);
4871 Value *Shadow1 = getShadow(&I, 1);
4872 Value *ShadowElt0 = IRB.CreateOr(Shadow0, Shadow1);
4873 Value *ShadowElt1 =
4874 IRB.CreateICmpNE(ShadowElt0, getCleanShadow(ShadowElt0));
4875
4876 Value *Shadow = PoisonValue::get(getShadowTy(&I));
4877 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt0, 0);
4878 Shadow = IRB.CreateInsertValue(Shadow, ShadowElt1, 1);
4879
4880 setShadow(&I, Shadow);
4881 setOriginForNaryOp(I);
4882 }
4883
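  // Returns the shadow of element 0 of a fixed-vector value.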
4884 Value *extractLowerShadow(IRBuilder<> &IRB, Value *V) {
4885 assert(isa<FixedVectorType>(V->getType()));
4886 assert(cast<FixedVectorType>(V->getType())->getNumElements() > 0);
4887 Value *Shadow = getShadow(V);
4888 return IRB.CreateExtractElement(Shadow,
4889 ConstantInt::get(IRB.getInt32Ty(), 0));
4890 }
4891
4892 // Handle llvm.x86.avx512.mask.pmov{,s,us}.*.512
4893 //
4894 // e.g., call <16 x i8> @llvm.x86.avx512.mask.pmov.qb.512
4895 // (<8 x i64>, <16 x i8>, i8)
4896 // A WriteThru Mask
4897 //
4898 // call <16 x i8> @llvm.x86.avx512.mask.pmovs.db.512
4899 // (<16 x i32>, <16 x i8>, i16)
4900 //
4901 // Dst[i] = Mask[i] ? truncate_or_saturate(A[i]) : WriteThru[i]
4902 // Dst_shadow[i] = Mask[i] ? truncate(A_shadow[i]) : WriteThru_shadow[i]
4903 //
4904 // If Dst has more elements than A, the excess elements are zeroed (and the
4905 // corresponding shadow is initialized).
4906 //
4907 // Note: for PMOV (truncation), handleIntrinsicByApplyingToShadow is precise
4908 // and is much faster than this handler.
4909 void handleAVX512VectorDownConvert(IntrinsicInst &I) {
4910 IRBuilder<> IRB(&I);
4911
4912 assert(I.arg_size() == 3);
4913 Value *A = I.getOperand(0);
4914 Value *WriteThrough = I.getOperand(1);
4915 Value *Mask = I.getOperand(2);
4916
4917 assert(isFixedIntVector(A));
4918 assert(isFixedIntVector(WriteThrough));
4919
4920 unsigned ANumElements =
4921 cast<FixedVectorType>(A->getType())->getNumElements();
4922 unsigned OutputNumElements =
4923 cast<FixedVectorType>(WriteThrough->getType())->getNumElements();
4924 assert(ANumElements == OutputNumElements ||
4925 ANumElements * 2 == OutputNumElements);
4926
4927 assert(Mask->getType()->isIntegerTy());
4928 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
4929 insertCheckShadowOf(Mask, &I);
4930
4931 assert(I.getType() == WriteThrough->getType());
4932
4933 // Widen the mask, if necessary, to have one bit per element of the output
4934 // vector.
4935 // We want the extra bits to have '1's, so that the CreateSelect will
4936 // select the values from AShadow instead of WriteThroughShadow ("maskless"
4937 // versions of the intrinsics are sometimes implemented using an all-1's
4938 // mask and an undefined value for WriteThroughShadow). We accomplish this
4939 // by using bitwise NOT before and after the ZExt.
4940 if (ANumElements != OutputNumElements) {
4941 Mask = IRB.CreateNot(Mask);
4942 Mask = IRB.CreateZExt(Mask, Type::getIntNTy(*MS.C, OutputNumElements),
4943 "_ms_widen_mask");
4944 Mask = IRB.CreateNot(Mask);
4945 }
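    // e.g., widening an i8 mask 0bmmmmmmmm to i16:
    //   NOT  -> 0bMMMMMMMM          (M = ~m)
    //   ZExt -> 0b00000000MMMMMMMM
    //   NOT  -> 0b11111111mmmmmmmm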
4946 Mask = IRB.CreateBitCast(
4947 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
4948
4949 Value *AShadow = getShadow(A);
4950
4951 // The return type might have more elements than the input.
4952 // Temporarily shrink the return type's number of elements.
4953 VectorType *ShadowType = maybeShrinkVectorShadowType(A, I);
4954
4955 // PMOV truncates; PMOVS/PMOVUS uses signed/unsigned saturation.
4956 // This handler treats them all as truncation, which leads to some rare
4957 // false positives in the cases where the truncated bytes could
4958 // unambiguously saturate the value e.g., if A = ??????10 ????????
4959 // (big-endian), the unsigned saturated byte conversion is 11111111 i.e.,
4960 // fully defined, but the truncated byte is ????????.
4961 //
4962 // TODO: use GetMinMaxUnsigned() to handle saturation precisely.
4963 AShadow = IRB.CreateTrunc(AShadow, ShadowType, "_ms_trunc_shadow");
4964 AShadow = maybeExtendVectorShadowWithZeros(AShadow, I);
4965
4966 Value *WriteThroughShadow = getShadow(WriteThrough);
4967
4968 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThroughShadow);
4969 setShadow(&I, Shadow);
4970 setOriginForNaryOp(I);
4971 }
4972
4973 // Handle llvm.x86.avx512.* instructions that take a vector of floating-point
4974 // values and perform an operation whose shadow propagation should be handled
4975 // as all-or-nothing [*], with masking provided by a vector and a mask
4976 // supplied as an integer.
4977 //
4978 // [*] if all bits of a vector element are initialized, the output is fully
4979 // initialized; otherwise, the output is fully uninitialized
4980 //
4981 // e.g., <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
4982 // (<16 x float>, <16 x float>, i16)
4983 // A WriteThru Mask
4984 //
4985 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
4986 // (<2 x double>, <2 x double>, i8)
4987 //
4988 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
4989 // (<8 x double>, i32, <8 x double>, i8, i32)
4990 // A Imm WriteThru Mask Rounding
4991 //
4992 // All operands other than A and WriteThru (e.g., Mask, Imm, Rounding) must
4993 // be fully initialized.
4994 //
4995 // Dst[i] = Mask[i] ? some_op(A[i]) : WriteThru[i]
4996 // Dst_shadow[i] = Mask[i] ? all_or_nothing(A_shadow[i]) : WriteThru_shadow[i]
4997 void handleAVX512VectorGenericMaskedFP(IntrinsicInst &I, unsigned AIndex,
4998 unsigned WriteThruIndex,
4999 unsigned MaskIndex) {
5000 IRBuilder<> IRB(&I);
5001
5002 unsigned NumArgs = I.arg_size();
5003 assert(AIndex < NumArgs);
5004 assert(WriteThruIndex < NumArgs);
5005 assert(MaskIndex < NumArgs);
5006 assert(AIndex != WriteThruIndex);
5007 assert(AIndex != MaskIndex);
5008 assert(WriteThruIndex != MaskIndex);
5009
5010 Value *A = I.getOperand(AIndex);
5011 Value *WriteThru = I.getOperand(WriteThruIndex);
5012 Value *Mask = I.getOperand(MaskIndex);
5013
5014 assert(isFixedFPVector(A));
5015 assert(isFixedFPVector(WriteThru));
5016
5017 [[maybe_unused]] unsigned ANumElements =
5018 cast<FixedVectorType>(A->getType())->getNumElements();
5019 unsigned OutputNumElements =
5020 cast<FixedVectorType>(WriteThru->getType())->getNumElements();
5021 assert(ANumElements == OutputNumElements);
5022
5023 for (unsigned i = 0; i < NumArgs; ++i) {
5024 if (i != AIndex && i != WriteThruIndex) {
5025 // Imm, Mask, Rounding etc. are "control" data, hence we require that
5026 // they be fully initialized.
5027 assert(I.getOperand(i)->getType()->isIntegerTy());
5028 insertCheckShadowOf(I.getOperand(i), &I);
5029 }
5030 }
5031
5032 // The mask has 1 bit per element of A, but a minimum of 8 bits.
5033 if (Mask->getType()->getScalarSizeInBits() == 8 && ANumElements < 8)
5034 Mask = IRB.CreateTrunc(Mask, Type::getIntNTy(*MS.C, ANumElements));
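    // e.g., <2 x double> operations carry an i8 mask of which only the low
    //       two bits are meaningful; truncate it to i2 before the bitcast to
    //       <2 x i1> below.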
5035 assert(Mask->getType()->getScalarSizeInBits() == ANumElements);
5036
5037 assert(I.getType() == WriteThru->getType());
5038
5039 Mask = IRB.CreateBitCast(
5040 Mask, FixedVectorType::get(IRB.getInt1Ty(), OutputNumElements));
5041
5042 Value *AShadow = getShadow(A);
5043
5044 // All-or-nothing shadow
5045 AShadow = IRB.CreateSExt(IRB.CreateICmpNE(AShadow, getCleanShadow(AShadow)),
5046 AShadow->getType());
5047
5048 Value *WriteThruShadow = getShadow(WriteThru);
5049
5050 Value *Shadow = IRB.CreateSelect(Mask, AShadow, WriteThruShadow);
5051 setShadow(&I, Shadow);
5052
5053 setOriginForNaryOp(I);
5054 }
5055
5056 // For sh.* compiler intrinsics:
5057 // llvm.x86.avx512fp16.mask.{add/sub/mul/div/max/min}.sh.round
5058 // (<8 x half>, <8 x half>, <8 x half>, i8, i32)
5059 // A B WriteThru Mask RoundingMode
5060 //
5061 // DstShadow[0] = Mask[0] ? (AShadow[0] | BShadow[0]) : WriteThruShadow[0]
5062 // DstShadow[1..7] = AShadow[1..7]
5063 void visitGenericScalarHalfwordInst(IntrinsicInst &I) {
5064 IRBuilder<> IRB(&I);
5065
5066 assert(I.arg_size() == 5);
5067 Value *A = I.getOperand(0);
5068 Value *B = I.getOperand(1);
5069 Value *WriteThrough = I.getOperand(2);
5070 Value *Mask = I.getOperand(3);
5071 Value *RoundingMode = I.getOperand(4);
5072
5073 // Technically, we could probably just check whether the LSB is
5074 // initialized, but intuitively it feels like a partly uninitialized mask
5075 // is unintended, and we should warn the user immediately.
5076 insertCheckShadowOf(Mask, &I);
5077 insertCheckShadowOf(RoundingMode, &I);
5078
5079 assert(isa<FixedVectorType>(A->getType()));
5080 unsigned NumElements =
5081 cast<FixedVectorType>(A->getType())->getNumElements();
5082 assert(NumElements == 8);
5083 assert(A->getType() == B->getType());
5084 assert(B->getType() == WriteThrough->getType());
5085 assert(Mask->getType()->getPrimitiveSizeInBits() == NumElements);
5086 assert(RoundingMode->getType()->isIntegerTy());
5087
5088 Value *ALowerShadow = extractLowerShadow(IRB, A);
5089 Value *BLowerShadow = extractLowerShadow(IRB, B);
5090
5091 Value *ABLowerShadow = IRB.CreateOr(ALowerShadow, BLowerShadow);
5092
5093 Value *WriteThroughLowerShadow = extractLowerShadow(IRB, WriteThrough);
5094
5095 Mask = IRB.CreateBitCast(
5096 Mask, FixedVectorType::get(IRB.getInt1Ty(), NumElements));
5097 Value *MaskLower =
5098 IRB.CreateExtractElement(Mask, ConstantInt::get(IRB.getInt32Ty(), 0));
5099
5100 Value *AShadow = getShadow(A);
5101 Value *DstLowerShadow =
5102 IRB.CreateSelect(MaskLower, ABLowerShadow, WriteThroughLowerShadow);
5103 Value *DstShadow = IRB.CreateInsertElement(
5104 AShadow, DstLowerShadow, ConstantInt::get(IRB.getInt32Ty(), 0),
5105 "_msprop");
5106
5107 setShadow(&I, DstShadow);
5108 setOriginForNaryOp(I);
5109 }
5110
5111 // Approximately handle AVX Galois Field Affine Transformation
5112 //
5113 // e.g.,
5114 // <16 x i8> @llvm.x86.vgf2p8affineqb.128(<16 x i8>, <16 x i8>, i8)
5115 // <32 x i8> @llvm.x86.vgf2p8affineqb.256(<32 x i8>, <32 x i8>, i8)
5116 // <64 x i8> @llvm.x86.vgf2p8affineqb.512(<64 x i8>, <64 x i8>, i8)
5117 // Out A x b
5118 // where A and x are packed matrices, b is a vector,
5119 // Out = A * x + b in GF(2)
5120 //
5121 // Multiplication in GF(2) is equivalent to bitwise AND. However, the matrix
5122 // computation also includes a parity calculation.
5123 //
5124 // For the bitwise AND of bits V1 and V2, the exact shadow is:
5125 // Out_Shadow = (V1_Shadow & V2_Shadow)
5126 // | (V1 & V2_Shadow)
5127 // | (V1_Shadow & V2 )
5128 //
5129 // We approximate the shadow of gf2p8affineqb using:
5130 // Out_Shadow = gf2p8affineqb(x_Shadow, A_shadow, 0)
5131 // | gf2p8affineqb(x, A_shadow, 0)
5132 // | gf2p8affineqb(x_Shadow, A, 0)
5133 // | set1_epi8(b_Shadow)
5134 //
5135 // This approximation has false negatives: if an intermediate dot-product
5136 // contains an even number of 1's, the parity is 0.
5137 // It has no false positives.
5138 void handleAVXGF2P8Affine(IntrinsicInst &I) {
5139 IRBuilder<> IRB(&I);
5140
5141 assert(I.arg_size() == 3);
5142 Value *A = I.getOperand(0);
5143 Value *X = I.getOperand(1);
5144 Value *B = I.getOperand(2);
5145
5146 assert(isFixedIntVector(A));
5147 assert(cast<VectorType>(A->getType())
5148 ->getElementType()
5149 ->getScalarSizeInBits() == 8);
5150
5151 assert(A->getType() == X->getType());
5152
5153 assert(B->getType()->isIntegerTy());
5154 assert(B->getType()->getScalarSizeInBits() == 8);
5155
5156 assert(I.getType() == A->getType());
5157
5158 Value *AShadow = getShadow(A);
5159 Value *XShadow = getShadow(X);
5160 Value *BZeroShadow = getCleanShadow(B);
5161
5162 CallInst *AShadowXShadow = IRB.CreateIntrinsic(
5163 I.getType(), I.getIntrinsicID(), {XShadow, AShadow, BZeroShadow});
5164 CallInst *AShadowX = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5165 {X, AShadow, BZeroShadow});
5166 CallInst *XShadowA = IRB.CreateIntrinsic(I.getType(), I.getIntrinsicID(),
5167 {XShadow, A, BZeroShadow});
5168
5169 unsigned NumElements = cast<FixedVectorType>(I.getType())->getNumElements();
5170 Value *BShadow = getShadow(B);
5171 Value *BBroadcastShadow = getCleanShadow(AShadow);
5172 // There is no LLVM IR intrinsic for _mm512_set1_epi8.
5173     // This loop generates a lot of LLVM IR, which we expect CodeGen to
5174     // lower appropriately (e.g., to VPBROADCASTB).
5175 // Besides, b is often a constant, in which case it is fully initialized.
5176 for (unsigned i = 0; i < NumElements; i++)
5177 BBroadcastShadow = IRB.CreateInsertElement(BBroadcastShadow, BShadow, i);
5178
5179 setShadow(&I, IRB.CreateOr(
5180 {AShadowXShadow, AShadowX, XShadowA, BBroadcastShadow}));
5181 setOriginForNaryOp(I);
5182 }
5183
5184 // Handle Arm NEON vector load intrinsics (vld*).
5185 //
5186 // The WithLane instructions (ld[234]lane) are similar to:
5187 // call {<4 x i32>, <4 x i32>, <4 x i32>}
5188 // @llvm.aarch64.neon.ld3lane.v4i32.p0
5189 // (<4 x i32> %L1, <4 x i32> %L2, <4 x i32> %L3, i64 %lane, ptr
5190 // %A)
5191 //
5192 // The non-WithLane instructions (ld[234], ld1x[234], ld[234]r) are similar
5193 // to:
5194 // call {<8 x i8>, <8 x i8>} @llvm.aarch64.neon.ld2.v8i8.p0(ptr %A)
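  //
  // The shadow is computed by applying the same intrinsic to the shadow
  // memory of the source pointer (and, for the WithLane variants, to the
  // shadows of the input vectors).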
5195 void handleNEONVectorLoad(IntrinsicInst &I, bool WithLane) {
5196 unsigned int numArgs = I.arg_size();
5197
5198 // Return type is a struct of vectors of integers or floating-point
5199 assert(I.getType()->isStructTy());
5200 [[maybe_unused]] StructType *RetTy = cast<StructType>(I.getType());
5201 assert(RetTy->getNumElements() > 0);
5202     assert(RetTy->getElementType(0)->isIntOrIntVectorTy() ||
5203            RetTy->getElementType(0)->isFPOrFPVectorTy());
5204 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5205 assert(RetTy->getElementType(i) == RetTy->getElementType(0));
5206
5207 if (WithLane) {
5208 // 2, 3 or 4 vectors, plus lane number, plus input pointer
5209 assert(4 <= numArgs && numArgs <= 6);
5210
5211 // Return type is a struct of the input vectors
5212 assert(RetTy->getNumElements() + 2 == numArgs);
5213 for (unsigned int i = 0; i < RetTy->getNumElements(); i++)
5214 assert(I.getArgOperand(i)->getType() == RetTy->getElementType(0));
5215 } else {
5216 assert(numArgs == 1);
5217 }
5218
5219 IRBuilder<> IRB(&I);
5220
5221 SmallVector<Value *, 6> ShadowArgs;
5222 if (WithLane) {
5223 for (unsigned int i = 0; i < numArgs - 2; i++)
5224 ShadowArgs.push_back(getShadow(I.getArgOperand(i)));
5225
5226 // Lane number, passed verbatim
5227 Value *LaneNumber = I.getArgOperand(numArgs - 2);
5228 ShadowArgs.push_back(LaneNumber);
5229
5230 // TODO: blend shadow of lane number into output shadow?
5231 insertCheckShadowOf(LaneNumber, &I);
5232 }
5233
5234 Value *Src = I.getArgOperand(numArgs - 1);
5235 assert(Src->getType()->isPointerTy() && "Source is not a pointer!");
5236
5237 Type *SrcShadowTy = getShadowTy(Src);
5238 auto [SrcShadowPtr, SrcOriginPtr] =
5239 getShadowOriginPtr(Src, IRB, SrcShadowTy, Align(1), /*isStore*/ false);
5240 ShadowArgs.push_back(SrcShadowPtr);
5241
5242 // The NEON vector load instructions handled by this function all have
5243 // integer variants. It is easier to use those rather than trying to cast
5244 // a struct of vectors of floats into a struct of vectors of integers.
5245 CallInst *CI =
5246 IRB.CreateIntrinsic(getShadowTy(&I), I.getIntrinsicID(), ShadowArgs);
5247 setShadow(&I, CI);
5248
5249 if (!MS.TrackOrigins)
5250 return;
5251
5252 Value *PtrSrcOrigin = IRB.CreateLoad(MS.OriginTy, SrcOriginPtr);
5253 setOrigin(&I, PtrSrcOrigin);
5254 }
5255
5256 /// Handle Arm NEON vector store intrinsics (vst{2,3,4}, vst1x_{2,3,4},
5257 /// and vst{2,3,4}lane).
5258 ///
5259 /// Arm NEON vector store intrinsics have the output address (pointer) as the
5260 /// last argument, with the initial arguments being the inputs (and lane
5261 /// number for vst{2,3,4}lane). They return void.
5262 ///
5263 /// - st4 interleaves the output e.g., st4 (inA, inB, inC, inD, outP) writes
5264 /// abcdabcdabcdabcd... into *outP
5265 /// - st1_x4 is non-interleaved e.g., st1_x4 (inA, inB, inC, inD, outP)
5266 /// writes aaaa...bbbb...cccc...dddd... into *outP
5267 /// - st4lane has arguments of (inA, inB, inC, inD, lane, outP)
5268 /// These instructions can all be instrumented with essentially the same
5269 /// MSan logic, simply by applying the corresponding intrinsic to the shadow.
5270 void handleNEONVectorStoreIntrinsic(IntrinsicInst &I, bool useLane) {
5271 IRBuilder<> IRB(&I);
5272
5273 // Don't use getNumOperands() because it includes the callee
5274 int numArgOperands = I.arg_size();
5275
5276 // The last arg operand is the output (pointer)
5277 assert(numArgOperands >= 1);
5278 Value *Addr = I.getArgOperand(numArgOperands - 1);
5279 assert(Addr->getType()->isPointerTy());
5280 int skipTrailingOperands = 1;
5281
5282     if (ClCheckAccessAddress)
5283       insertCheckShadowOf(Addr, &I);
5284
5285 // Second-last operand is the lane number (for vst{2,3,4}lane)
5286 if (useLane) {
5287 skipTrailingOperands++;
5288 assert(numArgOperands >= static_cast<int>(skipTrailingOperands));
5289       assert(isa<IntegerType>(
5290           I.getArgOperand(numArgOperands - skipTrailingOperands)->getType()));
5291 }
5292
5293 SmallVector<Value *, 8> ShadowArgs;
5294 // All the initial operands are the inputs
5295 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++) {
5296 assert(isa<FixedVectorType>(I.getArgOperand(i)->getType()));
5297 Value *Shadow = getShadow(&I, i);
5298 ShadowArgs.append(1, Shadow);
5299 }
5300
5301     // MSan's getShadowTy assumes the LHS is the type we want the shadow for,
5302     // e.g., for:
5303 // [[TMP5:%.*]] = bitcast <16 x i8> [[TMP2]] to i128
5304 // we know the type of the output (and its shadow) is <16 x i8>.
5305 //
5306 // Arm NEON VST is unusual because the last argument is the output address:
5307 // define void @st2_16b(<16 x i8> %A, <16 x i8> %B, ptr %P) {
5308 // call void @llvm.aarch64.neon.st2.v16i8.p0
5309 // (<16 x i8> [[A]], <16 x i8> [[B]], ptr [[P]])
5310 // and we have no type information about P's operand. We must manually
5311 // compute the type (<16 x i8> x 2).
5312 FixedVectorType *OutputVectorTy = FixedVectorType::get(
5313 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getElementType(),
5314 cast<FixedVectorType>(I.getArgOperand(0)->getType())->getNumElements() *
5315 (numArgOperands - skipTrailingOperands));
5316 Type *OutputShadowTy = getShadowTy(OutputVectorTy);
5317
5318 if (useLane)
5319 ShadowArgs.append(1,
5320 I.getArgOperand(numArgOperands - skipTrailingOperands));
5321
5322 Value *OutputShadowPtr, *OutputOriginPtr;
5323 // AArch64 NEON does not need alignment (unless OS requires it)
5324 std::tie(OutputShadowPtr, OutputOriginPtr) = getShadowOriginPtr(
5325 Addr, IRB, OutputShadowTy, Align(1), /*isStore*/ true);
5326 ShadowArgs.append(1, OutputShadowPtr);
5327
5328 CallInst *CI =
5329 IRB.CreateIntrinsic(IRB.getVoidTy(), I.getIntrinsicID(), ShadowArgs);
5330 setShadow(&I, CI);
5331
5332 if (MS.TrackOrigins) {
5333 // TODO: if we modelled the vst* instruction more precisely, we could
5334 // more accurately track the origins (e.g., if both inputs are
5335 // uninitialized for vst2, we currently blame the second input, even
5336 // though part of the output depends only on the first input).
5337 //
5338 // This is particularly imprecise for vst{2,3,4}lane, since only one
5339 // lane of each input is actually copied to the output.
5340 OriginCombiner OC(this, IRB);
5341 for (int i = 0; i < numArgOperands - skipTrailingOperands; i++)
5342 OC.Add(I.getArgOperand(i));
5343
5344 const DataLayout &DL = F.getDataLayout();
5345 OC.DoneAndStoreOrigin(DL.getTypeStoreSize(OutputVectorTy),
5346 OutputOriginPtr);
5347 }
5348 }
5349
5350 /// Handle intrinsics by applying the intrinsic to the shadows.
5351 ///
5352 /// The trailing arguments are passed verbatim to the intrinsic, though any
5353 /// uninitialized trailing arguments can also taint the shadow e.g., for an
5354 /// intrinsic with one trailing verbatim argument:
5355 /// out = intrinsic(var1, var2, opType)
5356 /// we compute:
5357 /// shadow[out] =
5358 /// intrinsic(shadow[var1], shadow[var2], opType) | shadow[opType]
5359 ///
5360 /// Typically, shadowIntrinsicID will be specified by the caller to be
5361 /// I.getIntrinsicID(), but the caller can choose to replace it with another
5362 /// intrinsic of the same type.
5363 ///
5364 /// CAUTION: this assumes that the intrinsic will handle arbitrary
5365 /// bit-patterns (for example, if the intrinsic accepts floats for
5366 /// var1, we require that it doesn't care if inputs are NaNs).
5367 ///
5368 /// For example, this can be applied to the Arm NEON vector table intrinsics
5369 /// (tbl{1,2,3,4}).
5370 ///
5371 /// The origin is approximated using setOriginForNaryOp.
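  ///
  /// e.g., for <16 x i8> @llvm.aarch64.neon.tbl1.v16i8(<16 x i8> %t, <16 x i8> %idx)
  /// with one trailing verbatim argument, this computes:
  ///   shadow[out] = tbl1(shadow[%t], %idx) | shadow[%idx]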
5372 void handleIntrinsicByApplyingToShadow(IntrinsicInst &I,
5373 Intrinsic::ID shadowIntrinsicID,
5374 unsigned int trailingVerbatimArgs) {
5375 IRBuilder<> IRB(&I);
5376
5377 assert(trailingVerbatimArgs < I.arg_size());
5378
5379 SmallVector<Value *, 8> ShadowArgs;
5380 // Don't use getNumOperands() because it includes the callee
5381 for (unsigned int i = 0; i < I.arg_size() - trailingVerbatimArgs; i++) {
5382 Value *Shadow = getShadow(&I, i);
5383
5384 // Shadows are integer-ish types but some intrinsics require a
5385 // different (e.g., floating-point) type.
5386 ShadowArgs.push_back(
5387 IRB.CreateBitCast(Shadow, I.getArgOperand(i)->getType()));
5388 }
5389
5390 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5391 i++) {
5392 Value *Arg = I.getArgOperand(i);
5393 ShadowArgs.push_back(Arg);
5394 }
5395
5396 CallInst *CI =
5397 IRB.CreateIntrinsic(I.getType(), shadowIntrinsicID, ShadowArgs);
5398 Value *CombinedShadow = CI;
5399
5400 // Combine the computed shadow with the shadow of trailing args
5401 for (unsigned int i = I.arg_size() - trailingVerbatimArgs; i < I.arg_size();
5402 i++) {
5403 Value *Shadow =
5404 CreateShadowCast(IRB, getShadow(&I, i), CombinedShadow->getType());
5405 CombinedShadow = IRB.CreateOr(Shadow, CombinedShadow, "_msprop");
5406 }
5407
5408 setShadow(&I, IRB.CreateBitCast(CombinedShadow, getShadowTy(&I)));
5409
5410 setOriginForNaryOp(I);
5411 }
5412
5413 // Approximation only
5414 //
5415 // e.g., <16 x i8> @llvm.aarch64.neon.pmull64(i64, i64)
5416 void handleNEONVectorMultiplyIntrinsic(IntrinsicInst &I) {
5417 assert(I.arg_size() == 2);
5418
5419 handleShadowOr(I);
5420 }
5421
5422 bool maybeHandleCrossPlatformIntrinsic(IntrinsicInst &I) {
5423 switch (I.getIntrinsicID()) {
5424 case Intrinsic::uadd_with_overflow:
5425 case Intrinsic::sadd_with_overflow:
5426 case Intrinsic::usub_with_overflow:
5427 case Intrinsic::ssub_with_overflow:
5428 case Intrinsic::umul_with_overflow:
5429 case Intrinsic::smul_with_overflow:
5430 handleArithmeticWithOverflow(I);
5431 break;
5432 case Intrinsic::abs:
5433 handleAbsIntrinsic(I);
5434 break;
5435 case Intrinsic::bitreverse:
5436 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
5437 /*trailingVerbatimArgs*/ 0);
5438 break;
5439 case Intrinsic::is_fpclass:
5440 handleIsFpClass(I);
5441 break;
5442 case Intrinsic::lifetime_start:
5443 handleLifetimeStart(I);
5444 break;
5445 case Intrinsic::launder_invariant_group:
5446 case Intrinsic::strip_invariant_group:
5447 handleInvariantGroup(I);
5448 break;
5449 case Intrinsic::bswap:
5450 handleBswap(I);
5451 break;
5452 case Intrinsic::ctlz:
5453 case Intrinsic::cttz:
5454 handleCountLeadingTrailingZeros(I);
5455 break;
5456 case Intrinsic::masked_compressstore:
5457 handleMaskedCompressStore(I);
5458 break;
5459 case Intrinsic::masked_expandload:
5460 handleMaskedExpandLoad(I);
5461 break;
5462 case Intrinsic::masked_gather:
5463 handleMaskedGather(I);
5464 break;
5465 case Intrinsic::masked_scatter:
5466 handleMaskedScatter(I);
5467 break;
5468 case Intrinsic::masked_store:
5469 handleMaskedStore(I);
5470 break;
5471 case Intrinsic::masked_load:
5472 handleMaskedLoad(I);
5473 break;
5474 case Intrinsic::vector_reduce_and:
5475 handleVectorReduceAndIntrinsic(I);
5476 break;
5477 case Intrinsic::vector_reduce_or:
5478 handleVectorReduceOrIntrinsic(I);
5479 break;
5480
5481 case Intrinsic::vector_reduce_add:
5482 case Intrinsic::vector_reduce_xor:
5483 case Intrinsic::vector_reduce_mul:
5484 // Signed/Unsigned Min/Max
5485 // TODO: handling similarly to AND/OR may be more precise.
5486 case Intrinsic::vector_reduce_smax:
5487 case Intrinsic::vector_reduce_smin:
5488 case Intrinsic::vector_reduce_umax:
5489 case Intrinsic::vector_reduce_umin:
5490 // TODO: this has no false positives, but arguably we should check that all
5491 // the bits are initialized.
5492 case Intrinsic::vector_reduce_fmax:
5493 case Intrinsic::vector_reduce_fmin:
5494 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/false);
5495 break;
5496
5497 case Intrinsic::vector_reduce_fadd:
5498 case Intrinsic::vector_reduce_fmul:
5499 handleVectorReduceWithStarterIntrinsic(I);
5500 break;
5501
5502 case Intrinsic::scmp:
5503 case Intrinsic::ucmp: {
5504 handleShadowOr(I);
5505 break;
5506 }
5507
5508 case Intrinsic::fshl:
5509 case Intrinsic::fshr:
5510 handleFunnelShift(I);
5511 break;
5512
5513 case Intrinsic::is_constant:
5514 // The result of llvm.is.constant() is always defined.
5515 setShadow(&I, getCleanShadow(&I));
5516 setOrigin(&I, getCleanOrigin());
5517 break;
5518
5519 default:
5520 return false;
5521 }
5522
5523 return true;
5524 }
5525
5526 bool maybeHandleX86SIMDIntrinsic(IntrinsicInst &I) {
5527 switch (I.getIntrinsicID()) {
5528 case Intrinsic::x86_sse_stmxcsr:
5529 handleStmxcsr(I);
5530 break;
5531 case Intrinsic::x86_sse_ldmxcsr:
5532 handleLdmxcsr(I);
5533 break;
5534
5535 // Convert Scalar Double Precision Floating-Point Value
5536 // to Unsigned Doubleword Integer
5537 // etc.
5538 case Intrinsic::x86_avx512_vcvtsd2usi64:
5539 case Intrinsic::x86_avx512_vcvtsd2usi32:
5540 case Intrinsic::x86_avx512_vcvtss2usi64:
5541 case Intrinsic::x86_avx512_vcvtss2usi32:
5542 case Intrinsic::x86_avx512_cvttss2usi64:
5543 case Intrinsic::x86_avx512_cvttss2usi:
5544 case Intrinsic::x86_avx512_cvttsd2usi64:
5545 case Intrinsic::x86_avx512_cvttsd2usi:
5546 case Intrinsic::x86_avx512_cvtusi2ss:
5547 case Intrinsic::x86_avx512_cvtusi642sd:
5548 case Intrinsic::x86_avx512_cvtusi642ss:
5549 handleSSEVectorConvertIntrinsic(I, 1, true);
5550 break;
5551 case Intrinsic::x86_sse2_cvtsd2si64:
5552 case Intrinsic::x86_sse2_cvtsd2si:
5553 case Intrinsic::x86_sse2_cvtsd2ss:
5554 case Intrinsic::x86_sse2_cvttsd2si64:
5555 case Intrinsic::x86_sse2_cvttsd2si:
5556 case Intrinsic::x86_sse_cvtss2si64:
5557 case Intrinsic::x86_sse_cvtss2si:
5558 case Intrinsic::x86_sse_cvttss2si64:
5559 case Intrinsic::x86_sse_cvttss2si:
5560 handleSSEVectorConvertIntrinsic(I, 1);
5561 break;
5562 case Intrinsic::x86_sse_cvtps2pi:
5563 case Intrinsic::x86_sse_cvttps2pi:
5564 handleSSEVectorConvertIntrinsic(I, 2);
5565 break;
5566
5567 // TODO:
5568 // <1 x i64> @llvm.x86.sse.cvtpd2pi(<2 x double>)
5569 // <2 x double> @llvm.x86.sse.cvtpi2pd(<1 x i64>)
5570 // <4 x float> @llvm.x86.sse.cvtpi2ps(<4 x float>, <1 x i64>)
5571
5572 case Intrinsic::x86_vcvtps2ph_128:
5573 case Intrinsic::x86_vcvtps2ph_256: {
5574 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/true);
5575 break;
5576 }
5577
5578 // Convert Packed Single Precision Floating-Point Values
5579 // to Packed Signed Doubleword Integer Values
5580 //
5581 // <16 x i32> @llvm.x86.avx512.mask.cvtps2dq.512
5582 // (<16 x float>, <16 x i32>, i16, i32)
5583 case Intrinsic::x86_avx512_mask_cvtps2dq_512:
5584 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/false);
5585 break;
5586
5587 // Convert Packed Double Precision Floating-Point Values
5588 // to Packed Single Precision Floating-Point Values
5589 case Intrinsic::x86_sse2_cvtpd2ps:
5590 case Intrinsic::x86_sse2_cvtps2dq:
5591 case Intrinsic::x86_sse2_cvtpd2dq:
5592 case Intrinsic::x86_sse2_cvttps2dq:
5593 case Intrinsic::x86_sse2_cvttpd2dq:
5594 case Intrinsic::x86_avx_cvt_pd2_ps_256:
5595 case Intrinsic::x86_avx_cvt_ps2dq_256:
5596 case Intrinsic::x86_avx_cvt_pd2dq_256:
5597 case Intrinsic::x86_avx_cvtt_ps2dq_256:
5598 case Intrinsic::x86_avx_cvtt_pd2dq_256: {
5599 handleSSEVectorConvertIntrinsicByProp(I, /*HasRoundingMode=*/false);
5600 break;
5601 }
5602
5603 // Convert Single-Precision FP Value to 16-bit FP Value
5604 // <16 x i16> @llvm.x86.avx512.mask.vcvtps2ph.512
5605 // (<16 x float>, i32, <16 x i16>, i16)
5606 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.128
5607 // (<4 x float>, i32, <8 x i16>, i8)
5608 // <8 x i16> @llvm.x86.avx512.mask.vcvtps2ph.256
5609 // (<8 x float>, i32, <8 x i16>, i8)
5610 case Intrinsic::x86_avx512_mask_vcvtps2ph_512:
5611 case Intrinsic::x86_avx512_mask_vcvtps2ph_256:
5612 case Intrinsic::x86_avx512_mask_vcvtps2ph_128:
5613 handleAVX512VectorConvertFPToInt(I, /*LastMask=*/true);
5614 break;
5615
5616 // Shift Packed Data (Left Logical, Right Arithmetic, Right Logical)
5617 case Intrinsic::x86_avx512_psll_w_512:
5618 case Intrinsic::x86_avx512_psll_d_512:
5619 case Intrinsic::x86_avx512_psll_q_512:
5620 case Intrinsic::x86_avx512_pslli_w_512:
5621 case Intrinsic::x86_avx512_pslli_d_512:
5622 case Intrinsic::x86_avx512_pslli_q_512:
5623 case Intrinsic::x86_avx512_psrl_w_512:
5624 case Intrinsic::x86_avx512_psrl_d_512:
5625 case Intrinsic::x86_avx512_psrl_q_512:
5626 case Intrinsic::x86_avx512_psra_w_512:
5627 case Intrinsic::x86_avx512_psra_d_512:
5628 case Intrinsic::x86_avx512_psra_q_512:
5629 case Intrinsic::x86_avx512_psrli_w_512:
5630 case Intrinsic::x86_avx512_psrli_d_512:
5631 case Intrinsic::x86_avx512_psrli_q_512:
5632 case Intrinsic::x86_avx512_psrai_w_512:
5633 case Intrinsic::x86_avx512_psrai_d_512:
5634 case Intrinsic::x86_avx512_psrai_q_512:
5635 case Intrinsic::x86_avx512_psra_q_256:
5636 case Intrinsic::x86_avx512_psra_q_128:
5637 case Intrinsic::x86_avx512_psrai_q_256:
5638 case Intrinsic::x86_avx512_psrai_q_128:
5639 case Intrinsic::x86_avx2_psll_w:
5640 case Intrinsic::x86_avx2_psll_d:
5641 case Intrinsic::x86_avx2_psll_q:
5642 case Intrinsic::x86_avx2_pslli_w:
5643 case Intrinsic::x86_avx2_pslli_d:
5644 case Intrinsic::x86_avx2_pslli_q:
5645 case Intrinsic::x86_avx2_psrl_w:
5646 case Intrinsic::x86_avx2_psrl_d:
5647 case Intrinsic::x86_avx2_psrl_q:
5648 case Intrinsic::x86_avx2_psra_w:
5649 case Intrinsic::x86_avx2_psra_d:
5650 case Intrinsic::x86_avx2_psrli_w:
5651 case Intrinsic::x86_avx2_psrli_d:
5652 case Intrinsic::x86_avx2_psrli_q:
5653 case Intrinsic::x86_avx2_psrai_w:
5654 case Intrinsic::x86_avx2_psrai_d:
5655 case Intrinsic::x86_sse2_psll_w:
5656 case Intrinsic::x86_sse2_psll_d:
5657 case Intrinsic::x86_sse2_psll_q:
5658 case Intrinsic::x86_sse2_pslli_w:
5659 case Intrinsic::x86_sse2_pslli_d:
5660 case Intrinsic::x86_sse2_pslli_q:
5661 case Intrinsic::x86_sse2_psrl_w:
5662 case Intrinsic::x86_sse2_psrl_d:
5663 case Intrinsic::x86_sse2_psrl_q:
5664 case Intrinsic::x86_sse2_psra_w:
5665 case Intrinsic::x86_sse2_psra_d:
5666 case Intrinsic::x86_sse2_psrli_w:
5667 case Intrinsic::x86_sse2_psrli_d:
5668 case Intrinsic::x86_sse2_psrli_q:
5669 case Intrinsic::x86_sse2_psrai_w:
5670 case Intrinsic::x86_sse2_psrai_d:
5671 case Intrinsic::x86_mmx_psll_w:
5672 case Intrinsic::x86_mmx_psll_d:
5673 case Intrinsic::x86_mmx_psll_q:
5674 case Intrinsic::x86_mmx_pslli_w:
5675 case Intrinsic::x86_mmx_pslli_d:
5676 case Intrinsic::x86_mmx_pslli_q:
5677 case Intrinsic::x86_mmx_psrl_w:
5678 case Intrinsic::x86_mmx_psrl_d:
5679 case Intrinsic::x86_mmx_psrl_q:
5680 case Intrinsic::x86_mmx_psra_w:
5681 case Intrinsic::x86_mmx_psra_d:
5682 case Intrinsic::x86_mmx_psrli_w:
5683 case Intrinsic::x86_mmx_psrli_d:
5684 case Intrinsic::x86_mmx_psrli_q:
5685 case Intrinsic::x86_mmx_psrai_w:
5686 case Intrinsic::x86_mmx_psrai_d:
5687 handleVectorShiftIntrinsic(I, /* Variable */ false);
5688 break;
5689 case Intrinsic::x86_avx2_psllv_d:
5690 case Intrinsic::x86_avx2_psllv_d_256:
5691 case Intrinsic::x86_avx512_psllv_d_512:
5692 case Intrinsic::x86_avx2_psllv_q:
5693 case Intrinsic::x86_avx2_psllv_q_256:
5694 case Intrinsic::x86_avx512_psllv_q_512:
5695 case Intrinsic::x86_avx2_psrlv_d:
5696 case Intrinsic::x86_avx2_psrlv_d_256:
5697 case Intrinsic::x86_avx512_psrlv_d_512:
5698 case Intrinsic::x86_avx2_psrlv_q:
5699 case Intrinsic::x86_avx2_psrlv_q_256:
5700 case Intrinsic::x86_avx512_psrlv_q_512:
5701 case Intrinsic::x86_avx2_psrav_d:
5702 case Intrinsic::x86_avx2_psrav_d_256:
5703 case Intrinsic::x86_avx512_psrav_d_512:
5704 case Intrinsic::x86_avx512_psrav_q_128:
5705 case Intrinsic::x86_avx512_psrav_q_256:
5706 case Intrinsic::x86_avx512_psrav_q_512:
5707 handleVectorShiftIntrinsic(I, /* Variable */ true);
5708 break;
5709
5710 // Pack with Signed/Unsigned Saturation
5711 case Intrinsic::x86_sse2_packsswb_128:
5712 case Intrinsic::x86_sse2_packssdw_128:
5713 case Intrinsic::x86_sse2_packuswb_128:
5714 case Intrinsic::x86_sse41_packusdw:
5715 case Intrinsic::x86_avx2_packsswb:
5716 case Intrinsic::x86_avx2_packssdw:
5717 case Intrinsic::x86_avx2_packuswb:
5718 case Intrinsic::x86_avx2_packusdw:
5719 // e.g., <64 x i8> @llvm.x86.avx512.packsswb.512
5720 // (<32 x i16> %a, <32 x i16> %b)
5721 // <32 x i16> @llvm.x86.avx512.packssdw.512
5722 // (<16 x i32> %a, <16 x i32> %b)
5723 // Note: AVX512 masked variants are auto-upgraded by LLVM.
5724 case Intrinsic::x86_avx512_packsswb_512:
5725 case Intrinsic::x86_avx512_packssdw_512:
5726 case Intrinsic::x86_avx512_packuswb_512:
5727 case Intrinsic::x86_avx512_packusdw_512:
5728 handleVectorPackIntrinsic(I);
5729 break;
5730
5731 case Intrinsic::x86_sse41_pblendvb:
5732 case Intrinsic::x86_sse41_blendvpd:
5733 case Intrinsic::x86_sse41_blendvps:
5734 case Intrinsic::x86_avx_blendv_pd_256:
5735 case Intrinsic::x86_avx_blendv_ps_256:
5736 case Intrinsic::x86_avx2_pblendvb:
5737 handleBlendvIntrinsic(I);
5738 break;
5739
5740 case Intrinsic::x86_avx_dp_ps_256:
5741 case Intrinsic::x86_sse41_dppd:
5742 case Intrinsic::x86_sse41_dpps:
5743 handleDppIntrinsic(I);
5744 break;
5745
5746 case Intrinsic::x86_mmx_packsswb:
5747 case Intrinsic::x86_mmx_packuswb:
5748 handleVectorPackIntrinsic(I, 16);
5749 break;
5750
5751 case Intrinsic::x86_mmx_packssdw:
5752 handleVectorPackIntrinsic(I, 32);
5753 break;
5754
5755 case Intrinsic::x86_mmx_psad_bw:
5756 handleVectorSadIntrinsic(I, true);
5757 break;
5758 case Intrinsic::x86_sse2_psad_bw:
5759 case Intrinsic::x86_avx2_psad_bw:
5760 handleVectorSadIntrinsic(I);
5761 break;
5762
5763 // Multiply and Add Packed Words
5764 // < 4 x i32> @llvm.x86.sse2.pmadd.wd(<8 x i16>, <8 x i16>)
5765 // < 8 x i32> @llvm.x86.avx2.pmadd.wd(<16 x i16>, <16 x i16>)
5766 // <16 x i32> @llvm.x86.avx512.pmaddw.d.512(<32 x i16>, <32 x i16>)
5767 //
5768 // Multiply and Add Packed Signed and Unsigned Bytes
5769 // < 8 x i16> @llvm.x86.ssse3.pmadd.ub.sw.128(<16 x i8>, <16 x i8>)
5770 // <16 x i16> @llvm.x86.avx2.pmadd.ub.sw(<32 x i8>, <32 x i8>)
5771 // <32 x i16> @llvm.x86.avx512.pmaddubs.w.512(<64 x i8>, <64 x i8>)
5772 //
5773 // These intrinsics are auto-upgraded into non-masked forms:
5774 // < 4 x i32> @llvm.x86.avx512.mask.pmaddw.d.128
5775 // (<8 x i16>, <8 x i16>, <4 x i32>, i8)
5776 // < 8 x i32> @llvm.x86.avx512.mask.pmaddw.d.256
5777 // (<16 x i16>, <16 x i16>, <8 x i32>, i8)
5778 // <16 x i32> @llvm.x86.avx512.mask.pmaddw.d.512
5779 // (<32 x i16>, <32 x i16>, <16 x i32>, i16)
5780 // < 8 x i16> @llvm.x86.avx512.mask.pmaddubs.w.128
5781 // (<16 x i8>, <16 x i8>, <8 x i16>, i8)
5782 // <16 x i16> @llvm.x86.avx512.mask.pmaddubs.w.256
5783 // (<32 x i8>, <32 x i8>, <16 x i16>, i16)
5784 // <32 x i16> @llvm.x86.avx512.mask.pmaddubs.w.512
5785 // (<64 x i8>, <64 x i8>, <32 x i16>, i32)
5786 case Intrinsic::x86_sse2_pmadd_wd:
5787 case Intrinsic::x86_avx2_pmadd_wd:
5788 case Intrinsic::x86_avx512_pmaddw_d_512:
5789 case Intrinsic::x86_ssse3_pmadd_ub_sw_128:
5790 case Intrinsic::x86_avx2_pmadd_ub_sw:
5791 case Intrinsic::x86_avx512_pmaddubs_w_512:
5792 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
5793 /*ZeroPurifies=*/true);
5794 break;
5795
5796 // <1 x i64> @llvm.x86.ssse3.pmadd.ub.sw(<1 x i64>, <1 x i64>)
5797 case Intrinsic::x86_ssse3_pmadd_ub_sw:
5798 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
5799 /*ZeroPurifies=*/true, /*EltSizeInBits=*/8);
5800 break;
5801
5802 // <1 x i64> @llvm.x86.mmx.pmadd.wd(<1 x i64>, <1 x i64>)
5803 case Intrinsic::x86_mmx_pmadd_wd:
5804 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
5805 /*ZeroPurifies=*/true, /*EltSizeInBits=*/16);
5806 break;
5807
5808 // AVX Vector Neural Network Instructions: bytes
5809 //
5810 // Multiply and Add Signed Bytes
5811 // < 4 x i32> @llvm.x86.avx2.vpdpbssd.128
5812 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5813 // < 8 x i32> @llvm.x86.avx2.vpdpbssd.256
5814 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5815 // <16 x i32> @llvm.x86.avx10.vpdpbssd.512
5816 // (<16 x i32>, <64 x i8>, <64 x i8>)
5817 //
5818 // Multiply and Add Signed Bytes With Saturation
5819 // < 4 x i32> @llvm.x86.avx2.vpdpbssds.128
5820 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5821 // < 8 x i32> @llvm.x86.avx2.vpdpbssds.256
5822 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5823 // <16 x i32> @llvm.x86.avx10.vpdpbssds.512
5824 // (<16 x i32>, <64 x i8>, <64 x i8>)
5825 //
5826 // Multiply and Add Signed and Unsigned Bytes
5827 // < 4 x i32> @llvm.x86.avx2.vpdpbsud.128
5828 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5829 // < 8 x i32> @llvm.x86.avx2.vpdpbsud.256
5830 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5831 // <16 x i32> @llvm.x86.avx10.vpdpbsud.512
5832 // (<16 x i32>, <64 x i8>, <64 x i8>)
5833 //
5834 // Multiply and Add Signed and Unsigned Bytes With Saturation
5835 // < 4 x i32> @llvm.x86.avx2.vpdpbsuds.128
5836 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5837 // < 8 x i32> @llvm.x86.avx2.vpdpbsuds.256
5838 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5839 // <16 x i32> @llvm.x86.avx512.vpdpbusds.512
5840 // (<16 x i32>, <64 x i8>, <64 x i8>)
5841 //
5842 // Multiply and Add Unsigned and Signed Bytes
5843 // < 4 x i32> @llvm.x86.avx512.vpdpbusd.128
5844 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5845 // < 8 x i32> @llvm.x86.avx512.vpdpbusd.256
5846 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5847 // <16 x i32> @llvm.x86.avx512.vpdpbusd.512
5848 // (<16 x i32>, <64 x i8>, <64 x i8>)
5849 //
5850 // Multiply and Add Unsigned and Signed Bytes With Saturation
5851 // < 4 x i32> @llvm.x86.avx512.vpdpbusds.128
5852 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5853 // < 8 x i32> @llvm.x86.avx512.vpdpbusds.256
5854 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5855 // <16 x i32> @llvm.x86.avx10.vpdpbsuds.512
5856 // (<16 x i32>, <64 x i8>, <64 x i8>)
5857 //
5858 // Multiply and Add Unsigned Bytes
5859 // < 4 x i32> @llvm.x86.avx2.vpdpbuud.128
5860 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5861 // < 8 x i32> @llvm.x86.avx2.vpdpbuud.256
5862 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5863 // <16 x i32> @llvm.x86.avx10.vpdpbuud.512
5864 // (<16 x i32>, <64 x i8>, <64 x i8>)
5865 //
5866 // Multiply and Add Unsigned Bytes With Saturation
5867 // < 4 x i32> @llvm.x86.avx2.vpdpbuuds.128
5868 // (< 4 x i32>, <16 x i8>, <16 x i8>)
5869 // < 8 x i32> @llvm.x86.avx2.vpdpbuuds.256
5870 // (< 8 x i32>, <32 x i8>, <32 x i8>)
5871 // <16 x i32> @llvm.x86.avx10.vpdpbuuds.512
5872 // (<16 x i32>, <64 x i8>, <64 x i8>)
5873 //
5874 // These intrinsics are auto-upgraded into non-masked forms:
5875 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusd.128
5876 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5877 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusd.128
5878 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5879 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusd.256
5880 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5881 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusd.256
5882 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5883 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusd.512
5884 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5885 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusd.512
5886 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5887 //
5888 // <4 x i32> @llvm.x86.avx512.mask.vpdpbusds.128
5889 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5890 // <4 x i32> @llvm.x86.avx512.maskz.vpdpbusds.128
5891 // (<4 x i32>, <16 x i8>, <16 x i8>, i8)
5892 // <8 x i32> @llvm.x86.avx512.mask.vpdpbusds.256
5893 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5894 // <8 x i32> @llvm.x86.avx512.maskz.vpdpbusds.256
5895 // (<8 x i32>, <32 x i8>, <32 x i8>, i8)
5896 // <16 x i32> @llvm.x86.avx512.mask.vpdpbusds.512
5897 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5898 // <16 x i32> @llvm.x86.avx512.maskz.vpdpbusds.512
5899 // (<16 x i32>, <64 x i8>, <64 x i8>, i16)
5900 case Intrinsic::x86_avx512_vpdpbusd_128:
5901 case Intrinsic::x86_avx512_vpdpbusd_256:
5902 case Intrinsic::x86_avx512_vpdpbusd_512:
5903 case Intrinsic::x86_avx512_vpdpbusds_128:
5904 case Intrinsic::x86_avx512_vpdpbusds_256:
5905 case Intrinsic::x86_avx512_vpdpbusds_512:
5906 case Intrinsic::x86_avx2_vpdpbssd_128:
5907 case Intrinsic::x86_avx2_vpdpbssd_256:
5908 case Intrinsic::x86_avx10_vpdpbssd_512:
5909 case Intrinsic::x86_avx2_vpdpbssds_128:
5910 case Intrinsic::x86_avx2_vpdpbssds_256:
5911 case Intrinsic::x86_avx10_vpdpbssds_512:
5912 case Intrinsic::x86_avx2_vpdpbsud_128:
5913 case Intrinsic::x86_avx2_vpdpbsud_256:
5914 case Intrinsic::x86_avx10_vpdpbsud_512:
5915 case Intrinsic::x86_avx2_vpdpbsuds_128:
5916 case Intrinsic::x86_avx2_vpdpbsuds_256:
5917 case Intrinsic::x86_avx10_vpdpbsuds_512:
5918 case Intrinsic::x86_avx2_vpdpbuud_128:
5919 case Intrinsic::x86_avx2_vpdpbuud_256:
5920 case Intrinsic::x86_avx10_vpdpbuud_512:
5921 case Intrinsic::x86_avx2_vpdpbuuds_128:
5922 case Intrinsic::x86_avx2_vpdpbuuds_256:
5923 case Intrinsic::x86_avx10_vpdpbuuds_512:
5924 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/4,
5925 /*ZeroPurifies=*/true);
5926 break;
5927
5928 // AVX Vector Neural Network Instructions: words
5929 //
5930 // Multiply and Add Signed Word Integers
5931 // < 4 x i32> @llvm.x86.avx512.vpdpwssd.128
5932 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5933 // < 8 x i32> @llvm.x86.avx512.vpdpwssd.256
5934 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5935 // <16 x i32> @llvm.x86.avx512.vpdpwssd.512
5936 // (<16 x i32>, <32 x i16>, <32 x i16>)
5937 //
5938 // Multiply and Add Signed Word Integers With Saturation
5939 // < 4 x i32> @llvm.x86.avx512.vpdpwssds.128
5940 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5941 // < 8 x i32> @llvm.x86.avx512.vpdpwssds.256
5942 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5943 // <16 x i32> @llvm.x86.avx512.vpdpwssds.512
5944 // (<16 x i32>, <32 x i16>, <32 x i16>)
5945 //
5946 // Multiply and Add Signed and Unsigned Word Integers
5947 // < 4 x i32> @llvm.x86.avx2.vpdpwsud.128
5948 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5949 // < 8 x i32> @llvm.x86.avx2.vpdpwsud.256
5950 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5951 // <16 x i32> @llvm.x86.avx10.vpdpwsud.512
5952 // (<16 x i32>, <32 x i16>, <32 x i16>)
5953 //
5954 // Multiply and Add Signed and Unsigned Word Integers With Saturation
5955 // < 4 x i32> @llvm.x86.avx2.vpdpwsuds.128
5956 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5957 // < 8 x i32> @llvm.x86.avx2.vpdpwsuds.256
5958 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5959 // <16 x i32> @llvm.x86.avx10.vpdpwsuds.512
5960 // (<16 x i32>, <32 x i16>, <32 x i16>)
5961 //
5962 // Multiply and Add Unsigned and Signed Word Integers
5963 // < 4 x i32> @llvm.x86.avx2.vpdpwusd.128
5964 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5965 // < 8 x i32> @llvm.x86.avx2.vpdpwusd.256
5966 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5967 // <16 x i32> @llvm.x86.avx10.vpdpwusd.512
5968 // (<16 x i32>, <32 x i16>, <32 x i16>)
5969 //
5970 // Multiply and Add Unsigned and Signed Word Integers With Saturation
5971 // < 4 x i32> @llvm.x86.avx2.vpdpwusds.128
5972 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5973 // < 8 x i32> @llvm.x86.avx2.vpdpwusds.256
5974 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5975 // <16 x i32> @llvm.x86.avx10.vpdpwusds.512
5976 // (<16 x i32>, <32 x i16>, <32 x i16>)
5977 //
5978 // Multiply and Add Unsigned and Unsigned Word Integers
5979 // < 4 x i32> @llvm.x86.avx2.vpdpwuud.128
5980 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5981 // < 8 x i32> @llvm.x86.avx2.vpdpwuud.256
5982 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5983 // <16 x i32> @llvm.x86.avx10.vpdpwuud.512
5984 // (<16 x i32>, <32 x i16>, <32 x i16>)
5985 //
5986 // Multiply and Add Unsigned and Unsigned Word Integers With Saturation
5987 // < 4 x i32> @llvm.x86.avx2.vpdpwuuds.128
5988 // (< 4 x i32>, < 8 x i16>, < 8 x i16>)
5989 // < 8 x i32> @llvm.x86.avx2.vpdpwuuds.256
5990 // (< 8 x i32>, <16 x i16>, <16 x i16>)
5991 // <16 x i32> @llvm.x86.avx10.vpdpwuuds.512
5992 // (<16 x i32>, <32 x i16>, <32 x i16>)
5993 //
5994 // These intrinsics are auto-upgraded into non-masked forms:
5995 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssd.128
5996 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
5997 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssd.128
5998 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
5999 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssd.256
6000 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6001 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssd.256
6002 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6003 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssd.512
6004 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6005 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssd.512
6006 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6007 //
6008 // <4 x i32> @llvm.x86.avx512.mask.vpdpwssds.128
6009 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6010 // <4 x i32> @llvm.x86.avx512.maskz.vpdpwssds.128
6011 // (<4 x i32>, <8 x i16>, <8 x i16>, i8)
6012 // <8 x i32> @llvm.x86.avx512.mask.vpdpwssds.256
6013 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6014 // <8 x i32> @llvm.x86.avx512.maskz.vpdpwssds.256
6015 // (<8 x i32>, <16 x i16>, <16 x i16>, i8)
6016 // <16 x i32> @llvm.x86.avx512.mask.vpdpwssds.512
6017 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6018 // <16 x i32> @llvm.x86.avx512.maskz.vpdpwssds.512
6019 // (<16 x i32>, <32 x i16>, <32 x i16>, i16)
6020 case Intrinsic::x86_avx512_vpdpwssd_128:
6021 case Intrinsic::x86_avx512_vpdpwssd_256:
6022 case Intrinsic::x86_avx512_vpdpwssd_512:
6023 case Intrinsic::x86_avx512_vpdpwssds_128:
6024 case Intrinsic::x86_avx512_vpdpwssds_256:
6025 case Intrinsic::x86_avx512_vpdpwssds_512:
6026 case Intrinsic::x86_avx2_vpdpwsud_128:
6027 case Intrinsic::x86_avx2_vpdpwsud_256:
6028 case Intrinsic::x86_avx10_vpdpwsud_512:
6029 case Intrinsic::x86_avx2_vpdpwsuds_128:
6030 case Intrinsic::x86_avx2_vpdpwsuds_256:
6031 case Intrinsic::x86_avx10_vpdpwsuds_512:
6032 case Intrinsic::x86_avx2_vpdpwusd_128:
6033 case Intrinsic::x86_avx2_vpdpwusd_256:
6034 case Intrinsic::x86_avx10_vpdpwusd_512:
6035 case Intrinsic::x86_avx2_vpdpwusds_128:
6036 case Intrinsic::x86_avx2_vpdpwusds_256:
6037 case Intrinsic::x86_avx10_vpdpwusds_512:
6038 case Intrinsic::x86_avx2_vpdpwuud_128:
6039 case Intrinsic::x86_avx2_vpdpwuud_256:
6040 case Intrinsic::x86_avx10_vpdpwuud_512:
6041 case Intrinsic::x86_avx2_vpdpwuuds_128:
6042 case Intrinsic::x86_avx2_vpdpwuuds_256:
6043 case Intrinsic::x86_avx10_vpdpwuuds_512:
6044 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
6045 /*ZeroPurifies=*/true);
6046 break;
6047
6048 // Dot Product of BF16 Pairs Accumulated Into Packed Single
6049 // Precision
6050 // <4 x float> @llvm.x86.avx512bf16.dpbf16ps.128
6051 // (<4 x float>, <8 x bfloat>, <8 x bfloat>)
6052 // <8 x float> @llvm.x86.avx512bf16.dpbf16ps.256
6053 // (<8 x float>, <16 x bfloat>, <16 x bfloat>)
6054 // <16 x float> @llvm.x86.avx512bf16.dpbf16ps.512
6055 // (<16 x float>, <32 x bfloat>, <32 x bfloat>)
6056 case Intrinsic::x86_avx512bf16_dpbf16ps_128:
6057 case Intrinsic::x86_avx512bf16_dpbf16ps_256:
6058 case Intrinsic::x86_avx512bf16_dpbf16ps_512:
6059 handleVectorPmaddIntrinsic(I, /*ReductionFactor=*/2,
6060 /*ZeroPurifies=*/false);
6061 break;
6062
6063 case Intrinsic::x86_sse_cmp_ss:
6064 case Intrinsic::x86_sse2_cmp_sd:
6065 case Intrinsic::x86_sse_comieq_ss:
6066 case Intrinsic::x86_sse_comilt_ss:
6067 case Intrinsic::x86_sse_comile_ss:
6068 case Intrinsic::x86_sse_comigt_ss:
6069 case Intrinsic::x86_sse_comige_ss:
6070 case Intrinsic::x86_sse_comineq_ss:
6071 case Intrinsic::x86_sse_ucomieq_ss:
6072 case Intrinsic::x86_sse_ucomilt_ss:
6073 case Intrinsic::x86_sse_ucomile_ss:
6074 case Intrinsic::x86_sse_ucomigt_ss:
6075 case Intrinsic::x86_sse_ucomige_ss:
6076 case Intrinsic::x86_sse_ucomineq_ss:
6077 case Intrinsic::x86_sse2_comieq_sd:
6078 case Intrinsic::x86_sse2_comilt_sd:
6079 case Intrinsic::x86_sse2_comile_sd:
6080 case Intrinsic::x86_sse2_comigt_sd:
6081 case Intrinsic::x86_sse2_comige_sd:
6082 case Intrinsic::x86_sse2_comineq_sd:
6083 case Intrinsic::x86_sse2_ucomieq_sd:
6084 case Intrinsic::x86_sse2_ucomilt_sd:
6085 case Intrinsic::x86_sse2_ucomile_sd:
6086 case Intrinsic::x86_sse2_ucomigt_sd:
6087 case Intrinsic::x86_sse2_ucomige_sd:
6088 case Intrinsic::x86_sse2_ucomineq_sd:
6089 handleVectorCompareScalarIntrinsic(I);
6090 break;
6091
6092 case Intrinsic::x86_avx_cmp_pd_256:
6093 case Intrinsic::x86_avx_cmp_ps_256:
6094 case Intrinsic::x86_sse2_cmp_pd:
6095 case Intrinsic::x86_sse_cmp_ps:
6096 handleVectorComparePackedIntrinsic(I);
6097 break;
6098
6099 case Intrinsic::x86_bmi_bextr_32:
6100 case Intrinsic::x86_bmi_bextr_64:
6101 case Intrinsic::x86_bmi_bzhi_32:
6102 case Intrinsic::x86_bmi_bzhi_64:
6103 case Intrinsic::x86_bmi_pdep_32:
6104 case Intrinsic::x86_bmi_pdep_64:
6105 case Intrinsic::x86_bmi_pext_32:
6106 case Intrinsic::x86_bmi_pext_64:
6107 handleBmiIntrinsic(I);
6108 break;
6109
6110 case Intrinsic::x86_pclmulqdq:
6111 case Intrinsic::x86_pclmulqdq_256:
6112 case Intrinsic::x86_pclmulqdq_512:
6113 handlePclmulIntrinsic(I);
6114 break;
6115
6116 case Intrinsic::x86_avx_round_pd_256:
6117 case Intrinsic::x86_avx_round_ps_256:
6118 case Intrinsic::x86_sse41_round_pd:
6119 case Intrinsic::x86_sse41_round_ps:
6120 handleRoundPdPsIntrinsic(I);
6121 break;
6122
6123 case Intrinsic::x86_sse41_round_sd:
6124 case Intrinsic::x86_sse41_round_ss:
6125 handleUnarySdSsIntrinsic(I);
6126 break;
6127
6128 case Intrinsic::x86_sse2_max_sd:
6129 case Intrinsic::x86_sse_max_ss:
6130 case Intrinsic::x86_sse2_min_sd:
6131 case Intrinsic::x86_sse_min_ss:
6132 handleBinarySdSsIntrinsic(I);
6133 break;
6134
6135 case Intrinsic::x86_avx_vtestc_pd:
6136 case Intrinsic::x86_avx_vtestc_pd_256:
6137 case Intrinsic::x86_avx_vtestc_ps:
6138 case Intrinsic::x86_avx_vtestc_ps_256:
6139 case Intrinsic::x86_avx_vtestnzc_pd:
6140 case Intrinsic::x86_avx_vtestnzc_pd_256:
6141 case Intrinsic::x86_avx_vtestnzc_ps:
6142 case Intrinsic::x86_avx_vtestnzc_ps_256:
6143 case Intrinsic::x86_avx_vtestz_pd:
6144 case Intrinsic::x86_avx_vtestz_pd_256:
6145 case Intrinsic::x86_avx_vtestz_ps:
6146 case Intrinsic::x86_avx_vtestz_ps_256:
6147 case Intrinsic::x86_avx_ptestc_256:
6148 case Intrinsic::x86_avx_ptestnzc_256:
6149 case Intrinsic::x86_avx_ptestz_256:
6150 case Intrinsic::x86_sse41_ptestc:
6151 case Intrinsic::x86_sse41_ptestnzc:
6152 case Intrinsic::x86_sse41_ptestz:
6153 handleVtestIntrinsic(I);
6154 break;
6155
6156 // Packed Horizontal Add/Subtract
6157 case Intrinsic::x86_ssse3_phadd_w:
6158 case Intrinsic::x86_ssse3_phadd_w_128:
6159 case Intrinsic::x86_ssse3_phsub_w:
6160 case Intrinsic::x86_ssse3_phsub_w_128:
6161 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6162 /*ReinterpretElemWidth=*/16);
6163 break;
6164
6165 case Intrinsic::x86_avx2_phadd_w:
6166 case Intrinsic::x86_avx2_phsub_w:
6167 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6168 /*ReinterpretElemWidth=*/16);
6169 break;
6170
6171 // Packed Horizontal Add/Subtract
6172 case Intrinsic::x86_ssse3_phadd_d:
6173 case Intrinsic::x86_ssse3_phadd_d_128:
6174 case Intrinsic::x86_ssse3_phsub_d:
6175 case Intrinsic::x86_ssse3_phsub_d_128:
6176 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6177 /*ReinterpretElemWidth=*/32);
6178 break;
6179
6180 case Intrinsic::x86_avx2_phadd_d:
6181 case Intrinsic::x86_avx2_phsub_d:
6182 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6183 /*ReinterpretElemWidth=*/32);
6184 break;
6185
6186 // Packed Horizontal Add/Subtract and Saturate
6187 case Intrinsic::x86_ssse3_phadd_sw:
6188 case Intrinsic::x86_ssse3_phadd_sw_128:
6189 case Intrinsic::x86_ssse3_phsub_sw:
6190 case Intrinsic::x86_ssse3_phsub_sw_128:
6191 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1,
6192 /*ReinterpretElemWidth=*/16);
6193 break;
6194
6195 case Intrinsic::x86_avx2_phadd_sw:
6196 case Intrinsic::x86_avx2_phsub_sw:
6197 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2,
6198 /*ReinterpretElemWidth=*/16);
6199 break;
6200
6201 // Packed Single/Double Precision Floating-Point Horizontal Add
6202 case Intrinsic::x86_sse3_hadd_ps:
6203 case Intrinsic::x86_sse3_hadd_pd:
6204 case Intrinsic::x86_sse3_hsub_ps:
6205 case Intrinsic::x86_sse3_hsub_pd:
6206 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6207 break;
6208
6209 case Intrinsic::x86_avx_hadd_pd_256:
6210 case Intrinsic::x86_avx_hadd_ps_256:
6211 case Intrinsic::x86_avx_hsub_pd_256:
6212 case Intrinsic::x86_avx_hsub_ps_256:
6213 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/2);
6214 break;
6215
6216 case Intrinsic::x86_avx_maskstore_ps:
6217 case Intrinsic::x86_avx_maskstore_pd:
6218 case Intrinsic::x86_avx_maskstore_ps_256:
6219 case Intrinsic::x86_avx_maskstore_pd_256:
6220 case Intrinsic::x86_avx2_maskstore_d:
6221 case Intrinsic::x86_avx2_maskstore_q:
6222 case Intrinsic::x86_avx2_maskstore_d_256:
6223 case Intrinsic::x86_avx2_maskstore_q_256: {
6224 handleAVXMaskedStore(I);
6225 break;
6226 }
6227
6228 case Intrinsic::x86_avx_maskload_ps:
6229 case Intrinsic::x86_avx_maskload_pd:
6230 case Intrinsic::x86_avx_maskload_ps_256:
6231 case Intrinsic::x86_avx_maskload_pd_256:
6232 case Intrinsic::x86_avx2_maskload_d:
6233 case Intrinsic::x86_avx2_maskload_q:
6234 case Intrinsic::x86_avx2_maskload_d_256:
6235 case Intrinsic::x86_avx2_maskload_q_256: {
6236 handleAVXMaskedLoad(I);
6237 break;
6238 }
6239
6240 // Packed Floating-Point Arithmetic
6241 case Intrinsic::x86_avx512fp16_add_ph_512:
6242 case Intrinsic::x86_avx512fp16_sub_ph_512:
6243 case Intrinsic::x86_avx512fp16_mul_ph_512:
6244 case Intrinsic::x86_avx512fp16_div_ph_512:
6245 case Intrinsic::x86_avx512fp16_max_ph_512:
6246 case Intrinsic::x86_avx512fp16_min_ph_512:
6247 case Intrinsic::x86_avx512_min_ps_512:
6248 case Intrinsic::x86_avx512_min_pd_512:
6249 case Intrinsic::x86_avx512_max_ps_512:
6250 case Intrinsic::x86_avx512_max_pd_512: {
6251 // These AVX512 variants contain the rounding mode as a trailing flag.
6252 // Earlier variants do not have a trailing flag and are already handled
6253 // by maybeHandleSimpleNomemIntrinsic(I, 0) via
6254 // maybeHandleUnknownIntrinsic.
6255 [[maybe_unused]] bool Success =
6256 maybeHandleSimpleNomemIntrinsic(I, /*trailingFlags=*/1);
6257 assert(Success);
6258 break;
6259 }
6260
6261 case Intrinsic::x86_avx_vpermilvar_pd:
6262 case Intrinsic::x86_avx_vpermilvar_pd_256:
6263 case Intrinsic::x86_avx512_vpermilvar_pd_512:
6264 case Intrinsic::x86_avx_vpermilvar_ps:
6265 case Intrinsic::x86_avx_vpermilvar_ps_256:
6266 case Intrinsic::x86_avx512_vpermilvar_ps_512: {
6267 handleAVXVpermilvar(I);
6268 break;
6269 }
6270
6271 case Intrinsic::x86_avx512_vpermi2var_d_128:
6272 case Intrinsic::x86_avx512_vpermi2var_d_256:
6273 case Intrinsic::x86_avx512_vpermi2var_d_512:
6274 case Intrinsic::x86_avx512_vpermi2var_hi_128:
6275 case Intrinsic::x86_avx512_vpermi2var_hi_256:
6276 case Intrinsic::x86_avx512_vpermi2var_hi_512:
6277 case Intrinsic::x86_avx512_vpermi2var_pd_128:
6278 case Intrinsic::x86_avx512_vpermi2var_pd_256:
6279 case Intrinsic::x86_avx512_vpermi2var_pd_512:
6280 case Intrinsic::x86_avx512_vpermi2var_ps_128:
6281 case Intrinsic::x86_avx512_vpermi2var_ps_256:
6282 case Intrinsic::x86_avx512_vpermi2var_ps_512:
6283 case Intrinsic::x86_avx512_vpermi2var_q_128:
6284 case Intrinsic::x86_avx512_vpermi2var_q_256:
6285 case Intrinsic::x86_avx512_vpermi2var_q_512:
6286 case Intrinsic::x86_avx512_vpermi2var_qi_128:
6287 case Intrinsic::x86_avx512_vpermi2var_qi_256:
6288 case Intrinsic::x86_avx512_vpermi2var_qi_512:
6289 handleAVXVpermi2var(I);
6290 break;
6291
6292 // Packed Shuffle
6293 // llvm.x86.sse.pshuf.w(<1 x i64>, i8)
6294 // llvm.x86.ssse3.pshuf.b(<1 x i64>, <1 x i64>)
6295 // llvm.x86.ssse3.pshuf.b.128(<16 x i8>, <16 x i8>)
6296 // llvm.x86.avx2.pshuf.b(<32 x i8>, <32 x i8>)
6297 // llvm.x86.avx512.pshuf.b.512(<64 x i8>, <64 x i8>)
6298 //
6299 // The following intrinsics are auto-upgraded:
6300 // llvm.x86.sse2.pshuf.d(<4 x i32>, i8)
6301 // llvm.x86.sse2.pshufh.w(<8 x i16>, i8)
6302 // llvm.x86.sse2.pshufl.w(<8 x i16>, i8)
6303 case Intrinsic::x86_avx2_pshuf_b:
6304 case Intrinsic::x86_sse_pshuf_w:
6305 case Intrinsic::x86_ssse3_pshuf_b_128:
6306 case Intrinsic::x86_ssse3_pshuf_b:
6307 case Intrinsic::x86_avx512_pshuf_b_512:
6308 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6309 /*trailingVerbatimArgs=*/1);
6310 break;
6311
6312 // AVX512 PMOV: Packed MOV, with truncation
6313 // Precisely handled by applying the same intrinsic to the shadow
6314 case Intrinsic::x86_avx512_mask_pmov_dw_512:
6315 case Intrinsic::x86_avx512_mask_pmov_db_512:
6316 case Intrinsic::x86_avx512_mask_pmov_qb_512:
6317 case Intrinsic::x86_avx512_mask_pmov_qw_512: {
6318 // Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 were removed in
6319 // f608dc1f5775ee880e8ea30e2d06ab5a4a935c22
6320 handleIntrinsicByApplyingToShadow(I, I.getIntrinsicID(),
6321 /*trailingVerbatimArgs=*/1);
6322 break;
6323 }
6324
6325 // AVX512 PMOV{S,US}: Packed MOV, with signed/unsigned saturation
6326 // Approximately handled using the corresponding truncation intrinsic
6327 // TODO: improve handleAVX512VectorDownConvert to precisely model saturation
6328 case Intrinsic::x86_avx512_mask_pmovs_dw_512:
6329 case Intrinsic::x86_avx512_mask_pmovus_dw_512: {
6330 handleIntrinsicByApplyingToShadow(I,
6331 Intrinsic::x86_avx512_mask_pmov_dw_512,
6332 /* trailingVerbatimArgs=*/1);
6333 break;
6334 }
6335
6336 case Intrinsic::x86_avx512_mask_pmovs_db_512:
6337 case Intrinsic::x86_avx512_mask_pmovus_db_512: {
6338 handleIntrinsicByApplyingToShadow(I,
6339 Intrinsic::x86_avx512_mask_pmov_db_512,
6340 /* trailingVerbatimArgs=*/1);
6341 break;
6342 }
6343
6344 case Intrinsic::x86_avx512_mask_pmovs_qb_512:
6345 case Intrinsic::x86_avx512_mask_pmovus_qb_512: {
6346 handleIntrinsicByApplyingToShadow(I,
6347 Intrinsic::x86_avx512_mask_pmov_qb_512,
6348 /* trailingVerbatimArgs=*/1);
6349 break;
6350 }
6351
6352 case Intrinsic::x86_avx512_mask_pmovs_qw_512:
6353 case Intrinsic::x86_avx512_mask_pmovus_qw_512: {
6354 handleIntrinsicByApplyingToShadow(I,
6355 Intrinsic::x86_avx512_mask_pmov_qw_512,
6356 /* trailingVerbatimArgs=*/1);
6357 break;
6358 }
6359
6360 case Intrinsic::x86_avx512_mask_pmovs_qd_512:
6361 case Intrinsic::x86_avx512_mask_pmovus_qd_512:
6362 case Intrinsic::x86_avx512_mask_pmovs_wb_512:
6363 case Intrinsic::x86_avx512_mask_pmovus_wb_512: {
6364 // Since Intrinsic::x86_avx512_mask_pmov_{qd,wb}_512 do not exist, we
6365 // cannot use handleIntrinsicByApplyingToShadow. Instead, we call the
6366 // slow-path handler.
6367 handleAVX512VectorDownConvert(I);
6368 break;
6369 }
6370
6371 // AVX512/AVX10 Reciprocal Square Root
6372 // <16 x float> @llvm.x86.avx512.rsqrt14.ps.512
6373 // (<16 x float>, <16 x float>, i16)
6374 // <8 x float> @llvm.x86.avx512.rsqrt14.ps.256
6375 // (<8 x float>, <8 x float>, i8)
6376 // <4 x float> @llvm.x86.avx512.rsqrt14.ps.128
6377 // (<4 x float>, <4 x float>, i8)
6378 //
6379 // <8 x double> @llvm.x86.avx512.rsqrt14.pd.512
6380 // (<8 x double>, <8 x double>, i8)
6381 // <4 x double> @llvm.x86.avx512.rsqrt14.pd.256
6382 // (<4 x double>, <4 x double>, i8)
6383 // <2 x double> @llvm.x86.avx512.rsqrt14.pd.128
6384 // (<2 x double>, <2 x double>, i8)
6385 //
6386 // <32 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.512
6387 // (<32 x bfloat>, <32 x bfloat>, i32)
6388 // <16 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.256
6389 // (<16 x bfloat>, <16 x bfloat>, i16)
6390 // <8 x bfloat> @llvm.x86.avx10.mask.rsqrt.bf16.128
6391 // (<8 x bfloat>, <8 x bfloat>, i8)
6392 //
6393 // <32 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.512
6394 // (<32 x half>, <32 x half>, i32)
6395 // <16 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.256
6396 // (<16 x half>, <16 x half>, i16)
6397 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.ph.128
6398 // (<8 x half>, <8 x half>, i8)
6399 //
6400 // TODO: 3-operand variants are not handled:
6401 // <2 x double> @llvm.x86.avx512.rsqrt14.sd
6402 // (<2 x double>, <2 x double>, <2 x double>, i8)
6403 // <4 x float> @llvm.x86.avx512.rsqrt14.ss
6404 // (<4 x float>, <4 x float>, <4 x float>, i8)
6405 // <8 x half> @llvm.x86.avx512fp16.mask.rsqrt.sh
6406 // (<8 x half>, <8 x half>, <8 x half>, i8)
6407 case Intrinsic::x86_avx512_rsqrt14_ps_512:
6408 case Intrinsic::x86_avx512_rsqrt14_ps_256:
6409 case Intrinsic::x86_avx512_rsqrt14_ps_128:
6410 case Intrinsic::x86_avx512_rsqrt14_pd_512:
6411 case Intrinsic::x86_avx512_rsqrt14_pd_256:
6412 case Intrinsic::x86_avx512_rsqrt14_pd_128:
6413 case Intrinsic::x86_avx10_mask_rsqrt_bf16_512:
6414 case Intrinsic::x86_avx10_mask_rsqrt_bf16_256:
6415 case Intrinsic::x86_avx10_mask_rsqrt_bf16_128:
6416 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_512:
6417 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_256:
6418 case Intrinsic::x86_avx512fp16_mask_rsqrt_ph_128:
6419 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6420 /*MaskIndex=*/2);
6421 break;
6422
6423 // AVX512/AVX10 Reciprocal
6424 // <16 x float> @llvm.x86.avx512.rcp14.ps.512
6425 // (<16 x float>, <16 x float>, i16)
6426 // <8 x float> @llvm.x86.avx512.rcp14.ps.256
6427 // (<8 x float>, <8 x float>, i8)
6428 // <4 x float> @llvm.x86.avx512.rcp14.ps.128
6429 // (<4 x float>, <4 x float>, i8)
6430 //
6431 // <8 x double> @llvm.x86.avx512.rcp14.pd.512
6432 // (<8 x double>, <8 x double>, i8)
6433 // <4 x double> @llvm.x86.avx512.rcp14.pd.256
6434 // (<4 x double>, <4 x double>, i8)
6435 // <2 x double> @llvm.x86.avx512.rcp14.pd.128
6436 // (<2 x double>, <2 x double>, i8)
6437 //
6438 // <32 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.512
6439 // (<32 x bfloat>, <32 x bfloat>, i32)
6440 // <16 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.256
6441 // (<16 x bfloat>, <16 x bfloat>, i16)
6442 // <8 x bfloat> @llvm.x86.avx10.mask.rcp.bf16.128
6443 // (<8 x bfloat>, <8 x bfloat>, i8)
6444 //
6445 // <32 x half> @llvm.x86.avx512fp16.mask.rcp.ph.512
6446 // (<32 x half>, <32 x half>, i32)
6447 // <16 x half> @llvm.x86.avx512fp16.mask.rcp.ph.256
6448 // (<16 x half>, <16 x half>, i16)
6449 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.ph.128
6450 // (<8 x half>, <8 x half>, i8)
6451 //
6452 // TODO: 3-operand variants are not handled:
6453 // <2 x double> @llvm.x86.avx512.rcp14.sd
6454 // (<2 x double>, <2 x double>, <2 x double>, i8)
6455 // <4 x float> @llvm.x86.avx512.rcp14.ss
6456 // (<4 x float>, <4 x float>, <4 x float>, i8)
6457 // <8 x half> @llvm.x86.avx512fp16.mask.rcp.sh
6458 // (<8 x half>, <8 x half>, <8 x half>, i8)
6459 case Intrinsic::x86_avx512_rcp14_ps_512:
6460 case Intrinsic::x86_avx512_rcp14_ps_256:
6461 case Intrinsic::x86_avx512_rcp14_ps_128:
6462 case Intrinsic::x86_avx512_rcp14_pd_512:
6463 case Intrinsic::x86_avx512_rcp14_pd_256:
6464 case Intrinsic::x86_avx512_rcp14_pd_128:
6465 case Intrinsic::x86_avx10_mask_rcp_bf16_512:
6466 case Intrinsic::x86_avx10_mask_rcp_bf16_256:
6467 case Intrinsic::x86_avx10_mask_rcp_bf16_128:
6468 case Intrinsic::x86_avx512fp16_mask_rcp_ph_512:
6469 case Intrinsic::x86_avx512fp16_mask_rcp_ph_256:
6470 case Intrinsic::x86_avx512fp16_mask_rcp_ph_128:
6471 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/1,
6472 /*MaskIndex=*/2);
6473 break;
6474
6475 // <32 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.512
6476 // (<32 x half>, i32, <32 x half>, i32, i32)
6477 // <16 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.256
6478 // (<16 x half>, i32, <16 x half>, i32, i16)
6479 // <8 x half> @llvm.x86.avx512fp16.mask.rndscale.ph.128
6480 // (<8 x half>, i32, <8 x half>, i32, i8)
6481 //
6482 // <16 x float> @llvm.x86.avx512.mask.rndscale.ps.512
6483 // (<16 x float>, i32, <16 x float>, i16, i32)
6484 // <8 x float> @llvm.x86.avx512.mask.rndscale.ps.256
6485 // (<8 x float>, i32, <8 x float>, i8)
6486 // <4 x float> @llvm.x86.avx512.mask.rndscale.ps.128
6487 // (<4 x float>, i32, <4 x float>, i8)
6488 //
6489 // <8 x double> @llvm.x86.avx512.mask.rndscale.pd.512
6490 // (<8 x double>, i32, <8 x double>, i8, i32)
6491 // A Imm WriteThru Mask Rounding
6492 // <4 x double> @llvm.x86.avx512.mask.rndscale.pd.256
6493 // (<4 x double>, i32, <4 x double>, i8)
6494 // <2 x double> @llvm.x86.avx512.mask.rndscale.pd.128
6495 // (<2 x double>, i32, <2 x double>, i8)
6496 // A Imm WriteThru Mask
6497 //
6498 // <32 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.512
6499 // (<32 x bfloat>, i32, <32 x bfloat>, i32)
6500 // <16 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.256
6501 // (<16 x bfloat>, i32, <16 x bfloat>, i16)
6502 // <8 x bfloat> @llvm.x86.avx10.mask.rndscale.bf16.128
6503 // (<8 x bfloat>, i32, <8 x bfloat>, i8)
6504 //
6505 // Not supported: three vectors
6506 // - <8 x half> @llvm.x86.avx512fp16.mask.rndscale.sh
6507 // (<8 x half>, <8 x half>,<8 x half>, i8, i32, i32)
6508 // - <4 x float> @llvm.x86.avx512.mask.rndscale.ss
6509 // (<4 x float>, <4 x float>, <4 x float>, i8, i32, i32)
6510 // - <2 x double> @llvm.x86.avx512.mask.rndscale.sd
6511 // (<2 x double>, <2 x double>, <2 x double>, i8, i32,
6512 // i32)
6513 // A B WriteThru Mask Imm
6514 // Rounding
6515 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_512:
6516 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_256:
6517 case Intrinsic::x86_avx512fp16_mask_rndscale_ph_128:
6518 case Intrinsic::x86_avx512_mask_rndscale_ps_512:
6519 case Intrinsic::x86_avx512_mask_rndscale_ps_256:
6520 case Intrinsic::x86_avx512_mask_rndscale_ps_128:
6521 case Intrinsic::x86_avx512_mask_rndscale_pd_512:
6522 case Intrinsic::x86_avx512_mask_rndscale_pd_256:
6523 case Intrinsic::x86_avx512_mask_rndscale_pd_128:
6524 case Intrinsic::x86_avx10_mask_rndscale_bf16_512:
6525 case Intrinsic::x86_avx10_mask_rndscale_bf16_256:
6526 case Intrinsic::x86_avx10_mask_rndscale_bf16_128:
6527 handleAVX512VectorGenericMaskedFP(I, /*AIndex=*/0, /*WriteThruIndex=*/2,
6528 /*MaskIndex=*/3);
6529 break;
6530
6531 // AVX512 FP16 Arithmetic
6532 case Intrinsic::x86_avx512fp16_mask_add_sh_round:
6533 case Intrinsic::x86_avx512fp16_mask_sub_sh_round:
6534 case Intrinsic::x86_avx512fp16_mask_mul_sh_round:
6535 case Intrinsic::x86_avx512fp16_mask_div_sh_round:
6536 case Intrinsic::x86_avx512fp16_mask_max_sh_round:
6537 case Intrinsic::x86_avx512fp16_mask_min_sh_round: {
6538 visitGenericScalarHalfwordInst(I);
6539 break;
6540 }
6541
6542 // AVX Galois Field New Instructions
6543 case Intrinsic::x86_vgf2p8affineqb_128:
6544 case Intrinsic::x86_vgf2p8affineqb_256:
6545 case Intrinsic::x86_vgf2p8affineqb_512:
6546 handleAVXGF2P8Affine(I);
6547 break;
6548
6549 default:
6550 return false;
6551 }
6552
6553 return true;
6554 }
6555
6556 bool maybeHandleArmSIMDIntrinsic(IntrinsicInst &I) {
6557 switch (I.getIntrinsicID()) {
6558 case Intrinsic::aarch64_neon_rshrn:
6559 case Intrinsic::aarch64_neon_sqrshl:
6560 case Intrinsic::aarch64_neon_sqrshrn:
6561 case Intrinsic::aarch64_neon_sqrshrun:
6562 case Intrinsic::aarch64_neon_sqshl:
6563 case Intrinsic::aarch64_neon_sqshlu:
6564 case Intrinsic::aarch64_neon_sqshrn:
6565 case Intrinsic::aarch64_neon_sqshrun:
6566 case Intrinsic::aarch64_neon_srshl:
6567 case Intrinsic::aarch64_neon_sshl:
6568 case Intrinsic::aarch64_neon_uqrshl:
6569 case Intrinsic::aarch64_neon_uqrshrn:
6570 case Intrinsic::aarch64_neon_uqshl:
6571 case Intrinsic::aarch64_neon_uqshrn:
6572 case Intrinsic::aarch64_neon_urshl:
6573 case Intrinsic::aarch64_neon_ushl:
6574 // Not handled here: aarch64_neon_vsli (vector shift left and insert)
6575 handleVectorShiftIntrinsic(I, /* Variable */ false);
6576 break;
6577
6578 // TODO: handling max/min similarly to AND/OR may be more precise
6579 // Floating-Point Maximum/Minimum Pairwise
6580 case Intrinsic::aarch64_neon_fmaxp:
6581 case Intrinsic::aarch64_neon_fminp:
6582 // Floating-Point Maximum/Minimum Number Pairwise
6583 case Intrinsic::aarch64_neon_fmaxnmp:
6584 case Intrinsic::aarch64_neon_fminnmp:
6585 // Signed/Unsigned Maximum/Minimum Pairwise
6586 case Intrinsic::aarch64_neon_smaxp:
6587 case Intrinsic::aarch64_neon_sminp:
6588 case Intrinsic::aarch64_neon_umaxp:
6589 case Intrinsic::aarch64_neon_uminp:
6590 // Add Pairwise
6591 case Intrinsic::aarch64_neon_addp:
6592 // Floating-point Add Pairwise
6593 case Intrinsic::aarch64_neon_faddp:
6594 // Add Long Pairwise
6595 case Intrinsic::aarch64_neon_saddlp:
6596 case Intrinsic::aarch64_neon_uaddlp: {
6597 handlePairwiseShadowOrIntrinsic(I, /*Shards=*/1);
6598 break;
6599 }
6600
6601 // Floating-point Convert to integer, rounding to nearest with ties to Away
6602 case Intrinsic::aarch64_neon_fcvtas:
6603 case Intrinsic::aarch64_neon_fcvtau:
6604 // Floating-point convert to integer, rounding toward minus infinity
6605 case Intrinsic::aarch64_neon_fcvtms:
6606 case Intrinsic::aarch64_neon_fcvtmu:
6607 // Floating-point convert to integer, rounding to nearest with ties to even
6608 case Intrinsic::aarch64_neon_fcvtns:
6609 case Intrinsic::aarch64_neon_fcvtnu:
6610 // Floating-point convert to integer, rounding toward plus infinity
6611 case Intrinsic::aarch64_neon_fcvtps:
6612 case Intrinsic::aarch64_neon_fcvtpu:
6613 // Floating-point Convert to integer, rounding toward Zero
6614 case Intrinsic::aarch64_neon_fcvtzs:
6615 case Intrinsic::aarch64_neon_fcvtzu:
6616 // Floating-point convert to lower precision narrow, rounding to odd
6617 case Intrinsic::aarch64_neon_fcvtxn: {
6618 handleNEONVectorConvertIntrinsic(I);
6619 break;
6620 }
6621
6622 // Add reduction to scalar
6623 case Intrinsic::aarch64_neon_faddv:
6624 case Intrinsic::aarch64_neon_saddv:
6625 case Intrinsic::aarch64_neon_uaddv:
6626 // Signed/Unsigned min/max (Vector)
6627 // TODO: handling similarly to AND/OR may be more precise.
6628 case Intrinsic::aarch64_neon_smaxv:
6629 case Intrinsic::aarch64_neon_sminv:
6630 case Intrinsic::aarch64_neon_umaxv:
6631 case Intrinsic::aarch64_neon_uminv:
6632 // Floating-point min/max (vector)
6633 // The f{min,max}"nm"v variants handle NaN differently than f{min,max}v,
6634 // but our shadow propagation is the same.
6635 case Intrinsic::aarch64_neon_fmaxv:
6636 case Intrinsic::aarch64_neon_fminv:
6637 case Intrinsic::aarch64_neon_fmaxnmv:
6638 case Intrinsic::aarch64_neon_fminnmv:
6639 // Sum long across vector
6640 case Intrinsic::aarch64_neon_saddlv:
6641 case Intrinsic::aarch64_neon_uaddlv:
6642 handleVectorReduceIntrinsic(I, /*AllowShadowCast=*/true);
6643 break;
6644
6645 case Intrinsic::aarch64_neon_ld1x2:
6646 case Intrinsic::aarch64_neon_ld1x3:
6647 case Intrinsic::aarch64_neon_ld1x4:
6648 case Intrinsic::aarch64_neon_ld2:
6649 case Intrinsic::aarch64_neon_ld3:
6650 case Intrinsic::aarch64_neon_ld4:
6651 case Intrinsic::aarch64_neon_ld2r:
6652 case Intrinsic::aarch64_neon_ld3r:
6653 case Intrinsic::aarch64_neon_ld4r: {
6654 handleNEONVectorLoad(I, /*WithLane=*/false);
6655 break;
6656 }
6657
6658 case Intrinsic::aarch64_neon_ld2lane:
6659 case Intrinsic::aarch64_neon_ld3lane:
6660 case Intrinsic::aarch64_neon_ld4lane: {
6661 handleNEONVectorLoad(I, /*WithLane=*/true);
6662 break;
6663 }
6664
6665 // Saturating extract narrow
6666 case Intrinsic::aarch64_neon_sqxtn:
6667 case Intrinsic::aarch64_neon_sqxtun:
6668 case Intrinsic::aarch64_neon_uqxtn:
6669 // These only have one argument, but we (ab)use handleShadowOr because it
6670 // does work on single argument intrinsics and will typecast the shadow
6671 // (and update the origin).
6672 handleShadowOr(I);
6673 break;
6674
6675 case Intrinsic::aarch64_neon_st1x2:
6676 case Intrinsic::aarch64_neon_st1x3:
6677 case Intrinsic::aarch64_neon_st1x4:
6678 case Intrinsic::aarch64_neon_st2:
6679 case Intrinsic::aarch64_neon_st3:
6680 case Intrinsic::aarch64_neon_st4: {
6681 handleNEONVectorStoreIntrinsic(I, false);
6682 break;
6683 }
6684
6685 case Intrinsic::aarch64_neon_st2lane:
6686 case Intrinsic::aarch64_neon_st3lane:
6687 case Intrinsic::aarch64_neon_st4lane: {
6688 handleNEONVectorStoreIntrinsic(I, true);
6689 break;
6690 }
6691
6692 // Arm NEON vector table intrinsics have the source/table register(s) as
6693 // arguments, followed by the index register. They return the output.
6694 //
6695 // 'TBL writes a zero if an index is out-of-range, while TBX leaves the
6696 // original value unchanged in the destination register.'
6697 // Conveniently, zero denotes a clean shadow, which means out-of-range
6698 // indices for TBL will initialize the user data with zero and also clean
6699 // the shadow. (For TBX, neither the user data nor the shadow will be
6700 // updated, which is also correct.)
6701 case Intrinsic::aarch64_neon_tbl1:
6702 case Intrinsic::aarch64_neon_tbl2:
6703 case Intrinsic::aarch64_neon_tbl3:
6704 case Intrinsic::aarch64_neon_tbl4:
6705 case Intrinsic::aarch64_neon_tbx1:
6706 case Intrinsic::aarch64_neon_tbx2:
6707 case Intrinsic::aarch64_neon_tbx3:
6708 case Intrinsic::aarch64_neon_tbx4: {
6709 // The last trailing argument (index register) should be handled verbatim
6710 handleIntrinsicByApplyingToShadow(
6711 I, /*shadowIntrinsicID=*/I.getIntrinsicID(),
6712 /*trailingVerbatimArgs*/ 1);
6713 break;
6714 }
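// For illustration, a sketch of the resulting shadow computation for TBL
// (operand names hypothetical):
//   %r = call <8 x i8> @llvm.aarch64.neon.tbl1(<16 x i8> %table, <8 x i8> %idx)
// gets the shadow
//   %sr = call <8 x i8> @llvm.aarch64.neon.tbl1(<16 x i8> %s_table, <8 x i8> %idx)
// i.e. the table's shadow is looked up with the original index values, so an
// out-of-range index produces a zero (clean) shadow, matching the zero the
// instruction writes into the data.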
6715
6716 case Intrinsic::aarch64_neon_fmulx:
6717 case Intrinsic::aarch64_neon_pmul:
6718 case Intrinsic::aarch64_neon_pmull:
6719 case Intrinsic::aarch64_neon_smull:
6720 case Intrinsic::aarch64_neon_pmull64:
6721 case Intrinsic::aarch64_neon_umull: {
6722 handleNEONVectorMultiplyIntrinsic(I);
6723 break;
6724 }
6725
6726 default:
6727 return false;
6728 }
6729
6730 return true;
6731 }
6732
6733 void visitIntrinsicInst(IntrinsicInst &I) {
6734 if (maybeHandleCrossPlatformIntrinsic(I))
6735 return;
6736
6737 if (maybeHandleX86SIMDIntrinsic(I))
6738 return;
6739
6740 if (maybeHandleArmSIMDIntrinsic(I))
6741 return;
6742
6743 if (maybeHandleUnknownIntrinsic(I))
6744 return;
6745
6746 visitInstruction(I);
6747 }
6748
6749 void visitLibAtomicLoad(CallBase &CB) {
6750 // Since we use getNextNode here, we can't have CB terminate the BB.
6751 assert(isa<CallInst>(CB));
6752
6753 IRBuilder<> IRB(&CB);
6754 Value *Size = CB.getArgOperand(0);
6755 Value *SrcPtr = CB.getArgOperand(1);
6756 Value *DstPtr = CB.getArgOperand(2);
6757 Value *Ordering = CB.getArgOperand(3);
6758 // Convert the call to have at least Acquire ordering to make sure
6759 // the shadow operations aren't reordered before it.
6760 Value *NewOrdering =
6761 IRB.CreateExtractElement(makeAddAcquireOrderingTable(IRB), Ordering);
6762 CB.setArgOperand(3, NewOrdering);
6763
6764 NextNodeIRBuilder NextIRB(&CB);
6765 Value *SrcShadowPtr, *SrcOriginPtr;
6766 std::tie(SrcShadowPtr, SrcOriginPtr) =
6767 getShadowOriginPtr(SrcPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6768 /*isStore*/ false);
6769 Value *DstShadowPtr =
6770 getShadowOriginPtr(DstPtr, NextIRB, NextIRB.getInt8Ty(), Align(1),
6771 /*isStore*/ true)
6772 .first;
6773
6774 NextIRB.CreateMemCpy(DstShadowPtr, Align(1), SrcShadowPtr, Align(1), Size);
6775 if (MS.TrackOrigins) {
6776 Value *SrcOrigin = NextIRB.CreateAlignedLoad(MS.OriginTy, SrcOriginPtr,
6777 kMinOriginAlignment);
6778 Value *NewOrigin = updateOrigin(SrcOrigin, NextIRB);
6779 NextIRB.CreateCall(MS.MsanSetOriginFn, {DstPtr, Size, NewOrigin});
6780 }
6781 }
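// A rough sketch of the instrumentation this produces for a call such as
// __atomic_load(%size, %src, %dst, %order) (operand names illustrative):
//   %order2 = at least acquire, via the ordering-upgrade table
//   call __atomic_load(%size, %src, %dst, %order2)
//   memcpy(shadow(%dst), shadow(%src), %size) ; inserted after the call, so
//                                             ; the copied-in bytes carry
//                                             ; their original shadow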
6782
6783 void visitLibAtomicStore(CallBase &CB) {
6784 IRBuilder<> IRB(&CB);
6785 Value *Size = CB.getArgOperand(0);
6786 Value *DstPtr = CB.getArgOperand(2);
6787 Value *Ordering = CB.getArgOperand(3);
6788 // Convert the call to have at least Release ordering to make sure
6789 // the shadow operations aren't reordered after it.
6790 Value *NewOrdering =
6791 IRB.CreateExtractElement(makeAddReleaseOrderingTable(IRB), Ordering);
6792 CB.setArgOperand(3, NewOrdering);
6793
6794 Value *DstShadowPtr =
6795 getShadowOriginPtr(DstPtr, IRB, IRB.getInt8Ty(), Align(1),
6796 /*isStore*/ true)
6797 .first;
6798
6799 // Atomic store always paints clean shadow/origin. See file header.
6800 IRB.CreateMemSet(DstShadowPtr, getCleanShadow(IRB.getInt8Ty()), Size,
6801 Align(1));
6802 }
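// A rough sketch for __atomic_store(%size, %src, %dst, %order):
//   %order2 = at least release, via the ordering-upgrade table
//   memset(shadow(%dst), 0, %size) ; emitted before the call: the stored
//                                  ; bytes are treated as fully initialized
//   call __atomic_store(%size, %src, %dst, %order2)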
6803
6804 void visitCallBase(CallBase &CB) {
6805 assert(!CB.getMetadata(LLVMContext::MD_nosanitize));
6806 if (CB.isInlineAsm()) {
6807 // For inline asm (either a call to asm function, or callbr instruction),
6808 // do the usual thing: check argument shadow and mark all outputs as
6809 // clean. Note that any side effects of the inline asm that are not
6810 // immediately visible in its constraints are not handled.
6811 if (ClHandleAsmConservative)
6812 visitAsmInstruction(CB);
6813 else
6814 visitInstruction(CB);
6815 return;
6816 }
6817 LibFunc LF;
6818 if (TLI->getLibFunc(CB, LF)) {
6819 // libatomic.a functions need to have special handling because there isn't
6820 // a good way to intercept them or compile the library with
6821 // instrumentation.
6822 switch (LF) {
6823 case LibFunc_atomic_load:
6824 if (!isa<CallInst>(CB)) {
6825 llvm::errs() << "MSAN -- cannot instrument invoke of libatomic load."
6826 "Ignoring!\n";
6827 break;
6828 }
6829 visitLibAtomicLoad(CB);
6830 return;
6831 case LibFunc_atomic_store:
6832 visitLibAtomicStore(CB);
6833 return;
6834 default:
6835 break;
6836 }
6837 }
6838
6839 if (auto *Call = dyn_cast<CallInst>(&CB)) {
6840 assert(!isa<IntrinsicInst>(Call) && "intrinsics are handled elsewhere");
6841
6842 // We are going to insert code that relies on the fact that the callee
6843 // will become a non-readonly function after it is instrumented by us. To
6844 // prevent this code from being optimized out, mark that function
6845 // non-readonly in advance.
6846 // TODO: We can likely do better than dropping memory() completely here.
6847 AttributeMask B;
6848 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
6849 
6850 Call->removeFnAttrs(B);
6851 if (Function *Func = Call->getCalledFunction()) {
6852 Func->removeFnAttrs(B);
6853 }
6854 
6855 maybeMarkSanitizerLibraryCallNoBuiltin(Call, TLI);
6856 }
6857 IRBuilder<> IRB(&CB);
6858 bool MayCheckCall = MS.EagerChecks;
6859 if (Function *Func = CB.getCalledFunction()) {
6860 // __sanitizer_unaligned_{load,store} functions may be called by users
6861 // and always expect shadows in the TLS. So don't check them.
6862 MayCheckCall &= !Func->getName().starts_with("__sanitizer_unaligned_");
6863 }
6864
6865 unsigned ArgOffset = 0;
6866 LLVM_DEBUG(dbgs() << " CallSite: " << CB << "\n");
6867 for (const auto &[i, A] : llvm::enumerate(CB.args())) {
6868 if (!A->getType()->isSized()) {
6869 LLVM_DEBUG(dbgs() << "Arg " << i << " is not sized: " << CB << "\n");
6870 continue;
6871 }
6872
6873 if (A->getType()->isScalableTy()) {
6874 LLVM_DEBUG(dbgs() << "Arg " << i << " is vscale: " << CB << "\n");
6875 // Handle as noundef, but don't reserve tls slots.
6876 insertCheckShadowOf(A, &CB);
6877 continue;
6878 }
6879
6880 unsigned Size = 0;
6881 const DataLayout &DL = F.getDataLayout();
6882
6883 bool ByVal = CB.paramHasAttr(i, Attribute::ByVal);
6884 bool NoUndef = CB.paramHasAttr(i, Attribute::NoUndef);
6885 bool EagerCheck = MayCheckCall && !ByVal && NoUndef;
6886
6887 if (EagerCheck) {
6888 insertCheckShadowOf(A, &CB);
6889 Size = DL.getTypeAllocSize(A->getType());
6890 } else {
6891 [[maybe_unused]] Value *Store = nullptr;
6892 // Compute the Shadow for arg even if it is ByVal, because
6893 // in that case getShadow() will copy the actual arg shadow to
6894 // __msan_param_tls.
6895 Value *ArgShadow = getShadow(A);
6896 Value *ArgShadowBase = getShadowPtrForArgument(IRB, ArgOffset);
6897 LLVM_DEBUG(dbgs() << " Arg#" << i << ": " << *A
6898 << " Shadow: " << *ArgShadow << "\n");
6899 if (ByVal) {
6900 // ByVal requires some special handling as it's too big for a single
6901 // load
6902 assert(A->getType()->isPointerTy() &&
6903 "ByVal argument is not a pointer!");
6904 Size = DL.getTypeAllocSize(CB.getParamByValType(i));
6905 if (ArgOffset + Size > kParamTLSSize)
6906 break;
6907 const MaybeAlign ParamAlignment(CB.getParamAlign(i));
6908 MaybeAlign Alignment = std::nullopt;
6909 if (ParamAlignment)
6910 Alignment = std::min(*ParamAlignment, kShadowTLSAlignment);
6911 Value *AShadowPtr, *AOriginPtr;
6912 std::tie(AShadowPtr, AOriginPtr) =
6913 getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), Alignment,
6914 /*isStore*/ false);
6915 if (!PropagateShadow) {
6916 Store = IRB.CreateMemSet(ArgShadowBase,
6917 Constant::getNullValue(IRB.getInt8Ty()),
6918 Size, Alignment);
6919 } else {
6920 Store = IRB.CreateMemCpy(ArgShadowBase, Alignment, AShadowPtr,
6921 Alignment, Size);
6922 if (MS.TrackOrigins) {
6923 Value *ArgOriginBase = getOriginPtrForArgument(IRB, ArgOffset);
6924 // FIXME: OriginSize should be:
6925 // alignTo(A % kMinOriginAlignment + Size, kMinOriginAlignment)
6926 unsigned OriginSize = alignTo(Size, kMinOriginAlignment);
6927 IRB.CreateMemCpy(
6928 ArgOriginBase,
6929 /* by origin_tls[ArgOffset] */ kMinOriginAlignment,
6930 AOriginPtr,
6931 /* by getShadowOriginPtr */ kMinOriginAlignment, OriginSize);
6932 }
6933 }
6934 } else {
6935 // Any other parameters mean we need bit-grained tracking of uninit
6936 // data
6937 Size = DL.getTypeAllocSize(A->getType());
6938 if (ArgOffset + Size > kParamTLSSize)
6939 break;
6940 Store = IRB.CreateAlignedStore(ArgShadow, ArgShadowBase,
6941 kShadowTLSAlignment);
6942 Constant *Cst = dyn_cast<Constant>(ArgShadow);
6943 if (MS.TrackOrigins && !(Cst && Cst->isNullValue())) {
6944 IRB.CreateStore(getOrigin(A),
6945 getOriginPtrForArgument(IRB, ArgOffset));
6946 }
6947 }
6948 assert(Store != nullptr);
6949 LLVM_DEBUG(dbgs() << " Param:" << *Store << "\n");
6950 }
6951 assert(Size != 0);
6952 ArgOffset += alignTo(Size, kShadowTLSAlignment);
6953 }
6954 LLVM_DEBUG(dbgs() << " done with call args\n");
6955
6956 FunctionType *FT = CB.getFunctionType();
6957 if (FT->isVarArg()) {
6958 VAHelper->visitCallBase(CB, IRB);
6959 }
6960
6961 // Now, get the shadow for the RetVal.
6962 if (!CB.getType()->isSized())
6963 return;
6964 // Don't emit the epilogue for musttail call returns.
6965 if (isa<CallInst>(CB) && cast<CallInst>(CB).isMustTailCall())
6966 return;
6967
6968 if (MayCheckCall && CB.hasRetAttr(Attribute::NoUndef)) {
6969 setShadow(&CB, getCleanShadow(&CB));
6970 setOrigin(&CB, getCleanOrigin());
6971 return;
6972 }
6973
6974 IRBuilder<> IRBBefore(&CB);
6975 // Until we have full dynamic coverage, make sure the retval shadow is 0.
6976 Value *Base = getShadowPtrForRetval(IRBBefore);
6977 IRBBefore.CreateAlignedStore(getCleanShadow(&CB), Base,
6978 kShadowTLSAlignment);
6979 BasicBlock::iterator NextInsn;
6980 if (isa<CallInst>(CB)) {
6981 NextInsn = ++CB.getIterator();
6982 assert(NextInsn != CB.getParent()->end());
6983 } else {
6984 BasicBlock *NormalDest = cast<InvokeInst>(CB).getNormalDest();
6985 if (!NormalDest->getSinglePredecessor()) {
6986 // FIXME: this case is tricky, so we are just conservative here.
6987 // Perhaps we need to split the edge between this BB and NormalDest,
6988 // but a naive attempt to use SplitEdge leads to a crash.
6989 setShadow(&CB, getCleanShadow(&CB));
6990 setOrigin(&CB, getCleanOrigin());
6991 return;
6992 }
6993 // FIXME: NextInsn is likely in a basic block that has not been visited
6994 // yet. Anything inserted there will be instrumented by MSan later!
6995 NextInsn = NormalDest->getFirstInsertionPt();
6996 assert(NextInsn != NormalDest->end() &&
6997 "Could not find insertion point for retval shadow load");
6998 }
6999 IRBuilder<> IRBAfter(&*NextInsn);
7000 Value *RetvalShadow = IRBAfter.CreateAlignedLoad(
7001 getShadowTy(&CB), getShadowPtrForRetval(IRBAfter), kShadowTLSAlignment,
7002 "_msret");
7003 setShadow(&CB, RetvalShadow);
7004 if (MS.TrackOrigins)
7005 setOrigin(&CB, IRBAfter.CreateLoad(MS.OriginTy, getOriginPtrForRetval()));
7006 }
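// A rough sketch of the parameter/retval shadow handoff for a call such as
// %r = call i32 @foo(i32 %x, double %y) (hypothetical callee):
//   store shadow(%x), __msan_param_tls + 0
//   store shadow(%y), __msan_param_tls + 8   ; each slot rounded up to
//                                            ; kShadowTLSAlignment
//   store 0, __msan_retval_tls               ; retval shadow pre-cleared
//   %r = call i32 @foo(i32 %x, double %y)
//   %sr = load __msan_retval_tls             ; becomes the shadow of %r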
7007
7008 bool isAMustTailRetVal(Value *RetVal) {
7009 if (auto *I = dyn_cast<BitCastInst>(RetVal)) {
7010 RetVal = I->getOperand(0);
7011 }
7012 if (auto *I = dyn_cast<CallInst>(RetVal)) {
7013 return I->isMustTailCall();
7014 }
7015 return false;
7016 }
7017
7018 void visitReturnInst(ReturnInst &I) {
7019 IRBuilder<> IRB(&I);
7020 Value *RetVal = I.getReturnValue();
7021 if (!RetVal)
7022 return;
7023 // Don't emit the epilogue for musttail call returns.
7024 if (isAMustTailRetVal(RetVal))
7025 return;
7026 Value *ShadowPtr = getShadowPtrForRetval(IRB);
7027 bool HasNoUndef = F.hasRetAttribute(Attribute::NoUndef);
7028 bool StoreShadow = !(MS.EagerChecks && HasNoUndef);
7029 // FIXME: Consider using SpecialCaseList to specify a list of functions that
7030 // must always return fully initialized values. For now, we hardcode "main".
7031 bool EagerCheck = (MS.EagerChecks && HasNoUndef) || (F.getName() == "main");
7032
7033 Value *Shadow = getShadow(RetVal);
7034 bool StoreOrigin = true;
7035 if (EagerCheck) {
7036 insertCheckShadowOf(RetVal, &I);
7037 Shadow = getCleanShadow(RetVal);
7038 StoreOrigin = false;
7039 }
7040
7041 // The caller may still expect information passed over TLS if we pass our
7042 // check
7043 if (StoreShadow) {
7044 IRB.CreateAlignedStore(Shadow, ShadowPtr, kShadowTLSAlignment);
7045 if (MS.TrackOrigins && StoreOrigin)
7046 IRB.CreateStore(getOrigin(RetVal), getOriginPtrForRetval());
7047 }
7048 }
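// A rough sketch of the callee side, for `ret i32 %v`:
//   store shadow(%v), __msan_retval_tls
//   ret i32 %v
// With eager checks and a noundef return attribute, the shadow of %v is
// checked at the ret instead and no TLS store is emitted.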
7049
7050 void visitPHINode(PHINode &I) {
7051 IRBuilder<> IRB(&I);
7052 if (!PropagateShadow) {
7053 setShadow(&I, getCleanShadow(&I));
7054 setOrigin(&I, getCleanOrigin());
7055 return;
7056 }
7057
7058 ShadowPHINodes.push_back(&I);
7059 setShadow(&I, IRB.CreatePHI(getShadowTy(&I), I.getNumIncomingValues(),
7060 "_msphi_s"));
7061 if (MS.TrackOrigins)
7062 setOrigin(
7063 &I, IRB.CreatePHI(MS.OriginTy, I.getNumIncomingValues(), "_msphi_o"));
7064 }
7065
7066 Value *getLocalVarIdptr(AllocaInst &I) {
7067 ConstantInt *IntConst =
7068 ConstantInt::get(Type::getInt32Ty((*F.getParent()).getContext()), 0);
7069 return new GlobalVariable(*F.getParent(), IntConst->getType(),
7070 /*isConstant=*/false, GlobalValue::PrivateLinkage,
7071 IntConst);
7072 }
7073
7074 Value *getLocalVarDescription(AllocaInst &I) {
7075 return createPrivateConstGlobalForString(*F.getParent(), I.getName());
7076 }
7077
7078 void poisonAllocaUserspace(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7079 if (PoisonStack && ClPoisonStackWithCall) {
7080 IRB.CreateCall(MS.MsanPoisonStackFn, {&I, Len});
7081 } else {
7082 Value *ShadowBase, *OriginBase;
7083 std::tie(ShadowBase, OriginBase) = getShadowOriginPtr(
7084 &I, IRB, IRB.getInt8Ty(), Align(1), /*isStore*/ true);
7085
7086 Value *PoisonValue = IRB.getInt8(PoisonStack ? ClPoisonStackPattern : 0);
7087 IRB.CreateMemSet(ShadowBase, PoisonValue, Len, I.getAlign());
7088 }
7089
7090 if (PoisonStack && MS.TrackOrigins) {
7091 Value *Idptr = getLocalVarIdptr(I);
7092 if (ClPrintStackNames) {
7093 Value *Descr = getLocalVarDescription(I);
7094 IRB.CreateCall(MS.MsanSetAllocaOriginWithDescriptionFn,
7095 {&I, Len, Idptr, Descr});
7096 } else {
7097 IRB.CreateCall(MS.MsanSetAllocaOriginNoDescriptionFn, {&I, Len, Idptr});
7098 }
7099 }
7100 }
7101
7102 void poisonAllocaKmsan(AllocaInst &I, IRBuilder<> &IRB, Value *Len) {
7103 Value *Descr = getLocalVarDescription(I);
7104 if (PoisonStack) {
7105 IRB.CreateCall(MS.MsanPoisonAllocaFn, {&I, Len, Descr});
7106 } else {
7107 IRB.CreateCall(MS.MsanUnpoisonAllocaFn, {&I, Len});
7108 }
7109 }
7110
7111 void instrumentAlloca(AllocaInst &I, Instruction *InsPoint = nullptr) {
7112 if (!InsPoint)
7113 InsPoint = &I;
7114 NextNodeIRBuilder IRB(InsPoint);
7115 const DataLayout &DL = F.getDataLayout();
7116 TypeSize TS = DL.getTypeAllocSize(I.getAllocatedType());
7117 Value *Len = IRB.CreateTypeSize(MS.IntptrTy, TS);
7118 if (I.isArrayAllocation())
7119 Len = IRB.CreateMul(Len,
7120 IRB.CreateZExtOrTrunc(I.getArraySize(), MS.IntptrTy));
7121
7122 if (MS.CompileKernel)
7123 poisonAllocaKmsan(I, IRB, Len);
7124 else
7125 poisonAllocaUserspace(I, IRB, Len);
7126 }
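// For example, `%p = alloca i32, i64 %n` yields Len = mul(4, %n), and the
// shadow (and, with origin tracking, the stack-origin bookkeeping) for
// [%p, %p + Len) is written right after the alloca.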
7127
7128 void visitAllocaInst(AllocaInst &I) {
7129 setShadow(&I, getCleanShadow(&I));
7130 setOrigin(&I, getCleanOrigin());
7131 // We'll get to this alloca later unless it's poisoned at the corresponding
7132 // llvm.lifetime.start.
7133 AllocaSet.insert(&I);
7134 }
7135
7136 void visitSelectInst(SelectInst &I) {
7137 // a = select b, c, d
7138 Value *B = I.getCondition();
7139 Value *C = I.getTrueValue();
7140 Value *D = I.getFalseValue();
7141
7142 handleSelectLikeInst(I, B, C, D);
7143 }
7144
7145 void handleSelectLikeInst(Instruction &I, Value *B, Value *C, Value *D) {
7146 IRBuilder<> IRB(&I);
7147
7148 Value *Sb = getShadow(B);
7149 Value *Sc = getShadow(C);
7150 Value *Sd = getShadow(D);
7151
7152 Value *Ob = MS.TrackOrigins ? getOrigin(B) : nullptr;
7153 Value *Oc = MS.TrackOrigins ? getOrigin(C) : nullptr;
7154 Value *Od = MS.TrackOrigins ? getOrigin(D) : nullptr;
7155
7156 // Result shadow if condition shadow is 0.
7157 Value *Sa0 = IRB.CreateSelect(B, Sc, Sd);
7158 Value *Sa1;
7159 if (I.getType()->isAggregateType()) {
7160 // To avoid "sign extending" i1 to an arbitrary aggregate type, we just do
7161 // an extra "select". This results in much more compact IR.
7162 // Sa = select Sb, poisoned, (select b, Sc, Sd)
7163 Sa1 = getPoisonedShadow(getShadowTy(I.getType()));
7164 } else if (isScalableNonVectorType(I.getType())) {
7165 // This is intended to handle target("aarch64.svcount"), which can't be
7166 // handled in the else branch because of incompatibility with CreateXor
7167 // ("The supported LLVM operations on this type are limited to load,
7168 // store, phi, select and alloca instructions").
7169
7170 // TODO: this currently underapproximates. Use Arm SVE EOR in the else
7171 // branch as needed instead.
7172 Sa1 = getCleanShadow(getShadowTy(I.getType()));
7173 } else {
7174 // Sa = select Sb, [ (c^d) | Sc | Sd ], [ b ? Sc : Sd ]
7175 // If Sb (condition is poisoned), look for bits in c and d that are equal
7176 // and both unpoisoned.
7177 // If !Sb (condition is unpoisoned), simply pick one of Sc and Sd.
7178
7179 // Cast arguments to shadow-compatible type.
7180 C = CreateAppToShadowCast(IRB, C);
7181 D = CreateAppToShadowCast(IRB, D);
7182
7183 // Result shadow if condition shadow is 1.
7184 Sa1 = IRB.CreateOr({IRB.CreateXor(C, D), Sc, Sd});
7185 }
7186 Value *Sa = IRB.CreateSelect(Sb, Sa1, Sa0, "_msprop_select");
7187 setShadow(&I, Sa);
7188 if (MS.TrackOrigins) {
7189 // Origins are always i32, so any vector conditions must be flattened.
7190 // FIXME: consider tracking vector origins for app vectors?
7191 if (B->getType()->isVectorTy()) {
7192 B = convertToBool(B, IRB);
7193 Sb = convertToBool(Sb, IRB);
7194 }
7195 // a = select b, c, d
7196 // Oa = Sb ? Ob : (b ? Oc : Od)
7197 setOrigin(&I, IRB.CreateSelect(Sb, Ob, IRB.CreateSelect(B, Oc, Od)));
7198 }
7199 }
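// A small worked example with scalar i8 values: c = 0b1010, d = 0b1110,
// Sc = Sd = 0 (both arms fully initialized), Sb = 1 (poisoned condition):
//   Sa1 = (c ^ d) | Sc | Sd = 0b0100
// Only bit 2, where the two arms disagree, is reported as uninitialized;
// bits on which both arms agree and are clean stay clean even though the
// condition itself is poisoned.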
7200
7201 void visitLandingPadInst(LandingPadInst &I) {
7202 // Do nothing.
7203 // See https://github.com/google/sanitizers/issues/504
7204 setShadow(&I, getCleanShadow(&I));
7205 setOrigin(&I, getCleanOrigin());
7206 }
7207
7208 void visitCatchSwitchInst(CatchSwitchInst &I) {
7209 setShadow(&I, getCleanShadow(&I));
7210 setOrigin(&I, getCleanOrigin());
7211 }
7212
7213 void visitFuncletPadInst(FuncletPadInst &I) {
7214 setShadow(&I, getCleanShadow(&I));
7215 setOrigin(&I, getCleanOrigin());
7216 }
7217
7218 void visitGetElementPtrInst(GetElementPtrInst &I) { handleShadowOr(I); }
7219
7220 void visitExtractValueInst(ExtractValueInst &I) {
7221 IRBuilder<> IRB(&I);
7222 Value *Agg = I.getAggregateOperand();
7223 LLVM_DEBUG(dbgs() << "ExtractValue: " << I << "\n");
7224 Value *AggShadow = getShadow(Agg);
7225 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7226 Value *ResShadow = IRB.CreateExtractValue(AggShadow, I.getIndices());
7227 LLVM_DEBUG(dbgs() << " ResShadow: " << *ResShadow << "\n");
7228 setShadow(&I, ResShadow);
7229 setOriginForNaryOp(I);
7230 }
7231
7232 void visitInsertValueInst(InsertValueInst &I) {
7233 IRBuilder<> IRB(&I);
7234 LLVM_DEBUG(dbgs() << "InsertValue: " << I << "\n");
7235 Value *AggShadow = getShadow(I.getAggregateOperand());
7236 Value *InsShadow = getShadow(I.getInsertedValueOperand());
7237 LLVM_DEBUG(dbgs() << " AggShadow: " << *AggShadow << "\n");
7238 LLVM_DEBUG(dbgs() << " InsShadow: " << *InsShadow << "\n");
7239 Value *Res = IRB.CreateInsertValue(AggShadow, InsShadow, I.getIndices());
7240 LLVM_DEBUG(dbgs() << " Res: " << *Res << "\n");
7241 setShadow(&I, Res);
7242 setOriginForNaryOp(I);
7243 }
7244
7245 void dumpInst(Instruction &I) {
7246 if (CallInst *CI = dyn_cast<CallInst>(&I)) {
7247 errs() << "ZZZ call " << CI->getCalledFunction()->getName() << "\n";
7248 } else {
7249 errs() << "ZZZ " << I.getOpcodeName() << "\n";
7250 }
7251 errs() << "QQQ " << I << "\n";
7252 }
7253
7254 void visitResumeInst(ResumeInst &I) {
7255 LLVM_DEBUG(dbgs() << "Resume: " << I << "\n");
7256 // Nothing to do here.
7257 }
7258
7259 void visitCleanupReturnInst(CleanupReturnInst &CRI) {
7260 LLVM_DEBUG(dbgs() << "CleanupReturn: " << CRI << "\n");
7261 // Nothing to do here.
7262 }
7263
7264 void visitCatchReturnInst(CatchReturnInst &CRI) {
7265 LLVM_DEBUG(dbgs() << "CatchReturn: " << CRI << "\n");
7266 // Nothing to do here.
7267 }
7268
7269 void instrumentAsmArgument(Value *Operand, Type *ElemTy, Instruction &I,
7270 IRBuilder<> &IRB, const DataLayout &DL,
7271 bool isOutput) {
7272 // For each assembly argument, we check its value for being initialized.
7273 // If the argument is a pointer, we assume it points to a single element
7274 // of the corresponding type (or to an 8-byte word, if the type is unsized).
7275 // Each such pointer is instrumented with a call to the runtime library.
7276 Type *OpType = Operand->getType();
7277 // Check the operand value itself.
7278 insertCheckShadowOf(Operand, &I);
7279 if (!OpType->isPointerTy() || !isOutput) {
7280 assert(!isOutput);
7281 return;
7282 }
7283 if (!ElemTy->isSized())
7284 return;
7285 auto Size = DL.getTypeStoreSize(ElemTy);
7286 Value *SizeVal = IRB.CreateTypeSize(MS.IntptrTy, Size);
7287 if (MS.CompileKernel) {
7288 IRB.CreateCall(MS.MsanInstrumentAsmStoreFn, {Operand, SizeVal});
7289 } else {
7290 // ElemTy, derived from elementtype(), does not encode the alignment of
7291 // the pointer. Conservatively assume that the shadow memory is unaligned.
7292 // When Size is large, avoid StoreInst as it would expand to many
7293 // instructions.
7294 auto [ShadowPtr, _] =
7295 getShadowOriginPtrUserspace(Operand, IRB, IRB.getInt8Ty(), Align(1));
7296 if (Size <= 32)
7297 IRB.CreateAlignedStore(getCleanShadow(ElemTy), ShadowPtr, Align(1));
7298 else
7299 IRB.CreateMemSet(ShadowPtr, ConstantInt::getNullValue(IRB.getInt8Ty()),
7300 SizeVal, Align(1));
7301 }
7302 }
7303
7304 /// Get the number of output arguments returned by pointers.
7305 int getNumOutputArgs(InlineAsm *IA, CallBase *CB) {
7306 int NumRetOutputs = 0;
7307 int NumOutputs = 0;
7308 Type *RetTy = cast<Value>(CB)->getType();
7309 if (!RetTy->isVoidTy()) {
7310 // Register outputs are returned via the CallInst return value.
7311 auto *ST = dyn_cast<StructType>(RetTy);
7312 if (ST)
7313 NumRetOutputs = ST->getNumElements();
7314 else
7315 NumRetOutputs = 1;
7316 }
7317 InlineAsm::ConstraintInfoVector Constraints = IA->ParseConstraints();
7318 for (const InlineAsm::ConstraintInfo &Info : Constraints) {
7319 switch (Info.Type) {
7320 case InlineAsm::isOutput:
7321 NumOutputs++;
7322 break;
7323 default:
7324 break;
7325 }
7326 }
7327 return NumOutputs - NumRetOutputs;
7328 }
7329
7330 void visitAsmInstruction(Instruction &I) {
7331 // Conservative inline assembly handling: check for poisoned shadow of
7332 // asm() arguments, then unpoison the result and all the memory locations
7333 // pointed to by those arguments.
7334 // An inline asm() statement in C++ contains lists of input and output
7335 // arguments used by the assembly code. These are mapped to operands of the
7336 // CallInst as follows:
7337 // - nR register outputs ("=r") are returned by value in a single structure
7338 // (SSA value of the CallInst);
7339 // - nO other outputs ("=m" and others) are returned by pointer as first
7340 // nO operands of the CallInst;
7341 // - nI inputs ("r", "m" and others) are passed to CallInst as the
7342 // remaining nI operands.
7343 // The total number of asm() arguments in the source is nR+nO+nI, and the
7344 // corresponding CallInst has nO+nI+1 operands (the last operand is the
7345 // function to be called).
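// For example (hypothetical source statement):
//   asm("..." : "=r"(a), "=m"(b) : "r"(c), "m"(d));
// gives nR = 1 (a, returned as the CallInst value), nO = 1 (the pointer to
// b, passed as the first operand) and nI = 2 (c and d), so the CallInst has
// nO + nI + 1 = 4 operands, the last being the inline-asm callee.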
7346 const DataLayout &DL = F.getDataLayout();
7347 CallBase *CB = cast<CallBase>(&I);
7348 IRBuilder<> IRB(&I);
7349 InlineAsm *IA = cast<InlineAsm>(CB->getCalledOperand());
7350 int OutputArgs = getNumOutputArgs(IA, CB);
7351 // The last operand of a CallInst is the function itself.
7352 int NumOperands = CB->getNumOperands() - 1;
7353
7354 // Check input arguments. Doing so before unpoisoning output arguments, so
7355 // that we won't overwrite uninit values before checking them.
7356 for (int i = OutputArgs; i < NumOperands; i++) {
7357 Value *Operand = CB->getOperand(i);
7358 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7359 /*isOutput*/ false);
7360 }
7361 // Unpoison output arguments. This must happen before the actual InlineAsm
7362 // call, so that the shadow for memory published in the asm() statement
7363 // remains valid.
7364 for (int i = 0; i < OutputArgs; i++) {
7365 Value *Operand = CB->getOperand(i);
7366 instrumentAsmArgument(Operand, CB->getParamElementType(i), I, IRB, DL,
7367 /*isOutput*/ true);
7368 }
7369
7370 setShadow(&I, getCleanShadow(&I));
7371 setOrigin(&I, getCleanOrigin());
7372 }
7373
7374 void visitFreezeInst(FreezeInst &I) {
7375 // Freeze always returns a fully defined value.
7376 setShadow(&I, getCleanShadow(&I));
7377 setOrigin(&I, getCleanOrigin());
7378 }
7379
7380 void visitInstruction(Instruction &I) {
7381 // Everything else: stop propagating and check for poisoned shadow.
7382 if (ClDumpStrictInstructions)
7383 dumpInst(I);
7384 LLVM_DEBUG(dbgs() << "DEFAULT: " << I << "\n");
7385 for (size_t i = 0, n = I.getNumOperands(); i < n; i++) {
7386 Value *Operand = I.getOperand(i);
7387 if (Operand->getType()->isSized())
7388 insertCheckShadowOf(Operand, &I);
7389 }
7390 setShadow(&I, getCleanShadow(&I));
7391 setOrigin(&I, getCleanOrigin());
7392 }
7393};
7394
7395struct VarArgHelperBase : public VarArgHelper {
7396 Function &F;
7397 MemorySanitizer &MS;
7398 MemorySanitizerVisitor &MSV;
7399 SmallVector<CallInst *, 16> VAStartInstrumentationList;
7400 const unsigned VAListTagSize;
7401
7402 VarArgHelperBase(Function &F, MemorySanitizer &MS,
7403 MemorySanitizerVisitor &MSV, unsigned VAListTagSize)
7404 : F(F), MS(MS), MSV(MSV), VAListTagSize(VAListTagSize) {}
7405
7406 Value *getShadowAddrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7407 Value *Base = IRB.CreatePointerCast(MS.VAArgTLS, MS.IntptrTy);
7408 return IRB.CreateAdd(Base, ConstantInt::get(MS.IntptrTy, ArgOffset));
7409 }
7410
7411 /// Compute the shadow address for a given va_arg.
7412 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset) {
7413 return IRB.CreatePtrAdd(
7414 MS.VAArgTLS, ConstantInt::get(MS.IntptrTy, ArgOffset), "_msarg_va_s");
7415 }
7416
7417 /// Compute the shadow address for a given va_arg.
7418 Value *getShadowPtrForVAArgument(IRBuilder<> &IRB, unsigned ArgOffset,
7419 unsigned ArgSize) {
7420 // Make sure we don't overflow __msan_va_arg_tls.
7421 if (ArgOffset + ArgSize > kParamTLSSize)
7422 return nullptr;
7423 return getShadowPtrForVAArgument(IRB, ArgOffset);
7424 }
7425
7426 /// Compute the origin address for a given va_arg.
7427 Value *getOriginPtrForVAArgument(IRBuilder<> &IRB, int ArgOffset) {
7428 // getOriginPtrForVAArgument() is always called after
7429 // getShadowPtrForVAArgument(), so __msan_va_arg_origin_tls can never
7430 // overflow.
7431 return IRB.CreatePtrAdd(MS.VAArgOriginTLS,
7432 ConstantInt::get(MS.IntptrTy, ArgOffset),
7433 "_msarg_va_o");
7434 }
7435
7436 void CleanUnusedTLS(IRBuilder<> &IRB, Value *ShadowBase,
7437 unsigned BaseOffset) {
7438 // The tail of __msan_va_arg_tls is not large enough to fit the full
7439 // value shadow, but it will be copied to the backup anyway. Make it
7440 // clean.
7441 if (BaseOffset >= kParamTLSSize)
7442 return;
7443 Value *TailSize =
7444 ConstantInt::getSigned(IRB.getInt32Ty(), kParamTLSSize - BaseOffset);
7445 IRB.CreateMemSet(ShadowBase, ConstantInt::getNullValue(IRB.getInt8Ty()),
7446 TailSize, Align(8));
7447 }
7448
7449 void unpoisonVAListTagForInst(IntrinsicInst &I) {
7450 IRBuilder<> IRB(&I);
7451 Value *VAListTag = I.getArgOperand(0);
7452 const Align Alignment = Align(8);
7453 auto [ShadowPtr, OriginPtr] = MSV.getShadowOriginPtr(
7454 VAListTag, IRB, IRB.getInt8Ty(), Alignment, /*isStore*/ true);
7455 // Unpoison the whole __va_list_tag.
7456 IRB.CreateMemSet(ShadowPtr, Constant::getNullValue(IRB.getInt8Ty()),
7457 VAListTagSize, Alignment, false);
7458 }
7459
7460 void visitVAStartInst(VAStartInst &I) override {
7461 if (F.getCallingConv() == CallingConv::Win64)
7462 return;
7463 VAStartInstrumentationList.push_back(&I);
7464 unpoisonVAListTagForInst(I);
7465 }
7466
7467 void visitVACopyInst(VACopyInst &I) override {
7468 if (F.getCallingConv() == CallingConv::Win64)
7469 return;
7470 unpoisonVAListTagForInst(I);
7471 }
7472};
7473
7474/// AMD64-specific implementation of VarArgHelper.
7475struct VarArgAMD64Helper : public VarArgHelperBase {
7476 // An unfortunate workaround for asymmetric lowering of va_arg stuff.
7477 // See a comment in visitCallBase for more details.
7478 static const unsigned AMD64GpEndOffset = 48; // AMD64 ABI Draft 0.99.6 p3.5.7
7479 static const unsigned AMD64FpEndOffsetSSE = 176;
7480 // If SSE is disabled, fp_offset in va_list is zero.
7481 static const unsigned AMD64FpEndOffsetNoSSE = AMD64GpEndOffset;
7482
7483 unsigned AMD64FpEndOffset;
7484 AllocaInst *VAArgTLSCopy = nullptr;
7485 AllocaInst *VAArgTLSOriginCopy = nullptr;
7486 Value *VAArgOverflowSize = nullptr;
7487
7488 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7489
7490 VarArgAMD64Helper(Function &F, MemorySanitizer &MS,
7491 MemorySanitizerVisitor &MSV)
7492 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/24) {
7493 AMD64FpEndOffset = AMD64FpEndOffsetSSE;
7494 for (const auto &Attr : F.getAttributes().getFnAttrs()) {
7495 if (Attr.isStringAttribute() &&
7496 (Attr.getKindAsString() == "target-features")) {
7497 if (Attr.getValueAsString().contains("-sse"))
7498 AMD64FpEndOffset = AMD64FpEndOffsetNoSSE;
7499 break;
7500 }
7501 }
7502 }
7503
7504 ArgKind classifyArgument(Value *arg) {
7505 // A very rough approximation of X86_64 argument classification rules.
7506 Type *T = arg->getType();
7507 if (T->isX86_FP80Ty())
7508 return AK_Memory;
7509 if (T->isFPOrFPVectorTy())
7510 return AK_FloatingPoint;
7511 if (T->isIntegerTy() && T->getPrimitiveSizeInBits() <= 64)
7512 return AK_GeneralPurpose;
7513 if (T->isPointerTy())
7514 return AK_GeneralPurpose;
7515 return AK_Memory;
7516 }
7517
7518 // For VarArg functions, store the argument shadow in an ABI-specific format
7519 // that corresponds to va_list layout.
7520 // We do this because Clang lowers va_arg in the frontend, and this pass
7521 // only sees the low level code that deals with va_list internals.
7522 // A much easier alternative (provided that Clang emits va_arg instructions)
7523 // would have been to associate each live instance of va_list with a copy of
7524 // MSanParamTLS, and extract shadow on va_arg() call in the argument list
7525 // order.
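// A rough sketch for a variadic call like printf(fmt, i, d) with int i and
// double d: fmt is fixed and only consumes the first 8-byte GP slot, so
//   shadow(i) -> __msan_va_arg_tls + 8     ; GP register area, up to offset 48
//   shadow(d) -> __msan_va_arg_tls + 48    ; FP register area, 16 bytes per
//                                          ; arg, up to offset 176
// and memory-class arguments follow in the overflow area from offset 176.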
7526 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7527 unsigned GpOffset = 0;
7528 unsigned FpOffset = AMD64GpEndOffset;
7529 unsigned OverflowOffset = AMD64FpEndOffset;
7530 const DataLayout &DL = F.getDataLayout();
7531
7532 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7533 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7534 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7535 if (IsByVal) {
7536 // ByVal arguments always go to the overflow area.
7537 // Fixed arguments passed through the overflow area will be stepped
7538 // over by va_start, so don't count them towards the offset.
7539 if (IsFixed)
7540 continue;
7541 assert(A->getType()->isPointerTy());
7542 Type *RealTy = CB.getParamByValType(ArgNo);
7543 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7544 uint64_t AlignedSize = alignTo(ArgSize, 8);
7545 unsigned BaseOffset = OverflowOffset;
7546 Value *ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7547 Value *OriginBase = nullptr;
7548 if (MS.TrackOrigins)
7549 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7550 OverflowOffset += AlignedSize;
7551
7552 if (OverflowOffset > kParamTLSSize) {
7553 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7554 continue; // We have no space to copy shadow there.
7555 }
7556
7557 Value *ShadowPtr, *OriginPtr;
7558 std::tie(ShadowPtr, OriginPtr) =
7559 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(), kShadowTLSAlignment,
7560 /*isStore*/ false);
7561 IRB.CreateMemCpy(ShadowBase, kShadowTLSAlignment, ShadowPtr,
7562 kShadowTLSAlignment, ArgSize);
7563 if (MS.TrackOrigins)
7564 IRB.CreateMemCpy(OriginBase, kShadowTLSAlignment, OriginPtr,
7565 kShadowTLSAlignment, ArgSize);
7566 } else {
7567 ArgKind AK = classifyArgument(A);
7568 if (AK == AK_GeneralPurpose && GpOffset >= AMD64GpEndOffset)
7569 AK = AK_Memory;
7570 if (AK == AK_FloatingPoint && FpOffset >= AMD64FpEndOffset)
7571 AK = AK_Memory;
7572 Value *ShadowBase, *OriginBase = nullptr;
7573 switch (AK) {
7574 case AK_GeneralPurpose:
7575 ShadowBase = getShadowPtrForVAArgument(IRB, GpOffset);
7576 if (MS.TrackOrigins)
7577 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset);
7578 GpOffset += 8;
7579 assert(GpOffset <= kParamTLSSize);
7580 break;
7581 case AK_FloatingPoint:
7582 ShadowBase = getShadowPtrForVAArgument(IRB, FpOffset);
7583 if (MS.TrackOrigins)
7584 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
7585 FpOffset += 16;
7586 assert(FpOffset <= kParamTLSSize);
7587 break;
7588 case AK_Memory:
7589 if (IsFixed)
7590 continue;
7591 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7592 uint64_t AlignedSize = alignTo(ArgSize, 8);
7593 unsigned BaseOffset = OverflowOffset;
7594 ShadowBase = getShadowPtrForVAArgument(IRB, OverflowOffset);
7595 if (MS.TrackOrigins) {
7596 OriginBase = getOriginPtrForVAArgument(IRB, OverflowOffset);
7597 }
7598 OverflowOffset += AlignedSize;
7599 if (OverflowOffset > kParamTLSSize) {
7600 // We have no space to copy shadow there.
7601 CleanUnusedTLS(IRB, ShadowBase, BaseOffset);
7602 continue;
7603 }
7604 }
7605 // Take fixed arguments into account for GpOffset and FpOffset,
7606 // but don't actually store shadows for them.
7607 // TODO(glider): don't call get*PtrForVAArgument() for them.
7608 if (IsFixed)
7609 continue;
7610 Value *Shadow = MSV.getShadow(A);
7611 IRB.CreateAlignedStore(Shadow, ShadowBase, kShadowTLSAlignment);
7612 if (MS.TrackOrigins) {
7613 Value *Origin = MSV.getOrigin(A);
7614 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
7615 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
7616 std::max(kMinOriginAlignment, kShadowTLSAlignment));
7617 }
7618 }
7619 }
7620 Constant *OverflowSize =
7621 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AMD64FpEndOffset);
7622 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7623 }
7624
7625 void finalizeInstrumentation() override {
7626 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7627 "finalizeInstrumentation called twice");
7628 if (!VAStartInstrumentationList.empty()) {
7629 // If there is a va_start in this function, make a backup copy of
7630 // va_arg_tls somewhere in the function entry block.
7631 IRBuilder<> IRB(MSV.FnPrologueEnd);
7632 VAArgOverflowSize =
7633 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7634 Value *CopySize = IRB.CreateAdd(
7635 ConstantInt::get(MS.IntptrTy, AMD64FpEndOffset), VAArgOverflowSize);
7636 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7637 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7638 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7639 CopySize, kShadowTLSAlignment, false);
7640
7641 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7642 Intrinsic::umin, CopySize,
7643 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7644 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7645 kShadowTLSAlignment, SrcSize);
7646 if (MS.TrackOrigins) {
7647 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7648 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
7649 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
7650 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
7651 }
7652 }
7653
7654 // Instrument va_start.
7655 // Copy va_list shadow from the backup copy of the TLS contents.
7656 for (CallInst *OrigInst : VAStartInstrumentationList) {
7657 NextNodeIRBuilder IRB(OrigInst);
7658 Value *VAListTag = OrigInst->getArgOperand(0);
7659
7660 Value *RegSaveAreaPtrPtr =
7661 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 16));
7662 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
7663 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
7664 const Align Alignment = Align(16);
7665 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
7666 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7667 Alignment, /*isStore*/ true);
7668 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
7669 AMD64FpEndOffset);
7670 if (MS.TrackOrigins)
7671 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
7672 Alignment, AMD64FpEndOffset);
7673 Value *OverflowArgAreaPtrPtr =
7674 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, 8));
7675 Value *OverflowArgAreaPtr =
7676 IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
7677 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
7678 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
7679 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
7680 Alignment, /*isStore*/ true);
7681 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
7682 AMD64FpEndOffset);
7683 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
7684 VAArgOverflowSize);
7685 if (MS.TrackOrigins) {
7686 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
7687 AMD64FpEndOffset);
7688 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
7689 VAArgOverflowSize);
7690 }
7691 }
7692 }
7693};
7694
7695/// AArch64-specific implementation of VarArgHelper.
7696struct VarArgAArch64Helper : public VarArgHelperBase {
7697 static const unsigned kAArch64GrArgSize = 64;
7698 static const unsigned kAArch64VrArgSize = 128;
7699
7700 static const unsigned AArch64GrBegOffset = 0;
7701 static const unsigned AArch64GrEndOffset = kAArch64GrArgSize;
7702 // Make VR space aligned to 16 bytes.
7703 static const unsigned AArch64VrBegOffset = AArch64GrEndOffset;
7704 static const unsigned AArch64VrEndOffset =
7705 AArch64VrBegOffset + kAArch64VrArgSize;
7706 static const unsigned AArch64VAEndOffset = AArch64VrEndOffset;
7707
7708 AllocaInst *VAArgTLSCopy = nullptr;
7709 Value *VAArgOverflowSize = nullptr;
7710
7711 enum ArgKind { AK_GeneralPurpose, AK_FloatingPoint, AK_Memory };
7712
7713 VarArgAArch64Helper(Function &F, MemorySanitizer &MS,
7714 MemorySanitizerVisitor &MSV)
7715 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/32) {}
7716
7717 // A very rough approximation of aarch64 argument classification rules.
7718 std::pair<ArgKind, uint64_t> classifyArgument(Type *T) {
7719 if (T->isIntOrPtrTy() && T->getPrimitiveSizeInBits() <= 64)
7720 return {AK_GeneralPurpose, 1};
7721 if (T->isFloatingPointTy() && T->getPrimitiveSizeInBits() <= 128)
7722 return {AK_FloatingPoint, 1};
7723
7724 if (T->isArrayTy()) {
7725 auto R = classifyArgument(T->getArrayElementType());
7726 R.second *= T->getScalarType()->getArrayNumElements();
7727 return R;
7728 }
7729
7730 if (const FixedVectorType *FV = dyn_cast<FixedVectorType>(T)) {
7731 auto R = classifyArgument(FV->getScalarType());
7732 R.second *= FV->getNumElements();
7733 return R;
7734 }
7735
7736 LLVM_DEBUG(errs() << "Unknown vararg type: " << *T << "\n");
7737 return {AK_Memory, 0};
7738 }
7739
7740 // The instrumentation stores the argument shadow in a non-ABI-specific
7741 // format, because it does not know which arguments are named (Clang, as
7742 // in the x86_64 case, lowers va_arg in the frontend and this pass only
7743 // sees the low-level code that deals with va_list internals).
7744 // The first eight GR registers are saved in the first 64 bytes of the
7745 // va_arg TLS array, followed by the eight FP/SIMD registers, and then
7746 // the remaining arguments.
7747 // Using constant offsets within the va_arg TLS array allows a fast copy
7748 // in finalizeInstrumentation().
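// For illustration, the resulting layout of the per-call va_arg TLS copy
// (derived from the constants above; offsets are bytes into the array):
//   [0, 64)    shadow of the GP argument registers x0-x7
//   [64, 192)  shadow of the FP/SIMD argument registers q0-q7
//   [192, ...) shadow of arguments passed on the stack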
7749 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7750 unsigned GrOffset = AArch64GrBegOffset;
7751 unsigned VrOffset = AArch64VrBegOffset;
7752 unsigned OverflowOffset = AArch64VAEndOffset;
7753
7754 const DataLayout &DL = F.getDataLayout();
7755 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7756 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7757 auto [AK, RegNum] = classifyArgument(A->getType());
7758 if (AK == AK_GeneralPurpose &&
7759 (GrOffset + RegNum * 8) > AArch64GrEndOffset)
7760 AK = AK_Memory;
7761 if (AK == AK_FloatingPoint &&
7762 (VrOffset + RegNum * 16) > AArch64VrEndOffset)
7763 AK = AK_Memory;
7764 Value *Base;
7765 switch (AK) {
7766 case AK_GeneralPurpose:
7767 Base = getShadowPtrForVAArgument(IRB, GrOffset);
7768 GrOffset += 8 * RegNum;
7769 break;
7770 case AK_FloatingPoint:
7771 Base = getShadowPtrForVAArgument(IRB, VrOffset);
7772 VrOffset += 16 * RegNum;
7773 break;
7774 case AK_Memory:
7775 // Don't count fixed arguments in the overflow area - va_start will
7776 // skip right over them.
7777 if (IsFixed)
7778 continue;
7779 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7780 uint64_t AlignedSize = alignTo(ArgSize, 8);
7781 unsigned BaseOffset = OverflowOffset;
7782 Base = getShadowPtrForVAArgument(IRB, BaseOffset);
7783 OverflowOffset += AlignedSize;
7784 if (OverflowOffset > kParamTLSSize) {
7785 // We have no space to copy shadow there.
7786 CleanUnusedTLS(IRB, Base, BaseOffset);
7787 continue;
7788 }
7789 break;
7790 }
7791 // Count Gp/Vr fixed arguments to their respective offsets, but don't
7792 // bother to actually store a shadow.
7793 if (IsFixed)
7794 continue;
7795 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
7796 }
7797 Constant *OverflowSize =
7798 ConstantInt::get(IRB.getInt64Ty(), OverflowOffset - AArch64VAEndOffset);
7799 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
7800 }
7801
7802 // Retrieve a va_list field of 'void*' size.
7803 Value *getVAField64(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7804 Value *SaveAreaPtrPtr =
7805 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7806 return IRB.CreateLoad(Type::getInt64Ty(*MS.C), SaveAreaPtrPtr);
7807 }
7808
7809 // Retrieve a va_list field of 'int' size.
7810 Value *getVAField32(IRBuilder<> &IRB, Value *VAListTag, int offset) {
7811 Value *SaveAreaPtr =
7812 IRB.CreatePtrAdd(VAListTag, ConstantInt::get(MS.IntptrTy, offset));
7813 Value *SaveArea32 = IRB.CreateLoad(IRB.getInt32Ty(), SaveAreaPtr);
7814 return IRB.CreateSExt(SaveArea32, MS.IntptrTy);
7815 }
7816
7817 void finalizeInstrumentation() override {
7818 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
7819 "finalizeInstrumentation called twice");
7820 if (!VAStartInstrumentationList.empty()) {
7821 // If there is a va_start in this function, make a backup copy of
7822 // va_arg_tls somewhere in the function entry block.
7823 IRBuilder<> IRB(MSV.FnPrologueEnd);
7824 VAArgOverflowSize =
7825 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
7826 Value *CopySize = IRB.CreateAdd(
7827 ConstantInt::get(MS.IntptrTy, AArch64VAEndOffset), VAArgOverflowSize);
7828 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
7829 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
7830 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
7831 CopySize, kShadowTLSAlignment, false);
7832
7833 Value *SrcSize = IRB.CreateBinaryIntrinsic(
7834 Intrinsic::umin, CopySize,
7835 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
7836 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
7837 kShadowTLSAlignment, SrcSize);
7838 }
7839
7840 Value *GrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64GrArgSize);
7841 Value *VrArgSize = ConstantInt::get(MS.IntptrTy, kAArch64VrArgSize);
7842
7843 // Instrument va_start, copy va_list shadow from the backup copy of
7844 // the TLS contents.
7845 for (CallInst *OrigInst : VAStartInstrumentationList) {
7846 NextNodeIRBuilder IRB(OrigInst);
7847
7848 Value *VAListTag = OrigInst->getArgOperand(0);
7849
7850 // The variadic ABI for AArch64 creates two areas that save the incoming
7851 // argument registers (one for the 64-bit general registers x0-x7 and
7852 // another for the 128-bit FP/SIMD registers v0-v7).
7853 // We then need to propagate the shadow of the arguments into both regions,
7854 // 'va::__gr_top + va::__gr_offs' and 'va::__vr_top + va::__vr_offs'.
7855 // The shadow of the remaining arguments goes to the area at 'va::__stack'.
7856 // One caveat: only the non-named (variadic) arguments need to be
7857 // propagated, but the call site instrumentation saves the shadow of
7858 // *all* arguments. So, to copy the shadow values from the va_arg TLS
7859 // array, we need to adjust the offsets for both the GR and VR regions
7860 // based on the __{gr,vr}_offs values (which are computed from the number
7861 // of incoming named arguments).
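// A sketch of the AAPCS64 va_list layout assumed by the getVAField64/32
// calls below (standard AAPCS64; it is not defined in this file):
//
//   struct __va_list {
//     void *__stack;     // byte 0:  next stack argument
//     void *__gr_top;    // byte 8:  end of the GP register save area
//     void *__vr_top;    // byte 16: end of the FP/SIMD register save area
//     int   __gr_offs;   // byte 24: negative offset from __gr_top
//     int   __vr_offs;   // byte 28: negative offset from __vr_top
//   };                   // 32 bytes, matching VAListTagSize above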
7862 Type *RegSaveAreaPtrTy = IRB.getPtrTy();
7863
7864 // Read the stack pointer from the va_list.
7865 Value *StackSaveAreaPtr =
7866 IRB.CreateIntToPtr(getVAField64(IRB, VAListTag, 0), RegSaveAreaPtrTy);
7867
7868 // Read both the __gr_top and __gr_off and add them up.
7869 Value *GrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 8);
7870 Value *GrOffSaveArea = getVAField32(IRB, VAListTag, 24);
7871
7872 Value *GrRegSaveAreaPtr = IRB.CreateIntToPtr(
7873 IRB.CreateAdd(GrTopSaveAreaPtr, GrOffSaveArea), RegSaveAreaPtrTy);
7874
7875 // Read both the __vr_top and __vr_off and add them up.
7876 Value *VrTopSaveAreaPtr = getVAField64(IRB, VAListTag, 16);
7877 Value *VrOffSaveArea = getVAField32(IRB, VAListTag, 28);
7878
7879 Value *VrRegSaveAreaPtr = IRB.CreateIntToPtr(
7880 IRB.CreateAdd(VrTopSaveAreaPtr, VrOffSaveArea), RegSaveAreaPtrTy);
7881
7882 // The instrumentation does not know how many named arguments are in use
7883 // and, at the call site, the shadow of all arguments was saved. Since
7884 // __gr_offs is defined as '0 - ((8 - named_gr) * 8)', the idea is to
7885 // propagate only the variadic arguments, skipping the named ones' shadow.
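// A worked example, assuming a hypothetical callee with two named GP
// arguments:
//   __gr_offs                 = 0 - ((8 - 2) * 8)     = -48
//   GrRegSaveAreaShadowPtrOff = GrArgSize + __gr_offs = 64 - 48 = 16
//   GrCopySize                = GrArgSize - 16        = 48
// i.e. the first 16 bytes of the GR shadow region (covering the named x0
// and x1) are skipped and only the 48 bytes covering x2-x7 are copied.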
7886 Value *GrRegSaveAreaShadowPtrOff =
7887 IRB.CreateAdd(GrArgSize, GrOffSaveArea);
7888
7889 Value *GrRegSaveAreaShadowPtr =
7890 MSV.getShadowOriginPtr(GrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7891 Align(8), /*isStore*/ true)
7892 .first;
7893
7894 Value *GrSrcPtr =
7895 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy, GrRegSaveAreaShadowPtrOff);
7896 Value *GrCopySize = IRB.CreateSub(GrArgSize, GrRegSaveAreaShadowPtrOff);
7897
7898 IRB.CreateMemCpy(GrRegSaveAreaShadowPtr, Align(8), GrSrcPtr, Align(8),
7899 GrCopySize);
7900
7901 // Again, but for FP/SIMD values.
7902 Value *VrRegSaveAreaShadowPtrOff =
7903 IRB.CreateAdd(VrArgSize, VrOffSaveArea);
7904
7905 Value *VrRegSaveAreaShadowPtr =
7906 MSV.getShadowOriginPtr(VrRegSaveAreaPtr, IRB, IRB.getInt8Ty(),
7907 Align(8), /*isStore*/ true)
7908 .first;
7909
7910 Value *VrSrcPtr = IRB.CreateInBoundsPtrAdd(
7911 IRB.CreateInBoundsPtrAdd(VAArgTLSCopy,
7912 IRB.getInt32(AArch64VrBegOffset)),
7913 VrRegSaveAreaShadowPtrOff);
7914 Value *VrCopySize = IRB.CreateSub(VrArgSize, VrRegSaveAreaShadowPtrOff);
7915
7916 IRB.CreateMemCpy(VrRegSaveAreaShadowPtr, Align(8), VrSrcPtr, Align(8),
7917 VrCopySize);
7918
7919 // And finally for remaining arguments.
7920 Value *StackSaveAreaShadowPtr =
7921 MSV.getShadowOriginPtr(StackSaveAreaPtr, IRB, IRB.getInt8Ty(),
7922 Align(16), /*isStore*/ true)
7923 .first;
7924
7925 Value *StackSrcPtr = IRB.CreateInBoundsPtrAdd(
7926 VAArgTLSCopy, IRB.getInt32(AArch64VAEndOffset));
7927
7928 IRB.CreateMemCpy(StackSaveAreaShadowPtr, Align(16), StackSrcPtr,
7929 Align(16), VAArgOverflowSize);
7930 }
7931 }
7932};
7933
7934/// PowerPC64-specific implementation of VarArgHelper.
7935struct VarArgPowerPC64Helper : public VarArgHelperBase {
7936 AllocaInst *VAArgTLSCopy = nullptr;
7937 Value *VAArgSize = nullptr;
7938
7939 VarArgPowerPC64Helper(Function &F, MemorySanitizer &MS,
7940 MemorySanitizerVisitor &MSV)
7941 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/8) {}
7942
7943 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
7944 // For PowerPC, we need to deal with the alignment of stack arguments:
7945 // they are mostly aligned to 8 bytes, but vectors and i128 arrays
7946 // are aligned to 16 bytes, and byvals can be aligned to 8 or 16 bytes.
7947 // For that reason, we compute the current offset from the stack pointer
7948 // (which is always properly aligned) and the offset of the first vararg,
7949 // then subtract them.
7950 unsigned VAArgBase;
7951 Triple TargetTriple(F.getParent()->getTargetTriple());
7952 // The parameter save area starts 48 bytes from the frame pointer for
7953 // ABIv1, and 32 bytes for ABIv2. This is usually determined by target
7954 // endianness, but in theory it could be overridden by a function attribute.
7955 if (TargetTriple.isPPC64ELFv2ABI())
7956 VAArgBase = 32;
7957 else
7958 VAArgBase = 48;
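// A small worked example (ELFv2, so the base starts at 32; hypothetical
// callee whose only fixed argument is a pointer, so the loop below advances
// VAArgOffset to 40 and resets VAArgBase to 40):
//   - a variadic i64 lands at save-area offset 40, shadow TLS offset 0;
//   - a following variadic <4 x i32> is 16-byte aligned, so it lands at
//     save-area offset 48, shadow TLS offset 48 - 40 = 8.
// Shadow offsets are therefore relative to the first vararg, not to the
// start of the parameter save area.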
7959 unsigned VAArgOffset = VAArgBase;
7960 const DataLayout &DL = F.getDataLayout();
7961 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
7962 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
7963 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
7964 if (IsByVal) {
7965 assert(A->getType()->isPointerTy());
7966 Type *RealTy = CB.getParamByValType(ArgNo);
7967 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
7968 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(8));
7969 if (ArgAlign < 8)
7970 ArgAlign = Align(8);
7971 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
7972 if (!IsFixed) {
7973 Value *Base =
7974 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
7975 if (Base) {
7976 Value *AShadowPtr, *AOriginPtr;
7977 std::tie(AShadowPtr, AOriginPtr) =
7978 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
7979 kShadowTLSAlignment, /*isStore*/ false);
7980
7981 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
7982 kShadowTLSAlignment, ArgSize);
7983 }
7984 }
7985 VAArgOffset += alignTo(ArgSize, Align(8));
7986 } else {
7987 Value *Base;
7988 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
7989 Align ArgAlign = Align(8);
7990 if (A->getType()->isArrayTy()) {
7991 // Arrays are aligned to element size, except for long double
7992 // arrays, which are aligned to 8 bytes.
7993 Type *ElementTy = A->getType()->getArrayElementType();
7994 if (!ElementTy->isPPC_FP128Ty())
7995 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
7996 } else if (A->getType()->isVectorTy()) {
7997 // Vectors are naturally aligned.
7998 ArgAlign = Align(ArgSize);
7999 }
8000 if (ArgAlign < 8)
8001 ArgAlign = Align(8);
8002 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8003 if (DL.isBigEndian()) {
8004 // Adjust the shadow for arguments with size < 8 to match the
8005 // placement of bits on a big-endian system.
8006 if (ArgSize < 8)
8007 VAArgOffset += (8 - ArgSize);
8008 }
8009 if (!IsFixed) {
8010 Base =
8011 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8012 if (Base)
8013 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8014 }
8015 VAArgOffset += ArgSize;
8016 VAArgOffset = alignTo(VAArgOffset, Align(8));
8017 }
8018 if (IsFixed)
8019 VAArgBase = VAArgOffset;
8020 }
8021
8022 Constant *TotalVAArgSize =
8023 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8024 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating
8025 // another class member; in this context it holds the total size of all varargs.
8026 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8027 }
8028
8029 void finalizeInstrumentation() override {
8030 assert(!VAArgSize && !VAArgTLSCopy &&
8031 "finalizeInstrumentation called twice");
8032 IRBuilder<> IRB(MSV.FnPrologueEnd);
8033 VAArgSize = IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8034 Value *CopySize = VAArgSize;
8035
8036 if (!VAStartInstrumentationList.empty()) {
8037 // If there is a va_start in this function, make a backup copy of
8038 // va_arg_tls somewhere in the function entry block.
8039
8040 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8041 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8042 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8043 CopySize, kShadowTLSAlignment, false);
8044
8045 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8046 Intrinsic::umin, CopySize,
8047 ConstantInt::get(IRB.getInt64Ty(), kParamTLSSize));
8048 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8049 kShadowTLSAlignment, SrcSize);
8050 }
8051
8052 // Instrument va_start.
8053 // Copy va_list shadow from the backup copy of the TLS contents.
8054 for (CallInst *OrigInst : VAStartInstrumentationList) {
8055 NextNodeIRBuilder IRB(OrigInst);
8056 Value *VAListTag = OrigInst->getArgOperand(0);
8057 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8058
8059 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8060
8061 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8062 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8063 const DataLayout &DL = F.getDataLayout();
8064 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8065 const Align Alignment = Align(IntptrSize);
8066 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8067 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8068 Alignment, /*isStore*/ true);
8069 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8070 CopySize);
8071 }
8072 }
8073};
8074
8075/// PowerPC32-specific implementation of VarArgHelper.
8076struct VarArgPowerPC32Helper : public VarArgHelperBase {
8077 AllocaInst *VAArgTLSCopy = nullptr;
8078 Value *VAArgSize = nullptr;
8079
8080 VarArgPowerPC32Helper(Function &F, MemorySanitizer &MS,
8081 MemorySanitizerVisitor &MSV)
8082 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/12) {}
8083
8084 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8085 unsigned VAArgBase;
8086 // Parameter save area is 8 bytes from frame pointer in PPC32
8087 VAArgBase = 8;
8088 unsigned VAArgOffset = VAArgBase;
8089 const DataLayout &DL = F.getDataLayout();
8090 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8091 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8092 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8093 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8094 if (IsByVal) {
8095 assert(A->getType()->isPointerTy());
8096 Type *RealTy = CB.getParamByValType(ArgNo);
8097 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8098 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8099 if (ArgAlign < IntptrSize)
8100 ArgAlign = Align(IntptrSize);
8101 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8102 if (!IsFixed) {
8103 Value *Base =
8104 getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase, ArgSize);
8105 if (Base) {
8106 Value *AShadowPtr, *AOriginPtr;
8107 std::tie(AShadowPtr, AOriginPtr) =
8108 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8109 kShadowTLSAlignment, /*isStore*/ false);
8110
8111 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8112 kShadowTLSAlignment, ArgSize);
8113 }
8114 }
8115 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8116 } else {
8117 Value *Base;
8118 Type *ArgTy = A->getType();
8119
8120 // On PPC32, floating-point variable arguments are stored in a separate
8121 // area: fp_save_area = reg_save_area + 4*8. We do not copy shadow for
8122 // them, as they will be found when checking the call arguments.
8123 if (!ArgTy->isFloatingPointTy()) {
8124 uint64_t ArgSize = DL.getTypeAllocSize(ArgTy);
8125 Align ArgAlign = Align(IntptrSize);
8126 if (ArgTy->isArrayTy()) {
8127 // Arrays are aligned to element size, except for long double
8128 // arrays, which are aligned to 8 bytes.
8129 Type *ElementTy = ArgTy->getArrayElementType();
8130 if (!ElementTy->isPPC_FP128Ty())
8131 ArgAlign = Align(DL.getTypeAllocSize(ElementTy));
8132 } else if (ArgTy->isVectorTy()) {
8133 // Vectors are naturally aligned.
8134 ArgAlign = Align(ArgSize);
8135 }
8136 if (ArgAlign < IntptrSize)
8137 ArgAlign = Align(IntptrSize);
8138 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8139 if (DL.isBigEndian()) {
8140 // Adjust the shadow for arguments with size < IntptrSize to match
8141 // the placement of bits on a big-endian system.
8142 if (ArgSize < IntptrSize)
8143 VAArgOffset += (IntptrSize - ArgSize);
8144 }
8145 if (!IsFixed) {
8146 Base = getShadowPtrForVAArgument(IRB, VAArgOffset - VAArgBase,
8147 ArgSize);
8148 if (Base)
8149 IRB.CreateAlignedStore(MSV.getShadow(A), Base,
8150 kShadowTLSAlignment);
8151 }
8152 VAArgOffset += ArgSize;
8153 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8154 }
8155 }
8156 }
8157
8158 Constant *TotalVAArgSize =
8159 ConstantInt::get(MS.IntptrTy, VAArgOffset - VAArgBase);
8160 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating
8161 // another class member; in this context it holds the total size of all varargs.
8162 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8163 }
8164
8165 void finalizeInstrumentation() override {
8166 assert(!VAArgSize && !VAArgTLSCopy &&
8167 "finalizeInstrumentation called twice");
8168 IRBuilder<> IRB(MSV.FnPrologueEnd);
8169 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8170 Value *CopySize = VAArgSize;
8171
8172 if (!VAStartInstrumentationList.empty()) {
8173 // If there is a va_start in this function, make a backup copy of
8174 // va_arg_tls somewhere in the function entry block.
8175
8176 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8177 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8178 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8179 CopySize, kShadowTLSAlignment, false);
8180
8181 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8182 Intrinsic::umin, CopySize,
8183 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8184 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8185 kShadowTLSAlignment, SrcSize);
8186 }
8187
8188 // Instrument va_start.
8189 // Copy va_list shadow from the backup copy of the TLS contents.
8190 for (CallInst *OrigInst : VAStartInstrumentationList) {
8191 NextNodeIRBuilder IRB(OrigInst);
8192 Value *VAListTag = OrigInst->getArgOperand(0);
8193 Value *RegSaveAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8194 Value *RegSaveAreaSize = CopySize;
8195
8196 // In PPC32 va_list_tag is a struct
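// A sketch of that struct (standard SVR4 PPC32 layout, assumed rather than
// defined here), which explains the +8 below and the +4 used for the
// overflow area further down:
//
//   struct __va_list_tag {
//     unsigned char gpr;         // byte 0
//     unsigned char fpr;         // byte 1
//     unsigned short reserved;   // bytes 2-3
//     void *overflow_arg_area;   // byte 4
//     void *reg_save_area;       // byte 8
//   };                           // 12 bytes, matching VAListTagSize above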
8197 RegSaveAreaPtrPtr =
8198 IRB.CreateAdd(RegSaveAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 8));
8199
8200 // On PPC32, the reg_save_area can hold only 32 bytes of data.
8201 RegSaveAreaSize = IRB.CreateBinaryIntrinsic(
8202 Intrinsic::umin, CopySize, ConstantInt::get(MS.IntptrTy, 32));
8203
8204 RegSaveAreaPtrPtr = IRB.CreateIntToPtr(RegSaveAreaPtrPtr, MS.PtrTy);
8205 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8206
8207 const DataLayout &DL = F.getDataLayout();
8208 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8209 const Align Alignment = Align(IntptrSize);
8210
8211 { // Copy reg save area
8212 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8213 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8214 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8215 Alignment, /*isStore*/ true);
8216 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy,
8217 Alignment, RegSaveAreaSize);
8218
8219 RegSaveAreaShadowPtr =
8220 IRB.CreatePtrToInt(RegSaveAreaShadowPtr, MS.IntptrTy);
8221 Value *FPSaveArea = IRB.CreateAdd(RegSaveAreaShadowPtr,
8222 ConstantInt::get(MS.IntptrTy, 32));
8223 FPSaveArea = IRB.CreateIntToPtr(FPSaveArea, MS.PtrTy);
8224 // We fill the FP shadow with zeroes, as uninitialized FP args should
8225 // have already been caught by the call base check.
8226 IRB.CreateMemSet(FPSaveArea, ConstantInt::getNullValue(IRB.getInt8Ty()),
8227 ConstantInt::get(MS.IntptrTy, 32), Alignment);
8228 }
8229
8230 { // Copy overflow area
8231 // RegSaveAreaSize is min(CopySize, 32) -> no overflow can occur
8232 Value *OverflowAreaSize = IRB.CreateSub(CopySize, RegSaveAreaSize);
8233
8234 Value *OverflowAreaPtrPtr = IRB.CreatePtrToInt(VAListTag, MS.IntptrTy);
8235 OverflowAreaPtrPtr =
8236 IRB.CreateAdd(OverflowAreaPtrPtr, ConstantInt::get(MS.IntptrTy, 4));
8237 OverflowAreaPtrPtr = IRB.CreateIntToPtr(OverflowAreaPtrPtr, MS.PtrTy);
8238
8239 Value *OverflowAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowAreaPtrPtr);
8240
8241 Value *OverflowAreaShadowPtr, *OverflowAreaOriginPtr;
8242 std::tie(OverflowAreaShadowPtr, OverflowAreaOriginPtr) =
8243 MSV.getShadowOriginPtr(OverflowAreaPtr, IRB, IRB.getInt8Ty(),
8244 Alignment, /*isStore*/ true);
8245
8246 Value *OverflowVAArgTLSCopyPtr =
8247 IRB.CreatePtrToInt(VAArgTLSCopy, MS.IntptrTy);
8248 OverflowVAArgTLSCopyPtr =
8249 IRB.CreateAdd(OverflowVAArgTLSCopyPtr, RegSaveAreaSize);
8250
8251 OverflowVAArgTLSCopyPtr =
8252 IRB.CreateIntToPtr(OverflowVAArgTLSCopyPtr, MS.PtrTy);
8253 IRB.CreateMemCpy(OverflowAreaShadowPtr, Alignment,
8254 OverflowVAArgTLSCopyPtr, Alignment, OverflowAreaSize);
8255 }
8256 }
8257 }
8258};
8259
8260/// SystemZ-specific implementation of VarArgHelper.
8261struct VarArgSystemZHelper : public VarArgHelperBase {
8262 static const unsigned SystemZGpOffset = 16;
8263 static const unsigned SystemZGpEndOffset = 56;
8264 static const unsigned SystemZFpOffset = 128;
8265 static const unsigned SystemZFpEndOffset = 160;
8266 static const unsigned SystemZMaxVrArgs = 8;
8267 static const unsigned SystemZRegSaveAreaSize = 160;
8268 static const unsigned SystemZOverflowOffset = 160;
8269 static const unsigned SystemZVAListTagSize = 32;
8270 static const unsigned SystemZOverflowArgAreaPtrOffset = 16;
8271 static const unsigned SystemZRegSaveAreaPtrOffset = 24;
8272
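// For reference (assumed from the s390x ELF ABI and consistent with the
// constants above; not defined in this file):
//  - va_list is { long __gpr; long __fpr; void *__overflow_arg_area;
//    void *__reg_save_area; }, i.e. 32 bytes (SystemZVAListTagSize), with
//    the overflow pointer at byte 16 and the register save area pointer at
//    byte 24.
//  - Within the 160-byte register save area, the GPR argument slots r2-r6
//    occupy bytes [16, 56) and the FPR slots f0/f2/f4/f6 occupy bytes
//    [128, 160), matching the Gp/Fp offsets above.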
8273 bool IsSoftFloatABI;
8274 AllocaInst *VAArgTLSCopy = nullptr;
8275 AllocaInst *VAArgTLSOriginCopy = nullptr;
8276 Value *VAArgOverflowSize = nullptr;
8277
8278 enum class ArgKind {
8279 GeneralPurpose,
8280 FloatingPoint,
8281 Vector,
8282 Memory,
8283 Indirect,
8284 };
8285
8286 enum class ShadowExtension { None, Zero, Sign };
8287
8288 VarArgSystemZHelper(Function &F, MemorySanitizer &MS,
8289 MemorySanitizerVisitor &MSV)
8290 : VarArgHelperBase(F, MS, MSV, SystemZVAListTagSize),
8291 IsSoftFloatABI(F.getFnAttribute("use-soft-float").getValueAsBool()) {}
8292
8293 ArgKind classifyArgument(Type *T) {
8294 // T is a SystemZABIInfo::classifyArgumentType() output, and there are
8295 // only a few possibilities of what it can be. In particular, enums, single
8296 // element structs and large types have already been taken care of.
8297
8298 // Some i128 and fp128 arguments are converted to pointers only in the
8299 // back end.
8300 if (T->isIntegerTy(128) || T->isFP128Ty())
8301 return ArgKind::Indirect;
8302 if (T->isFloatingPointTy())
8303 return IsSoftFloatABI ? ArgKind::GeneralPurpose : ArgKind::FloatingPoint;
8304 if (T->isIntegerTy() || T->isPointerTy())
8305 return ArgKind::GeneralPurpose;
8306 if (T->isVectorTy())
8307 return ArgKind::Vector;
8308 return ArgKind::Memory;
8309 }
8310
8311 ShadowExtension getShadowExtension(const CallBase &CB, unsigned ArgNo) {
8312 // ABI says: "One of the simple integer types no more than 64 bits wide.
8313 // ... If such an argument is shorter than 64 bits, replace it by a full
8314 // 64-bit integer representing the same number, using sign or zero
8315 // extension". Shadow for an integer argument has the same type as the
8316 // argument itself, so it can be sign or zero extended as well.
8317 bool ZExt = CB.paramHasAttr(ArgNo, Attribute::ZExt);
8318 bool SExt = CB.paramHasAttr(ArgNo, Attribute::SExt);
8319 if (ZExt) {
8320 assert(!SExt);
8321 return ShadowExtension::Zero;
8322 }
8323 if (SExt) {
8324 assert(!ZExt);
8325 return ShadowExtension::Sign;
8326 }
8327 return ShadowExtension::None;
8328 }
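// A small worked example for a variadic i32 argument (hypothetical call):
//  - with the 'signext' attribute, getShadowExtension() returns Sign, so
//    the i32 shadow is sign-extended to i64 in visitCallBase() and stored
//    at GpOffset;
//  - with no extension attribute, GapSize = 8 - 4 = 4, so the 4-byte shadow
//    is stored at GpOffset + 4, i.e. in the low half of the big-endian
//    8-byte GPR slot.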
8329
8330 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8331 unsigned GpOffset = SystemZGpOffset;
8332 unsigned FpOffset = SystemZFpOffset;
8333 unsigned VrIndex = 0;
8334 unsigned OverflowOffset = SystemZOverflowOffset;
8335 const DataLayout &DL = F.getDataLayout();
8336 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8337 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8338 // SystemZABIInfo does not produce ByVal parameters.
8339 assert(!CB.paramHasAttr(ArgNo, Attribute::ByVal));
8340 Type *T = A->getType();
8341 ArgKind AK = classifyArgument(T);
8342 if (AK == ArgKind::Indirect) {
8343 T = MS.PtrTy;
8344 AK = ArgKind::GeneralPurpose;
8345 }
8346 if (AK == ArgKind::GeneralPurpose && GpOffset >= SystemZGpEndOffset)
8347 AK = ArgKind::Memory;
8348 if (AK == ArgKind::FloatingPoint && FpOffset >= SystemZFpEndOffset)
8349 AK = ArgKind::Memory;
8350 if (AK == ArgKind::Vector && (VrIndex >= SystemZMaxVrArgs || !IsFixed))
8351 AK = ArgKind::Memory;
8352 Value *ShadowBase = nullptr;
8353 Value *OriginBase = nullptr;
8354 ShadowExtension SE = ShadowExtension::None;
8355 switch (AK) {
8356 case ArgKind::GeneralPurpose: {
8357 // Always keep track of GpOffset, but store shadow only for varargs.
8358 uint64_t ArgSize = 8;
8359 if (GpOffset + ArgSize <= kParamTLSSize) {
8360 if (!IsFixed) {
8361 SE = getShadowExtension(CB, ArgNo);
8362 uint64_t GapSize = 0;
8363 if (SE == ShadowExtension::None) {
8364 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8365 assert(ArgAllocSize <= ArgSize);
8366 GapSize = ArgSize - ArgAllocSize;
8367 }
8368 ShadowBase = getShadowAddrForVAArgument(IRB, GpOffset + GapSize);
8369 if (MS.TrackOrigins)
8370 OriginBase = getOriginPtrForVAArgument(IRB, GpOffset + GapSize);
8371 }
8372 GpOffset += ArgSize;
8373 } else {
8374 GpOffset = kParamTLSSize;
8375 }
8376 break;
8377 }
8378 case ArgKind::FloatingPoint: {
8379 // Always keep track of FpOffset, but store shadow only for varargs.
8380 uint64_t ArgSize = 8;
8381 if (FpOffset + ArgSize <= kParamTLSSize) {
8382 if (!IsFixed) {
8383 // PoP says: "A short floating-point datum requires only the
8384 // left-most 32 bit positions of a floating-point register".
8385 // Therefore, in contrast to AK_GeneralPurpose and AK_Memory,
8386 // don't extend shadow and don't mind the gap.
8387 ShadowBase = getShadowAddrForVAArgument(IRB, FpOffset);
8388 if (MS.TrackOrigins)
8389 OriginBase = getOriginPtrForVAArgument(IRB, FpOffset);
8390 }
8391 FpOffset += ArgSize;
8392 } else {
8393 FpOffset = kParamTLSSize;
8394 }
8395 break;
8396 }
8397 case ArgKind::Vector: {
8398 // Keep track of VrIndex. No need to store shadow, since vector varargs
8399 // go through AK_Memory.
8400 assert(IsFixed);
8401 VrIndex++;
8402 break;
8403 }
8404 case ArgKind::Memory: {
8405 // Keep track of OverflowOffset and store shadow only for varargs.
8406 // Ignore fixed args, since we need to copy only the vararg portion of
8407 // the overflow area shadow.
8408 if (!IsFixed) {
8409 uint64_t ArgAllocSize = DL.getTypeAllocSize(T);
8410 uint64_t ArgSize = alignTo(ArgAllocSize, 8);
8411 if (OverflowOffset + ArgSize <= kParamTLSSize) {
8412 SE = getShadowExtension(CB, ArgNo);
8413 uint64_t GapSize =
8414 SE == ShadowExtension::None ? ArgSize - ArgAllocSize : 0;
8415 ShadowBase =
8416 getShadowAddrForVAArgument(IRB, OverflowOffset + GapSize);
8417 if (MS.TrackOrigins)
8418 OriginBase =
8419 getOriginPtrForVAArgument(IRB, OverflowOffset + GapSize);
8420 OverflowOffset += ArgSize;
8421 } else {
8422 OverflowOffset = kParamTLSSize;
8423 }
8424 }
8425 break;
8426 }
8427 case ArgKind::Indirect:
8428 llvm_unreachable("Indirect must be converted to GeneralPurpose");
8429 }
8430 if (ShadowBase == nullptr)
8431 continue;
8432 Value *Shadow = MSV.getShadow(A);
8433 if (SE != ShadowExtension::None)
8434 Shadow = MSV.CreateShadowCast(IRB, Shadow, IRB.getInt64Ty(),
8435 /*Signed*/ SE == ShadowExtension::Sign);
8436 ShadowBase = IRB.CreateIntToPtr(ShadowBase, MS.PtrTy, "_msarg_va_s");
8437 IRB.CreateStore(Shadow, ShadowBase);
8438 if (MS.TrackOrigins) {
8439 Value *Origin = MSV.getOrigin(A);
8440 TypeSize StoreSize = DL.getTypeStoreSize(Shadow->getType());
8441 MSV.paintOrigin(IRB, Origin, OriginBase, StoreSize,
8442 std::max(kMinOriginAlignment, kShadowTLSAlignment));
8443 }
8444 }
8445 Constant *OverflowSize = ConstantInt::get(
8446 IRB.getInt64Ty(), OverflowOffset - SystemZOverflowOffset);
8447 IRB.CreateStore(OverflowSize, MS.VAArgOverflowSizeTLS);
8448 }
8449
8450 void copyRegSaveArea(IRBuilder<> &IRB, Value *VAListTag) {
8451 Value *RegSaveAreaPtrPtr = IRB.CreateIntToPtr(
8452 IRB.CreateAdd(
8453 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8454 ConstantInt::get(MS.IntptrTy, SystemZRegSaveAreaPtrOffset)),
8455 MS.PtrTy);
8456 Value *RegSaveAreaPtr = IRB.CreateLoad(MS.PtrTy, RegSaveAreaPtrPtr);
8457 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8458 const Align Alignment = Align(8);
8459 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8460 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(), Alignment,
8461 /*isStore*/ true);
8462 // TODO(iii): copy only fragments filled by visitCallBase()
8463 // TODO(iii): support packed-stack && !use-soft-float
8464 // For use-soft-float functions, it is enough to copy just the GPRs.
8465 unsigned RegSaveAreaSize =
8466 IsSoftFloatABI ? SystemZGpEndOffset : SystemZRegSaveAreaSize;
8467 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8468 RegSaveAreaSize);
8469 if (MS.TrackOrigins)
8470 IRB.CreateMemCpy(RegSaveAreaOriginPtr, Alignment, VAArgTLSOriginCopy,
8471 Alignment, RegSaveAreaSize);
8472 }
8473
8474 // FIXME: This implementation limits OverflowOffset to kParamTLSSize, so we
8475 // don't know real overflow size and can't clear shadow beyond kParamTLSSize.
8476 void copyOverflowArea(IRBuilder<> &IRB, Value *VAListTag) {
8477 Value *OverflowArgAreaPtrPtr = IRB.CreateIntToPtr(
8478 IRB.CreateAdd(
8479 IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8480 ConstantInt::get(MS.IntptrTy, SystemZOverflowArgAreaPtrOffset)),
8481 MS.PtrTy);
8482 Value *OverflowArgAreaPtr = IRB.CreateLoad(MS.PtrTy, OverflowArgAreaPtrPtr);
8483 Value *OverflowArgAreaShadowPtr, *OverflowArgAreaOriginPtr;
8484 const Align Alignment = Align(8);
8485 std::tie(OverflowArgAreaShadowPtr, OverflowArgAreaOriginPtr) =
8486 MSV.getShadowOriginPtr(OverflowArgAreaPtr, IRB, IRB.getInt8Ty(),
8487 Alignment, /*isStore*/ true);
8488 Value *SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSCopy,
8489 SystemZOverflowOffset);
8490 IRB.CreateMemCpy(OverflowArgAreaShadowPtr, Alignment, SrcPtr, Alignment,
8491 VAArgOverflowSize);
8492 if (MS.TrackOrigins) {
8493 SrcPtr = IRB.CreateConstGEP1_32(IRB.getInt8Ty(), VAArgTLSOriginCopy,
8494 SystemZOverflowOffset);
8495 IRB.CreateMemCpy(OverflowArgAreaOriginPtr, Alignment, SrcPtr, Alignment,
8496 VAArgOverflowSize);
8497 }
8498 }
8499
8500 void finalizeInstrumentation() override {
8501 assert(!VAArgOverflowSize && !VAArgTLSCopy &&
8502 "finalizeInstrumentation called twice");
8503 if (!VAStartInstrumentationList.empty()) {
8504 // If there is a va_start in this function, make a backup copy of
8505 // va_arg_tls somewhere in the function entry block.
8506 IRBuilder<> IRB(MSV.FnPrologueEnd);
8507 VAArgOverflowSize =
8508 IRB.CreateLoad(IRB.getInt64Ty(), MS.VAArgOverflowSizeTLS);
8509 Value *CopySize =
8510 IRB.CreateAdd(ConstantInt::get(MS.IntptrTy, SystemZOverflowOffset),
8511 VAArgOverflowSize);
8512 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8513 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8514 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8515 CopySize, kShadowTLSAlignment, false);
8516
8517 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8518 Intrinsic::umin, CopySize,
8519 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8520 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8521 kShadowTLSAlignment, SrcSize);
8522 if (MS.TrackOrigins) {
8523 VAArgTLSOriginCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8524 VAArgTLSOriginCopy->setAlignment(kShadowTLSAlignment);
8525 IRB.CreateMemCpy(VAArgTLSOriginCopy, kShadowTLSAlignment,
8526 MS.VAArgOriginTLS, kShadowTLSAlignment, SrcSize);
8527 }
8528 }
8529
8530 // Instrument va_start.
8531 // Copy va_list shadow from the backup copy of the TLS contents.
8532 for (CallInst *OrigInst : VAStartInstrumentationList) {
8533 NextNodeIRBuilder IRB(OrigInst);
8534 Value *VAListTag = OrigInst->getArgOperand(0);
8535 copyRegSaveArea(IRB, VAListTag);
8536 copyOverflowArea(IRB, VAListTag);
8537 }
8538 }
8539};
8540
8541/// i386-specific implementation of VarArgHelper.
8542struct VarArgI386Helper : public VarArgHelperBase {
8543 AllocaInst *VAArgTLSCopy = nullptr;
8544 Value *VAArgSize = nullptr;
8545
8546 VarArgI386Helper(Function &F, MemorySanitizer &MS,
8547 MemorySanitizerVisitor &MSV)
8548 : VarArgHelperBase(F, MS, MSV, /*VAListTagSize=*/4) {}
8549
8550 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8551 const DataLayout &DL = F.getDataLayout();
8552 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8553 unsigned VAArgOffset = 0;
8554 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8555 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8556 bool IsByVal = CB.paramHasAttr(ArgNo, Attribute::ByVal);
8557 if (IsByVal) {
8558 assert(A->getType()->isPointerTy());
8559 Type *RealTy = CB.getParamByValType(ArgNo);
8560 uint64_t ArgSize = DL.getTypeAllocSize(RealTy);
8561 Align ArgAlign = CB.getParamAlign(ArgNo).value_or(Align(IntptrSize));
8562 if (ArgAlign < IntptrSize)
8563 ArgAlign = Align(IntptrSize);
8564 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8565 if (!IsFixed) {
8566 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8567 if (Base) {
8568 Value *AShadowPtr, *AOriginPtr;
8569 std::tie(AShadowPtr, AOriginPtr) =
8570 MSV.getShadowOriginPtr(A, IRB, IRB.getInt8Ty(),
8571 kShadowTLSAlignment, /*isStore*/ false);
8572
8573 IRB.CreateMemCpy(Base, kShadowTLSAlignment, AShadowPtr,
8574 kShadowTLSAlignment, ArgSize);
8575 }
8576 VAArgOffset += alignTo(ArgSize, Align(IntptrSize));
8577 }
8578 } else {
8579 Value *Base;
8580 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8581 Align ArgAlign = Align(IntptrSize);
8582 VAArgOffset = alignTo(VAArgOffset, ArgAlign);
8583 if (DL.isBigEndian()) {
8584 // Adjust the shadow for arguments with size < IntptrSize to match
8585 // the placement of bits on a big-endian system.
8586 if (ArgSize < IntptrSize)
8587 VAArgOffset += (IntptrSize - ArgSize);
8588 }
8589 if (!IsFixed) {
8590 Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8591 if (Base)
8592 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8593 VAArgOffset += ArgSize;
8594 VAArgOffset = alignTo(VAArgOffset, Align(IntptrSize));
8595 }
8596 }
8597 }
8598
8599 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8600 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating
8601 // another class member; in this context it holds the total size of all varargs.
8602 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8603 }
8604
8605 void finalizeInstrumentation() override {
8606 assert(!VAArgSize && !VAArgTLSCopy &&
8607 "finalizeInstrumentation called twice");
8608 IRBuilder<> IRB(MSV.FnPrologueEnd);
8609 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8610 Value *CopySize = VAArgSize;
8611
8612 if (!VAStartInstrumentationList.empty()) {
8613 // If there is a va_start in this function, make a backup copy of
8614 // va_arg_tls somewhere in the function entry block.
8615 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8616 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8617 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8618 CopySize, kShadowTLSAlignment, false);
8619
8620 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8621 Intrinsic::umin, CopySize,
8622 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8623 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8624 kShadowTLSAlignment, SrcSize);
8625 }
8626
8627 // Instrument va_start.
8628 // Copy va_list shadow from the backup copy of the TLS contents.
8629 for (CallInst *OrigInst : VAStartInstrumentationList) {
8630 NextNodeIRBuilder IRB(OrigInst);
8631 Value *VAListTag = OrigInst->getArgOperand(0);
8632 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8633 Value *RegSaveAreaPtrPtr =
8634 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8635 PointerType::get(*MS.C, 0));
8636 Value *RegSaveAreaPtr =
8637 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8638 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8639 const DataLayout &DL = F.getDataLayout();
8640 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8641 const Align Alignment = Align(IntptrSize);
8642 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8643 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8644 Alignment, /*isStore*/ true);
8645 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8646 CopySize);
8647 }
8648 }
8649};
8650
8651/// Implementation of VarArgHelper that is used for ARM32, MIPS, RISCV,
8652/// LoongArch64.
8653struct VarArgGenericHelper : public VarArgHelperBase {
8654 AllocaInst *VAArgTLSCopy = nullptr;
8655 Value *VAArgSize = nullptr;
8656
8657 VarArgGenericHelper(Function &F, MemorySanitizer &MS,
8658 MemorySanitizerVisitor &MSV, const unsigned VAListTagSize)
8659 : VarArgHelperBase(F, MS, MSV, VAListTagSize) {}
8660
8661 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {
8662 unsigned VAArgOffset = 0;
8663 const DataLayout &DL = F.getDataLayout();
8664 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8665 for (const auto &[ArgNo, A] : llvm::enumerate(CB.args())) {
8666 bool IsFixed = ArgNo < CB.getFunctionType()->getNumParams();
8667 if (IsFixed)
8668 continue;
8669 uint64_t ArgSize = DL.getTypeAllocSize(A->getType());
8670 if (DL.isBigEndian()) {
8671 // Adjust the shadow for arguments with size < IntptrSize to match the
8672 // placement of bits on a big-endian system.
8673 if (ArgSize < IntptrSize)
8674 VAArgOffset += (IntptrSize - ArgSize);
8675 }
8676 Value *Base = getShadowPtrForVAArgument(IRB, VAArgOffset, ArgSize);
8677 VAArgOffset += ArgSize;
8678 VAArgOffset = alignTo(VAArgOffset, IntptrSize);
8679 if (!Base)
8680 continue;
8681 IRB.CreateAlignedStore(MSV.getShadow(A), Base, kShadowTLSAlignment);
8682 }
8683
8684 Constant *TotalVAArgSize = ConstantInt::get(MS.IntptrTy, VAArgOffset);
8685 // Here we reuse VAArgOverflowSizeTLS as VAArgSizeTLS to avoid creating
8686 // another class member; in this context it holds the total size of all varargs.
8687 IRB.CreateStore(TotalVAArgSize, MS.VAArgOverflowSizeTLS);
8688 }
8689
8690 void finalizeInstrumentation() override {
8691 assert(!VAArgSize && !VAArgTLSCopy &&
8692 "finalizeInstrumentation called twice");
8693 IRBuilder<> IRB(MSV.FnPrologueEnd);
8694 VAArgSize = IRB.CreateLoad(MS.IntptrTy, MS.VAArgOverflowSizeTLS);
8695 Value *CopySize = VAArgSize;
8696
8697 if (!VAStartInstrumentationList.empty()) {
8698 // If there is a va_start in this function, make a backup copy of
8699 // va_arg_tls somewhere in the function entry block.
8700 VAArgTLSCopy = IRB.CreateAlloca(Type::getInt8Ty(*MS.C), CopySize);
8701 VAArgTLSCopy->setAlignment(kShadowTLSAlignment);
8702 IRB.CreateMemSet(VAArgTLSCopy, Constant::getNullValue(IRB.getInt8Ty()),
8703 CopySize, kShadowTLSAlignment, false);
8704
8705 Value *SrcSize = IRB.CreateBinaryIntrinsic(
8706 Intrinsic::umin, CopySize,
8707 ConstantInt::get(MS.IntptrTy, kParamTLSSize));
8708 IRB.CreateMemCpy(VAArgTLSCopy, kShadowTLSAlignment, MS.VAArgTLS,
8709 kShadowTLSAlignment, SrcSize);
8710 }
8711
8712 // Instrument va_start.
8713 // Copy va_list shadow from the backup copy of the TLS contents.
8714 for (CallInst *OrigInst : VAStartInstrumentationList) {
8715 NextNodeIRBuilder IRB(OrigInst);
8716 Value *VAListTag = OrigInst->getArgOperand(0);
8717 Type *RegSaveAreaPtrTy = PointerType::getUnqual(*MS.C);
8718 Value *RegSaveAreaPtrPtr =
8719 IRB.CreateIntToPtr(IRB.CreatePtrToInt(VAListTag, MS.IntptrTy),
8720 PointerType::get(*MS.C, 0));
8721 Value *RegSaveAreaPtr =
8722 IRB.CreateLoad(RegSaveAreaPtrTy, RegSaveAreaPtrPtr);
8723 Value *RegSaveAreaShadowPtr, *RegSaveAreaOriginPtr;
8724 const DataLayout &DL = F.getDataLayout();
8725 unsigned IntptrSize = DL.getTypeStoreSize(MS.IntptrTy);
8726 const Align Alignment = Align(IntptrSize);
8727 std::tie(RegSaveAreaShadowPtr, RegSaveAreaOriginPtr) =
8728 MSV.getShadowOriginPtr(RegSaveAreaPtr, IRB, IRB.getInt8Ty(),
8729 Alignment, /*isStore*/ true);
8730 IRB.CreateMemCpy(RegSaveAreaShadowPtr, Alignment, VAArgTLSCopy, Alignment,
8731 CopySize);
8732 }
8733 }
8734};
8735
8736 // ARM32, LoongArch64, MIPS and RISCV share the same calling conventions
8737// regarding VAArgs.
8738using VarArgARM32Helper = VarArgGenericHelper;
8739using VarArgRISCVHelper = VarArgGenericHelper;
8740using VarArgMIPSHelper = VarArgGenericHelper;
8741using VarArgLoongArch64Helper = VarArgGenericHelper;
8742
8743/// A no-op implementation of VarArgHelper.
8744struct VarArgNoOpHelper : public VarArgHelper {
8745 VarArgNoOpHelper(Function &F, MemorySanitizer &MS,
8746 MemorySanitizerVisitor &MSV) {}
8747
8748 void visitCallBase(CallBase &CB, IRBuilder<> &IRB) override {}
8749
8750 void visitVAStartInst(VAStartInst &I) override {}
8751
8752 void visitVACopyInst(VACopyInst &I) override {}
8753
8754 void finalizeInstrumentation() override {}
8755};
8756
8757} // end anonymous namespace
8758
8759static VarArgHelper *CreateVarArgHelper(Function &Func, MemorySanitizer &Msan,
8760 MemorySanitizerVisitor &Visitor) {
8761 // VarArg handling is implemented only for the targets listed below. On
8762 // other platforms the no-op helper is used and false positives are possible.
8763 Triple TargetTriple(Func.getParent()->getTargetTriple());
8764
8765 if (TargetTriple.getArch() == Triple::x86)
8766 return new VarArgI386Helper(Func, Msan, Visitor);
8767
8768 if (TargetTriple.getArch() == Triple::x86_64)
8769 return new VarArgAMD64Helper(Func, Msan, Visitor);
8770
8771 if (TargetTriple.isARM())
8772 return new VarArgARM32Helper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8773
8774 if (TargetTriple.isAArch64())
8775 return new VarArgAArch64Helper(Func, Msan, Visitor);
8776
8777 if (TargetTriple.isSystemZ())
8778 return new VarArgSystemZHelper(Func, Msan, Visitor);
8779
8780 // On PowerPC32 VAListTag is a struct
8781 // {char, char, i16 padding, char *, char *}
8782 if (TargetTriple.isPPC32())
8783 return new VarArgPowerPC32Helper(Func, Msan, Visitor);
8784
8785 if (TargetTriple.isPPC64())
8786 return new VarArgPowerPC64Helper(Func, Msan, Visitor);
8787
8788 if (TargetTriple.isRISCV32())
8789 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8790
8791 if (TargetTriple.isRISCV64())
8792 return new VarArgRISCVHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8793
8794 if (TargetTriple.isMIPS32())
8795 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/4);
8796
8797 if (TargetTriple.isMIPS64())
8798 return new VarArgMIPSHelper(Func, Msan, Visitor, /*VAListTagSize=*/8);
8799
8800 if (TargetTriple.isLoongArch64())
8801 return new VarArgLoongArch64Helper(Func, Msan, Visitor,
8802 /*VAListTagSize=*/8);
8803
8804 return new VarArgNoOpHelper(Func, Msan, Visitor);
8805}
8806
8807bool MemorySanitizer::sanitizeFunction(Function &F, TargetLibraryInfo &TLI) {
8808 if (!CompileKernel && F.getName() == kMsanModuleCtorName)
8809 return false;
8810
8811 if (F.hasFnAttribute(Attribute::DisableSanitizerInstrumentation))
8812 return false;
8813
8814 MemorySanitizerVisitor Visitor(F, *this, TLI);
8815
8816 // Clear out memory attributes.
8817 AttributeMask B;
8818 B.addAttribute(Attribute::Memory).addAttribute(Attribute::Speculatable);
8819 F.removeFnAttrs(B);
8820
8821 return Visitor.runOnFunction();
8822}
static cl::opt< bool > ClCheckAccessAddress("msan-check-access-address", cl::desc("report accesses through a pointer which has poisoned shadow"), cl::Hidden, cl::init(true))
static cl::opt< bool > ClDisableChecks("msan-disable-checks", cl::desc("Apply no_sanitize to the whole file"), cl::Hidden, cl::init(false))
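The -msan-shadow-base, -msan-origin-base, -msan-and-mask and -msan-xor-mask options above override the constants of the application-to-shadow address mapping, and like the other cl::opt flags listed here they can be reached from the clang driver via -mllvm (e.g. -mllvm -msan-track-origins=2). A minimal sketch of how such a mapping is typically applied, assuming the usual mask-then-xor-then-offset shape rather than quoting this file's exact code (SketchMapParams and both helper names are hypothetical):

#include <cstdint>

// Hedged sketch of an MSan-style address mapping. The parameters mirror the
// -msan-and-mask/-msan-xor-mask/-msan-shadow-base/-msan-origin-base options
// above; the exact order of operations in the real pass may differ.
struct SketchMapParams {
  uint64_t AndMask, XorMask, ShadowBase, OriginBase;
};

static uint64_t sketchShadowAddr(uint64_t AppAddr, const SketchMapParams &P) {
  uint64_t Off = AppAddr;
  if (P.AndMask)
    Off &= ~P.AndMask; // strip the application-region bits
  if (P.XorMask)
    Off ^= P.XorMask;  // flip into the shadow region
  return Off + P.ShadowBase;
}

static uint64_t sketchOriginAddr(uint64_t AppAddr, const SketchMapParams &P) {
  // Origins are stored at 4-byte granularity, so align the result down to 4.
  uint64_t Off = AppAddr;
  if (P.AndMask)
    Off &= ~P.AndMask;
  if (P.XorMask)
    Off ^= P.XorMask;
  return (Off + P.OriginBase) & ~uint64_t(3);
}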
#define T
FunctionAnalysisManager FAM
if(PassOpts->AAPipeline)
const SmallVectorImpl< MachineOperand > & Cond
static const char * name
void visit(MachineFunction &MF, MachineBasicBlock &Start, std::function< void(MachineBasicBlock *)> op)
This file implements a set that has insertion order iteration characteristics.
This file defines the SmallPtrSet class.
This file defines the SmallVector class.
This file contains some functions that are useful when dealing with strings.
#define LLVM_DEBUG(...)
Definition Debug.h:114
static TableGen::Emitter::OptClass< SkeletonEmitter > X("gen-skeleton-class", "Generate example skeleton class")
static SymbolRef::Type getType(const Symbol *Sym)
Definition TapiFile.cpp:39
Value * RHS
Value * LHS
static APInt getSignedMinValue(unsigned numBits)
Gets minimum signed value of APInt for a specific bit width.
Definition APInt.h:220
void setAlignment(Align Align)
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
const T & front() const
front - Get the first element.
Definition ArrayRef.h:145
static LLVM_ABI ArrayType * get(Type *ElementType, uint64_t NumElements)
This static method is the primary way to construct an ArrayType.
This class stores enough information to efficiently remove some attributes from an existing AttrBuild...
AttributeMask & addAttribute(Attribute::AttrKind Val)
Add an attribute to the mask.
iterator end()
Definition BasicBlock.h:472
LLVM_ABI const_iterator getFirstInsertionPt() const
Returns an iterator to the first instruction in this block that is suitable for inserting a non-PHI i...
LLVM_ABI const BasicBlock * getSinglePredecessor() const
Return the predecessor of this block if it has a single predecessor block.
InstListType::iterator iterator
Instruction iterators...
Definition BasicBlock.h:170
bool isInlineAsm() const
Check if this call is an inline asm statement.
Function * getCalledFunction() const
Returns the function called, or null if this is an indirect function invocation or the function signa...
bool hasRetAttr(Attribute::AttrKind Kind) const
Determine whether the return value has the given attribute.
LLVM_ABI bool paramHasAttr(unsigned ArgNo, Attribute::AttrKind Kind) const
Determine whether the argument or parameter has the given attribute.
void removeFnAttrs(const AttributeMask &AttrsToRemove)
Removes the attributes from the function.
void setCannotMerge()
MaybeAlign getParamAlign(unsigned ArgNo) const
Extract the alignment for a call or parameter (0=unknown).
Type * getParamByValType(unsigned ArgNo) const
Extract the byval type for a call or parameter.
Value * getCalledOperand() const
Type * getParamElementType(unsigned ArgNo) const
Extract the elementtype type for a parameter.
Value * getArgOperand(unsigned i) const
void setArgOperand(unsigned i, Value *v)
FunctionType * getFunctionType() const
iterator_range< User::op_iterator > args()
Iteration adapter for range-for loops.
void addParamAttr(unsigned ArgNo, Attribute::AttrKind Kind)
Adds the attribute to the indicated argument.
Predicate
This enumeration lists the possible predicates for CmpInst subclasses.
Definition InstrTypes.h:676
@ ICMP_SLT
signed less than
Definition InstrTypes.h:705
@ ICMP_SLE
signed less or equal
Definition InstrTypes.h:706
@ ICMP_SGT
signed greater than
Definition InstrTypes.h:703
@ ICMP_SGE
signed greater or equal
Definition InstrTypes.h:704
static LLVM_ABI Constant * get(ArrayType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getString(LLVMContext &Context, StringRef Initializer, bool AddNull=true)
This method constructs a CDS and initializes it with a text string.
static LLVM_ABI Constant * get(LLVMContext &Context, ArrayRef< uint8_t > Elts)
get() constructors - Return a constant with vector type with an element count and element type matchi...
static ConstantInt * getSigned(IntegerType *Ty, int64_t V, bool ImplicitTrunc=true)
Return a ConstantInt with the specified value for the specified type.
Definition Constants.h:138
static LLVM_ABI ConstantInt * getBool(LLVMContext &Context, bool V)
static LLVM_ABI Constant * get(StructType *T, ArrayRef< Constant * > V)
static LLVM_ABI Constant * getSplat(ElementCount EC, Constant *Elt)
Return a ConstantVector with the specified constant in each element.
static LLVM_ABI Constant * get(ArrayRef< Constant * > V)
This is an important base class in LLVM.
Definition Constant.h:43
static LLVM_ABI Constant * getAllOnesValue(Type *Ty)
LLVM_ABI bool isAllOnesValue() const
Return true if this is the value that would be returned by getAllOnesValue.
static LLVM_ABI Constant * getNullValue(Type *Ty)
Constructor to create a '0' constant of arbitrary type.
LLVM_ABI Constant * getAggregateElement(unsigned Elt) const
For aggregates (struct/array/vector) return the constant that corresponds to the specified element if...
LLVM_ABI bool isZeroValue() const
Return true if the value is negative zero or a null value.
Definition Constants.cpp:76
LLVM_ABI bool isNullValue() const
Return true if this is the value that would be returned by getNullValue.
Definition Constants.cpp:90
static bool shouldExecute(CounterInfo &Counter)
bool empty() const
Definition DenseMap.h:109
unsigned getNumElements() const
static LLVM_ABI FixedVectorType * get(Type *ElementType, unsigned NumElts)
Definition Type.cpp:802
static FixedVectorType * getHalfElementsVectorType(FixedVectorType *VTy)
A handy container for a FunctionType+Callee-pointer pair, which can be passed around as a single enti...
unsigned getNumParams() const
Return the number of fixed parameters this function type requires.
LLVM_ABI void setComdat(Comdat *C)
Definition Globals.cpp:214
@ PrivateLinkage
Like Internal, but omit from symbol table.
Definition GlobalValue.h:61
@ ExternalLinkage
Externally visible function.
Definition GlobalValue.h:53
Analysis pass providing a never-invalidated alias analysis result.
ConstantInt * getInt1(bool V)
Get a constant value representing either true or false.
Definition IRBuilder.h:497
Value * CreateInsertElement(Type *VecTy, Value *NewElt, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2579
Value * CreateConstGEP1_32(Type *Ty, Value *Ptr, unsigned Idx0, const Twine &Name="")
Definition IRBuilder.h:1939
AllocaInst * CreateAlloca(Type *Ty, unsigned AddrSpace, Value *ArraySize=nullptr, const Twine &Name="")
Definition IRBuilder.h:1833
IntegerType * getInt1Ty()
Fetch the type representing a single bit.
Definition IRBuilder.h:547
LLVM_ABI CallInst * CreateMaskedCompressStore(Value *Val, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr)
Create a call to Masked Compress Store intrinsic.
Value * CreateInsertValue(Value *Agg, Value *Val, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2633
Value * CreateExtractElement(Value *Vec, Value *Idx, const Twine &Name="")
Definition IRBuilder.h:2567
IntegerType * getIntNTy(unsigned N)
Fetch the type representing an N-bit integer.
Definition IRBuilder.h:575
LoadInst * CreateAlignedLoad(Type *Ty, Value *Ptr, MaybeAlign Align, const char *Name)
Definition IRBuilder.h:1867
Value * CreateZExtOrTrunc(Value *V, Type *DestTy, const Twine &Name="")
Create a ZExt or Trunc from the integer value V to DestTy.
Definition IRBuilder.h:2103
CallInst * CreateMemCpy(Value *Dst, MaybeAlign DstAlign, Value *Src, MaybeAlign SrcAlign, uint64_t Size, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memcpy between the specified pointers.
Definition IRBuilder.h:687
LLVM_ABI CallInst * CreateAndReduce(Value *Src)
Create a vector int AND reduction intrinsic of the source vector.
Value * CreatePointerCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2254
Value * CreateExtractValue(Value *Agg, ArrayRef< unsigned > Idxs, const Twine &Name="")
Definition IRBuilder.h:2626
LLVM_ABI CallInst * CreateMaskedLoad(Type *Ty, Value *Ptr, Align Alignment, Value *Mask, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Load intrinsic.
LLVM_ABI Value * CreateSelect(Value *C, Value *True, Value *False, const Twine &Name="", Instruction *MDFrom=nullptr)
BasicBlock::iterator GetInsertPoint() const
Definition IRBuilder.h:202
Value * CreateSExt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2097
Value * CreateIntToPtr(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2202
Value * CreateLShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1513
IntegerType * getInt32Ty()
Fetch the type representing a 32-bit integer.
Definition IRBuilder.h:562
ConstantInt * getInt8(uint8_t C)
Get a constant 8-bit value.
Definition IRBuilder.h:512
Value * CreatePtrAdd(Value *Ptr, Value *Offset, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:2039
IntegerType * getInt64Ty()
Fetch the type representing a 64-bit integer.
Definition IRBuilder.h:567
Value * CreateUDiv(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1454
Value * CreateICmpNE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2336
Value * CreateGEP(Type *Ty, Value *Ptr, ArrayRef< Value * > IdxList, const Twine &Name="", GEPNoWrapFlags NW=GEPNoWrapFlags::none())
Definition IRBuilder.h:1926
Value * CreateNeg(Value *V, const Twine &Name="", bool HasNSW=false)
Definition IRBuilder.h:1784
LLVM_ABI CallInst * CreateOrReduce(Value *Src)
Create a vector int OR reduction intrinsic of the source vector.
LLVM_ABI Value * CreateBinaryIntrinsic(Intrinsic::ID ID, Value *LHS, Value *RHS, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with 2 operands which is mangled on the first type.
LLVM_ABI CallInst * CreateIntrinsic(Intrinsic::ID ID, ArrayRef< Type * > Types, ArrayRef< Value * > Args, FMFSource FMFSource={}, const Twine &Name="")
Create a call to intrinsic ID with Args, mangled using Types.
ConstantInt * getInt32(uint32_t C)
Get a constant 32-bit value.
Definition IRBuilder.h:522
PHINode * CreatePHI(Type *Ty, unsigned NumReservedValues, const Twine &Name="")
Definition IRBuilder.h:2497
Value * CreateNot(Value *V, const Twine &Name="")
Definition IRBuilder.h:1808
Value * CreateICmpEQ(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2332
LLVM_ABI DebugLoc getCurrentDebugLocation() const
Get location information used by debugging information.
Definition IRBuilder.cpp:64
Value * CreateSub(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1420
Value * CreateBitCast(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2207
ConstantInt * getIntN(unsigned N, uint64_t C)
Get a constant N-bit value, zero extended or truncated from a 64-bit value.
Definition IRBuilder.h:533
LoadInst * CreateLoad(Type *Ty, Value *Ptr, const char *Name)
Provided to resolve 'CreateLoad(Ty, Ptr, "...")' correctly, instead of converting the string to 'bool...
Definition IRBuilder.h:1850
Value * CreateShl(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1492
CallInst * CreateMemSet(Value *Ptr, Value *Val, uint64_t Size, MaybeAlign Align, bool isVolatile=false, const AAMDNodes &AAInfo=AAMDNodes())
Create and insert a memset to the specified pointer and the specified value.
Definition IRBuilder.h:630
Value * CreateZExt(Value *V, Type *DestTy, const Twine &Name="", bool IsNonNeg=false)
Definition IRBuilder.h:2085
Value * CreateShuffleVector(Value *V1, Value *V2, Value *Mask, const Twine &Name="")
Definition IRBuilder.h:2601
LLVMContext & getContext() const
Definition IRBuilder.h:203
Value * CreateAnd(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1551
StoreInst * CreateStore(Value *Val, Value *Ptr, bool isVolatile=false)
Definition IRBuilder.h:1863
LLVM_ABI CallInst * CreateMaskedStore(Value *Val, Value *Ptr, Align Alignment, Value *Mask)
Create a call to Masked Store intrinsic.
Value * CreateAdd(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1403
Value * CreatePtrToInt(Value *V, Type *DestTy, const Twine &Name="")
Definition IRBuilder.h:2197
Value * CreateIsNotNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg != 0.
Definition IRBuilder.h:2659
CallInst * CreateCall(FunctionType *FTy, Value *Callee, ArrayRef< Value * > Args={}, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:2511
Value * CreateTrunc(Value *V, Type *DestTy, const Twine &Name="", bool IsNUW=false, bool IsNSW=false)
Definition IRBuilder.h:2071
PointerType * getPtrTy(unsigned AddrSpace=0)
Fetch the type representing a pointer.
Definition IRBuilder.h:605
Value * CreateBinOp(Instruction::BinaryOps Opc, Value *LHS, Value *RHS, const Twine &Name="", MDNode *FPMathTag=nullptr)
Definition IRBuilder.h:1708
Value * CreateICmpSLT(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2364
LLVM_ABI Value * CreateTypeSize(Type *Ty, TypeSize Size)
Create an expression which evaluates to the number of units in Size at runtime.
Value * CreateICmpUGE(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2344
Value * CreateIntCast(Value *V, Type *DestTy, bool isSigned, const Twine &Name="")
Definition IRBuilder.h:2280
Value * CreateIsNull(Value *Arg, const Twine &Name="")
Return a boolean value testing if Arg == 0.
Definition IRBuilder.h:2654
void SetInsertPoint(BasicBlock *TheBB)
This specifies that created instructions should be appended to the end of the specified block.
Definition IRBuilder.h:207
Type * getVoidTy()
Fetch the type representing void.
Definition IRBuilder.h:600
StoreInst * CreateAlignedStore(Value *Val, Value *Ptr, MaybeAlign Align, bool isVolatile=false)
Definition IRBuilder.h:1886
LLVM_ABI CallInst * CreateMaskedExpandLoad(Type *Ty, Value *Ptr, MaybeAlign Align, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Expand Load intrinsic.
Value * CreateInBoundsPtrAdd(Value *Ptr, Value *Offset, const Twine &Name="")
Definition IRBuilder.h:2044
Value * CreateAShr(Value *LHS, Value *RHS, const Twine &Name="", bool isExact=false)
Definition IRBuilder.h:1532
Value * CreateXor(Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:1599
Value * CreateICmp(CmpInst::Predicate P, Value *LHS, Value *RHS, const Twine &Name="")
Definition IRBuilder.h:2442
Value * CreateOr(Value *LHS, Value *RHS, const Twine &Name="", bool IsDisjoint=false)
Definition IRBuilder.h:1573
IntegerType * getInt8Ty()
Fetch the type representing an 8-bit integer.
Definition IRBuilder.h:552
Value * CreateMul(Value *LHS, Value *RHS, const Twine &Name="", bool HasNUW=false, bool HasNSW=false)
Definition IRBuilder.h:1437
LLVM_ABI CallInst * CreateMaskedScatter(Value *Val, Value *Ptrs, Align Alignment, Value *Mask=nullptr)
Create a call to Masked Scatter intrinsic.
LLVM_ABI CallInst * CreateMaskedGather(Type *Ty, Value *Ptrs, Align Alignment, Value *Mask=nullptr, Value *PassThru=nullptr, const Twine &Name="")
Create a call to Masked Gather intrinsic.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
Definition IRBuilder.h:2788
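Most of the IRBuilder methods listed above are the building blocks used to synthesize shadow-propagation IR. A hedged, self-contained sketch (not the pass's actual code) that loads a scalar value's shadow through a hypothetical shadow pointer and reduces it to an i1 "is poisoned" flag; the alignment and names are chosen for illustration only:

#include "llvm/IR/Constants.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/Support/Alignment.h"

using namespace llvm;

// Hypothetical helper: ShadowPtr points at the shadow bytes mirroring some
// scalar application value; ShadowTy is the matching integer shadow type.
// Returns an i1 that is true when any shadow bit is set.
static Value *emitShadowIsPoisoned(IRBuilder<> &IRB, Value *ShadowPtr,
                                   Type *ShadowTy) {
  Value *Shadow =
      IRB.CreateAlignedLoad(ShadowTy, ShadowPtr, Align(8), "shadow");
  // Shadow != 0  <=>  at least one bit of the value may be uninitialized.
  return IRB.CreateICmpNE(Shadow, Constant::getNullValue(ShadowTy),
                          "shadow_is_poisoned");
}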
std::vector< ConstraintInfo > ConstraintInfoVector
Definition InlineAsm.h:123
void visit(Iterator Start, Iterator End)
Definition InstVisitor.h:87
const DebugLoc & getDebugLoc() const
Return the debug location for this node as a DebugLoc.
LLVM_ABI InstListType::iterator eraseFromParent()
This method unlinks 'this' from the containing basic block and deletes it.
MDNode * getMetadata(unsigned KindID) const
Get the metadata of given kind attached to this Instruction.
LLVM_ABI bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instructi...
static LLVM_ABI IntegerType * get(LLVMContext &C, unsigned NumBits)
This static method is the primary way of constructing an IntegerType.
Definition Type.cpp:318
LLVM_ABI MDNode * createUnlikelyBranchWeights()
Return metadata containing two branch weights, with significant bias towards false destination.
Definition MDBuilder.cpp:48
A Module instance is used to store all the information related to an LLVM module.
Definition Module.h:67
void addIncoming(Value *V, BasicBlock *BB)
Add an incoming value to the end of the PHI list.
static LLVM_ABI PoisonValue * get(Type *T)
Static factory methods - Return a 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
Definition Analysis.h:112
static PreservedAnalyses none()
Convenience factory function for the empty preserved set.
Definition Analysis.h:115
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
Definition Analysis.h:118
PreservedAnalyses & abandon()
Mark an analysis as abandoned.
Definition Analysis.h:171
bool remove(const value_type &X)
Remove an item from the set vector.
Definition SetVector.h:181
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:151
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
StringRef - Represent a constant reference to a string, i.e.
Definition StringRef.h:55
static LLVM_ABI StructType * get(LLVMContext &Context, ArrayRef< Type * > Elements, bool isPacked=false)
This static method is the primary way to create a literal StructType.
Definition Type.cpp:413
unsigned getNumElements() const
Random access to the elements.
Type * getElementType(unsigned N) const
Analysis pass providing the TargetLibraryInfo.
Provides information about what library functions are available for the current target.
AttributeList getAttrList(LLVMContext *C, ArrayRef< unsigned > ArgNos, bool Signed, bool Ret=false, AttributeList AL=AttributeList()) const
bool getLibFunc(StringRef funcName, LibFunc &F) const
Searches for a particular function name.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
bool isMIPS64() const
Tests whether the target is MIPS 64-bit (little and big endian).
Definition Triple.h:1061
@ loongarch64
Definition Triple.h:65
bool isRISCV32() const
Tests whether the target is 32-bit RISC-V.
Definition Triple.h:1104
bool isPPC32() const
Tests whether the target is 32-bit PowerPC (little and big endian).
Definition Triple.h:1077
ArchType getArch() const
Get the parsed architecture type of this triple.
Definition Triple.h:414
bool isRISCV64() const
Tests whether the target is 64-bit RISC-V.
Definition Triple.h:1109
bool isLoongArch64() const
Tests whether the target is 64-bit LoongArch.
Definition Triple.h:1050
bool isMIPS32() const
Tests whether the target is MIPS 32-bit (little and big endian).
Definition Triple.h:1056
bool isARM() const
Tests whether the target is ARM (little and big endian).
Definition Triple.h:938
bool isPPC64() const
Tests whether the target is 64-bit PowerPC (little and big endian).
Definition Triple.h:1082
bool isAArch64() const
Tests whether the target is AArch64 (little and big endian).
Definition Triple.h:1029
bool isSystemZ() const
Tests whether the target is SystemZ.
Definition Triple.h:1128
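The Triple predicates above are the kind of queries used to select one of the per-platform *_MemoryMapParams constants listed earlier. A purely illustrative dispatch sketch (the function, its return labels, and the fallback are assumptions, not taken from this file):

#include "llvm/TargetParser/Triple.h"

using namespace llvm;

// Illustration only: label the shadow layout implied by the target triple.
static const char *pickShadowLayout(const Triple &TT) {
  if (TT.isMIPS64())
    return "mips64";
  if (TT.isPPC64())
    return "powerpc64";
  if (TT.isAArch64())
    return "aarch64";
  if (TT.isLoongArch64())
    return "loongarch64";
  if (TT.isSystemZ())
    return "s390x";
  return "x86_64-or-default";
}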
The instances of the Type class are immutable: once they are created, they are never changed.
Definition Type.h:45
LLVM_ABI unsigned getIntegerBitWidth() const
bool isVectorTy() const
True if this is an instance of VectorType.
Definition Type.h:273
bool isArrayTy() const
True if this is an instance of ArrayType.
Definition Type.h:264
LLVM_ABI bool isScalableTy(SmallPtrSetImpl< const Type * > &Visited) const
Return true if this is a type whose size is a known multiple of vscale.
Definition Type.cpp:61
bool isIntOrIntVectorTy() const
Return true if this is an integer type or a vector of integer types.
Definition Type.h:246
bool isPointerTy() const
True if this is an instance of PointerType.
Definition Type.h:267
Type * getArrayElementType() const
Definition Type.h:408
bool isPPC_FP128Ty() const
Return true if this is powerpc long double.
Definition Type.h:165
static LLVM_ABI Type * getVoidTy(LLVMContext &C)
Definition Type.cpp:280
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Definition Type.h:352
LLVM_ABI TypeSize getPrimitiveSizeInBits() const LLVM_READONLY
Return the basic size of this type if it is a primitive type.
Definition Type.cpp:197
bool isSized(SmallPtrSetImpl< Type * > *Visited=nullptr) const
Return true if it makes sense to take the size of this type.
Definition Type.h:311
LLVM_ABI unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
Definition Type.cpp:230
bool isFloatingPointTy() const
Return true if this is one of the floating-point types.
Definition Type.h:184
bool isIntOrPtrTy() const
Return true if this is an integer type or a pointer type.
Definition Type.h:255
bool isIntegerTy() const
True if this is an instance of IntegerType.
Definition Type.h:240
bool isFPOrFPVectorTy() const
Return true if this is a FP type or a vector of FP.
Definition Type.h:225
bool isVoidTy() const
Return true if this is 'void'.
Definition Type.h:139
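The Type queries above are typical of how a shadow type is derived from an application type: a scalar of N bits can be shadowed by an iN, and a fixed vector by a vector of such integers. A minimal sketch under that assumption (sketchShadowTy is hypothetical and deliberately ignores pointers, aggregates and scalable vectors):

#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/Type.h"

using namespace llvm;

// Sketch: map an application type to an integer (or vector-of-integer)
// shadow type of the same bit width; anything else is passed through.
static Type *sketchShadowTy(Type *Ty) {
  LLVMContext &C = Ty->getContext();
  if (auto *VT = dyn_cast<FixedVectorType>(Ty))
    return FixedVectorType::get(
        IntegerType::get(C, VT->getScalarSizeInBits()), VT->getNumElements());
  if (Ty->isIntegerTy() || Ty->isFloatingPointTy())
    return IntegerType::get(C, Ty->getPrimitiveSizeInBits().getFixedValue());
  return Ty;
}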
Value * getOperand(unsigned i) const
Definition User.h:233
unsigned getNumOperands() const
Definition User.h:255
size_type count(const KeyT &Val) const
Return 1 if the specified key is in the map, 0 otherwise.
Definition ValueMap.h:156
Type * getType() const
All values are typed, get the type of this value.
Definition Value.h:256
LLVM_ABI void setName(const Twine &Name)
Change the name of the value.
Definition Value.cpp:397
LLVM_ABI StringRef getName() const
Return a constant reference to the value's name.
Definition Value.cpp:322
ElementCount getElementCount() const
Return an ElementCount instance to represent the (possibly scalable) number of elements in the vector...
Type * getElementType() const
int getNumOccurrences() const
constexpr ScalarTy getFixedValue() const
Definition TypeSize.h:200
constexpr bool isScalable() const
Returns whether the quantity is scaled by a runtime quantity (vscale).
Definition TypeSize.h:168
An efficient, type-erasing, non-owning reference to a callable.
const ParentTy * getParent() const
Definition ilist_node.h:34
self_iterator getIterator()
Definition ilist_node.h:123
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
CallInst * Call
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ BasicBlock
Various leaf nodes.
Definition ISDOpcodes.h:81
initializer< Ty > init(const Ty &Val)
Function * Kernel
Summary of a kernel (=entry point for target offloading).
Definition OpenMPOpt.h:21
NodeAddr< FuncNode * > Func
Definition RDFGraph.h:393
friend class Instruction
Iterator for Instructions in a `BasicBlock`.
Definition BasicBlock.h:73
This is an optimization pass for GlobalISel generic memory operations.
unsigned Log2_32_Ceil(uint32_t Value)
Return the ceil log base 2 of the specified value, 32 if the value is zero.
Definition MathExtras.h:344
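Log2_32_Ceil is the kind of helper behind TypeSizeToSizeIndex near the top of this listing: it collapses an access size in bits into a small index, e.g. for picking among per-size runtime callbacks. A hedged sketch of such a mapping (not necessarily the exact body of TypeSizeToSizeIndex):

#include "llvm/Support/MathExtras.h"

// Sketch: 1..8 bits -> 0, 16 bits -> 1, 32 bits -> 2, 64 bits -> 3,
// 128 bits -> 4.
static unsigned sketchSizeIndex(unsigned TypeSizeInBits) {
  if (TypeSizeInBits <= 8)
    return 0;
  return llvm::Log2_32_Ceil((TypeSizeInBits + 7) / 8);
}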
@ Offset
Definition DWP.cpp:532
FunctionAddr VTableAddr Value
Definition InstrProf.h:137
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
Definition STLExtras.h:1667
auto enumerate(FirstRange &&First, RestRanges &&...Rest)
Given two or more input ranges, returns a new range whose values are tuples (A, B,...
Definition STLExtras.h:2530
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ Done
Definition Threading.h:60
bool isAligned(Align Lhs, uint64_t SizeInBytes)
Checks that SizeInBytes is a multiple of the alignment.
Definition Alignment.h:134
LLVM_ABI std::pair< Instruction *, Value * > SplitBlockAndInsertSimpleForLoop(Value *End, BasicBlock::iterator SplitBefore)
Insert a for (int i = 0; i < End; i++) loop structure (with the exception that End is assumed > 0,...
InnerAnalysisManagerProxy< FunctionAnalysisManager, Module > FunctionAnalysisManagerModuleProxy
Provide the FunctionAnalysisManager to Module proxy.
constexpr bool isPowerOf2_64(uint64_t Value)
Return true if the argument is a power of two > 0 (64 bit edition.)
Definition MathExtras.h:284
unsigned Log2_64(uint64_t Value)
Return the floor log base 2 of the specified value, -1 if the value is zero.
Definition MathExtras.h:337
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
LLVM_ABI std::pair< Function *, FunctionCallee > getOrCreateSanitizerCtorAndInitFunctions(Module &M, StringRef CtorName, StringRef InitName, ArrayRef< Type * > InitArgTypes, ArrayRef< Value * > InitArgs, function_ref< void(Function *, FunctionCallee)> FunctionsCreatedCallback, StringRef VersionCheckName=StringRef(), bool Weak=false)
Creates sanitizer constructor function lazily.
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
bool isa(const From &Val)
isa<X> - Return true if the parameter to the template is an instance of one of the template type argu...
Definition Casting.h:547
LLVM_ABI bool isKnownNonZero(const Value *V, const SimplifyQuery &Q, unsigned Depth=0)
Return true if the given value is known to be non-zero when defined.
LLVM_ABI raw_fd_ostream & errs()
This returns a reference to a raw_ostream for standard error.
AtomicOrdering
Atomic ordering for LLVM's memory model.
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:74
IRBuilder(LLVMContext &, FolderTy, InserterTy, MDNode *, ArrayRef< OperandBundleDef >) -> IRBuilder< FolderTy, InserterTy >
@ Or
Bitwise or logical OR of integers.
@ And
Bitwise or logical AND of integers.
@ Add
Sum of integers.
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
DWARFExpression::Operation Op
RoundingMode
Rounding mode.
ArrayRef(const T &OneElt) -> ArrayRef< T >
constexpr unsigned BitWidth
LLVM_ABI void appendToGlobalCtors(Module &M, Function *F, int Priority, Constant *Data=nullptr)
Append F to the list of global ctors of module M with the given Priority.
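getOrCreateSanitizerCtorAndInitFunctions and appendToGlobalCtors together cover the module-constructor boilerplate. In the sketch below the ctor and init names follow the kMsanModuleCtorName/kMsanInitName constants listed earlier, but the callback body and the priority value are illustrative assumptions:

#include "llvm/IR/Module.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

using namespace llvm;

// Sketch: lazily create "msan.module_ctor" calling "__msan_init" with no
// arguments, and register it as a global constructor the first time it is
// created.
static void sketchInsertModuleCtor(Module &M) {
  getOrCreateSanitizerCtorAndInitFunctions(
      M, "msan.module_ctor", "__msan_init",
      /*InitArgTypes=*/{}, /*InitArgs=*/{},
      [&](Function *Ctor, FunctionCallee) {
        appendToGlobalCtors(M, Ctor, /*Priority=*/0);
      });
}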
decltype(auto) cast(const From &Val)
cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:559
iterator_range< df_iterator< T > > depth_first(const T &G)
LLVM_ABI Instruction * SplitBlockAndInsertIfThen(Value *Cond, BasicBlock::iterator SplitBefore, bool Unreachable, MDNode *BranchWeights=nullptr, DomTreeUpdater *DTU=nullptr, LoopInfo *LI=nullptr, BasicBlock *ThenBlock=nullptr)
Split the containing block at the specified instruction - everything before SplitBefore stays in the ...
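SplitBlockAndInsertIfThen, combined with createUnlikelyBranchWeights above, is the usual way to emit a cold, guarded slow path such as a call to a warning function. A hedged sketch (emitGuardedWarning and WarningFn are hypothetical, not this file's check-materialization code):

#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"

using namespace llvm;

// Sketch: Cond is an i1 that is true when a value is poisoned. Split the
// block before InsertBefore and call WarningFn on the rarely taken branch.
static void emitGuardedWarning(Value *Cond, Instruction *InsertBefore,
                               FunctionCallee WarningFn) {
  MDBuilder MDB(InsertBefore->getContext());
  Instruction *Then = SplitBlockAndInsertIfThen(
      Cond, InsertBefore->getIterator(), /*Unreachable=*/false,
      MDB.createUnlikelyBranchWeights());
  IRBuilder<> IRB(Then);
  IRB.CreateCall(WarningFn);
}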
LLVM_ABI void maybeMarkSanitizerLibraryCallNoBuiltin(CallInst *CI, const TargetLibraryInfo *TLI)
Given a CallInst, check if it calls a string function known to CodeGen, and mark it with NoBuiltin if...
Definition Local.cpp:3865
LLVM_ABI bool removeUnreachableBlocks(Function &F, DomTreeUpdater *DTU=nullptr, MemorySSAUpdater *MSSAU=nullptr)
Remove all blocks that cannot be reached from the function's entry.
Definition Local.cpp:2883
LLVM_ABI bool checkIfAlreadyInstrumented(Module &M, StringRef Flag)
Check if the module has the flag attached; if not, add the flag.
std::string itostr(int64_t X)
AnalysisManager< Module > ModuleAnalysisManager
Convenience typedef for the Module analysis manager.
Definition MIRParser.h:39
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
LLVM_ABI void printPipeline(raw_ostream &OS, function_ref< StringRef(StringRef)> MapClassName2PassName)
LLVM_ABI PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM)
A CRTP mix-in to automatically provide informational APIs needed for passes.
Definition PassManager.h:69
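The final entries, printPipeline, run(Module &, ModuleAnalysisManager &) and the PassInfoMixin CRTP helper, are the new-pass-manager surface of a module pass. A skeletal sketch of that shape (ExampleModulePass is hypothetical and only shows the preserved-analyses contract):

#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

using namespace llvm;

// Minimal module pass skeleton: if the module was changed, conservatively
// report that no analyses are preserved; otherwise preserve everything.
class ExampleModulePass : public PassInfoMixin<ExampleModulePass> {
public:
  PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM) {
    bool Changed = false;
    // ... transform or instrument M here ...
    return Changed ? PreservedAnalyses::none() : PreservedAnalyses::all();
  }
};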