1//===- AArch64FrameLowering.cpp - AArch64 Frame Lowering -------*- C++ -*-====//
2//
3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
4// See https://llvm.org/LICENSE.txt for license information.
5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
6//
7//===----------------------------------------------------------------------===//
8//
9// This file contains the AArch64 implementation of TargetFrameLowering class.
10//
11// On AArch64, stack frames are structured as follows:
12//
13// The stack grows downward.
14//
15// All of the individual frame areas on the frame below are optional, i.e. it's
16// possible to create a function such that a particular area isn't present
17// in the frame.
18//
19// At function entry, the "frame" looks as follows:
20//
21// | | Higher address
22// |-----------------------------------|
23// | |
24// | arguments passed on the stack |
25// | |
26// |-----------------------------------| <- sp
27// | | Lower address
28//
29//
30// After the prologue has run, the frame has the following general structure.
31// Note that this doesn't depict the case where a red-zone is used. Also,
32// technically the last frame area (VLAs) doesn't get created until the
33// main function body, after the prologue has run. However, it's depicted here
34// for completeness.
35//
36// | | Higher address
37// |-----------------------------------|
38// | |
39// | arguments passed on the stack |
40// | |
41// |-----------------------------------|
42// | |
43// | (Win64 only) varargs from reg |
44// | |
45// |-----------------------------------|
46// | |
47// | (Win64 only) callee-saved SVE reg |
48// | |
49// |-----------------------------------|
50// | |
51// | callee-saved gpr registers | <--.
52// | | | On Darwin platforms these
53// |- - - - - - - - - - - - - - - - - -| | callee saves are swapped,
54// | prev_lr | | (frame record first)
55// | prev_fp | <--'
56// | async context if needed |
57// | (a.k.a. "frame record") |
58// |-----------------------------------| <- fp(=x29)
59// Default SVE stack layout Split SVE objects
60// (aarch64-split-sve-objects=false) (aarch64-split-sve-objects=true)
61// |-----------------------------------| |-----------------------------------|
62// | <hazard padding> | | callee-saved PPR registers |
63// |-----------------------------------| |-----------------------------------|
64// | | | PPR stack objects |
65// | callee-saved fp/simd/SVE regs | |-----------------------------------|
66// | | | <hazard padding> |
67// |-----------------------------------| |-----------------------------------|
68// | | | callee-saved ZPR/FPR registers |
69// | SVE stack objects | |-----------------------------------|
70// | | | ZPR stack objects |
71// |-----------------------------------| |-----------------------------------|
72// ^ NB: FPR CSRs are promoted to ZPRs
73// |-----------------------------------|
74// |.empty.space.to.make.part.below....|
75// |.aligned.in.case.it.needs.more.than| (size of this area is unknown at
76// |.the.standard.16-byte.alignment....| compile time; if present)
77// |-----------------------------------|
78// | local variables of fixed size |
79// | including spill slots |
80// | <FPR> |
81// | <hazard padding> |
82// | <GPR> |
83// |-----------------------------------| <- bp(not defined by ABI,
84// |.variable-sized.local.variables....| LLVM chooses X19)
85// |.(VLAs)............................| (size of this area is unknown at
86// |...................................| compile time)
87// |-----------------------------------| <- sp
88// | | Lower address
89//
90//
91// To access data in a frame, a constant offset from one of the pointers
92// (fp, bp, sp) to that data must be computable at compile time. The sizes
93// of the areas with a dotted background cannot be computed at compile time
94// if those areas are present, so all three of fp, bp and sp must be set up
95// in order to access all contents of the frame areas, assuming all of the
96// frame areas are non-empty.
97//
98// For most functions, some of the frame areas are empty. For those functions,
99// it may not be necessary to set up fp or bp:
100// * A base pointer is definitely needed when there are both VLAs and local
101// variables with more-than-default alignment requirements.
102// * A frame pointer is definitely needed when there are local variables with
103// more-than-default alignment requirements.
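//
// As an illustrative sketch of the two cases above (not taken from a real
// test case; use() is just a placeholder for arbitrary uses):
//
//   void f(int n) {
//     alignas(32) int aligned_buf[8];  // over-aligned local
//     int vla[n];                      // variable-sized local
//     use(aligned_buf, vla);
//   }
//
// Here sp keeps moving with the VLA allocation and the over-alignment makes
// the distance from fp down to the fixed-size locals unknown at compile time,
// so bp (x19) gives those locals a stable base, while fp is still needed to
// reach the areas above the alignment gap (e.g. arguments passed on the
// stack).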
104//
105// For Darwin platforms the frame-record (fp, lr) is stored at the top of the
106// callee-saved area, since the unwind encoding does not allow for encoding
107// this dynamically and existing tools depend on this layout. For other
108// platforms, the frame-record is stored at the bottom of the (gpr) callee-saved
109// area to allow SVE stack objects (allocated directly below the callee-saves,
110// if available) to be accessed directly from the framepointer.
111// The SVE spill/fill instructions have VL-scaled addressing modes such
112// as:
113// ldr z8, [fp, #-7 mul vl]
114// For SVE the size of the vector length (VL) is not known at compile-time, so
115// '#-7 mul vl' is an offset that can only be evaluated at runtime. With this
116// layout, we don't need to add an unscaled offset to the framepointer before
117// accessing the SVE object in the frame.
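//
// As a worked illustration: with a 256-bit vector length, one "vl" of
// Z-register data is 32 bytes, so '#-7 mul vl' resolves to -224 bytes at
// runtime; with a 512-bit vector length the same operand resolves to -448
// bytes.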
118//
119// In some cases when a base pointer is not strictly needed, it is generated
120// anyway when offsets from the frame pointer to access local variables become
121// so large that the offset can't be encoded in the immediate fields of loads
122// or stores.
123//
124// Outgoing function arguments must be at the bottom of the stack frame when
125// calling another function. If we do not have variable-sized stack objects, we
126// can allocate a "reserved call frame" area at the bottom of the local
127// variable area, large enough for all outgoing calls. If we do have VLAs, then
128// the stack pointer must be decremented and incremented around each call to
129// make space for the arguments below the VLAs.
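//
// For example (an illustrative sketch with made-up sizes), a call that passes
// arguments on the stack in a function with VLAs is bracketed roughly as:
//
//   sub sp, sp, #32     // make room for the outgoing arguments
//   str x8, [sp]        // store the stack-passed argument(s)
//   bl  callee
//   add sp, sp, #32     // release the outgoing-argument area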
130//
131// FIXME: also explain the redzone concept.
132//
133// About stack hazards: Under some SME contexts, a coprocessor with its own
134// separate cache can be used for FP operations. This can create hazards if the CPU
135// and the SME unit try to access the same area of memory, including if the
136// access is to an area of the stack. To try to alleviate this we attempt to
137// introduce extra padding into the stack frame between FP and GPR accesses,
138// controlled by the aarch64-stack-hazard-size option. Without changing the
139// layout of the stack frame in the diagram above, a stack object of size
140// aarch64-stack-hazard-size is added between GPR and FPR CSRs. Another is added
141// to the stack objects section, and stack objects are sorted so that FPR >
142// Hazard padding slot > GPRs (where possible). Unfortunately some things are
143// not handled well (VLA area, arguments on the stack, objects with both GPR and
144// FPR accesses), but if those are controlled by the user then the entire stack
145// frame becomes GPR at the start/end with FPR in the middle, surrounded by
146// Hazard padding.
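//
// For example (illustrative only), compiling with
// "-mllvm -aarch64-stack-hazard-size=1024" requests 1024-byte padding slots,
// so GPR and FPR stack accesses that would otherwise be adjacent end up at
// least 1024 bytes apart wherever the sorting described above succeeds.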
147//
148// An example of the prologue:
149//
150// .globl __foo
151// .align 2
152// __foo:
153// Ltmp0:
154// .cfi_startproc
155// .cfi_personality 155, ___gxx_personality_v0
156// Leh_func_begin:
157// .cfi_lsda 16, Lexception33
158//
159// stp x<a>, x<b>, [sp, #-offset]!
160// ...
161// stp x28, x27, [sp, #offset-32]
162// stp fp, lr, [sp, #offset-16]
163// add fp, sp, #offset - 16
164// sub sp, sp, #1360
165//
166// The Stack:
167// +-------------------------------------------+
168// 10000 | ........ | ........ | ........ | ........ |
169// 10004 | ........ | ........ | ........ | ........ |
170// +-------------------------------------------+
171// 10008 | ........ | ........ | ........ | ........ |
172// 1000c | ........ | ........ | ........ | ........ |
173// +===========================================+
174// 10010 | X28 Register |
175// 10014 | X28 Register |
176// +-------------------------------------------+
177// 10018 | X27 Register |
178// 1001c | X27 Register |
179// +===========================================+
180// 10020 | Frame Pointer |
181// 10024 | Frame Pointer |
182// +-------------------------------------------+
183// 10028 | Link Register |
184// 1002c | Link Register |
185// +===========================================+
186// 10030 | ........ | ........ | ........ | ........ |
187// 10034 | ........ | ........ | ........ | ........ |
188// +-------------------------------------------+
189// 10038 | ........ | ........ | ........ | ........ |
190// 1003c | ........ | ........ | ........ | ........ |
191// +-------------------------------------------+
192//
193// [sp] = 10030 :: >>initial value<<
194// sp = 10020 :: stp fp, lr, [sp, #-16]!
195// fp = sp == 10020 :: mov fp, sp
196// [sp] == 10020 :: stp x28, x27, [sp, #-16]!
197// sp == 10010 :: >>final value<<
198//
199// The frame pointer (w29) points to address 10020. If we use an offset of
200// '16' from 'w29', we get the CFI offsets of -8 for w30, -16 for w29, -24
201// for w27, and -32 for w28:
202//
203// Ltmp1:
204// .cfi_def_cfa w29, 16
205// Ltmp2:
206// .cfi_offset w30, -8
207// Ltmp3:
208// .cfi_offset w29, -16
209// Ltmp4:
210// .cfi_offset w27, -24
211// Ltmp5:
212// .cfi_offset w28, -32
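//
// Checking the arithmetic against the trace above (addresses are hex): w29
// holds 10020, so the CFA is 10020 + 16 = 10030, the value sp had on entry.
// The saved w30 then sits at CFA - 8 = 10028 and the saved w29 at
// CFA - 16 = 10020, matching the layout drawn above.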
213//
214//===----------------------------------------------------------------------===//
215
216#include "AArch64FrameLowering.h"
217#include "AArch64InstrInfo.h"
220#include "AArch64RegisterInfo.h"
221#include "AArch64Subtarget.h"
225#include "llvm/ADT/ScopeExit.h"
226#include "llvm/ADT/SmallVector.h"
244#include "llvm/IR/Attributes.h"
245#include "llvm/IR/CallingConv.h"
246#include "llvm/IR/DataLayout.h"
247#include "llvm/IR/DebugLoc.h"
248#include "llvm/IR/Function.h"
249#include "llvm/MC/MCAsmInfo.h"
250#include "llvm/MC/MCDwarf.h"
252#include "llvm/Support/Debug.h"
259#include <cassert>
260#include <cstdint>
261#include <iterator>
262#include <optional>
263#include <vector>
264
265using namespace llvm;
266
267#define DEBUG_TYPE "frame-info"
268
269static cl::opt<bool> EnableRedZone("aarch64-redzone",
270 cl::desc("enable use of redzone on AArch64"),
271 cl::init(false), cl::Hidden);
272
274 "stack-tagging-merge-settag",
275 cl::desc("merge settag instruction in function epilog"), cl::init(true),
276 cl::Hidden);
277
278static cl::opt<bool> OrderFrameObjects("aarch64-order-frame-objects",
279 cl::desc("sort stack allocations"),
280 cl::init(true), cl::Hidden);
281
282static cl::opt<bool>
283 SplitSVEObjects("aarch64-split-sve-objects",
284 cl::desc("Split allocation of ZPR & PPR objects"),
285 cl::init(true), cl::Hidden);
286
288 "homogeneous-prolog-epilog", cl::Hidden,
289 cl::desc("Emit homogeneous prologue and epilogue for the size "
290 "optimization (default = off)"));
291
292// Stack hazard size for analysis remarks. StackHazardSize takes precedence.
293static cl::opt<unsigned>
294 StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0),
295 cl::Hidden);
296// Whether to insert padding into non-streaming functions (for testing).
297static cl::opt<bool>
298 StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming",
299 cl::init(false), cl::Hidden);
300
302 "aarch64-disable-multivector-spill-fill",
303 cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false),
304 cl::Hidden);
305
306int64_t
307AArch64FrameLowering::getArgumentStackToRestore(MachineFunction &MF,
308 MachineBasicBlock &MBB) const {
309 MachineBasicBlock::iterator MBBI = MBB.getLastNonDebugInstr();
311 bool IsTailCallReturn = (MBB.end() != MBBI)
313 : false;
314
315 int64_t ArgumentPopSize = 0;
316 if (IsTailCallReturn) {
317 MachineOperand &StackAdjust = MBBI->getOperand(1);
318
319 // For a tail-call in a callee-pops-arguments environment, some or all of
320 // the stack may actually be in use for the call's arguments; this is
321 // calculated during LowerCall and consumed here...
322 ArgumentPopSize = StackAdjust.getImm();
323 } else {
324 // ... otherwise the amount to pop is *all* of the argument space,
325 // conveniently stored in the MachineFunctionInfo by
326 // LowerFormalArguments. This will, of course, be zero for the C calling
327 // convention.
328 ArgumentPopSize = AFI->getArgumentStackToRestore();
329 }
330
331 return ArgumentPopSize;
332}
333
335 MachineFunction &MF);
336
337enum class AssignObjectOffsets { No, Yes };
338/// Process all the SVE stack objects, determining the SVE stack size and
339/// offsets for each object. If AssignOffsets is "Yes", the offsets get
340/// assigned (and SVE stack sizes set). Returns the size of the SVE stack.
342 AssignObjectOffsets AssignOffsets);
343
344static unsigned getStackHazardSize(const MachineFunction &MF) {
345 return MF.getSubtarget<AArch64Subtarget>().getStreamingHazardSize();
346}
347
353
356 // With split SVE objects, the hazard padding is added to the PPR region,
357 // which places it between the [GPR, PPR] area and the [ZPR, FPR] area. This
358 // avoids hazards between both GPRs and FPRs and ZPRs and PPRs.
361 : 0,
362 AFI->getStackSizePPR());
363}
364
365// Conservatively, returns true if the function is likely to have SVE vectors
366// on the stack. This function is safe to be called before callee-saves or
367// object offsets have been determined.
369 const MachineFunction &MF) {
370 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
371 if (AFI->isSVECC())
372 return true;
373
374 if (AFI->hasCalculatedStackSizeSVE())
375 return bool(AFL.getSVEStackSize(MF));
376
377 const MachineFrameInfo &MFI = MF.getFrameInfo();
378 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd(); FI++) {
379 if (MFI.hasScalableStackID(FI))
380 return true;
381 }
382
383 return false;
384}
385
386/// Returns true if homogeneous prolog or epilog code can be emitted
387/// for the size optimization. If possible, a frame helper call is injected.
388/// When an Exit block is given, this check is for the epilog.
389bool AArch64FrameLowering::homogeneousPrologEpilog(
390 MachineFunction &MF, MachineBasicBlock *Exit) const {
391 if (!MF.getFunction().hasMinSize())
392 return false;
394 return false;
395 if (EnableRedZone)
396 return false;
397
398 // TODO: Windows is not supported yet.
399 if (needsWinCFI(MF))
400 return false;
401
402 // TODO: SVE is not supported yet.
403 if (isLikelyToHaveSVEStack(*this, MF))
404 return false;
405
406 // Bail on stack adjustment needed on return for simplicity.
407 const MachineFrameInfo &MFI = MF.getFrameInfo();
408 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
409 if (MFI.hasVarSizedObjects() || RegInfo->hasStackRealignment(MF))
410 return false;
411 if (Exit && getArgumentStackToRestore(MF, *Exit))
412 return false;
413
414 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
416 return false;
417
418 // If there are an odd number of GPRs before LR and FP in the CSRs list,
419 // they will not be paired into one RegPairInfo, which is incompatible with
420 // the assumption made by the homogeneous prolog epilog pass.
421 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
422 unsigned NumGPRs = 0;
423 for (unsigned I = 0; CSRegs[I]; ++I) {
424 Register Reg = CSRegs[I];
425 if (Reg == AArch64::LR) {
426 assert(CSRegs[I + 1] == AArch64::FP);
427 if (NumGPRs % 2 != 0)
428 return false;
429 break;
430 }
431 if (AArch64::GPR64RegClass.contains(Reg))
432 ++NumGPRs;
433 }
434
435 return true;
436}
437
438/// Returns true if CSRs should be paired.
439bool AArch64FrameLowering::producePairRegisters(MachineFunction &MF) const {
440 return produceCompactUnwindFrame(*this, MF) || homogeneousPrologEpilog(MF);
441}
442
443/// This is the biggest offset to the stack pointer we can encode in aarch64
444/// instructions (without using a separate calculation and a temp register).
445/// Note that the exceptions here are vector stores/loads, which cannot encode any
446/// displacements (see estimateRSStackSizeLimit(), isAArch64FrameOffsetLegal()).
447static const unsigned DefaultSafeSPDisplacement = 255;
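// (255 is the largest positive offset encodable by the 9-bit signed "unscaled"
// forms, e.g. "ldur x0, [sp, #255]"; larger offsets need a scaled addressing
// mode or a scratch register.)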
448
449/// Look at each instruction that references stack frames and return the stack
450/// size limit beyond which some of these instructions will require a scratch
451/// register during their expansion later.
453 // FIXME: For now, just conservatively guesstimate based on unscaled indexing
454 // range. We'll end up allocating an unnecessary spill slot a lot, but
455 // realistically that's not a big deal at this stage of the game.
456 for (MachineBasicBlock &MBB : MF) {
457 for (MachineInstr &MI : MBB) {
458 if (MI.isDebugInstr() || MI.isPseudo() ||
459 MI.getOpcode() == AArch64::ADDXri ||
460 MI.getOpcode() == AArch64::ADDSXri)
461 continue;
462
463 for (const MachineOperand &MO : MI.operands()) {
464 if (!MO.isFI())
465 continue;
466
468 if (isAArch64FrameOffsetLegal(MI, Offset, nullptr, nullptr, nullptr) ==
470 return 0;
471 }
472 }
473 }
475}
476
481
482unsigned
483AArch64FrameLowering::getFixedObjectSize(const MachineFunction &MF,
484 const AArch64FunctionInfo *AFI,
485 bool IsWin64, bool IsFunclet) const {
486 assert(AFI->getTailCallReservedStack() % 16 == 0 &&
487 "Tail call reserved stack must be aligned to 16 bytes");
488 if (!IsWin64 || IsFunclet) {
489 return AFI->getTailCallReservedStack();
490 } else {
491 if (AFI->getTailCallReservedStack() != 0 &&
492 !MF.getFunction().getAttributes().hasAttrSomewhere(
493 Attribute::SwiftAsync))
494 report_fatal_error("cannot generate ABI-changing tail call for Win64");
495 unsigned FixedObjectSize = AFI->getTailCallReservedStack();
496
497 // Var args are stored here in the primary function.
498 FixedObjectSize += AFI->getVarArgsGPRSize();
499
500 if (MF.hasEHFunclets()) {
501 // Catch objects are stored here in the primary function.
502 const MachineFrameInfo &MFI = MF.getFrameInfo();
503 const WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
504 SmallSetVector<int, 8> CatchObjFrameIndices;
505 for (const WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
506 for (const WinEHHandlerType &H : TBME.HandlerArray) {
507 int FrameIndex = H.CatchObj.FrameIndex;
508 if ((FrameIndex != INT_MAX) &&
509 CatchObjFrameIndices.insert(FrameIndex)) {
510 FixedObjectSize = alignTo(FixedObjectSize,
511 MFI.getObjectAlign(FrameIndex).value()) +
512 MFI.getObjectSize(FrameIndex);
513 }
514 }
515 }
516 // To support EH funclets we allocate an UnwindHelp object
517 FixedObjectSize += 8;
518 }
519 return alignTo(FixedObjectSize, 16);
520 }
521}
522
524 if (!EnableRedZone)
525 return false;
526
527 // Don't use the red zone if the function explicitly asks us not to.
528 // This is typically used for kernel code.
529 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
530 const unsigned RedZoneSize =
532 if (!RedZoneSize)
533 return false;
534
535 const MachineFrameInfo &MFI = MF.getFrameInfo();
537 uint64_t NumBytes = AFI->getLocalStackSize();
538
539 // If neither NEON nor SVE is available, a COPY from one Q-reg to
540 // another requires a spill -> reload sequence. We can do that
541 // using a pre-decrementing store/post-decrementing load, but
542 // if we do so, we can't use the Red Zone.
543 bool LowerQRegCopyThroughMem = Subtarget.hasFPARMv8() &&
544 !Subtarget.isNeonAvailable() &&
545 !Subtarget.hasSVE();
546
547 return !(MFI.hasCalls() || hasFP(MF) || NumBytes > RedZoneSize ||
548 AFI->hasSVEStackSize() || LowerQRegCopyThroughMem);
549}
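
// As an illustration of what the red zone buys us (a sketch, not tied to a
// specific test): when canUseRedZone() returns true, a small leaf function can
// keep its locals below sp without ever adjusting it, e.g.
//
//   str w0, [sp, #-4]   // local lives in the red zone below sp
//   ldr w0, [sp, #-4]
//   ret
//
// rather than paying for a "sub sp, sp, #16" / "add sp, sp, #16" pair.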
550
551/// hasFPImpl - Return true if the specified function should have a dedicated
552/// frame pointer register.
554 const MachineFrameInfo &MFI = MF.getFrameInfo();
555 const TargetRegisterInfo *RegInfo = MF.getSubtarget().getRegisterInfo();
557
558 // Win64 EH requires a frame pointer if funclets are present, as the locals
559 // are accessed off the frame pointer in both the parent function and the
560 // funclets.
561 if (MF.hasEHFunclets())
562 return true;
563 // Retain behavior of always omitting the FP for leaf functions when possible.
565 return true;
566 if (MFI.hasVarSizedObjects() || MFI.isFrameAddressTaken() ||
567 MFI.hasStackMap() || MFI.hasPatchPoint() ||
568 RegInfo->hasStackRealignment(MF))
569 return true;
570
571 // If we:
572 //
573 // 1. Have streaming mode changes
574 // OR:
575 // 2. Have a streaming body with SVE stack objects
576 //
577 // Then the value of VG restored when unwinding to this function may not match
578 // the value of VG used to set up the stack.
579 //
580 // This is a problem as the CFA can be described with an expression of the
581 // form: CFA = SP + NumBytes + VG * NumScalableBytes.
582 //
583 // If the value of VG used in that expression does not match the value used to
584 // set up the stack, an incorrect address for the CFA will be computed, and
585 // unwinding will fail.
586 //
587 // We work around this issue by ensuring the frame-pointer can describe the
588 // CFA in either of these cases.
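  // As an illustrative example (with made-up sizes): for NumBytes = 16 and
  // NumScalableBytes = 32, a function entered with VG = 4 (a 256-bit vector
  // length) has CFA = SP + 16 + 4 * 32 = SP + 144; an unwinder that instead
  // observes VG = 2 (a 128-bit vector length) after a streaming-mode change
  // would compute SP + 16 + 2 * 32 = SP + 80. Describing the CFA via the
  // frame pointer side-steps this.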
589 if (AFI.needsDwarfUnwindInfo(MF) &&
592 return true;
593 // With large callframes around we may need to use FP to access the scavenging
594 // emergency spillslot.
595 //
596 // Unfortunately some calls to hasFP() like machine verifier ->
597 // getReservedReg() -> hasFP in the middle of global isel are too early
598 // to know the max call frame size. Hopefully conservatively returning "true"
599 // in those cases is fine.
600 // DefaultSafeSPDisplacement is fine as we only emergency spill GP regs.
601 if (!MFI.isMaxCallFrameSizeComputed() ||
603 return true;
604
605 return false;
606}
607
608/// Should the Frame Pointer be reserved for the current function?
610 const TargetMachine &TM = MF.getTarget();
611 const Triple &TT = TM.getTargetTriple();
612
613 // These OSes require that the frame chain is valid, even if the current frame does
614 // not use a frame pointer.
615 if (TT.isOSDarwin() || TT.isOSWindows())
616 return true;
617
618 // If the function has a frame pointer, it is reserved.
619 if (hasFP(MF))
620 return true;
621
622 // Frontend has requested to preserve the frame pointer.
623 if (TM.Options.FramePointerIsReserved(MF))
624 return true;
625
626 return false;
627}
628
629/// hasReservedCallFrame - Under normal circumstances, when a frame pointer is
630/// not required, we reserve argument space for call sites in the function
631/// immediately on entry to the current function. This eliminates the need for
632/// add/sub sp brackets around call sites. Returns true if the call frame is
633/// included as part of the stack frame.
635 const MachineFunction &MF) const {
636 // The stack probing code for the dynamically allocated outgoing arguments
637 // area assumes that the stack is probed at the top - either by the prologue
638 // code, which issues a probe if `hasVarSizedObjects` returns true, or by the
639 // most recent variable-sized object allocation. Changing the condition here
640 // may need to be followed up by changes to the probe issuing logic.
641 return !MF.getFrameInfo().hasVarSizedObjects();
642}
643
647
648 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
649 const AArch64InstrInfo *TII = Subtarget.getInstrInfo();
650 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
651 [[maybe_unused]] MachineFrameInfo &MFI = MF.getFrameInfo();
652 DebugLoc DL = I->getDebugLoc();
653 unsigned Opc = I->getOpcode();
654 bool IsDestroy = Opc == TII->getCallFrameDestroyOpcode();
655 uint64_t CalleePopAmount = IsDestroy ? I->getOperand(1).getImm() : 0;
656
657 if (!hasReservedCallFrame(MF)) {
658 int64_t Amount = I->getOperand(0).getImm();
659 Amount = alignTo(Amount, getStackAlign());
660 if (!IsDestroy)
661 Amount = -Amount;
662
663 // N.b. if CalleePopAmount is valid but zero (i.e. callee would pop, but it
664 // doesn't have to pop anything), then the first operand will be zero too so
665 // this adjustment is a no-op.
666 if (CalleePopAmount == 0) {
667 // FIXME: in-function stack adjustment for calls is limited to 24-bits
668 // because there's no guaranteed temporary register available.
669 //
670 // ADD/SUB (immediate) has only LSL #0 and LSL #12 available.
671 // 1) For offset <= 12-bit, we use LSL #0
672 // 2) For 12-bit <= offset <= 24-bit, we use two instructions. One uses
673 // LSL #0, and the other uses LSL #12.
674 //
675 // Most call frames will be allocated at the start of a function so
676 // this is OK, but it is a limitation that needs dealing with.
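  // As a concrete sketch of case 2): an adjustment of 0x12345 bytes can be
  // materialized as "sub sp, sp, #0x12, lsl #12" followed by
  // "sub sp, sp, #0x345" (0x12000 + 0x345 = 0x12345).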
677 assert(Amount > -0xffffff && Amount < 0xffffff && "call frame too large");
678
679 if (TLI->hasInlineStackProbe(MF) &&
681 // When stack probing is enabled, the decrement of SP may need to be
682 // probed. We only need to do this if the call site needs 1024 bytes of
683 // space or more, because a region smaller than that is allowed to be
684 // unprobed at an ABI boundary. We rely on the fact that SP has been
685 // probed exactly at this point, either by the prologue or most recent
686 // dynamic allocation.
688 "non-reserved call frame without var sized objects?");
689 Register ScratchReg =
690 MF.getRegInfo().createVirtualRegister(&AArch64::GPR64RegClass);
691 inlineStackProbeFixed(I, ScratchReg, -Amount, StackOffset::get(0, 0));
692 } else {
693 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
694 StackOffset::getFixed(Amount), TII);
695 }
696 }
697 } else if (CalleePopAmount != 0) {
698 // If the calling convention demands that the callee pops arguments from the
699 // stack, we want to add it back if we have a reserved call frame.
700 assert(CalleePopAmount < 0xffffff && "call frame too large");
701 emitFrameOffset(MBB, I, DL, AArch64::SP, AArch64::SP,
702 StackOffset::getFixed(-(int64_t)CalleePopAmount), TII);
703 }
704 return MBB.erase(I);
705}
706
708 MachineBasicBlock &MBB) const {
709
710 MachineFunction &MF = *MBB.getParent();
711 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
712 const auto &TRI = *Subtarget.getRegisterInfo();
713 const auto &MFI = *MF.getInfo<AArch64FunctionInfo>();
714
715 CFIInstBuilder CFIBuilder(MBB, MBB.begin(), MachineInstr::NoFlags);
716
717 // Reset the CFA to `SP + 0`.
718 CFIBuilder.buildDefCFA(AArch64::SP, 0);
719
720 // Flip the RA sign state.
721 if (MFI.shouldSignReturnAddress(MF))
722 MFI.branchProtectionPAuthLR() ? CFIBuilder.buildNegateRAStateWithPC()
723 : CFIBuilder.buildNegateRAState();
724
725 // Shadow call stack uses X18, reset it.
726 if (MFI.needsShadowCallStackPrologueEpilogue(MF))
727 CFIBuilder.buildSameValue(AArch64::X18);
728
729 // Emit .cfi_same_value for callee-saved registers.
730 const std::vector<CalleeSavedInfo> &CSI =
732 for (const auto &Info : CSI) {
733 MCRegister Reg = Info.getReg();
734 if (!TRI.regNeedsCFI(Reg, Reg))
735 continue;
736 CFIBuilder.buildSameValue(Reg);
737 }
738}
739
741 switch (Reg.id()) {
742 default:
743 // The called routine is expected to preserve x19-x28;
744 // x29 and x30 are used as the frame pointer and link register, respectively.
745 return 0;
746
747 // GPRs
748#define CASE(n) \
749 case AArch64::W##n: \
750 case AArch64::X##n: \
751 return AArch64::X##n
752 CASE(0);
753 CASE(1);
754 CASE(2);
755 CASE(3);
756 CASE(4);
757 CASE(5);
758 CASE(6);
759 CASE(7);
760 CASE(8);
761 CASE(9);
762 CASE(10);
763 CASE(11);
764 CASE(12);
765 CASE(13);
766 CASE(14);
767 CASE(15);
768 CASE(16);
769 CASE(17);
770 CASE(18);
771#undef CASE
772
773 // FPRs
774#define CASE(n) \
775 case AArch64::B##n: \
776 case AArch64::H##n: \
777 case AArch64::S##n: \
778 case AArch64::D##n: \
779 case AArch64::Q##n: \
780 return HasSVE ? AArch64::Z##n : AArch64::Q##n
781 CASE(0);
782 CASE(1);
783 CASE(2);
784 CASE(3);
785 CASE(4);
786 CASE(5);
787 CASE(6);
788 CASE(7);
789 CASE(8);
790 CASE(9);
791 CASE(10);
792 CASE(11);
793 CASE(12);
794 CASE(13);
795 CASE(14);
796 CASE(15);
797 CASE(16);
798 CASE(17);
799 CASE(18);
800 CASE(19);
801 CASE(20);
802 CASE(21);
803 CASE(22);
804 CASE(23);
805 CASE(24);
806 CASE(25);
807 CASE(26);
808 CASE(27);
809 CASE(28);
810 CASE(29);
811 CASE(30);
812 CASE(31);
813#undef CASE
814 }
815}
816
817void AArch64FrameLowering::emitZeroCallUsedRegs(BitVector RegsToZero,
818 MachineBasicBlock &MBB) const {
819 // Insertion point.
821
822 // Fake a debug loc.
823 DebugLoc DL;
824 if (MBBI != MBB.end())
825 DL = MBBI->getDebugLoc();
826
827 const MachineFunction &MF = *MBB.getParent();
828 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
829 const AArch64RegisterInfo &TRI = *STI.getRegisterInfo();
830
831 BitVector GPRsToZero(TRI.getNumRegs());
832 BitVector FPRsToZero(TRI.getNumRegs());
833 bool HasSVE = STI.isSVEorStreamingSVEAvailable();
834 for (MCRegister Reg : RegsToZero.set_bits()) {
835 if (TRI.isGeneralPurposeRegister(MF, Reg)) {
836 // For GPRs, we only care to clear out the 64-bit register.
837 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
838 GPRsToZero.set(XReg);
839 } else if (AArch64InstrInfo::isFpOrNEON(Reg)) {
840 // For FPRs, clear the full Q-register (or Z-register when SVE is available).
841 if (MCRegister XReg = getRegisterOrZero(Reg, HasSVE))
842 FPRsToZero.set(XReg);
843 }
844 }
845
846 const AArch64InstrInfo &TII = *STI.getInstrInfo();
847
848 // Zero out GPRs.
849 for (MCRegister Reg : GPRsToZero.set_bits())
850 TII.buildClearRegister(Reg, MBB, MBBI, DL);
851
852 // Zero out FP/vector registers.
853 for (MCRegister Reg : FPRsToZero.set_bits())
854 TII.buildClearRegister(Reg, MBB, MBBI, DL);
855
856 if (HasSVE) {
857 for (MCRegister PReg :
858 {AArch64::P0, AArch64::P1, AArch64::P2, AArch64::P3, AArch64::P4,
859 AArch64::P5, AArch64::P6, AArch64::P7, AArch64::P8, AArch64::P9,
860 AArch64::P10, AArch64::P11, AArch64::P12, AArch64::P13, AArch64::P14,
861 AArch64::P15}) {
862 if (RegsToZero[PReg])
863 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PFALSE), PReg);
864 }
865 }
866}
867
868bool AArch64FrameLowering::windowsRequiresStackProbe(
869 const MachineFunction &MF, uint64_t StackSizeInBytes) const {
870 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
871 const AArch64FunctionInfo &MFI = *MF.getInfo<AArch64FunctionInfo>();
872 // TODO: When implementing stack protectors, take that into account
873 // for the probe threshold.
874 return Subtarget.isTargetWindows() && MFI.hasStackProbing() &&
875 StackSizeInBytes >= uint64_t(MFI.getStackProbeSize());
876}
877
879 const MachineBasicBlock &MBB) {
880 const MachineFunction *MF = MBB.getParent();
881 LiveRegs.addLiveIns(MBB);
882 // Mark callee saved registers as used so we will not choose them.
883 const MCPhysReg *CSRegs = MF->getRegInfo().getCalleeSavedRegs();
884 for (unsigned i = 0; CSRegs[i]; ++i)
885 LiveRegs.addReg(CSRegs[i]);
886}
887
889AArch64FrameLowering::findScratchNonCalleeSaveRegister(MachineBasicBlock *MBB,
890 bool HasCall) const {
891 MachineFunction *MF = MBB->getParent();
892
893 // If MBB is an entry block, use X9 as the scratch register.
894 // However, preserve_none functions may be using X9 to pass arguments,
895 // so in that case prefer to pick an available register below.
896 if (&MF->front() == MBB &&
898 return AArch64::X9;
899
900 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
901 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
902 LivePhysRegs LiveRegs(TRI);
903 getLiveRegsForEntryMBB(LiveRegs, *MBB);
904 if (HasCall) {
905 LiveRegs.addReg(AArch64::X16);
906 LiveRegs.addReg(AArch64::X17);
907 LiveRegs.addReg(AArch64::X18);
908 }
909
910 // Prefer X9 since it was historically used for the prologue scratch reg.
911 const MachineRegisterInfo &MRI = MF->getRegInfo();
912 if (LiveRegs.available(MRI, AArch64::X9))
913 return AArch64::X9;
914
915 for (unsigned Reg : AArch64::GPR64RegClass) {
916 if (LiveRegs.available(MRI, Reg))
917 return Reg;
918 }
919 return AArch64::NoRegister;
920}
921
923 const MachineBasicBlock &MBB) const {
924 const MachineFunction *MF = MBB.getParent();
925 MachineBasicBlock *TmpMBB = const_cast<MachineBasicBlock *>(&MBB);
926 const AArch64Subtarget &Subtarget = MF->getSubtarget<AArch64Subtarget>();
927 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
928 const AArch64TargetLowering *TLI = Subtarget.getTargetLowering();
930
931 if (AFI->hasSwiftAsyncContext()) {
932 const AArch64RegisterInfo &TRI = *Subtarget.getRegisterInfo();
933 const MachineRegisterInfo &MRI = MF->getRegInfo();
936 // The StoreSwiftAsyncContext clobbers X16 and X17. Make sure they are
937 // available.
938 if (!LiveRegs.available(MRI, AArch64::X16) ||
939 !LiveRegs.available(MRI, AArch64::X17))
940 return false;
941 }
942
943 // Certain stack probing sequences might clobber flags; in that case we can't
944 // use the block as a prologue if the flags register is a live-in.
946 MBB.isLiveIn(AArch64::NZCV))
947 return false;
948
949 if (RegInfo->hasStackRealignment(*MF) || TLI->hasInlineStackProbe(*MF))
950 if (findScratchNonCalleeSaveRegister(TmpMBB) == AArch64::NoRegister)
951 return false;
952
953 // May need a scratch register (for the return value) if we require making a
954 // special call.
955 if (requiresSaveVG(*MF) ||
956 windowsRequiresStackProbe(*MF, std::numeric_limits<uint64_t>::max()))
957 if (findScratchNonCalleeSaveRegister(TmpMBB, true) == AArch64::NoRegister)
958 return false;
959
960 return true;
961}
962
964 const Function &F = MF.getFunction();
965 return MF.getTarget().getMCAsmInfo()->usesWindowsCFI() &&
966 F.needsUnwindTableEntry();
967}
968
969bool AArch64FrameLowering::shouldSignReturnAddressEverywhere(
970 const MachineFunction &MF) const {
971 // FIXME: With WinCFI, extra care should be taken to place SEH_PACSignLR
972 // and SEH_EpilogEnd instructions in the correct order.
974 return false;
976 bool SignReturnAddressAll = AFI->shouldSignReturnAddress(/*SpillsLR=*/false);
977 return SignReturnAddressAll;
978}
979
980// Given a load or a store instruction, generate the appropriate SEH unwind
981// code for it on Windows.
983AArch64FrameLowering::insertSEH(MachineBasicBlock::iterator MBBI,
984 const TargetInstrInfo &TII,
985 MachineInstr::MIFlag Flag) const {
986 unsigned Opc = MBBI->getOpcode();
987 MachineBasicBlock *MBB = MBBI->getParent();
988 MachineFunction &MF = *MBB->getParent();
989 DebugLoc DL = MBBI->getDebugLoc();
990 unsigned ImmIdx = MBBI->getNumOperands() - 1;
991 int Imm = MBBI->getOperand(ImmIdx).getImm();
993 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
994 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
995
996 switch (Opc) {
997 default:
998 report_fatal_error("No SEH Opcode for this instruction");
999 case AArch64::STR_ZXI:
1000 case AArch64::LDR_ZXI: {
1001 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1002 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveZReg))
1003 .addImm(Reg0)
1004 .addImm(Imm)
1005 .setMIFlag(Flag);
1006 break;
1007 }
1008 case AArch64::STR_PXI:
1009 case AArch64::LDR_PXI: {
1010 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1011 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SavePReg))
1012 .addImm(Reg0)
1013 .addImm(Imm)
1014 .setMIFlag(Flag);
1015 break;
1016 }
1017 case AArch64::LDPDpost:
1018 Imm = -Imm;
1019 [[fallthrough]];
1020 case AArch64::STPDpre: {
1021 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1022 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1023 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP_X))
1024 .addImm(Reg0)
1025 .addImm(Reg1)
1026 .addImm(Imm * 8)
1027 .setMIFlag(Flag);
1028 break;
1029 }
1030 case AArch64::LDPXpost:
1031 Imm = -Imm;
1032 [[fallthrough]];
1033 case AArch64::STPXpre: {
1034 Register Reg0 = MBBI->getOperand(1).getReg();
1035 Register Reg1 = MBBI->getOperand(2).getReg();
1036 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1037 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR_X))
1038 .addImm(Imm * 8)
1039 .setMIFlag(Flag);
1040 else
1041 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP_X))
1042 .addImm(RegInfo->getSEHRegNum(Reg0))
1043 .addImm(RegInfo->getSEHRegNum(Reg1))
1044 .addImm(Imm * 8)
1045 .setMIFlag(Flag);
1046 break;
1047 }
1048 case AArch64::LDRDpost:
1049 Imm = -Imm;
1050 [[fallthrough]];
1051 case AArch64::STRDpre: {
1052 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1053 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg_X))
1054 .addImm(Reg)
1055 .addImm(Imm)
1056 .setMIFlag(Flag);
1057 break;
1058 }
1059 case AArch64::LDRXpost:
1060 Imm = -Imm;
1061 [[fallthrough]];
1062 case AArch64::STRXpre: {
1063 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1064 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg_X))
1065 .addImm(Reg)
1066 .addImm(Imm)
1067 .setMIFlag(Flag);
1068 break;
1069 }
1070 case AArch64::STPDi:
1071 case AArch64::LDPDi: {
1072 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1073 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1074 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFRegP))
1075 .addImm(Reg0)
1076 .addImm(Reg1)
1077 .addImm(Imm * 8)
1078 .setMIFlag(Flag);
1079 break;
1080 }
1081 case AArch64::STPXi:
1082 case AArch64::LDPXi: {
1083 Register Reg0 = MBBI->getOperand(0).getReg();
1084 Register Reg1 = MBBI->getOperand(1).getReg();
1085
1086 int SEHReg0 = RegInfo->getSEHRegNum(Reg0);
1087 int SEHReg1 = RegInfo->getSEHRegNum(Reg1);
1088
1089 if (Reg0 == AArch64::FP && Reg1 == AArch64::LR)
1090 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFPLR))
1091 .addImm(Imm * 8)
1092 .setMIFlag(Flag);
1093 else if (SEHReg0 >= 19 && SEHReg1 >= 19)
1094 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveRegP))
1095 .addImm(SEHReg0)
1096 .addImm(SEHReg1)
1097 .addImm(Imm * 8)
1098 .setMIFlag(Flag);
1099 else
1100 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegIP))
1101 .addImm(SEHReg0)
1102 .addImm(SEHReg1)
1103 .addImm(Imm * 8)
1104 .setMIFlag(Flag);
1105 break;
1106 }
1107 case AArch64::STRXui:
1108 case AArch64::LDRXui: {
1109 int Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1110 if (Reg >= 19)
1111 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveReg))
1112 .addImm(Reg)
1113 .addImm(Imm * 8)
1114 .setMIFlag(Flag);
1115 else
1116 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegI))
1117 .addImm(Reg)
1118 .addImm(Imm * 8)
1119 .setMIFlag(Flag);
1120 break;
1121 }
1122 case AArch64::STRDui:
1123 case AArch64::LDRDui: {
1124 unsigned Reg = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1125 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveFReg))
1126 .addImm(Reg)
1127 .addImm(Imm * 8)
1128 .setMIFlag(Flag);
1129 break;
1130 }
1131 case AArch64::STPQi:
1132 case AArch64::LDPQi: {
1133 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(0).getReg());
1134 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1135 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQP))
1136 .addImm(Reg0)
1137 .addImm(Reg1)
1138 .addImm(Imm * 16)
1139 .setMIFlag(Flag);
1140 break;
1141 }
1142 case AArch64::LDPQpost:
1143 Imm = -Imm;
1144 [[fallthrough]];
1145 case AArch64::STPQpre: {
1146 unsigned Reg0 = RegInfo->getSEHRegNum(MBBI->getOperand(1).getReg());
1147 unsigned Reg1 = RegInfo->getSEHRegNum(MBBI->getOperand(2).getReg());
1148 MIB = BuildMI(MF, DL, TII.get(AArch64::SEH_SaveAnyRegQPX))
1149 .addImm(Reg0)
1150 .addImm(Reg1)
1151 .addImm(Imm * 16)
1152 .setMIFlag(Flag);
1153 break;
1154 }
1155 }
1156 auto I = MBB->insertAfter(MBBI, MIB);
1157 return I;
1158}
1159
1162 if (!AFI->needsDwarfUnwindInfo(MF) || !AFI->hasStreamingModeChanges())
1163 return false;
1164 // For Darwin platforms we don't save VG for non-SVE functions, even if SME
1165 // is enabled with streaming mode changes.
1166 auto &ST = MF.getSubtarget<AArch64Subtarget>();
1167 if (ST.isTargetDarwin())
1168 return ST.hasSVE();
1169 return true;
1170}
1171
1172static bool isTargetWindows(const MachineFunction &MF) {
1174}
1175
1177 MachineFunction &MF) const {
1178 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1179 const TargetInstrInfo *TII = Subtarget.getInstrInfo();
1180
1181 auto EmitSignRA = [&](MachineBasicBlock &MBB) {
1182 DebugLoc DL; // Set debug location to unknown.
1184
1185 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_PROLOGUE))
1187 };
1188
1189 auto EmitAuthRA = [&](MachineBasicBlock &MBB) {
1190 DebugLoc DL;
1191 MachineBasicBlock::iterator MBBI = MBB.getFirstTerminator();
1192 if (MBBI != MBB.end())
1193 DL = MBBI->getDebugLoc();
1194
1195 BuildMI(MBB, MBBI, DL, TII->get(AArch64::PAUTH_EPILOGUE))
1197 };
1198
1199 // This should be in sync with PEIImpl::calculateSaveRestoreBlocks.
1200 EmitSignRA(MF.front());
1201 for (MachineBasicBlock &MBB : MF) {
1202 if (MBB.isEHFuncletEntry())
1203 EmitSignRA(MBB);
1204 if (MBB.isReturnBlock())
1205 EmitAuthRA(MBB);
1206 }
1207}
1208
1210 MachineBasicBlock &MBB) const {
1211 AArch64PrologueEmitter PrologueEmitter(MF, MBB, *this);
1212 PrologueEmitter.emitPrologue();
1213}
1214
1216 MachineBasicBlock &MBB) const {
1217 AArch64EpilogueEmitter EpilogueEmitter(MF, MBB, *this);
1218 EpilogueEmitter.emitEpilogue();
1219}
1220
1223 MF.getInfo<AArch64FunctionInfo>()->needsDwarfUnwindInfo(MF);
1224}
1225
1227 return enableCFIFixup(MF) &&
1228 MF.getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(MF);
1229}
1230
1231/// getFrameIndexReference - Provide a base+offset reference to an FI slot for
1232/// debug info. It's the same as what we use for resolving the code-gen
1233/// references for now. FIXME: This can go wrong when references are
1234/// SP-relative and simple call frames aren't used.
1237 Register &FrameReg) const {
1239 MF, FI, FrameReg,
1240 /*PreferFP=*/
1241 MF.getFunction().hasFnAttribute(Attribute::SanitizeHWAddress) ||
1242 MF.getFunction().hasFnAttribute(Attribute::SanitizeMemTag),
1243 /*ForSimm=*/false);
1244}
1245
1248 int FI) const {
1249 // This function serves to provide a comparable offset from a single reference
1250 // point (the value of SP at function entry) that can be used for analysis,
1251 // e.g. the stack-frame-layout analysis pass. It is not guaranteed to be
1252 // correct for all objects in the presence of VLA-area objects or dynamic
1253 // stack re-alignment.
1254
1255 const auto &MFI = MF.getFrameInfo();
1256
1257 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1258 StackOffset ZPRStackSize = getZPRStackSize(MF);
1259 StackOffset PPRStackSize = getPPRStackSize(MF);
1260 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1261
1262 // For VLA-area objects, just emit an offset at the end of the stack frame.
1263 // Whilst not quite correct, these objects do live at the end of the frame and
1264 // so it is more useful for analysis if the offset reflects this.
1265 if (MFI.isVariableSizedObjectIndex(FI)) {
1266 return StackOffset::getFixed(-((int64_t)MFI.getStackSize())) - SVEStackSize;
1267 }
1268
1269 // This is correct in the absence of any SVE stack objects.
1270 if (!SVEStackSize)
1271 return StackOffset::getFixed(ObjectOffset - getOffsetOfLocalArea());
1272
1273 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1274 bool FPAfterSVECalleeSaves =
1276 if (MFI.hasScalableStackID(FI)) {
1277 if (FPAfterSVECalleeSaves &&
1278 -ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1279 assert(!AFI->hasSplitSVEObjects() &&
1280 "split-sve-objects not supported with FPAfterSVECalleeSaves");
1281 return StackOffset::getScalable(ObjectOffset);
1282 }
1283 StackOffset AccessOffset{};
1284 // With split SVE objects, the scalable vectors are below (at a lower address
1285 // than) the scalable predicates, so we must subtract the size of the predicates.
1286 if (AFI->hasSplitSVEObjects() &&
1287 MFI.getStackID(FI) == TargetStackID::ScalableVector)
1288 AccessOffset = -PPRStackSize;
1289 return AccessOffset +
1290 StackOffset::get(-((int64_t)AFI->getCalleeSavedStackSize()),
1291 ObjectOffset);
1292 }
1293
1294 bool IsFixed = MFI.isFixedObjectIndex(FI);
1295 bool IsCSR =
1296 !IsFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1297
1298 StackOffset ScalableOffset = {};
1299 if (!IsFixed && !IsCSR) {
1300 ScalableOffset = -SVEStackSize;
1301 } else if (FPAfterSVECalleeSaves && IsCSR) {
1302 ScalableOffset =
1304 }
1305
1306 return StackOffset::getFixed(ObjectOffset) + ScalableOffset;
1307}
1308
1314
1315StackOffset AArch64FrameLowering::getFPOffset(const MachineFunction &MF,
1316 int64_t ObjectOffset) const {
1317 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1318 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1319 const Function &F = MF.getFunction();
1320 bool IsWin64 = Subtarget.isCallingConvWin64(F.getCallingConv(), F.isVarArg());
1321 unsigned FixedObject =
1322 getFixedObjectSize(MF, AFI, IsWin64, /*IsFunclet=*/false);
1323 int64_t CalleeSaveSize = AFI->getCalleeSavedStackSize(MF.getFrameInfo());
1324 int64_t FPAdjust =
1325 CalleeSaveSize - AFI->getCalleeSaveBaseToFrameRecordOffset();
1326 return StackOffset::getFixed(ObjectOffset + FixedObject + FPAdjust);
1327}
1328
1329StackOffset AArch64FrameLowering::getStackOffset(const MachineFunction &MF,
1330 int64_t ObjectOffset) const {
1331 const auto &MFI = MF.getFrameInfo();
1332 return StackOffset::getFixed(ObjectOffset + (int64_t)MFI.getStackSize());
1333}
1334
1335// TODO: This function currently does not work for scalable vectors.
1337 int FI) const {
1338 const AArch64RegisterInfo *RegInfo =
1339 MF.getSubtarget<AArch64Subtarget>().getRegisterInfo();
1340 int ObjectOffset = MF.getFrameInfo().getObjectOffset(FI);
1341 return RegInfo->getLocalAddressRegister(MF) == AArch64::FP
1342 ? getFPOffset(MF, ObjectOffset).getFixed()
1343 : getStackOffset(MF, ObjectOffset).getFixed();
1344}
1345
1347 const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP,
1348 bool ForSimm) const {
1349 const auto &MFI = MF.getFrameInfo();
1350 int64_t ObjectOffset = MFI.getObjectOffset(FI);
1351 bool isFixed = MFI.isFixedObjectIndex(FI);
1352 auto StackID = static_cast<TargetStackID::Value>(MFI.getStackID(FI));
1353 return resolveFrameOffsetReference(MF, ObjectOffset, isFixed, StackID,
1354 FrameReg, PreferFP, ForSimm);
1355}
1356
1358 const MachineFunction &MF, int64_t ObjectOffset, bool isFixed,
1359 TargetStackID::Value StackID, Register &FrameReg, bool PreferFP,
1360 bool ForSimm) const {
1361 const auto &MFI = MF.getFrameInfo();
1362 const auto &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1363 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
1364 const auto *AFI = MF.getInfo<AArch64FunctionInfo>();
1365
1366 int64_t FPOffset = getFPOffset(MF, ObjectOffset).getFixed();
1367 int64_t Offset = getStackOffset(MF, ObjectOffset).getFixed();
1368 bool isCSR =
1369 !isFixed && ObjectOffset >= -((int)AFI->getCalleeSavedStackSize(MFI));
1370 bool isSVE = MFI.isScalableStackID(StackID);
1371
1372 StackOffset ZPRStackSize = getZPRStackSize(MF);
1373 StackOffset PPRStackSize = getPPRStackSize(MF);
1374 StackOffset SVEStackSize = ZPRStackSize + PPRStackSize;
1375
1376 // Use frame pointer to reference fixed objects. Use it for locals if
1377 // there are VLAs or a dynamically realigned SP (and thus the SP isn't
1378 // reliable as a base). Make sure useFPForScavengingIndex() does the
1379 // right thing for the emergency spill slot.
1380 bool UseFP = false;
1381 if (AFI->hasStackFrame() && !isSVE) {
1382 // We shouldn't prefer using the FP to access fixed-sized stack objects when
1383 // there are scalable (SVE) objects in between the FP and the fixed-sized
1384 // objects.
1385 PreferFP &= !SVEStackSize;
1386
1387 // Note: Keeping the following as multiple 'if' statements rather than
1388 // merging to a single expression for readability.
1389 //
1390 // Argument access should always use the FP.
1391 if (isFixed) {
1392 UseFP = hasFP(MF);
1393 } else if (isCSR && RegInfo->hasStackRealignment(MF)) {
1394 // References to the CSR area must use FP if we're re-aligning the stack
1395 // since the dynamically-sized alignment padding is between the SP/BP and
1396 // the CSR area.
1397 assert(hasFP(MF) && "Re-aligned stack must have frame pointer");
1398 UseFP = true;
1399 } else if (hasFP(MF) && !RegInfo->hasStackRealignment(MF)) {
1400 // If the FPOffset is negative and we're producing a signed immediate, we
1401 // have to keep in mind that the available offset range for negative
1402 // offsets is smaller than for positive ones. If an offset is available
1403 // via the FP and the SP, use whichever is closest.
1404 bool FPOffsetFits = !ForSimm || FPOffset >= -256;
1405 PreferFP |= Offset > -FPOffset && !SVEStackSize;
1406
1407 if (FPOffset >= 0) {
1408 // If the FPOffset is positive, that'll always be best, as the SP/BP
1409 // will be even further away.
1410 UseFP = true;
1411 } else if (MFI.hasVarSizedObjects()) {
1412 // If we have variable sized objects, we can use either FP or BP, as the
1413 // SP offset is unknown. We can use the base pointer if we have one and
1414 // FP is not preferred. If not, we're stuck with using FP.
1415 bool CanUseBP = RegInfo->hasBasePointer(MF);
1416 if (FPOffsetFits && CanUseBP) // Both are ok. Pick the best.
1417 UseFP = PreferFP;
1418 else if (!CanUseBP) // Can't use BP. Forced to use FP.
1419 UseFP = true;
1420 // else we can use BP and FP, but the offset from FP won't fit.
1421 // That will make us scavenge registers which we can probably avoid by
1422 // using BP. If it won't fit for BP either, we'll scavenge anyway.
1423 } else if (MF.hasEHFunclets() && !RegInfo->hasBasePointer(MF)) {
1424 // Funclets access the locals contained in the parent's stack frame
1425 // via the frame pointer, so we have to use the FP in the parent
1426 // function.
1427 (void) Subtarget;
1428 assert(Subtarget.isCallingConvWin64(MF.getFunction().getCallingConv(),
1429 MF.getFunction().isVarArg()) &&
1430 "Funclets should only be present on Win64");
1431 UseFP = true;
1432 } else {
1433 // We have the choice between FP and (SP or BP).
1434 if (FPOffsetFits && PreferFP) // If FP is the best fit, use it.
1435 UseFP = true;
1436 }
1437 }
1438 }
1439
1440 assert(
1441 ((isFixed || isCSR) || !RegInfo->hasStackRealignment(MF) || !UseFP) &&
1442 "In the presence of dynamic stack pointer realignment, "
1443 "non-argument/CSR objects cannot be accessed through the frame pointer");
1444
1445 bool FPAfterSVECalleeSaves =
1447
1448 if (isSVE) {
1449 StackOffset FPOffset = StackOffset::get(
1450 -AFI->getCalleeSaveBaseToFrameRecordOffset(), ObjectOffset);
1451 StackOffset SPOffset =
1452 SVEStackSize +
1453 StackOffset::get(MFI.getStackSize() - AFI->getCalleeSavedStackSize(),
1454 ObjectOffset);
1455
1456 // With split SVE objects the ObjectOffset is relative to the split area
1457 // (i.e. the PPR area or ZPR area respectively).
1458 if (AFI->hasSplitSVEObjects() && StackID == TargetStackID::ScalableVector) {
1459 // If we're accessing an SVE vector with split SVE objects...
1460 // - From the FP we need to move down past the PPR area:
1461 FPOffset -= PPRStackSize;
1462 // - From the SP we only need to move up to the ZPR area:
1463 SPOffset -= PPRStackSize;
1464 // Note: `SPOffset = SVEStackSize + ...`, so `-= PPRStackSize` results in
1465 // `SPOffset = ZPRStackSize + ...`.
1466 }
1467
1468 if (FPAfterSVECalleeSaves) {
1470 if (-ObjectOffset <= (int64_t)AFI->getSVECalleeSavedStackSize()) {
1473 }
1474 }
1475
1476 // Always use the FP for SVE spills if available and beneficial.
1477 if (hasFP(MF) && (SPOffset.getFixed() ||
1478 FPOffset.getScalable() < SPOffset.getScalable() ||
1479 RegInfo->hasStackRealignment(MF))) {
1480 FrameReg = RegInfo->getFrameRegister(MF);
1481 return FPOffset;
1482 }
1483 FrameReg = RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister()
1484 : MCRegister(AArch64::SP);
1485
1486 return SPOffset;
1487 }
1488
1489 StackOffset SVEAreaOffset = {};
1490 if (FPAfterSVECalleeSaves) {
1491 // In this stack layout, the FP is in between the callee saves and other
1492 // SVE allocations.
1493 StackOffset SVECalleeSavedStack =
1495 if (UseFP) {
1496 if (isFixed)
1497 SVEAreaOffset = SVECalleeSavedStack;
1498 else if (!isCSR)
1499 SVEAreaOffset = SVECalleeSavedStack - SVEStackSize;
1500 } else {
1501 if (isFixed)
1502 SVEAreaOffset = SVEStackSize;
1503 else if (isCSR)
1504 SVEAreaOffset = SVEStackSize - SVECalleeSavedStack;
1505 }
1506 } else {
1507 if (UseFP && !(isFixed || isCSR))
1508 SVEAreaOffset = -SVEStackSize;
1509 if (!UseFP && (isFixed || isCSR))
1510 SVEAreaOffset = SVEStackSize;
1511 }
1512
1513 if (UseFP) {
1514 FrameReg = RegInfo->getFrameRegister(MF);
1515 return StackOffset::getFixed(FPOffset) + SVEAreaOffset;
1516 }
1517
1518 // Use the base pointer if we have one.
1519 if (RegInfo->hasBasePointer(MF))
1520 FrameReg = RegInfo->getBaseRegister();
1521 else {
1522 assert(!MFI.hasVarSizedObjects() &&
1523 "Can't use SP when we have var sized objects.");
1524 FrameReg = AArch64::SP;
1525 // If we're using the red zone for this function, the SP won't actually
1526 // be adjusted, so the offsets will be negative. They're also all
1527 // within range of the signed 9-bit immediate instructions.
1528 if (canUseRedZone(MF))
1529 Offset -= AFI->getLocalStackSize();
1530 }
1531
1532 return StackOffset::getFixed(Offset) + SVEAreaOffset;
1533}
1534
1535static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg) {
1536 // Do not set a kill flag on values that are also marked as live-in. This
1537 // happens with the @llvm.returnaddress intrinsic and with arguments passed in
1538 // callee saved registers.
1539 // Omitting the kill flags is conservatively correct even if the live-in
1540 // is not used after all.
1541 bool IsLiveIn = MF.getRegInfo().isLiveIn(Reg);
1542 return getKillRegState(!IsLiveIn);
1543}
1544
1546 MachineFunction &MF) {
1547 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
1548 AttributeList Attrs = MF.getFunction().getAttributes();
1550 return Subtarget.isTargetMachO() &&
1551 !(Subtarget.getTargetLowering()->supportSwiftError() &&
1552 Attrs.hasAttrSomewhere(Attribute::SwiftError)) &&
1554 !AFL.requiresSaveVG(MF) && !AFI->isSVECC();
1555}
1556
1557static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2,
1558 bool NeedsWinCFI, bool IsFirst,
1559 const TargetRegisterInfo *TRI) {
1560 // If we are generating register pairs for a Windows function that requires
1561 // EH support, then pair consecutive registers only. There are no unwind
1562 // opcodes for saves/restores of non-consecutive register pairs.
1563 // The unwind opcodes are save_regp, save_regp_x, save_fregp, save_fregp_x,
1564 // save_lrpair.
1565 // https://docs.microsoft.com/en-us/cpp/build/arm64-exception-handling
1566
1567 if (Reg2 == AArch64::FP)
1568 return true;
1569 if (!NeedsWinCFI)
1570 return false;
1571 if (TRI->getEncodingValue(Reg2) == TRI->getEncodingValue(Reg1) + 1)
1572 return false;
1573 // If pairing a GPR with LR, the pair can be described by the save_lrpair
1574 // opcode. If this is the first register pair, it would end up with a
1575 // predecrement, but there's no save_lrpair_x opcode, so we can only do this
1576 // if LR is paired with something other than the first register.
1577 // The save_lrpair opcode requires the first register to be an odd one.
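  // For example, a body pairing such as "stp x21, lr, [sp, #16]" can be
  // described by save_lrpair (x21 is x19 + 2*1), whereas pairing x22 with LR,
  // or using LR in the very first (pre-decrementing) pair, cannot.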
1578 if (Reg1 >= AArch64::X19 && Reg1 <= AArch64::X27 &&
1579 (Reg1 - AArch64::X19) % 2 == 0 && Reg2 == AArch64::LR && !IsFirst)
1580 return false;
1581 return true;
1582}
1583
1584/// Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
1585/// WindowsCFI requires that only consecutive registers can be paired.
1586/// LR and FP need to be allocated together when the frame needs to save
1587/// the frame-record. This means any other register pairing with LR is invalid.
1588static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2,
1589 bool UsesWinAAPCS, bool NeedsWinCFI,
1590 bool NeedsFrameRecord, bool IsFirst,
1591 const TargetRegisterInfo *TRI) {
1592 if (UsesWinAAPCS)
1593 return invalidateWindowsRegisterPairing(Reg1, Reg2, NeedsWinCFI, IsFirst,
1594 TRI);
1595
1596 // If we need to store the frame record, don't pair any register
1597 // with LR other than FP.
1598 if (NeedsFrameRecord)
1599 return Reg2 == AArch64::LR;
1600
1601 return false;
1602}
1603
1604namespace {
1605
1606struct RegPairInfo {
1607 Register Reg1;
1608 Register Reg2;
1609 int FrameIdx;
1610 int Offset;
1611 enum RegType { GPR, FPR64, FPR128, PPR, ZPR, VG } Type;
1612 const TargetRegisterClass *RC;
1613
1614 RegPairInfo() = default;
1615
1616 bool isPaired() const { return Reg2.isValid(); }
1617
1618 bool isScalable() const { return Type == PPR || Type == ZPR; }
1619};
1620
1621} // end anonymous namespace
1622
1624 for (unsigned PReg = AArch64::P8; PReg <= AArch64::P15; ++PReg) {
1625 if (SavedRegs.test(PReg)) {
1626 unsigned PNReg = PReg - AArch64::P0 + AArch64::PN0;
1627 return MCRegister(PNReg);
1628 }
1629 }
1630 return MCRegister();
1631}
1632
1633// The multivector LD/ST instructions are available only for SME or SVE2p1 targets.
1635 MachineFunction &MF) {
1637 return false;
1638
1639 SMEAttrs FuncAttrs = MF.getInfo<AArch64FunctionInfo>()->getSMEFnAttrs();
1640 bool IsLocallyStreaming =
1641 FuncAttrs.hasStreamingBody() && !FuncAttrs.hasStreamingInterface();
1642
1643 // SME2 instructions can only be used safely while in streaming mode.
1644 // It is not safe to use them in streaming-compatible or locally
1645 // streaming functions.
1646 return Subtarget.hasSVE2p1() ||
1647 (Subtarget.hasSME2() &&
1648 (!IsLocallyStreaming && Subtarget.isStreaming()));
1649}
1650
1651static void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL,
1652 MachineFunction &MF,
1653 ArrayRef<CalleeSavedInfo> CSI,
1654 const TargetRegisterInfo *TRI,
1655 SmallVectorImpl<RegPairInfo> &RegPairs,
1656 bool NeedsFrameRecord) {
1657
1658 if (CSI.empty())
1659 return;
1660
1661 bool IsWindows = isTargetWindows(MF);
1662 bool NeedsWinCFI = AFL.needsWinCFI(MF);
1663 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
1664 unsigned StackHazardSize = getStackHazardSize(MF);
1665 MachineFrameInfo &MFI = MF.getFrameInfo();
1666 CallingConv::ID CC = MF.getFunction().getCallingConv();
1667 unsigned Count = CSI.size();
1668 (void)CC;
1669 // MachO's compact unwind format relies on all registers being stored in
1670 // pairs.
1671 assert((!produceCompactUnwindFrame(AFL, MF) ||
1672 CC == CallingConv::PreserveMost || CC == CallingConv::PreserveAll ||
1673 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
1674 (Count & 1) == 0) &&
1675 "Odd number of callee-saved regs to spill!");
1676 int ByteOffset = AFI->getCalleeSavedStackSize();
1677 int StackFillDir = -1;
1678 int RegInc = 1;
1679 unsigned FirstReg = 0;
1680 if (NeedsWinCFI) {
1681 // For WinCFI, fill the stack from the bottom up.
1682 ByteOffset = 0;
1683 StackFillDir = 1;
1684 // As the CSI array is reversed to match PrologEpilogInserter, iterate
1685 // backwards, to pair up registers starting from lower numbered registers.
1686 RegInc = -1;
1687 FirstReg = Count - 1;
1688 }
1689
1690 bool FPAfterSVECalleeSaves = IsWindows && AFI->getSVECalleeSavedStackSize();
1691
1692 int ZPRByteOffset = 0;
1693 int PPRByteOffset = 0;
1694 bool SplitPPRs = AFI->hasSplitSVEObjects();
1695 if (SplitPPRs) {
1696 ZPRByteOffset = AFI->getZPRCalleeSavedStackSize();
1697 PPRByteOffset = AFI->getPPRCalleeSavedStackSize();
1698 } else if (!FPAfterSVECalleeSaves) {
1699 ZPRByteOffset =
1700 AFI->getZPRCalleeSavedStackSize() + AFI->getPPRCalleeSavedStackSize();
1701 // Unused: Everything goes in ZPR space.
1702 PPRByteOffset = 0;
1703 }
1704
1705 bool NeedGapToAlignStack = AFI->hasCalleeSaveStackFreeSpace();
1706 Register LastReg = 0;
1707 bool HasCSHazardPadding = AFI->hasStackHazardSlotIndex() && !SplitPPRs;
1708
1709 // When iterating backwards, the loop condition relies on unsigned wraparound.
1710 for (unsigned i = FirstReg; i < Count; i += RegInc) {
1711 RegPairInfo RPI;
1712 RPI.Reg1 = CSI[i].getReg();
1713
1714 if (AArch64::GPR64RegClass.contains(RPI.Reg1)) {
1715 RPI.Type = RegPairInfo::GPR;
1716 RPI.RC = &AArch64::GPR64RegClass;
1717 } else if (AArch64::FPR64RegClass.contains(RPI.Reg1)) {
1718 RPI.Type = RegPairInfo::FPR64;
1719 RPI.RC = &AArch64::FPR64RegClass;
1720 } else if (AArch64::FPR128RegClass.contains(RPI.Reg1)) {
1721 RPI.Type = RegPairInfo::FPR128;
1722 RPI.RC = &AArch64::FPR128RegClass;
1723 } else if (AArch64::ZPRRegClass.contains(RPI.Reg1)) {
1724 RPI.Type = RegPairInfo::ZPR;
1725 RPI.RC = &AArch64::ZPRRegClass;
1726 } else if (AArch64::PPRRegClass.contains(RPI.Reg1)) {
1727 RPI.Type = RegPairInfo::PPR;
1728 RPI.RC = &AArch64::PPRRegClass;
1729 } else if (RPI.Reg1 == AArch64::VG) {
1730 RPI.Type = RegPairInfo::VG;
1731 RPI.RC = &AArch64::FIXED_REGSRegClass;
1732 } else {
1733 llvm_unreachable("Unsupported register class.");
1734 }
1735
1736 int &ScalableByteOffset = RPI.Type == RegPairInfo::PPR && SplitPPRs
1737 ? PPRByteOffset
1738 : ZPRByteOffset;
1739
1740 // Add the stack hazard size as we transition from GPR->FPR CSRs.
1741 if (HasCSHazardPadding &&
1742 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
1743 AArch64InstrInfo::isFpOrNEON(RPI.Reg1))
1744 ByteOffset += StackFillDir * StackHazardSize;
1745 LastReg = RPI.Reg1;
1746
1747 int Scale = TRI->getSpillSize(*RPI.RC);
1748 // Add the next reg to the pair if it is in the same register class.
1749 if (unsigned(i + RegInc) < Count && !HasCSHazardPadding) {
1750 MCRegister NextReg = CSI[i + RegInc].getReg();
1751 bool IsFirst = i == FirstReg;
1752 switch (RPI.Type) {
1753 case RegPairInfo::GPR:
1754 if (AArch64::GPR64RegClass.contains(NextReg) &&
1755 !invalidateRegisterPairing(RPI.Reg1, NextReg, IsWindows,
1756 NeedsWinCFI, NeedsFrameRecord, IsFirst,
1757 TRI))
1758 RPI.Reg2 = NextReg;
1759 break;
1760 case RegPairInfo::FPR64:
1761 if (AArch64::FPR64RegClass.contains(NextReg) &&
1762 !invalidateWindowsRegisterPairing(RPI.Reg1, NextReg, NeedsWinCFI,
1763 IsFirst, TRI))
1764 RPI.Reg2 = NextReg;
1765 break;
1766 case RegPairInfo::FPR128:
1767 if (AArch64::FPR128RegClass.contains(NextReg))
1768 RPI.Reg2 = NextReg;
1769 break;
1770 case RegPairInfo::PPR:
1771 break;
1772 case RegPairInfo::ZPR:
1773 if (AFI->getPredicateRegForFillSpill() != 0 &&
1774 ((RPI.Reg1 - AArch64::Z0) & 1) == 0 && (NextReg == RPI.Reg1 + 1)) {
1775 // Calculate offset of register pair to see if pair instruction can be
1776 // used.
1777 int Offset = (ScalableByteOffset + StackFillDir * 2 * Scale) / Scale;
1778 if ((-16 <= Offset && Offset <= 14) && (Offset % 2 == 0))
1779 RPI.Reg2 = NextReg;
1780 }
1781 break;
1782 case RegPairInfo::VG:
1783 break;
1784 }
1785 }
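// Worked example for the ZPR pairing check above (illustrative): with the
// default top-down fill (StackFillDir = -1) and Scale = 16 for ZPRs, a
// candidate pair starting at ScalableByteOffset 0 gives Offset = -2, which
// is even and within [-16, 14], so the pair can use the two-register
// ST1B/LD1B spill/fill form.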
1786
1787 // GPRs and FPRs are saved in pairs of 64-bit regs. We expect the CSI
1788 // list to come in sorted by frame index so that we can issue the store
1789 // pair instructions directly. Assert if we see anything otherwise.
1790 //
1791 // The order of the registers in the list is controlled by
1792 // getCalleeSavedRegs(), so they will always be in-order, as well.
1793 assert((!RPI.isPaired() ||
1794 (CSI[i].getFrameIdx() + RegInc == CSI[i + RegInc].getFrameIdx())) &&
1795 "Out of order callee saved regs!");
1796
1797 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg2 != AArch64::FP ||
1798 RPI.Reg1 == AArch64::LR) &&
1799 "FrameRecord must be allocated together with LR");
1800
1801 // Windows AAPCS has FP and LR reversed.
1802 assert((!RPI.isPaired() || !NeedsFrameRecord || RPI.Reg1 != AArch64::FP ||
1803 RPI.Reg2 == AArch64::LR) &&
1804 "FrameRecord must be allocated together with LR");
1805
1806 // MachO's compact unwind format relies on all registers being stored in
1807 // adjacent register pairs.
1808 assert((!produceCompactUnwindFrame(AFL, MF) ||
1809 CC == CallingConv::PreserveMost || CC == CallingConv::PreserveAll ||
1810 CC == CallingConv::CXX_FAST_TLS || CC == CallingConv::Win64 ||
1811 (RPI.isPaired() &&
1812 ((RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP) ||
1813 RPI.Reg1 + 1 == RPI.Reg2))) &&
1814 "Callee-save registers not saved as adjacent register pair!");
1815
1816 RPI.FrameIdx = CSI[i].getFrameIdx();
1817 if (NeedsWinCFI &&
1818 RPI.isPaired()) // RPI.FrameIdx must be the lower index of the pair
1819 RPI.FrameIdx = CSI[i + RegInc].getFrameIdx();
1820
1821 // Realign the scalable offset if necessary. This is relevant when
1822 // spilling predicates on Windows.
1823 if (RPI.isScalable() && ScalableByteOffset % Scale != 0) {
1824 ScalableByteOffset = alignTo(ScalableByteOffset, Scale);
1825 }
1826
1827 int OffsetPre = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1828 assert(OffsetPre % Scale == 0);
1829
1830 if (RPI.isScalable())
1831 ScalableByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1832 else
1833 ByteOffset += StackFillDir * (RPI.isPaired() ? 2 * Scale : Scale);
1834
1835 // Swift's async context is directly before FP, so allocate an extra
1836 // 8 bytes for it.
1837 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1838 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1839 (IsWindows && RPI.Reg2 == AArch64::LR)))
1840 ByteOffset += StackFillDir * 8;
1841
1842 // Round up size of non-pair to pair size if we need to pad the
1843 // callee-save area to ensure 16-byte alignment.
1844 if (NeedGapToAlignStack && !NeedsWinCFI && !RPI.isScalable() &&
1845 RPI.Type != RegPairInfo::FPR128 && !RPI.isPaired() &&
1846 ByteOffset % 16 != 0) {
1847 ByteOffset += 8 * StackFillDir;
1848 assert(MFI.getObjectAlign(RPI.FrameIdx) <= Align(16));
1849 // A stack frame with a gap looks like this, bottom up:
1850 // d9, d8. x21, gap, x20, x19.
1851 // Set extra alignment on the x21 object to create the gap above it.
1852 MFI.setObjectAlignment(RPI.FrameIdx, Align(16));
1853 NeedGapToAlignStack = false;
1854 }
1855
1856 int OffsetPost = RPI.isScalable() ? ScalableByteOffset : ByteOffset;
1857 assert(OffsetPost % Scale == 0);
1858 // If filling top down (default), we want the offset after incrementing it.
1859 // If filling bottom up (WinCFI) we need the original offset.
1860 int Offset = NeedsWinCFI ? OffsetPre : OffsetPost;
1861
1862 // The FP, LR pair goes 8 bytes into our expanded 24-byte slot so that the
1863 // Swift context can directly precede FP.
1864 if (NeedsFrameRecord && AFI->hasSwiftAsyncContext() &&
1865 ((!IsWindows && RPI.Reg2 == AArch64::FP) ||
1866 (IsWindows && RPI.Reg2 == AArch64::LR)))
1867 Offset += 8;
1868 RPI.Offset = Offset / Scale;
1869
1870 assert((!RPI.isPaired() ||
1871 (!RPI.isScalable() && RPI.Offset >= -64 && RPI.Offset <= 63) ||
1872 (RPI.isScalable() && RPI.Offset >= -256 && RPI.Offset <= 255)) &&
1873 "Offset out of bounds for LDP/STP immediate");
1874
1875 auto isFrameRecord = [&] {
1876 if (RPI.isPaired())
1877 return IsWindows ? RPI.Reg1 == AArch64::FP && RPI.Reg2 == AArch64::LR
1878 : RPI.Reg1 == AArch64::LR && RPI.Reg2 == AArch64::FP;
1879 // Otherwise, look for the frame record as two unpaired registers. This is
1880 // needed for -aarch64-stack-hazard-size=<val>, which disables register
1881 // pairing (as the padding may be too large for the LDP/STP offset). Note:
1882 // On Windows, this check works out as current reg == FP, next reg == LR,
1883 // and on other platforms current reg == FP, previous reg == LR. This
1884 // works out as the correct pre-increment or post-increment offsets
1885 // respectively.
1886 return i > 0 && RPI.Reg1 == AArch64::FP &&
1887 CSI[i - 1].getReg() == AArch64::LR;
1888 };
1889
1890 // Save the offset to frame record so that the FP register can point to the
1891 // innermost frame record (spilled FP and LR registers).
1892 if (NeedsFrameRecord && isFrameRecord())
1893 AFI->setCalleeSaveBaseToFrameRecordOffset(Offset);
1894
1895 RegPairs.push_back(RPI);
1896 if (RPI.isPaired())
1897 i += RegInc;
1898 }
1899 if (NeedsWinCFI) {
1900 // If we need an alignment gap in the stack, align the topmost stack
1901 // object. A stack frame with a gap looks like this, bottom up:
1902 // x19, d8. d9, gap.
1903 // Set extra alignment on the topmost stack object (the first element in
1904 // CSI, which goes top down), to create the gap above it.
1905 if (AFI->hasCalleeSaveStackFreeSpace())
1906 MFI.setObjectAlignment(CSI[0].getFrameIdx(), Align(16));
1907 // We iterated bottom up over the registers; flip RegPairs back to top
1908 // down order.
1909 std::reverse(RegPairs.begin(), RegPairs.end());
1910 }
1911}
1912
1913bool AArch64FrameLowering::spillCalleeSavedRegisters(
1914 MachineBasicBlock &MBB, MachineBasicBlock::iterator MI,
1915 ArrayRef<CalleeSavedInfo> CSI, const TargetRegisterInfo *TRI) const {
1916 MachineFunction &MF = *MBB.getParent();
1917 auto &TLI = *MF.getSubtarget<AArch64Subtarget>().getTargetLowering();
1919 bool NeedsWinCFI = needsWinCFI(MF);
1920 DebugLoc DL;
1922
1923 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
1924
1926 // Refresh the reserved regs in case there are any potential changes since the
1927 // last freeze.
1928 MRI.freezeReservedRegs();
1929
1930 if (homogeneousPrologEpilog(MF)) {
1931 auto MIB = BuildMI(MBB, MI, DL, TII.get(AArch64::HOM_Prolog))
1933
1934 for (auto &RPI : RegPairs) {
1935 MIB.addReg(RPI.Reg1);
1936 MIB.addReg(RPI.Reg2);
1937
1938 // Update register live in.
1939 if (!MRI.isReserved(RPI.Reg1))
1940 MBB.addLiveIn(RPI.Reg1);
1941 if (RPI.isPaired() && !MRI.isReserved(RPI.Reg2))
1942 MBB.addLiveIn(RPI.Reg2);
1943 }
1944 return true;
1945 }
1946 bool PTrueCreated = false;
1947 for (const RegPairInfo &RPI : llvm::reverse(RegPairs)) {
1948 Register Reg1 = RPI.Reg1;
1949 Register Reg2 = RPI.Reg2;
1950 unsigned StrOpc;
1951
1952 // Issue sequence of spills for cs regs. The first spill may be converted
1953 // to a pre-decrement store later by emitPrologue if the callee-save stack
1954 // area allocation can't be combined with the local stack area allocation.
1955 // For example:
1956 // stp x22, x21, [sp, #0] // addImm(+0)
1957 // stp x20, x19, [sp, #16] // addImm(+2)
1958 // stp fp, lr, [sp, #32] // addImm(+4)
1959 // Rationale: This sequence saves uop updates compared to a sequence of
1960 // pre-increment spills like stp xi,xj,[sp,#-16]!
1961 // Note: Similar rationale and sequence for restores in epilog.
1962 unsigned Size = TRI->getSpillSize(*RPI.RC);
1963 Align Alignment = TRI->getSpillAlign(*RPI.RC);
1964 switch (RPI.Type) {
1965 case RegPairInfo::GPR:
1966 StrOpc = RPI.isPaired() ? AArch64::STPXi : AArch64::STRXui;
1967 break;
1968 case RegPairInfo::FPR64:
1969 StrOpc = RPI.isPaired() ? AArch64::STPDi : AArch64::STRDui;
1970 break;
1971 case RegPairInfo::FPR128:
1972 StrOpc = RPI.isPaired() ? AArch64::STPQi : AArch64::STRQui;
1973 break;
1974 case RegPairInfo::ZPR:
1975 StrOpc = RPI.isPaired() ? AArch64::ST1B_2Z_IMM : AArch64::STR_ZXI;
1976 break;
1977 case RegPairInfo::PPR:
1978 StrOpc = AArch64::STR_PXI;
1979 break;
1980 case RegPairInfo::VG:
1981 StrOpc = AArch64::STRXui;
1982 break;
1983 }
1984
1985 Register X0Scratch;
1986 auto RestoreX0 = make_scope_exit([&] {
1987 if (X0Scratch != AArch64::NoRegister)
1988 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), AArch64::X0)
1989 .addReg(X0Scratch)
1991 });
1992
1993 if (Reg1 == AArch64::VG) {
1994 // Find an available register to store the value of VG to.
1995 Reg1 = findScratchNonCalleeSaveRegister(&MBB, true);
1996 assert(Reg1 != AArch64::NoRegister);
1997 if (MF.getSubtarget<AArch64Subtarget>().hasSVE()) {
1998 BuildMI(MBB, MI, DL, TII.get(AArch64::CNTD_XPiI), Reg1)
1999 .addImm(31)
2000 .addImm(1)
2002 } else {
2003 const AArch64Subtarget &STI = MF.getSubtarget<AArch64Subtarget>();
2004 if (any_of(MBB.liveins(),
2005 [&STI](const MachineBasicBlock::RegisterMaskPair &LiveIn) {
2006 return STI.getRegisterInfo()->isSuperOrSubRegisterEq(
2007 AArch64::X0, LiveIn.PhysReg);
2008 })) {
2009 X0Scratch = Reg1;
2010 BuildMI(MBB, MI, DL, TII.get(TargetOpcode::COPY), X0Scratch)
2011 .addReg(AArch64::X0)
2013 }
2014
2015 RTLIB::Libcall LC = RTLIB::SMEABI_GET_CURRENT_VG;
2016 const uint32_t *RegMask =
2017 TRI->getCallPreservedMask(MF, TLI.getLibcallCallingConv(LC));
2018 BuildMI(MBB, MI, DL, TII.get(AArch64::BL))
2019 .addExternalSymbol(TLI.getLibcallName(LC))
2020 .addRegMask(RegMask)
2021 .addReg(AArch64::X0, RegState::ImplicitDefine)
2023 Reg1 = AArch64::X0;
2024 }
2025 }
2026
2027 LLVM_DEBUG({
2028 dbgs() << "CSR spill: (" << printReg(Reg1, TRI);
2029 if (RPI.isPaired())
2030 dbgs() << ", " << printReg(Reg2, TRI);
2031 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2032 if (RPI.isPaired())
2033 dbgs() << ", " << RPI.FrameIdx + 1;
2034 dbgs() << ")\n";
2035 });
2036
2037 assert((!NeedsWinCFI || !(Reg1 == AArch64::LR && Reg2 == AArch64::FP)) &&
2038 "Windows unwdinding requires a consecutive (FP,LR) pair");
2039 // Windows unwind codes require consecutive registers if registers are
2040 // paired. Make the switch here, so that the code below will save (x,x+1)
2041 // and not (x+1,x).
2042 unsigned FrameIdxReg1 = RPI.FrameIdx;
2043 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2044 if (NeedsWinCFI && RPI.isPaired()) {
2045 std::swap(Reg1, Reg2);
2046 std::swap(FrameIdxReg1, FrameIdxReg2);
2047 }
2048
2049 if (RPI.isPaired() && RPI.isScalable()) {
2050 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2053 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2054 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2055 "Expects SVE2.1 or SME2 target and a predicate register");
2056#ifdef EXPENSIVE_CHECKS
2057 auto IsPPR = [](const RegPairInfo &c) {
2058 return c.Reg1 == RegPairInfo::PPR;
2059 };
2060 auto PPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsPPR);
2061 auto IsZPR = [](const RegPairInfo &c) {
2062 return c.Type == RegPairInfo::ZPR;
2063 };
2064 auto ZPRBegin = std::find_if(RegPairs.begin(), RegPairs.end(), IsZPR);
2065 assert(!(PPRBegin < ZPRBegin) &&
2066 "Expected callee save predicate to be handled first");
2067#endif
2068 if (!PTrueCreated) {
2069 PTrueCreated = true;
2070 BuildMI(MBB, MI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2072 }
2073 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2074 if (!MRI.isReserved(Reg1))
2075 MBB.addLiveIn(Reg1);
2076 if (!MRI.isReserved(Reg2))
2077 MBB.addLiveIn(Reg2);
2078 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0));
2080 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2081 MachineMemOperand::MOStore, Size, Alignment));
2082 MIB.addReg(PnReg);
2083 MIB.addReg(AArch64::SP)
2084 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale],
2085 // where 2*vscale is implicit
2088 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2089 MachineMemOperand::MOStore, Size, Alignment));
2090 if (NeedsWinCFI)
2091 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2092 } else { // The case when a pair of ZPRs is not present
2093 MachineInstrBuilder MIB = BuildMI(MBB, MI, DL, TII.get(StrOpc));
2094 if (!MRI.isReserved(Reg1))
2095 MBB.addLiveIn(Reg1);
2096 if (RPI.isPaired()) {
2097 if (!MRI.isReserved(Reg2))
2098 MBB.addLiveIn(Reg2);
2099 MIB.addReg(Reg2, getPrologueDeath(MF, Reg2));
2101 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2102 MachineMemOperand::MOStore, Size, Alignment));
2103 }
2104 MIB.addReg(Reg1, getPrologueDeath(MF, Reg1))
2105 .addReg(AArch64::SP)
2106 .addImm(RPI.Offset) // [sp, #offset*vscale],
2107 // where factor*vscale is implicit
2110 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2111 MachineMemOperand::MOStore, Size, Alignment));
2112 if (NeedsWinCFI)
2113 insertSEH(MIB, TII, MachineInstr::FrameSetup);
2114 }
2115 // Update the StackIDs of the SVE stack slots.
2116 MachineFrameInfo &MFI = MF.getFrameInfo();
2117 if (RPI.Type == RegPairInfo::ZPR) {
2118 MFI.setStackID(FrameIdxReg1, TargetStackID::ScalableVector);
2119 if (RPI.isPaired())
2120 MFI.setStackID(FrameIdxReg2, TargetStackID::ScalableVector);
2121 } else if (RPI.Type == RegPairInfo::PPR) {
2123 if (RPI.isPaired())
2125 }
2126 }
2127 return true;
2128}
2129
2133 MachineFunction &MF = *MBB.getParent();
2135 DebugLoc DL;
2137 bool NeedsWinCFI = needsWinCFI(MF);
2138
2139 if (MBBI != MBB.end())
2140 DL = MBBI->getDebugLoc();
2141
2142 computeCalleeSaveRegisterPairs(*this, MF, CSI, TRI, RegPairs, hasFP(MF));
2143 if (homogeneousPrologEpilog(MF, &MBB)) {
2144 auto MIB = BuildMI(MBB, MBBI, DL, TII.get(AArch64::HOM_Epilog))
2146 for (auto &RPI : RegPairs) {
2147 MIB.addReg(RPI.Reg1, RegState::Define);
2148 MIB.addReg(RPI.Reg2, RegState::Define);
2149 }
2150 return true;
2151 }
2152
2153 // For performance reasons, restore the SVE registers in increasing order.
2154 auto IsPPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::PPR; };
2155 auto PPRBegin = llvm::find_if(RegPairs, IsPPR);
2156 auto PPREnd = std::find_if_not(PPRBegin, RegPairs.end(), IsPPR);
2157 std::reverse(PPRBegin, PPREnd);
2158 auto IsZPR = [](const RegPairInfo &c) { return c.Type == RegPairInfo::ZPR; };
2159 auto ZPRBegin = llvm::find_if(RegPairs, IsZPR);
2160 auto ZPREnd = std::find_if_not(ZPRBegin, RegPairs.end(), IsZPR);
2161 std::reverse(ZPRBegin, ZPREnd);
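// For example (illustrative): if the ZPR entries appear in RegPairs as
// z12, z10, z8, the reversal above causes them to be restored as
// z8, z10, z12.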
2162
2163 bool PTrueCreated = false;
2164 for (const RegPairInfo &RPI : RegPairs) {
2165 Register Reg1 = RPI.Reg1;
2166 Register Reg2 = RPI.Reg2;
2167
2168 // Issue sequence of restores for cs regs. The last restore may be converted
2169 // to a post-increment load later by emitEpilogue if the callee-save stack
2170 // area allocation can't be combined with the local stack area allocation.
2171 // For example:
2172 // ldp fp, lr, [sp, #32] // addImm(+4)
2173 // ldp x20, x19, [sp, #16] // addImm(+2)
2174 // ldp x22, x21, [sp, #0] // addImm(+0)
2175 // Note: see comment in spillCalleeSavedRegisters()
2176 unsigned LdrOpc;
2177 unsigned Size = TRI->getSpillSize(*RPI.RC);
2178 Align Alignment = TRI->getSpillAlign(*RPI.RC);
2179 switch (RPI.Type) {
2180 case RegPairInfo::GPR:
2181 LdrOpc = RPI.isPaired() ? AArch64::LDPXi : AArch64::LDRXui;
2182 break;
2183 case RegPairInfo::FPR64:
2184 LdrOpc = RPI.isPaired() ? AArch64::LDPDi : AArch64::LDRDui;
2185 break;
2186 case RegPairInfo::FPR128:
2187 LdrOpc = RPI.isPaired() ? AArch64::LDPQi : AArch64::LDRQui;
2188 break;
2189 case RegPairInfo::ZPR:
2190 LdrOpc = RPI.isPaired() ? AArch64::LD1B_2Z_IMM : AArch64::LDR_ZXI;
2191 break;
2192 case RegPairInfo::PPR:
2193 LdrOpc = AArch64::LDR_PXI;
2194 break;
2195 case RegPairInfo::VG:
2196 continue;
2197 }
2198 LLVM_DEBUG({
2199 dbgs() << "CSR restore: (" << printReg(Reg1, TRI);
2200 if (RPI.isPaired())
2201 dbgs() << ", " << printReg(Reg2, TRI);
2202 dbgs() << ") -> fi#(" << RPI.FrameIdx;
2203 if (RPI.isPaired())
2204 dbgs() << ", " << RPI.FrameIdx + 1;
2205 dbgs() << ")\n";
2206 });
2207
2208 // Windows unwind codes require consecutive registers if registers are
2209 // paired. Make the switch here, so that the code below will restore
2210 // (x,x+1) and not (x+1,x).
2211 unsigned FrameIdxReg1 = RPI.FrameIdx;
2212 unsigned FrameIdxReg2 = RPI.FrameIdx + 1;
2213 if (NeedsWinCFI && RPI.isPaired()) {
2214 std::swap(Reg1, Reg2);
2215 std::swap(FrameIdxReg1, FrameIdxReg2);
2216 }
2217
2219 if (RPI.isPaired() && RPI.isScalable()) {
2220 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2221 MF.getSubtarget<AArch64Subtarget>();
2222 unsigned PnReg = AFI->getPredicateRegForFillSpill();
2223 assert((PnReg != 0 && enableMultiVectorSpillFill(Subtarget, MF)) &&
2224 "Expects SVE2.1 or SME2 target and a predicate register");
2225#ifdef EXPENSIVE_CHECKS
2226 assert(!(PPRBegin < ZPRBegin) &&
2227 "Expected callee save predicate to be handled first");
2228#endif
2229 if (!PTrueCreated) {
2230 PTrueCreated = true;
2231 BuildMI(MBB, MBBI, DL, TII.get(AArch64::PTRUE_C_B), PnReg)
2233 }
2234 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2235 MIB.addReg(/*PairRegs*/ AArch64::Z0_Z1 + (RPI.Reg1 - AArch64::Z0),
2236 getDefRegState(true));
2238 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2239 MachineMemOperand::MOLoad, Size, Alignment));
2240 MIB.addReg(PnReg);
2241 MIB.addReg(AArch64::SP)
2242 .addImm(RPI.Offset / 2) // [sp, #imm*2*vscale]
2243 // where 2*vscale is implicit
2246 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2247 MachineMemOperand::MOLoad, Size, Alignment));
2248 if (NeedsWinCFI)
2249 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2250 } else {
2251 MachineInstrBuilder MIB = BuildMI(MBB, MBBI, DL, TII.get(LdrOpc));
2252 if (RPI.isPaired()) {
2253 MIB.addReg(Reg2, getDefRegState(true));
2255 MachinePointerInfo::getFixedStack(MF, FrameIdxReg2),
2256 MachineMemOperand::MOLoad, Size, Alignment));
2257 }
2258 MIB.addReg(Reg1, getDefRegState(true));
2259 MIB.addReg(AArch64::SP)
2260 .addImm(RPI.Offset) // [sp, #offset*vscale]
2261 // where factor*vscale is implicit
2264 MachinePointerInfo::getFixedStack(MF, FrameIdxReg1),
2265 MachineMemOperand::MOLoad, Size, Alignment));
2266 if (NeedsWinCFI)
2267 insertSEH(MIB, TII, MachineInstr::FrameDestroy);
2268 }
2269 }
2270 return true;
2271}
2272
2273// Return the FrameID for a MMO.
2274static std::optional<int> getMMOFrameID(MachineMemOperand *MMO,
2275 const MachineFrameInfo &MFI) {
2276 auto *PSV =
2277 dyn_cast_or_null<FixedStackPseudoSourceValue>(MMO->getPseudoValue());
2278 if (PSV)
2279 return std::optional<int>(PSV->getFrameIndex());
2280
2281 if (MMO->getValue()) {
2282 if (auto *Al = dyn_cast<AllocaInst>(getUnderlyingObject(MMO->getValue()))) {
2283 for (int FI = MFI.getObjectIndexBegin(); FI < MFI.getObjectIndexEnd();
2284 FI++)
2285 if (MFI.getObjectAllocation(FI) == Al)
2286 return FI;
2287 }
2288 }
2289
2290 return std::nullopt;
2291}
2292
2293// Return the FrameID for a Load/Store instruction by looking at the first MMO.
2294static std::optional<int> getLdStFrameID(const MachineInstr &MI,
2295 const MachineFrameInfo &MFI) {
2296 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
2297 return std::nullopt;
2298
2299 return getMMOFrameID(*MI.memoperands_begin(), MFI);
2300}
2301
2302// Returns true if the LDST MachineInstr \p MI is a PPR access.
2303static bool isPPRAccess(const MachineInstr &MI) {
2304 return AArch64::PPRRegClass.contains(MI.getOperand(0).getReg());
2305}
2306
2307// Check if a Hazard slot is needed for the current function, and if so create
2308// one for it. The index is stored in AArch64FunctionInfo->StackHazardSlotIndex,
2309// which can be used to determine if any hazard padding is needed.
2310void AArch64FrameLowering::determineStackHazardSlot(
2311 MachineFunction &MF, BitVector &SavedRegs) const {
2312 unsigned StackHazardSize = getStackHazardSize(MF);
2313 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2314 if (StackHazardSize == 0 || StackHazardSize % 16 != 0 ||
2315 AFI->hasStackHazardSlotIndex())
2316 return;
2317
2318 // Stack hazards are only needed in streaming functions.
2319 SMEAttrs Attrs = AFI->getSMEFnAttrs();
2320 if (!StackHazardInNonStreaming && Attrs.hasNonStreamingInterfaceAndBody())
2321 return;
2322
2323 MachineFrameInfo &MFI = MF.getFrameInfo();
2324
2325 // Add a hazard slot if there are any CSR FPR registers, or any FP-only
2326 // stack objects.
2327 bool HasFPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2328 return AArch64::FPR64RegClass.contains(Reg) ||
2329 AArch64::FPR128RegClass.contains(Reg) ||
2330 AArch64::ZPRRegClass.contains(Reg);
2331 });
2332 bool HasPPRCSRs = any_of(SavedRegs.set_bits(), [](unsigned Reg) {
2333 return AArch64::PPRRegClass.contains(Reg);
2334 });
2335 bool HasFPRStackObjects = false;
2336 bool HasPPRStackObjects = false;
2337 if (!HasFPRCSRs || SplitSVEObjects) {
2338 enum SlotType : uint8_t {
2339 Unknown = 0,
2340 ZPRorFPR = 1 << 0,
2341 PPR = 1 << 1,
2342 GPR = 1 << 2,
2344 };
2345
2346 // Find stack slots solely used for one kind of register (ZPR, PPR, etc.),
2347 // based on the kinds of accesses used in the function.
2348 SmallVector<SlotType> SlotTypes(MFI.getObjectIndexEnd(), SlotType::Unknown);
2349 for (auto &MBB : MF) {
2350 for (auto &MI : MBB) {
2351 std::optional<int> FI = getLdStFrameID(MI, MFI);
2352 if (!FI || FI < 0 || FI > int(SlotTypes.size()))
2353 continue;
2354 if (MFI.hasScalableStackID(*FI)) {
2355 SlotTypes[*FI] |=
2356 isPPRAccess(MI) ? SlotType::PPR : SlotType::ZPRorFPR;
2357 } else {
2358 SlotTypes[*FI] |= AArch64InstrInfo::isFpOrNEON(MI)
2359 ? SlotType::ZPRorFPR
2360 : SlotType::GPR;
2361 }
2362 }
2363 }
2364
2365 for (int FI = 0; FI < int(SlotTypes.size()); ++FI) {
2366 HasFPRStackObjects |= SlotTypes[FI] == SlotType::ZPRorFPR;
2367 // For SplitSVEObjects remember that this stack slot is a predicate; this
2368 // will be needed later when determining the frame layout.
2369 if (SlotTypes[FI] == SlotType::PPR) {
2370 MFI.setStackID(FI, TargetStackID::ScalablePredicateVector);
2371 HasPPRStackObjects = true;
2372 }
2373 }
2374 }
2375
2376 if (HasFPRCSRs || HasFPRStackObjects) {
2377 int ID = MFI.CreateStackObject(StackHazardSize, Align(16), false);
2378 LLVM_DEBUG(dbgs() << "Created Hazard slot at " << ID << " size "
2379 << StackHazardSize << "\n");
2380 AFI->setStackHazardSlotIndex(ID);
2381 }
2382
2383 // Determine if we should use SplitSVEObjects. This should only be used if
2384 // there's a possibility of a stack hazard between PPRs and ZPRs or FPRs.
2385 if (SplitSVEObjects) {
2386 if (!HasPPRCSRs && !HasPPRStackObjects) {
2387 LLVM_DEBUG(
2388 dbgs() << "Not using SplitSVEObjects as no PPRs are on the stack\n");
2389 return;
2390 }
2391
2392 if (!HasFPRCSRs && !HasFPRStackObjects) {
2393 LLVM_DEBUG(
2394 dbgs()
2395 << "Not using SplitSVEObjects as no FPRs or ZPRs are on the stack\n");
2396 return;
2397 }
2398
2399 // If another calling convention is explicitly set, FPRs can't be promoted
2400 // to ZPR callee-saves.
2403 MF.getFunction().getCallingConv())) {
2404 LLVM_DEBUG(
2405 dbgs() << "Calling convention is not supported with SplitSVEObjects");
2406 return;
2407 }
2408
2409 [[maybe_unused]] const AArch64Subtarget &Subtarget =
2410 MF.getSubtarget<AArch64Subtarget>();
2412 "Expected SVE to be available for PPRs");
2413
2414 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2415 // With SplitSVEObjects the CS hazard padding is placed between the
2416 // PPRs and ZPRs. If there are any FPR CS there would be a hazard between
2417 // them and the CS GPRs. Avoid this by promoting all FPR CS to ZPRs.
2418 BitVector FPRZRegs(SavedRegs.size());
2419 for (size_t Reg = 0, E = SavedRegs.size(); HasFPRCSRs && Reg < E; ++Reg) {
2420 BitVector::reference RegBit = SavedRegs[Reg];
2421 if (!RegBit)
2422 continue;
2423 unsigned SubRegIdx = 0;
2424 if (AArch64::FPR64RegClass.contains(Reg))
2425 SubRegIdx = AArch64::dsub;
2426 else if (AArch64::FPR128RegClass.contains(Reg))
2427 SubRegIdx = AArch64::zsub;
2428 else
2429 continue;
2430 // Clear the bit for the FPR save.
2431 RegBit = false;
2432 // Mark that we should save the corresponding ZPR.
2433 Register ZReg =
2434 TRI->getMatchingSuperReg(Reg, SubRegIdx, &AArch64::ZPRRegClass);
2435 FPRZRegs.set(ZReg);
2436 }
2437 SavedRegs |= FPRZRegs;
2438
2439 AFI->setSplitSVEObjects(true);
2440 LLVM_DEBUG(dbgs() << "SplitSVEObjects enabled!\n");
2441 }
2442}
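// Illustrative example (assuming default options): a streaming-mode function
// that spills d8 and is compiled with -aarch64-stack-hazard-size=1024 gets a
// 1024-byte hazard slot here, which the frame layout later uses to separate
// GPR accesses from FPR/ZPR accesses.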
2443
2444void AArch64FrameLowering::determineCalleeSaves(MachineFunction &MF,
2445 BitVector &SavedRegs,
2446 RegScavenger *RS) const {
2447 // All calls are tail calls in GHC calling conv, and functions have no
2448 // prologue/epilogue.
2449 if (MF.getFunction().getCallingConv() == CallingConv::GHC)
2450 return;
2451
2452 const AArch64Subtarget &Subtarget = MF.getSubtarget<AArch64Subtarget>();
2453
2455 const AArch64RegisterInfo *RegInfo = Subtarget.getRegisterInfo();
2457 unsigned UnspilledCSGPR = AArch64::NoRegister;
2458 unsigned UnspilledCSGPRPaired = AArch64::NoRegister;
2459
2460 MachineFrameInfo &MFI = MF.getFrameInfo();
2461 const MCPhysReg *CSRegs = MF.getRegInfo().getCalleeSavedRegs();
2462
2463 MCRegister BasePointerReg =
2464 RegInfo->hasBasePointer(MF) ? RegInfo->getBaseRegister() : MCRegister();
2465
2466 unsigned ExtraCSSpill = 0;
2467 bool HasUnpairedGPR64 = false;
2468 bool HasPairZReg = false;
2469 BitVector UserReservedRegs = RegInfo->getUserReservedRegs(MF);
2470 BitVector ReservedRegs = RegInfo->getReservedRegs(MF);
2471
2472 // Figure out which callee-saved registers to save/restore.
2473 for (unsigned i = 0; CSRegs[i]; ++i) {
2474 const MCRegister Reg = CSRegs[i];
2475
2476 // Add the base pointer register to SavedRegs if it is callee-save.
2477 if (Reg == BasePointerReg)
2478 SavedRegs.set(Reg);
2479
2480 // Don't save manually reserved registers set through +reserve-x#i,
2481 // even for callee-saved registers, as per GCC's behavior.
2482 if (UserReservedRegs[Reg]) {
2483 SavedRegs.reset(Reg);
2484 continue;
2485 }
2486
2487 bool RegUsed = SavedRegs.test(Reg);
2488 MCRegister PairedReg;
2489 const bool RegIsGPR64 = AArch64::GPR64RegClass.contains(Reg);
2490 if (RegIsGPR64 || AArch64::FPR64RegClass.contains(Reg) ||
2491 AArch64::FPR128RegClass.contains(Reg)) {
2492 // Compensate for odd numbers of GP CSRs.
2493 // For now, all the known cases of odd number of CSRs are of GPRs.
2494 if (HasUnpairedGPR64)
2495 PairedReg = CSRegs[i % 2 == 0 ? i - 1 : i + 1];
2496 else
2497 PairedReg = CSRegs[i ^ 1];
2498 }
2499
2500 // If the function requires saving all of the GP registers (SavedRegs),
2501 // and there is an odd number of GP CSRs at the same time (CSRegs),
2502 // PairedReg could be in a different register class from Reg, which would
2503 // lead to an FPR (usually D8) accidentally being marked saved.
2504 if (RegIsGPR64 && !AArch64::GPR64RegClass.contains(PairedReg)) {
2505 PairedReg = AArch64::NoRegister;
2506 HasUnpairedGPR64 = true;
2507 }
2508 assert(PairedReg == AArch64::NoRegister ||
2509 AArch64::GPR64RegClass.contains(Reg, PairedReg) ||
2510 AArch64::FPR64RegClass.contains(Reg, PairedReg) ||
2511 AArch64::FPR128RegClass.contains(Reg, PairedReg));
2512
2513 if (!RegUsed) {
2514 if (AArch64::GPR64RegClass.contains(Reg) && !ReservedRegs[Reg]) {
2515 UnspilledCSGPR = Reg;
2516 UnspilledCSGPRPaired = PairedReg;
2517 }
2518 continue;
2519 }
2520
2521 // MachO's compact unwind format relies on all registers being stored in
2522 // pairs.
2523 // FIXME: the usual format is actually better if unwinding isn't needed.
2524 if (producePairRegisters(MF) && PairedReg != AArch64::NoRegister &&
2525 !SavedRegs.test(PairedReg)) {
2526 SavedRegs.set(PairedReg);
2527 if (AArch64::GPR64RegClass.contains(PairedReg) &&
2528 !ReservedRegs[PairedReg])
2529 ExtraCSSpill = PairedReg;
2530 }
2531 // Check if there is a pair of ZRegs, so a PReg can be selected for spill/fill
2532 HasPairZReg |= (AArch64::ZPRRegClass.contains(Reg, CSRegs[i ^ 1]) &&
2533 SavedRegs.test(CSRegs[i ^ 1]));
2534 }
2535
2536 if (HasPairZReg && enableMultiVectorSpillFill(Subtarget, MF)) {
2538 // Find a suitable predicate register for the multi-vector spill/fill
2539 // instructions.
2540 MCRegister PnReg = findFreePredicateReg(SavedRegs);
2541 if (PnReg.isValid())
2542 AFI->setPredicateRegForFillSpill(PnReg);
2543 // If no free callee-save register has been found, assign one.
2544 if (!AFI->getPredicateRegForFillSpill() &&
2545 MF.getFunction().getCallingConv() ==
2546 CallingConv::AArch64_SVE_VectorCall) {
2547 SavedRegs.set(AArch64::P8);
2548 AFI->setPredicateRegForFillSpill(AArch64::PN8);
2549 }
2550
2551 assert(!ReservedRegs[AFI->getPredicateRegForFillSpill()] &&
2552 "Predicate cannot be a reserved register");
2553 }
2554
2556 !Subtarget.isTargetWindows()) {
2557 // For Windows calling convention on a non-windows OS, where X18 is treated
2558 // as reserved, back up X18 when entering non-windows code (marked with the
2559 // Windows calling convention) and restore when returning regardless of
2560 // whether the individual function uses it - it might call other functions
2561 // that clobber it.
2562 SavedRegs.set(AArch64::X18);
2563 }
2564
2565 // Determine if a Hazard slot should be used and where it should go.
2566 // If SplitSVEObjects is used, the hazard padding is placed between the PPRs
2567 // and ZPRs. Otherwise, it goes in the callee save area.
2568 determineStackHazardSlot(MF, SavedRegs);
2569
2570 // Calculates the callee saved stack size.
2571 unsigned CSStackSize = 0;
2572 unsigned ZPRCSStackSize = 0;
2573 unsigned PPRCSStackSize = 0;
2574 const TargetRegisterInfo *TRI = MF.getSubtarget().getRegisterInfo();
2575 for (unsigned Reg : SavedRegs.set_bits()) {
2576 auto *RC = TRI->getMinimalPhysRegClass(MCRegister(Reg));
2577 assert(RC && "expected register class!");
2578 auto SpillSize = TRI->getSpillSize(*RC);
2579 bool IsZPR = AArch64::ZPRRegClass.contains(Reg);
2580 bool IsPPR = !IsZPR && AArch64::PPRRegClass.contains(Reg);
2581 if (IsZPR)
2582 ZPRCSStackSize += SpillSize;
2583 else if (IsPPR)
2584 PPRCSStackSize += SpillSize;
2585 else
2586 CSStackSize += SpillSize;
2587 }
2588
2589 // Save number of saved regs, so we can easily update CSStackSize later to
2590 // account for any additional 64-bit GPR saves. Note: After this point
2591 // only 64-bit GPRs can be added to SavedRegs.
2592 unsigned NumSavedRegs = SavedRegs.count();
2593
2594 // If we have hazard padding in the CS area add that to the size.
2596 CSStackSize += getStackHazardSize(MF);
2597
2598 // Increase the callee-saved stack size if the function has streaming mode
2599 // changes, as we will need to spill the value of the VG register.
2600 if (requiresSaveVG(MF))
2601 CSStackSize += 8;
2602
2603 // If we must call __arm_get_current_vg in the prologue preserve the LR.
2604 if (requiresSaveVG(MF) && !Subtarget.hasSVE())
2605 SavedRegs.set(AArch64::LR);
2606
2607 // The frame record needs to be created by saving the appropriate registers
2608 uint64_t EstimatedStackSize = MFI.estimateStackSize(MF);
2609 if (hasFP(MF) ||
2610 windowsRequiresStackProbe(MF, EstimatedStackSize + CSStackSize + 16)) {
2611 SavedRegs.set(AArch64::FP);
2612 SavedRegs.set(AArch64::LR);
2613 }
2614
2615 LLVM_DEBUG({
2616 dbgs() << "*** determineCalleeSaves\nSaved CSRs:";
2617 for (unsigned Reg : SavedRegs.set_bits())
2618 dbgs() << ' ' << printReg(MCRegister(Reg), RegInfo);
2619 dbgs() << "\n";
2620 });
2621
2622 // If any callee-saved registers are used, the frame cannot be eliminated.
2623 auto [ZPRLocalStackSize, PPRLocalStackSize] =
2625 uint64_t SVELocals = ZPRLocalStackSize + PPRLocalStackSize;
2626 uint64_t SVEStackSize =
2627 alignTo(ZPRCSStackSize + PPRCSStackSize + SVELocals, 16);
2628 bool CanEliminateFrame = (SavedRegs.count() == 0) && !SVEStackSize;
2629
2630 // The CSR spill slots have not been allocated yet, so estimateStackSize
2631 // won't include them.
2632 unsigned EstimatedStackSizeLimit = estimateRSStackSizeLimit(MF);
2633
2634 // We may address some of the stack above the canonical frame address, either
2635 // for our own arguments or during a call. Include that in calculating whether
2636 // we have complicated addressing concerns.
2637 int64_t CalleeStackUsed = 0;
2638 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I) {
2639 int64_t FixedOff = MFI.getObjectOffset(I);
2640 if (FixedOff > CalleeStackUsed)
2641 CalleeStackUsed = FixedOff;
2642 }
2643
2644 // Conservatively always assume BigStack when there are SVE spills.
2645 bool BigStack = SVEStackSize || (EstimatedStackSize + CSStackSize +
2646 CalleeStackUsed) > EstimatedStackSizeLimit;
2647 if (BigStack || !CanEliminateFrame || RegInfo->cannotEliminateFrame(MF))
2648 AFI->setHasStackFrame(true);
2649
2650 // Estimate if we might need to scavenge a register at some point in order
2651 // to materialize a stack offset. If so, either spill one additional
2652 // callee-saved register or reserve a special spill slot to facilitate
2653 // register scavenging. If we already spilled an extra callee-saved register
2654 // above to keep the number of spills even, we don't need to do anything else
2655 // here.
2656 if (BigStack) {
2657 if (!ExtraCSSpill && UnspilledCSGPR != AArch64::NoRegister) {
2658 LLVM_DEBUG(dbgs() << "Spilling " << printReg(UnspilledCSGPR, RegInfo)
2659 << " to get a scratch register.\n");
2660 SavedRegs.set(UnspilledCSGPR);
2661 ExtraCSSpill = UnspilledCSGPR;
2662
2663 // MachO's compact unwind format relies on all registers being stored in
2664 // pairs, so if we need to spill one extra for BigStack, then we need to
2665 // store the pair.
2666 if (producePairRegisters(MF)) {
2667 if (UnspilledCSGPRPaired == AArch64::NoRegister) {
2668 // Failed to make a pair for compact unwind format, revert spilling.
2669 if (produceCompactUnwindFrame(*this, MF)) {
2670 SavedRegs.reset(UnspilledCSGPR);
2671 ExtraCSSpill = AArch64::NoRegister;
2672 }
2673 } else
2674 SavedRegs.set(UnspilledCSGPRPaired);
2675 }
2676 }
2677
2678 // If we didn't find an extra callee-saved register to spill, create
2679 // an emergency spill slot.
2680 if (!ExtraCSSpill || MF.getRegInfo().isPhysRegUsed(ExtraCSSpill)) {
2682 const TargetRegisterClass &RC = AArch64::GPR64RegClass;
2683 unsigned Size = TRI->getSpillSize(RC);
2684 Align Alignment = TRI->getSpillAlign(RC);
2685 int FI = MFI.CreateSpillStackObject(Size, Alignment);
2686 RS->addScavengingFrameIndex(FI);
2687 LLVM_DEBUG(dbgs() << "No available CS registers, allocated fi#" << FI
2688 << " as the emergency spill slot.\n");
2689 }
2690 }
2691
2692 // Add the size of any additional 64-bit GPR saves.
2693 CSStackSize += 8 * (SavedRegs.count() - NumSavedRegs);
2694
2695 // A Swift asynchronous context extends the frame record with a pointer
2696 // directly before FP.
2697 if (hasFP(MF) && AFI->hasSwiftAsyncContext())
2698 CSStackSize += 8;
2699
2700 uint64_t AlignedCSStackSize = alignTo(CSStackSize, 16);
2701 LLVM_DEBUG(dbgs() << "Estimated stack frame size: "
2702 << EstimatedStackSize + AlignedCSStackSize << " bytes.\n");
2703
2704 assert((!MFI.isCalleeSavedInfoValid() ||
2705 AFI->getCalleeSavedStackSize() == AlignedCSStackSize) &&
2706 "Should not invalidate callee saved info");
2707
2708 // Round up to register pair alignment to avoid additional SP adjustment
2709 // instructions.
2710 AFI->setCalleeSavedStackSize(AlignedCSStackSize);
2711 AFI->setCalleeSaveStackHasFreeSpace(AlignedCSStackSize != CSStackSize);
2712 AFI->setSVECalleeSavedStackSize(ZPRCSStackSize, alignTo(PPRCSStackSize, 16));
2713}
2714
2715bool AArch64FrameLowering::assignCalleeSavedSpillSlots(
2716 MachineFunction &MF, const TargetRegisterInfo *RegInfo,
2717 std::vector<CalleeSavedInfo> &CSI, unsigned &MinCSFrameIndex,
2718 unsigned &MaxCSFrameIndex) const {
2719 bool NeedsWinCFI = needsWinCFI(MF);
2720 unsigned StackHazardSize = getStackHazardSize(MF);
2721 // To match the canonical windows frame layout, reverse the list of
2722 // callee saved registers to get them laid out by PrologEpilogInserter
2723 // in the right order. (PrologEpilogInserter allocates stack objects top
2724 // down. Windows canonical prologs store higher numbered registers at
2725 // the top, thus have the CSI array start from the highest registers.)
2726 if (NeedsWinCFI)
2727 std::reverse(CSI.begin(), CSI.end());
2728
2729 if (CSI.empty())
2730 return true; // Early exit if no callee saved registers are modified!
2731
2732 // Now that we know which registers need to be saved and restored, allocate
2733 // stack slots for them.
2734 MachineFrameInfo &MFI = MF.getFrameInfo();
2735 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2736
2737 bool UsesWinAAPCS = isTargetWindows(MF);
2738 if (UsesWinAAPCS && hasFP(MF) && AFI->hasSwiftAsyncContext()) {
2739 int FrameIdx = MFI.CreateStackObject(8, Align(16), true);
2740 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2741 if ((unsigned)FrameIdx < MinCSFrameIndex)
2742 MinCSFrameIndex = FrameIdx;
2743 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2744 MaxCSFrameIndex = FrameIdx;
2745 }
2746
2747 // Insert VG into the list of CSRs, immediately before LR if saved.
2748 if (requiresSaveVG(MF)) {
2749 CalleeSavedInfo VGInfo(AArch64::VG);
2750 auto It =
2751 find_if(CSI, [](auto &Info) { return Info.getReg() == AArch64::LR; });
2752 if (It != CSI.end())
2753 CSI.insert(It, VGInfo);
2754 else
2755 CSI.push_back(VGInfo);
2756 }
2757
2758 Register LastReg = 0;
2759 int HazardSlotIndex = std::numeric_limits<int>::max();
2760 for (auto &CS : CSI) {
2761 MCRegister Reg = CS.getReg();
2762 const TargetRegisterClass *RC = RegInfo->getMinimalPhysRegClass(Reg);
2763
2764 // Create a hazard slot as we switch between GPR and FPR CSRs.
2765 if (AFI->hasStackHazardSlotIndex() &&
2766 (!LastReg || !AArch64InstrInfo::isFpOrNEON(LastReg)) &&
2767 AArch64InstrInfo::isFpOrNEON(Reg)) {
2768 assert(HazardSlotIndex == std::numeric_limits<int>::max() &&
2769 "Unexpected register order for hazard slot");
2770 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2771 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2772 << "\n");
2773 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2774 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2775 MinCSFrameIndex = HazardSlotIndex;
2776 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2777 MaxCSFrameIndex = HazardSlotIndex;
2778 }
2779
2780 unsigned Size = RegInfo->getSpillSize(*RC);
2781 Align Alignment(RegInfo->getSpillAlign(*RC));
2782 int FrameIdx = MFI.CreateStackObject(Size, Alignment, true);
2783 CS.setFrameIdx(FrameIdx);
2784
2785 if ((unsigned)FrameIdx < MinCSFrameIndex)
2786 MinCSFrameIndex = FrameIdx;
2787 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2788 MaxCSFrameIndex = FrameIdx;
2789
2790 // Grab 8 bytes below FP for the extended asynchronous frame info.
2791 if (hasFP(MF) && AFI->hasSwiftAsyncContext() && !UsesWinAAPCS &&
2792 Reg == AArch64::FP) {
2793 FrameIdx = MFI.CreateStackObject(8, Alignment, true);
2794 AFI->setSwiftAsyncContextFrameIdx(FrameIdx);
2795 if ((unsigned)FrameIdx < MinCSFrameIndex)
2796 MinCSFrameIndex = FrameIdx;
2797 if ((unsigned)FrameIdx > MaxCSFrameIndex)
2798 MaxCSFrameIndex = FrameIdx;
2799 }
2800 LastReg = Reg;
2801 }
2802
2803 // Add hazard slot in the case where no FPR CSRs are present.
2804 if (AFI->hasStackHazardSlotIndex() &&
2805 HazardSlotIndex == std::numeric_limits<int>::max()) {
2806 HazardSlotIndex = MFI.CreateStackObject(StackHazardSize, Align(8), true);
2807 LLVM_DEBUG(dbgs() << "Created CSR Hazard at slot " << HazardSlotIndex
2808 << "\n");
2809 AFI->setStackHazardCSRSlotIndex(HazardSlotIndex);
2810 if ((unsigned)HazardSlotIndex < MinCSFrameIndex)
2811 MinCSFrameIndex = HazardSlotIndex;
2812 if ((unsigned)HazardSlotIndex > MaxCSFrameIndex)
2813 MaxCSFrameIndex = HazardSlotIndex;
2814 }
2815
2816 return true;
2817}
2818
2819bool AArch64FrameLowering::enableStackSlotScavenging(
2820 const MachineFunction &MF) const {
2821 const AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
2822 // If the function has streaming-mode changes, don't scavenge a
2823 // spillslot in the callee-save area, as that might require an
2824 // 'addvl' in the streaming-mode-changing call-sequence when the
2825 // function doesn't use a FP.
2826 if (AFI->hasStreamingModeChanges() && !hasFP(MF))
2827 return false;
2828 // Don't allow register salvaging with hazard slots, in case it moves objects
2829 // into the wrong place.
2830 if (AFI->hasStackHazardSlotIndex())
2831 return false;
2832 return AFI->hasCalleeSaveStackFreeSpace();
2833}
2834
2835/// Returns true if there are any SVE callee saves.
2836static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI,
2837 int &Min, int &Max) {
2838 Min = std::numeric_limits<int>::max();
2839 Max = std::numeric_limits<int>::min();
2840
2841 if (!MFI.isCalleeSavedInfoValid())
2842 return false;
2843
2844 const std::vector<CalleeSavedInfo> &CSI = MFI.getCalleeSavedInfo();
2845 for (auto &CS : CSI) {
2846 if (AArch64::ZPRRegClass.contains(CS.getReg()) ||
2847 AArch64::PPRRegClass.contains(CS.getReg())) {
2848 assert((Max == std::numeric_limits<int>::min() ||
2849 Max + 1 == CS.getFrameIdx()) &&
2850 "SVE CalleeSaves are not consecutive");
2851 Min = std::min(Min, CS.getFrameIdx());
2852 Max = std::max(Max, CS.getFrameIdx());
2853 }
2854 }
2855 return Min != std::numeric_limits<int>::max();
2856}
2857
2859 AssignObjectOffsets AssignOffsets) {
2860 MachineFrameInfo &MFI = MF.getFrameInfo();
2861 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2862
2863 SVEStackSizes SVEStack{};
2864
2865 // With SplitSVEObjects we maintain separate stack offsets for predicates
2866 // (PPRs) and SVE vectors (ZPRs). When SplitSVEObjects is disabled predicates
2867 // are included in the SVE vector area.
2868 uint64_t &ZPRStackTop = SVEStack.ZPRStackSize;
2869 uint64_t &PPRStackTop =
2870 AFI->hasSplitSVEObjects() ? SVEStack.PPRStackSize : SVEStack.ZPRStackSize;
2871
2872#ifndef NDEBUG
2873 // First process all fixed stack objects.
2874 for (int I = MFI.getObjectIndexBegin(); I != 0; ++I)
2875 assert(!MFI.hasScalableStackID(I) &&
2876 "SVE vectors should never be passed on the stack by value, only by "
2877 "reference.");
2878#endif
2879
2880 auto AllocateObject = [&](int FI) {
2881 uint64_t &StackTop = MFI.getStackID(FI) == TargetStackID::ScalableVector
2882 ? ZPRStackTop
2883 : PPRStackTop;
2884
2885 // FIXME: Given that the length of SVE vectors is not necessarily a power of
2886 // two, we'd need to align every object dynamically at runtime if the
2887 // alignment is larger than 16. This is not yet supported.
2888 Align Alignment = MFI.getObjectAlign(FI);
2889 if (Alignment > Align(16))
2890 report_fatal_error(
2891 "Alignment of scalable vectors > 16 bytes is not yet supported");
2892
2893 StackTop += MFI.getObjectSize(FI);
2894 StackTop = alignTo(StackTop, Alignment);
2895
2896 assert(StackTop < (uint64_t)std::numeric_limits<int64_t>::max() &&
2897 "SVE StackTop far too large?!");
2898
2899 int64_t Offset = -int64_t(StackTop);
2900 if (AssignOffsets == AssignObjectOffsets::Yes)
2901 MFI.setObjectOffset(FI, Offset);
2902
2903 LLVM_DEBUG(dbgs() << "alloc FI(" << FI << ") at SP[" << Offset << "]\n");
2904 };
2905
2906 // Then process all callee saved slots.
2907 int MinCSFrameIndex, MaxCSFrameIndex;
2908 if (getSVECalleeSaveSlotRange(MFI, MinCSFrameIndex, MaxCSFrameIndex)) {
2909 for (int FI = MinCSFrameIndex; FI <= MaxCSFrameIndex; ++FI)
2910 AllocateObject(FI);
2911 }
2912
2913 // Ensure the CS area is 16-byte aligned.
2914 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2915 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2916
2917 // Create a buffer of SVE objects to allocate and sort it.
2918 SmallVector<int, 8> ObjectsToAllocate;
2919 // If we have a stack protector, and we've previously decided that we have SVE
2920 // objects on the stack and thus need it to go in the SVE stack area, then it
2921 // needs to go first.
2922 int StackProtectorFI = -1;
2923 if (MFI.hasStackProtectorIndex()) {
2924 StackProtectorFI = MFI.getStackProtectorIndex();
2925 if (MFI.getStackID(StackProtectorFI) == TargetStackID::ScalableVector)
2926 ObjectsToAllocate.push_back(StackProtectorFI);
2927 }
2928
2929 for (int FI = 0, E = MFI.getObjectIndexEnd(); FI != E; ++FI) {
2930 if (FI == StackProtectorFI || MFI.isDeadObjectIndex(FI))
2931 continue;
2932 if (MaxCSFrameIndex >= FI && FI >= MinCSFrameIndex)
2933 continue;
2934
2937 continue;
2938
2939 ObjectsToAllocate.push_back(FI);
2940 }
2941
2942 // Allocate all SVE locals and spills
2943 for (unsigned FI : ObjectsToAllocate)
2944 AllocateObject(FI);
2945
2946 PPRStackTop = alignTo(PPRStackTop, Align(16U));
2947 ZPRStackTop = alignTo(ZPRStackTop, Align(16U));
2948
2949 if (AssignOffsets == AssignObjectOffsets::Yes)
2950 AFI->setStackSizeSVE(SVEStack.ZPRStackSize, SVEStack.PPRStackSize);
2951
2952 return SVEStack;
2953}
2954
2955void AArch64FrameLowering::processFunctionBeforeFrameFinalized(
2956 MachineFunction &MF, RegScavenger *RS) const {
2957 assert(getStackGrowthDirection() == TargetFrameLowering::StackGrowsDown &&
2958 "Upwards growing stack unsupported");
2959
2961
2962 // If this function isn't doing Win64-style C++ EH, we don't need to do
2963 // anything.
2964 if (!MF.hasEHFunclets())
2965 return;
2966
2967 MachineFrameInfo &MFI = MF.getFrameInfo();
2968 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
2969
2970 // Win64 C++ EH needs to allocate space for the catch objects in the fixed
2971 // object area right next to the UnwindHelp object.
2972 WinEHFuncInfo &EHInfo = *MF.getWinEHFuncInfo();
2973 int64_t CurrentOffset =
2975 for (WinEHTryBlockMapEntry &TBME : EHInfo.TryBlockMap) {
2976 for (WinEHHandlerType &H : TBME.HandlerArray) {
2977 int FrameIndex = H.CatchObj.FrameIndex;
2978 if ((FrameIndex != INT_MAX) && MFI.getObjectOffset(FrameIndex) == 0) {
2979 CurrentOffset =
2980 alignTo(CurrentOffset, MFI.getObjectAlign(FrameIndex).value());
2981 CurrentOffset += MFI.getObjectSize(FrameIndex);
2982 MFI.setObjectOffset(FrameIndex, -CurrentOffset);
2983 }
2984 }
2985 }
2986
2987 // Create an UnwindHelp object.
2988 // The UnwindHelp object is allocated at the start of the fixed object area
2989 int64_t UnwindHelpOffset = alignTo(CurrentOffset + 8, Align(16));
2990 assert(UnwindHelpOffset == getFixedObjectSize(MF, AFI, /*IsWin64*/ true,
2991 /*IsFunclet*/ false) &&
2992 "UnwindHelpOffset must be at the start of the fixed object area");
2993 int UnwindHelpFI = MFI.CreateFixedObject(/*Size*/ 8, -UnwindHelpOffset,
2994 /*IsImmutable=*/false);
2995 EHInfo.UnwindHelpFrameIdx = UnwindHelpFI;
2996
2997 MachineBasicBlock &MBB = MF.front();
2998 auto MBBI = MBB.begin();
2999 while (MBBI != MBB.end() && MBBI->getFlag(MachineInstr::FrameSetup))
3000 ++MBBI;
3001
3002 // We need to store -2 into the UnwindHelp object at the start of the
3003 // function.
3004 DebugLoc DL;
3005 RS->enterBasicBlockEnd(MBB);
3006 RS->backward(MBBI);
3007 Register DstReg = RS->FindUnusedReg(&AArch64::GPR64commonRegClass);
3008 assert(DstReg && "There must be a free register after frame setup");
3009 const TargetInstrInfo &TII = *MF.getSubtarget().getInstrInfo();
3010 BuildMI(MBB, MBBI, DL, TII.get(AArch64::MOVi64imm), DstReg).addImm(-2);
3011 BuildMI(MBB, MBBI, DL, TII.get(AArch64::STURXi))
3012 .addReg(DstReg, getKillRegState(true))
3013 .addFrameIndex(UnwindHelpFI)
3014 .addImm(0);
3015}
3016
3017namespace {
3018struct TagStoreInstr {
3019 MachineInstr *MI;
3020 int64_t Offset, Size;
3021 explicit TagStoreInstr(MachineInstr *MI, int64_t Offset, int64_t Size)
3022 : MI(MI), Offset(Offset), Size(Size) {}
3023};
3024
3025class TagStoreEdit {
3026 MachineFunction *MF;
3027 MachineBasicBlock *MBB;
3028 MachineRegisterInfo *MRI;
3029 // Tag store instructions that are being replaced.
3030 SmallVector<TagStoreInstr, 8> TagStores;
3031 // Combined memref arguments of the above instructions.
3032 SmallVector<MachineMemOperand *, 8> CombinedMemRefs;
3033
3034 // Replace allocation tags in [FrameReg + FrameRegOffset, FrameReg +
3035 // FrameRegOffset + Size) with the address tag of SP.
3036 Register FrameReg;
3037 StackOffset FrameRegOffset;
3038 int64_t Size;
3039 // If not std::nullopt, move FrameReg to (FrameReg + FrameRegUpdate) at the
3040 // end.
3041 std::optional<int64_t> FrameRegUpdate;
3042 // MIFlags for any FrameReg updating instructions.
3043 unsigned FrameRegUpdateFlags;
3044
3045 // Use zeroing instruction variants.
3046 bool ZeroData;
3047 DebugLoc DL;
3048
3049 void emitUnrolled(MachineBasicBlock::iterator InsertI);
3050 void emitLoop(MachineBasicBlock::iterator InsertI);
3051
3052public:
3053 TagStoreEdit(MachineBasicBlock *MBB, bool ZeroData)
3054 : MBB(MBB), ZeroData(ZeroData) {
3055 MF = MBB->getParent();
3056 MRI = &MF->getRegInfo();
3057 }
3058 // Add an instruction to be replaced. Instructions must be added in the
3059 // ascending order of Offset, and have to be adjacent.
3060 void addInstruction(TagStoreInstr I) {
3061 assert((TagStores.empty() ||
3062 TagStores.back().Offset + TagStores.back().Size == I.Offset) &&
3063 "Non-adjacent tag store instructions.");
3064 TagStores.push_back(I);
3065 }
3066 void clear() { TagStores.clear(); }
3067 // Emit equivalent code at the given location, and erase the current set of
3068 // instructions. May skip if the replacement is not profitable. May invalidate
3069 // the input iterator and replace it with a valid one.
3070 void emitCode(MachineBasicBlock::iterator &InsertI,
3071 const AArch64FrameLowering *TFI, bool TryMergeSPUpdate);
3072};
3073
3074void TagStoreEdit::emitUnrolled(MachineBasicBlock::iterator InsertI) {
3075 const AArch64InstrInfo *TII =
3076 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3077
3078 const int64_t kMinOffset = -256 * 16;
3079 const int64_t kMaxOffset = 255 * 16;
3080
3081 Register BaseReg = FrameReg;
3082 int64_t BaseRegOffsetBytes = FrameRegOffset.getFixed();
3083 if (BaseRegOffsetBytes < kMinOffset ||
3084 BaseRegOffsetBytes + (Size - Size % 32) > kMaxOffset ||
3085 // BaseReg can be FP, which is not necessarily aligned to 16-bytes. In
3086 // that case, BaseRegOffsetBytes will not be aligned to 16 bytes, which
3087 // is required for the offset of ST2G.
3088 BaseRegOffsetBytes % 16 != 0) {
3089 Register ScratchReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3090 emitFrameOffset(*MBB, InsertI, DL, ScratchReg, BaseReg,
3091 StackOffset::getFixed(BaseRegOffsetBytes), TII);
3092 BaseReg = ScratchReg;
3093 BaseRegOffsetBytes = 0;
3094 }
3095
3096 MachineInstr *LastI = nullptr;
3097 while (Size) {
3098 int64_t InstrSize = (Size > 16) ? 32 : 16;
3099 unsigned Opcode =
3100 InstrSize == 16
3101 ? (ZeroData ? AArch64::STZGi : AArch64::STGi)
3102 : (ZeroData ? AArch64::STZ2Gi : AArch64::ST2Gi);
3103 assert(BaseRegOffsetBytes % 16 == 0);
3104 MachineInstr *I = BuildMI(*MBB, InsertI, DL, TII->get(Opcode))
3105 .addReg(AArch64::SP)
3106 .addReg(BaseReg)
3107 .addImm(BaseRegOffsetBytes / 16)
3108 .setMemRefs(CombinedMemRefs);
3109 // A store to [BaseReg, #0] should go last for an opportunity to fold the
3110 // final SP adjustment in the epilogue.
3111 if (BaseRegOffsetBytes == 0)
3112 LastI = I;
3113 BaseRegOffsetBytes += InstrSize;
3114 Size -= InstrSize;
3115 }
3116
3117 if (LastI)
3118 MBB->splice(InsertI, MBB, LastI);
3119}
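// Illustrative example: unrolling a 48-byte range emits one st2g at
// [base, #0] and one stg at [base, #32] (both taking the tag from SP); the
// store at offset 0 is then spliced last so a final SP adjustment in the
// epilogue can be folded into it.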
3120
3121void TagStoreEdit::emitLoop(MachineBasicBlock::iterator InsertI) {
3122 const AArch64InstrInfo *TII =
3123 MF->getSubtarget<AArch64Subtarget>().getInstrInfo();
3124
3125 Register BaseReg = FrameRegUpdate
3126 ? FrameReg
3127 : MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3128 Register SizeReg = MRI->createVirtualRegister(&AArch64::GPR64RegClass);
3129
3130 emitFrameOffset(*MBB, InsertI, DL, BaseReg, FrameReg, FrameRegOffset, TII);
3131
3132 int64_t LoopSize = Size;
3133 // If the loop size is not a multiple of 32, split off one 16-byte store at
3134 // the end to fold BaseReg update into.
3135 if (FrameRegUpdate && *FrameRegUpdate)
3136 LoopSize -= LoopSize % 32;
3137 MachineInstr *LoopI = BuildMI(*MBB, InsertI, DL,
3138 TII->get(ZeroData ? AArch64::STZGloop_wback
3139 : AArch64::STGloop_wback))
3140 .addDef(SizeReg)
3141 .addDef(BaseReg)
3142 .addImm(LoopSize)
3143 .addReg(BaseReg)
3144 .setMemRefs(CombinedMemRefs);
3145 if (FrameRegUpdate)
3146 LoopI->setFlags(FrameRegUpdateFlags);
3147
3148 int64_t ExtraBaseRegUpdate =
3149 FrameRegUpdate ? (*FrameRegUpdate - FrameRegOffset.getFixed() - Size) : 0;
3150 LLVM_DEBUG(dbgs() << "TagStoreEdit::emitLoop: LoopSize=" << LoopSize
3151 << ", Size=" << Size
3152 << ", ExtraBaseRegUpdate=" << ExtraBaseRegUpdate
3153 << ", FrameRegUpdate=" << FrameRegUpdate
3154 << ", FrameRegOffset.getFixed()="
3155 << FrameRegOffset.getFixed() << "\n");
3156 if (LoopSize < Size) {
3157 assert(FrameRegUpdate);
3158 assert(Size - LoopSize == 16);
3159 // Tag 16 more bytes at BaseReg and update BaseReg.
3160 int64_t STGOffset = ExtraBaseRegUpdate + 16;
3161 assert(STGOffset % 16 == 0 && STGOffset >= -4096 && STGOffset <= 4080 &&
3162 "STG immediate out of range");
3163 BuildMI(*MBB, InsertI, DL,
3164 TII->get(ZeroData ? AArch64::STZGPostIndex : AArch64::STGPostIndex))
3165 .addDef(BaseReg)
3166 .addReg(BaseReg)
3167 .addReg(BaseReg)
3168 .addImm(STGOffset / 16)
3169 .setMemRefs(CombinedMemRefs)
3170 .setMIFlags(FrameRegUpdateFlags);
3171 } else if (ExtraBaseRegUpdate) {
3172 // Update BaseReg.
3173 int64_t AddSubOffset = std::abs(ExtraBaseRegUpdate);
3174 assert(AddSubOffset <= 4095 && "ADD/SUB immediate out of range");
3175 BuildMI(
3176 *MBB, InsertI, DL,
3177 TII->get(ExtraBaseRegUpdate > 0 ? AArch64::ADDXri : AArch64::SUBXri))
3178 .addDef(BaseReg)
3179 .addReg(BaseReg)
3180 .addImm(AddSubOffset)
3181 .addImm(0)
3182 .setMIFlags(FrameRegUpdateFlags);
3183 }
3184}
3185
3186// Check if *II is a register update that can be merged into STGloop that ends
3187// at (Reg + Size). RemainingOffset is the required adjustment to Reg after the
3188// end of the loop.
3189bool canMergeRegUpdate(MachineBasicBlock::iterator II, unsigned Reg,
3190 int64_t Size, int64_t *TotalOffset) {
3191 MachineInstr &MI = *II;
3192 if ((MI.getOpcode() == AArch64::ADDXri ||
3193 MI.getOpcode() == AArch64::SUBXri) &&
3194 MI.getOperand(0).getReg() == Reg && MI.getOperand(1).getReg() == Reg) {
3195 unsigned Shift = AArch64_AM::getShiftValue(MI.getOperand(3).getImm());
3196 int64_t Offset = MI.getOperand(2).getImm() << Shift;
3197 if (MI.getOpcode() == AArch64::SUBXri)
3198 Offset = -Offset;
3199 int64_t PostOffset = Offset - Size;
3200 // TagStoreEdit::emitLoop might emit either an ADD/SUB after the loop, or
3201 // an STGPostIndex which does the last 16 bytes of tag write. Which one is
3202 // chosen depends on the alignment of the loop size, but the difference
3203 // between the valid ranges for the two instructions is small, so we
3204 // conservatively assume that it could be either case here.
3205 //
3206 // Max offset of STGPostIndex, minus the 16 byte tag write folded into that
3207 // instruction.
3208 const int64_t kMaxOffset = 4080 - 16;
3209 // Max offset of SUBXri.
3210 const int64_t kMinOffset = -4095;
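// Worked example (illustrative, not part of the upstream source): folding
// "ADD Reg, Reg, #512" into a loop that tags Size = 256 bytes gives
// PostOffset = 256, which is 16-byte aligned and within both limits, so the
// register update can be merged into the loop.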
3211 if (PostOffset <= kMaxOffset && PostOffset >= kMinOffset &&
3212 PostOffset % 16 == 0) {
3213 *TotalOffset = Offset;
3214 return true;
3215 }
3216 }
3217 return false;
3218}
3219
3220 void mergeMemRefs(const SmallVectorImpl<TagStoreInstr> &TSE,
3221                   SmallVectorImpl<MachineMemOperand *> &MemRefs) {
3222 MemRefs.clear();
3223 for (auto &TS : TSE) {
3224 MachineInstr *MI = TS.MI;
3225 // An instruction without memory operands may access anything. Be
3226 // conservative and return an empty list.
3227 if (MI->memoperands_empty()) {
3228 MemRefs.clear();
3229 return;
3230 }
3231 MemRefs.append(MI->memoperands_begin(), MI->memoperands_end());
3232 }
3233}
3234
3235void TagStoreEdit::emitCode(MachineBasicBlock::iterator &InsertI,
3236 const AArch64FrameLowering *TFI,
3237 bool TryMergeSPUpdate) {
3238 if (TagStores.empty())
3239 return;
3240 TagStoreInstr &FirstTagStore = TagStores[0];
3241 TagStoreInstr &LastTagStore = TagStores[TagStores.size() - 1];
3242 Size = LastTagStore.Offset - FirstTagStore.Offset + LastTagStore.Size;
3243 DL = TagStores[0].MI->getDebugLoc();
3244
3245 Register Reg;
3246 FrameRegOffset = TFI->resolveFrameOffsetReference(
3247 *MF, FirstTagStore.Offset, false /*isFixed*/,
3248 TargetStackID::Default /*StackID*/, Reg,
3249 /*PreferFP=*/false, /*ForSimm=*/true);
3250 FrameReg = Reg;
3251 FrameRegUpdate = std::nullopt;
3252
3253 mergeMemRefs(TagStores, CombinedMemRefs);
3254
3255 LLVM_DEBUG({
3256 dbgs() << "Replacing adjacent STG instructions:\n";
3257 for (const auto &Instr : TagStores) {
3258 dbgs() << " " << *Instr.MI;
3259 }
3260 });
3261
3262 // Size threshold where a loop becomes shorter than a linear sequence of
3263 // tagging instructions.
3264 const int kSetTagLoopThreshold = 176;
3265 if (Size < kSetTagLoopThreshold) {
3266 if (TagStores.size() < 2)
3267 return;
3268 emitUnrolled(InsertI);
3269 } else {
3270 MachineInstr *UpdateInstr = nullptr;
3271 int64_t TotalOffset = 0;
3272 if (TryMergeSPUpdate) {
3273 // See if we can merge base register update into the STGloop.
3274 // This is done in AArch64LoadStoreOptimizer for "normal" stores,
3275 // but STGloop is way too unusual for that, and also it only
3276 // realistically happens in function epilogue. Also, STGloop is expanded
3277 // before that pass.
3278 if (InsertI != MBB->end() &&
3279 canMergeRegUpdate(InsertI, FrameReg, FrameRegOffset.getFixed() + Size,
3280 &TotalOffset)) {
3281 UpdateInstr = &*InsertI++;
3282 LLVM_DEBUG(dbgs() << "Folding SP update into loop:\n "
3283 << *UpdateInstr);
3284 }
3285 }
3286
3287 if (!UpdateInstr && TagStores.size() < 2)
3288 return;
3289
3290 if (UpdateInstr) {
3291 FrameRegUpdate = TotalOffset;
3292 FrameRegUpdateFlags = UpdateInstr->getFlags();
3293 }
3294 emitLoop(InsertI);
3295 if (UpdateInstr)
3296 UpdateInstr->eraseFromParent();
3297 }
3298
3299 for (auto &TS : TagStores)
3300 TS.MI->eraseFromParent();
3301}
3302
3303bool isMergeableStackTaggingInstruction(MachineInstr &MI, int64_t &Offset,
3304 int64_t &Size, bool &ZeroData) {
3305 MachineFunction &MF = *MI.getParent()->getParent();
3306 const MachineFrameInfo &MFI = MF.getFrameInfo();
3307
3308 unsigned Opcode = MI.getOpcode();
3309 ZeroData = (Opcode == AArch64::STZGloop || Opcode == AArch64::STZGi ||
3310 Opcode == AArch64::STZ2Gi);
3311
3312 if (Opcode == AArch64::STGloop || Opcode == AArch64::STZGloop) {
3313 if (!MI.getOperand(0).isDead() || !MI.getOperand(1).isDead())
3314 return false;
3315 if (!MI.getOperand(2).isImm() || !MI.getOperand(3).isFI())
3316 return false;
3317 Offset = MFI.getObjectOffset(MI.getOperand(3).getIndex());
3318 Size = MI.getOperand(2).getImm();
3319 return true;
3320 }
3321
3322 if (Opcode == AArch64::STGi || Opcode == AArch64::STZGi)
3323 Size = 16;
3324 else if (Opcode == AArch64::ST2Gi || Opcode == AArch64::STZ2Gi)
3325 Size = 32;
3326 else
3327 return false;
3328
3329 if (MI.getOperand(0).getReg() != AArch64::SP || !MI.getOperand(1).isFI())
3330 return false;
3331
3332 Offset = MFI.getObjectOffset(MI.getOperand(1).getIndex()) +
3333 16 * MI.getOperand(2).getImm();
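// Worked example (illustrative, not part of the upstream source): an STGi
// whose frame object sits at offset -32 with an immediate of 1 tags the 16
// bytes starting at -32 + 16 * 1 = -16, so Offset = -16 and Size = 16.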
3334 return true;
3335}
3336
3337// Detect a run of memory tagging instructions for adjacent stack frame slots,
3338// and replace them with a shorter instruction sequence:
3339// * replace STG + STG with ST2G
3340// * replace STGloop + STGloop with STGloop
3341// This code needs to run when stack slot offsets are already known, but before
3342// FrameIndex operands in STG instructions are eliminated.
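// Illustrative example (not part of the upstream source): two STG
// instructions tagging adjacent 16-byte slots become a single ST2G covering
// 32 bytes; longer contiguous runs (at or above the loop threshold) are
// turned into a single STGloop.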
3343 MachineBasicBlock::iterator tryMergeAdjacentSTG(MachineBasicBlock::iterator II,
3344                                                 const AArch64FrameLowering *TFI,
3345 RegScavenger *RS) {
3346 bool FirstZeroData;
3347 int64_t Size, Offset;
3348 MachineInstr &MI = *II;
3349 MachineBasicBlock *MBB = MI.getParent();
3350 MachineBasicBlock::iterator NextI = ++II;
3351 if (&MI == &MBB->instr_back())
3352 return II;
3353 if (!isMergeableStackTaggingInstruction(MI, Offset, Size, FirstZeroData))
3354 return II;
3355
3356 SmallVector<TagStoreInstr, 8> Instrs;
3357 Instrs.emplace_back(&MI, Offset, Size);
3358
3359 constexpr int kScanLimit = 10;
3360 int Count = 0;
3361 for (MachineBasicBlock::iterator E = MBB->end();
3362      NextI != E && Count < kScanLimit; ++NextI) {
3363 MachineInstr &MI = *NextI;
3364 bool ZeroData;
3365 int64_t Size, Offset;
3366 // Collect instructions that update memory tags with a FrameIndex operand
3367 // and (when applicable) constant size, and whose output registers are dead
3368 // (the latter is almost always the case in practice). Since these
3369 // instructions effectively have no inputs or outputs, we are free to skip
3370 // any non-aliasing instructions in between without tracking used registers.
3371 if (isMergeableStackTaggingInstruction(MI, Offset, Size, ZeroData)) {
3372 if (ZeroData != FirstZeroData)
3373 break;
3374 Instrs.emplace_back(&MI, Offset, Size);
3375 continue;
3376 }
3377
3378 // Only count non-transient, non-tagging instructions toward the scan
3379 // limit.
3380 if (!MI.isTransient())
3381 ++Count;
3382
3383 // Just in case, stop before the epilogue code starts.
3384 if (MI.getFlag(MachineInstr::FrameSetup) ||
3385     MI.getFlag(MachineInstr::FrameDestroy))
3386   break;
3387
3388 // Reject anything that may alias the collected instructions.
3389 if (MI.mayLoadOrStore() || MI.hasUnmodeledSideEffects() || MI.isCall())
3390 break;
3391 }
3392
3393 // New code will be inserted after the last tagging instruction we've found.
3394 MachineBasicBlock::iterator InsertI = Instrs.back().MI;
3395
3396 // All the gathered stack tag instructions are merged and placed after the
3397 // last tag store in the list. We must check whether the nzcv flag is live
3398 // at the point where we are trying to insert; otherwise it might get
3399 // clobbered if any STG loops are present.
3400
3401 // FIXME: This approach of bailing out of the merge is conservative: the
3402 // liveness check is performed even when no STG loops remain after the
3403 // merge, in which case it is not needed.
3404 LivePhysRegs LiveRegs(*(MBB->getParent()->getSubtarget().getRegisterInfo()));
3405 LiveRegs.addLiveOuts(*MBB);
3406 for (auto I = MBB->rbegin();; ++I) {
3407 MachineInstr &MI = *I;
3408 if (MI == InsertI)
3409 break;
3410 LiveRegs.stepBackward(*I);
3411 }
3412 InsertI++;
3413 if (LiveRegs.contains(AArch64::NZCV))
3414 return InsertI;
3415
3416 llvm::stable_sort(Instrs,
3417 [](const TagStoreInstr &Left, const TagStoreInstr &Right) {
3418 return Left.Offset < Right.Offset;
3419 });
3420
3421 // Make sure that we don't have any overlapping stores.
3422 int64_t CurOffset = Instrs[0].Offset;
3423 for (auto &Instr : Instrs) {
3424 if (CurOffset > Instr.Offset)
3425 return NextI;
3426 CurOffset = Instr.Offset + Instr.Size;
3427 }
3428
3429 // Find contiguous runs of tagged memory and emit shorter instruction
3430 // sequences for them when possible.
3431 TagStoreEdit TSE(MBB, FirstZeroData);
3432 std::optional<int64_t> EndOffset;
3433 for (auto &Instr : Instrs) {
3434 if (EndOffset && *EndOffset != Instr.Offset) {
3435 // Found a gap.
3436 TSE.emitCode(InsertI, TFI, /*TryMergeSPUpdate = */ false);
3437 TSE.clear();
3438 }
3439
3440 TSE.addInstruction(Instr);
3441 EndOffset = Instr.Offset + Instr.Size;
3442 }
3443
3444 const MachineFunction *MF = MBB->getParent();
3445 // Multiple FP/SP updates in a loop cannot be described by CFI instructions.
3446 TSE.emitCode(
3447     InsertI, TFI, /*TryMergeSPUpdate = */
3448     !MF->getInfo<AArch64FunctionInfo>()->needsAsyncDwarfUnwindInfo(*MF));
3449
3450 return InsertI;
3451}
3452} // namespace
3453
3454 void AArch64FrameLowering::processFunctionBeforeFrameIndicesReplaced(
3455     MachineFunction &MF, RegScavenger *RS = nullptr) const {
3456 for (auto &BB : MF)
3457 for (MachineBasicBlock::iterator II = BB.begin(); II != BB.end();) {
3458   if (StackTaggingMergeSetTag)
3459     II = tryMergeAdjacentSTG(II, this, RS);
3460 }
3461
3462 // By the time this method is called, most of the prologue/epilogue code is
3463 // already emitted, whether its location was affected by the shrink-wrapping
3464 // optimization or not.
3465 if (!MF.getFunction().hasFnAttribute(Attribute::Naked) &&
3466 shouldSignReturnAddressEverywhere(MF))
3467   emitPacRetPlusLeafHardening(MF);
3468}
3469
3470/// For Win64 AArch64 EH, the offset to the Unwind object is from the SP
3471/// before the update. This is easily retrieved as it is exactly the offset
3472/// that is set in processFunctionBeforeFrameFinalized.
3473 StackOffset AArch64FrameLowering::getFrameIndexReferencePreferSP(
3474     const MachineFunction &MF, int FI, Register &FrameReg,
3475 bool IgnoreSPUpdates) const {
3476 const MachineFrameInfo &MFI = MF.getFrameInfo();
3477 if (IgnoreSPUpdates) {
3478 LLVM_DEBUG(dbgs() << "Offset from the SP for " << FI << " is "
3479 << MFI.getObjectOffset(FI) << "\n");
3480 FrameReg = AArch64::SP;
3481 return StackOffset::getFixed(MFI.getObjectOffset(FI));
3482 }
3483
3484 // Go to common code if we cannot provide sp + offset.
3485 if (MFI.hasVarSizedObjects() ||
3488 return getFrameIndexReference(MF, FI, FrameReg);
3489
3490 FrameReg = AArch64::SP;
3491 return getStackOffset(MF, MFI.getObjectOffset(FI));
3492}
3493
3494/// The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve
3495/// the parent's frame pointer
3496 unsigned AArch64FrameLowering::getWinEHParentFrameOffset(
3497     const MachineFunction &MF) const {
3498 return 0;
3499}
3500
3501/// Funclets only need to account for space for the callee saved registers,
3502/// as the locals are accounted for in the parent's stack frame.
3503 unsigned AArch64FrameLowering::getWinEHFuncletFrameSize(
3504     const MachineFunction &MF) const {
3505 // This is the size of the pushed CSRs.
3506 unsigned CSSize =
3507 MF.getInfo<AArch64FunctionInfo>()->getCalleeSavedStackSize();
3508 // This is the amount of stack a funclet needs to allocate.
3509 return alignTo(CSSize + MF.getFrameInfo().getMaxCallFrameSize(),
3510 getStackAlign());
3511}
3512
3513namespace {
3514struct FrameObject {
3515 bool IsValid = false;
3516 // Index of the object in MFI.
3517 int ObjectIndex = 0;
3518 // Group ID this object belongs to.
3519 int GroupIndex = -1;
3520 // This object should be placed first (closest to SP).
3521 bool ObjectFirst = false;
3522 // This object's group (which always contains the object with
3523 // ObjectFirst==true) should be placed first.
3524 bool GroupFirst = false;
3525
3526 // Used to distinguish between FP and GPR accesses. The values are decided so
3527 // that they sort FPR < Hazard < GPR and they can be or'd together.
3528 unsigned Accesses = 0;
3529 enum { AccessFPR = 1, AccessHazard = 2, AccessGPR = 4 };
3530};
3531
3532class GroupBuilder {
3533 SmallVector<int, 8> CurrentMembers;
3534 int NextGroupIndex = 0;
3535 std::vector<FrameObject> &Objects;
3536
3537public:
3538 GroupBuilder(std::vector<FrameObject> &Objects) : Objects(Objects) {}
3539 void AddMember(int Index) { CurrentMembers.push_back(Index); }
3540 void EndCurrentGroup() {
3541 if (CurrentMembers.size() > 1) {
3542 // Create a new group with the current member list. This might remove them
3543 // from their pre-existing groups. That's OK, dealing with overlapping
3544 // groups is too hard and unlikely to make a difference.
3545 LLVM_DEBUG(dbgs() << "group:");
3546 for (int Index : CurrentMembers) {
3547 Objects[Index].GroupIndex = NextGroupIndex;
3548 LLVM_DEBUG(dbgs() << " " << Index);
3549 }
3550 LLVM_DEBUG(dbgs() << "\n");
3551 NextGroupIndex++;
3552 }
3553 CurrentMembers.clear();
3554 }
3555};
3556
3557bool FrameObjectCompare(const FrameObject &A, const FrameObject &B) {
3558 // Objects at a lower index are closer to FP; objects at a higher index are
3559 // closer to SP.
3560 //
3561 // For consistency in our comparison, all invalid objects are placed
3562 // at the end. This also allows us to stop walking when we hit the
3563 // first invalid item after it's all sorted.
3564 //
3565 // If we want to include a stack hazard region, order FPR accesses < the
3566 // hazard object < GPRs accesses in order to create a separation between the
3567 // two. For the Accesses field 1 = FPR, 2 = Hazard Object, 4 = GPR.
3568 //
3569 // Otherwise the "first" object goes first (closest to SP), followed by the
3570 // members of the "first" group.
3571 //
3572 // The rest are sorted by the group index to keep the groups together.
3573 // Higher numbered groups are more likely to be around longer (i.e. untagged
3574 // in the function epilogue and not at some earlier point). Place them closer
3575 // to SP.
3576 //
3577 // If all else equal, sort by the object index to keep the objects in the
3578 // original order.
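// Illustrative example (not part of the upstream source): with a stack
// hazard slot present, an FPR-accessed spill (Accesses == 1) sorts before
// the hazard object (2), which sorts before a GPR-accessed local (4),
// keeping the padding between the two access classes.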
3579 return std::make_tuple(!A.IsValid, A.Accesses, A.ObjectFirst, A.GroupFirst,
3580 A.GroupIndex, A.ObjectIndex) <
3581 std::make_tuple(!B.IsValid, B.Accesses, B.ObjectFirst, B.GroupFirst,
3582 B.GroupIndex, B.ObjectIndex);
3583}
3584} // namespace
3585
3586 void AArch64FrameLowering::orderFrameObjects(
3587     const MachineFunction &MF, SmallVectorImpl<int> &ObjectsToAllocate) const {
3588   const AArch64FunctionInfo &AFI = *MF.getInfo<AArch64FunctionInfo>();
3589
3590 if ((!OrderFrameObjects && !AFI.hasSplitSVEObjects()) ||
3591 ObjectsToAllocate.empty())
3592 return;
3593
3594 const MachineFrameInfo &MFI = MF.getFrameInfo();
3595 std::vector<FrameObject> FrameObjects(MFI.getObjectIndexEnd());
3596 for (auto &Obj : ObjectsToAllocate) {
3597 FrameObjects[Obj].IsValid = true;
3598 FrameObjects[Obj].ObjectIndex = Obj;
3599 }
3600
3601 // Identify FPR vs GPR slots for hazards, and stack slots that are tagged at
3602 // the same time.
3603 GroupBuilder GB(FrameObjects);
3604 for (auto &MBB : MF) {
3605 for (auto &MI : MBB) {
3606 if (MI.isDebugInstr())
3607 continue;
3608
3609 if (AFI.hasStackHazardSlotIndex()) {
3610 std::optional<int> FI = getLdStFrameID(MI, MFI);
3611 if (FI && *FI >= 0 && *FI < (int)FrameObjects.size()) {
3612 if (MFI.getStackID(*FI) == TargetStackID::ScalableVector ||
3613             AArch64InstrInfo::isFpOrNEON(MI))
3614           FrameObjects[*FI].Accesses |= FrameObject::AccessFPR;
3615 else
3616 FrameObjects[*FI].Accesses |= FrameObject::AccessGPR;
3617 }
3618 }
3619
3620 int OpIndex;
3621 switch (MI.getOpcode()) {
3622 case AArch64::STGloop:
3623 case AArch64::STZGloop:
3624 OpIndex = 3;
3625 break;
3626 case AArch64::STGi:
3627 case AArch64::STZGi:
3628 case AArch64::ST2Gi:
3629 case AArch64::STZ2Gi:
3630 OpIndex = 1;
3631 break;
3632 default:
3633 OpIndex = -1;
3634 }
3635
3636 int TaggedFI = -1;
3637 if (OpIndex >= 0) {
3638 const MachineOperand &MO = MI.getOperand(OpIndex);
3639 if (MO.isFI()) {
3640 int FI = MO.getIndex();
3641 if (FI >= 0 && FI < MFI.getObjectIndexEnd() &&
3642 FrameObjects[FI].IsValid)
3643 TaggedFI = FI;
3644 }
3645 }
3646
3647 // If this is a stack tagging instruction for a slot that is not part of a
3648 // group yet, either start a new group or add it to the current one.
3649 if (TaggedFI >= 0)
3650 GB.AddMember(TaggedFI);
3651 else
3652 GB.EndCurrentGroup();
3653 }
3654 // Groups should never span multiple basic blocks.
3655 GB.EndCurrentGroup();
3656 }
3657
3658 if (AFI.hasStackHazardSlotIndex()) {
3659 FrameObjects[AFI.getStackHazardSlotIndex()].Accesses =
3660 FrameObject::AccessHazard;
3661 // If a stack object is unknown or both GPR and FPR, sort it into GPR.
3662 for (auto &Obj : FrameObjects)
3663 if (!Obj.Accesses ||
3664 Obj.Accesses == (FrameObject::AccessGPR | FrameObject::AccessFPR))
3665 Obj.Accesses = FrameObject::AccessGPR;
3666 }
3667
3668 // If the function's tagged base pointer is pinned to a stack slot, we want to
3669 // put that slot first when possible. This will likely place it at SP + 0,
3670 // and save one instruction when generating the base pointer because IRG does
3671 // not allow an immediate offset.
3672 std::optional<int> TBPI = AFI.getTaggedBasePointerIndex();
3673 if (TBPI) {
3674 FrameObjects[*TBPI].ObjectFirst = true;
3675 FrameObjects[*TBPI].GroupFirst = true;
3676 int FirstGroupIndex = FrameObjects[*TBPI].GroupIndex;
3677 if (FirstGroupIndex >= 0)
3678 for (FrameObject &Object : FrameObjects)
3679 if (Object.GroupIndex == FirstGroupIndex)
3680 Object.GroupFirst = true;
3681 }
3682
3683 llvm::stable_sort(FrameObjects, FrameObjectCompare);
3684
3685 int i = 0;
3686 for (auto &Obj : FrameObjects) {
3687 // All invalid items are sorted at the end, so it's safe to stop.
3688 if (!Obj.IsValid)
3689 break;
3690 ObjectsToAllocate[i++] = Obj.ObjectIndex;
3691 }
3692
3693 LLVM_DEBUG({
3694 dbgs() << "Final frame order:\n";
3695 for (auto &Obj : FrameObjects) {
3696 if (!Obj.IsValid)
3697 break;
3698 dbgs() << " " << Obj.ObjectIndex << ": group " << Obj.GroupIndex;
3699 if (Obj.ObjectFirst)
3700 dbgs() << ", first";
3701 if (Obj.GroupFirst)
3702 dbgs() << ", group-first";
3703 dbgs() << "\n";
3704 }
3705 });
3706}
3707
3708/// Emit a loop to decrement SP until it is equal to TargetReg, with probes at
3709/// least every ProbeSize bytes. Returns an iterator of the first instruction
3710/// after the loop. The difference between SP and TargetReg must be an exact
3711/// multiple of ProbeSize.
3712 MachineBasicBlock::iterator
3713 AArch64FrameLowering::inlineStackProbeLoopExactMultiple(
3714 MachineBasicBlock::iterator MBBI, int64_t ProbeSize,
3715 Register TargetReg) const {
3716 MachineBasicBlock &MBB = *MBBI->getParent();
3717 MachineFunction &MF = *MBB.getParent();
3718 const AArch64InstrInfo *TII =
3719 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3720 DebugLoc DL = MBB.findDebugLoc(MBBI);
3721
3722 MachineFunction::iterator MBBInsertPoint = std::next(MBB.getIterator());
3723 MachineBasicBlock *LoopMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3724 MF.insert(MBBInsertPoint, LoopMBB);
3725 MachineBasicBlock *ExitMBB = MF.CreateMachineBasicBlock(MBB.getBasicBlock());
3726 MF.insert(MBBInsertPoint, ExitMBB);
3727
3728 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not encodable
3729 // in SUB).
3730 emitFrameOffset(*LoopMBB, LoopMBB->end(), DL, AArch64::SP, AArch64::SP,
3731 StackOffset::getFixed(-ProbeSize), TII,
3732                   MachineInstr::FrameSetup);
3733 // STR XZR, [SP]
3734 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::STRXui))
3735 .addReg(AArch64::XZR)
3736 .addReg(AArch64::SP)
3737 .addImm(0)
3738       .setMIFlags(MachineInstr::FrameSetup);
3739 // CMP SP, TargetReg
3740 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::SUBSXrx64),
3741 AArch64::XZR)
3742 .addReg(AArch64::SP)
3743 .addReg(TargetReg)
3744       .addImm(AArch64_AM::getArithExtendImm(AArch64_AM::UXTX, 0))
3745       .setMIFlags(MachineInstr::FrameSetup);
3746 // B.CC Loop
3747 BuildMI(*LoopMBB, LoopMBB->end(), DL, TII->get(AArch64::Bcc))
3748       .addImm(AArch64CC::NE)
3749       .addMBB(LoopMBB)
3750       .setMIFlags(MachineInstr::FrameSetup);
3751
3752 LoopMBB->addSuccessor(ExitMBB);
3753 LoopMBB->addSuccessor(LoopMBB);
3754 // Synthesize the exit MBB.
3755 ExitMBB->splice(ExitMBB->end(), &MBB, MBBI, MBB.end());
3756   ExitMBB->transferSuccessorsAndUpdatePHIs(&MBB);
3757   MBB.addSuccessor(LoopMBB);
3758 // Update liveins.
3759 fullyRecomputeLiveIns({ExitMBB, LoopMBB});
3760
3761 return ExitMBB->begin();
3762}
3763
3764void AArch64FrameLowering::inlineStackProbeFixed(
3765 MachineBasicBlock::iterator MBBI, Register ScratchReg, int64_t FrameSize,
3766 StackOffset CFAOffset) const {
3767 MachineBasicBlock *MBB = MBBI->getParent();
3768 MachineFunction &MF = *MBB->getParent();
3769 const AArch64InstrInfo *TII =
3770 MF.getSubtarget<AArch64Subtarget>().getInstrInfo();
3771 AArch64FunctionInfo *AFI = MF.getInfo<AArch64FunctionInfo>();
3772 bool EmitAsyncCFI = AFI->needsAsyncDwarfUnwindInfo(MF);
3773 bool HasFP = hasFP(MF);
3774
3775 DebugLoc DL;
3776 int64_t ProbeSize = MF.getInfo<AArch64FunctionInfo>()->getStackProbeSize();
3777 int64_t NumBlocks = FrameSize / ProbeSize;
3778 int64_t ResidualSize = FrameSize % ProbeSize;
3779
3780 LLVM_DEBUG(dbgs() << "Stack probing: total " << FrameSize << " bytes, "
3781 << NumBlocks << " blocks of " << ProbeSize
3782 << " bytes, plus " << ResidualSize << " bytes\n");
3783
3784 // Decrement SP by NumBlock * ProbeSize bytes, with either unrolled or
3785 // ordinary loop.
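// Worked example (illustrative, not part of the upstream source, assuming a
// 4096-byte probe size): FrameSize = 70000 gives NumBlocks = 17 and
// ResidualSize = 368; whether the 17 blocks are unrolled or emitted as a
// loop depends on AArch64::StackProbeMaxLoopUnroll.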
3786 if (NumBlocks <= AArch64::StackProbeMaxLoopUnroll) {
3787 for (int i = 0; i < NumBlocks; ++i) {
3788 // SUB SP, SP, #ProbeSize (or equivalent if ProbeSize is not
3789 // encodable in a SUB).
3790 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3791 StackOffset::getFixed(-ProbeSize), TII,
3792 MachineInstr::FrameSetup, false, false, nullptr,
3793 EmitAsyncCFI && !HasFP, CFAOffset);
3794 CFAOffset += StackOffset::getFixed(ProbeSize);
3795 // STR XZR, [SP]
3796 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3797 .addReg(AArch64::XZR)
3798 .addReg(AArch64::SP)
3799 .addImm(0)
3800           .setMIFlags(MachineInstr::FrameSetup);
3801 }
3802 } else if (NumBlocks != 0) {
3803 // SUB ScratchReg, SP, #FrameSize (or equivalent if FrameSize is not
3804 // encodable in ADD). ScratchReg may temporarily become the CFA register.
3805 emitFrameOffset(*MBB, MBBI, DL, ScratchReg, AArch64::SP,
3806 StackOffset::getFixed(-ProbeSize * NumBlocks), TII,
3807 MachineInstr::FrameSetup, false, false, nullptr,
3808 EmitAsyncCFI && !HasFP, CFAOffset);
3809 CFAOffset += StackOffset::getFixed(ProbeSize * NumBlocks);
3810 MBBI = inlineStackProbeLoopExactMultiple(MBBI, ProbeSize, ScratchReg);
3811 MBB = MBBI->getParent();
3812 if (EmitAsyncCFI && !HasFP) {
3813 // Set the CFA register back to SP.
3814 CFIInstBuilder(*MBB, MBBI, MachineInstr::FrameSetup)
3815 .buildDefCFARegister(AArch64::SP);
3816 }
3817 }
3818
3819 if (ResidualSize != 0) {
3820 // SUB SP, SP, #ResidualSize (or equivalent if ResidualSize is not encodable
3821 // in SUB).
3822 emitFrameOffset(*MBB, MBBI, DL, AArch64::SP, AArch64::SP,
3823 StackOffset::getFixed(-ResidualSize), TII,
3824 MachineInstr::FrameSetup, false, false, nullptr,
3825 EmitAsyncCFI && !HasFP, CFAOffset);
3826 if (ResidualSize > AArch64::StackProbeMaxUnprobedStack) {
3827 // STR XZR, [SP]
3828 BuildMI(*MBB, MBBI, DL, TII->get(AArch64::STRXui))
3829 .addReg(AArch64::XZR)
3830 .addReg(AArch64::SP)
3831 .addImm(0)
3832           .setMIFlags(MachineInstr::FrameSetup);
3833 }
3834 }
3835}
3836
3837void AArch64FrameLowering::inlineStackProbe(MachineFunction &MF,
3838 MachineBasicBlock &MBB) const {
3839 // Get the instructions that need to be replaced. We emit at most two of
3840 // these. Remember them in order to avoid complications coming from the need
3841 // to traverse the block while potentially creating more blocks.
3842 SmallVector<MachineInstr *, 4> ToReplace;
3843 for (MachineInstr &MI : MBB)
3844 if (MI.getOpcode() == AArch64::PROBED_STACKALLOC ||
3845 MI.getOpcode() == AArch64::PROBED_STACKALLOC_VAR)
3846 ToReplace.push_back(&MI);
3847
3848 for (MachineInstr *MI : ToReplace) {
3849 if (MI->getOpcode() == AArch64::PROBED_STACKALLOC) {
3850 Register ScratchReg = MI->getOperand(0).getReg();
3851 int64_t FrameSize = MI->getOperand(1).getImm();
3852 StackOffset CFAOffset = StackOffset::get(MI->getOperand(2).getImm(),
3853 MI->getOperand(3).getImm());
3854 inlineStackProbeFixed(MI->getIterator(), ScratchReg, FrameSize,
3855 CFAOffset);
3856 } else {
3857 assert(MI->getOpcode() == AArch64::PROBED_STACKALLOC_VAR &&
3858 "Stack probe pseudo-instruction expected");
3859 const AArch64InstrInfo *TII =
3860 MI->getMF()->getSubtarget<AArch64Subtarget>().getInstrInfo();
3861 Register TargetReg = MI->getOperand(0).getReg();
3862 (void)TII->probedStackAlloc(MI->getIterator(), TargetReg, true);
3863 }
3864 MI->eraseFromParent();
3865 }
3866}
3867
3868 struct StackAccess {
3869   enum AccessType {
3870     NotAccessed = 0, // Stack object not accessed by load/store instructions.
3871 GPR = 1 << 0, // A general purpose register.
3872 PPR = 1 << 1, // A predicate register.
3873 FPR = 1 << 2, // A floating point/Neon/SVE register.
3874 };
3875
3876 int Idx;
3877   StackOffset Offset;
3878   int64_t Size;
3879 unsigned AccessTypes;
3880
3881   StackAccess() : Idx(0), Size(0), AccessTypes(AccessType::NotAccessed) {}
3882
3883 bool operator<(const StackAccess &Rhs) const {
3884 return std::make_tuple(start(), Idx) <
3885 std::make_tuple(Rhs.start(), Rhs.Idx);
3886 }
3887
3888 bool isCPU() const {
3889 // Predicate register load and store instructions execute on the CPU.
3890     return AccessTypes & (AccessType::GPR | AccessType::PPR);
3891 }
3892 bool isSME() const { return AccessTypes & AccessType::FPR; }
3893 bool isMixed() const { return isCPU() && isSME(); }
3894
3895 int64_t start() const { return Offset.getFixed() + Offset.getScalable(); }
3896 int64_t end() const { return start() + Size; }
3897
3898 std::string getTypeString() const {
3899 switch (AccessTypes) {
3900 case AccessType::FPR:
3901 return "FPR";
3902 case AccessType::PPR:
3903 return "PPR";
3904 case AccessType::GPR:
3905 return "GPR";
3907 return "NA";
3908 default:
3909 return "Mixed";
3910 }
3911 }
3912
3913 void print(raw_ostream &OS) const {
3914 OS << getTypeString() << " stack object at [SP"
3915 << (Offset.getFixed() < 0 ? "" : "+") << Offset.getFixed();
3916 if (Offset.getScalable())
3917 OS << (Offset.getScalable() < 0 ? "" : "+") << Offset.getScalable()
3918 << " * vscale";
3919 OS << "]";
3920 }
3921};
3922
3923static inline raw_ostream &operator<<(raw_ostream &OS, const StackAccess &SA) {
3924 SA.print(OS);
3925 return OS;
3926}
3927
3928void AArch64FrameLowering::emitRemarks(
3929 const MachineFunction &MF, MachineOptimizationRemarkEmitter *ORE) const {
3930
3931 auto *AFI = MF.getInfo<AArch64FunctionInfo>();
3932   if (AFI->getSMEFnAttrs().hasNonStreamingInterfaceAndBody())
3933     return;
3934
3935 unsigned StackHazardSize = getStackHazardSize(MF);
3936 const uint64_t HazardSize =
3937 (StackHazardSize) ? StackHazardSize : StackHazardRemarkSize;
3938
3939 if (HazardSize == 0)
3940 return;
3941
3942 const MachineFrameInfo &MFI = MF.getFrameInfo();
3943 // Bail if function has no stack objects.
3944 if (!MFI.hasStackObjects())
3945 return;
3946
3947 std::vector<StackAccess> StackAccesses(MFI.getNumObjects());
3948
3949 size_t NumFPLdSt = 0;
3950 size_t NumNonFPLdSt = 0;
3951
3952 // Collect stack accesses via Load/Store instructions.
3953 for (const MachineBasicBlock &MBB : MF) {
3954 for (const MachineInstr &MI : MBB) {
3955 if (!MI.mayLoadOrStore() || MI.getNumMemOperands() < 1)
3956 continue;
3957 for (MachineMemOperand *MMO : MI.memoperands()) {
3958 std::optional<int> FI = getMMOFrameID(MMO, MFI);
3959 if (FI && !MFI.isDeadObjectIndex(*FI)) {
3960 int FrameIdx = *FI;
3961
3962 size_t ArrIdx = FrameIdx + MFI.getNumFixedObjects();
3963 if (StackAccesses[ArrIdx].AccessTypes == StackAccess::NotAccessed) {
3964 StackAccesses[ArrIdx].Idx = FrameIdx;
3965 StackAccesses[ArrIdx].Offset =
3966 getFrameIndexReferenceFromSP(MF, FrameIdx);
3967 StackAccesses[ArrIdx].Size = MFI.getObjectSize(FrameIdx);
3968 }
3969
3970 unsigned RegTy = StackAccess::AccessType::GPR;
3971 if (MFI.hasScalableStackID(FrameIdx))
3972           RegTy = isPPRAccess(MI) ? StackAccess::PPR : StackAccess::FPR;
3973         else if (AArch64InstrInfo::isFpOrNEON(MI))
3974           RegTy = StackAccess::FPR;
3975
3976 StackAccesses[ArrIdx].AccessTypes |= RegTy;
3977
3978 if (RegTy == StackAccess::FPR)
3979 ++NumFPLdSt;
3980 else
3981 ++NumNonFPLdSt;
3982 }
3983 }
3984 }
3985 }
3986
3987 if (NumFPLdSt == 0 || NumNonFPLdSt == 0)
3988 return;
3989
3990 llvm::sort(StackAccesses);
3991 llvm::erase_if(StackAccesses, [](const StackAccess &S) {
3992     return S.AccessTypes == StackAccess::NotAccessed;
3993   });
3994
3995   SmallVector<const StackAccess *> MixedObjects;
3996   SmallVector<std::pair<const StackAccess *, const StackAccess *>> HazardPairs;
3997
3998 if (StackAccesses.front().isMixed())
3999 MixedObjects.push_back(&StackAccesses.front());
4000
4001 for (auto It = StackAccesses.begin(), End = std::prev(StackAccesses.end());
4002 It != End; ++It) {
4003 const auto &First = *It;
4004 const auto &Second = *(It + 1);
4005
4006 if (Second.isMixed())
4007 MixedObjects.push_back(&Second);
4008
4009 if ((First.isSME() && Second.isCPU()) ||
4010 (First.isCPU() && Second.isSME())) {
4011 uint64_t Distance = static_cast<uint64_t>(Second.start() - First.end());
4012 if (Distance < HazardSize)
4013 HazardPairs.emplace_back(&First, &Second);
4014 }
4015 }
4016
4017 auto EmitRemark = [&](llvm::StringRef Str) {
4018 ORE->emit([&]() {
4019 auto R = MachineOptimizationRemarkAnalysis(
4020 "sme", "StackHazard", MF.getFunction().getSubprogram(), &MF.front());
4021 return R << formatv("stack hazard in '{0}': ", MF.getName()).str() << Str;
4022 });
4023 };
4024
4025 for (const auto &P : HazardPairs)
4026 EmitRemark(formatv("{0} is too close to {1}", *P.first, *P.second).str());
4027
4028 for (const auto *Obj : MixedObjects)
4029 EmitRemark(
4030 formatv("{0} accessed by both GP and FP instructions", *Obj).str());
4031}
unsigned const MachineRegisterInfo * MRI
static void getLiveRegsForEntryMBB(LivePhysRegs &LiveRegs, const MachineBasicBlock &MBB)
static const unsigned DefaultSafeSPDisplacement
This is the biggest offset to the stack pointer we can encode in aarch64 instructions (without using ...
static bool produceCompactUnwindFrame(const AArch64FrameLowering &, MachineFunction &MF)
static cl::opt< bool > StackTaggingMergeSetTag("stack-tagging-merge-settag", cl::desc("merge settag instruction in function epilog"), cl::init(true), cl::Hidden)
bool enableMultiVectorSpillFill(const AArch64Subtarget &Subtarget, MachineFunction &MF)
static std::optional< int > getLdStFrameID(const MachineInstr &MI, const MachineFrameInfo &MFI)
static cl::opt< bool > SplitSVEObjects("aarch64-split-sve-objects", cl::desc("Split allocation of ZPR & PPR objects"), cl::init(true), cl::Hidden)
static cl::opt< bool > StackHazardInNonStreaming("aarch64-stack-hazard-in-non-streaming", cl::init(false), cl::Hidden)
void computeCalleeSaveRegisterPairs(const AArch64FrameLowering &AFL, MachineFunction &MF, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI, SmallVectorImpl< RegPairInfo > &RegPairs, bool NeedsFrameRecord)
static cl::opt< bool > OrderFrameObjects("aarch64-order-frame-objects", cl::desc("sort stack allocations"), cl::init(true), cl::Hidden)
static bool invalidateWindowsRegisterPairing(unsigned Reg1, unsigned Reg2, bool NeedsWinCFI, bool IsFirst, const TargetRegisterInfo *TRI)
static cl::opt< bool > DisableMultiVectorSpillFill("aarch64-disable-multivector-spill-fill", cl::desc("Disable use of LD/ST pairs for SME2 or SVE2p1"), cl::init(false), cl::Hidden)
static cl::opt< bool > EnableRedZone("aarch64-redzone", cl::desc("enable use of redzone on AArch64"), cl::init(false), cl::Hidden)
cl::opt< bool > EnableHomogeneousPrologEpilog("homogeneous-prolog-epilog", cl::Hidden, cl::desc("Emit homogeneous prologue and epilogue for the size " "optimization (default = off)"))
static bool isLikelyToHaveSVEStack(const AArch64FrameLowering &AFL, const MachineFunction &MF)
static bool invalidateRegisterPairing(unsigned Reg1, unsigned Reg2, bool UsesWinAAPCS, bool NeedsWinCFI, bool NeedsFrameRecord, bool IsFirst, const TargetRegisterInfo *TRI)
Returns true if Reg1 and Reg2 cannot be paired using a ldp/stp instruction.
static unsigned getPrologueDeath(MachineFunction &MF, unsigned Reg)
static SVEStackSizes determineSVEStackSizes(MachineFunction &MF, AssignObjectOffsets AssignOffsets)
Process all the SVE stack objects and the SVE stack size and offsets for each object.
static bool isTargetWindows(const MachineFunction &MF)
static unsigned estimateRSStackSizeLimit(MachineFunction &MF)
Look at each instruction that references stack frames and return the stack size limit beyond which so...
static bool getSVECalleeSaveSlotRange(const MachineFrameInfo &MFI, int &Min, int &Max)
returns true if there are any SVE callee saves.
static cl::opt< unsigned > StackHazardRemarkSize("aarch64-stack-hazard-remark-size", cl::init(0), cl::Hidden)
static MCRegister getRegisterOrZero(MCRegister Reg, bool HasSVE)
static unsigned getStackHazardSize(const MachineFunction &MF)
MCRegister findFreePredicateReg(BitVector &SavedRegs)
static bool isPPRAccess(const MachineInstr &MI)
static std::optional< int > getMMOFrameID(MachineMemOperand *MMO, const MachineFrameInfo &MFI)
assert(UImm &&(UImm !=~static_cast< T >(0)) &&"Invalid immediate!")
This file contains the declaration of the AArch64PrologueEmitter and AArch64EpilogueEmitter classes,...
static const int kSetTagLoopThreshold
MachineBasicBlock & MBB
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
MachineBasicBlock MachineBasicBlock::iterator MBBI
This file contains the simple types necessary to represent the attributes associated with functions a...
#define CASE(ATTRNAME, AANAME,...)
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
DXIL Forward Handle Accesses
const HexagonInstrInfo * TII
IRTranslator LLVM IR MI
static std::string getTypeString(Type *T)
Definition LLParser.cpp:67
This file implements the LivePhysRegs utility for tracking liveness of physical registers.
#define F(x, y, z)
Definition MD5.cpp:55
#define I(x, y, z)
Definition MD5.cpp:58
#define H(x, y, z)
Definition MD5.cpp:57
Register Reg
Register const TargetRegisterInfo * TRI
Promote Memory to Register
Definition Mem2Reg.cpp:110
uint64_t IntrinsicInst * II
#define P(N)
This file declares the machine register scavenger class.
unsigned OpIndex
static bool contains(SmallPtrSetImpl< ConstantExpr * > &Cache, ConstantExpr *Expr, Constant *C)
Definition Value.cpp:480
This file defines the make_scope_exit function, which executes user-defined cleanup logic at scope ex...
This file defines the SmallVector class.
#define LLVM_DEBUG(...)
Definition Debug.h:114
StackOffset getSVEStackSize(const MachineFunction &MF) const
Returns the size of the entire SVE stackframe (PPRs + ZPRs).
StackOffset getZPRStackSize(const MachineFunction &MF) const
Returns the size of the entire ZPR stackframe (calleesaves + spills).
void processFunctionBeforeFrameIndicesReplaced(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameIndicesReplaced - This method is called immediately before MO_FrameIndex op...
MachineBasicBlock::iterator eliminateCallFramePseudoInstr(MachineFunction &MF, MachineBasicBlock &MBB, MachineBasicBlock::iterator I) const override
This method is called during prolog/epilog code insertion to eliminate call frame setup and destroy p...
bool canUseAsPrologue(const MachineBasicBlock &MBB) const override
Check whether or not the given MBB can be used as a prologue for the target.
bool enableStackSlotScavenging(const MachineFunction &MF) const override
Returns true if the stack slot holes in the fixed and callee-save stack area should be used when allo...
bool spillCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, ArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
spillCalleeSavedRegisters - Issues instruction(s) to spill all callee saved registers and returns tru...
bool restoreCalleeSavedRegisters(MachineBasicBlock &MBB, MachineBasicBlock::iterator MI, MutableArrayRef< CalleeSavedInfo > CSI, const TargetRegisterInfo *TRI) const override
restoreCalleeSavedRegisters - Issues instruction(s) to restore all callee saved registers and returns...
bool enableFullCFIFixup(const MachineFunction &MF) const override
enableFullCFIFixup - Returns true if we may need to fix the unwind information such that it is accura...
StackOffset getFrameIndexReferenceFromSP(const MachineFunction &MF, int FI) const override
getFrameIndexReferenceFromSP - This method returns the offset from the stack pointer to the slot of t...
bool enableCFIFixup(const MachineFunction &MF) const override
Returns true if we may need to fix the unwind information for the function.
StackOffset getNonLocalFrameIndexReference(const MachineFunction &MF, int FI) const override
getNonLocalFrameIndexReference - This method returns the offset used to reference a frame index locat...
TargetStackID::Value getStackIDForScalableVectors() const override
Returns the StackID that scalable vectors should be associated with.
bool hasFPImpl(const MachineFunction &MF) const override
hasFPImpl - Return true if the specified function should have a dedicated frame pointer register.
void emitPrologue(MachineFunction &MF, MachineBasicBlock &MBB) const override
emitProlog/emitEpilog - These methods insert prolog and epilog code into the function.
void resetCFIToInitialState(MachineBasicBlock &MBB) const override
Emit CFI instructions that recreate the state of the unwind information upon function entry.
bool hasReservedCallFrame(const MachineFunction &MF) const override
hasReservedCallFrame - Under normal circumstances, when a frame pointer is not required,...
StackOffset resolveFrameOffsetReference(const MachineFunction &MF, int64_t ObjectOffset, bool isFixed, TargetStackID::Value StackID, Register &FrameReg, bool PreferFP, bool ForSimm) const
bool canUseRedZone(const MachineFunction &MF) const
Can this function use the red zone for local allocations.
bool needsWinCFI(const MachineFunction &MF) const
bool isFPReserved(const MachineFunction &MF) const
Should the Frame Pointer be reserved for the current function?
void processFunctionBeforeFrameFinalized(MachineFunction &MF, RegScavenger *RS) const override
processFunctionBeforeFrameFinalized - This method is called immediately before the specified function...
int getSEHFrameIndexOffset(const MachineFunction &MF, int FI) const
unsigned getWinEHFuncletFrameSize(const MachineFunction &MF) const
Funclets only need to account for space for the callee saved registers, as the locals are accounted f...
void orderFrameObjects(const MachineFunction &MF, SmallVectorImpl< int > &ObjectsToAllocate) const override
Order the symbols in the local stack frame.
void emitEpilogue(MachineFunction &MF, MachineBasicBlock &MBB) const override
StackOffset getPPRStackSize(const MachineFunction &MF) const
Returns the size of the entire PPR stackframe (calleesaves + spills + hazard padding).
void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS) const override
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
StackOffset getFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg) const override
getFrameIndexReference - Provide a base+offset reference to an FI slot for debug info.
bool assignCalleeSavedSpillSlots(MachineFunction &MF, const TargetRegisterInfo *TRI, std::vector< CalleeSavedInfo > &CSI, unsigned &MinCSFrameIndex, unsigned &MaxCSFrameIndex) const override
assignCalleeSavedSpillSlots - Allows target to override spill slot assignment logic.
StackOffset getFrameIndexReferencePreferSP(const MachineFunction &MF, int FI, Register &FrameReg, bool IgnoreSPUpdates) const override
For Win64 AArch64 EH, the offset to the Unwind object is from the SP before the update.
StackOffset resolveFrameIndexReference(const MachineFunction &MF, int FI, Register &FrameReg, bool PreferFP, bool ForSimm) const
unsigned getWinEHParentFrameOffset(const MachineFunction &MF) const override
The parent frame offset (aka dispFrame) is only used on X86_64 to retrieve the parent's frame pointer...
bool requiresSaveVG(const MachineFunction &MF) const
void emitPacRetPlusLeafHardening(MachineFunction &MF) const
Harden the entire function with pac-ret.
AArch64FunctionInfo - This class is derived from MachineFunctionInfo and contains private AArch64-spe...
unsigned getCalleeSavedStackSize(const MachineFrameInfo &MFI) const
void setCalleeSaveBaseToFrameRecordOffset(int Offset)
bool shouldSignReturnAddress(const MachineFunction &MF) const
void setStackSizeSVE(uint64_t ZPR, uint64_t PPR)
std::optional< int > getTaggedBasePointerIndex() const
bool needsDwarfUnwindInfo(const MachineFunction &MF) const
void setSVECalleeSavedStackSize(unsigned ZPR, unsigned PPR)
bool needsAsyncDwarfUnwindInfo(const MachineFunction &MF) const
static bool isTailCallReturnInst(const MachineInstr &MI)
Returns true if MI is one of the TCRETURN* instructions.
static bool isFpOrNEON(Register Reg)
Returns whether the physical register is FP or NEON.
const AArch64RegisterInfo * getRegisterInfo() const override
bool isNeonAvailable() const
Returns true if the target has NEON and the function at runtime is known to have NEON enabled (e....
const AArch64InstrInfo * getInstrInfo() const override
const AArch64TargetLowering * getTargetLowering() const override
bool isSVEorStreamingSVEAvailable() const
Returns true if the target has access to either the full range of SVE instructions,...
bool isStreaming() const
Returns true if the function has a streaming body.
bool hasInlineStackProbe(const MachineFunction &MF) const override
True if stack clash protection is enabled for this functions.
unsigned getRedZoneSize(const Function &F) const
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
Definition ArrayRef.h:41
size_t size() const
size - Get the array size.
Definition ArrayRef.h:143
bool empty() const
empty - Check if the array is empty.
Definition ArrayRef.h:138
bool test(unsigned Idx) const
Definition BitVector.h:480
BitVector & reset()
Definition BitVector.h:411
size_type count() const
count - Returns the number of bits which are set.
Definition BitVector.h:181
BitVector & set()
Definition BitVector.h:370
iterator_range< const_set_bits_iterator > set_bits() const
Definition BitVector.h:159
size_type size() const
size - Returns the number of bits in this bitvector.
Definition BitVector.h:178
Helper class for creating CFI instructions and inserting them into MIR.
The CalleeSavedInfo class tracks the information need to locate where a callee saved register is in t...
A debug info location.
Definition DebugLoc.h:124
bool hasMinSize() const
Optimize this function for minimum size (-Oz).
Definition Function.h:703
CallingConv::ID getCallingConv() const
getCallingConv()/setCallingConv(CC) - These method get and set the calling convention of this functio...
Definition Function.h:270
AttributeList getAttributes() const
Return the attribute list for this Function.
Definition Function.h:352
bool isVarArg() const
isVarArg - Return true if this function takes a variable number of arguments.
Definition Function.h:227
bool hasFnAttribute(Attribute::AttrKind Kind) const
Return true if the function has the attribute.
Definition Function.cpp:727
A set of physical registers with utility functions to track liveness when walking backward/forward th...
bool usesWindowsCFI() const
Definition MCAsmInfo.h:652
Wrapper class representing physical registers. Should be passed by value.
Definition MCRegister.h:33
LLVM_ABI void transferSuccessorsAndUpdatePHIs(MachineBasicBlock *FromMBB)
Transfers all the successors, as in transferSuccessors, and update PHI operands in the successor bloc...
LLVM_ABI iterator getFirstTerminator()
Returns an iterator to the first terminator instruction of this basic block.
LLVM_ABI void addSuccessor(MachineBasicBlock *Succ, BranchProbability Prob=BranchProbability::getUnknown())
Add Succ as a successor of this MachineBasicBlock.
const MachineFunction * getParent() const
Return the MachineFunction containing this basic block.
reverse_iterator rbegin()
iterator insertAfter(iterator I, MachineInstr *MI)
Insert MI into the instruction list after I.
void splice(iterator Where, MachineBasicBlock *Other, iterator From)
Take an instruction from MBB 'Other' at the position From, and insert it into this MBB right before '...
MachineInstrBundleIterator< MachineInstr > iterator
The MachineFrameInfo class represents an abstract stack frame until prolog/epilog code is inserted.
LLVM_ABI int CreateFixedObject(uint64_t Size, int64_t SPOffset, bool IsImmutable, bool isAliased=false)
Create a new object at a fixed location on the stack.
bool hasVarSizedObjects() const
This method may be called any time after instruction selection is complete to determine if the stack ...
const AllocaInst * getObjectAllocation(int ObjectIdx) const
Return the underlying Alloca of the specified stack object if it exists.
LLVM_ABI int CreateStackObject(uint64_t Size, Align Alignment, bool isSpillSlot, const AllocaInst *Alloca=nullptr, uint8_t ID=0)
Create a new statically sized stack object, returning a nonnegative identifier to represent it.
bool hasCalls() const
Return true if the current function has any function calls.
bool isFrameAddressTaken() const
This method may be called any time after instruction selection is complete to determine if there is a...
void setObjectOffset(int ObjectIdx, int64_t SPOffset)
Set the stack frame offset of the specified object.
uint64_t getMaxCallFrameSize() const
Return the maximum size of a call frame that must be allocated for an outgoing function call.
bool hasPatchPoint() const
This method may be called any time after instruction selection is complete to determine if there is a...
bool hasScalableStackID(int ObjectIdx) const
int getStackProtectorIndex() const
Return the index for the stack protector object.
LLVM_ABI int CreateSpillStackObject(uint64_t Size, Align Alignment)
Create a new statically sized stack object that represents a spill slot, returning a nonnegative iden...
LLVM_ABI uint64_t estimateStackSize(const MachineFunction &MF) const
Estimate and return the size of the stack frame.
void setStackID(int ObjectIdx, uint8_t ID)
bool isCalleeSavedInfoValid() const
Has the callee saved info been calculated yet?
Align getObjectAlign(int ObjectIdx) const
Return the alignment of the specified stack object.
int64_t getObjectSize(int ObjectIdx) const
Return the size of the specified object.
bool isMaxCallFrameSizeComputed() const
bool hasStackMap() const
This method may be called any time after instruction selection is complete to determine if there is a...
const std::vector< CalleeSavedInfo > & getCalleeSavedInfo() const
Returns a reference to call saved info vector for the current function.
unsigned getNumObjects() const
Return the number of objects.
int getObjectIndexEnd() const
Return one past the maximum frame object index.
bool hasStackProtectorIndex() const
bool hasStackObjects() const
Return true if there are any stack objects in this function.
uint8_t getStackID(int ObjectIdx) const
unsigned getNumFixedObjects() const
Return the number of fixed objects.
int64_t getObjectOffset(int ObjectIdx) const
Return the assigned stack offset of the specified object from the incoming stack pointer.
int getObjectIndexBegin() const
Return the minimum frame object index.
void setObjectAlignment(int ObjectIdx, Align Alignment)
setObjectAlignment - Change the alignment of the specified stack object.
bool isDeadObjectIndex(int ObjectIdx) const
Returns true if the specified index corresponds to a dead object.
const WinEHFuncInfo * getWinEHFuncInfo() const
getWinEHFuncInfo - Return information about how the current function uses Windows exception handling.
const TargetSubtargetInfo & getSubtarget() const
getSubtarget - Return the subtarget for which this machine code is being compiled.
MachineMemOperand * getMachineMemOperand(MachinePointerInfo PtrInfo, MachineMemOperand::Flags f, LLT MemTy, Align base_alignment, const AAMDNodes &AAInfo=AAMDNodes(), const MDNode *Ranges=nullptr, SyncScope::ID SSID=SyncScope::System, AtomicOrdering Ordering=AtomicOrdering::NotAtomic, AtomicOrdering FailureOrdering=AtomicOrdering::NotAtomic)
getMachineMemOperand - Allocate a new MachineMemOperand.
MachineFrameInfo & getFrameInfo()
getFrameInfo - Return the frame info object for the current function.
MachineRegisterInfo & getRegInfo()
getRegInfo - Return information about the registers currently in use.
Function & getFunction()
Return the LLVM function that this machine code represents.
BasicBlockListType::iterator iterator
Ty * getInfo()
getInfo - Keep track of various per-function pieces of information for backends that would like to do...
const MachineBasicBlock & front() const
MachineBasicBlock * CreateMachineBasicBlock(const BasicBlock *BB=nullptr, std::optional< UniqueBBID > BBID=std::nullopt)
CreateMachineInstr - Allocate a new MachineInstr.
void insert(iterator MBBI, MachineBasicBlock *MBB)
const TargetMachine & getTarget() const
getTarget - Return the target machine this machine code is compiled with
const MachineInstrBuilder & setMemRefs(ArrayRef< MachineMemOperand * > MMOs) const
const MachineInstrBuilder & addExternalSymbol(const char *FnName, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlag(MachineInstr::MIFlag Flag) const
const MachineInstrBuilder & addImm(int64_t Val) const
Add a new immediate operand.
const MachineInstrBuilder & addFrameIndex(int Idx) const
const MachineInstrBuilder & addRegMask(const uint32_t *Mask) const
const MachineInstrBuilder & addReg(Register RegNo, unsigned flags=0, unsigned SubReg=0) const
Add a new virtual register operand.
const MachineInstrBuilder & addMBB(MachineBasicBlock *MBB, unsigned TargetFlags=0) const
const MachineInstrBuilder & setMIFlags(unsigned Flags) const
const MachineInstrBuilder & addMemOperand(MachineMemOperand *MMO) const
const MachineInstrBuilder & addDef(Register RegNo, unsigned Flags=0, unsigned SubReg=0) const
Add a virtual register definition operand.
Representation of each machine instruction.
void setFlags(unsigned flags)
LLVM_ABI void eraseFromParent()
Unlink 'this' from the containing basic block and delete it.
uint32_t getFlags() const
Return the MI flags bitvector.
A description of a memory reference used in the backend.
const PseudoSourceValue * getPseudoValue() const
@ MOLoad
The memory access reads data.
@ MOStore
The memory access writes data.
const Value * getValue() const
Return the base address of the memory access.
MachineOperand class - Representation of each machine instruction operand.
int64_t getImm() const
bool isFI() const
isFI - Tests if this is a MO_FrameIndex operand.
LLVM_ABI void emit(DiagnosticInfoOptimizationBase &OptDiag)
Emit an optimization remark.
MachineRegisterInfo - Keep track of information for virtual and physical registers,...
LLVM_ABI Register createVirtualRegister(const TargetRegisterClass *RegClass, StringRef Name="")
createVirtualRegister - Create and return a new virtual register in the function with the specified r...
LLVM_ABI bool isLiveIn(Register Reg) const
LLVM_ABI const MCPhysReg * getCalleeSavedRegs() const
Returns list of callee saved registers.
LLVM_ABI bool isPhysRegUsed(MCRegister PhysReg, bool SkipRegMaskTest=false) const
Return true if the specified register is modified or read in this function.
MutableArrayRef - Represent a mutable reference to an array (0 or more elements consecutively in memo...
Definition ArrayRef.h:299
Wrapper class representing virtual and physical registers.
Definition Register.h:20
constexpr bool isValid() const
Definition Register.h:112
SMEAttrs is a utility class to parse the SME ACLE attributes on functions.
bool hasStreamingInterface() const
bool hasNonStreamingInterfaceAndBody() const
bool hasStreamingBody() const
bool insert(const value_type &X)
Insert a new element into the SetVector.
Definition SetVector.h:150
A SetVector that performs no allocations if smaller than a certain size.
Definition SetVector.h:338
This class consists of common code factored out of the SmallVector class to reduce code duplication b...
reference emplace_back(ArgTypes &&... Args)
void append(ItTy in_start, ItTy in_end)
Add the specified range to the end of the SmallVector.
void push_back(const T &Elt)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
StackOffset holds a fixed and a scalable offset in bytes.
Definition TypeSize.h:31
int64_t getFixed() const
Returns the fixed component of the stack.
Definition TypeSize.h:47
int64_t getScalable() const
Returns the scalable component of the stack.
Definition TypeSize.h:50
static StackOffset get(int64_t Fixed, int64_t Scalable)
Definition TypeSize.h:42
static StackOffset getScalable(int64_t Scalable)
Definition TypeSize.h:41
static StackOffset getFixed(int64_t Fixed)
Definition TypeSize.h:40
bool hasFP(const MachineFunction &MF) const
hasFP - Return true if the specified function should have a dedicated frame pointer register.
virtual void determineCalleeSaves(MachineFunction &MF, BitVector &SavedRegs, RegScavenger *RS=nullptr) const
This method determines which of the registers reported by TargetRegisterInfo::getCalleeSavedRegs() sh...
int getOffsetOfLocalArea() const
getOffsetOfLocalArea - This method returns the offset of the local area from the stack pointer on ent...
Align getStackAlign() const
getStackAlignment - This method returns the number of bytes to which the stack pointer must be aligne...
StackDirection getStackGrowthDirection() const
getStackGrowthDirection - Return the direction the stack grows
virtual bool enableCFIFixup(const MachineFunction &MF) const
Returns true if we may need to fix the unwind information for the function.
TargetInstrInfo - Interface to description of machine instruction set.
Primary interface to the complete machine description for the target machine.
TargetOptions Options
const MCAsmInfo * getMCAsmInfo() const
Return target specific asm information.
LLVM_ABI bool DisableFramePointerElim(const MachineFunction &MF) const
DisableFramePointerElim - This returns true if frame pointer elimination optimization should be disab...
TargetRegisterInfo base class - We assume that the target defines a static array of TargetRegisterDes...
bool hasStackRealignment(const MachineFunction &MF) const
True if stack realignment is required and still possible.
virtual const TargetInstrInfo * getInstrInfo() const
virtual const TargetRegisterInfo * getRegisterInfo() const =0
Return the target's register information.
Triple - Helper class for working with autoconf configuration names.
Definition Triple.h:47
This class implements an extremely fast bulk output stream that can only output to a stream.
Definition raw_ostream.h:53
#define llvm_unreachable(msg)
Marks that the current location is not supposed to be reachable.
static unsigned getShiftValue(unsigned Imm)
getShiftValue - Extract the shift value.
static unsigned getArithExtendImm(AArch64_AM::ShiftExtendType ET, unsigned Imm)
getArithExtendImm - Encode the extend type and shift amount for an arithmetic instruction: imm: 3-bit...
const unsigned StackProbeMaxLoopUnroll
Maximum number of iterations to unroll for a constant size probing loop.
const unsigned StackProbeMaxUnprobedStack
Maximum allowed number of unprobed bytes above SP at an ABI boundary.
constexpr char Align[]
Key for Kernel::Arg::Metadata::mAlign.
constexpr char Attrs[]
Key for Kernel::Metadata::mAttrs.
unsigned ID
LLVM IR allows to use arbitrary numbers as calling convention identifiers.
Definition CallingConv.h:24
@ AArch64_SVE_VectorCall
Used between AArch64 SVE functions.
@ PreserveMost
Used for runtime calls that preserves most registers.
Definition CallingConv.h:63
@ CXX_FAST_TLS
Used for access functions.
Definition CallingConv.h:72
@ GHC
Used by the Glasgow Haskell Compiler (GHC).
Definition CallingConv.h:50
@ PreserveAll
Used for runtime calls that preserves (almost) all registers.
Definition CallingConv.h:66
@ Fast
Attempts to make calls as fast as possible (e.g.
Definition CallingConv.h:41
@ PreserveNone
Used for runtime calls that preserves none general registers.
Definition CallingConv.h:90
@ Win64
The C convention as implemented on Windows/x86-64 and AArch64.
@ SwiftTail
This follows the Swift calling convention in how arguments are passed but guarantees tail calls will ...
Definition CallingConv.h:87
@ C
The default llvm calling convention, compatible with C.
Definition CallingConv.h:34
@ Define
Register definition.
initializer< Ty > init(const Ty &Val)
NodeAddr< InstrNode * > Instr
Definition RDFGraph.h:389
BaseReg
Stack frame base register. Bit 0 of FREInfo.Info.
Definition SFrame.h:77
This is an optimization pass for GlobalISel generic memory operations.
@ Offset
Definition DWP.cpp:477
void stable_sort(R &&Range)
Definition STLExtras.h:2058
MachineInstrBuilder BuildMI(MachineFunction &MF, const MIMetadata &MIMD, const MCInstrDesc &MCID)
Builder interface. Specify how to create the initial instruction itself.
int isAArch64FrameOffsetLegal(const MachineInstr &MI, StackOffset &Offset, bool *OutUseUnscaledOp=nullptr, unsigned *OutUnscaledOp=nullptr, int64_t *EmittableOffset=nullptr)
Check if the Offset is a valid frame offset for MI.
detail::scope_exit< std::decay_t< Callable > > make_scope_exit(Callable &&F)
Definition ScopeExit.h:59
decltype(auto) dyn_cast(const From &Val)
dyn_cast<X> - Return the argument parameter cast to the specified type.
Definition Casting.h:643
@ AArch64FrameOffsetCannotUpdate
Offset cannot apply.
auto dyn_cast_or_null(const Y &Val)
Definition Casting.h:753
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1732
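A self-contained sketch of the range form, with made-up stack offsets:
llvm::SmallVector<int, 4> Offsets = {-16, 0, 32};
bool AnyNegative = llvm::any_of(Offsets, [](int O) { return O < 0; }); // true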
auto formatv(bool Validate, const char *Fmt, Ts &&...Vals)
auto reverse(ContainerTy &&C)
Definition STLExtras.h:406
void sort(IteratorTy Start, IteratorTy End)
Definition STLExtras.h:1622
LLVM_ABI raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
Definition Debug.cpp:207
void emitFrameOffset(MachineBasicBlock &MBB, MachineBasicBlock::iterator MBBI, const DebugLoc &DL, unsigned DestReg, unsigned SrcReg, StackOffset Offset, const TargetInstrInfo *TII, MachineInstr::MIFlag=MachineInstr::NoFlags, bool SetNZCV=false, bool NeedsWinCFI=false, bool *HasWinCFI=nullptr, bool EmitCFAOffset=false, StackOffset InitialOffset={}, unsigned FrameReg=AArch64::SP)
emitFrameOffset - Emit instructions as needed to set DestReg to SrcReg plus Offset.
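A hedged sketch of a typical prologue-style call (MBB, MBBI, DL and TII assumed in scope): lower SP by 32 fixed bytes plus 16 scalable bytes, tagging the expansion as frame setup:
emitFrameOffset(MBB, MBBI, DL, AArch64::SP, AArch64::SP,
                StackOffset::get(/*Fixed=*/-32, /*Scalable=*/-16), TII,
                MachineInstr::FrameSetup);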
LLVM_ABI void report_fatal_error(Error Err, bool gen_crash_diag=true)
Definition Error.cpp:167
FunctionAddr VTableAddr Count
Definition InstrProf.h:139
class LLVM_GSL_OWNER SmallVector
Forward declaration of SmallVector so that calculateSmallVectorDefaultInlinedElements can reference s...
@ LLVM_MARK_AS_BITMASK_ENUM
Definition ModRef.h:37
@ First
Helpers to iterate all locations in the MemoryEffectsBase class.
Definition ModRef.h:71
unsigned getDefRegState(bool B)
unsigned getKillRegState(bool B)
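A sketch showing how these helpers feed MachineInstrBuilder::addReg when spilling a register pair (MBB, MBBI, DL, TII, Reg1 and Reg2 assumed in scope):
BuildMI(MBB, MBBI, DL, TII->get(AArch64::STPXi))
    .addReg(Reg1, getKillRegState(/*B=*/true)) // last use of Reg1
    .addReg(Reg2, getKillRegState(/*B=*/true))
    .addReg(AArch64::SP)
    .addImm(0); // scaled offset from SP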
uint16_t MCPhysReg
An unsigned integer type large enough to represent all physical registers, but not necessarily virtual registers.
Definition MCRegister.h:21
uint64_t alignTo(uint64_t Size, Align A)
Returns a multiple of A needed to store Size bytes.
Definition Alignment.h:144
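A one-line sketch: round a frame size up to the 16-byte stack alignment required by the AAPCS64:
uint64_t FrameSize = llvm::alignTo(/*Size=*/40, llvm::Align(16)); // -> 48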
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
auto find_if(R &&Range, UnaryPredicate P)
Provide wrappers to std::find_if which take ranges instead of having to pass begin/end explicitly.
Definition STLExtras.h:1758
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Definition STLExtras.h:2120
bool is_contained(R &&Range, const E &Element)
Returns true if Element is found in Range.
Definition STLExtras.h:1897
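A self-contained sketch combining the range helpers above on made-up register numbers:
llvm::SmallVector<unsigned, 4> Regs = {19, 20, 29, 30};
auto It = llvm::find_if(Regs, [](unsigned R) { return R >= 29; }); // points at 29
llvm::erase_if(Regs, [](unsigned R) { return R < 21; });           // drops 19 and 20
bool HasLR = llvm::is_contained(Regs, 30u);                        // true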
LLVM_ABI const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=MaxLookupSearchDepth)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
void fullyRecomputeLiveIns(ArrayRef< MachineBasicBlock * > MBBs)
Convenience function for recomputing live-ins for a set of MBBs until the computation converges.
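A sketch of the usual call after the prologue/epilogue blocks have been edited (the block names are assumptions):
fullyRecomputeLiveIns({&EpilogueMBB, &PrologueMBB});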
LLVM_ABI Printable printReg(Register Reg, const TargetRegisterInfo *TRI=nullptr, unsigned SubIdx=0, const MachineRegisterInfo *MRI=nullptr)
Prints virtual and physical registers with or without a TRI instance.
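A sketch of streaming a register name into the debug stream (TRI assumed in scope):
dbgs() << "spilling " << printReg(AArch64::FP, TRI) << "\n";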
void swap(llvm::BitVector &LHS, llvm::BitVector &RHS)
Implement std::swap in terms of BitVector swap.
Definition BitVector.h:869
bool operator<(const StackAccess &Rhs) const
void print(raw_ostream &OS) const
int64_t start() const
std::string getTypeString() const
int64_t end() const
This struct is a compact representation of a valid (non-zero power of two) alignment.
Definition Alignment.h:39
constexpr uint64_t value() const
This is a hole in the type system and should not be abused.
Definition Alignment.h:77
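A short sketch: value() exposes the raw byte count for offset arithmetic:
llvm::Align A(16);
uint64_t Mask = A.value() - 1; // 0xF, usable for masking stack offsets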
Pair of physical register and lane mask.
static LLVM_ABI MachinePointerInfo getFixedStack(MachineFunction &MF, int FI, int64_t Offset=0)
Return a MachinePointerInfo record that refers to the specified FrameIndex.
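A hedged sketch (MF and FrameIdx assumed in scope): attach a memory operand to a spill so alias analysis knows it only touches that fixed-stack slot:
MachineMemOperand *MMO = MF.getMachineMemOperand(
    MachinePointerInfo::getFixedStack(MF, FrameIdx),
    MachineMemOperand::MOStore, /*Size=*/8, llvm::Align(8));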
SmallVector< WinEHTryBlockMapEntry, 4 > TryBlockMap
SmallVector< WinEHHandlerType, 1 > HandlerArray