LLVM 22.0.0git
ObjectStore.h
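The class comment in this header describes objects whose identifier is a strong hash of their transitive content. In that model, `store()` acts as a get-or-create operation and references form a DAG. The following C++17 sketch models only those ideas and runs without LLVM. `ToyCAS`, `ToyNode`, and `fnv1a` are hypothetical names that are not part of the LLVM CAS API. The 64-bit FNV-1a hash is a non-cryptographic stand-in for the strong hash a real CAS would use.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in hash: 64-bit FNV-1a. It is NOT collision resistant
// and only illustrates "ID = hash(content)".
inline uint64_t fnv1a(const std::string &Bytes) {
  uint64_t H = 14695981039346656037ULL;
  for (unsigned char C : Bytes) {
    H ^= C;
    H *= 1099511628211ULL;
  }
  return H;
}

// An immutable object: its data plus the IDs of the objects it references.
struct ToyNode {
  std::vector<uint64_t> Refs;
  std::string Data;
};

class ToyCAS {
public:
  // Get-or-create: derives the ID from the content, stores the object only
  // if that ID is new, and returns the ID in either case.
  uint64_t store(const std::vector<uint64_t> &Refs, const std::string &Data) {
    std::string Key;
    for (uint64_t R : Refs) {
      // Referenced objects must be stored first, so every edge points at a
      // node that already exists.
      assert(Objects.count(R) != 0 && "reference to unknown object");
      // Each ref ID already hashes its own subtree, so hashing the ref IDs
      // plus this node's data covers the transitive content.
      Key += std::to_string(R);
      Key += ',';
    }
    Key += '|'; // The ref list is only digits and commas, so '|' ends it.
    Key += Data;
    uint64_t ID = fnv1a(Key);
    Objects.try_emplace(ID, ToyNode{Refs, Data}); // no-op if already stored
    return ID;
  }

  // Look up an object by ID; std::nullopt if this store does not have it.
  std::optional<ToyNode> load(uint64_t ID) const {
    auto It = Objects.find(ID);
    if (It == Objects.end())
      return std::nullopt;
    return It->second;
  }

  // Visit references in order; returning false from Callback stops early.
  // (The header's forEachRef() uses an llvm::Error return for this instead.)
  void forEachRef(uint64_t ID,
                  const std::function<bool(uint64_t)> &Callback) const {
    auto It = Objects.find(ID);
    if (It == Objects.end())
      return;
    for (uint64_t R : It->second.Refs)
      if (!Callback(R))
        return;
  }

  size_t size() const { return Objects.size(); }

private:
  std::unordered_map<uint64_t, ToyNode> Objects;
};
```

In this model, storing `"hello"` twice yields the same ID and leaves a single object. Changing a referenced leaf (for example, `"world"` to `"World"`) changes the ID of every object that references it. This mirrors the header's statement that IDs are derived from transitive content and are comparable across CAS instances that share a hash schema.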
//===- llvm/CAS/ObjectStore.h -----------------------------------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
///
/// \file
/// This file contains the declaration of the ObjectStore class.
///
//===----------------------------------------------------------------------===//

#ifndef LLVM_CAS_OBJECTSTORE_H
#define LLVM_CAS_OBJECTSTORE_H

#include "llvm/ADT/StringRef.h"
#include "llvm/CAS/CASID.h"
#include "llvm/Support/Error.h"
#include <cstddef>

namespace llvm {

class MemoryBuffer;
template <typename T> class unique_function;

namespace cas {

class ObjectStore;
class ObjectProxy;

/// Content-addressable storage for objects.
///
/// Conceptually, objects are stored in a "unique set".
///
/// - Objects are immutable ("value objects") that are defined by their
///   content. They are implicitly deduplicated by content.
/// - Each object has a unique identifier (UID) that's derived from its content,
///   called a \a CASID.
///     - This UID is a fixed-size (strong) hash of the transitive content of a
///       CAS object.
///     - It's comparable between any two CAS instances that have the same \a
///       CASContext::getHashSchemaIdentifier().
///     - The UID can be printed (e.g., \a CASID::toString()) and it can be
///       parsed by the same or a different CAS instance with \a
///       ObjectStore::parseID().
/// - An object can be looked up by content or by UID.
///     - \a store() is a "get-or-create" method: it writes an object if it
///       doesn't exist yet, and returns a ref to it in either case.
///     - \a getProxy(const CASID&) looks up an object by its UID.
/// - Objects can reference other objects, forming an arbitrary DAG.
///
/// The \a ObjectStore interface has a few ways of referencing objects:
///
/// - \a ObjectRef encapsulates a reference to something in the CAS. It is an
///   opaque type that references an object inside a specific CAS. It is
///   implementation-defined whether the underlying object exists for an
///   ObjectRef, and it can be used to speed up CAS lookup as an implementation
///   detail. However, you don't know anything about the underlying objects.
///   "Loading" the object is a separate step that may not have happened
///   yet, and which can fail (e.g. due to filesystem corruption) or introduce
///   latency (if downloading from a remote store).
/// - \a ObjectHandle encapsulates a *loaded* object in the CAS. You need one of
///   these to inspect the content of an object: to look at its stored
///   data and references. This is internal to the CAS implementation and not
///   available from the public CAS APIs.
/// - \a CASID: the UID for an object in the CAS, obtained through \a
///   ObjectStore::getID() or \a ObjectStore::parseID(). This is a valid CAS
///   identifier, but may reference an object that is unknown to this CAS
///   instance.
/// - \a ObjectProxy pairs an ObjectHandle (subclass) with an ObjectStore, and
///   wraps access APIs to avoid having to pass extra parameters. It is the
///   object used for accessing underlying data and refs by CAS users.
///
/// Both ObjectRef and ObjectHandle are lightweight, wrapping a `uint64_t`, and
/// are only valid with the associated ObjectStore instance.
///
/// There are a few options for accessing content of objects, with different
/// lifetime tradeoffs:
///
/// - \a getData() accesses data without exposing lifetime at all.
/// - \a getMemoryBuffer() returns a \a MemoryBuffer whose lifetime
///   is independent of the CAS (it can live longer).
/// - \a getDataString() returns a StringRef whose lifetime is guaranteed to
///   last as long as the \a ObjectStore.
/// - \a readRef() and \a forEachRef() iterate through the references in an
///   object. There is no lifetime assumption.
class ObjectStore {
  friend class ObjectProxy;
  void anchor();

public:
  /// Get a \p CASID from a \p ID, which should have been generated by \a
  /// CASID::print(). This succeeds as long as \a validateID() would pass. The
  /// object may be unknown to this CAS instance.
  ///
  /// TODO: Remove, and update callers to use \a validateID() or \a
  /// extractHashFromID().
  virtual Expected<CASID> parseID(StringRef ID) = 0;

  /// Store object into ObjectStore.
  virtual Expected<ObjectRef> store(ArrayRef<ObjectRef> Refs,
                                    ArrayRef<char> Data) = 0;
  /// Get an ID for \p Ref.
  virtual CASID getID(ObjectRef Ref) const = 0;

  /// Get an existing reference to the object called \p ID.
  ///
  /// Returns \c std::nullopt if the object is not stored in this CAS.
  virtual std::optional<ObjectRef> getReference(const CASID &ID) const = 0;

  /// \returns true if the object is directly available from the local CAS, for
  /// implementations that have this kind of distinction.
  virtual Expected<bool> isMaterialized(ObjectRef Ref) const = 0;

  /// Validate the underlying object referred to by \p ID.
  virtual Error validateObject(const CASID &ID) = 0;

  /// Validate the entire ObjectStore.
  virtual Error validate(bool CheckHash) const = 0;

protected:
  /// Load the object referenced by \p Ref.
  ///
  /// Errors if the object cannot be loaded.
  /// \returns \c std::nullopt if the object is missing from the CAS.
  virtual Expected<std::optional<ObjectHandle>> loadIfExists(ObjectRef Ref) = 0;

  /// Like \c loadIfExists but returns an error if the object is missing.
  Expected<ObjectHandle> load(ObjectRef Ref);

  /// Get the size of some data.
  virtual uint64_t getDataSize(ObjectHandle Node) const = 0;

  /// Methods for handling objects. CAS implementations need to override these
  /// to provide access to stored CAS objects and their references.
  virtual Error forEachRef(ObjectHandle Node,
                           function_ref<Error(ObjectRef)> Callback) const = 0;
  virtual ObjectRef readRef(ObjectHandle Node, size_t I) const = 0;
  virtual size_t getNumRefs(ObjectHandle Node) const = 0;
  virtual ArrayRef<char> getData(ObjectHandle Node,
                                 bool RequiresNullTerminator = false) const = 0;

  /// Get ObjectRef from open file.
  virtual Expected<ObjectRef>
  storeFromOpenFileImpl(sys::fs::file_t FD,
                        std::optional<sys::fs::file_status> Status);

  /// Get a lifetime-extended StringRef pointing at the data of \p Node.
  ///
  /// Depending on the CAS implementation, this may involve in-memory storage
  /// overhead.
  StringRef getDataString(ObjectHandle Node) {
    return toStringRef(getData(Node));
  }

  /// Get a lifetime-extended MemoryBuffer pointing at the data of \p Node.
  ///
  /// Depending on the CAS implementation, this may involve in-memory storage
  /// overhead.
  std::unique_ptr<MemoryBuffer>
  getMemoryBuffer(ObjectHandle Node, StringRef Name = "",
                  bool RequiresNullTerminator = true);

  /// Read all the refs from an object into a SmallVector.
  virtual void readRefs(ObjectHandle Node,
                        SmallVectorImpl<ObjectRef> &Refs) const;

  /// Allow ObjectStore implementations to create internal handles.
#define MAKE_CAS_HANDLE_CONSTRUCTOR(HandleKind) \
  HandleKind make##HandleKind(uint64_t InternalRef) const { \
    return HandleKind(*this, InternalRef); \
  }
  MAKE_CAS_HANDLE_CONSTRUCTOR(ObjectHandle)
#undef MAKE_CAS_HANDLE_CONSTRUCTOR

public:
  /// Helper function to store an object and return an ObjectProxy.
  Expected<ObjectProxy> createProxy(ArrayRef<ObjectRef> Refs, StringRef Data);

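  /// Store object from StringRef.
  Expected<ObjectRef> storeFromString(ArrayRef<ObjectRef> Refs,
                                      StringRef String) {
    return store(Refs, arrayRefFromStringRef<char>(String));
  }
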
  /// Default implementation reads \p FD and calls \a store(). Does not
  /// take ownership of \p FD; the caller is responsible for closing it.
  ///
  /// If \p Status is provided, it is treated as a hint. Implementations
  /// must protect against the file size potentially growing after the status
  /// was taken (i.e., they cannot assume that an mmap will be null-terminated
  /// where \p Status implies).
  ///
  /// Returns an \a ObjectRef for the stored file content.
  Expected<ObjectRef>
  storeFromOpenFile(sys::fs::file_t FD,
                    std::optional<sys::fs::file_status> Status = std::nullopt) {
    return storeFromOpenFileImpl(FD, Status);
  }

  static Error createUnknownObjectError(const CASID &ID);

  /// Create ObjectProxy from CASID. If the object doesn't exist, get an error.
  Expected<ObjectProxy> getProxy(const CASID &ID);
  /// Create ObjectProxy from ObjectRef. If the object can't be loaded, get an
  /// error.
  Expected<ObjectProxy> getProxy(ObjectRef Ref);

  /// \returns \c std::nullopt if the object is missing from the CAS.
  Expected<std::optional<ObjectProxy>> getProxyIfExists(ObjectRef Ref);

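  /// Read the data of \p Node into \p OS, starting at \p Offset and writing at
  /// most \p MaxBytes bytes. \returns the number of bytes written.
  uint64_t readData(ObjectHandle Node, raw_ostream &OS, uint64_t Offset = 0,
                    uint64_t MaxBytes = -1ULL) const {
    ArrayRef<char> Data = getData(Node);
    assert(Offset < Data.size() && "Expected valid offset");
    Data = Data.drop_front(Offset).take_front(MaxBytes);
    OS << toStringRef(Data);
    return Data.size();
  }
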
  /// Set the size for limiting growth of on-disk storage. This takes effect
  /// when the instance is closed.
  ///
  /// Implementations may leave this unimplemented.
  virtual Error setSizeLimit(std::optional<uint64_t> SizeLimit) {
    return Error::success();
  }

  /// \returns the storage size of the on-disk CAS data.
  ///
  /// Implementations that don't have an implementation for this should return
  /// \c std::nullopt.
  virtual Expected<std::optional<uint64_t>> getStorageSize() const {
    return std::nullopt;
  }

  /// Prune local storage to reduce its size according to the desired size
  /// limit. Pruning can happen concurrently with other operations.
  ///
  /// Implementations may leave this unimplemented.
  virtual Error pruneStorageData() { return Error::success(); }

  /// Validate the whole node tree.
  Error validateTree(ObjectRef Ref);

  /// Import object from another CAS. This will import the full tree from the
  /// other CAS.
  Expected<ObjectRef> importObject(ObjectStore &Upstream, ObjectRef Other);

  /// Print the ObjectStore internals for debugging purposes.
  virtual void print(raw_ostream &) const {}
  void dump() const;

  /// Get CASContext
  const CASContext &getContext() const { return Context; }

  virtual ~ObjectStore() = default;

protected:
  ObjectStore(const CASContext &Context) : Context(Context) {}

private:
  const CASContext &Context;
};

/// Reference to an abstract hierarchical node, with data and references.
/// Reference is passed by value and is expected to be valid as long as the \a
/// ObjectStore is.
class ObjectProxy {
public:
  ObjectStore &getCAS() const { return *CAS; }
  CASID getID() const { return CAS->getID(Ref); }
  ObjectRef getRef() const { return Ref; }
  size_t getNumReferences() const { return CAS->getNumRefs(H); }
  ObjectRef getReference(size_t I) const { return CAS->readRef(H, I); }

  operator CASID() const { return getID(); }
  CASID getReferenceID(size_t I) const {
    std::optional<CASID> ID = getCAS().getID(getReference(I));
    assert(ID && "Expected reference to be first-class object");
    return *ID;
  }

  /// Visit each reference in order, returning an error from \p Callback to
  /// stop early.
  Error forEachReference(function_ref<Error(ObjectRef)> Callback) const {
    return CAS->forEachRef(H, Callback);
  }

  std::unique_ptr<MemoryBuffer>
  getMemoryBuffer(StringRef Name = "",
                  bool RequiresNullTerminator = true) const;

  /// Get the content of the node. Valid as long as the CAS is valid.
  StringRef getData() const { return CAS->getDataString(H); }

  friend bool operator==(const ObjectProxy &Proxy, ObjectRef Ref) {
    return Proxy.getRef() == Ref;
  }
  friend bool operator==(ObjectRef Ref, const ObjectProxy &Proxy) {
    return Proxy.getRef() == Ref;
  }
  friend bool operator!=(const ObjectProxy &Proxy, ObjectRef Ref) {
    return !(Proxy.getRef() == Ref);
  }
  friend bool operator!=(ObjectRef Ref, const ObjectProxy &Proxy) {
    return !(Proxy.getRef() == Ref);
  }

public:
  ObjectProxy() = delete;

  static ObjectProxy load(ObjectStore &CAS, ObjectRef Ref, ObjectHandle Node) {
    return ObjectProxy(CAS, Ref, Node);
  }

private:
  ObjectProxy(ObjectStore &CAS, ObjectRef Ref, ObjectHandle H)
      : CAS(&CAS), Ref(Ref), H(H) {}

  ObjectStore *CAS;
  ObjectRef Ref;
  ObjectHandle H;
};

/// Create an in-memory CAS.
std::unique_ptr<ObjectStore> createInMemoryCAS();

/// \returns true if \c LLVM_ENABLE_ONDISK_CAS configuration was enabled.
bool isOnDiskCASEnabled();

/// Create a persistent on-disk CAS at \p Path.
Expected<std::unique_ptr<ObjectStore>> createOnDiskCAS(const Twine &Path);

} // namespace cas
} // namespace llvm

#endif // LLVM_CAS_OBJECTSTORE_H
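`ObjectStore::readData()` in the listing above slices the stored bytes with `drop_front(Offset)` followed by `take_front(MaxBytes)`. The sketch below reproduces that arithmetic with `std::string_view` so it runs without LLVM. `readDataSketch` is a hypothetical name, and `std::ostream` stands in for `raw_ostream`.

```cpp
#include <cassert>
#include <cstdint>
#include <ostream>
#include <sstream> // std::ostringstream, convenient for capturing output
#include <string_view>

// Skip Offset bytes, then write at most MaxBytes of what remains to OS.
// The default MaxBytes of -1ULL is UINT64_MAX, i.e. "no limit".
// Returns the number of bytes written.
inline uint64_t readDataSketch(std::string_view Data, std::ostream &OS,
                               uint64_t Offset = 0,
                               uint64_t MaxBytes = -1ULL) {
  // Same precondition as the header: Offset must index an existing byte.
  assert(Offset < Data.size() && "Expected valid offset");
  Data.remove_prefix(Offset);        // like ArrayRef::drop_front(Offset)
  if (MaxBytes < Data.size())        // like ArrayRef::take_front(MaxBytes),
    Data = Data.substr(0, MaxBytes); // which clamps to the remaining size
  OS << Data;
  return Data.size();
}
```

For example, reading `"hello world"` with `Offset = 6` writes `world` and returns 5. Reading a 3-byte object from offset 1 with `MaxBytes = 100` writes the remaining 2 bytes. As written, the assertion requires `Offset < size`. So in assertion-enabled builds, an empty object cannot be read through `readData()`, even at the default offset of 0.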