LLVM 22.0.0git
UnifiedOnDiskCache.cpp File Reference

Encapsulates OnDiskGraphDB and OnDiskKeyValueDB instances within one directory while also restricting storage growth with a scheme of chaining the two most recent directories (primary & upstream), where the primary "faults-in" data from the upstream one.


Functions

static Expected< SmallVector< std::string, 4 > > getAllDBDirs (StringRef Path, bool IncludeCorrupt=false)
static Expected< SmallVector< std::string, 4 > > getAllGarbageDirs (StringRef Path)
static void getNextDBDirName (StringRef DBDir, llvm::raw_ostream &OS)
static Error validateOutOfProcess (StringRef LLVMCasBinary, StringRef RootPath, bool CheckHash)
static Error validateInProcess (StringRef RootPath, StringRef HashName, unsigned HashByteSize, bool CheckHash)
static Expected< uint64_t > getBootTime ()

Variables

static constexpr StringLiteral DBDirPrefix = "v1."
static constexpr StringLiteral ValidationFilename = "v1.validation"
static constexpr StringLiteral CorruptPrefix = "corrupt."

Detailed Description

Encapsulates OnDiskGraphDB and OnDiskKeyValueDB instances within one directory while also restricting storage growth with a scheme of chaining the two most recent directories (primary & upstream), where the primary "faults-in" data from the upstream one.

When the primary (most recent) directory exceeds its intended size limit, a new empty directory becomes the primary one.

Within the top-level directory (the path that UnifiedOnDiskCache::open receives), there are sub-directories named like this:

'v<version>.<x>', 'v<version>.<x+1>', 'v<version>.<x+2>', ...

'version' is the version integer of this UnifiedOnDiskCache's scheme, and the part after the dot is an increasing integer. The primary directory is the one with the highest integer, and the upstream one is the directory with the next-highest integer. For example, if the sub-directories contained are:

'v1.5', 'v1.6', 'v1.7', 'v1.8'

Then the primary one is 'v1.8', the upstream one is 'v1.7', and the rest are unused directories that can be safely deleted at any time and by any process.
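The naming scheme above can be illustrated with a small, self-contained C++ sketch. It is not the code in UnifiedOnDiskCache.cpp: classifyDBDirs, parseDBDirIndex and DBDirLayout are hypothetical names, std::string stands in for LLVM's StringRef, and the "v1." prefix is taken from DBDirPrefix. Names that are not the prefix followed by decimal digits (such as "lock" or "v1.validation") are ignored.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: returns the integer after Prefix in names such as
// "v1.7", or std::nullopt if Name is not Prefix followed by decimal digits.
std::optional<uint64_t> parseDBDirIndex(const std::string &Name,
                                        const std::string &Prefix = "v1.") {
  if (Name.size() <= Prefix.size() ||
      Name.compare(0, Prefix.size(), Prefix) != 0)
    return std::nullopt;
  uint64_t Value = 0;
  for (std::size_t I = Prefix.size(); I < Name.size(); ++I) {
    char C = Name[I];
    if (C < '0' || C > '9')
      return std::nullopt;
    Value = Value * 10 + static_cast<uint64_t>(C - '0'); // overflow ignored
  }
  return Value;
}

// Hypothetical result type: the role each sub-directory plays.
struct DBDirLayout {
  std::string Primary;              // highest integer; empty if none
  std::string Upstream;             // next-highest integer; empty if none
  std::vector<std::string> Garbage; // all older directories
};

DBDirLayout classifyDBDirs(const std::vector<std::string> &Names) {
  std::vector<std::pair<uint64_t, std::string>> Dirs;
  for (const std::string &N : Names)
    if (std::optional<uint64_t> Idx = parseDBDirIndex(N))
      Dirs.emplace_back(*Idx, N);
  // Newest first, comparing the parsed integers rather than the strings.
  std::sort(Dirs.begin(), Dirs.end(),
            [](const auto &A, const auto &B) { return A.first > B.first; });
  DBDirLayout Layout;
  if (!Dirs.empty())
    Layout.Primary = Dirs[0].second;
  if (Dirs.size() > 1)
    Layout.Upstream = Dirs[1].second;
  for (std::size_t I = 2; I < Dirs.size(); ++I)
    Layout.Garbage.push_back(Dirs[I].second);
  return Layout;
}
```

Comparing the parsed integers instead of the raw names matters once the counter reaches two digits: compared as strings, 'v1.10' would sort before 'v1.9'.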

The top-level directory also contains a file named "lock", which processes use to take shared or exclusive locks on the contents of the top-level directory. While a UnifiedOnDiskCache is open, it holds a shared lock on the top-level directory. When it closes, if the primary sub-directory has exceeded its limit, it attempts to get an exclusive lock in order to create a new empty primary directory; if it cannot get the exclusive lock, it gives up and lets the next UnifiedOnDiskCache instance that closes attempt again.
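The locking protocol can be sketched with POSIX flock(2) on the "lock" file. This is an illustrative assumption, not necessarily how UnifiedOnDiskCache.cpp acquires its locks; CacheLockSketch and closeAndMaybeStartNewPrimary are hypothetical names, and the sketch only builds on POSIX systems. Each instance opens the lock file separately, so flock treats the descriptors independently even within one process. On close, the sketch drops its shared lock and then makes a single non-blocking attempt at an exclusive one.

```cpp
#include <cassert>
#include <fcntl.h>
#include <string>
#include <sys/file.h>
#include <unistd.h>

// Hypothetical sketch (POSIX only): one object per open cache instance,
// each with its own descriptor for the top-level "lock" file.
class CacheLockSketch {
public:
  // Opens or creates the lock file and takes a shared lock, blocking while
  // another instance holds the exclusive lock.
  explicit CacheLockSketch(const std::string &LockPath) {
    FD = ::open(LockPath.c_str(), O_RDWR | O_CREAT, 0666);
    if (FD >= 0 && ::flock(FD, LOCK_SH) != 0) {
      ::close(FD);
      FD = -1;
    }
  }
  CacheLockSketch(const CacheLockSketch &) = delete;
  CacheLockSketch &operator=(const CacheLockSketch &) = delete;
  ~CacheLockSketch() {
    if (FD >= 0)
      ::close(FD); // closing the descriptor also drops any remaining lock
  }

  bool valid() const { return FD >= 0; }

  // Closing step. Returns true only if this instance obtained the exclusive
  // lock, i.e. it may create a new empty primary directory. Returns false if
  // no new primary is needed or another instance still holds a shared lock
  // (give up; a later instance will try again when it closes).
  bool closeAndMaybeStartNewPrimary(bool PrimaryExceededLimit) {
    if (FD < 0)
      return false;
    ::flock(FD, LOCK_UN); // drop this instance's shared lock first
    if (!PrimaryExceededLimit)
      return false;
    // Single non-blocking attempt; fails (EWOULDBLOCK) while others hold it.
    if (::flock(FD, LOCK_EX | LOCK_NB) != 0)
      return false;
    // A real implementation would create the next 'v1.<x+1>' directory here.
    ::flock(FD, LOCK_UN);
    return true;
  }

private:
  int FD = -1;
};
```

Because the closing instance never waits for other users, closing stays cheap even when the cache is busy; the cost is that a new primary directory is created only when some instance happens to close while it is the sole user.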

The downside of this scheme is that, while any process has a UnifiedOnDiskCache open on a directory, the storage size in that directory keeps growing without restriction. The major benefit is that garbage collection can be triggered on a directory concurrently, at any time and by any process, without affecting any active readers or writers in the same process or in other processes.

The UnifiedOnDiskCache also provides validation and recovery on top of the underlying on-disk storage. The low-level storage is designed to remain coherent across regular process crashes, but may be invalid after power loss or similar system failures. UnifiedOnDiskCache::validateIfNeeded allows validating the contents once per boot and can recover by marking invalid data for garbage collection.
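One way to implement "validate once per boot" is to record the boot time of the last successful validation and compare it with the current one. The sketch below assumes exactly that: the validation file (ValidationFilename, "v1.validation") holds a decimal boot time, and the caller supplies the current boot time. The file's getBootTime() is listed above, but its mechanism is not documented here. needsValidation and recordValidation are hypothetical helpers, and the on-disk format is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <string>

// Hypothetical format: the validation file holds the boot time of the last
// successful validation as a decimal integer.

// Returns true if the file is missing, unreadable, or records another boot.
bool needsValidation(const std::string &ValidationPath,
                     uint64_t CurrentBootTime) {
  std::ifstream In(ValidationPath);
  uint64_t Stored = 0;
  if (!(In >> Stored))
    return true;
  return Stored != CurrentBootTime;
}

// Records a successful validation for the boot identified by BootTime.
bool recordValidation(const std::string &ValidationPath, uint64_t BootTime) {
  std::ofstream Out(ValidationPath, std::ios::trunc);
  Out << BootTime;
  return static_cast<bool>(Out.flush());
}
```

Keying on boot time rather than on process lifetime matches the failure model described above: regular process crashes leave the storage coherent, so only a new boot (possibly following power loss) calls for re-validation.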

The data recovery described above requires exclusive access to the CAS, and it is an error to attempt recovery while the CAS is open in any process or thread. In order to maximize backwards compatibility with tools that do not perform validation before opening the CAS, we do not attempt to get exclusive access until recovery is actually performed, so as long as the data is valid, validation does not conflict with concurrent use.
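The deferred-locking policy can be sketched as follows. validateAndRecover, RecoveryOutcome and the two callbacks are hypothetical, and renaming the bad directory to "corrupt." plus its original name is only an assumed way of using CorruptPrefix to hand data over to garbage collection; the actual naming and scope of recovery in UnifiedOnDiskCache.cpp may differ. What the sketch shows is the ordering: validation runs without any exclusive lock, and an exclusive lock is attempted, without blocking, only once recovery is needed.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <string>

namespace fs = std::filesystem;

enum class RecoveryOutcome { Valid, Recovered, InUse };

// Hypothetical sketch of "validate first, lock only if recovery is needed".
// IsValid checks the data under normal (shared) access. TryLockExclusive
// makes one non-blocking attempt to become the sole user of the CAS; the
// caller is responsible for releasing that lock afterwards.
RecoveryOutcome validateAndRecover(const fs::path &Root,
                                   const std::string &DBDir,
                                   const std::function<bool()> &IsValid,
                                   const std::function<bool()> &TryLockExclusive) {
  if (IsValid())
    return RecoveryOutcome::Valid; // no exclusive lock ever requested
  if (!TryLockExclusive())
    return RecoveryOutcome::InUse; // CAS open elsewhere: recovery would be an error
  // Assumed marking scheme: move the directory aside under the "corrupt."
  // prefix so that later garbage collection can delete it.
  fs::rename(Root / DBDir, Root / ("corrupt." + DBDir));
  return RecoveryOutcome::Recovered;
}
```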

Definition in file UnifiedOnDiskCache.cpp.

Function Documentation

◆ getAllDBDirs()

◆ getAllGarbageDirs()

◆ getBootTime()

◆ getNextDBDirName()

void getNextDBDirName ( StringRef DBDir,
llvm::raw_ostream & OS )
static
Given a sub-directory name DBDir of the form 'v<version>.<x>', writes the next name, 'v<version>.<x+1>', to OS.

Definition at line 205 of file UnifiedOnDiskCache.cpp.

References assert(), llvm::Count, DBDirPrefix, llvm::Failed(), llvm::StringRef::getAsInteger(), llvm::StringRef::starts_with(), and llvm::StringRef::substr().

Referenced by llvm::cas::ondisk::UnifiedOnDiskCache::close().
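A standalone approximation of getNextDBDirName follows, using std::string in place of StringRef and raw_ostream. The reference list above (starts_with, substr, getAsInteger, assert) suggests a similar parse-and-increment approach, but nextDBDirName is a hypothetical name and this is a sketch, not the LLVM source.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for getNextDBDirName: std::string replaces
// llvm::StringRef and llvm::raw_ostream. "v1.7" -> "v1.8".
// Precondition (asserted): DBDir is Prefix followed by a decimal integer.
std::string nextDBDirName(const std::string &DBDir,
                          const std::string &Prefix = "v1.") {
  assert(DBDir.compare(0, Prefix.size(), Prefix) == 0 && "missing prefix");
  std::string Suffix = DBDir.substr(Prefix.size());
  assert(!Suffix.empty() &&
         Suffix.find_first_not_of("0123456789") == std::string::npos &&
         "suffix must be a decimal integer");
  uint64_t Count = std::stoull(Suffix);
  return Prefix + std::to_string(Count + 1);
}
```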

◆ validateInProcess()

◆ validateOutOfProcess()

Variable Documentation

◆ CorruptPrefix

StringLiteral CorruptPrefix = "corrupt."
static constexpr

◆ DBDirPrefix

StringLiteral DBDirPrefix = "v1."
static constexpr

FIXME: When the version of DBDirPrefix is bumped up we need to figure out how to handle the leftover sub-directories of the previous version, within the UnifiedOnDiskCache::collectGarbage function.

Definition at line 102 of file UnifiedOnDiskCache.cpp.

Referenced by getAllDBDirs(), getNextDBDirName(), and llvm::cas::ondisk::UnifiedOnDiskCache::open().

◆ ValidationFilename

StringLiteral ValidationFilename = "v1.validation"
static constexpr