
Persistent dictionary
Our journey to understand persistent memory began with building an in-memory dictionary. Our in-memory dictionary was built using the trie data structure and we represented this dictionary with the following classes:
class Trie { private: TrieNode* root; public: void add(string word); bool contains(string word); bool containsPrefix(string prefix); } class TrieNode { private: bool endOfWord; std::map<char, TrieNode*> nodeByCharacter; public: void add(string word); bool contains(string word); bool containsPrefix(string prefix); }
Now is the right time to make our dictionary persistent, but before we do that, let's understand a few concepts of PMDK. To understand PMDK concepts, we will build a very simple and naive append-only key/value store.
Append only key/value store
The overall idea can be summarized as:
- We will be using linked list as the data structure for persisting key/value on persistent memory
- Keys and values will be of type string
- All the nodes of the linked list will be on persisting memory
- We do not care if the same key with a different value is added again
- Our key/value store will provide the following behaviors:
open(const char *fileName, uint64_t fileSize)that opens the key/value store for read and write,write(std::string key, std::string value)that writes key and value to the persistent linked list,countEntries()that returns the total number of entries in the persistent linked list,lastKey(): that returns the last key in the persistent linked list,lastValue(): that returns the last value in the persistent linked list.
Overview of various key/value components
AppendOnlyKv class represents our append-only key/value store. It maintains references to PersistentList and PersistentPool.
namespace dictionary { namespace persistent { namespace kv { class AppendOnlyKv { private: ::dictionary::persistent::kv::PersistentPool *persistentPool; ::dictionary::persistent::kv::PersistentList *persistentList; }; } } }
PersistentList class represents a persistent linked list. It contains a reference to the head node. This reference is represented by persistent_ptr to PersistentNode. The head node is the starting node for all the read/write operations and all the key/value pairs will be added to its right.
namespace dictionary { namespace persistent { namespace kv { class PersistentList { private: persistent_ptr<PersistentNode> head; }; } } }
persistent_ptr represents a pointer to an object on persistent memory. It provides member access, dereference and array access operators.
Each PersistentNode contains 2 fields:
nextrepresents the next node in the linked list. This field is represented aspersistent_ptrtoPersistentNode.keyValuerepresents the persistent key/value. This field is represented aspersistent_ptrtochar[]. We will be converting our key and value to achar*to write it to a persistent node.
namespace dictionary { namespace persistent { namespace kv { struct PersistentNode { persistent_ptr<PersistentNode> next; persistent_ptr<char[]> keyValue; }; } } }
PersistentPool class creates a memory pool containing the root object. It creates a memory pool of a given size, an identifier called layout, and a file path.
Memory pools reside on DAX-mounted file systems.
namespace dictionary { namespace persistent { namespace kv { class PersistentPool { public: static PersistentPool *initialize(const char *filePath, uint64_t size); static PersistentPool *getInstance(); pmem::obj::pool_base getPmpool(); protected: struct Root { pmem::obj::persistent_ptr<PersistentNode> ptr; }; pmem::obj::pool_base pmpool; PMEMoid *root_oid; private: static PersistentPool *instance; PersistentPool(const char *filePath, uint64_t size = 8 * 1024 * 1024); pmem::obj::pool<Root> createOrFail(const char *path, const std::size_t size, const std::string &layout); }; } } }
This implementation of PersistentPool is not thread-safe
Persistent pointer, Memory Pools, Pool Object Pointer (POP), and the Root Object
Persistent pointer
Most operating systems implement address space layout randomization (ASLR). ASLR randomly arranges the address space positions of key data areas of a process, including the base of the executable and the positions of the stack, heap, and libraries. Because of ASLR, files can be mapped at different addresses of the process address space each time the application executes. As a result, traditional pointers that store absolute addresses cannot be used. To solve this problem in persistent memory programming, libpmemobj introduced a C struct called PMEMoid, which consists of an identifier of the pool and an offset from its beginning. This fat pointer is encapsulated in libpmemobj C++ bindings as a template class pmem::obj::persistent_ptr. We used persistent_ptr in PersistentNode.
struct PersistentNode { persistent_ptr<PersistentNode> next; persistent_ptr<char[]> keyValue; };
Memory pools, Pool Object Pointer (POP), and the Root Object
DAX (Direct access) allows applications to memory map files on a persistent memory-aware file system to provide direct load/store operations without paging blocks from a block storage device. It bypasses the kernel, avoids context switches and interrupts, and allows applications to read and write directly to the byte-addressable persistent storage.
Memory-mapped files are at the core of the persistent memory programming model. The libpmemobj library provides a convenient API to easily manage pool creation and access, avoiding the complexity of directly mapping and synchronizing data. Memory pools reside on DAX-mounted file systems.
pmem::obj::pool<Root>::create(path, layout, size);
Location of the pool once memory mapped into the application address space can differ between executions and system reboots due to ASLR. Without a way to access the data within the pool, it would be challenging to locate the data within a pool. PMDK-based pools have a small amount of metadata to solve this problem. Every pmemobj type pool has a root object. This root object is necessary because it is used as an entry point from which it is possible to find all the other objects created in a pool. An application will locate the root object using a special object called pool object pointer (POP). The POP object resides in volatile memory and is created with every program invocation. It keeps track of the metadata related to the pool, such as the offset to the root object inside the pool.
pmem::obj::pool_base pmpool = pmem::obj::pool<Root>::create(path, layout, size); PMEMoid* root_oid = static_cast<pmem::obj::pool<Root>>(pmpool).root()->ptr.raw_ptr();
Let's now understand various operations supported by the key/value store.
Opening key/value store
The idea behind open can be summarized as:
- Initialize a persistent pool with a file name and the file size
- Instantiate
PersistentList. As a part of instantiatingPersistentList, do the following:- Get the reference to the persistent pool
- Create a new
PersistentNodeusingmake_persistent<PersistentNode>() - Put an empty key/value pair in the header node. The idea is to create a fixed header node containing an empty key/value pair when the append-only key/value store is opened.
This is how the above approach can be implemented in C++:
AppendOnlyKv *AppendOnlyKv::open(const char *fileName, uint64_t fileSize) { auto *kv = new AppendOnlyKv(); kv->persistentPool = PersistentPool::initialize(fileName, fileSize); kv->persistentList = new PersistentList(); return kv; } PersistentList::PersistentList() { auto pmpool = PersistentPool::getInstance()->getPmpool(); transaction::run(pmpool, [&] { this->head = make_persistent<PersistentNode>(); this->head.get()->put("", ""); }); }
make_persistent transactionally allocates and constructs an object of type T. This function can be used to allocate an object inside a transaction.
transaction::run executes a closure-like transaction. It starts a new transaction (nested, if inside another transaction) and executes the passed
txfunction transactionally.
Writing key/value in persistent memory
The idea behind write can be summarized as:
- Start with the head node
this->headof the persistent list.this->head.get()returns the raw pointer from apersistent_ptr. - Use the raw pointer to iterate until the last node is reached. (We do not maintain a current pointer to simulate iteration while adding a new key/value pair)
- Get the reference to the persistent pool
- Use
make_persistentto allocate newPersistentNode - Get the raw pointer reference to the newly created now using
newNode.get() - Put the key/value pair in the newly allocated node. To put a key/value pair in persistent memory, we need to encode the incoming key/value pair and convert it to a char*. The way to do this includes:
- Getting the size of incoming key and value
- Computing the total size (+2 is for null characters in C strings)
- The way we write our key/value to persistent memory is by writing the key size, followed by the value size, followed by the key, and then the value
- Allocate memory for
char[]usingmake_persistent<char[]>(size). This time we usemake_persistentto allocate an array (char[]) of fixed size - Write key size in the first 32 bits (4 bytes)
- Write value size in the next 32 bits (4 bytes)
- Perform
memcpyto write the key - Perform
memcpyto write the value
This is how the above approach can be implemented in C++:
void PersistentList::write(std::string key, std::string value) { PersistentNode *current = this->head.get(); //iterate till the last node while (current->next.get()) { current = current->next.get(); } auto pmpool = PersistentPool::getInstance()->getPmpool(); transaction::run(pmpool, [&] { persistent_ptr<PersistentNode> newNode = make_persistent<PersistentNode>(); current->next = newNode; //put key/value in the new node newNode.get()->put(key.data(), value.data()); }); } //PersistentNode.h void put(const char *key, const char *value) { size_t ksize = strlen(key); size_t vsize = strlen(value); size_t size = ksize + vsize + sizeof(uint32_t) + sizeof(uint32_t) + 2; keyValue = make_persistent<char[]>(size); char *p = keyValue.get(); setKeySize(p, (uint32_t) ksize); setValueSize(p, (uint32_t) vsize); char *kvptr = p + sizeof(uint32_t) + sizeof(uint32_t); memcpy(kvptr, key, ksize); kvptr += ksize + 1; memcpy(kvptr, value, vsize); }
Getting the last key
The idea behind lastKey can be summarized as:
- Start with the head node
this->headof the persistent list.this->head.get()returns the raw pointer. - Use a raw pointer to iterate until the last node is reached
- Extract the key from persistent memory on the last node,
current->key()
This is how the above approach can be implemented in C++:
std::string PersistentList::lastKey() { PersistentNode *current = this->head.get(); while (current->next.get()) { current = current->next.get(); } return current->key(); }
Extracting key from persistent memory
The idea behind extracting a key from persistent memory can be summarized as:
keyValuecontains the size of the key, followed by the size of the value, followed by the key and then the value. To get the key, use the raw pointer fromkeyValueusingkeyValue.get()and follow pointer arithmetic:- Add
sizeof(uint32_t)to the raw keyValue pointer to move the pointer ahead by key size - Add another
sizeof(uint32_t)to move the pointer ahead by value size
- Add
- Finally, convert the result in
char *and return the value to the client
This is how the above approach can be implemented in C++:
const char *key() const { return ((char *) (keyValue.get()) + sizeof(uint32_t) + sizeof(uint32_t)); }
Testing append-only key/value store
Let's add some tests to validate our tiny append-only key/value store. To test it, we have created a small fixture that opens AppendOnlyKv as a part of the SetUp method and just removes the file as a part of the TearDown method.
#include <gtest/gtest.h> #include <string> #include "../../src/append-only-kv/AppendOnlyKv.h" using namespace dictionary::persistent::kv; class AppendOnlyKvFixture : public ::testing::Test { private: AppendOnlyKv *kv; const char *fileName = "./append-only-kv-tests.log"; public: void SetUp() { kv = AppendOnlyKv::open(fileName, 8 * 1024 * 1028); } void TearDown() { remove(fileName); } AppendOnlyKv *getKv() { return kv; } }; TEST_F(AppendOnlyKvFixture, AppendOnlyKv_AddsAKeyValue) { AppendOnlyKv* kv = AppendOnlyKvFixture::getKv(); kv->write("A", "B"); auto totalEntries = kv->countEntries(); ASSERT_EQ(1, totalEntries); } TEST_F(AppendOnlyKvFixture, AppendOnlyKv_GetsTheLastKey) { AppendOnlyKv* kv = AppendOnlyKvFixture::getKv(); kv->write("A", "B"); kv->write("C", "D"); auto lastKey = kv->lastKey(); ASSERT_EQ("C", lastKey); }
Quick takeaways
-
If we have to convert an in-memory data structure to a persistent memory data structure, we would need
persistent_ptrinstead of a raw pointer. (assuming the in-memory data structure involves a raw pointer). -
Allocating an object on persistent memory would be happening transactionally using
make_persistent. -
Storing data in persistent memory would involve encoding the data (in some form) and writing it as
persistent_ptr<char[]>.
I guess we are ready to convert our in-memory dictionary to a persistent dictionary.
Convert In-memory dictionary to persistent dictionary
Below is what our in-memory dictionary looked like:
class Trie { private: TrieNode* root; } class TrieNode { private: bool endOfWord; std::map<char, TrieNode*> nodeByCharacter; }
Based on our learnings from the persistent linked list, we can put down the tasks involved in making this dictionary persistent.
- Make the
rootpointer insideTrieclass persistent - Figure out a way to make
std::mappersistent - Make changes in the methods to use
persistent_ptr - Transactionally create new nodes
PMDK already provides a set of persistent containers and one of those containers is a concurrent hash map. This effectively means we can replace the in-memory hashmap in TrieNode with a persistent one, but that would be no fun. Instead, we would like to apply the same concepts that we looked at while building our simple append-only key/value store. These concepts include persistent_ptr, transaction and persistent pool. To do that, we would like to build a persistent dictionary that only supports lower-case English letters. This would allow us to use an array of size 26 and allocate the array using make_persistent(fixed_size).
Let's take a quick look at the structure of persistent Trie and TrieNode classes.
Overview of various dictionary components
Trie class represents our persistent dictionary. It maintains a persistent_ptr to TrieNode which serves as our root node. This root node will be created as a part of the open method.
namespace dictionary { namespace persistent { class Trie { private: persistent_ptr<TrieNode> root; public: static Trie *open(const char *fileName, uint64_t fileSize); void add(std::string word); bool contains(std::string word); bool containsPrefix(std::string prefix); }; } }
Each node of our dictionary is represented by the TrieNode struct. Each node is expected to contain an array of 26 entries and each entry is expected to be a pointer to another TrieNode. We use p<Slot> slots[26] to create an array of 26 entries and each slot is nothing but a persistent_ptr<TrieNode> next. Each TrieNode also contains an endOfWord field that is represented as p<bool>.
namespace dictionary { namespace persistent { struct TrieNode; struct Slot { persistent_ptr<TrieNode> next; void setNext(persistent_ptr<TrieNode> next_); }; struct TrieNode { p<Slot> slots[26]; p<bool> endOfWord; void add(std::string word, pmem::obj::pool_base pmpool); bool contains(std::string word); bool containsPrefix(std::string prefix); }; } }
pclass is a property-like template class that has to be used for all variables (excluding persistent pointers). Thepproperty makes sure that changes to a variable within a transaction are made atomically with respect to persistence. It does it by creating a snapshot of the variable when modified in the transaction scope.
TriePool is an object that takes the responsibility of creating a memory pool of a given size and an identifier. Similar to our PersistentPool, this implementation of TriePool is not thread-safe.
namespace dictionary { namespace persistent { class TriePool { public: TriePool(TriePool &other) = delete; void operator=(const TriePool &) = delete; static TriePool *initialize(const char *filePath, uint64_t size); static TriePool *getInstance(); pmem::obj::pool_base getPmpool(); ~TriePool() { try { instance = nullptr; pmpool.close(); } catch (const std::logic_error &e) { std::terminate(); } } protected: struct Root { pmem::obj::persistent_ptr<::dictionary::persistent::TrieNode> ptr; }; pmem::obj::pool_base pmpool; PMEMoid *root_oid; private: static TriePool *instance; TriePool(const char *filePath, uint64_t size = 8 * 1024 * 1024) { bool openFailed = false; std::string layout = "persistentDictionary"; try { pmpool = pmem::obj::pool<Root>::open(filePath, layout); } catch (pmem::pool_invalid_argument &e) { openFailed = true; } if (openFailed) { pmpool = createOrFail(filePath, size, layout); } root_oid = static_cast<pmem::obj::pool<Root>>(pmpool).root()->ptr.raw_ptr(); } pmem::obj::pool<Root> createOrFail(const char *path, const std::size_t size, const std::string &layout) { try { return pmem::obj::pool<Root>::create(path, layout, size); } catch (pmem::pool_invalid_argument &e) { throw std::invalid_argument(e.what()); } } }; } }
Opening a persistent dictionary
The idea behind open can be summarized as:
- Initialize a persistent pool (
TriePool) with a file name and file size - Instantiate
Trie. As a part of instantiatingTriedo the following:- Get the reference to the persistent pool
- Create a new
TrieNodeusingmake_persistent<TrieNode>(). This new node will serve as the root node of our dictionary.
This is how the above approach can be implemented in C++:
Trie *Trie::open(const char *fileName, uint64_t fileSize) { auto *trie = new Trie(); TriePool::initialize(fileName, fileSize); auto pmpool = TriePool::getInstance()->getPmpool(); transaction::run(pmpool, [&] { trie->root = make_persistent<TrieNode>(); }); return trie; }
Adding a word
The idea behind add can be summarized as:
- Start with the root node
this->root, dereference it to get the raw pointerthis->root.get()and invokeaddon the rootTrieNode - Inside
addofTrieNodedo the following:- Iterate through each character of the word
- Since, our dictionary is going to contain lower case characters, subtract the code point of the letter
ato get the slot index - Check if the slot at
indexcontains apersistent_ptrtoTrieNodeusingcurrent->slots[index].get_ro().nextcurrent->slots[index].get_ro()returns a read-only reference to the slot at theindex- using the read-only reference, check the
nextfield which is apersistent_ptrtoTrieNode
- If the slot at
indexdoes not contain apersistent_ptrtoTrieNode, then create a new persistent node and set the reference to the new node in the slotmake_persistent<TrieNode>()creates a new nodecurrent->slots[index].get_rw()returns the read-write reference to the slotsetNextmethod of slot sets the nextpersistent_ptrtoTrieNode
- Move
currentto the node pointed by the slot atindex - At the end of the iteration, set
endOfWordas true in the current node. Run this entire process inside a transaction.
This is how the above approach can be implemented in C++:
void Trie::add(std::string word) { this->root.get()->add(word, TriePool::getInstance()->getPmpool()); } void add(std::string word, pmem::obj::pool_base pmpool) { TrieNode *current = this; transaction::run(pmpool, [&] { for (auto &ch: word) { int index = ch - 'a'; if (!current->slots[index].get_ro().next) { current->slots[index].get_rw().setNext(make_persistent<TrieNode>()); } current = current->slots[index].get_rw().next.get(); } current->endOfWord = true; }); }
The idea behind persistent add is exactly the same as that of in-memory add. The only difference is persistent add uses persistent_ptr.
Checking if the dictionary contains a word
The idea behind contains can be summarized as:
- Start with the root node
this->root, dereference it to get the raw pointerthis->root.get()and invokecontainson the rootTrieNode - Inside
containsofTrieNodedo the following:- Iterate through each character of the word
- Since, our dictionary is going to contain lower case characters, subtract the code point of the letter
ato get the slot index - Check if the slot at
indexcontains apersistent_ptrtoTrieNodeusingcurrent->slots[index].get_ro().next - If the slot does not contain a reference to
TrieNode, returnfalse - Else, move
currentto the node pointed by the slot atindex - At the end of the iteration, check
endOfWordin the current node
This is how the above approach can be implemented in C++:
bool Trie::contains(std::string word) { return this->root.get()->contains(word); } bool contains(std::string word) { TrieNode *current = this; for (auto &ch: word) { int index = ch - 'a'; if (!current->slots[index].get_ro().next) { return false; } current = current->slots[index].get_ro().next.get(); } return current->endOfWord; }
Checking if the dictionary contains a word by a prefix
The idea behind containsPrefix is almost the same as contains. The only difference here is that we do not check for endOfWord in the current node at the end of the iteration.
This is how the above approach can be implemented in C++:
bool Trie::containsPrefix(std::string prefix) { return this->root.get()->containsPrefix(prefix); } bool containsPrefix(std::string prefix) { TrieNode *current = this; for (auto &ch: prefix) { int index = ch - 'a'; if (!current->slots[index].get_ro().next) { return false; } current = current->slots[index].get_ro().next.get(); } return true; }
Testing persistent dictionary
Let's add some tests to validate our persistent dictionary. To test it we have created a small fixture that opens our dictionary as a part of the SetUp method and just removes the file as a part of the TearDown method.
#include <gtest/gtest.h> #include <string> #include "../../src/dictionary/Trie.h" using namespace dictionary::persistent; class DictionaryFixture : public ::testing::Test { private: ::dictionary::persistent::Trie *trie; const char *fileName = "./dictionary-tests.log"; public: void SetUp() { trie = ::dictionary::persistent::Trie::open(fileName, 8 * 1024 * 1028); } void TearDown() { remove(fileName); } ::dictionary::persistent::Trie *getDictionary() { return trie; } }; TEST_F(DictionaryFixture, Dictionary_ContainsAnAddedWord) { Trie* dictionary = DictionaryFixture::getDictionary(); dictionary->add("meet"); dictionary->add("memory"); ASSERT_TRUE(dictionary -> contains("memory")); } TEST_F(DictionaryFixture, Dictionary_DoesNotContainAWord) { Trie* dictionary = DictionaryFixture::getDictionary(); dictionary->add("meet"); dictionary->add("memory"); ASSERT_FALSE(dictionary -> contains("market")); } TEST_F(DictionaryFixture, Dictionary_ContainsAWordByPrefix) { Trie* dictionary = DictionaryFixture::getDictionary(); dictionary->add("meet"); dictionary->add("memory"); ASSERT_TRUE(dictionary -> containsPrefix("me")); } TEST_F(DictionaryFixture, Dictionary_DoesNotContainAWordByPrefix) { Trie* dictionary = DictionaryFixture::getDictionary(); dictionary->add("meet"); dictionary->add("memory"); ASSERT_FALSE(dictionary -> containsPrefix("met")); }
Conclusion
Let's summarize.
-
We made our in-memory dictionary persistent by using various PMDK concepts including
persistent_ptr,p,memory pooland APIs likemake_persistentandtransaction. -
If we are using PMDK, converting any in-memory data structure to a persistent memory data structure would require
persistent_ptrinstead of a raw pointer. (assuming the in-memory data structure involves a raw pointer). -
Allocating an object on persistent memory would be happening transactionally using
make_persistent. -
Storing data in persistent memory would involve encoding the data (in some form) and writing it as
persistent_ptr<char[]>.
Code
Code for this article is available here